Wednesday, March 30, 2011

Converting special characters in XML stream

I have an XML stream, stored in a CString object, which contains special characters such as '. Is there any method other than replacing individual characters in the stream to convert these special characters?

From stackoverflow
  • I frankly don't see another option.

  • If you can, install a filter in the writer. This allows you to write the stream char by char and replace the special characters when you encounter them (saving you from having to allocate a second string object).

    Try to output as many characters of the string at once as possible because calling write() in a loop for every single character is expensive. Instead use this pseudo code:

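    int start = 0;                       // first character not yet written
    for (int i=0; i<str.length(); i++) {
        char c = str.getChar(i);
        String emit = null;
        switch (c) {
        case '&': emit = "&amp;"; break;
        case '<': emit = "&lt;"; break;
        case '>': emit = "&gt;"; break;
        case '\'': emit = "&apos;"; break;
        case '"': emit = "&quot;"; break;
        }
        if (emit != null) {
            write(str,start,i);          // unescaped run [start, i)
            write(emit);
            start = i + 1;               // skip the character just replaced
        }
    }
    if (start != str.length()) {
        write(str,start,str.length());   // remaining tail [start, length)
    }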
    

    In the common case, the loop will traverse the string once (which is fast) and call write() once.

    If you can't install a filter in the writer, you can use the same loop to filter the string itself. In the common case, you simply return the input string unchanged. Only when the if (emit != null) branch is first reached do you allocate a new copy. Since this happens only for strings that contain special characters, it is much cheaper than creating a copy for every string.
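As a rough illustration of the string-filtering variant described above, here is a minimal C++ sketch. It uses std::string rather than CString, and the names escapeXmlInPlace and xmlEntity are hypothetical helpers made up for this example, not part of any library. It assumes a narrow-character string; a Unicode CString build would need the same logic over wide characters.

```cpp
#include <string>

// Returns the XML entity for c, or nullptr if c needs no escaping.
static const char* xmlEntity(char c) {
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    default:   return nullptr;
    }
}

// Escapes s in place. Returns false, without building a second string,
// when s contains no special characters (the common case).
bool escapeXmlInPlace(std::string& s) {
    static const char kSpecial[] = "&<>'\"";
    std::string::size_type pos = s.find_first_of(kSpecial);
    if (pos == std::string::npos) {
        return false;  // nothing to escape, input left untouched
    }

    std::string out;
    out.reserve(s.size() + 16);  // rough guess at the extra room needed
    std::string::size_type start = 0;
    while (pos != std::string::npos) {
        out.append(s, start, pos - start);  // copy the clean run in one go
        out.append(xmlEntity(s[pos]));      // then the entity
        start = pos + 1;                    // skip the replaced character
        pos = s.find_first_of(kSpecial, start);
    }
    out.append(s, start, std::string::npos);  // trailing clean run
    s.swap(out);
    return true;
}
```

The second string is only built once the first special character has been found, so strings without special characters cost a single scan.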

  • The only characters that need escaping are " < > & (plus ' inside single-quoted attribute values).

    But I would recommend using a standard XML library. That would take care not only of escaping, but also of a lot of other problems (encoding, entities, validation, etc.).

    andynormancx : I couldn't agree more. The chances of getting everything 100% right all the time when rolling your own XML routines are slim. I constantly have to deal with people who have decided to roll their own XML routines, and they almost always end up creating malformed XML in some way.
