Sunday, April 3, 2011

MSXML removes line breaks in CDATA section

I have a simple XML with a CDATA section like:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<config>
    <input>
    <![CDATA[
line
another line
and another
    ]]>
    </input>
    ...
</config>

This is my current code for parsing the CDATA section with MSXML:

// Walk the child nodes of <input>, skipping comments and plain text nodes.
for (int i = 0, count = pChildNodes->Getlength(); i < count; ++i) {
    IXMLDOMNodePtr pNode = pChildNodes->Getitem(i);
    if (pNode->GetnodeType() != NODE_COMMENT && pNode->GetnodeType() != NODE_TEXT) {
        if (pNode->GetnodeType() == NODE_CDATA_SECTION) {
            IXMLDOMCDATASectionPtr pCData = pNode;
            // Four different ways of reading the CDATA content:
            _bstr_t a = pCData->Getdata();
            _variant_t b = pCData->GetnodeValue();
            _bstr_t c = pCData->Gettext();
            _bstr_t d = pCData->Getxml();

But none of a, b, c, or d keeps the line breaks that are in the XML. This is the output:

lineanother lineand another

When I create the document I set the preserve white space flag:

m_pXmlDoc->put_preserveWhiteSpace(VARIANT_TRUE);

Do you have any ideas on how I can get the value of the CDATA section with its line breaks intact?

From stackoverflow
  • I don't think CDATA is supposed to preserve whitespace. It's usually used to escape characters such as < or >. This may be of some help: http://www.javacommerce.com/displaypage.jsp?name=whitespa.sql&id=18238

  • Why not Base64-encode the data before you store it in the XML document? Then you don't even need the CDATA section. Just Base64-decode the value when you retrieve it, and the original data, line breaks included, will be preserved.

    There are two negatives to this solution:

    1. The stored data will be about a third larger, since Base64 emits 4 characters for every 3 bytes.
    2. You will lose plain-text readability in the XML file, since the content is Base64-encoded.

    The positive, of course, is that you don't need to worry about CDATA issues, which will hopefully outweigh the negatives in your situation.

    URL encoding, HTML encoding, and adding slashes are alternatives that require extra work to implement, but they keep some readability and produce smaller output.

    Cheers
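
As a rough illustration of the Base64 suggestion, here is a minimal sketch in standard C++ that round-trips text containing line breaks. The names base64_encode, base64_decode and base64_value are hypothetical helpers written for this example; they are not part of MSXML, and wiring the encoded string into the <input> element is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Standard Base64 alphabet (RFC 4648).
static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode arbitrary bytes, including '\r' and '\n', as Base64 text.
std::string base64_encode(const std::string& input) {
    std::string out;
    out.reserve(((input.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < input.size(); i += 3) {
        std::uint32_t n =
            (static_cast<std::uint32_t>(static_cast<unsigned char>(input[i])) << 16) |
            (static_cast<std::uint32_t>(static_cast<unsigned char>(input[i + 1])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(input[i + 2]));
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += kBase64Alphabet[(n >> 6) & 63];
        out += kBase64Alphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    const std::size_t rest = input.size() - i;
    if (rest > 0) {
        std::uint32_t n =
            static_cast<std::uint32_t>(static_cast<unsigned char>(input[i])) << 16;
        if (rest == 2) {
            n |= static_cast<std::uint32_t>(static_cast<unsigned char>(input[i + 1])) << 8;
        }
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += (rest == 2) ? kBase64Alphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Map one Base64 character to its 6-bit value, or -1 if it is not in the alphabet.
int base64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode Base64 text. Whitespace is skipped, so it does not matter
// whether the XML layer keeps, drops, or rewrites line breaks.
std::string base64_decode(const std::string& input) {
    std::string out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : input) {
        if (c == '=') break;  // padding marks the end of the data
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') continue;
        const int v = base64_value(c);
        if (v < 0) throw std::invalid_argument("invalid Base64 character");
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
        }
    }
    return out;
}
```

With these helpers, the writer would store base64_encode("line\r\nanother line\r\nand another") as the element text, and the reader would pass the retrieved string through base64_decode. Because the decoder ignores whitespace, the result does not depend on how the parser treats line breaks in the stored value.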
