Thursday, March 31, 2011

XML Parsing with C#?

I'm working on a school project that involves a lot of XML parsing. I'm coding in C#, but I have yet to find a "suitable" way to parse this XML. I've looked at several different approaches but haven't gotten any of them right yet, so I have come to you. Ideally, I'm looking for something similar to Beautiful Soup in Python (sort of).

I was wondering if there was any way to convert XML like this:

<config>
    <bgimg>C:\\background.png</bgimg>
    <nodelist>
        <node>
            <oid>012345</oid>
            <image>C:\\image.png</image>
            <label>EHRV</label>
            <tooltip>
                <header>EHR Viewer</header>
                <body>Version 1.0</body>
                <icon>C:\\ico\ehrv.png</icon>
            </tooltip>
            <msgSource>8181:iqLog</msgSource>
        </node>
    </nodelist>
</config>

Into an Array/Hashtable/Dictionary/Other like this:

Array
(
["config"] => array
    (
    ["bgimg"] => "C:\\background.png"
    ["nodelist"] => array
        (
        ["node"] => array
            (
            ["oid"] => "012345"
            ["image"] => "C:\\image.png"
            ["label"] => "EHRV"
            ["tooltip"] => array
                (
                ["header"] => "EHR Viewer"
                ["body"] => "Version 1.0"
                ["icon"] => "C:\\ico\ehrv.png"
                )
            ["msgSource"] => "8181:iqLog"
            )
        )
    )
)

Even just giving me a decent resource to look through would be really helpful. Thanks a ton.

From Stack Overflow
  • XmlDocument + XPath is pretty much all you ever need in .NET to parse XML.

  • I would look into LINQ to XML. This gives you an object structure that mirrors the XML file and is fairly easy to traverse.
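
    As a rough sketch of what that traversal might look like: the XML below is a trimmed, well-formed version of the sample from the question, the variable names are made up for illustration, and the snippet assumes a C# 9 or later top-level program.

    ```csharp
    using System;
    using System.Linq;
    using System.Xml.Linq;

    // Trimmed, well-formed version of the question's sample (illustrative only).
    var xml = @"<config>
      <bgimg>C:\\background.png</bgimg>
      <nodelist>
        <node>
          <oid>012345</oid>
          <label>EHRV</label>
          <tooltip><header>EHR Viewer</header></tooltip>
        </node>
      </nodelist>
    </config>";

    XDocument doc = XDocument.Parse(xml);

    // Root is <config>; Element() returns the first child with that name.
    // Casting an XElement to string yields its text (or null if the element is missing).
    string bgimg = (string)doc.Root.Element("bgimg");

    // Descendants() finds every <node> at any depth; each is projected into an anonymous object.
    var nodes = doc.Descendants("node")
        .Select(n => new
        {
            Oid = (string)n.Element("oid"),
            Label = (string)n.Element("label"),
            Header = (string)n.Element("tooltip")?.Element("header")
        })
        .ToList();

    foreach (var n in nodes)
        Console.WriteLine($"{n.Oid}: {n.Label} ({n.Header})"); // prints "012345: EHRV (EHR Viewer)"
    ```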

  • You can also use serialization to convert the XML text back into a strongly typed class instance.

  • There must be half a dozen different ways to do this in C#. My favorite uses the System.Xml namespace, particularly System.Xml.Serialization.

    You use a command-line tool called xsd.exe to turn an XML sample into an XSD schema file (tip: make sure your nodelist has more than one node in the sample), then run it again on the schema to generate a C# class file that you can load into your project and use with the System.Xml.Serialization.XmlSerializer class.

  • There's no shame in using an old-fashioned XmlDocument:

    var xml = "<config>hello world</config>";
    var doc = new System.Xml.XmlDocument();
    doc.LoadXml(xml);
    var nodes = doc.SelectNodes("/config");
    
  • I personally like to map XML elements to classes and vice versa using the System.Xml.Serialization.XmlSerializer class.

    http://msdn.microsoft.com/es-es/library/system.xml.serialization.xmlserializer(VS.80).aspx
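
    A minimal sketch of that mapping, assuming hand-written classes. Config, Node and Tooltip are hypothetical names chosen to match the question's sample, not generated code. It is written as a C# 9 or later top-level program, so the class declarations have to come last.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml.Serialization;

    // Trimmed, well-formed version of the question's sample (illustrative only).
    var xml = @"<config>
      <bgimg>C:\\background.png</bgimg>
      <nodelist>
        <node>
          <oid>012345</oid>
          <label>EHRV</label>
          <tooltip><header>EHR Viewer</header><body>Version 1.0</body></tooltip>
          <msgSource>8181:iqLog</msgSource>
        </node>
      </nodelist>
    </config>";

    var serializer = new XmlSerializer(typeof(Config));

    // XML -> objects
    Config config;
    using (var reader = new StringReader(xml))
    {
        config = (Config)serializer.Deserialize(reader);
    }
    Console.WriteLine(config.Nodes[0].Tooltip.Header); // prints "EHR Viewer"

    // Objects -> XML
    var writer = new StringWriter();
    serializer.Serialize(writer, config);
    string roundTripped = writer.ToString();

    // Hypothetical classes; the attributes map each property to an element name in the sample.
    [XmlRoot("config")]
    public class Config
    {
        [XmlElement("bgimg")] public string BgImg { get; set; }
        [XmlArray("nodelist"), XmlArrayItem("node")] public List<Node> Nodes { get; set; }
    }

    public class Node
    {
        [XmlElement("oid")] public string Oid { get; set; }
        [XmlElement("label")] public string Label { get; set; }
        [XmlElement("tooltip")] public Tooltip Tooltip { get; set; }
        [XmlElement("msgSource")] public string MsgSource { get; set; }
    }

    public class Tooltip
    {
        [XmlElement("header")] public string Header { get; set; }
        [XmlElement("body")] public string Body { get; set; }
    }
    ```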

  • You should definitely use LINQ to XML, a.k.a. XLINQ. There is a nice tool called LINQPad that you should check out. It has nice features, from a comprehensive examples library to allowing you to directly query an SQL database via LINQ to SQL. Best of all, it lets you test your queries before putting them into code.

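  • I personally use XPathDocument, XPathNavigator and XPathNodeIterator, e.g.:

    // Requires: using System.Xml.XPath;
    // Choose a source: the constructor accepts a URI string, Stream, TextReader, or XmlReader.
    XPathDocument xDoc = new XPathDocument(source);
    
    XPathNavigator xNav = xDoc.CreateNavigator();
    
    XPathNodeIterator iterator = xNav.Select("nodes/node[@SomePredicate = 'SomeValue']");
    
    while (iterator.MoveNext())
    {
        // SelectSingleNode returns an XPathNavigator (null if nothing matches), so read its Value.
        string val = iterator.Current.SelectSingleNode("nodeWithValue")?.Value;
    
        // etc etc
    }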
    
  • The best approach will be dictated by what you actually want to do with the data once you've parsed it out.

    If you want to pass it around in a structured-but-not-tied-to-XML fashion, XML Serialization is probably your best bet. This will also get you closest to what you've described, though you'll be dealing with an object graph rather than nested maps.

    If you are just looking for a convenient format to query for specific bits of data, your best option would be LINQ to XML. Alternatively, you could use the more traditional classes in the System.Xml namespace (starting with XmlDocument) and query using XPath.

    You could also use any of these techniques (or an XmlTextReader) as building blocks to create the data structure you've described but, barring some special need, I don't think it will give you any more versatility than the other approaches.
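
    For illustration, here is one way such a building block might look, using LINQ to XML to produce nested dictionaries. ToValue is a hypothetical helper, and the rule that repeated sibling names become a list is an assumption about how you would want duplicates handled. It assumes a C# 9 or later top-level program.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    // Hypothetical helper: leaf elements become strings, elements with children become
    // nested dictionaries, and repeated sibling names are collected into a List<object>.
    static object ToValue(XElement element)
    {
        if (!element.HasElements)
            return element.Value;

        var map = new Dictionary<string, object>();
        foreach (var siblings in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            List<object> values = siblings.Select(e => ToValue(e)).ToList();
            map[siblings.Key] = values.Count == 1 ? values[0] : values;
        }
        return map;
    }

    // Trimmed, well-formed version of the question's sample (illustrative only).
    var xml = @"<config>
      <bgimg>C:\\background.png</bgimg>
      <nodelist>
        <node>
          <oid>012345</oid>
          <tooltip><header>EHR Viewer</header><body>Version 1.0</body></tooltip>
        </node>
      </nodelist>
    </config>";

    XElement root = XElement.Parse(xml);
    var result = new Dictionary<string, object> { [root.Name.LocalName] = ToValue(root) };

    // Every level needs a cast on the way down.
    var config = (Dictionary<string, object>)result["config"];
    var nodelist = (Dictionary<string, object>)config["nodelist"];
    var node = (Dictionary<string, object>)nodelist["node"];
    var tooltip = (Dictionary<string, object>)node["tooltip"];
    Console.WriteLine(tooltip["header"]); // prints "EHR Viewer"
    ```

    The chain of casts needed to reach a single value is the practical cost of this shape, and the shape of nodelist["node"] changes as soon as a second node is added.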

  • Yeah, I agree. The LINQ way is very nice, and I especially like the way you write XML using it.

    It is much simpler using the "objects in objects" way.
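
    A small sketch of that nested style, assuming the element names from the question's sample (written as a C# 9 or later top-level program):

    ```csharp
    using System;
    using System.Xml.Linq;

    // Each XElement takes its children as constructor arguments,
    // so the code is nested the same way as the resulting XML.
    var config = new XElement("config",
        new XElement("bgimg", @"C:\background.png"),
        new XElement("nodelist",
            new XElement("node",
                new XElement("oid", "012345"),
                new XElement("label", "EHRV"),
                new XElement("tooltip",
                    new XElement("header", "EHR Viewer"),
                    new XElement("body", "Version 1.0")))));

    // ToString() on an XElement produces indented XML.
    Console.WriteLine(config);
    ```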
