This post responds to Mike Champion's comment on my previous XLinq blog post. I clarify which XML file I used (the Wikipedia XML Abstract) and explain why I chose an XmlReader for its speed, especially combined with custom data structures for a cyclic graph representation. XLinq's syntax and lambda expressions felt less intuitive for my task of converting XML into SQL statements. The project relates "title" elements to "sublink" entities, producing a complex graph structure that XLinq cannot easily handle without excessive data duplication and memory consumption. While XStreamingElement offers some improvement by avoiding redundant data scans, what I want is deferred data loading, so that only the necessary slices of the XML are ever processed. That approach could handle selects, wheres, and counts efficiently in a single pass, and even joins with clever indexing. Defining a schema during XML iteration seems redundant when the XLinq expressions already specify what data is required. Pre-loading an entire XML document into memory is wasteful when only a small portion of it is used. I propose deferring data loading until it is actually needed, despite the potential cost of repeated XDocument inspections. Ideally, XLinq should scale without forcing users back to less efficient methods once the data outgrows memory. I ask whether there are hard limits, and whether a formula relates XLinq's memory consumption to XML document size.
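To make the XmlReader approach concrete, here is a minimal single-pass sketch. It assumes the Wikipedia XML Abstract nests text-only `title` elements and `sublink` elements the way the post implies; the file name and the `title_sublink` table are hypothetical, and the real graph-building code isn't reproduced in the post.

```csharp
using System;
using System.Xml;

class AbstractToSql
{
    static void Main()
    {
        // Forward-only, single pass: only the current node is held in memory.
        using (XmlReader reader = XmlReader.Create("enwiki-abstract.xml"))
        {
            string currentTitle = null;

            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    // ReadElementContentAsString advances the reader past </title>,
                    // so we must not call Read() again in this branch.
                    currentTitle = reader.ReadElementContentAsString();
                }
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "sublink")
                {
                    // ReadInnerXml likewise advances past the element.
                    string link = reader.ReadInnerXml();
                    if (currentTitle != null)
                        Console.WriteLine(
                            "INSERT INTO title_sublink (title, sublink) VALUES ('{0}', '{1}');",
                            currentTitle.Replace("'", "''"),
                            link.Replace("'", "''"));
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

The `while (!reader.EOF)` shape matters: the content-reading calls do their own advancing, so a plain `while (reader.Read())` loop would silently skip any node that immediately follows a consumed element.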
I'm highlighting a comment from Mike Champion, XLinq's program manager at Microsoft, addressing the problem of querying large XML files with XLinq. He describes the team's ongoing investigation and asks how large XML documents are typically structured; specifically, he asks about the structure of my 900MB file so they can better understand user needs and design an appropriate solution within XLinq. He mentions exploring options such as a LINQ-queryable XmlReader or a lazy-evaluation approach similar to XStreamingElement, while aiming for simplicity and avoiding dependencies on schemas or XPath. He's open to further discussion via his blog's contact form.
I've been experimenting with XLinq in C# 3.0, but I'm not impressed with its querying capabilities. It seems to require loading the entire XML document into memory, which caused problems when I tried to process a 900MB file. A plain XmlReader was much more efficient for the task. I'd like to see an XLinq implementation that can process XML in a streaming fashion, similar to SAX or XmlReader, to avoid the memory issues; that would make it practical for large documents. Perhaps XLinq already supports this, but I haven't found how. For now, it seems best suited to smaller files.
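The streaming behavior wished for here later became a well-known LINQ to XML pattern: wrap an XmlReader in an iterator that yields one XElement subtree at a time via XNode.ReadFrom, then run ordinary LINQ queries over the lazy sequence. A sketch, assuming the same hypothetical file name and a `doc`/`title` structure like the Wikipedia abstract dump:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class StreamingXlinq
{
    // Yields one fully built XElement at a time; only the current subtree
    // is materialized, so memory use stays flat regardless of file size.
    static IEnumerable<XElement> StreamElements(string path, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom builds just this subtree and advances the reader past it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    static void Main()
    {
        // An ordinary LINQ query over the lazy stream: the 900MB document
        // is never loaded whole, and the count completes in a single pass.
        var titles =
            from doc in StreamElements("enwiki-abstract.xml", "doc")
            let title = (string)doc.Element("title")
            where title != null
            select title;

        Console.WriteLine(titles.Count());
    }
}
```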
In this post, I explored C# 3.0 and XLinq by parsing a music style XML document from MusicMoz. I created a simple class "TagCategory" to store the style name and category. Then, using XLinq, I loaded the XML, extracted the "style" elements, and created a List of TagCategory objects. The code concisely retrieves and stores the data using object initializers. Feel free to share your feedback or suggestions for improvement!
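The post doesn't reproduce the code, but from its description (a TagCategory class, a query over `style` elements, object initializers) the result likely resembled the sketch below. The element and attribute names are guesses at the MusicMoz document's schema, the file name is invented, and it targets the shipped System.Xml.Linq API rather than the 2005-era XLinq preview.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class TagCategory
{
    public string Name { get; set; }
    public string Category { get; set; }
}

class Program
{
    static void Main()
    {
        // Load the whole document into memory: fine at this size,
        // which is exactly why the later posts revisit large files.
        XDocument doc = XDocument.Load("musicmoz.styles.xml");

        // Object initializers let the select clause build each
        // TagCategory without a dedicated constructor.
        List<TagCategory> styles =
            (from style in doc.Descendants("style")
             select new TagCategory
             {
                 Name = (string)style.Attribute("name"),
                 Category = (string)style.Element("category")
             }).ToList();

        Console.WriteLine("{0} styles loaded", styles.Count);
    }
}
```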