Using LINQ to XML (and how to build a custom RSS Feed Reader with it)
One of the big programming model improvements being made in .NET 3.5 is the work being done to make querying data a first-class programming concept. We call this overall querying programming model "LINQ", which stands for .NET Language Integrated Query.
LINQ
supports a rich extensibility model that facilitates the creation
of efficient domain-specific providers for data sources. .NET 3.5 ships
with built-in libraries that enable LINQ support against Objects, XML,
and Databases.
What is LINQ to XML?
LINQ to XML is a built-in LINQ data provider that is implemented within the "System.Xml.Linq" namespace in .NET 3.5.
LINQ
to XML provides a clean programming model that enables you to read,
construct and write XML data. You can use LINQ to XML to perform LINQ
queries over XML that you retrieve from the file-system, from a remote
HTTP URL or web-service, or from any in-memory XML content.
LINQ
to XML provides much richer (and easier) querying and data shaping
support than the low-level XmlReader/XmlWriter API in .NET today. It
also ends up being much more efficient (and uses much less memory) than
the DOM API that XmlDocument provides.
Using LINQ to XML to query a local XML File
To
get a sense of how LINQ to XML works, we can create a simple XML file
on our local file-system like below that uses a custom schema we've
defined to store RSS feeds:
I
could then use the new XDocument class within the System.Xml.Linq
namespace to open and query the XML document above. Specifically, I
want to filter the <Feed> elements in the XML file and return a
sequence of the non-disabled RSS feeds (where a disabled feed is a
<Feed> element with a "status" attribute whose value is
"disabled"). I could accomplish this by writing the code below:
VB:
C#:
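A minimal C# sketch of the query described here, written under stated assumptions rather than as the original listing. Only the <Feed>, <Name> and <Url> elements and the "status" attribute come from the text; the <Feeds> root element, the two sample entries, and the FeedQuerySketch/ActiveFeeds names are invented for illustration. The XML is read from an inline string so the sketch runs outside ASP.NET; inside a page, the document would instead be loaded with XDocument.Load(Server.MapPath(path)).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class FeedQuerySketch
{
    // Hypothetical sample data: the <Feeds> root and both entries are assumptions.
    public const string SampleXml = @"
<Feeds>
  <Feed>
    <Name>ScottGu</Name>
    <Url>http://weblogs.asp.net/scottgu/rss.aspx</Url>
  </Feed>
  <Feed status='disabled'>
    <Name>Example Disabled Feed</Name>
    <Url>http://example.com/rss.xml</Url>
  </Feed>
</Feeds>";

    public static List<string> ActiveFeeds(XDocument feedXML)
    {
        // Keep <Feed> elements that have no "status" attribute,
        // or whose "status" value is not "disabled".
        var feeds = from feed in feedXML.Descendants("Feed")
                    where feed.Attribute("status") == null
                          || feed.Attribute("status").Value != "disabled"
                    select new
                    {
                        Name = feed.Element("Name").Value,
                        Url = feed.Element("Url").Value
                    };

        // Anonymous types cannot leave the method, so format them as strings here.
        return feeds.Select(f => f.Name + " -> " + f.Url).ToList();
    }

    public static void Main()
    {
        // In an ASP.NET page: XDocument.Load(Server.MapPath(path))
        XDocument feedXML = XDocument.Load(new StringReader(SampleXml));

        foreach (string line in ActiveFeeds(feedXML))
            Console.WriteLine(line);
        // prints: ScottGu -> http://weblogs.asp.net/scottgu/rss.aspx
    }
}
```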
Notice
in the code-snippets above how I'm loading the XML file using the
XDocument.Load(path) static method - which returns back an XDocument
object. Because I'm running this code within ASP.NET, I'm using the
Server.MapPath(path) helper method to resolve the correct path for my
XML file relative to the page I'm running the code on.
Once
I have an XDocument object for my XML file I can then write a LINQ
query expression to retrieve the XML data I'm looking for. In the code
above I'm querying over each of the <Feed> elements within the XML
file. This is driven by this opening clause in the LINQ query
expression:
from feed in feedXML.Descendants("Feed")
I'm
then applying a filter that only returns back those "Feed" elements
that either don't have a "status" attribute, or whose "status" attribute
value is not set to "disabled":
Where (feed.Attribute("status") Is Nothing) OrElse (feed.Attribute("status").Value <> "disabled")
I am then using the select clause in the LINQ expression to indicate what data I want returned. If I simply wrote "select feed", LINQ to XML would return a sequence of XElement objects representing each of the XML element nodes that match my filter. In the code samples above, though, I am using the shaping/projection features of LINQ to instead define a new anonymous type on the fly, and I am defining two properties on it - Name and Url - that I want populated using the <Name> and <Url> sub-elements under each <Feed> element:
Select Name = feed.Element("Name").Value, Url = feed.Element("Url").Value
As you can see above (and below), I can then work against this returned sequence of data just like I would any collection or array in .NET. VS 2008 provides full IntelliSense and compile-time checking support over this anonymous type sequence:
I can also data-bind the results against any UI control in ASP.NET, Windows Forms, or WPF. For example, assuming I had a DropDownList control defined in my page like so:
I could use the LINQ to XML code below to data-bind the results to it:
This will then produce a nice drop-down list in our HTML page like so:
Hmm - What is this "anonymous type" thing?
In my code above I've taken advantage of a new language feature in VB and C# called "anonymous types". Anonymous types enable developers to concisely define inline CLR types within code, without having to explicitly write a formal class declaration for the type. You can learn more about them in my previous New "Orcas" Language Feature: Anonymous Types blog post.
While
anonymous types can be super useful when you want to locally iterate
and work with data, we'll often want/need to define a standard class
when passing the results of our LINQ query between multiple classes,
across class library assemblies, and over web-services.
To enable this, I could define a non-anonymous class called "FeedDefinition" to represent our Feed data like so:
Note above how I'm using the new "Automatic Properties" feature of C# to define the properties (and avoid having to define a field for them).
I
could then write the below method to return back a generics based
List<FeedDefinition> collection containing FeedDefinition objects:
Note
above how the only change I've made to the LINQ to XML query we were
using before is to change the "select" clause from "select new" (with no
type-name) to "select new FeedDefinition". With this change I'm now
returning a sequence of FeedDefinition objects that I can pass from
class to class, assembly to assembly, and across web-services.
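As a rough illustration of that change, here is a minimal C# sketch, not the original listing. FeedDefinition and its Name/Url properties follow the text; the GetActiveFeeds method name, the <Feeds> root element and the sample entries are assumptions. The method takes an already-loaded XDocument so it can run outside ASP.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// A named class using automatic properties (no explicit backing fields).
public class FeedDefinition
{
    public string Name { get; set; }
    public string Url { get; set; }
}

public static class FeedRepositorySketch
{
    // Hypothetical sample data: root element and entries are assumptions.
    public const string SampleXml = @"
<Feeds>
  <Feed><Name>ScottGu</Name><Url>http://weblogs.asp.net/scottgu/rss.aspx</Url></Feed>
  <Feed status='disabled'><Name>Old Feed</Name><Url>http://example.com/rss.xml</Url></Feed>
</Feeds>";

    // Hypothetical helper name; returns named objects instead of anonymous ones.
    public static List<FeedDefinition> GetActiveFeeds(XDocument feedXML)
    {
        var feeds = from feed in feedXML.Descendants("Feed")
                    where feed.Attribute("status") == null
                          || feed.Attribute("status").Value != "disabled"
                    select new FeedDefinition   // the only change: a type name after "new"
                    {
                        Name = feed.Element("Name").Value,
                        Url = feed.Element("Url").Value
                    };

        return feeds.ToList();
    }

    public static void Main()
    {
        foreach (FeedDefinition feed in GetActiveFeeds(XDocument.Parse(SampleXml)))
            Console.WriteLine(feed.Name + ": " + feed.Url);
    }
}
```

Because the result is a List<FeedDefinition> rather than a sequence of anonymous objects, callers in other classes or assemblies can declare variables and parameters of that type.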
Using LINQ to XML to Retrieve a Remote RSS XML Feed
The XDocument.Load(path) static method can open both XML files from the file system and remote XML feeds returned from an HTTP URL. This enables you to use it to access remote RSS feeds, REST APIs, and any other XML feed published on the web.
For an example of this in action, let's take a look at the XML of my blog's RSS feed (http://weblogs.asp.net/scottgu/rss.aspx):
I
could write the LINQ to XML code below to retrieve the above blog post
data from my RSS feed, and work with the individual feed items as .NET
objects:
Note above how I am converting the "Published" field in the RSS feed - which is a string in the XML - to a .NET DateTime object. Notice also how LINQ to XML includes a built-in XNamespace type that provides a type-safe way to declare and work with XML namespaces (which I need to do to retrieve the <slash:comments> element).
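The two points above (date conversion and XNamespace) can be sketched in C# as follows. This is an illustration under assumptions, not the original listing: the inline RSS sample is invented, the property names other than Published are guesses, and the sketch reads from a string where a live feed would be loaded with XDocument.Load(url). The namespace URI is the one used by the RSS "slash" module.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class RssQuerySketch
{
    // Invented sample feed; a real feed would come from XDocument.Load(url).
    public const string SampleRss = @"
<rss version='2.0' xmlns:slash='http://purl.org/rss/1.0/modules/slash/'>
  <channel>
    <title>Sample Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Mon, 06 Aug 2007 08:00:00 GMT</pubDate>
      <slash:comments>12</slash:comments>
    </item>
  </channel>
</rss>";

    // RSS dates are RFC 1123 strings; parse them culture-independently as UTC.
    public static DateTime ParseRssDate(string text)
    {
        return DateTime.Parse(text, CultureInfo.InvariantCulture,
                              DateTimeStyles.AdjustToUniversal);
    }

    public static List<string> Describe(XDocument rssFeed)
    {
        // XNamespace + local name produces the qualified name for <slash:comments>.
        XNamespace slash = "http://purl.org/rss/1.0/modules/slash/";

        var posts = from item in rssFeed.Descendants("item")
                    select new
                    {
                        Title = item.Element("title").Value,
                        Published = ParseRssDate(item.Element("pubDate").Value),
                        Url = item.Element("link").Value,
                        NumComments = (int)item.Element(slash + "comments")
                    };

        return posts.Select(p => p.Title + " | " + p.Published.ToString("yyyy-MM-dd")
                                 + " | " + p.NumComments + " comments").ToList();
    }

    public static void Main()
    {
        XDocument rssFeed = XDocument.Load(new StringReader(SampleRss));
        foreach (string line in Describe(rssFeed))
            Console.WriteLine(line);
    }
}
```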
I
could then take advantage of the composition features of LINQ to
perform a further sub-query on the result, so that I filter over only
those RSS posts that were published within the last 7 days using the
code below:
As
you can see above, you can feed the results of one LINQ query
expression to be the input of another LINQ expression. This enables
you to write very clean, highly composable, code.
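A minimal C# sketch of this kind of composition, under assumptions: the sample feed and the RecentTitles helper are invented, and the reference time is passed in as a parameter (with a fixed date in Main) so the output is repeatable. In a page you would more likely pass the current time.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class RecentPostsSketch
{
    // Invented sample feed with one recent and one older post.
    public const string SampleRss = @"
<rss version='2.0'>
  <channel>
    <item>
      <title>New post</title>
      <pubDate>Mon, 06 Aug 2007 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Old post</title>
      <pubDate>Mon, 02 Jul 2007 08:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>";

    // Hypothetical helper: titles of posts published in the 7 days before nowUtc.
    public static List<string> RecentTitles(XDocument rssFeed, DateTime nowUtc)
    {
        // First query: project every <item> into an object.
        var posts = from item in rssFeed.Descendants("item")
                    select new
                    {
                        Title = item.Element("title").Value,
                        Published = DateTime.Parse(item.Element("pubDate").Value,
                                                   CultureInfo.InvariantCulture,
                                                   DateTimeStyles.AdjustToUniversal)
                    };

        // Second query: takes the first query's result as its input.
        var recentPosts = from post in posts
                          where post.Published > nowUtc.AddDays(-7)
                          select post.Title;

        return recentPosts.ToList();
    }

    public static void Main()
    {
        // Fixed reference date for a repeatable demo.
        DateTime now = new DateTime(2007, 8, 8, 0, 0, 0, DateTimeKind.Utc);

        foreach (string title in RecentTitles(XDocument.Parse(SampleRss), now))
            Console.WriteLine(title);
        // prints: New post
    }
}
```

Neither query runs until the final ToList() call enumerates the result, so the filter and the projection are evaluated together in a single pass over the <item> elements.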
Using LINQ Sub-Queries within a LINQ to XML Query Expression
If you look at the raw XML of my RSS feed, you'll notice that the tags for each post are stored as repeated <category> elements directly below each <item> element:
When
designing the object model for a "BlogEntry" class, I might want to
represent these <category> values as a sub-collection of strings.
For example, using a "Tags" property that is a generic list of type
string:
You
might be wondering - how do we take a flat collection of
<category> elements under <item> and transform them into a
nested sub-collection of strings? The nice thing about LINQ is that it
makes this type of scenario easy by allowing us to use nested LINQ query
expressions like so:
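A minimal C# sketch of such a nested query, offered as an illustration rather than the original listing. BlogEntry and its Tags property follow the text; the Title property, the GetEntries helper and the sample feed are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class BlogEntry
{
    public string Title { get; set; }        // assumed property
    public List<string> Tags { get; set; }   // one string per <category> element
}

public static class NestedQuerySketch
{
    // Invented sample feed: one tagged post and one untagged post.
    public const string SampleRss = @"
<rss version='2.0'>
  <channel>
    <item>
      <title>LINQ to XML</title>
      <category>LINQ</category>
      <category>ASP.NET</category>
      <category>.NET</category>
    </item>
    <item>
      <title>Untagged post</title>
    </item>
  </channel>
</rss>";

    // Hypothetical helper name.
    public static List<BlogEntry> GetEntries(XDocument rssFeed)
    {
        var entries = from item in rssFeed.Descendants("item")
                      select new BlogEntry
                      {
                          Title = item.Element("title").Value,
                          // Nested query: the <category> children of this item only.
                          Tags = (from category in item.Elements("category")
                                  select category.Value).ToList()
                      };

        return entries.ToList();
    }

    public static void Main()
    {
        foreach (BlogEntry entry in GetEntries(XDocument.Parse(SampleRss)))
            Console.WriteLine(entry.Title + ": " + string.Join(", ", entry.Tags.ToArray()));
    }
}
```

The inner query uses item.Elements("category") rather than Descendants so that it only picks up the direct children of the current <item>; a post without tags simply gets an empty list.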
This
"shaping" power of LINQ, and its ability to take flat data structures
and make them hierarchical (and take hierarchical data structures and
make them flat) is super powerful. You can use this feature with any
type of data source - regardless of whether it is XML, SQL, or plain old
objects/arrays/collections.
Putting it all Together with a Simple RSS Feed Reader
The code snippets I've walked through above demonstrate how you can easily write LINQ to XML code to retrieve a list of RSS feeds from a local XML file, and how to remotely query an RSS feed to retrieve an individual feed's details and the contents of its individual posts. I could then take the resulting feed contents and data-bind them to an ASP.NET GridView or ListView control to provide a nice view of the blog feed:
I've built a simple sample application that puts all of these snippets together to deliver a simple RSS Reader with LINQ to XML and the new <asp:ListView> control. You can download it here. The download includes both a VB and a C# version of the application.
Summary
LINQ
to XML provides a really powerful way to efficiently query, filter, and
shape/transform XML data. You can use it both against local XML
content, as well as remote XML feeds. You can use it to easily
transform XML data into .NET objects and collections that you can
further manipulate and transfer across your application.
LINQ
to XML uses the same core LINQ query syntax and concepts that LINQ to
SQL, LINQ to Objects, LINQ to SharePoint, LINQ to Amazon, LINQ to
NHibernate, etc. use when querying data. You can learn more about the
LINQ query syntax and the supporting language features being added to VB
and C# to support it from these previous blog posts of mine:
You might also find these blog posts of mine useful to learn more about LINQ to SQL:
Part 1: Introduction to LINQ to SQL
Part 2: Defining our Data Model Classes
Part 3: Querying our Database
Part 4: Updating our Database