While looking over the SyndicationFeed class and related classes I found something
quite annoying: in .NET 3.5 Microsoft put together this wonderful class infrastructure for
handling various kinds of syndication feeds, but it cannot handle
the old-style Atom 0.3 feed schema (xmlns="http://purl.org/atom/ns#"). If you attempt to use the SyndicationFeed.Load method, you get this:
"The element with name 'feed' and namespace 'http://purl.org/atom/ns#' is not
an allowed feed format." Frankly, I think the error message should have been written more like "We decided
we didn't want to bother with Atom 0.3 feeds, so tough titsky on you!"
That's too bad, because a huge number of feeds, including most of Google's news,
Gmail and other feeds, are still delivered in this format. I have no idea what
the rationale for this omission was, nor do I care to speculate. The bottom line
is that the .NET 3.5 SyndicationFeed classes cannot handle the format.
So, what should a developer do? Well, you can either spend a lot of time figuring
out how to override the existing infrastructure, or you can just roll your own.
In my case, since I was mostly interested in gathering and displaying feed items,
all I needed was the <item> or <entry> collection from the respective
feed. Since all feeds are well-formed XML, I decided to start from that common
denominator.
The code I present here is relatively simple: I start out with a GenericFeedItem
class as a container for the Title, Link, Description and PubDate items. I then
use an XmlTextReader to walk the nodes of the retrieved feed and collect each item's
child elements, and a switch block to test for the element names I care about and
map them onto those canonical properties.
The result is a simple, fast way to parse most feeds (adding additional switch
tests as needed) and return a standardized List<GenericFeedItem> collection
that always has the same shape and can be databound.
The XmlTextReader class is perfect for this scenario because it provides fast, non-cached,
forward-only access to XML data, similar to the way a SqlDataReader handles
data from a SQL Server query. The switch block can be easily modified to accommodate
additional feed schemas.
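To make the forward-only behavior concrete, here is a minimal sketch that reads a small XML string and records each element name in the order the reader meets it. The ForwardOnlyDemo class and its ElementNames method are hypothetical names used only for illustration; they are not part of the solution presented in this article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class ForwardOnlyDemo
{
    // Returns the element names in document order. Each node is visited
    // exactly once; the reader never backs up and keeps no tree in memory.
    public static List<string> ElementNames(string xml)
    {
        var names = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

Because nothing is cached, anything you want to keep (here, the names) has to be copied out while the reader is positioned on it.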
Here is the ultra-simple GenericFeedItem class:
    using System;

    namespace PAB.FeedParser
    {
        [Serializable]
        public class GenericFeedItem
        {
            public string Title { get; set; }
            public string Link { get; set; }
            public string Description { get; set; }
            public DateTime PubDate { get; set; }
        }
    }
And here is the GenericFeedParser class, with plenty of inline comments to explain
what is happening:
    using System;
    using System.Collections.Generic;
    using System.Xml;

    namespace PAB.FeedParser
    {
        public class GenericFeedParser
        {
            public List<GenericFeedItem> ReadFeedItems(string url)
            {
                // create a List of type Dictionary<string,string> for the element names and values
                var items = new List<Dictionary<string, string>>();
                // declare a Dictionary to capture each current item in the while loop
                Dictionary<string, string> currentItem = null;
                // wrap a new XmlTextReader around the url of the feed; the using block closes it when we are done
                using (var reader = new XmlTextReader(url))
                {
                    // read each node with the reader
                    while (reader.Read())
                    {
                        // if it's an element, we want to process it
                        if (reader.NodeType == XmlNodeType.Element)
                        {
                            string name = reader.Name;
                            string lowerName = name.ToLowerInvariant();
                            if (lowerName == "item" || lowerName == "entry")
                            {
                                // save previous item
                                if (currentItem != null)
                                    items.Add(currentItem);
                                // create new item
                                currentItem = new Dictionary<string, string>();
                            }
                            else if (currentItem != null)
                            {
                                string value;
                                if (reader.IsEmptyElement)
                                {
                                    // an empty element such as Atom's <link href="..." /> has no text node,
                                    // so take the href attribute if there is one
                                    value = reader.GetAttribute("href") ?? string.Empty;
                                }
                                else
                                {
                                    // advance to the element's text node
                                    reader.Read();
                                    value = reader.Value;
                                }
                                // some feeds can have duplicate keys, so we don't want to blow up here:
                                if (!currentItem.ContainsKey(name))
                                    currentItem.Add(name, value);
                            }
                        }
                    }
                }
                // the loop only saves an item when the next one starts, so save the last one here
                if (currentItem != null)
                    items.Add(currentItem);
                // now create a List of type GenericFeedItem
                var itemList = new List<GenericFeedItem>();
                // iterate all our items from the reader
                foreach (var d in items)
                {
                    var itm = new GenericFeedItem();
                    // do a switch on the Key of the Dictionary<string, string> of each item
                    foreach (string k in d.Keys)
                    {
                        switch (k)
                        {
                            case "title":
                                itm.Title = d[k];
                                break;
                            case "link":
                                itm.Link = d[k];
                                break;
                            case "published":
                            case "pubDate":
                            case "issued":
                                // fall back to the current time if the date cannot be parsed
                                DateTime dt;
                                itm.PubDate = DateTime.TryParse(d[k], out dt) ? dt : DateTime.Now;
                                break;
                            case "content":
                            case "description":
                                itm.Description = d[k];
                                break;
                            default:
                                break;
                        }
                    }
                    // add the created item to our List
                    itemList.Add(itm);
                }
                return itemList;
            }
        }
    }
In order to use this arrangement (say, in a web page with a GridView), one would use code similar to this:
    protected void DropDownList1_SelectedIndexChanged(object sender, EventArgs e)
    {
        if (DropDownList1.SelectedValue == "") return;
        var parser = new GenericFeedParser();
        List<GenericFeedItem> items = parser.ReadFeedItems(DropDownList1.SelectedValue);
        GridView1.DataSource = items;
        GridView1.DataBind();
    }
That's all it takes! You can throw virtually any kind of feed at this and it will
happily return a List of type GenericFeedItem for you. If I have missed any of
the common feed schemas in this exercise, it is a simple matter to modify the
switch block as shown above in order to accommodate them.
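As a rough illustration of extending the switch block, the sketch below maps a few more element names onto the four GenericFeedItem properties. The FeedElementMap class and its Canonicalize method are hypothetical names, not part of the downloadable solution. The extra cases (Atom 1.0's updated and summary, Atom 0.3's modified, and the Dublin Core dc:date often seen in RSS 1.0 feeds) are my assumptions about which schemas you are most likely to meet.

```csharp
using System;

// Hypothetical helper, not part of the original solution.
public static class FeedElementMap
{
    // Returns the name of the GenericFeedItem property that a feed element
    // maps to, or null when the element is not one we keep.
    public static string Canonicalize(string elementName)
    {
        switch (elementName)
        {
            case "title":
                return "Title";
            case "link":
                return "Link";
            case "pubDate":     // RSS 2.0
            case "published":   // Atom 1.0
            case "updated":     // Atom 1.0
            case "issued":      // Atom 0.3
            case "modified":    // Atom 0.3
            case "dc:date":     // Dublin Core, common in RSS 1.0
                return "PubDate";
            case "description": // RSS
            case "summary":     // Atom
            case "content":     // Atom
                return "Description";
            default:
                return null;
        }
    }
}
```

Keeping the name mapping in one place like this means a new schema usually costs one extra case label rather than a change to the reading loop.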
In the downloadable solution, I've enhanced the arrangement with the addition
of a SyndicationFormat class which identifies the feed type and provides the
title of the feed as well. You can download the full Visual Studio 2008 Solution, which includes a web project with a page that will try each feed type and display
the results, including the feed type and its title.
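The SyndicationFormat class itself is not listed in this article, so the following is only a sketch of how feed-type detection could work, not the code from the download. It looks at the root element's name and namespace. FeedFormatSniffer and Detect are hypothetical names, and the returned labels are my own choice.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper; the downloadable SyndicationFormat class may work differently.
public static class FeedFormatSniffer
{
    // Guesses the feed format from the root element of the XML text.
    public static string Detect(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skip the XML declaration, comments and whitespace up to the root element.
            if (reader.MoveToContent() != XmlNodeType.Element)
                return "Unknown";

            switch (reader.LocalName)
            {
                case "rss":
                    return "RSS";
                case "RDF":
                    return "RSS 1.0 (RDF)";
                case "feed":
                    if (reader.NamespaceURI == "http://purl.org/atom/ns#")
                        return "Atom 0.3";
                    if (reader.NamespaceURI == "http://www.w3.org/2005/Atom")
                        return "Atom 1.0";
                    return "Unknown";
                default:
                    return "Unknown";
            }
        }
    }
}
```

Since only the root element is inspected, the check stops reading after the first few nodes of the feed.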
Peter...great article.
I'm having issues with Atom10 feeds....doesn't seem to bring in the links.
Any hints on how best to handle this?
Thanks again for the parser...was really helpful.