Build a Google Atom Newsgroup Feed Engine
by Peter A. Bromberg, Ph.D.


"Success is more a function of consistent common sense than it is of genius." -- An Wang

Recently Google added a new feature that is not widely promoted - Atom Feeds on Google Groups. These feeds show the last 100 posts in any newsgroup, and can be constructed by adding "/feed/msgs.xml" to the end of any Google Group URI. For example, let's look at the Atom Feed for microsoft.public.dotnet.languages.csharp:

http://groups-beta.google.com/group/microsoft.public.dotnet.languages.csharp/feed/msgs.xml

Note that the browser renders this as nicely formatted HTML. This is because the Atom XML document carries a built-in stylesheet reference (a CSS stylesheet, as the type attribute shows):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">



However, if you add this to your favorite RSS newsreader, it will subscribe to it just like any other RSS feed. A nice way to keep up with "the latest" from your favorite newsgroup. In fact, I notice that MVP Robbe Morris, chief geek of eggheadcafe.com has been busily posting at the C# newsgroup, earning his MVP brownie points!

When this came out, my first thought was, "I wonder what I can do with this to generate useful content?" I've created a number of solutions that parse RSS and display it; I even wrote an engine that aggregates RSS search content from 11 different sources, combining them and removing duplicate links. You can view that here if you are interested. But I had never tried to handle the Atom format, which seems to be Google's favorite syndication medium.

With RSS, it's extremely easy: you can actually load the remote RSS XML document directly into a DataSet using its ReadXml method. At that point, all we need to do is iterate over the tables in the DataSet, testing for the one that has the correct column names (e.g., "link", "title", "description", "pubDate", etc.). The DataSet can then be supplied as the DataSource to your favorite ASP.NET server control and displayed on the page.
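As a rough sketch of that RSS approach (not code from the downloadable solution), the class below reads a tiny inline RSS document instead of a remote URL so it runs without network access. The names RssDataSetSketch, SampleRss, FindItemTable, and LoadItems are made up for illustration. Because the channel table also has "title" and "link" columns, the check includes "pubDate", which in this sample appears only on the items.

```csharp
using System;
using System.Data;
using System.IO;

public static class RssDataSetSketch
{
    // Tiny inline RSS 2.0 document (sample data, not a real feed).
    public const string SampleRss =
        "<rss version=\"2.0\"><channel>" +
        "<title>Sample channel</title>" +
        "<link>http://example.com/</link>" +
        "<description>Sample feed</description>" +
        "<item><title>First post</title><link>http://example.com/1</link>" +
        "<description>Hello</description>" +
        "<pubDate>Mon, 03 Jan 2005 10:00:00 GMT</pubDate></item>" +
        "<item><title>Second post</title><link>http://example.com/2</link>" +
        "<description>World</description>" +
        "<pubDate>Tue, 04 Jan 2005 10:00:00 GMT</pubDate></item>" +
        "</channel></rss>";

    // Returns the first table that has all the item-level columns, or null.
    public static DataTable FindItemTable(DataSet ds)
    {
        string[] required = { "title", "link", "description", "pubDate" };
        foreach (DataTable table in ds.Tables)
        {
            bool hasAll = true;
            foreach (string name in required)
            {
                if (!table.Columns.Contains(name)) { hasAll = false; break; }
            }
            if (hasAll) return table;
        }
        return null;
    }

    // Loads RSS text into a DataSet and picks out the table of items.
    public static DataTable LoadItems(string rssXml)
    {
        DataSet ds = new DataSet();
        // No schema is supplied, so ReadXml infers one from the XML itself.
        ds.ReadXml(new StringReader(rssXml));
        return FindItemTable(ds);
    }
}
```

Calling RssDataSetSketch.LoadItems(RssDataSetSketch.SampleRss) returns the two-row item table, which could then be bound to a grid. For a live feed, ReadXml also accepts a URL string in place of the StringReader.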

However, Atom doesn't cooperate as nicely because it declares a default namespace, so we need to create a Namespace Manager and use the SelectNodes overload that accepts it, supplying the prefix we've assigned, like so:

XmlNodeList items = xmlDoc.SelectNodes("//atom:entry",mgr);

The above would return a NodeList containing all the <entry> nodes, which correspond to the RSS <item> nodes that you may be familiar with.
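To make the namespace point concrete, here is a small self-contained sketch that runs against an inline Atom 0.3 fragment rather than the live Google feed. The class and member names (AtomNamespaceSketch, SampleFeed, and so on) are invented for illustration. It shows that an unprefixed XPath finds nothing, while the same query with a registered prefix finds the entries.

```csharp
using System;
using System.Xml;

public static class AtomNamespaceSketch
{
    // Namespace URI used by Atom 0.3 feeds, as seen in the <feed> element above.
    public const string AtomNs = "http://purl.org/atom/ns#";

    // Minimal sample feed (made-up data) with two entries.
    public const string SampleFeed =
        "<feed version=\"0.3\" xmlns=\"http://purl.org/atom/ns#\">" +
        "<title>Sample group</title>" +
        "<entry><title>First thread</title>" +
        "<link rel=\"alternate\" type=\"text/html\" href=\"http://example.com/1\"/></entry>" +
        "<entry><title>Second thread</title>" +
        "<link rel=\"alternate\" type=\"text/html\" href=\"http://example.com/2\"/></entry>" +
        "</feed>";

    // Unprefixed names in XPath match only elements in no namespace,
    // so this finds nothing in a feed that declares a default namespace.
    public static int CountEntriesWithoutNamespace(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//entry").Count;
    }

    // Registers the "atom" prefix and uses it in every XPath step.
    public static string[] GetEntryTitles(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNamespaceManager mgr = new XmlNamespaceManager(doc.NameTable);
        mgr.AddNamespace("atom", AtomNs);
        XmlNodeList entries = doc.SelectNodes("//atom:entry", mgr);
        string[] titles = new string[entries.Count];
        for (int i = 0; i < entries.Count; i++)
        {
            titles[i] = entries[i].SelectSingleNode("atom:title", mgr).InnerText;
        }
        return titles;
    }
}
```

The prefix "atom" is arbitrary; only the namespace URI has to match the one declared in the feed.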

First, I started out looking for a control to handle this, and I found Scott Mitchell's RSS Feed Control, which I modified to handle the Atom feeds. However, when I was done, I realized that I had "locked myself in" on the display side with an HTML Table type display. Scott's control is very sophisticated, and I'd highly recommend looking at his work if you are interested in authoring ASP.NET ServerControls, but what I really wanted was an engine: something that would return my results in a handy format (say, a DataSet or DataTable) and let me decide what control I want to use to display it. So, I wrapped up my methods into a class library and included an embedded resource text file that contains all the Microsoft-related English-language public newsgroups. Let me share the source of the library here; it's really pretty simple:

using System;
using System.Data;
using System.Xml;
using System.IO;
using System.Reflection;
using System.Web;

namespace AtomGenerator
{
 /// <summary>
 /// Atom Feed DataTable Generator and Groups List Generator
 /// </summary>
 public class Generator
 {
  public Generator()
  {   
  }
  // the default ad code is pre-set: (in this case, for eggheadcafe.com)
  private string _adCode="<custom ad code here, omitted for brevity>";
  public string AdCode
  {
   get{return _adCode;}
   set{_adCode=value;}
  }
  private double _hoursToCache=12;
  public double HoursToCache
  {
   get{return _hoursToCache;}
   set{_hoursToCache=value;}
  }
  public DataTable GenerateGroups()
  {
   DataTable tbl =null;
   if(System.Web.HttpContext.Current.Cache["AtomGroupsList"]==null)
   {
    // groups.txt is an embedded resource--
    Stream stm =
            Assembly.GetExecutingAssembly().GetManifestResourceStream("AtomGenerator.groups.txt");
    byte[] b = new byte[(int)stm.Length];
    stm.Read(b,0,(int)stm.Length);   
    char[] delim ={'\n'};
    // chop off the remaining return control character-
    string theItems =System.Text.Encoding.ASCII.GetString(b).Replace("\r","");
    string[] listItems=theItems.Split(delim);
    tbl = new DataTable();
    tbl.Columns.Add("GroupName");
    tbl.Columns.Add("GroupUrl");
    tbl.AcceptChanges();
    DataRow row=null;
    string groupName=String.Empty;
    string groupUrl=String.Empty;
    foreach(string name in listItems)
    {
     groupName=name.TrimEnd();
     // skip blank lines (e.g., a trailing newline at the end of groups.txt)
     if(groupName.Length==0) continue;
     row=tbl.NewRow();
     groupUrl="http://groups-beta.google.com/group/microsoft.public"+groupName +"/feed/msgs.xml";
     row.ItemArray=new object[] {groupName,groupUrl} ;
     tbl.Rows.Add(row);
    }
    tbl.AcceptChanges();
    System.Web.HttpContext.Current.Cache.Insert("AtomGroupsList",
                      tbl,null,System.Web.Caching.Cache.NoAbsoluteExpiration,
                                TimeSpan.FromHours(this._hoursToCache),
                                      System.Web.Caching.CacheItemPriority.Normal,null);
   }
   else
   {
    tbl=(DataTable)System.Web.HttpContext.Current.Cache["AtomGroupsList"];
   }
   return tbl;
  }

  public  DataTable GetAtomData(string url)
  {
   DataTable tbl = new DataTable();
   if(System.Web.HttpContext.Current.Cache[url]==null)
   {    
    tbl.Columns.Add("issued");
    tbl.Columns.Add("title");
    tbl.Columns.Add("link");
    tbl.Columns.Add("summary");
    tbl.AcceptChanges();
    XmlDocument xmlDoc = new XmlDocument();
    try 
    {
     xmlDoc.Load(url);
    }
    catch
    {
     return tbl;
    }
    System.Xml.XmlNamespaceManager mgr = new XmlNamespaceManager(xmlDoc.NameTable);
    mgr.AddNamespace("atom","http://purl.org/atom/ns#");  
    XmlNodeList items = xmlDoc.SelectNodes("//atom:entry",mgr);
    if (items == null)
     // XML not in expected format
     throw new FormatException("Atom feed is not in expected format. ");
    else
    {    
     string title=String.Empty ;
     string link =String.Empty ;
     string summary = String.Empty ;
     string author=String.Empty;
     string issuedDate=String.Empty ;
     for (int i=0;i<items.Count;i++)
     {
      if(i % 30==0 && i >-1)
      {
       title="ADV:";
       link="";
       summary=this._adCode;
       issuedDate=DateTime.Now.ToString("g");
       author="";  
      }
      else
      {     
       XmlNode nodTitle = items[i];
       title = nodTitle.SelectSingleNode("atom:title",mgr).InnerText ;
       link = items[i].SelectSingleNode("atom:link",mgr).Attributes["href"].InnerText ;  
       summary = items[i].SelectSingleNode("atom:summary",mgr).InnerText ;     
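       // path is relative to this entry; a leading "//" would search the whole
       // document and always return the first author name in the feed
       XmlNode nodAuthor = items[i].SelectSingleNode("atom:author/atom:name",mgr);
       author = (nodAuthor==null) ? String.Empty : nodAuthor.InnerText;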
       issuedDate = items[i].SelectSingleNode("atom:issued",mgr).InnerText ;
      }

      DateTime issuedDT = DateTime.MinValue;     
      if (summary.Length == 0 && title.Length==0)      
       throw new FormatException("Atom specification requires at minimum " +
                     "a title or description.  Item contains neither title nor description.");
      try
      {
       if (issuedDate.Length > 0)
        issuedDT = DateTime.Parse(issuedDate);
      }
      catch
      {      
       issuedDT = DateTime.Now;
      }    
      DataRow row=null;
      row=tbl.NewRow();
      row.ItemArray=new object[]{issuedDT,title,link,summary};
      tbl.Rows.Add(row);     
     }
     System.Web.HttpContext.Current.Cache.Insert(url,tbl,null,
                               System.Web.Caching.Cache.NoAbsoluteExpiration,
                                         TimeSpan.FromHours(this._hoursToCache),
                                               System.Web.Caching.CacheItemPriority.Normal,null);
    }
   }
   else
   {
    tbl = (DataTable)System.Web.HttpContext.Current.Cache[url];
   }
   return tbl;
  }
 }
}

So, in a nutshell, this generator engine has two methods: one provides a list of groups suitable for binding to a DropDownList, and the other provides a DataTable of the content links for a specified newsgroup. It also inserts your favorite advertisement code (e.g., Google AdSense) automatically at every 30th row, starting with the first. With a full 100-post feed that works out to four ad rows (rows 0, 30, 60, and 90), each taking the place of the entry at that position. In addition, it caches each result, keyed by the feed URL, so that requests for previously requested groups get served quickly.
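One thing to watch in GetAtomData: it calls InnerText directly on the result of SelectSingleNode, which throws a NullReferenceException if an entry happens to lack that element. The sketch below is one possible guard, not part of the downloadable solution. TextOrEmpty and ReadField are hypothetical helper names, and the sample feed is made up, with its second entry deliberately missing a summary.

```csharp
using System;
using System.Xml;

public static class AtomFieldSketch
{
    // Two-entry Atom 0.3 sample; the second entry has no <summary>.
    public const string SampleFeed =
        "<feed version=\"0.3\" xmlns=\"http://purl.org/atom/ns#\">" +
        "<entry><title>First thread</title><summary>Body one</summary>" +
        "<author><name>Alice</name></author></entry>" +
        "<entry><title>Second thread</title>" +
        "<author><name>Bob</name></author></entry>" +
        "</feed>";

    // Hypothetical helper: InnerText of the first match, or "" when nothing matches.
    public static string TextOrEmpty(XmlNode parent, string xpath, XmlNamespaceManager mgr)
    {
        XmlNode node = parent.SelectSingleNode(xpath, mgr);
        return node == null ? String.Empty : node.InnerText;
    }

    // Evaluates an entry-relative XPath against every <entry> in the feed.
    public static string[] ReadField(string xml, string relativeXPath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNamespaceManager mgr = new XmlNamespaceManager(doc.NameTable);
        mgr.AddNamespace("atom", "http://purl.org/atom/ns#");
        XmlNodeList entries = doc.SelectNodes("//atom:entry", mgr);
        string[] values = new string[entries.Count];
        for (int i = 0; i < entries.Count; i++)
        {
            values[i] = TextOrEmpty(entries[i], relativeXPath, mgr);
        }
        return values;
    }
}
```

With the sample feed, ReadField(SampleFeed, "atom:summary") yields "Body one" followed by an empty string instead of throwing, and ReadField(SampleFeed, "atom:author/atom:name") yields each entry's own author because the path is relative to the entry.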

Now let's switch over to the ASP.NET page I've contrived to use the engine. You'll see I've added a method to generate a bunch of hyperlinks, one for each group, so that if the search engine spiders hit your page, they will index each generated link:

using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Web;
using System.Web.SessionState;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.HtmlControls;
using System.Xml;
using AtomGenerator;

namespace atomweb
{ 
 public class Default : System.Web.UI.Page
 {   
  protected System.Web.UI.WebControls.Label Label1;
  protected System.Web.UI.WebControls.Label Label2;
  protected System.Web.UI.WebControls.DataGrid dg;
  protected System.Web.UI.WebControls.Panel Panel1;  
  protected System.Web.UI.WebControls.DropDownList DropDownList1; 
  private AtomGenerator.Generator g=null;  
 
  private void Page_Load(object sender, System.EventArgs e)
  {   
   g = new Generator();
   // g.AdCode="your html or script ad code here";
   DataTable groupsTbl = g.GenerateGroups();
   // Always create the list of links for spiders to follow
   CreateGroupsLinks(groupsTbl);

   if(Request.QueryString["group"]!=null)
   {
    string groupName=Request.QueryString["group"].ToString();   
    string dataUrl=  
               "http://groups-beta.google.com/group/microsoft.public"
                                        + groupName + "/feed/msgs.xml";
    DataTable tbl = g.GetAtomData(dataUrl);
    dg.DataSource =tbl;
    dg.DataBind(); 
   }

   if(!IsPostBack)
   { 
    this.DropDownList1.DataSource=groupsTbl;
    DropDownList1.DataTextField="groupName";
    DropDownList1.DataValueField="groupUrl";
    DropDownList1.DataBind();       
   }   
  }

  #region Web Form Designer generated code
  override protected void OnInit(EventArgs e)
  {
 
   InitializeComponent();
   base.OnInit(e);
  }
  
 private void InitializeComponent()
  {    
   this.DropDownList1.SelectedIndexChanged += new System.EventHandler(this.DropDownList1_SelectedIndexChanged);
   this.Load += new System.EventHandler(this.Page_Load);

  }
  #endregion
  
  private void CreateGroupsLinks(DataTable tbl)
  {  
   for(int i=0;i<tbl.Rows.Count;i++)
   {
    DataRow row = tbl.Rows[i];
    HyperLink lnk = new  System.Web.UI.WebControls.HyperLink();
    string linkUrl="Default.aspx?group="+row["groupName"].ToString();
    lnk.NavigateUrl = linkUrl;
    lnk.Text =row["groupName"].ToString();
    lnk.BackColor=Color.White;
    lnk.ForeColor=Color.White; 
    this.Panel1.Controls.Add(lnk);
    this.Panel1.Controls.Add(new LiteralControl("<br/>"));     
   }
  }
  private void DropDownList1_SelectedIndexChanged(object sender, System.EventArgs e)
  {
   string group=DropDownList1.Items[DropDownList1.SelectedIndex].Text.TrimEnd();
   string dataUrl=DropDownList1.Items[DropDownList1.SelectedIndex].Value;  
   DataTable tbl = g.GetAtomData(dataUrl);
   dg.DataSource =tbl;
   dg.DataBind();    
  }  
 }
}

And, Voilà! I have a nice engine that provides useful content, even with advertising included if I want it. Here is a link to the finished product for your viewing pleasure!

You can download the full solution for Visual Studio.NET 2003 here.

 

 


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform. Pete's samples at GotDotNet.com have been downloaded over 41,000 times. You can read Peter's UnBlog Here.
