An Alexa Site Data Utility Class

By Peter Bromberg

Peter puts together a neat class to capture Alexa Site data on any tracked domain - with no API and no developer license key.

Alexa has been around for quite some time, and provides a lot of information about sites that rank in its "Top 100,000" list, as determined by the visiting habits of the thousands of geeks who happily wear the Alexa Toolbar on their browsers. Many SEO "experts" say that Alexa data is skewed, and they may very well be correct. However, the fact remains that within the universe of Alexa "contributors" the data is very consistent and still quite valuable. According to Alexa:

The traffic rank is based on three months of aggregated historical traffic data from millions of Alexa Toolbar users and is a combined measure of page views and users (reach). As a first step, Alexa computes the reach and number of page views for all sites on the Web on a daily basis.

The main Alexa traffic rank is based on the geometric mean of these two quantities averaged over time (so that the rank of a site reflects both the number of users who visit that site as well as the number of pages on the site viewed by those users).
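The quote doesn't spell out the arithmetic. The C# sketch below shows one plausible reading: score each day as the geometric mean of reach and page views, then average the daily scores over the window. The class name, method names, and numbers are made up for illustration. Alexa's real normalization and averaging order are not documented here.

```csharp
using System;

// Hypothetical sketch of one reading of the quoted description: score each day
// as the geometric mean of reach and page views, then average the daily scores.
// Alexa's real normalization and averaging details are not shown here.
public static class TrafficScore
{
    // Geometric mean of two non-negative quantities: sqrt(a * b)
    public static double GeometricMean(double reach, double pageViews)
    {
        return Math.Sqrt(reach * pageViews);
    }

    // Arithmetic average of the daily geometric means over the window
    public static double AverageScore(double[] dailyReach, double[] dailyPageViews)
    {
        if (dailyReach.Length == 0 || dailyReach.Length != dailyPageViews.Length)
            throw new ArgumentException("Both arrays must have the same, non-zero length.");

        double sum = 0;
        for (int i = 0; i < dailyReach.Length; i++)
            sum += GeometricMean(dailyReach[i], dailyPageViews[i]);
        return sum / dailyReach.Length;
    }

    public static void Main()
    {
        // Made-up numbers: site A reaches more users, site B's users view more pages
        double siteA = AverageScore(new double[] { 400, 400 }, new double[] { 100, 100 });
        double siteB = AverageScore(new double[] { 100, 100 }, new double[] { 400, 400 });
        Console.WriteLine("Site A: {0}, Site B: {1}", siteA, siteB);
    }
}
```

The geometric mean multiplies the two quantities. As a result, a site with four times the reach but a quarter of the page views ends up with the same score (200 for both made-up sites here). That is the balance between users and pages that the quote describes.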

You can get a good overview of Alexa methodology and terms at their FAQ page.

You can also affect your own Alexa rank by using Alexa's own redirect url scheme. For example, the url below will redirect through Alexa and on to eggheadcafe.com:

http://redirect.alexa.com/redirect?www.eggheadcafe.com

Alexa provides an extensive API, including a call that returns site information, but it requires a license key. The key doesn't cost anything, so if you prefer APIs, help yourself to one. However, many developers are not aware that Alexa also uses a url that returns site data without a developer key or API. The url scheme is very simple:

 http://alexa.com/xml/dad?url=microsoft.com
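The two url schemes above can be built with a couple of small helpers, shown here as a sketch. The class and method names are hypothetical. The dad url escapes the domain with Uri.EscapeDataString so that characters such as '&' or spaces can't break the query string. Main fetches the raw XML with WebClient so you can look at its elements before writing any parsing code; it assumes the endpoint still answers and just reports the error if it doesn't.

```csharp
using System;
using System.Net;

public static class AlexaUrls
{
    // Builds the key-free site data url; the domain is escaped for the query string
    public static string BuildDadUrl(string domain)
    {
        return "http://alexa.com/xml/dad?url=" + Uri.EscapeDataString(domain);
    }

    // Builds the redirect url; the host is appended as-is, as in the example url
    public static string BuildRedirectUrl(string host)
    {
        return "http://redirect.alexa.com/redirect?" + host;
    }

    public static void Main()
    {
        string url = BuildDadUrl("microsoft.com");
        Console.WriteLine(url);
        try
        {
            // Fetch the raw XML so its elements can be inspected before parsing
            using (WebClient client = new WebClient())
            {
                Console.WriteLine(client.DownloadString(url));
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```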

Using this neat little "trick," I created a nifty class that returns all the goodies in a DataSet containing three DataTables: one for the site owner data and basic Alexa ranking stats, one for other domains owned by the site owner, and one for related sites.

Without further ado, here is all the code for the utility class:

using System;
using System.Data;
using System.Xml;

namespace AlexaDataLib
{
    public static class AlexaData
    {
        public static DataSet GetSiteData(string domain)
        {
            if (string.IsNullOrEmpty(domain))
                throw new ArgumentException("A domain name is required.", "domain");

            // Create a DataSet, then a DataTable for each group of data,
            // and add the needed DataColumns to each
            DataSet ds = new DataSet();

            DataTable dtRelated = new DataTable("RelatedSites");
            dtRelated.Columns.Add("HREF");
            dtRelated.Columns.Add("TITLE");

            DataTable dtDomains = new DataTable("Domains");
            dtDomains.Columns.Add("DOMAIN");

            DataTable dtSiteData = new DataTable("SiteData");
            string[] siteColumns = new string[]
                {
                    "TITLE", "STREET", "CITY", "STATE", "ZIP", "COUNTRY", "OWNER", "PHONE",
                    "EMAIL", "CREATED", "LINKSIN", "SPEED", "POPULARITY", "REACH", "DESC"
                };
            foreach (string column in siteColumns)
                dtSiteData.Columns.Add(column);

            // Create the Alexa request url; escape the domain so it is safe in a query string
            string url = "http://alexa.com/xml/dad?url=" + Uri.EscapeDataString(domain);

            // Load the xml document from the url
            XmlDocument doc = new XmlDocument();
            doc.Load(url);

            // Related sites: one RL element per site, with HREF and TITLE attributes
            foreach (XmlNode nod in doc.SelectNodes("//RL"))
                dtRelated.Rows.Add(AttributeByName(nod, "HREF"), AttributeByName(nod, "TITLE"));
            ds.Tables.Add(dtRelated);

            // Other domains owned by the site owner: one DO element per domain
            foreach (XmlNode nod in doc.SelectNodes("//DO"))
                dtDomains.Rows.Add(AttributeByName(nod, "DOMAIN"));
            ds.Tables.Add(dtDomains);

            // Owner data and ranking stats. These attributes are read by position,
            // matching the layout of Alexa's XML when this was written; a missing
            // element or attribute yields an empty string instead of an exception.
            XmlNode site = doc.SelectSingleNode("//SITE");
            XmlNode addr = doc.SelectSingleNode("//ADDR");
            dtSiteData.Rows.Add(
                AttributeAt(site, 1),                                     // TITLE
                AttributeAt(addr, 0),                                     // STREET
                AttributeAt(addr, 1),                                     // CITY
                AttributeAt(addr, 2),                                     // STATE
                AttributeAt(addr, 3),                                     // ZIP
                AttributeAt(addr, 4),                                     // COUNTRY
                AttributeAt(doc.SelectSingleNode("//OWNER"), 0),          // OWNER
                AttributeAt(doc.SelectSingleNode("//PHONE"), 0),          // PHONE
                AttributeAt(doc.SelectSingleNode("//EMAIL"), 0),          // EMAIL
                AttributeAt(doc.SelectSingleNode("//CREATED"), 0),        // CREATED
                AttributeAt(doc.SelectSingleNode("//LINKSIN"), 0),        // LINKSIN
                AttributeAt(doc.SelectSingleNode("//SPEED"), 0),          // SPEED
                AttributeAt(doc.SelectSingleNode("//POPULARITY"), 1),     // POPULARITY
                AttributeAt(doc.SelectSingleNode("//REACH"), 0),          // REACH
                AttributeAt(site, 2));                                    // DESC
            ds.Tables.Add(dtSiteData);
            return ds;
        }

        // Returns the value of the named attribute, or "" if the node or attribute is missing
        private static string AttributeByName(XmlNode node, string name)
        {
            if (node == null || node.Attributes == null)
                return "";
            XmlAttribute attr = node.Attributes[name];
            return attr == null ? "" : attr.Value;
        }

        // Returns the value of the attribute at the given position, or "" if it is missing
        private static string AttributeAt(XmlNode node, int index)
        {
            if (node == null || node.Attributes == null || index >= node.Attributes.Count)
                return "";
            return node.Attributes[index].Value;
        }
    }
}
Most developers should be able to walk through the above code without explanation, so I'll let it stand on its own. Astute readers may be wondering why I didn't just use the DataSet's "ReadXml" method. Believe me, that was the first thing I tried. Unfortunately, DataSet isn't that smart: if, while inferring a schema, it finds the same element name nested in different tables, it chokes. Alexa's XML document has exactly that issue.
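The sketch below reproduces this kind of failure. It feeds DataSet.ReadXml a small made-up document in which an ADDR element with attributes sits under two different parents. This is not Alexa's actual XML: the SITE/OWNER/ADDR nesting and the TITLE, NAME, and CITY attributes are invented for the demo. Schema inference should try to make ADDR a child table of both SITE and OWNER, which ReadXml rejects. The XmlDocument/XPath route used in the class reads the same nodes without complaint.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

public static class ReadXmlDemo
{
    // Made-up document (not Alexa's real schema): an ADDR element with
    // attributes appears under two different parents, SITE and OWNER
    public const string SampleXml =
        "<ALEXA>" +
        "<SITE TITLE=\"Example\"><ADDR CITY=\"Springfield\"/></SITE>" +
        "<OWNER NAME=\"Example Inc\"><ADDR CITY=\"Shelbyville\"/></OWNER>" +
        "</ALEXA>";

    // Returns the exception message if DataSet.ReadXml rejects the document, else null
    public static string ReadXmlError(string xml)
    {
        try
        {
            DataSet ds = new DataSet();
            ds.ReadXml(new StringReader(xml)); // no schema supplied, so it is inferred
            return null;
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }

    // The XmlDocument/XPath approach simply selects every matching node
    public static int CountNodes(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        Console.WriteLine("ReadXml: " + (ReadXmlError(SampleXml) ?? "loaded"));
        Console.WriteLine("//ADDR nodes: " + CountNodes(SampleXml, "//ADDR"));
    }
}
```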

For fun, take a look at some of the Alexa data that's been hacked. For example, look up Live.com: the owner is listed as "Hacker Rootkit.Com.cn", and the related sites have obviously been hacked by somebody from China who wanted a good laugh. There are all kinds of little tricks one learns in this area. For example, try these two Google searches and see which one gives better results:

[yoursite.com*] -site:yoursite.com     -- or --     link:yoursite.com

The downloadable Visual Studio 2005 solution includes a nice test harness web page with a DetailsView and two GridViews to show off the data. If you're just curious and would like to try a live version, I've got one up on my "Playground Site" at IttyUrl.Net.
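To poke at the same kind of DataSet without the web page, a console sketch like this one can stand in for the harness. The RL/DO element names and the HREF/TITLE/DOMAIN attribute names match the class above. The RLS/DOS wrapper elements, the values, and the AlexaConsoleHarness name are invented. Because it parses a sample string, it runs offline. It builds RelatedSites and Domains tables with the same XPath queries and prints every table, roughly what the GridViews display.

```csharp
using System;
using System.Data;
using System.Xml;

public static class AlexaConsoleHarness
{
    // Made-up sample: element and attribute names follow the class above,
    // but the wrapper elements and all values are invented
    public const string SampleXml =
        "<ALEXA>" +
        "<RLS>" +
        "<RL HREF=\"example.org/\" TITLE=\"Example Org\"/>" +
        "<RL HREF=\"example.net/\" TITLE=\"Example Net\"/>" +
        "</RLS>" +
        "<DOS><DO DOMAIN=\"example.info\"/></DOS>" +
        "</ALEXA>";

    // Builds RelatedSites and Domains tables with the same XPath queries as the class
    public static DataSet Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // LoadXml parses a string, where Load(url) fetches over HTTP

        DataTable related = new DataTable("RelatedSites");
        related.Columns.Add("HREF");
        related.Columns.Add("TITLE");
        foreach (XmlNode nod in doc.SelectNodes("//RL"))
            related.Rows.Add(nod.Attributes["HREF"].Value, nod.Attributes["TITLE"].Value);

        DataTable domains = new DataTable("Domains");
        domains.Columns.Add("DOMAIN");
        foreach (XmlNode nod in doc.SelectNodes("//DO"))
            domains.Rows.Add(nod.Attributes["DOMAIN"].Value);

        DataSet ds = new DataSet();
        ds.Tables.Add(related);
        ds.Tables.Add(domains);
        return ds;
    }

    // Prints every table in the DataSet, one row per line
    public static void Dump(DataSet ds)
    {
        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine("{0} ({1} rows)", table.TableName, table.Rows.Count);
            foreach (DataRow row in table.Rows)
            {
                string[] cells = new string[table.Columns.Count];
                for (int i = 0; i < cells.Length; i++)
                    cells[i] = Convert.ToString(row[i]);
                Console.WriteLine("  " + string.Join(" | ", cells));
            }
        }
    }

    public static void Main()
    {
        Dump(Parse(SampleXml));
    }
}
```

When the live endpoint responds, Dump should accept the DataSet returned by AlexaData.GetSiteData just as well, since it only walks whatever tables the DataSet contains.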

Enjoy.
Biography - Peter Bromberg
Peter Bromberg is a C# MVP, MCP, and .NET expert who has worked in the banking, financial, and telephony industries for over 20 years. Pete focuses exclusively on the .NET Platform, and currently develops SOA and other .NET applications for a Fortune 500 clientele. Peter enjoys producing digital photo collages with Maya, playing jazz flute, the beach, and fine wines. You can view Peter's UnBlog and IttyUrl sites.
Article Discussion: An Alexa Site Data Utility Class
Peter Bromberg posted at Saturday, October 20, 2007 4:32 PM