Alexa has been around for quite some time, and provides a lot of information about sites that rank in its "Top 100,000" list, as determined by the visiting habits of the thousands of geeks who happily wear the Alexa Toolbar on their browsers. Many SEO "experts" say that Alexa data is skewed, and they may very well be correct. However, the fact remains that within the universe of Alexa "contributors" the data is very consistent and still quite valuable. According to Alexa:
The traffic rank is based on three months of aggregated historical traffic data from millions of Alexa Toolbar users and is a combined measure of page views and users (reach). As a first step, Alexa computes the reach and number of page views for all sites on the Web on a daily basis.
The main Alexa traffic rank is based on the geometric mean of these two quantities averaged over time (so that the rank of a site reflects both the number of users who visit that site as well as the number of pages on the site viewed by those users)
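To make the quoted description a little more concrete, here is one way to read it as a formula. The symbols are assumptions for illustration, not Alexa's notation, and the passage above does not spell out the exact computation: let r_d be a site's reach and p_d its page views on day d, over the N days in the three-month window.

```latex
\bar{r} = \frac{1}{N}\sum_{d=1}^{N} r_d, \qquad
\bar{p} = \frac{1}{N}\sum_{d=1}^{N} p_d, \qquad
s = \sqrt{\bar{r}\,\bar{p}}
```

Because the geometric mean multiplies the two quantities, a site needs both visitors and page views to score well. With an average reach of 100 and 4 page views, s = sqrt(100 * 4) = 20, whereas a plain arithmetic mean would give 52.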
You can get a good overview of Alexa methodology and terms at their FAQ page.
You can also affect your own Alexa rank by using Alexa's own redirect URL scheme. For example, the URL below will redirect through Alexa and on to eggheadcafe.com:
http://redirect.alexa.com/redirect?www.eggheadcafe.com
While Alexa provides an extensive API, including a call that returns site information, it requires a license key. The key doesn't cost anything, so if you prefer APIs, help yourself to one. However, many developers are not aware that Alexa also exposes a URL that returns site data without a developer key or the API. The URL scheme is very simple:
http://alexa.com/xml/dad?url=microsoft.com
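If the domain comes from user input, it is probably worth escaping it before appending it to that URL. The following is a minimal sketch under that assumption; the `AlexaUrls` class and `BuildDataUrl` method are hypothetical names used only for illustration, and only the endpoint string comes from the URL shown above.

```csharp
using System;

public static class AlexaUrls
{
    // Key-less endpoint quoted in the article.
    private const string DataEndpoint = "http://alexa.com/xml/dad?url=";

    // Builds the request URL for a domain, escaping any characters
    // that are not safe inside a query-string value.
    public static string BuildDataUrl(string domain)
    {
        if (domain == null || domain.Trim().Length == 0)
        {
            throw new ArgumentException("A domain is required.", "domain");
        }
        return DataEndpoint + Uri.EscapeDataString(domain.Trim());
    }

    public static void Main()
    {
        // A plain host name passes through unchanged.
        Console.WriteLine(BuildDataUrl("microsoft.com"));
    }
}
```

For an ordinary host name the result is identical to simple string concatenation; the escaping only matters when the input contains spaces, ampersands, or similar characters.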
Using this neat little "trick" enabled me to create a nifty class that will return all the goodies in a DataSet, containing three DataTables - one for the Site Owner data and basic Alexa ranking stats, one for Other Domains owned by the site owner, and one for related sites.
Without further ado, here is all the code for the utility class:
using System.Data;
using System.Xml;

namespace AlexaDataLib
{
    public static class AlexaData
    {
        public static DataSet GetSiteData(string domain)
        {
            // Create a DataSet, then DataTables for each group of data
            // then add the respective needed DataColumns to each
            DataSet ds = new DataSet();

            DataTable dtRelated = new DataTable();
            dtRelated.Columns.Add("HREF");
            dtRelated.Columns.Add("TITLE");

            DataTable dtSiteData = new DataTable();
            dtSiteData.TableName = "SiteData";
            dtSiteData.Columns.Add("TITLE");
            dtSiteData.Columns.Add("STREET");
            dtSiteData.Columns.Add("CITY");
            dtSiteData.Columns.Add("STATE");
            dtSiteData.Columns.Add("ZIP");
            dtSiteData.Columns.Add("COUNTRY");
            dtSiteData.Columns.Add("OWNER");
            dtSiteData.Columns.Add("PHONE");
            dtSiteData.Columns.Add("EMAIL");
            dtSiteData.Columns.Add("CREATED");
            dtSiteData.Columns.Add("LINKSIN");
            dtSiteData.Columns.Add("SPEED");
            dtSiteData.Columns.Add("POPULARITY");
            dtSiteData.Columns.Add("REACH");
            dtSiteData.Columns.Add("DESC");

            DataTable dtDomains = new DataTable();
            dtDomains.Columns.Add("DOMAIN");
            dtDomains.TableName = "Domains";

            // create the alexa request url
            string url = "http://alexa.com/xml/dad?url=" + domain;
            XmlDocument doc = new XmlDocument();
            // load the xml document from the url
            doc.Load(url);

            // related sites: one row per RL element
            XmlNodeList relatedNods = doc.SelectNodes("//RL");
            DataRow relRow = null;
            string href = "";
            string titl = "";
            foreach (XmlNode nod in relatedNods)
            {
                relRow = dtRelated.NewRow();
                href = nod.Attributes["HREF"].InnerText;
                titl = nod.Attributes["TITLE"].InnerText;
                relRow.ItemArray = new object[] {href, titl};
                dtRelated.Rows.Add(relRow);
            }
            dtRelated.TableName = "RelatedSites";
            ds.Tables.Add(dtRelated);

            // other domains owned by the site owner: one row per DO element
            XmlNodeList domainsNods = doc.SelectNodes("//DO");
            DataRow doRow = null;
            string dom = "";
            foreach (XmlNode nod in domainsNods)
            {
                doRow = dtDomains.NewRow();
                dom = nod.Attributes["DOMAIN"].InnerText;
                doRow.ItemArray = new object[] {dom};
                dtDomains.Rows.Add(doRow);
            }
            ds.Tables.Add(dtDomains);

            // site owner data and basic ranking stats, read by attribute position
            string title = doc.SelectSingleNode("//SITE").Attributes[1].InnerText;
            string street = doc.SelectSingleNode("//ADDR").Attributes[0].InnerText;
            string city = doc.SelectSingleNode("//ADDR").Attributes[1].InnerText;
            string state = doc.SelectSingleNode("//ADDR").Attributes[2].InnerText;
            string zip = doc.SelectSingleNode("//ADDR").Attributes[3].InnerText;
            string country = doc.SelectSingleNode("//ADDR").Attributes[4].InnerText;
            string owner = doc.SelectSingleNode("//OWNER").Attributes[0].InnerText;
            string phone = doc.SelectSingleNode("//PHONE").Attributes[0].InnerText;
            string email = doc.SelectSingleNode("//EMAIL").Attributes[0].InnerText;
            string created = doc.SelectSingleNode("//CREATED").Attributes[0].InnerText;
            string linksin = doc.SelectSingleNode("//LINKSIN").Attributes[0].InnerText;
            string speed = doc.SelectSingleNode("//SPEED").Attributes[0].InnerText;
            string popularity = doc.SelectSingleNode("//POPULARITY").Attributes[1].InnerText;
            string reach = doc.SelectSingleNode("//REACH").Attributes[0].InnerText;
            string desc = doc.SelectSingleNode("//SITE").Attributes[2].InnerText;

            DataRow row = dtSiteData.NewRow();
            row.ItemArray =
                new object[]
                {
                    title, street, city, state, zip, country, owner, phone, email,
                    created, linksin, speed, popularity, reach, desc
                };
            dtSiteData.Rows.Add(row);
            ds.Tables.Add(dtSiteData);
            return ds;
        }
    }
}

Most developers should be able to walk through the above code with no explanation, so I'll leave it to stand on its own. Astute readers may be wondering why I didn't just use the "ReadXml" method of the DataSet class. Believe me, that was the first thing I tried. Unfortunately, DataSet isn't that smart, and if it finds duplicate field names in different tables, it will choke. Alexa's XML document has exactly that problem.
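One caveat about the class above: each `SelectSingleNode(...).Attributes[n]` lookup throws if the element or the attribute is missing from the response. A more defensive lookup might look like the sketch below. The `SafeXml` class, the `AttrAt` helper, and the sample XML fragment are all hypothetical and exist only for illustration; the real Alexa response may be shaped differently.

```csharp
using System;
using System.Xml;

public static class SafeXml
{
    // Returns the value of the attribute at the given position on the first
    // node matching xpath, or an empty string when the node or the attribute
    // is absent.
    public static string AttrAt(XmlNode root, string xpath, int index)
    {
        XmlNode node = root.SelectSingleNode(xpath);
        if (node == null || node.Attributes == null)
        {
            return "";
        }
        if (index < 0 || index >= node.Attributes.Count)
        {
            return "";
        }
        return node.Attributes[index].Value;
    }

    public static void Main()
    {
        // Hypothetical fragment, loaded from a string so no network is needed.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<ALEXA><SD><OWNER NAME=\"Example Corp\"/></SD></ALEXA>");

        Console.WriteLine(AttrAt(doc, "//OWNER", 0));        // prints "Example Corp"
        Console.WriteLine(AttrAt(doc, "//PHONE", 0).Length); // prints 0: PHONE is missing
    }
}
```

With a helper along these lines, a line such as `string phone = doc.SelectSingleNode("//PHONE").Attributes[0].InnerText;` could become `string phone = SafeXml.AttrAt(doc, "//PHONE", 0);`, so a site with incomplete owner data yields empty columns rather than an exception.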
For fun, take a look at some of the Alexa data that's been hacked. For example, look up Live.com. The owner is listed as "Hacker Rootkit.Com.cn", and the related sites have obviously been hacked by somebody from China who wanted a good laugh. There are all kinds of little tricks one learns in this area. For example, try these two Google searches and see which one gives better results:
[yoursite.com*] -site:yoursite.com
-- or --
link:yoursite.com
The downloadable Visual Studio 2005 Solution includes a nice test harness web page with a DetailsView and two GridViews to show off the data. If you are just curious and would like to try out a live version of this, I've got one up on my "Playground Site" at IttyUrl.Net.
Enjoy.