AdSense, SEO, RSS, Management Tips for Large Web Sites

By Robbe Morris

Robbe shares a few thoughts on AdSense, SEO, and techniques for getting the most out of large content web sites. He discusses content pagination, RSS feeds, ad placement, search engine bots, and performance.

EggHeadCafe has grown steadily over the last 9 years and I figured it was time I shared a few things we've learned about getting the most out of advertising on a large web site. I'll leave out a lot of the common things you'll read elsewhere and just focus on the things I think you'll get the most out of.

In 2009, we continued to use Google AdSense as our primary ad delivery platform along with Lake Quincy Media for certain areas where AdSense just didn't perform well. We are a custom AdSense publisher now, which is why I chose not to address common AdSense formatting and placement suggestions. Let me take you through a list of recommendations for displaying content on a site that uses a contextual ad network. I'll also discuss a few of the complications.

1. Words matter

Google AdSense works reasonably well for content sites whose keywords match well with high paying advertisers. Unfortunately for EggHeadCafe, Google AdSense is less than optimal, although it performs better than most other widely available alternatives. Much of the content on a developer site like EggHeadCafe deals with technical keywords that are ambiguous in their meaning. An article on JavaScript cookies can often lead to contextual ads for cookie recipes. Forum posts about a Grid control often return ads for grid computing or even electrical grids. Excel can trigger ads for automobiles with the word Excel in their name.

Not only do you need to be aware of keywords and how they are interpreted in your articles and forum posts, you need to be aware of the text in links that surround the content in your heading, right, left, and bottom navigation. Even though you may use Google AdSense's section targeting to include content or to flag it with the ignore attribute, it doesn't always work the way you want. The impact, whether negative or positive, isn't always immediate either. For instance, if your page content is largely about Microsoft Excel and you have navigation links that reference SharePoint, you are likely to get SharePoint ads as well. In fact, because of Google AdSense's ranking algorithm, SharePoint ads could dominate a Microsoft Excel page.

With that in mind, we recently started collapsing our left navigation topic selection options for content pages. We've seen a slow but sure return to more accurate ad targeting (never all that great to begin with) and higher click thrus. The click thru rate increase is due to both fewer items on the page to distract eyes and better ad targeting. This scenario is just one of many places where doing the right thing for a visitor has an adverse effect on your advertising capabilities. We also show different user interfaces to our signed in members than to casual non-members. Oftentimes, we don't show ads on pages being viewed by a member because of extremely low click thrus. So, we can show things like our topic navigation to members without contextual ad search bots being affected negatively.

The second option is looking at certain keywords and either wrapping other words around them or using a synonym. For instance, instead of just using Excel, I've found that prefacing it with the word Microsoft can help ad targeting. This is pretty easy to do in articles but of course much harder to accurately do in forum posts submitted by others. Another example is using the word "hierarchy" instead of "tree" or "treeview" wherever it would still make sense to the person reading your article.
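A minimal sketch of the "wrap other words around the keyword" idea, assuming you control the text before it is rendered. The class and method names (KeywordQualifier, Qualify) are hypothetical and only the Excel case from above is handled. It prefixes a bare "Excel" with "Microsoft" and leaves text that is already qualified alone.

```csharp
using System.Text.RegularExpressions;

public static class KeywordQualifier
{
    // Matches the standalone word "Excel" unless it is already preceded by "Microsoft ".
    // The match is case sensitive on purpose, so the verb "excel" is left untouched.
    private static readonly Regex BareExcel =
        new Regex(@"(?<!Microsoft\s)\bExcel\b", RegexOptions.Compiled);

    // Returns the text with every bare "Excel" rewritten as "Microsoft Excel".
    public static string Qualify(string text)
    {
        return BareExcel.Replace(text, "Microsoft Excel");
    }
}
```

This is the kind of thing you would run over your own articles rather than forum posts, where a blind replacement could change what the poster meant.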

As you can see, words matter. So, carefully consider the subject matter of your content and what keywords could be interpreted differently than you might want and act accordingly. If you are using a contextual ad provider, learn what subject matter and keywords pay better than others. Wherever possible, try to pick topics that have a relatively small number of keywords required to trigger certain types of ads that would do well on your site. As an example, the term "Microsoft Excel Macro" is much easier to target ads for than "C# ADO.NET and Performance Tests".

If you happen to run a .NET programming web site, you'll want to try to identify keywords that are likely to trigger ads from third party control vendors. They typically have the best click thrus for this audience. Wherever possible, you would want to use wording that would lead AdSense to believe your page is a likely gateway to a purchasing decision. Admittedly, this is easier said than done and not always a good fit for your articles. Just be aware that the tactic can have a positive effect on your earnings per thousand impressions (eCPM).

2. RSS Feeds

RSS feeds can be a double edged sword. Aside from never publishing an entire article or an entire forum post in our RSS feeds, I recommend exporting a different title and summary description than the one used on your web site. I also recommend creating titles and summary descriptions that are not as keyword rich and also have their keywords positioned after several "noise" words. For example, an article with a title of "Branding SharePoint Search Center With A Bing Look and Feel" might be titled in the RSS feed as "Learn To Custom Brand Search Center In SharePoint To Look Like Bing". The second title positions its important keywords like SharePoint and Search Center in a different order than most people would likely type into Google.

Why is this important?

Duplicate content penalties in major search engines can occur when too much of your content exists exactly as it does on other web sites. Using the exact same title and description on your site as those who republish your RSS feed can lead to these sites competing with yours for search engine ranking for your own article! If your own site has a solid search engine ranking history, you will eventually win out (often after 2-3 weeks). By using different titles and summary descriptions, you still get the benefit of these other sites linking to you and driving traffic but you are now much more likely to get your page ranked higher and sooner than their RSS listing page for the specific keyword phrases you think will most often be used.

The same works in reverse. Most sites that republish RSS feeds do so in an automated way. If this is a key part of your web site traffic, I'd recommend against this. I'd take the RSS feeds and manually review each entry and make the text much more unique to your web site.

One last suggestion is to include an HTML anchor tag in the summary description that links directly to your page. Many RSS feed syndicators and republishers run clicks to your article through a redirect page of their own. This often leads to you not getting the search engine optimization benefit of a backlink to your site. While some feed republishers will strip out your HTML anchor tag, most do not. You can make the decision as to whether you want to permit future easy access to your feed from republishers who alter your content.
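To make the RSS suggestions concrete, here is a rough sketch of building one feed item with a feed-specific title and a summary that ends with a direct anchor tag. The names (FeedItemBuilder, Build) and the example URL are hypothetical, and the sketch assumes the summary is plain text. It relies on System.Xml.Linq, which entity-encodes the HTML inside the description element, and on WebUtility.HtmlEncode, which requires .NET 4.0 or later.

```csharp
using System.Net;
using System.Xml.Linq;

public static class FeedItemBuilder
{
    // feedTitle and feedSummary are the alternate, less keyword rich versions
    // written for the feed, not the title and summary shown on the web site.
    public static XElement Build(string feedTitle, string feedSummary, string articleUrl)
    {
        // Append an anchor tag that links straight to the article so republishers
        // that keep the description also keep a direct backlink.
        string description =
            WebUtility.HtmlEncode(feedSummary) +
            " <a href=\"" + WebUtility.HtmlEncode(articleUrl) + "\">" +
            WebUtility.HtmlEncode(feedTitle) + "</a>";

        // XElement escapes the markup in the description when the feed is serialized.
        return new XElement("item",
            new XElement("title", feedTitle),
            new XElement("link", articleUrl),
            new XElement("description", description));
    }
}
```

Usage would look something like FeedItemBuilder.Build("Learn To Custom Brand Search Center In SharePoint To Look Like Bing", "A short teaser.", "http://www.example.com/article.aspx"), with the result added to the channel element of your feed.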

3. Noise Words In Titles

As I mentioned above, the order of words in the page title can impact your overall search engine ranking. Try to keep "noise" words like and, or, the, how, to, etc... to a minimum and almost always make sure they are not the first few words in your title. Also make sure that you create a portion of the page that tightly condenses the most important keywords in an organic or natural way. You'll see this in summary descriptions for all of our articles. Sometimes our authors do a good job with this and sometimes they don't.
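If you want to check page titles automatically, something like the following sketch could flag titles that open with noise words. The class name TitleChecker and the word list are assumptions. The list starts with the words named above and adds a few common ones, so adjust it to your own content.

```csharp
using System;
using System.Collections.Generic;

public static class TitleChecker
{
    // Noise words to watch for. Only "and, or, the, how, to" come from the text above;
    // the rest are illustrative additions.
    private static readonly HashSet<string> NoiseWords = new HashSet<string>(
        new[] { "and", "or", "the", "how", "to", "a", "an", "of", "in", "for", "with" },
        StringComparer.OrdinalIgnoreCase);

    // Counts how many noise words appear before the first meaningful word in the title.
    public static int CountLeadingNoiseWords(string title)
    {
        string[] words = title.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int count = 0;
        foreach (string word in words)
        {
            if (!NoiseWords.Contains(word)) break;
            count++;
        }
        return count;
    }
}
```

A result above zero for a page title is a hint to reorder it so the important keywords come first.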

4. Set Search Engine Bot Traps

Create a variety of deep linked pages that only a search engine bot is likely to find while crawling your site. You'll want to do this for various combinations of "x" number of clicks deep from the home page. This tactic can be quite helpful in understanding how deep the engines crawl your site and how often they go that deep. You could log this behavior or even email yourself when a known bot finds the trap. If the bots find your deep trap pages often, then they are likely crawling your other pages as well. This can also be useful for scheduling deployment times in order to reduce the instances where your site is purposely down while deploying updates.
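As a rough illustration of a bot trap, the sketch below checks the user agent against a short list of crawler tokens and builds a log line that records how deep the trap page sits. The names (BotTrap, IsKnownBot, DescribeHit) and the log format are hypothetical, and the token list is only a starting point. You would call this from the trap page itself and write the result to a log file or an email.

```csharp
using System;

public static class BotTrap
{
    // Substrings that appear in the user agent strings of some well known crawlers.
    private static readonly string[] KnownBotTokens =
        { "Googlebot", "Mediapartners-Google", "bingbot", "msnbot", "Slurp" };

    // True when the user agent contains one of the known crawler tokens.
    public static bool IsKnownBot(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent)) return false;
        foreach (string token in KnownBotTokens)
        {
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0) return true;
        }
        return false;
    }

    // Returns a log line for a known bot hitting a trap page, or null for everyone else.
    // clickDepth is the number of clicks from the home page to this trap page.
    public static string DescribeHit(string userAgent, int clickDepth, DateTime utcNow)
    {
        if (!IsKnownBot(userAgent)) return null;
        return string.Format("{0:u} depth={1} agent={2}", utcNow, clickDepth, userAgent);
    }
}
```

Keep in mind that user agent strings are easy to fake, so treat these hits as a crawl depth signal rather than proof of who the visitor is.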

5. Beware of AdSense's "First in HTML" Behavior

To some this is common knowledge. To others it is a complete unknown. As of today, AdSense still fills your ad blocks in the order that your Google AdSense JavaScript code appears in your HTML source code. Higher paying ads often appear in the first block of AdSense code the Mediapartners-Google bot finds while crawling your web site. So, be aware that you may be placing code for lower performing areas on your page first in the HTML and not realizing that the best paying ads are being rendered there.

6. Header Ads

I have found that ads placed at the very top of the page above navigation buttons/links tend to perform much worse than the very same ad format placed just below the navigation buttons/links. If you watch major news sites or other high traffic sites with advertising, you'll see that they too are aware of this. When deciding where to place ads on your pages, try (as best you can) to notice what areas of the page your eyes focus on and where they wander. In this case, the top navigation forms a natural cutoff area for people's eye focus. Placing the ads just below the navigation keeps your ads within view during those first few critical seconds of viewing your page.

7. Images Next To AdSense Ads

At the time this article was written, it was not against the Google AdSense TOS to place images next to AdSense ads as long as the images were not similar to the products being shown in the ads and did not otherwise encourage users to click on ads. Some AdSense publishers have had considerable success placing generic harmless images next to ad blocks. I have not found there to be much difference in most of our ad blocks. In fact, I've often found negative impacts to click thru percentages. So, if you intend to use this tactic, be careful about your image selection and placement and watch your stats closely.

8. Content Listings That Use Pagination

This is a very important subject for large content sites. Most sites show listings in newest to oldest order. Thus, when new content gets added, items that were on page one now get shifted to page two. This simple expected aspect of pagination of new content in newest to oldest order creates real world problems for your web site.

The first issue is that a search engine bot that visits your pagination grid on Monday will index the content of the page as it existed on Monday. When a visitor is directed to your pagination grid page on Wednesday, the content will almost certainly have changed. Oftentimes, the item the search engine said would be there no longer is. This is incredibly frustrating to end users. You've also lost the eventual click thru to the full article or forum post they were after.

The second issue is that your specific pagination grid page cannot gain long term traction as a highly ranked page in the search engine because the page content is always changing drastically and people are often immediately going back to Google to select a different search result.

Admittedly, I did not fully realize the issues with this scenario for many years. Once realizing it, it took a little time to come up with an effective strategy to not only address the problems but also to maximize the benefit.

How do I solve this problem?

What we do at EggHeadCafe is create two different versions (URLs) of our pagination pages: one for real users and one for bots and for users who found the bot version of the page via a search engine. As you would expect, the real user version handles the classic newest to oldest tactic. What we do for the bots is a whole other story.

Many sections of our site hold thousands upon thousands of discussion threads. To page these properly requires performance intensive queries. Under normal load, this is not a problem. Under simultaneous bot crawls from Google, Bing, Yahoo, Ask.com, or any of the other smaller search engine bots, this becomes a real problem.

What we opted to do was write the paged data to disk on demand in oldest to newest order, all with a fixed number of items per page. So, old paginated grids maintain the same listings over a long period of time. New content is indexed towards the end of the possible page numbers. Whenever we show a paginated grid page, we read the data from disk instead of the database. If we ever need to clean up the data, we just delete the data files written to disk and write them again on demand.

The primary reason we went to disk versus in-memory cache was dealing with performance problems with heavy bot traffic or whenever the app pool recycled. Since the data was unlikely to change over a period of months, it just made sense to query once and read from disk in the future.
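The disk backed approach described above might be sketched as follows. This is an illustration under stated assumptions, not our production code. The class name StablePager, the file naming scheme, the page size of 25, and the loadFromDatabase delegate are all hypothetical. The sketch also assumes that only full pages are written to disk, since the final partial page keeps changing until it fills up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StablePager
{
    public const int PageSize = 25;

    // Returns the rows for a 1-based page number in oldest to newest order.
    // loadFromDatabase(skip, take) is a stand-in for the expensive paging query.
    public static IList<string> GetPage(
        string cacheDir, int page, Func<int, int, IList<string>> loadFromDatabase)
    {
        Directory.CreateDirectory(cacheDir); // no-op when the folder already exists
        string path = Path.Combine(cacheDir, "page-" + page + ".txt");

        // Old pages never change in oldest to newest order, so serve them from disk.
        if (File.Exists(path)) return File.ReadAllLines(path);

        IList<string> rows = loadFromDatabase((page - 1) * PageSize, PageSize);

        // Assumption: only cache full pages, because the last page is still growing.
        if (rows.Count == PageSize) File.WriteAllLines(path, rows);
        return rows;
    }
}
```

Deleting the files in cacheDir forces every page to be rebuilt on its next request, which matches the clean up step described above. A production version would also need to guard against two requests writing the same file at once.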

In order to properly create your pagination buttons to ensure that the search engine bots find new pages each time they visit an old paginated grid page, you need to make sure the "Last" button really points to the last page of available content as it exists in your database right now. To manage this, we created jobs to update TotalThreads and TotalPosts columns for each of our Topic and TopicArea cross reference tables. We run these jobs every x number of minutes throughout the day. When we render our paginated grid pages (old or new), we use these values from Topic and TopicArea tables to accurately render the proper number of pagination buttons.
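The "Last" button calculation is simple ceiling division over the stored totals. Here is a minimal sketch, where the PagerMath class and LastPage method are hypothetical names and totalThreads stands for the value read from a TotalThreads column.

```csharp
using System;

public static class PagerMath
{
    // Returns the 1-based number of the last page for the given item count.
    // An empty topic still gets a single (empty) page.
    public static int LastPage(int totalThreads, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        if (totalThreads <= 0) return 1;
        return (totalThreads + pageSize - 1) / pageSize;
    }
}
```

Because the total comes from a column refreshed by a scheduled job, the "Last" link can lag slightly behind the live data, but it never requires a count query at render time.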

One other important tactic is recognizing the difference between a real human visitor and a known bot and knowing when to properly exit the real user out of the cached versions of your paginated grids into your live/real time versions. This can be different for every web site. So, carefully examine the user experience while developing your own strategy for your pagination grids.

And last but not least, if a known bot finds our real user version, we send it a 301 permanent redirect to our bot version. Something like this:

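// Run this early in the real user version of the page.
if (SessionController.IsABot())
{
    // A known bot found the real user version: point it at the stable bot version.
    Response.Clear();
    Response.ClearHeaders();
    Response.Status = "301 Moved Permanently";
    Response.AddHeader("Location", "http://www.eggheadcafe.com/SomeBotVersionOfThePagingGrid.aspx");
    // End the response so the rest of the page is not rendered after the redirect headers.
    Response.End();
    return;
}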

At the end of the day, we not only provide a better experience for users that find our paginated grids from the search engine, we've also created thousands of new and permanent rich content pages. You will find that over time, the traffic to these pages will increase to a point where they are very important to your overall revenue.


9. Banned IP Address List

Use the bot traps mentioned above to identify IP addresses that eat up your bandwidth and server resources but do not deliver traffic to your site. Sometimes you may need to block access by user agent. However, my experience has been that most of the people you'll want to block work from a reasonably small IP address range and often use bogus user agent strings.

Here's how: load their IP addresses into memory and stop them from accessing your site at runtime. In ASP.NET, you can stop the request like this in your global.asax.cs:

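using System;
using System.Collections.ObjectModel;
using System.Data.SqlClient;
using System.Web;

// The members below belong inside the application class in global.asax.cs.

// Banned IP address prefixes, loaded once when the application starts.
public static ObservableCollection<string> BadIPs;

protected void Application_Start(object sender, EventArgs e)
{
    LoadBadIPs();
}

protected void Application_PreRequestHandlerExecute(object sender, EventArgs e)
{
    if (!IsPageSuitableForBadIP(Request.Url)) return;
    if (!IsBadIP(Request.UserHostAddress)) return;

    // Banned visitor: send an empty 404 and stop processing the request.
    HttpContext.Current.Response.StatusCode = 404;
    HttpContext.Current.Response.SuppressContent = true;
    HttpContext.Current.Response.End();
}

// Only page requests are checked; other file types pass straight through.
public static bool IsPageSuitableForBadIP(Uri url)
{
    // ".asp" also matches ".aspx", so one test covers both.
    return url.ToString().Contains(".asp");
}

public static void LoadBadIPs()
{
    // Fill a local collection first so requests never see a half loaded list.
    var loaded = new ObservableCollection<string>();

    using (SqlConnection conn = new SqlConnection("your connection string"))
    {
        conn.Open();
        using (SqlCommand cmd = new SqlCommand("Your sql to get bad ips", conn))
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read()) { loaded.Add(reader[0].ToString().Trim()); }
        }
    }

    BadIPs = loaded;
}

public static bool IsBadIP(string ipAddress)
{
    if (BadIPs == null || string.IsNullOrEmpty(ipAddress)) return false;

    // Each entry is treated as a prefix, so one entry can cover a whole range.
    // End prefixes with a dot; otherwise "10.1" would also match "10.100.x.x".
    foreach (string item in BadIPs)
    {
        // Skip empty entries, which would otherwise match every address.
        if (item.Length > 0 && ipAddress.StartsWith(item, StringComparison.Ordinal)) return true;
    }
    return false;
}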

10. AdSense Ads Just Above Or Below Pagination Buttons

For pagination grids accessed by real viewers, you'll want to default the number of items shown to be small enough to ensure your AdSense ads are visible for most screen resolutions. You'll also want your ads separated from your pagination buttons, either with plenty of space or with some sort of border. This not only helps with avoiding accidental clicks but can also help with legitimate click thru percentages by making the ads more visible.

Hopefully, you've found the 10 tips helpful. If you have additional suggestions or comments, please submit them below.

Biography - Robbe Morris
Robbe has been a Microsoft MVP in C# since 2004. He is also the co-founder of EggHeadCafe.com which provides .NET articles, book reviews, software reviews, and software download and purchase advice. Robbe also loves to scuba dive and go deep sea fishing in the Florida Keys or off the coast of Daytona Beach.
Article Discussion: Adsense and SEO Tips for Large Web Sites
Robbe Morris posted at Saturday, November 28, 2009 12:40 PM