Make The WebClient Class follow redirects and get Target Url
By Peter Bromberg
How to make the .NET WebClient class follow redirects and get the target url
Like HttpWebRequest, which it uses under the hood for HTTP requests, the WebClient class
follows redirects automatically. It does not expose the "final" URL, though, so to get it
you'll need to derive a class from System.Net.WebClient. Here's an example:
using System;
using System.Net;

public class MyWebClient : WebClient
{
    Uri _responseUri;

    // URL the last request ended up at, after any redirects.
    public Uri ResponseUri
    {
        get { return _responseUri; }
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        // Let failures propagate: swallowing them here would hand a null
        // response back to WebClient and hide the real error.
        WebResponse response = base.GetWebResponse(request);
        _responseUri = response.ResponseUri;
        return response;
    }
}
By overriding the GetWebResponse method, we can populate a ResponseUri property with
the final target of any 302 redirects. Redirects are very common on all kinds
of websites, as they allow the owner to count hits and log information before
sending you on your merry way to the target.
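For a single address, the idea boils down to one download followed by a read of the property. The following is only a minimal sketch: the class is repeated in compact form so the listing compiles on its own, `SingleUrlDemo` and `GetFinalUrl` are hypothetical names that do not appear in the original code, and `http://example.com/` is a placeholder address.

```csharp
using System;
using System.Net;

// Compact copy of the derived class so this sketch compiles on its own.
public class MyWebClient : WebClient
{
    public Uri ResponseUri { get; private set; }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        ResponseUri = response.ResponseUri;
        return response;
    }
}

// Hypothetical helper class, for illustration only.
public static class SingleUrlDemo
{
    // Downloads one page and returns the URL the request finally landed on.
    public static Uri GetFinalUrl(string startUrl)
    {
        using (MyWebClient client = new MyWebClient())
        {
            client.DownloadString(startUrl);
            return client.ResponseUri;
        }
    }

    public static void Main(string[] args)
    {
        // Placeholder address; pass a real redirecting URL as the first argument.
        string startUrl = args.Length > 0 ? args[0] : "http://example.com/";
        Console.WriteLine(GetFinalUrl(startUrl));
    }
}
```

Note that ResponseUri stays null until a request has completed, so it should only be read after a successful download.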
Here's some sample code that goes through a whole list of integer "Redirect
Ids", assembles the page title and final url, and saves these to a delimited
text file that can be read later:
// Requires: using System.Text.RegularExpressions;
static string urlbase = "http://sitewithredirect.com/Redirect.aspx?id=";

static void ProcessUrls()
{
    // Matches everything between the opening and closing title tags.
    // The lazy quantifier stops at the first </title> rather than the last.
    // Built once, outside the loop, since the pattern never changes.
    Regex rex = new Regex(@"(?<=<title.*>)([\s\S]*?)(?=</title>)", RegexOptions.IgnoreCase);

    for (int i = 1; i < 4000; i++)
    {
        string item = i.ToString();
        string url = urlbase + item;
        string targetUrl = null;
        string title = null;

        // The using block disposes the client even if the download throws.
        using (MyWebClient w = new MyWebClient())
        {
            try
            {
                string content = w.DownloadString(url);
                targetUrl = w.ResponseUri.ToString();
                title = rex.Match(content).Value.Trim();
                System.Diagnostics.Debug.WriteLine(targetUrl);
            }
            catch
            {
                // Log the id that failed and move on to the next one.
                System.Diagnostics.Debug.WriteLine(item);
            }
        }

        if (targetUrl != null && title != null)
        {
            System.IO.File.AppendAllText(@"urls.txt", targetUrl + "|" + title + "\r\n");
        }
    }
}
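Reading the delimited file back later takes only a few lines. What follows is a rough sketch that assumes the `targetUrl|title` layout written above; `UrlFileReader`, `ParseLine`, and `ReadAll` are hypothetical names, not part of the original code. Each line is split at the first "|" only, on the assumption that the URL itself contains no pipe while the title might.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader for the "targetUrl|title" file, for illustration only.
public static class UrlFileReader
{
    // Splits one line at the first '|' only, so a pipe inside the title survives.
    public static KeyValuePair<string, string> ParseLine(string line)
    {
        string[] parts = line.Split(new[] { '|' }, 2);
        string url = parts[0];
        string title = parts.Length > 1 ? parts[1] : string.Empty;
        return new KeyValuePair<string, string>(url, title);
    }

    // Reads every non-empty line of the file into a list of (url, title) pairs.
    public static List<KeyValuePair<string, string>> ReadAll(string path)
    {
        var entries = new List<KeyValuePair<string, string>>();
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length == 0) continue;
            entries.Add(ParseLine(line));
        }
        return entries;
    }

    public static void Main()
    {
        // Write one sample line to a temporary file, then read it back.
        string path = Path.GetTempFileName();
        File.AppendAllText(path, "http://example.com/page|Example | Title\r\n");
        foreach (KeyValuePair<string, string> entry in ReadAll(path))
        {
            Console.WriteLine(entry.Key + " -> " + entry.Value);
        }
        File.Delete(path);
    }
}
```

One caveat: the title pattern can match across line breaks, so a title that spans several lines in the page source would break this line-based format unless the line breaks are stripped before saving.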
Biography - Peter Bromberg
Peter Bromberg is a C# MVP, MCP, and .NET expert who has worked in banking, finance, and telephony for over 20 years. Pete focuses exclusively on the .NET Platform, and currently develops SOA and other .NET applications for a Fortune 500 clientele. Peter enjoys producing digital photo collage with Maya, playing jazz flute, the beach, and fine wines. You can view Peter's UnBlog and IttyUrl sites.
Article Discussion: Make The WebClient Class follow redirects
Zeljko Mrcic replied to Peter Bromberg on Tuesday, September 28, 2010 4:43 AM
Hi, great article. It solved a lot of problems for me.
And a question:
if I have a URI like http://somedomain.com/september/
and it gives me a response, like a normal request, such as http://somedomain.com/september/septemberdata.hml
how can I find which file is at the end of the URI?
Urgent, please help.