
Multi-Threaded App

Robert Sheppard posted on Thursday, February 14, 2008 3:42 PM

I am new to C# and am trying to build a multi-threaded web crawler. I want
to crawl many sites at once. I know how to use IHTMLDocument2 to parse
the document object, but I want to launch multiple threads to parse each
individual web page.

With the WebBrowser control I can start parsing when I get the
Document_Complete event, but how can I do this with each web site on a
different thread? How are the Document_Complete events
handled in a multi-threaded environment?

This is an asynchronous operation, so I cannot see how it can be
done.

Nicholas Paldino [.NET/C# MVP] posted on Thursday, February 14, 2008 4:18 PM

Robert,

This would be difficult in this situation.  You couldn't use the
WebBrowser control, because it needs to be tied to a UI thread.

You could use MSHTML through COM interop.  However, you would have to
make sure that every thread that you use MSHTML on is set up so that the
ApartmentState for that thread is STA.  I am not sure about this, but I also
believe you would have to pump messages in order for the events to work
correctly.

Needless to say, it's a better idea in this case to use
HttpWebRequest/HttpWebResponse and then take the content from those and set
the content of a new MSHTML instance in your thread to the content
downloaded.  This way, you don't have to wait for MSHTML to download the
document, and you can work with it right away.


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com
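
A minimal sketch of the HttpWebRequest/HttpWebResponse approach suggested above, assuming one thread per URL is acceptable for a small batch of sites. The class name CrawlerSketch and the methods Download and FetchAll are hypothetical names chosen for illustration; they are not from the thread. The fetch step is passed in as a delegate so the threading logic can be exercised without network access. Newer .NET code would more typically use HttpClient, but this sticks to the API named in the reply.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

public static class CrawlerSketch
{
    // Downloads one page synchronously and returns its HTML as a string.
    public static string Download(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Calls fetch(url) for every URL on its own thread and collects the results.
    // A failed fetch is recorded as null so one bad site does not stop the rest.
    public static Dictionary<string, string> FetchAll(
        IEnumerable<string> urls, Func<string, string> fetch)
    {
        var results = new Dictionary<string, string>();
        var threads = new List<Thread>();

        foreach (string url in urls)
        {
            string current = url; // private copy for the closure below
            Thread worker = new Thread(() =>
            {
                string html;
                try
                {
                    html = fetch(current);
                }
                catch (Exception)
                {
                    html = null;
                }

                // Dictionary is not thread-safe, so writes are serialized.
                lock (results)
                {
                    results[current] = html;
                }
            });
            threads.Add(worker);
            worker.Start();
        }

        // Block until every download has finished.
        foreach (Thread worker in threads)
        {
            worker.Join();
        }
        return results;
    }
}
```

A call such as CrawlerSketch.FetchAll(urls, CrawlerSketch.Download) would return a dictionary mapping each URL to its HTML, which can then be handed to whatever parser is in use. For a large crawl, a bounded pool of worker threads is likely a better fit than one thread per URL.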

Robert Sheppard posted on Thursday, February 14, 2008 5:54 PM

Thanks... I will look at HttpWebRequest/HttpWebResponse. The old VB6 crawler
that I am porting from used the WebBrowser control, which works fine
but is very slow. Let me stress SLOW.
Thanks again for the help.


Nicholas Paldino [.NET/C# MVP] posted on Thursday, February 14, 2008 10:19 PM

Robert,

Do you have a specific need to parse the entire document, or are you
looking for specific parts?  If you don't need to parse the entire document,
and what you are looking to scrape from the HTML is specific, then using
HttpWebRequest and HttpWebResponse will probably simplify things
considerably.


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com
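
As an illustration of scraping only specific parts of a page, here is a rough sketch that pulls the href values of anchor tags out of an HTML string that has already been downloaded. The class name LinkScraper and the method ExtractLinks are hypothetical, and a regular expression is a deliberately crude stand-in for a real HTML parser: it assumes quoted attribute values and will miss or misread unusual markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkScraper
{
    // Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the href values in the order they appear in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Because this works on a plain string, it has no COM or apartment-state requirements and can run on any worker thread.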