Multi-Threaded App - Robert Sheppard

Thursday, February 14, 2008 3:42 PM

I am new to C# and am trying to build a multi-threaded web crawler. I want
to crawl many sites all at once. I know how to use IHTMLDocument2 to parse
the document object, but I want to launch multiple threads to parse each
individual web page.

With the WebBrowser control I can start parsing when I get the
Document_Complete event, but how can I do this with each web site on a
different thread? How are the Document_Complete events
handled in a multi-threaded environment?

This is an asynchronous operation, so I cannot see how it can be
done.

Robert, This would be difficult in this situation. - Nicholas Paldino [.NET/C# MVP]

Thursday, February 14, 2008 4:18 PM

Robert,

This would be difficult in this situation.  You couldn't use the
WebBrowser control, because it needs to be tied to a UI thread.

You could use MSHTML through COM interop.  However, you would have to
make sure that every thread that you use MSHTML on is set up so that the
ApartmentState for that thread is STA.  I am not sure about this, but I also
believe you would have to pump messages in order for the events to work
correctly.

Needless to say, it's a better idea in this case to use
HttpWebRequest/HttpWebResponse and then take the content from those and set
the content of a new MSHTML instance in your thread to the content
downloaded.  This way, you don't have to wait for MSHTML to download the
document, and you can work with it right away.
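
A minimal sketch of the download half of this suggestion, assuming one worker thread per URL. The names CrawlerSketch, Download, and FetchAll are hypothetical and not from the thread. The fetch step is passed in as a delegate so the threading can be tried without network access. Handing the downloaded text to MSHTML is left out here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

public static class CrawlerSketch
{
    // Downloads one page as text with HttpWebRequest/HttpWebResponse.
    public static string Download(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Runs fetch(url) on its own thread for every URL and waits for all of them.
    // A failed download is recorded as null instead of stopping the crawl.
    public static Dictionary<string, string> FetchAll(
        IEnumerable<string> urls, Func<string, string> fetch)
    {
        var results = new Dictionary<string, string>();
        var threads = new List<Thread>();

        foreach (string url in urls)
        {
            string current = url; // private copy for the lambda to capture
            var worker = new Thread(() =>
            {
                string html;
                try { html = fetch(current); }
                catch (Exception) { html = null; }

                // Dictionary is not thread-safe, so guard every write.
                lock (results) { results[current] = html; }
            });
            threads.Add(worker);
            worker.Start();
        }

        foreach (Thread worker in threads)
        {
            worker.Join();
        }
        return results;
    }
}
```

A call such as CrawlerSketch.FetchAll(urls, CrawlerSketch.Download) would fetch every URL in parallel. One thread per URL is only reasonable for a short list; a real crawler would probably cap the number of workers. Newer .NET versions mark WebRequest.Create as obsolete in favor of HttpClient.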


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com

Thanks... I will look at HttpWebRequest/HttpWebResponse. - Robert Sheppard

Thursday, February 14, 2008 5:54 PM

Thanks... I will look at HttpWebRequest/HttpWebResponse. The old VB6 crawler
that I am porting from was using the WebBrowser control, which works fine
but is very slow. Let me stress SLOW.
Thanks again for the help.


Robert, Do you have a specific need to parse the entire document, or are - Nicholas Paldino [.NET/C# MVP]

Thursday, February 14, 2008 10:19 PM

Robert,

Do you have a specific need to parse the entire document, or are you
looking for specific parts?  If you don't need to parse the entire document,
and what you are looking to scrape from the HTML is specific, then using
HttpWebRequest and HttpWebResponse will probably simplify things
considerably.
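
As an illustration of scraping only specific parts, here is a rough sketch that pulls the href values of anchor tags out of an already downloaded HTML string. The regular expression approach and the names LinkScraperSketch and ExtractLinks are assumptions for this example, not something stated in the thread. A regex like this only handles quoted href attributes and will miss unusual markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkScraperSketch
{
    // Matches <a ... href="value"> or <a ... href='value'>, case-insensitively.
    // Group 1 captures the attribute value without the quotes.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every quoted href value found in the HTML, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Because this works on a plain string, it can run on any worker thread with no STA or message-pump requirements.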


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com
