Build a .NET Windows Forms HTML Form Parser
By Peter A. Bromberg, Ph.D.

Recently I needed to capture the form fields from HTML pages on the Internet so that I could build components that receive form posts and store the data in a database table for later conversion to XML and other processing. It soon became obvious that I would need to do this often, so I decided to build a utility for it. I chose the classic COM WebBrowser control, along with MSHTML, which provides Internet Explorer with complete HTML Document Object Model parsing.

The idea was to load the page with the WebBrowser control, then use MSHTML to iterate over the form elements in the document (input, textarea, and select), adding each one to a DataTable so it could be displayed in a DataGrid on the WinForms app. Of course, this concept is not limited to Windows Forms. You could do this in a web page and process the form data in the DataTable in any way you see fit. In a web page you would probably fetch the HTML with the WebClient or WebRequest classes instead of the WebBrowser control. In that case, you would need to take the following line:

doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)

and modify it so that doc is populated from the downloaded HTML rather than from the browser control's Document property.

One nice side benefit of this approach is that by simply adding the DataTable to a new DataSet, we can use the WriteXml method to save our form metadata to a nicely formatted XML file on the hard drive, or even save it to a database. From this point, enterprising developers who feel so inclined could use this with either CodeDom or XSD to generate a class that represents an instance of the HTML form, which makes an object-oriented approach to handling custom HTML form data much easier.
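As a rough sketch of the WriteXml idea, the module below builds a table with the same six columns the form uses, wraps a copy of it in a DataSet, and returns both the data and an inferred XSD schema as strings. The module name FormXmlSketch, its function names, and the sample row are made up for illustration; DataSet.WriteXml and DataSet.WriteXmlSchema are the only framework features being demonstrated, and it assumes VB 2005 or later for the Using statement.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module FormXmlSketch

    ' Builds an empty table with the same six columns used by the form.
    Function BuildFormTable() As DataTable
        Dim dt As New DataTable()
        Dim names As String() = {"TYPE", "SUBTYPE", "VALUE", "SubElement", "SubValue", "SubDisplay"}
        For Each colName As String In names
            dt.Columns.Add(colName)
        Next
        Return dt
    End Function

    ' Serializes a copy of the table as XML data. An unnamed table is
    ' given the default name "Table1" when it is added to the DataSet.
    Function TableToXml(ByVal dt As DataTable) As String
        Dim ds As New DataSet()
        ds.Tables.Add(dt.Copy())
        Using writer As New StringWriter()
            ds.WriteXml(writer)
            Return writer.ToString()
        End Using
    End Function

    ' Writes only the schema (XSD) that describes the table's columns.
    Function TableToXsd(ByVal dt As DataTable) As String
        Dim ds As New DataSet()
        ds.Tables.Add(dt.Copy())
        Using writer As New StringWriter()
            ds.WriteXmlSchema(writer)
            Return writer.ToString()
        End Using
    End Function

    Sub Main()
        Dim dt As DataTable = BuildFormTable()
        ' Hypothetical row: a text input named "q" with no inner text.
        dt.Rows.Add(New Object() {"INPUT", "text", "q", ""})
        Console.WriteLine(TableToXml(dt))
        Console.WriteLine(TableToXsd(dt))
    End Sub

End Module
```

The schema string is the sort of input a tool such as xsd.exe could take to generate a typed class, though that step is not shown here.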

This code is in VB.NET because the shop that I've been working at seems to think that's the way to go, but it's only a minor inconvenience. In fact, I've been writing so much VB.NET code lately it's kind of like "deja vu all over again". Here are the main methods, all located in the single form. It's not very pretty from an OOP perspective because I was reusing the same project to do a whole bunch of hacking, but most people should be able to follow it fairly easily:

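 ' Form-level fields shared by the methods below.
 Private doc As mshtml.HTMLDocument
 Private dtForm As DataTable

 Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
         Handles Button1.Click
  ' Clear any previous results, then start loading the requested page.
  DataGrid1.DataSource = Nothing
  dtForm.Rows.Clear()
  Dim url As String = TextBox1.Text
  If url = "" Then
   MessageBox.Show("enter url please.")
   Return
  End If
  AxWebBrowser1.Navigate(url)
 End Sub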

 Private Sub AxWebBrowser1_DocumentComplete(ByVal sender As Object, ByVal e As _
   AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _
         Handles AxWebBrowser1.DocumentComplete
  ' The page has finished loading, so the DOM is ready to be walked.
  doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
  DataGrid1.DataSource = Nothing
  dtForm.Rows.Clear()
  Try
   ' you could enhance this...did I forget any?
   GetAll("input")
   GetAll("textarea")
   GetAll("select")
   DataGrid1.DataSource = dtForm
  Catch ex As Exception
   MessageBox.Show(ex.Message & vbCrLf & ex.StackTrace)
  End Try
 End Sub

 ' Adds one row to dtForm for every element with the given tag name.
 ' Column use: TYPE = tag name, SUBTYPE = type attribute, VALUE = name attribute,
 ' SubElement = inner text (left empty for SELECT elements).
 Public Function GetAll(ByVal strTagName As String) As Boolean
  Dim all As mshtml.IHTMLElementCollection = doc.getElementsByTagName(strTagName)
  Dim elm As mshtml.IHTMLElement
  Dim strName As String
  Dim strvalue As String
  Dim strType As String
  For Each elm In all
   strName = elm.getAttribute("NAME")
   strvalue = elm.innerText
   strType = elm.getAttribute("type")
   Dim retval As Object() = {elm.tagName, strType, strName, _
     IIf(elm.tagName <> "SELECT", strvalue, "")}
   dtForm.Rows.Add(retval)
   If elm.tagName = "SELECT" Then
    ' List each child (normally OPTION) of the SELECT in the Sub* columns.
    Dim chElem As mshtml.IHTMLElement
    Dim ch As mshtml.IHTMLElementCollection = elm.children
    For Each chElem In ch
     Dim retval2 As Object() = {"", "", "", chElem.tagName, chElem.getAttribute("value"), _
       chElem.innerText}
     dtForm.Rows.Add(retval2)
    Next
   End If
  Next
  dtForm.AcceptChanges()
  Return True
 End Function

 Private Sub AxWebBrowser1_NavigateComplete2(ByVal sender As Object, _
   ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) _
         Handles AxWebBrowser1.NavigateComplete2
  doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
 End Sub

 Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) _
                                                  Handles MyBase.Load
  dtForm = New DataTable
  dtForm.Columns.Add("TYPE")
  dtForm.Columns.Add("SUBTYPE")
  dtForm.Columns.Add("VALUE")
  dtForm.Columns.Add("SubElement")
  dtForm.Columns.Add("SubValue")
  dtForm.Columns.Add("SubDisplay")
  dtForm.AcceptChanges()
 End Sub

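 Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
         Handles Button2.Click
  ' Add a copy of the table: a DataTable can belong to only one DataSet,
  ' so adding dtForm itself would throw on the second click.
  Dim ds As DataSet = New DataSet
  ds.Tables.Add(dtForm.Copy())
  ds.WriteXml("FormData.xml")
  Label1.Text = "Data Saved to FormData.xml"
 End Sub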

When you point the app at msdn.microsoft.com, the grid shows one row for each form element found, with the OPTION entries of each SELECT listed in the rows beneath it.

Your saved Xml will look like the following snippet:

<Table1>
<TYPE>SELECT</TYPE>
<SUBTYPE>select-one</SUBTYPE>
<VALUE>prodtech</VALUE>
<SubElement />
</Table1>
<Table1>
<TYPE />
<SUBTYPE />
<VALUE />
<SubElement>OPTION</SubElement>
<SubValue>0</SubValue>
<SubDisplay>Choose a Product or Technology</SubDisplay>
</Table1>
<Table1>
<TYPE />
<SUBTYPE />
<VALUE />
<SubElement>OPTION</SubElement>
<SubValue>/access/</SubValue>
<SubDisplay>Access</SubDisplay>
</Table1>
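
To illustrate how this row layout can be consumed later, here is a minimal sketch that reads the saved XML back into a DataSet and collects the OPTION values listed under each SELECT. It assumes the full file has a single root element (NewDataSet is the default DataSet name) wrapped around the Table1 entries shown above. The module name FormXmlReaderSketch and the helpers ReadSelectOptions and CellText are hypothetical names, not part of the downloadable solution.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.IO

Module FormXmlReaderSketch

    ' Maps each SELECT's name (stored in the VALUE column) to the values
    ' of the OPTION rows that immediately follow it.
    Function ReadSelectOptions(ByVal xml As String) As Dictionary(Of String, List(Of String))
        Dim ds As New DataSet()
        Using reader As New StringReader(xml)
            ds.ReadXml(reader)
        End Using

        Dim result As New Dictionary(Of String, List(Of String))()
        If ds.Tables.Count = 0 Then Return result

        Dim current As List(Of String) = Nothing
        For Each row As DataRow In ds.Tables(0).Rows
            Dim tagName As String = CellText(row, "TYPE")
            If tagName = "SELECT" Then
                ' Start a new option list for this SELECT.
                current = New List(Of String)()
                result(CellText(row, "VALUE")) = current
            ElseIf tagName <> "" Then
                ' Any other element ends the current SELECT's run of options.
                current = Nothing
            ElseIf current IsNot Nothing AndAlso CellText(row, "SubElement") = "OPTION" Then
                current.Add(CellText(row, "SubValue"))
            End If
        Next
        Return result
    End Function

    ' Reads a cell as text, treating a missing column or DBNull as "".
    Private Function CellText(ByVal row As DataRow, ByVal columnName As String) As String
        If Not row.Table.Columns.Contains(columnName) Then Return ""
        If row.IsNull(columnName) Then Return ""
        Return CStr(row(columnName))
    End Function

    Sub Main()
        ' The snippet from the article, wrapped in an assumed root element.
        Dim xml As String = _
            "<NewDataSet>" & _
            "<Table1><TYPE>SELECT</TYPE><SUBTYPE>select-one</SUBTYPE>" & _
            "<VALUE>prodtech</VALUE><SubElement /></Table1>" & _
            "<Table1><TYPE /><SUBTYPE /><VALUE /><SubElement>OPTION</SubElement>" & _
            "<SubValue>0</SubValue><SubDisplay>Choose a Product or Technology</SubDisplay></Table1>" & _
            "<Table1><TYPE /><SUBTYPE /><VALUE /><SubElement>OPTION</SubElement>" & _
            "<SubValue>/access/</SubValue><SubDisplay>Access</SubDisplay></Table1>" & _
            "</NewDataSet>"

        For Each pair As KeyValuePair(Of String, List(Of String)) In ReadSelectOptions(xml)
            Console.WriteLine(pair.Key & ": " & String.Join(", ", pair.Value.ToArray()))
        Next
    End Sub

End Module
```

The grouping relies purely on row order, since the table has no key linking an OPTION row to its SELECT. If the data were going into a database, a parent identifier column would be a sturdier design.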

The complete VS.NET 2003 Solution can be downloaded from the link below:

Download the Source Code that accompanies this article

 


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform. Pete's samples at GotDotNet.com have been downloaded over 41,000 times.