Recently I needed to pull form data from HTML pages on the Internet in
order to build components that receive those form posts and process the
data into a database table for later conversion to XML and other
processing. It soon became obvious that I might need to do this often, so
I decided to build a small utility for the job. I chose the classic COM
WebBrowser control, along with MSHTML, the library that gives Internet
Explorer its complete HTML Document Object Model (DOM) parsing.
The idea I had in mind was to grab the page with the WebBrowser control,
then use MSHTML to iterate over the form controls (every INPUT, TEXTAREA
and SELECT element on the page), adding each one to a DataTable so it
could be displayed in a DataGrid in the WinForms app. Of course, the
concept is not limited to Windows Forms: you could do the same thing from
a web page and process the form data in the DataTable any way you see fit.
In a web page you would probably fetch the HTML with the WebClient or
WebRequest classes instead, and then you would need to modify the
following line, which assumes the document came from the WebBrowser
control:

```vbnet
doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
```
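For the web-page scenario, that line could be replaced along these lines. This is a minimal sketch, not code from the downloadable solution: it assumes .NET 2.0 or later (for `WebClient.DownloadString` and `Using`) and a project reference to the Microsoft.mshtml interop assembly, and the `HtmlFormLoader`, `LoadDocument` and `DownloadDocument` names are hypothetical. Instead of asking the WebBrowser control for its Document, it writes the downloaded HTML into a standalone MSHTML document through `IHTMLDocument2.write`, which `GetAll` could then query the same way.

```vbnet
' Sketch only: requires a project reference to Microsoft.mshtml.
Imports System.Net

Module HtmlFormLoader
    ' Hypothetical helper: parse an HTML string with MSHTML, no browser control.
    Public Function LoadDocument(ByVal html As String) As mshtml.HTMLDocument
        Dim doc2 As mshtml.IHTMLDocument2 = _
            DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        ' IHTMLDocument2.write takes an array of objects to write.
        doc2.write(New Object() {html})
        doc2.close()
        Return DirectCast(doc2, mshtml.HTMLDocument)
    End Function

    ' Hypothetical helper: download the page, then hand it to LoadDocument.
    Public Function DownloadDocument(ByVal url As String) As mshtml.HTMLDocument
        Using client As New WebClient()
            Return LoadDocument(client.DownloadString(url))
        End Using
    End Function
End Module
```

In place of the DirectCast line you would then write something like `doc = HtmlFormLoader.DownloadDocument(url)`. With no browser control involved there is no DocumentComplete event to wait for; the sketch assumes `GetAll` can run as soon as `DownloadDocument` returns.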
A nice side benefit of this approach is that by simply adding the
DataTable to a new DataSet, we can call the DataSet's WriteXml method to
save the form metadata as a nicely formatted XML file on disk, or save it
to a database instead. From there, enterprising developers could use
either CodeDom or an XSD schema to generate a class that represents an
instance of the HTML form, making an object-oriented approach to handling
custom HTML form data much easier.
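To go down the XSD route, the DataSet can save its schema alongside the data. The sketch below is an illustration under assumptions rather than part of the original project: the `FormSchemaExport` module, its made-up sample row and the `FormData` file names are hypothetical, and the table mirrors dtForm's six columns so that the schema describes the same rows Button2_Click saves.

```vbnet
Imports System.Data

Module FormSchemaExport
    ' Hypothetical: a table shaped like dtForm, holding one made-up INPUT row.
    Public Function BuildSampleTable() As DataTable
        Dim dt As New DataTable()
        Dim columnNames As String() = {"TYPE", "SUBTYPE", "VALUE", _
            "SubElement", "SubValue", "SubDisplay"}
        For Each colName As String In columnNames
            dt.Columns.Add(colName)
        Next
        ' Same four-value layout GetAll uses for non-SELECT elements.
        dt.Rows.Add(New Object() {"INPUT", "text", "q", ""})
        Return dt
    End Function

    ' Writes basePath.xml (the data) and basePath.xsd (the inferred schema).
    Public Sub Export(ByVal dt As DataTable, ByVal basePath As String)
        Dim ds As New DataSet()
        ds.Tables.Add(dt.Copy())   ' a copy, so dt stays free for other DataSets
        ds.WriteXml(basePath & ".xml")
        ds.WriteXmlSchema(basePath & ".xsd")
    End Sub

    Sub Main()
        Export(BuildSampleTable(), "FormData")
    End Sub
End Module
```

Running the SDK's xsd.exe tool on the schema, for example `xsd FormData.xsd /classes /language:VB`, would then generate VB classes matching it, which is one way to get the object-oriented wrapper described above.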
This code is in VB.NET because the shop I've been working at seems to
think that's the way to go, but it's only a minor inconvenience. In fact,
I've been writing so much VB.NET code lately that it's like "déjà vu all
over again". Here are the main methods, all located in a single form. It's
not very pretty from an OOP perspective, because I was reusing the same
project for a whole bunch of hacking, but most people should be able to
follow it fairly easily:
```vbnet
' Form-level fields used by the methods below. The project needs references
' to the WebBrowser ActiveX control (AxSHDocVw) and to Microsoft.mshtml.
Private doc As mshtml.HTMLDocument
Private dtForm As DataTable

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
        Handles Button1.Click
    ' Reset the grid and table, then start loading the requested page.
    DataGrid1.DataSource = Nothing
    dtForm.Rows.Clear()
    Dim url As String = TextBox1.Text
    If url = "" Then
        MessageBox.Show("Please enter a URL.")
        Return
    End If
    AxWebBrowser1.Navigate(url)
End Sub

Private Sub AxWebBrowser1_DocumentComplete(ByVal sender As Object, _
        ByVal e As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _
        Handles AxWebBrowser1.DocumentComplete
    doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
    DataGrid1.DataSource = Nothing
    dtForm.Rows.Clear()
    Try
        ' You could enhance this...did I forget any?
        GetAll("input")
        GetAll("textarea")
        GetAll("select")
        DataGrid1.DataSource = dtForm
    Catch ex As Exception
        MessageBox.Show(ex.Message & vbCrLf & ex.StackTrace)
    End Try
End Sub

' Adds one row per element with the given tag name and, for a SELECT,
' one extra row per child element (normally OPTION).
Public Function GetAll(ByVal strTagName As String) As Boolean
    Dim all As mshtml.IHTMLElementCollection = doc.getElementsByTagName(strTagName)
    Dim elm As mshtml.IHTMLElement
    Dim strName As String
    Dim strvalue As String
    Dim strType As String
    For Each elm In all
        strName = elm.getAttribute("NAME")
        strvalue = elm.innerText
        strType = elm.getAttribute("type")
        ' Row layout: TYPE = tag name, SUBTYPE = type attribute,
        ' VALUE = name attribute, SubElement = inner text (blank for SELECT).
        Dim retval As Object() = {elm.tagName, strType, strName, _
            IIf(elm.tagName <> "SELECT", strvalue, "")}
        dtForm.Rows.Add(retval)
        If elm.tagName = "SELECT" Then
            ' Child rows use the last three columns: tag, value attribute, text.
            Dim chElem As mshtml.IHTMLElement
            Dim ch As mshtml.IHTMLElementCollection = elm.children
            For Each chElem In ch
                Dim retval2 As Object() = {"", "", "", chElem.tagName, _
                    chElem.getAttribute("value"), chElem.innerText}
                dtForm.Rows.Add(retval2)
            Next
        End If
    Next
    dtForm.AcceptChanges()
    Return True
End Function

Private Sub AxWebBrowser1_NavigateComplete2(ByVal sender As Object, _
        ByVal e As AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event) _
        Handles AxWebBrowser1.NavigateComplete2
    doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
End Sub

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) _
        Handles MyBase.Load
    ' Six string columns; child rows of a SELECT use the last three.
    dtForm = New DataTable
    dtForm.Columns.Add("TYPE")
    dtForm.Columns.Add("SUBTYPE")
    dtForm.Columns.Add("VALUE")
    dtForm.Columns.Add("SubElement")
    dtForm.Columns.Add("SubValue")
    dtForm.Columns.Add("SubDisplay")
    dtForm.AcceptChanges()
End Sub

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
        Handles Button2.Click
    ' Add a copy: a DataTable can belong to only one DataSet, so adding
    ' dtForm itself would throw an ArgumentException on the second click.
    Dim ds As DataSet = New DataSet
    ds.Tables.Add(dtForm.Copy())
    ds.WriteXml("FormData.xml")
    Label1.Text = "Data Saved to FormData.xml"
End Sub
```
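GetAll searches the whole document, so an INPUT that sits outside any FORM (a script-driven search box, say) is collected too. The following is a hedged sketch of a form-scoped alternative, assuming the same Microsoft.mshtml reference; the `FormScopedCollector` module and `CollectFormControls` function are hypothetical names. It walks the document's `forms` collection and queries each form through `IHTMLElement2.getElementsByTagName`, filling only the first three columns (tag name, type attribute, name attribute) and skipping the OPTION rows.

```vbnet
' Sketch only: requires a project reference to Microsoft.mshtml.
Imports System.Data

Module FormScopedCollector
    ' Hypothetical helper: adds one row per INPUT/TEXTAREA/SELECT that lives
    ' inside a FORM element, and returns how many rows were added.
    Public Function CollectFormControls(ByVal doc As mshtml.HTMLDocument, _
            ByVal table As DataTable) As Integer
        Dim count As Integer = 0
        Dim tagNames As String() = {"input", "textarea", "select"}
        For Each formElm As mshtml.IHTMLElement In doc.forms
            ' IHTMLElement2 exposes getElementsByTagName on any element,
            ' so the search is limited to this form's descendants.
            Dim scope As mshtml.IHTMLElement2 = DirectCast(formElm, mshtml.IHTMLElement2)
            For Each tagName As String In tagNames
                For Each elm As mshtml.IHTMLElement In scope.getElementsByTagName(tagName)
                    table.Rows.Add(New Object() {elm.tagName, _
                        elm.getAttribute("type", 0), elm.getAttribute("name", 0)})
                    count += 1
                Next
            Next
        Next
        Return count
    End Function
End Module
```

Another refinement worth considering: DocumentComplete fires once for each frame on a page as well as for the top-level document, which fires last. A common way to act only on that final event is to wrap the body of AxWebBrowser1_DocumentComplete in `If e.pDisp Is AxWebBrowser1.GetOcx() Then ... End If`.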
The result when you point the app at msdn.microsoft.com looks like this:

[Screenshot: the DataGrid listing the form elements found on msdn.microsoft.com]

The saved XML will look like the following snippet:
```xml
<Table1>
  <TYPE>SELECT</TYPE>
  <SUBTYPE>select-one</SUBTYPE>
  <VALUE>prodtech</VALUE>
  <SubElement />
</Table1>
<Table1>
  <TYPE />
  <SUBTYPE />
  <VALUE />
  <SubElement>OPTION</SubElement>
  <SubValue>0</SubValue>
  <SubDisplay>Choose a Product or Technology</SubDisplay>
</Table1>
<Table1>
  <TYPE />
  <SUBTYPE />
  <VALUE />
  <SubElement>OPTION</SubElement>
  <SubValue>/access/</SubValue>
  <SubDisplay>Access</SubDisplay>
</Table1>
```
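Because each OPTION row follows the row of the SELECT it belongs to, the saved file can be reloaded with DataSet.ReadXml and regrouped. A minimal sketch, assuming .NET 2.0 or later (for generics) and the layout shown above, including the fact that a SELECT's NAME attribute ends up in the VALUE column; the `FormDataReader` module and `GroupOptions` function are hypothetical. Note that WriteXml wraps the rows in a root element (NewDataSet by default), which the snippet above leaves out.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports System.IO

Module FormDataReader
    ' Returns the cell as a string, or "" when the column is missing or null.
    Private Function Cell(ByVal row As DataRow, ByVal column As String) As String
        If Not row.Table.Columns.Contains(column) Then Return ""
        Return Convert.ToString(row(column))
    End Function

    ' Hypothetical: maps each SELECT's name to the display texts of its OPTIONs.
    Public Function GroupOptions(ByVal xml As String) As Dictionary(Of String, List(Of String))
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(xml))   ' schema is inferred from the XML
        Dim groups As New Dictionary(Of String, List(Of String))()
        Dim current As List(Of String) = Nothing
        For Each row As DataRow In ds.Tables(0).Rows
            If Cell(row, "TYPE") = "SELECT" Then
                ' The SELECT's NAME attribute is stored in the VALUE column.
                current = New List(Of String)()
                groups(Cell(row, "VALUE")) = current
            ElseIf Cell(row, "SubElement") = "OPTION" AndAlso current IsNot Nothing Then
                current.Add(Cell(row, "SubDisplay"))
            ElseIf Cell(row, "TYPE") <> "" Then
                ' Any other top-level element ends the current option list.
                current = Nothing
            End If
        Next
        Return groups
    End Function
End Module
```

For the msdn.microsoft.com file above, GroupOptions would map prodtech to its option texts, starting with "Choose a Product or Technology" and "Access".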
The complete VS.NET 2003 Solution can be downloaded from
the link below:
Download the Source Code that accompanies this article