This is a web-based implementation that converts HTML to well-formed XHTML
using the excellent SgmlReader written by Chris Lovett of Microsoft.
Chris's code has a command-line interface; however, I needed an in-memory
implementation for some work we're experimenting with that takes well-formed
XHTML and converts it to RTF for display in a RichTextBox control. There
are many other uses for XHTML-compliant HTML, not the least of which is
that an XHTML page is a legitimate, well-formed XML document. That means
any XML tool (a DOM parser, XPath, XSLT) can process it, which opens up
a whole new range of possibilities for HTML processing.
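As a small illustration of what "processing HTML as XML" can look like, here is a minimal sketch that loads an XHTML string into an XmlDocument and pulls out link targets with XPath. The class name XhtmlQuerySketch and the method GetLinkTargets are hypothetical names chosen for this example; they are not part of SgmlReader or of the download that accompanies this article.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration only; not part of SgmlReader.
public static class XhtmlQuerySketch
{
    // Returns the href value of every <a> element in a well-formed XHTML string.
    public static List<string> GetLinkTargets(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        // LoadXml throws XmlException if the markup is not well-formed,
        // which is exactly why raw HTML has to be cleaned up first.
        doc.LoadXml(xhtml);

        List<string> targets = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//a[@href]"))
        {
            targets.Add(node.Attributes["href"].Value);
        }
        return targets;
    }
}
```

Given the input `<html><body><p><a href="a.html">A</a><br /><a href="b.html">B</a></p></body></html>`, the method returns the two values a.html and b.html. Note that this sketch assumes the document declares no default namespace; if the markup carries the XHTML namespace, the XPath query would need an XmlNamespaceManager and a prefix.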
To make this work as a class library, usable on the web or in-memory
in an application, I needed to write a small "helper class".
I also needed to change the way errors are reported in Lovett's SgmlReader
class, exposing them as a string property. The existing code was designed
to write errors to an optional log file with a TextWriter, whereas I needed
to return the concatenated error string to the web page for display. My
helper class code appears below:
using System;
using Sgml;
using System.IO;
using System.Xml;
using System.Text;
using System.Web;

namespace SgmlReaderDll
{
    /// <summary>
    /// Helper class to allow string processing using SgmlReader/Parser
    /// </summary>
    public class SGMLReaderHelper
    {
        private string _errors;

        // Concatenated parser errors from the most recent ProcessString call.
        public string Errors
        {
            get { return _errors; }
            set { _errors = value; }
        }

        public SGMLReaderHelper()
        {
        }

        // Converts an HTML string to well-formed XHTML entirely in memory.
        public string ProcessString(string strInputHtml)
        {
            SgmlReader reader = new SgmlReader();
            reader.DocType = "HTML";
            StringReader sr = new StringReader(strInputHtml);
            reader.InputStream = sr;

            StringWriter sw = new StringWriter();
            XmlTextWriter w = new XmlTextWriter(sw);

            // Copy every node from the SGML reader to the XML writer.
            reader.Read();
            while (!reader.EOF)
            {
                w.WriteNode(reader, true);
            }
            w.Flush();
            w.Close();

            // ErrorLog is the string property added to the modified
            // SgmlReader class described above.
            this.Errors = reader.ErrorLog;
            return sw.ToString();
        }
    }
}
There are a lot of interesting uses for this type of utility. One that
I use again and again is to take an HTML web page that is not XHTML
compliant, run it through this utility, and get back a well-formed
XML document in which unquoted attributes are quoted, HTML tags that
need closing are self-closed, and script blocks are automatically wrapped
in CDATA sections. The result can be saved with an .xsl extension, and
you are on your way to creating the XSL stylesheet for an XML transformation
that generates dynamic web pages!
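To make the stylesheet idea concrete, here is a minimal sketch of applying an XSL stylesheet whose template body is cleaned-up XHTML. It uses XslCompiledTransform from System.Xml.Xsl, which is an assumption about the runtime rather than something this article's download uses; the class name XhtmlStylesheetSketch, the sample stylesheet, and the `<page>` input document are all hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper for illustration only.
public static class XhtmlStylesheetSketch
{
    // A stylesheet built by wrapping well-formed XHTML in an xsl:template
    // and replacing the static heading text with an xsl:value-of.
    public const string SampleStylesheet = @"
<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes"" />
  <xsl:template match=""/"">
    <html><body><h1><xsl:value-of select=""/page/title"" /></h1></body></html>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to the input XML and returns the result as a string.
    public static string Transform(string stylesheetXml, string inputXml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(styleReader);
        }

        StringWriter output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        {
            xslt.Transform(inputReader, null, output);
        }
        return output.ToString();
    }
}
```

Calling `XhtmlStylesheetSketch.Transform(XhtmlStylesheetSketch.SampleStylesheet, "<page><title>Hello</title></page>")` produces markup in which the heading contains the title text from the input document. The point of the sketch is that the template body must itself be well-formed XML, which is why raw HTML has to go through the converter before it can be pasted into a stylesheet.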
And now for the fun part. Click the link below, which will bring you
to the ASP.NET web page where you can paste your HTML and receive
back XHTML, along with an error report from Chris's parser:
Try the HTML to XHTML web page
As always, the full solution may be downloaded from the
link below. Thanks to Chris Lovett for some really useful code.
Download the code that accompanies this article
Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform. Pete's samples at GotDotNet.com have been downloaded over 41,000 times. You can read Peter's UnBlog here.