Most developers who've spent any time
working with XML, especially when it needs to be transmitted from client
browser to IIS over the wire and back again, are painfully aware of a
major limitation inherent in this area: an XML document can often be
three times the size of the raw data that it's supposed to be
transmitting! As my buddy and co-site developer Robbe once said, "Why
the hell would somebody want to send all that garbage over the wire?"
He said it tongue-in-cheek, but there ended up being more truth to his
statement than I ever imagined. There are, of course, some sensible ways
to cut down on this problem: using attributes instead of elements, reducing
the size of the tags to only a couple of characters, or even
transmitting only the elements / nodes that actually contain
data, "leaving out" the empty elements and, if necessary, "reconstructing"
the full XML document on the server using a template of sorts. And there
are other neat tricks people have come up with.
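To get a rough feel for how much these tricks can save, here is a minimal sketch in Python (used purely for illustration; it is not part of the ASP/VBScript solution described in this article). The record, its field names, and the two-letter attribute names are all hypothetical. It builds a verbose element-per-field document, a compact attribute-based one that leaves out empty fields, and then "reconstructs" the verbose form from the compact one using the name mapping as a template.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are made up for illustration.
record = {"CustomerID": "1001", "FirstName": "Jane", "LastName": "Doe", "MiddleName": ""}

# Hypothetical mapping from full field names to short attribute names.
short = {"CustomerID": "id", "FirstName": "fn", "LastName": "ln", "MiddleName": "mn"}

def as_elements(rec):
    """Verbose form: one child element per field, empty ones included."""
    root = ET.Element("Customer")
    for name, value in rec.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

def as_short_attributes(rec, short_names):
    """Compact form: short attribute names, empty fields left out."""
    root = ET.Element("c")
    for name, value in rec.items():
        if value:
            root.set(short_names[name], value)
    return ET.tostring(root, encoding="unicode")

def expand(compact, short_names):
    """Receiving side: rebuild the verbose document, using the mapping as a template."""
    attrs = ET.fromstring(compact).attrib
    full = {name: attrs.get(code, "") for name, code in short_names.items()}
    return as_elements(full)

verbose_xml = as_elements(record)
compact_xml = as_short_attributes(record, short)
print(len(verbose_xml), verbose_xml)
print(len(compact_xml), compact_xml)
print(expand(compact_xml, short) == verbose_xml)
```

The savings depend entirely on how tag-heavy the document is, and both ends have to agree on the mapping, which is the price of this approach.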
But the bottom line is, if you are in
a bandwidth-sparse situation where, for example, you have people connecting
to an application over 56K modems, the transmission of the XML data over
the wire and back can prove to be a major bottleneck.
I looked into this problem and, quite frankly,
didn't find any solutions that seemed worth implementing. So I started
playing around with some data compression algorithms. My first thought
was to try to use Huffman or GZip in script, but I gave up pretty quickly.
When you try to implement a compression algorithm in an interpreted scripting
environment, you can pretty much bet that it's going to take even longer
to get the data compressed than it would to just send it over the wire
as-is.
So then I started looking into some components.
I played with a few, got some promising results, and then I took a look
at Xceed's Streaming Compression Library. This particular component offers
about six different compression methods, is COM compliant (meaning it
can be installed on the client and instantiated in a client-side VBScript
or JavaScript function with CreateObject, or new ActiveXObject in JS),
and seems to offer the most "bang for the buck". I am consistently
getting compression ratios of 80% to 95% and more with XML documents.
Compression of a 110K tag-heavy XML document can take four seconds or more
on the client side, so obviously there are some tradeoffs to measure.
But decompression of the compressed document might take only 17/100ths of a
second. They give you a 20-day free trial, and then the component is only
about $150 or so, and I believe it offers an unlimited royalty-free distribution
license, which is what I need.
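The tradeoff can be put into rough numbers. The following Python sketch (an illustration only, not part of the original solution) plugs in the figures quoted above: a 110K document, four seconds to compress, 0.17 seconds to decompress, and the low end of the 80% to 95% range. The link speeds are assumptions, and the calculation ignores latency and protocol overhead, so treat the results as ballpark estimates.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second

def total_seconds(size_bytes, link_bps, ratio=0.0, compress_s=0.0, decompress_s=0.0):
    """Time to compress, send the (smaller) payload, and decompress it."""
    sent = size_bytes * (1 - ratio)
    return compress_s + transfer_seconds(sent, link_bps) + decompress_s

DOC = 110 * 1024       # the 110K document mentioned above
MODEM = 56_000         # nominal 56K modem; real throughput is lower
FAST_LINK = 10_000_000 # assumed 10 Mbit/s link, for comparison

plain = total_seconds(DOC, MODEM)
packed = total_seconds(DOC, MODEM, ratio=0.80, compress_s=4.0, decompress_s=0.17)
print(f"56K modem: uncompressed {plain:.1f} s, compressed {packed:.1f} s")

plain_fast = total_seconds(DOC, FAST_LINK)
packed_fast = total_seconds(DOC, FAST_LINK, ratio=0.80, compress_s=4.0, decompress_s=0.17)
print(f"fast link: uncompressed {plain_fast:.2f} s, compressed {packed_fast:.2f} s")
```

Under these assumptions the modem user comes out well ahead (roughly 16 seconds uncompressed versus about 7 with compression), while on a fast link the four seconds of client-side compression cost far more than they save.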
So I set about to see what could be done with this. Long story short,
they have just about ZERO documentation or examples using ASP in either
VBScript or JavaScript, so I had no choice but to blaze my own path.
They do have two VB samples, so at least I could borrow some code and comment
out data type declarations and other stuff that wasn't VBScript compliant.
(Correction 8/17/01 - the Xceed people liked my article and have informed
me that they do have some ASP examples now.) The real trick in working
with in-memory compression components like this, however, is to understand
that the compressed data is no longer a "string" - it's a byte
array. So you'll need to get comfortable with the byte and wide-character
variations of Asc, Chr, Mid, Len and other familiar VBScript intrinsic functions
(the ones with a "B" or "W" at the end, like
"LenB", "AscW" etc.), since VBScript cannot work directly
with real binary data. Also, since you will be using XMLHTTP to transmit
binary data (which, by the way, it does very well), you'll need to process
this data differently on the server in the receiving page.
The XMLHTTP send() method takes one parameter, which is the requestBody
to use. The acceptable VARIANT input types are BSTR, SAFEARRAY of UI1
(unsigned bytes, which is what we are going to transmit here), IDispatch
to an XML Document Object Model (DOM) object, and IStream *. The component
automatically sets the Content-Length header for all but IStream * input
types. You can read the Content-Length header in the receiving page, or
you can access the information directly from the ASP Request object, as
I'll show shortly.
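The string-versus-bytes distinction is easy to see outside VBScript as well. The Python sketch below is an illustration under stated assumptions: it uses the standard zlib module in place of the Xceed component, a made-up sample document, and a hand-built header dictionary standing in for what XMLHTTP sets automatically. It shows that the compressed result is raw bytes that cannot be treated as text, and that the Content-Length is simply the byte count of the body.

```python
import zlib

# Made-up, repetitive sample document standing in for a tag-heavy XML file.
xml_text = "<orders>" + "<order><id>1</id><status>shipped</status></order>" * 200 + "</orders>"

raw = xml_text.encode("utf-8")     # text -> bytes before compressing
compressed = zlib.compress(raw)    # zlib stands in for the Xceed component here

print(type(compressed).__name__)   # the result is a bytes object, not a string
print(len(raw), len(compressed))   # both sizes are counted in bytes

# The compressed bytes are not valid text; treating them as a string fails.
try:
    compressed.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print("decodable as text:", decodable)

# For a binary request body, Content-Length is the number of bytes sent.
headers = {
    "Content-Type": "application/octet-stream",
    "Content-Length": str(len(compressed)),
}
print(headers)
```

This is the same reason the VBScript code has to measure the compressed data with LenB rather than Len, and why the receiving page reads the body as binary.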
After downloading and installing the Xceed
Streaming Compression Library, you are ready to start compression
/ decompression of XML (or any document, actually) in memory. Please
bear in mind that the code I'm going to show you has been stripped down
to the most generic usage, designed only to get the uninitiated up and
running. The rest is up to you -- there are a lot of intricacies, timing
issues and IIS-type issues you'll need to study. But my initial results
have been so promising, I wanted to distill them into this short article
as my way of "giving back" to the developer community.
We will need two pages here. First we'll
show XCeedSend.htm, the client-side page. Its code lets you
paste any document into a textarea and press a compress / send
button to have it compressed and sent over the wire via XMLHTTP. You'll then
see the original size, the compressed size, and the estimated compression
ratio for the particular case.
The second page, XCeedReceive.asp,
is the server-side listener page. This retrieves the compressed binary
data in the Request body, uses the Xceed library again to decompress it,
and streams it back uncompressed to the XMLHTTP.responseText property
for display in the original page as "Proof of Concept".
First, let's browse through the code for
the client-side page:
<HTML>
<HEAD><TITLE>XML Compression Test</TITLE>
<script language="VBScript">
' Page-level state shared by the routines below.
Dim uncompsz        ' uncompressed size, in characters
Dim compsz          ' compressed size, in bytes
Dim elapsed         ' seconds spent compressing
Dim starttime
Dim m_vaCompressed  ' compressed output (a byte array)

' Compresses sTextToCompress and returns the compressed byte array,
' or an error message string if compression fails.
Function cmdCompress(sTextToCompress)
    Dim xCompressor
    Dim lErrorNumber
    starttimer
    Set xCompressor = CreateObject("Xceed.streamingcompression.1")
    'xyz = xCompressor.License("License number is inserted here")
    With xCompressor
        uncompsz = Len(sTextToCompress)
        lblUncompressedSize.innerText = uncompsz
        On Error Resume Next
        m_vaCompressed = .Compress(sTextToCompress, True)
        lErrorNumber = Err.Number
        If lErrorNumber <> 0 Then
            cmdCompress = "Error during compress." & vbCrLf & Err.Description _
                & " (" & Hex(Err.Number) & ")"
            Exit Function
        End If
        On Error GoTo 0
        If IsEmpty(m_vaCompressed) Then
            cmdCompress = "no output"
            Exit Function
        End If
    End With
    Set xCompressor = Nothing
    compsz = LenB(m_vaCompressed)
    lblCompressedSize.innerText = compsz
    cmdCompress = m_vaCompressed
    endtime
End Function

' Button handler: compress the textarea contents, POST the bytes to the
' listener page, and display the decompressed document it sends back.
Sub DoCompress()
    Dim sData
    sData = cmdCompress(txtTextToCompress.value)
    txtTextToCompress.innerText = sData
    compsz = LenB(sData)    ' LenB counts bytes; Len counts characters
    lblCompressedSize.innerText = compsz
    If Err <> 0 Then status.innerText = Err.Description
    Dim xmlHttp
    Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
    xmlHttp.Open "POST", "http://localhost/xceedReceive.asp", False
    xmlHttp.Send sData
    ' Space saved, as a percentage of the original size
    ratio.innerText = Round((uncompsz - compsz) / uncompsz * 100, 1) & " Percent."
    rText.innerHTML = "<XMP>" & xmlHttp.ResponseText & "</XMP>"
    Set xmlHttp = Nothing
End Sub

Sub starttimer()
    starttime = Timer
End Sub

Sub endtime()
    elapsed = Timer - starttime
    divelapsed.innerText = elapsed
End Sub
</script>
</HEAD>
<BODY>
<CENTER><h3>XML COMPRESSION TEST</h3></CENTER>
<Textarea id=txtTextToCompress ROWS=20 COLS=100></textarea>
<BR><input type=button value="compress and send" onclick="DoCompress()">
<BR>
Uncompressed:<div id=lblUncompressedSize></div><BR>
Compressed:<div id=lblCompressedSize></div><BR>
Compression Ratio: <div id=ratio></div><BR>
Elapsed Time:<div id=divelapsed></div><BR>
<div id=status></div>
<CENTER>Return Document after decompression at server:</CENTER>
<HR>
<div id=rText></div>
</BODY>
</HTML>
OK, let's trace what happens here. First, we render a page with a large textarea
and a button that is wired to the "DoCompress()" method. We also
create a few DIV tags to hold the uncompressed and compressed document sizes,
the ratio, elapsed time, and any status info we want to display. We also
have a final div, "rText", to display the returned document after
it's been decompressed by the listener page on the server.
When we paste an XML document into the textarea and press the button, "DoCompress"
runs "cmdCompress" with the value of the textarea as a parameter.
cmdCompress instantiates the Xceed library, does some other housekeeping,
and then calls the Compress method on the document:
m_vaCompressed = .Compress(sTextToCompress, True)
The length in bytes of the compressed document
("sData") is obtained, the compressed data is displayed to the user back in the
same textarea (not that the browser's rendition of a byte array is going to be
of much use to look at), and then we immediately SEND it to the server:
Dim xmlHttp
Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
xmlHttp.Open "POST", "http://localhost/xceedReceive.asp", False
xmlHttp.Send sData
Now we switch gears and hop over to the server side to see what's happening:
<%
Dim sData, finaldata
' Read the raw request body as a byte array and decompress it.
sData = Request.BinaryRead(Request.TotalBytes)
finaldata = cmdDecompress(sData)
Response.Write finaldata

' Decompresses a byte array and returns the result,
' or an error message string if decompression fails.
Private Function cmdDecompress(stringToDecompress)
    Dim xCompressor
    Dim vaDecompressed
    Dim lErrorNumber
    Set xCompressor = Server.CreateObject("Xceed.streamingcompression.1")
    'xyz = xCompressor.License("License number is inserted here")
    With xCompressor
        On Error Resume Next
        vaDecompressed = .Decompress(stringToDecompress, True)
        lErrorNumber = Err.Number
        If lErrorNumber <> 0 Then
            cmdDecompress = "Error during decompress." & vbCrLf & Err.Description _
                & " (" & Hex(Err.Number) & ")"
            Exit Function
        End If
        On Error GoTo 0
        If Not IsEmpty(vaDecompressed) Then
            cmdDecompress = vaDecompressed
        End If
    End With
    Set xCompressor = Nothing
End Function
%>
Again, what happens is:
1. We get the length of the binary data
from Request.TotalBytes (we could also read the Content-Length header
instead with Request.ServerVariables("HTTP_Content_Length")).
2. We read the binary data with Request.BinaryRead.
3. We send the byte array to the cmdDecompress function, which does the
same thing with the Xceed component as on the client, except that it calls the
.Decompress method.
4. As a "proof of concept" we simply Response.Write out the
finaldata (the decompressed document, which should be identical to the
one we originally pasted into the textarea) and send it right back to
the client page, which is still sitting there loaded.
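The whole round trip can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the standard zlib module stands in for the Xceed component, plain function calls stand in for the XMLHTTP request and the ASP page, and the function names and sample document are hypothetical.

```python
import zlib

def client_compress(text):
    """Client side: compress the document and report the sizes in bytes."""
    raw = text.encode("utf-8")
    body = zlib.compress(raw)
    saved_percent = (len(raw) - len(body)) / len(raw) * 100
    return body, len(raw), len(body), saved_percent

def server_receive(body, content_length):
    """Server side: the four steps above, with zlib in place of Xceed."""
    data = body[:content_length]                   # steps 1-2: read that many bytes
    text = zlib.decompress(data).decode("utf-8")   # step 3: decompress
    return text                                    # step 4: echo the document back

# Made-up, repetitive sample document.
document = "<catalog>" + "<item><name>widget</name><price>9.99</price></item>" * 500 + "</catalog>"

body, original_size, compressed_size, saved = client_compress(document)
echoed = server_receive(body, len(body))

print(original_size, compressed_size, f"{saved:.1f}% saved")
print("round trip intact:", echoed == document)
```

The final comparison is the same proof of concept the page performs visually: what comes back from the server should match what was pasted into the textarea.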
Finally, back on the client page, we access
the XMLHTTP.responseText property to get the data that was sent back to
us, and redisplay it in the rText div at the bottom of the page, with
"<XMP>" example rendering tags around it so you can see
the literal content.
You now have a greatly simplified, but nevertheless
quite functional basis for a very efficient and powerful XML Data compression
CODEC for over-the-wire data transmission. I'd be very interested in hearing
from other developers who have made inroads in this area. Email me or
post to our XML forum here on Eggheadcafe.com with whatever you have to
share.
NOTE: Since this article was originally written,
I've created a lightweight COM wrapper component for the powerful Zlib
C Library called PABZlib. This provides, among others, a "Combined"
method called CompressSendReceiveDecompress that handles the XMLHTTP or
ServerXMLHTTP send and receive early-bound, all within the same method.
We've ramped this component all the way up to 120 requests per second
using Application Center Test. The component is available for sale or
as a trial download.
download the code that accompanies this article
Peter Bromberg is an independent consultant specializing in distributed .NET solutions,
a Senior Programmer/Analyst in Orlando, and a co-developer of the EggheadCafe.com
developer website. He can be reached at pbromberg@yahoo.com