| Recently
while architecting and implementing a set of classes to handle Form Post
to Xml conversions, I discovered that you can do a lot of very cool stuff
with
datasets
in memory, especially when you have a repetitive process such as a complicated
transform or translation that needs to operate on a large number of sets of
data in some
sort of an iterative loop. If you have a lengthy, repetitive operation that
needs to be able to check certain items in other data
stores
for certain
metadata
or rules
that tell
it how to process each particular set of data, an in-memory cached DataSet
can be more performant. I had already experimented extensively with things
like getting a DataSet for
one WebForm
page,
displaying it
in a DataGrid, caching the Dataset, and having template columns in the
grid that consisted of hyperlinks which would take you to a second edit
page that simply got the same DataSet out of cache, set a RowFilter on
the DataView, and allowed you to edit the specific record you had clicked
on without having to make a second trip to the database.
In this particular case it was an issue of automated form posts of corporate
leads that could be coming from a number of different sources all over
the
world, and
some
pretty
sophisticated rules processing needed to be implemented. Some posts might,
for example, come from Japan where data is typically encoded in UTF-7 ( 7-bit
encoding), a charset which is not at all fun to try and decode into US-ASCII
and consequently would require a specific set of rules. Or perhaps a particular
post is already
in XML format and what you are receiving is a form post with a single name/value
pair, the value being a complete string representation of an XML document that
mostly needed to be validated and mapped either through a set of rules containing
XPATH mapping queries or with an XSL Transform, or both.
In addition, each form post might internally contain some metadata that
needed to be extracted, which would dictate that in its particular case,
a special
set of
rules must be used instead of the default rule set for that type of document.
Hundreds, even thousands of these leads could come in on a daily basis and
you need a process which can efficiently perform the conversions on chunks
of say 100 or 200 records at a time, process them, do the conversion of each
one into a "master" XML format that is mapped based on sets of rules
to be consistent regardless of the type of post that was received, and then
save each mapped
record, write
some
metadata
and status information to some tables and go to sleep until the process is
woken
up and
told to get
the next chunk of records to process. Consequently, a large set of processing
rules containing Regex match expressions and / or XPath Queries needed to be
stored in a special "rules" table, and I needed to be able to access
and manipulate a specific subset of those rules "on the fly" based
on the metadata for each record in the processing queue.
Avoid Repetitive "Chatty" database calls
When you have a process like this you want to try and avoid going back
to the database every time you need to look up some metadata about how
to process
a particular record. If the process is an executable that's fired off by some
scheduler component such as SQL Agent or Task Manager, or if it's a Service
that has a timer or other controller process, you want to be able to get all
the data you need to perform your entire operation in one shot, cache it somewhere
reliable, and then you can do your processing very fast without the delays
of going back
to the
data
store
because you already have assembled all the information you need in-memory.
Using the AppDomain Cache
For an executable, you do not have the ASP.NET Cache, Session or Application
classes to work with. Instead, I use the AppDomain cache for this type of operation.
The advantage of this cache, which is little - known to most .NET developers,
is that any of your related components or classes in any satellite DLL to your
process all have access to this same Cache since they are all running under
the same App Domain.
You can store or retrieve virtually any object in AppDomain with the following
sample code:
System.AppDomain.CurrentDomain.SetData("mystuff",
myType)
myType =(myType) System.AppDomain.CurrentDomain.GetData("mystuff")
Note that when you get it back out, you may need to
do an explicit cast. VB.NET developers should consider using DirectCast,
not "CType". There is just one caveat, which can be kind of embarrassing
if you have to discover it yourself, as I did over two years ago -- data
in
the App Domain cache is stored by
reference.
So for example, if you pull an XmlDocument instance out of your "mystuff" cache
item, and
you
start filling in elements
with
data, and then you pull out another instance from the cached item (you think!)
to start filling in data for another XmlDocument, guess what? Your second
XmlDocument
is still
going to have all the data you put into the first one! What we need to do
in this situation is use the Clone method to rapidly create an exact copy
of what we got out of the cache item. In that case we will be working with
a completely cache-disconnected new instance.
And if you're concerned about speed, don't worry. This cache is humongously
faster than making a SQL call to your database to keep fetching the same or
similar data. You can use the various DataView Sort and Filter operations to
efficiently query your DataSet from the cache much faster than you could get
the specific data out of your database, especially in a load environment. We'll
look at some of these next.
Speed Counts - Meet the DataView: Your In-Memory Stored Procedure
The key to effective use of cached data in DataSets is
the DataVIew and its related classes, DataTable, DataRow, and DataRelation
which allow you to do a surprisingly sophisticated
amount of SQL query - like filtering and sorting on in-memory data.
. A DataSet is a collection of DataTables and metadata. A DataTable
is a collection of DataRows and other data. A DataRow is a collection of
fields and other data. None of these have built-in capabilities for filtering
or sorting the data they contain - that's the domain of the DataView and
DataViewManager.
Customized Data View Filters
The DataView class is simply a view of a DataTable.
At any point you can have different views of the same underlying
data, and you can manage each view as an independent object with
its own set of properties, methods, and events.The ADO.NET DataView object
can be handled as
a distinct object. DataView objects retain a reference to the underlying table
and allow for update operations. A DataView is easy to create:
DataView dv;
dv = new DataView(theDataSet.Tables["Employees"]);
The DefaultView property of a DataTable returns a DataView object
on that table. The DefaultView object is cached internally
, so any changes you make are retrieved later:
DataView dv = dt.DefaultView;
With your DataView, you can use two tools, the RowFilter, whiche lets you
specify the criteria that rows have to match in order to be visible through
the view, and the Sort, which takes an expression and sorts the rows
by that expression. You can use the Find method along with a Sort based on
the sorted column, and you can employ any combination of the two methods.
RowFilter is a read/write property that gets and sets the expression used
to filter the table.
You assign it an expression that can be formed by any valid combination of
column names, logical and numeric operators, and literals. Here are a few examples:
dv.RowFilter = "Country = 'USA'";
dv.RowFilter = "EmployeeID >5 AND Birthdate < #1/1/68#"
dv.RowFilter = "LastName LIKE 'Dard*'"
NOTE: You can also use the Select Method on a DataTable. The Select method
is very fast, even though the DataTable is not indexed, because the records
are
all
in-memory,
so there
is
no disk
I/O
overhead
as
with a database.
However, though ADO .NET does not expose indexes, it does create and use them
both for Primary Keys and for DataViews. You can take advantage of the indexes
created for Primary Keys or DataViews as follows:
If you are searching on a Primary Key field, or if you can make the field(s)
you are searching on a Primary Key, then instead of DataTable.Select(), you
can use DataTable.Rows.Find(). Find() will look up a primary key value using
the built index. Here is some code that illustrates the DataTable Select method:
[VB.NET]
Dim customerTable As DataTable
customerTable = myDataSet.Tables[0]
' Add a new ID column
customerTable.Columns.Add( "id", GetType(Integer) )
' Set PrimaryKey
customerTable.Columns("id").Unique = true
customerTable.PrimaryKey = new DataColumn() { customerTable.Columns("id")
}
customerTable.AcceptChanges()
Dim strExpr As String
Dim strSort As String
strExpr = "id > 5"
' Sort descending by CompanyName column.
strSort = "name DESC"
' Use the Select method to find all rows matching the filter.
Dim foundRows As DataRow() = _
customerTable.Select( strExpr, strSort, DataViewRowState.CurrentRows ) Your DataTable also sports a Rows.Find method:
[C#]
private void FindInPrimaryKeyColumn(DataTable myTable, long n){
// Find the number n in the primary key column of the table.
DataRow foundRow = myTable.Rows.Find(n);
// Print the value of column 1 of the found row.
if(foundRow != null)
Console.WriteLine(foundRow[1]);
}
If you are doing repeated searches on a column or set of columns,
you can create a DataView sorted on those columns, and then use DataView.FindRows() to
search for a value within the columns. DataViews create an index for the sorted
columns, and FindRows() uses this. Also, when using a DataView, it is
more efficient to use the constructor that accepts the table, filter, sort,
and rowversions together. Each time you set the filter, sort, or rowversion
properties your index is recreated, so creating a DataView and setting these
properties individually can actually create the index several times. Here is
an example:
DataView dv =
new DataView(ds.Tables["Suppliers"], "Country = 'US'", "CompanyName",
DataViewRowState.CurrentRows);
dv.AllowEdit = true;
dv.AllowNew = true;
dv.AllowDelete = true;
More on DataViews
With a filter string you use AND,
OR, NOT to tie together smaller expressions. You can also use parentheses
to group clauses and force precedence.
Any clause in a filter contains a column name that is normally compared
against a literal, a number, a date, or another column name. You use
both relational
and arithmetic operators such as >=, <, >, +, *, %.
If the rows to select don't follow a clear pattern that can be easily
rendered through arithmetic or logical operators, you can use the IN
operator, just
as you would with Transact SQL:
dv.RowFilter = "employeeID IN (12,144,54)"
Wildcard characters, * and %, are supported and are used with
the LIKE operator. Wildcards are allowed only at the beginning and/or
end of a filter
string.
Strings must be enclosed within single quotes, and dates require
pound symbols (#). Numeric values can utilize decimals and scientific
notation.
The RowFilter also supports aggregates such as SUM, COUNT, MIN, MAX,
and AVG. If a table has no rows, the aggregate functions return a
null reference.
Len() returns the length of a specified
expression, which can be a column name
or any other valid expression. Substring() returns a sub-string of a
specified length, starting at a specified point in the given expression. IIF()
resolves to one of two values depending on the result of the given logical
expression, and works just like the Visual Basic
IIF(expression, value_if_true, value_if_false)
Using combinations of the above tools you can perform very sophisticated queries
on your in-memory data without ever having to make a call over the wire to
your database.
Sort your way to the Data
A DataView exposes a Sort property that you can use to sort the content of
the view. Sort takes a comma-separated list of column expressions and sorts
the view by that list. By adding a trailing ASC or DESC qualifier to any of
the column expressions, you can sort that field in the specified ascending
or descending order. If no direction qualifier is specified, then the sorting
algorithm defaults
to ASC. What's also useful with a Sort is that you have a Find method in the
DataView:
private void FindValueInDataView (DataTable tbl)
{
DataView dv = New DataView (tbl);
dv.Sort = "LastName";
// Find the customer named "Smith" in the LastName column
int i = dv.Find ("Smith");
Response.Write (dv[i]);
}
DataView also exposes the Rowstate Filter, which Filters a DataTable based
on the State of the rows:.
| CurrentRows |
Current
rows including unchanged, new, and modified rows. |
| Deleted |
All
rows that have been deleted since last AcceptChanges call. |
| ModifiedCurrent |
All
rows that have been modified since last AcceptChanges call. |
| ModifiedOriginal |
The original version of rows that have been modified
since last AcceptChanges cal. |
| New |
All new rows that have been added since last AcceptChanges
call. |
| OriginalRows |
Returns the original set of rows. This includes
unchanged and deleted rows. |
| Unchanged |
All rows not affected by changes. |
|
|
Any Changes you make become effective
after you call AcceptChanges on the DataTable. Changes
on a single row can be made effective by calling the DataRow's AcceptChanges.
Changes can be canceled by calling RejectChanges either on the
DataTable class (for all the rows) or on a single DataRow object.
The DataViewManager is the container for
all view settings for each table in a DataSet. The key property of a DataViewManager
is DataViewSettings, which is a collection of DataViewSetting objects:
Dim dvs As DataViewSetting
dvs = dvm.DataViewSettings("Customers")
dvs.Sort = "Company"
DataRelations and Constraints enforce
referential integrity "on the fly"
Finally, you can enforce Primary Key and other
constraints between the tables in your DataSets dynamically once your
data has been assembled into the in-memory DataSet. For example:
UniqueConstraint pkey;
pkey=new UniqueConstraint("PK_Emplist", ds.Tables["empList"].Columns["EmpID"],true);
ds.Tables["empList"].Constraints.Add(pkey);
With code similar to the above, plus some Rule
properties adjustments, we can enforce cascading deletes, inserts and
updates just as we would in Sql Server. Being able to define a ForeignKeyConstraint
on an in-memory DataSet can greatly amplify our programming power.
DataRelation objects are the way we can bind the columns of tables
together in order to make the business logic work the same way as it
does in the database they came from.
Cache as Cache Can
If you are processing large amounts of cached data in an iterative
manner, these tools can speed up your processing potentially a hundredfold
and keep network traffic to your tired SQL Server to a mimimum. You can
iterate through an in-memory table of data row-by-row, use a value or
values in a row to set a filter or sort query on another DataTable in
memory, then use the data in that query row to query data out of a third
table that contains, for example, rules on how to process the data in
the first table's row. You can then update fields in the row in the first
table, and continue processing through all the rows, all the while never
needing once to hit your database. Even if you are running a Console Application
or Winforms app, you can retrieve the data for all these tables into
a DataSet that you store in the AppDomain Cache one time, and never need
to return to the database for this data so long as the app is running.
When you are finished, you can even use a DataAdapter and its properly
configured Update command to send all the updates back to your SQL Server
in a single call. By using this technique, I was able to do some heavy
duty in-memory processing one a set of 233 records in about 9 seconds,
an operation which would have easily taken 15 to 30 seconds or more,and
used up considerable network traffic bloat had I chosen to repetitively
query the database for my data each time I processed
a new row. Proper use of caching mechanisms like these in .NET can result
in up to a 1000% improvement in processing efficiency over legacy COM
- based systems that don't employ caching.
| Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web - based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform. Pete's samples at GotDotNet.com have been downloaded over 41,000 times. You can read Peter's UnBlog Here. --><-- NOTE: Post QUESTIONS on FORUMS! |  | Do you have a question or comment about this article? Have a programming problem you need to solve? Post it at eggheadcafe.com forums and receive immediate email notification of responses.
|