|
If you have a discussion forum or other
database - enabled service on your site that accumulates significant amounts
of textual data, and you want to provide users with a powerful way to
search this content based on words or sentences, then the SQL Server Fulltext
query engine could be just the thing for you.
Now most developers are familiar with
Indexing Service (formerly "Index Server") and how it is used
to build a "catalog" of the files on a webserver or file server.
You may even be familiar with the ixsso.util and ixsso.Query
index query object s and how they can be used to build an ASP
- based search page.
It's surprising to find how many developers
are unaware of how Indexing Service can also index SQL Server data and
provide these search and ranking functions in the same way - through the
SQL Server fulltext index facility. In this short article, I'll review
how to create and populate a full-text catalog on a SQL Server table,
and illustrate some of the syntax from the Transact SQL extensions that
allow you to perform powerful full-text search and ranking functions on
your data.
SQL Server keeps any fulltext indexes
you create in external "catalogs" - the same type of catalog
that Indexing Service uses for your webserver or fileserver filesystem.
SQL just stores them in a different place. The regular Indexing Service
catalogs you are probably familiar with are maintained through the Indexing
Services snap-in:

The catalogs for SQL Server Full-Text indexes,
however, are managed through the SQL Server Enterprise Manager.
Each Full-Text catalog in Indexing Service
stores the index data for one or more SQL Server tables of a single database.
Unlike as with regular table indexes, SQL Server does not update these
indexes immediately as data is changed, added or deleted. Rather, they
are populated and updated according to a schedule that you can conveniently
set from within the SQL Server Enterprise manager, or even programmatically
with SQL DMO.
Now let's go ahead and set up a catalog
on a fictitious "Hotlinks" service table we have on our website.
This is a service where "hotlinks" of URLS that are in the news
or are of interest to our particular clientele are added on a daily basis
by category and which have a potentially large nText type "Description"
column in the table. We want our users to be able to search not only by
category or keyword but also in the fulltext of this description, including
the ability to have exclusionary searches, e.g., "Everything with
"maragarita" near the word "party" but NOT "hangover".
Now I don't know about you, but personally
my experience tells me the above search is not possible. However, if it
were, the stuff we'll do next would enable us to easily handle it!
To create a fulltext index on a SQL Server
table, expand a table subtree in Enterprise Manager, right-click the Full-Text
Index Table context menu item, and choose "Define Full-Text Indexing
on a Table":

Note that you must have at least one unique
index field in the table, preferably one with a small numeric integer
type such as the identity field. If you don't have such a field on the
table, now is a good time to add one. The Wizard will allow you to choose
the unique index field you want to use. Next, you'll be asked which table
column you want to do the full-text indexing on. Next, the Wizard will
suggest an existing Indexing Service catalog to store your new index in,
and gives you the option to create a new catalog :

As you can see above, here we have elected
to start a new catalog "Hotlinks" for our table, as we expect
to accumulate a large amount of data. Finally, the Wizard asks how you
want to update the catalog:

Here we have selected to update the full
catalog using an incremental population (faster) once every day at midnight,
since we have a lot of new data added each day. This completes the wizard,
and your catalog and / or population will be created. Finally (and don't
forget this step!) populate the index from the "Start full
Population" menu choice on the "Hotlinks" table context
menu under "Full Text Index Table" ,and we are now ready to
actually use our new full-text index.
Now that we're in the full-text search
business, let's take a look at some sample SQL code to perform some queries
on our full-text search index data, using the Transact SQL extensions
that are designed especially for this purpose:
Using
the CONTAINS and FREETEXT Functions
CONTAINS:
Is a predicate used to search Full-Text indexed columns containing character-based
data types for precise or fuzzy (less precise) matches to single words
and phrases, the proximity of words within a certain distance of one another,
or weighted matches. CONTAINS can search for:
- A word or phrase.
- The prefix of a word or phrase.
- A word near another word.
- A word inflectionally generated from another (for
example, the word "drive" is the inflectional stem of drives,
drove, driving, and driven).
- A word that has a higher designated weighting than
another word.
- Syntax
An exmple of using "CONTAINS"
SELECT
Link_Description, Link_URL
FROM HOTLINKS
WHERE CONTAINS(Link_Description, ' "ASP" OR "XML"
')
The
above SQL might be constructed dynamically from form-field choices that
a user made in searching out hotlinks archive for a hotlink whose link_description
column (that's our full-text indexed column) contained either the word
"ASP" or the word "XML".
CONTAINS
can also be used with prefix terms (webserv*), proximity terms ("ASP"
NEAR "DHTML"), generation (word form) terms, and weighted terms
[Link_Description, ISABOUT("ASP" weight (.8), DHTML weight(.4)]
FREETEXT:
Is a predicate used to search columns containing character-based data
types for values that match the meaning and not the exact wording of the
words in the search condition. When FREETEXT is used, the full-text query
engine internally "word-breaks" the freetext_string into a number
of search terms and assigns each term a weight and then finds the matches.
An example of using "FREETEXT" in a query
might be:
SELECT Link_URL,
Link_Description, Link_date FROM HOTLINKS WHERE FREETEXT(Link_Description,
@SearchWord)
The CONTAINSTABLE and FREETEXTTABLE
functions are used to specify full-text queries that return relevance
rankings for each row. These functions are very similar but used differently
from the full-text predicates, CONTAINS and FREETEXT.
Although both the full-text predicates described above
and the full-text rowset-valued functions are used for full-text queries,
and the Transact-SQL statement used to specify the full-text search condition
is the same in both the predicates and the functions, there are major
differences in the way that these are used:
CONTAINS and FREETEXT both return a TRUE or FALSE value,
so they are typically specified in the WHERE clause of a SELECT statement,
as in the examples shown here.
CONTAINSTABLE and FREETEXTTABLE both return a table of zero, one,
or more rows, so they must always be specified in the FROM clause.
CONTAINS and FREETEXT can only be used to specify selection
criteria, which Microsoft® SQL Server uses to determine the
membership of the result set.
CONTAINSTABLE and FREETEXTTABLE are also used to specify selection criteria.
The table returned has a column named KEY that contains full-text key
values. Each full-text registered table has a column whose values are
guaranteed to be unique. The values returned in the KEY column of CONTAINSTABLE
or FREETEXTTABLE are the unique values, from the full-text registered
table, of the rows that match the selection criteria specified in the
full-text search condition.
The table produced by CONTAINSTABLE and FREETEXTTABLE
also has a column named RANK, which contains values from 0 through 1000.
These values are used to rank the rows returned according to how well
they met the selection criteria. You have probably seen these ranks shown
in search pages on the web.
Queries that use the CONTAINSTABLE and FREETEXTTABLE
functions are more complex than those that use the CONTAINS and FREETEXT
predicates because qualifying rows returned by the functions must be explicitly
joined with the rows in the original SQL Server table.
This example returns the description and category name
of all Hotlinks for which the Link_Description column contains the phrase
"ASP HOSTING" near either the word "IIS" or the word
"NT" All rows with a Link_TOPIC name "XML" are disregarded.
Only rows with a rank value of 2 or higher are returned.
SELECT
FT_TBL.Description,
FT_TBL.CategoryName,
KEY_TBL.RANK
FROM HOTLNKS AS FT_TBL INNER JOIN
CONTAINSTABLE (HOTLNKS, Link_Description,
'("ASP HOSTING" NEAR IIS) OR
("ASP HOSTING" NEAR NT)'
) AS KEY_TBL
ON FT_TBL.Hotlink_ID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
AND FT_TBL.Link_Topic<> 'XML'
ORDER BY KEY_TBL.RANK DESC
Note the use of square brackets
around the KEY column name in the JOIN statment above. This is because
KEY is a SQL Server keyword, and we must enclose these in brackets if
we use them for a column or table name. Also note that phrases ("ASP
HOSTING") are enclosed in double quotes.
There are a number of other features that can be used
by CONTAINSTABLE to make your searches more powerful. An hour or so spent
studying SQL Server Books Online will be well worth the effort.
You can avoid the complexity of using CONTAINSTABLE and
FREETEXTTABLE by writing stored procedures that accept a few facts about
the query and then create and execute the appropriate query. A simplified
procedure that submits a FREETEXTTABLE query follows. The table shows
the procedure parameters (all input).
|
procedure parameters
|
| Parameter |
Required |
Description |
| @additional_predicates |
Optional |
If there are any, these get added with AND after
the FREETEXT predicate. KEY_TBL.RANK can be used within expressions. |
| @freetext_column |
Yes |
|
| @freetext_search |
Yes |
Search condition. |
| @from_table |
Yes |
|
| @order_by_list |
Optional |
KEY_TBL.RANK can be one of the columns specified.
|
| @select_list |
Yes |
KEY_TBL.RANK can be one of the columns specified. |
The code for the procedure is:
CREATE
PROCEDURE freetext_rank_proc
@select_list nvarchar(1000),
@from_table nvarchar(517),
@freetext_column sysname,
@freetext_search nvarchar(1000),
@additional_predicates nvarchar(500) = '',
@order_by_list nvarchar(500) = ''
AS
BEGIN
DECLARE @table_id integer,
@unique_key_col_name sysname,
@add_pred_var nvarchar(510),
@order_by_var nvarchar(510)
--
Get the name of the unique key column for this table.
SET @table_id = Object_Id(@from_table)
SET @unique_key_col_name =
Col_Name( @table_id,
ObjectProperty(@table_id, 'TableFullTextKeyColumn') )
--
If there is an additional_predicate, put AND() around it.
IF @additional_predicates <> ''
SET @add_pred_var = 'AND (' + @additional_predicates + ')'
ELSE
SET @add_pred_var = ''
--
Insert ORDER BY, if needed.
IF @order_by_list <> ''
SET @order_by_var = 'ORDER BY ' + @order_by_list
ELSE
SET @order_by_var = ''
--
Execute the SELECT statement.
EXECUTE ( 'SELECT '
+ @select_list
+ ' FROM '
+ @from_table
+ ' AS FT_TBL, FreetextTable('
+ @from_table
+ ','
+ @freetext_column
+ ','''
+ @freetext_search
+ ''') AS KEY_TBL '
+ 'WHERE FT_TBL.'
+ @unique_key_col_name
+ ' = KEY_TBL.[KEY] '
+ @add_pred_var
+ ' '
+ @order_by_var
)
END
This procedure can be used to submit the query:
EXECUTE
freetext_rank_proc
'Description, KEY_TBL.RANK', -- Select list
'Categories', -- From
'Description', -- Column
'How can I get free ASP web hosting?', -- Freetext search
'KEY_TBL.RANK >= 10', -- Additional predicate
'KEY_TBL.RANK DESC' -- Order by
There is much more that you can do with Full-Text; space
simply does not permit a more complete treatment here. SQL Server Books
Online has much more information. There are also several good books
that deal with the subject, most notably "Professional
SQL Server 2000 Programming" by Robert Vieira (WROX).
One last consideration: Full-Text runs as a separate
service on your SQL Server box. It takes resources and memory. When you
implement full-text, you should consider the load that the population
process will place on your machine, and try to balance that against how
quickly you will need changes to be reflected in your search results.
At the least, you should try to schedule population for non-peak hours,
and also it would be wise to increase the size of your virtual memory
pagefile to as large as three times your RAM.
With this ammunition you have just about everything you
need to start building a robust full-text search facility that scales
to match the needs of your database structure, your business, and your
customers.
Peter Bromberg is an independent consultant specializing in distributed .NET solutionsa Senior Programmer / Analyst at
in Orlando and a co-developer of the EggheadCafe.com
developer website. He can be reached at pbromberg@yahoo.com
|