HTML::Index Home Page

Last Modified Thursday, 05-Sep-2002 13:05:52 UTC



HTML::Index is a set of modules for creating an index of HTML documents so that they can subsequently be searched by keywords or by Boolean combinations of keywords. It was originally inspired by a script in the O'Reilly book "CGI Programming with Perl, 2nd Edition".
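As a rough illustration of the indexing idea, the sketch below extracts the visible text of each HTML document, splits it into lower-cased words, and builds an inverted index that maps each word to the set of documents containing it. A keyword search is then a set intersection. It is written in Python rather than Perl and is not HTML::Index's own API. The names `tokenize`, `build_index`, and `search`, and the simple alphanumeric word-splitting rule, are assumptions made for the example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical sketch of an inverted index over HTML pages; not the HTML::Index API.


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def tokenize(html):
    """Lower-cased alphanumeric words from the text of an HTML page (assumed rule)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def build_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for word in tokenize(html):
            index[word].add(doc_id)
    return index


def search(index, *keywords):
    """Ids of documents containing every keyword (implicit AND)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()
```

For example, indexing `{"a.html": "<h1>Perl</h1> CGI scripts", "b.html": "<p>Perl <b>modules</b></p>"}` and searching for `perl` and `cgi` returns only `a.html`.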

All storage operations are contained in the HTML::Index::Store module, which can be subclassed to support other storage options (such as BerkeleyDB files or SQL databases). One such subclass, HTML::Index::Store::BerkeleyDB, is included in the distribution.
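The storage-abstraction design can be sketched in Python as an abstract `Store` class with one in-memory subclass. The indexer talks only to the `get`/`put` interface, so a file- or database-backed subclass could be swapped in without changing it. The class and method names are hypothetical and do not mirror HTML::Index::Store's actual methods. Posting lists are zlib-compressed here only to echo the Compress::Zlib option the modules offer.

```python
import zlib
from abc import ABC, abstractmethod
from typing import Dict, Optional, Set

# Hypothetical sketch of a pluggable storage layer; not HTML::Index::Store's API.


class Store(ABC):
    """Storage interface: the indexer calls nothing but these two methods."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the raw value stored under key, or None if absent."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, replacing any previous value."""


class MemoryStore(Store):
    """Simplest backend: a dict. A DBM or SQL backend would subclass Store the same way."""

    def __init__(self):
        self._data: Dict[str, bytes] = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class Indexer:
    """Keeps word -> posting list in any Store, zlib-compressing each list.

    Assumes document ids never contain a newline (used as the separator).
    """

    def __init__(self, store: Store):
        self.store = store

    def _load(self, word: str) -> Set[str]:
        raw = self.store.get(word)
        if raw is None:
            return set()
        text = zlib.decompress(raw).decode("utf-8")
        return set(text.split("\n")) if text else set()

    def add(self, word: str, doc_id: str) -> None:
        postings = self._load(word)
        postings.add(doc_id)
        data = "\n".join(sorted(postings)).encode("utf-8")
        self.store.put(word, zlib.compress(data))

    def lookup(self, word: str) -> Set[str]:
        return self._load(word)
```

Each backend implements only two primitive operations, which keeps the indexing logic in one place and makes a new storage option a small subclass.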

The modules can be used to index any HTML documents, whether stored as files or in a database. They support stopword lists, soundex searches, compression of the inverted indexes using Compress::Zlib, and re-indexing of documents that have changed. A CGI search interface, which can be customized using HTML::Template templates, is also provided. Search queries can be expressed as compound Boolean expressions composed of keywords, parentheses, and the logical operators OR, AND, and NOT.
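Queries like these can be evaluated with a small recursive-descent parser over an inverted index. The Python sketch below assumes the following rules, which HTML::Index's actual query syntax and precedence may not share:

- `NOT` is a unary prefix operator that binds tighter than `AND`.
- `AND` binds tighter than `OR`.
- Operators must be upper-case.
- `AND` must be written explicitly.
- Characters other than letters, digits, and parentheses are ignored.

`evaluate`, `QueryError`, and the `all_docs` parameter (needed to compute `NOT` as a set complement) are illustrative names.

```python
import re
from typing import Dict, List, Set

# Hypothetical Boolean query evaluator; the grammar below is an assumption.
TOKEN_RE = re.compile(r"\(|\)|[A-Za-z0-9]+")


class QueryError(ValueError):
    """Raised for malformed queries."""


def evaluate(query: str, index: Dict[str, Set[str]], all_docs: Set[str]) -> Set[str]:
    """Evaluate a Boolean query against an inverted index (word -> doc ids).

    Assumed grammar (precedence NOT > AND > OR):
        expr   := term ("OR" term)*
        term   := factor ("AND" factor)*
        factor := "NOT" factor | "(" expr ")" | WORD
    """
    tokens: List[str] = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise QueryError("unexpected end of query")
        pos += 1
        return tok

    def expr():
        result = term()
        while peek() == "OR":
            take()
            result = result | term()  # union
        return result

    def term():
        result = factor()
        while peek() == "AND":
            take()
            result = result & factor()  # intersection
        return result

    def factor():
        tok = take()
        if tok == "NOT":
            return all_docs - factor()  # complement relative to all documents
        if tok == "(":
            result = expr()
            if take() != ")":
                raise QueryError("expected ')'")
            return result
        if tok in ("AND", "OR", ")"):
            raise QueryError(f"unexpected {tok!r}")
        return set(index.get(tok.lower(), set()))  # keyword lookup

    result = expr()
    if peek() is not None:
        raise QueryError(f"unexpected {peek()!r}")
    return result
```

With `index = {"perl": {"a", "b"}, "cgi": {"a"}, "zlib": {"b", "c"}}` and `all_docs = {"a", "b", "c"}`, the query `(cgi OR zlib) AND NOT perl` evaluates to `{"c"}`.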


SourceForge Stuff

Last Modified Tuesday, 14-Jan-2003 07:00:05 UTC

Tracker

 - Bugs (0 open / 0 total): Bug Tracking System
 - Support Requests (0 open / 0 total): Tech Support Tracking System
 - Patches (0 open / 0 total): Patch Tracking System
 - Feature Requests (0 open / 0 total): Feature Request Tracking System

 - Forums (2 messages in 2 forums)
 - Docs: Doc Manager
 - Mailing Lists (1 mailing list)
 - Tasks: Task Manager
 - There are no public subprojects available
 - CVS Tree (1 commit, 1 add)
 - FTP: Released Files