The Inktomi Search Toolkit has been announced as an innovative OEM solution that "delivers the advanced XML-based retrieval capabilities for finding structured, unstructured, and semi-structured content within enterprise applications to improve application usability and increase end-user productivity. By indexing documents in native XML format and preserving the hierarchy of the data, the Search Toolkit allows you to return the reference to the documents, the actual XML documents or any fragments of the documents." The toolkit "has been built from the ground up to utilize XML as the content mark up language to provide a standards-based query language (W3C XQuery) for retrieval of structured information. In addition, it provides a comprehensive suite of keyword search capabilities. It is available as a multi-threaded server product. For easy integration with the parent application, a Java API is provided for the product, as well as an open, socket-based interface using an XML-based and HTTP-based protocol. The internals of the Search Toolkit were designed to support retrieval across both unstructured content, as well as structured content marked up with XML."
Inktomi Search Toolkit features (from the Overview document):
Superior retrieval and storage for XML: "Built for native XML support from the ground up, the Inktomi Search Toolkit provides leading edge XML storage and retrieval capabilities. By indexing documents in native XML format, and preserving the hierarchy of the data, the Search Toolkit allows you to return the reference to the documents, the actual XML documents or any fragments of the documents. This eliminates the need for a separate database to store your XML content, simplifying your software design and administration, and reducing total system cost."
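The three result granularities named in the quote (a reference to the document, the whole XML document, or a fragment of it) can be illustrated with a minimal sketch. This is not the Search Toolkit's API, which the sources quoted here do not show. It uses only Python's standard `xml.etree.ElementTree` module and its limited XPath subset; the `DOCS` repository, the `search` function, and the sample manuals are hypothetical names invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "repository": document reference -> XML source.
# A real product would keep a persistent hierarchical index instead.
DOCS = {
    "manual-1.xml": (
        "<manual><title>Pump Guide</title>"
        "<section id='s1'><heading>Safety</heading><p>Wear gloves.</p></section>"
        "<section id='s2'><heading>Setup</heading><p>Attach the hose.</p></section>"
        "</manual>"
    ),
    "manual-2.xml": (
        "<manual><title>Valve Guide</title>"
        "<section id='s1'><heading>Setup</heading><p>Close the valve.</p></section>"
        "</manual>"
    ),
}


def search(path, mode="reference"):
    """Run an ElementTree path query over every stored document.

    mode="reference" returns the references of matching documents,
    mode="document"  returns the full XML source of matching documents,
    mode="fragment"  returns only the matching elements, serialized.
    """
    if mode not in ("reference", "document", "fragment"):
        raise ValueError("unknown mode: " + mode)
    results = []
    for ref, source in DOCS.items():
        hits = ET.fromstring(source).findall(path)
        if not hits:
            continue
        if mode == "reference":
            results.append(ref)
        elif mode == "document":
            results.append(source)
        else:
            # Because the element hierarchy is preserved, a matching
            # subtree can be returned without the rest of the document.
            results.extend(ET.tostring(hit, encoding="unicode") for hit in hits)
    return results
```

For example, `search(".//section[heading='Setup']")` lists both manuals, while the same path with `mode="fragment"` returns only the two "Setup" sections rather than the complete documents.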
Search and Query Features:
- Keyword and natural language search
- Boolean operators
- Phrase search
- Fielded search
- Parametric search
- Word stemming, word breaking
- Metadata search
- Proximity search
- Wildcard search
- XPath and XQuery, powerful standards-based XML query language
- Communication and data API: Java, XML socket-based, or HTTP
Indexing Features:
- Full-text index of unstructured content
- Integrated content and metadata
- Field weights adjustable by schema without re-indexing documents
- Full hierarchical XML index
- XML repository
- 100% schema and DTD independent; no need to pre-define DTDs
- Real-time index updates
- Content types supported include XML, HTML, over 225 document formats -- Microsoft Office, PDF, etc.
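One way field weights can be changed without re-indexing is to store per-field term counts in the index and apply the weights only when a query is scored. The sketch below illustrates that general idea. It is an assumption made for illustration, not a description of how the Search Toolkit implements the feature, and `build_index`, `score`, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Build index[term][doc_id][field] = occurrence count.

    Weights are deliberately not stored here, so changing them later
    never requires touching the index.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[term][doc_id][field] += 1
    return index


def score(index, query, weights):
    """Rank documents for a query, applying field weights at query time."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, per_field in index.get(term, {}).items():
            for field, count in per_field.items():
                # Fields without an explicit weight count as 1.0.
                scores[doc_id] += weights.get(field, 1.0) * count
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample content with two fields per document.
SAMPLE = {
    "a": {"title": "xml search", "body": "keyword search for documents"},
    "b": {"title": "keyword tools", "body": "xml xml xml storage"},
}
INDEX = build_index(SAMPLE)
```

Scoring the query `xml` against the same `INDEX` with `{"title": 5.0, "body": 1.0}` ranks document `a` first, while equal weights rank `b` first; the index is built once and only the weight table changes.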
Native XML with unstructured search: "Native XML complements unstructured search. The Inktomi Search Toolkit is designed to help companies meet current customer demands for unstructured search ('keyword search') capabilities and move with them into the next wave of technology using XML. For example, technical documentation is typically prepared to a specific format. Compliance documents are another example of documents with a fairly rigid structure. Native XML support really gives your application an advantage when you take into account the types of structured information that are associated with most business documents..." [FAQ document]
Principal references:
- Announcement 2002-05-14: "Inktomi Unveils Next-Generation Information Retrieval Technology to Provide Advanced Search Functionality within Enterprise Applications. New XML-Based, OEM Offering Combines Keyword and Parametric Search Capabilities to Improve Application Usability and Increase End-User Productivity."
- Inktomi Search Toolkit website
- Toolkit Feature Description
- "Search Toolkit: The Complete OEM Retrieval Solution for Enterprise Applications." Datasheet.
- FAQ document
- W3C XML Query
- "XML and Query Languages" - Main reference page.