The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: September 22, 1999
EDGARspace Portal

On the EDGARspace portal: [excerpts] "Invisible Worlds is developing a new protocol and class of distributed Internet servers, aimed at making information about information, or meta-information, easy to use and share. A basic form of meta-information is structure identifying the sections of a document or collection. In EDGAR, this includes the various kinds of filings as well as specific items such as balance sheets and beneficial ownership data. Looking further into the future, meta-information can also include data such as reviewers' opinions, statistics, comments, ratings, and all sorts of highly specific information about information. . .

[the Blocks Architecture:] At the core of this system is a new protocol, the Blocks Protocol. The SpaceServer speaks this protocol to communicate with other SpaceServers and to communicate with two other kinds of software components we have developed, Builders and Mixers.

Mixers are tools that find meta-information from a variety of sources on the Internet. They use the Blocks protocol to send the data into SpaceServer engines. In a broad sense, Mixers are text- or data-mining tools.

The SpaceServer engine is a general-purpose server that receives, stores and shares meta-information. Our primary design considerations have been speed and global scalability. It includes a full-text search engine and incorporates other data storage systems such as databases and text search engines.

Builders use the Blocks protocol to retrieve meta-information from SpaceServer engines and to prepare it for display by a Web browser or other tool you might be using to search, view and analyze information. . .

When users send queries to the SpaceServer engine, Builders follow three steps to bring the results back to the desktop:

(1) A set of retrieve operations specify which data are to be taken out of the data store. The retrieve operation is able to pull data out using any XML tag or attribute as a search criterion.

(2) The results from the retrieve operation are fed into the evaluate step, where a TCL script (or a script in another language) looks for relationships among the data.

(3) Last is the publish step, where the data are formatted for the user interface, which might be an HTML browser, a JavaScript or Java array that feeds an application, or spreadsheet.

Just as Builders use a 3-step process, so do Mixers. Mixers skulk a variety of sources on the Internet, from real-time feeds to web sites to unstructured deep wells like the EDGAR database. As part of the skulking process, nuggets of meta-data are extracted and transformed into valid XML, then stored into one or many different SpaceServer engines. . .

We believe this architecture of Mixers, SpaceServer engines, and Builders is suited to a wide variety of applications on the Internet. But we'd rather show real results than just tell a good story. EDGAR is an ideal database in many respects. It is very big and increasing by some 30 gigabytes per year. We think our software screams, but there is no better proof than trying it out on several hundred gigabytes of data and thousands of users."
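The Mixer process described in the excerpt (extract nuggets of meta-data, transform them into XML, store them in one or more engines) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample text, the `filing` element name, the helper names, and the use of plain Python lists as stand-ins for SpaceServer engines. It does not use the Blocks protocol, and it produces well-formed XML only; validating against a DTD is left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical unstructured input; this is not real EDGAR data.
RAW_FILING = """\
COMPANY: Example Widgets Inc
FORM TYPE: 10-K
FILED: 1999-03-31
Total assets were 1200 at year end.
"""


def extract_nuggets(text):
    """Step 1: pull 'KEY: value' header lines out of free text."""
    nuggets = {}
    for match in re.finditer(r"^([A-Z ]+):\s*(.+)$", text, flags=re.MULTILINE):
        # Turn 'FORM TYPE' into an XML-friendly name such as 'form-type'.
        key = match.group(1).strip().lower().replace(" ", "-")
        nuggets[key] = match.group(2).strip()
    return nuggets


def to_xml(nuggets):
    """Step 2: transform the nuggets into a well-formed XML element."""
    root = ET.Element("filing")  # hypothetical element name
    for key, value in sorted(nuggets.items()):
        child = ET.SubElement(root, key)
        child.text = value
    return root


def store(element, engines):
    """Step 3: hand the serialized XML to every engine (plain lists here)."""
    payload = ET.tostring(element, encoding="unicode")
    for engine in engines:
        engine.append(payload)
    return payload


if __name__ == "__main__":
    engines = [[], []]  # two stand-in data stores
    print(store(to_xml(extract_nuggets(RAW_FILING)), engines))
```

The free-text line about total assets is deliberately ignored by the extractor; a real text-mining tool would need far richer rules than a single regular expression.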

[Rocket Science - "Mappa.Mundi's first installment of Rocket Science explores how Invisible Worlds' EDGARspace takes a step above the SEC's EDGAR database."] space.cgi Interface: "Space.cgi is a web proxy which makes the services of the SpaceServer engine available to traditional web browsers. The EDGARspace portal is built using this interface. Underneath the space.cgi interface is a rich architecture of protocols, servers, and other modules."

"The core concept to understand is the retrieve, evaluate, publish paradigm. All calls to space.cgi exercise this paradigm: data are retrieved from the SpaceServer. The metadata are fed into an evaluate script (or a series of scripts) to look for relationships among the data, then the results of the evaluate stage are fed into the publish stage for formatting."
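As a rough illustration of the retrieve, evaluate, publish paradigm, the following Python sketch chains three small functions over an in-memory XML document. It is a sketch under stated assumptions, not the space.cgi interface: the `STORE` data, the element and attribute names, and the function signatures are all hypothetical, and the evaluate step is written in Python rather than the TCL mentioned in the excerpts.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-in for the data store; all names and values are made up.
STORE = ET.fromstring("""
<store>
  <filing type="10-K" company="Alpha Corp"><assets>500</assets></filing>
  <filing type="10-Q" company="Beta Inc"><assets>200</assets></filing>
  <filing type="10-K" company="Gamma LLC"><assets>900</assets></filing>
</store>
""")


def retrieve(store, tag, **attrs):
    """Retrieve: select elements by tag name and exact attribute values."""
    return [el for el in store.iter(tag)
            if all(el.get(name) == value for name, value in attrs.items())]


def evaluate(filings):
    """Evaluate: relate the records to each other, here by ranking on assets."""
    return sorted(filings, key=lambda el: int(el.findtext("assets")), reverse=True)


def publish(filings):
    """Publish: format the evaluated records as an HTML list."""
    items = "".join(
        f"<li>{escape(el.get('company'))}: {escape(el.findtext('assets'))}</li>"
        for el in filings
    )
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    page = publish(evaluate(retrieve(STORE, "filing", type="10-K")))
    print(page)
```

Keeping the three stages as separate functions mirrors the description above: the publish stage could be swapped for one that emits a JavaScript array or spreadsheet rows without touching retrieval or evaluation.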

"The SpaceServer and the underlying SpaceEngine (the data store) use XML as a way of structuring data. While our interfaces have selected several elements to query on, you can specify any of the elements and attributes in the Document Type Definition (DTD) that was used to check data into the SpaceServer. Currently, our SpaceServer is aware of two kinds of data: (1) Internet RFCs (doc.rfc) (2) SEC EDGAR Documents (doc.edgar)"
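To make the idea of querying on any element or attribute concrete, here is a small Python sketch that filters a mixed collection of documents by kind, element name, and attribute. It assumes, purely for illustration, that `doc.rfc` and `doc.edgar` appear as root element names; the child elements, attributes, sample values, and the `find` helper are hypothetical and are not taken from the actual DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; structure and values are placeholders, not DTD-derived.
DOCS = [
    ET.fromstring('<doc.rfc number="0000"><title>Example Title</title></doc.rfc>'),
    ET.fromstring('<doc.edgar form="10-K"><company>Example Corp</company></doc.edgar>'),
]


def find(docs, kind=None, element=None, attribute=None, value=None):
    """Return documents matching an optional kind, element name, and attribute.

    kind      -- root tag the document must have (None matches any kind)
    element   -- tag that must occur somewhere in the document (None means any)
    attribute -- attribute that a matching element must carry
    value     -- required attribute value (None accepts any value)
    """
    hits = []
    for doc in docs:
        if kind is not None and doc.tag != kind:
            continue
        nodes = list(doc.iter(element))  # iter(None) walks every element
        if attribute is not None:
            nodes = [n for n in nodes
                     if n.get(attribute) is not None
                     and (value is None or n.get(attribute) == value)]
        if nodes:
            hits.append(doc)
    return hits


if __name__ == "__main__":
    for doc in find(DOCS, attribute="form", value="10-K"):
        print(doc.tag)
```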

References:

  • Mappa.Mundi Home Page

  • XML DTDs - The DTDs define the structure of data stored in the SpaceServer. By looking at these DTDs, you can see what types of queries are possible.

  • The EDGAR.Space DTD [local archive copy]

  • The RFC.Space DTD [local archive copy]

  • [September 16, 1999] "Internet Pioneers Build a Better EDGAR Using XML. New Web Service Delivers Dramatically Improved Searches of SEC Filings With the First XML-based Financial Information System." - "Invisible Worlds, a San Francisco-based startup company headed by a team of Internet veterans responsible for many of the most significant innovations and standards behind the Internet, today unveiled the EDGARspace portal, a new Web service that delivers dramatically refined searches of the U.S. Securities and Exchange Commission's (SEC) EDGAR filings. 'The EDGARspace service is the first glimpse into the way XML-based Internet information systems will work in the future,' said Invisible Worlds' CEO and Chairman Carl Malamud, who first put EDGAR on the Internet five years ago. 'For the first time, you can reach inside EDGAR filings for gems of knowledge that were previously buried in text and also rise above the immense collection of documents to make sense of broad searches.' The EDGARspace portal, one of the largest XML-based (Extensible Markup Language) information systems ever developed, demonstrates the potential of this new Web standard by giving investment, financial and research professionals better ways to search for and find information that had been difficult to obtain. . . The SEC filings are enhanced using industry-standard XML to tag key information. For example, a search can target all insider-trading reports within an investor's portfolio, or look for Initial Public Offerings within a particular industry segment. EDGARspace organizes search results with XML 'meta-information,' making large result sets more manageable. Complex search results can be organized by any type of meta-information, such as filing type, document section, industry code, date or company name."

  • "The Importance of Being EDGAR. Who We Are, What We Do." By Carl Malamud (CEO, Invisible Worlds). "A general introduction to SpaceServers and other components of the Blocks Architecture."

  • The union operator - An advanced form of retrieval, where you specify the nature of your query using XML. With The Union Operator Profile DTD.

  • Examples

  • Rocket Scientist

  • EDGARspace portal - example of an application that uses space.cgi, the web proxy interface to the SpaceServer.

  • Danny Goodman SpaceKit - example of an application that uses space.cgi, the web proxy interface to the SpaceServer.


Hosted By
OASIS - Organization for the Advancement of Structured Information Standards

Document URI: http://xml.coverpages.org/edgarspace.html
Robin Cover, Editor: robin@oasis-open.org