MarkMail Indexes java.net Mailing List Archives
MarkMail Announces Search Support for java.net Mailing List Archives
San Carlos, California, USA. July 22, 2008.
In a July 22, 2008 posting to "The Making of MarkMail" (MarkMail Announce Mailing List), Jason Hunter reported on a cooperative effort with Sun Microsystems and CollabNet to index the mail archives for java.net, which generates some "15,000 human-to-human emails every month..."
MarkMail is a free service for searching mailing list archives, with huge advantages over traditional search engines. It is powered by MarkLogic Server: Each email is stored internally as an XML document, and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed by a small MarkLogic Server cluster running against millions of messages. The Project List identifies the mailing lists integrated into the MarkMail indexing and search operations. As of 2008-07-22, the list included 916 projects: "each project has its own subdomain, which implicitly limits all searches to the emails of that specific project. That can be useful for bookmarking or linking..." These include the XSL list, World Wide Web Consortium (W3C), WSO2, Apache, Google Groups, MySQL, Eclipse, Python, Mozilla, OpenOffice, OSA Foundation, Gnome, KDE, Jabber...
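As an illustration of the storage model described above (one XML document per email, selected by query), here is a minimal Python sketch. The element names (archive, message, from, subject, body), the list names, and the addresses are assumptions made for this example and are not MarkMail's actual schema. Python's ElementTree path syntax stands in for XQuery, which is what MarkLogic Server actually uses.

```python
import xml.etree.ElementTree as ET


def email_to_xml(list_name, sender, subject, body):
    """Wrap one email in an XML element (hypothetical schema)."""
    msg = ET.Element("message", attrib={"list": list_name})
    ET.SubElement(msg, "from").text = sender
    ET.SubElement(msg, "subject").text = subject
    ET.SubElement(msg, "body").text = body
    return msg


def subjects_for_list(root, list_name):
    """Return the subjects of all messages posted to one list."""
    path = f"./message[@list='{list_name}']"
    return [m.findtext("subject") for m in root.findall(path)]


def messages_from_domain(root, domain):
    """Return the messages whose sender address ends with the given domain."""
    suffix = "@" + domain
    return [m for m in root.iter("message")
            if m.findtext("from", default="").endswith(suffix)]


# A tiny in-memory "archive"; all data here is made up for illustration.
archive = ET.Element("archive")
archive.append(email_to_xml("dev", "alice@example.com",
                            "Build failure on trunk", "The build broke."))
archive.append(email_to_xml("users", "bob@example.org",
                            "Install question", "How do I install it?"))
archive.append(email_to_xml("dev", "carol@example.com",
                            "Re: Build failure on trunk", "Fixed now."))

if __name__ == "__main__":
    print(subjects_for_list(archive, "dev"))
    print(len(messages_from_domain(archive, "example.com")))
```

Because every message is a structured document, the same archive can answer both "which messages belong to this list" and "who wrote from this domain" without a separate index per question.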
From the posting:
Last week, in collaboration with Sun and CollabNet, we loaded the mail archive histories for java.net, Sun's open source developer playground for Java projects and home to projects like GlassFish, jMaki, AppFuse, Grizzly, Hudson, and WebWork.
The load includes more than 1,000 mailing lists and roughly 1,000,000 messages. Their growth curve is fantastic (the last month is partial):
Just about half the java.net mails are auto-generated as a result of checkins or bugs. If we remove those, the curve is still beautiful. Looks like people are writing more than 15,000 human-to-human emails every month on java.net.
With such a large community, it's fun to look at community-wide analytics. It's a little-known feature that you can go to our browse page and add an arbitrary query to the URL and it'll show you list-by-list numbers for all messages matching that query. For example, you can view the total number of messages per list throughout time, or the counts for just last week.
You can browse the lists where people from "sun.com" have written the most. If you want to see the top lists, do it as a regular search.
Posted by Jason Hunter to 'The Making of MarkMail'
Subject: Here Comes the Sun
Date: July 22, 2008
MarkMail Announce Mailing List
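The posting's tip about adding a query to the browse page URL, together with the per-project subdomains mentioned earlier, could be sketched as follows. The "/browse/" path, the "q" parameter name, the "from:sun.com" query syntax, and the project name "someproject" are all assumptions for illustration. The posting does not spell out the exact URL format, so check the live site before relying on it.

```python
from urllib.parse import urlencode, urlunsplit

BASE_HOST = "markmail.org"


def browse_url(query, project=None):
    """Build a browse-page URL carrying a query (hypothetical URL layout).

    If a project is given, its subdomain is used, which per the posting
    implicitly limits results to that project's lists.
    """
    host = f"{project}.{BASE_HOST}" if project else BASE_HOST
    # urlencode percent-escapes reserved characters such as ':'
    return urlunsplit(("http", host, "/browse/", urlencode({"q": query}), ""))


if __name__ == "__main__":
    print(browse_url("from:sun.com"))
    print(browse_url("from:sun.com", project="someproject"))
```

Keeping the query in the URL is what makes such views easy to bookmark or link to, as the Project List description notes.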
Excerpt from the MarkMail Press Release:
MarkMail Enables Developers to Tap into Java Brain Trust by Incorporating java.net Mail Archives
Community-Focused Email Archive Website Exceeds 16 Million Messages
San Carlos, CA, USA. July 22, 2008.
Mark Logic Corporation, provider of the industry's leading XML content platform, today announced that MarkMail, a free service for searching mailing list archives available at http://markmail.org/, has added group messages from the highly influential java.net community. In cooperation with CollabNet, the leading provider of solutions for distributed software development and the system that powers http://java.net, more than 975,000 emails were loaded into the MarkMail system. The java.net community members now have the ability to seamlessly query across the structured and unstructured parts of email, including attachments, and unlock the tremendous value embedded in these messages.
Founded in 2003, java.net is the realization of a vision of a diverse group of engineers, researchers, technologists and evangelists at Sun Microsystems, Inc. to provide a common area for interesting conversations and innovative development projects related to Java technology. With nearly 450,000 total members, java.net encompasses a wealth of community-based knowledge in its ongoing email discussions, which most recently totaled 40,000 emails per month as shown by the following MarkMail histogram: [see above for Java chart, where the last bar represents messages as of July 22, 2008]
"Wow — the features in MarkMail are cool and you quickly get the feeling it is just the tip of the iceberg," said Marla Parker, community manager for java.net, in a recent blog post. "To assist users, we plan to create a page on java.net with a bunch of sample queries and instructions on how new active mailing lists on java.net can get added to the MarkMail archive."
MarkMail was launched in November 2007 with four million messages from the Apache Software Foundation. Since the site's unveiling, MarkMail has garnered rave reviews from both the community and industry watchers, and continues to add content relevant to a wide breadth of software developers. Today, MarkMail hosts more than 16 million messages across 3,800 mailing lists, including umbrella communities such as Apache, Codehaus, Mozilla, and the World Wide Web Consortium, technology-specific communities such as Eclipse, GNOME, KDE, MySQL, NetcoolUsers, Perl, Python and Xen, and a broad spectrum of other projects.
"I've been working with Java for over a decade now and it's really gratifying both personally and professionally to be able to bring MarkMail's advanced search and discovery capabilities to the java.net community — to projects such as GlassFish, jMaki, AppFuse, Grizzly, Hudson and WebWork," said Jason Hunter, principal technologist for MarkMail at Mark Logic Corporation. "Having the java.net list archives available in MarkMail will help speed project development, make it easier for users to find answers, and make it possible to track what's happening across thousands of lists for the first time."
The free MarkMail service provides sophisticated search functionality with a powerful faceted navigation interface. Combined with real-time analytics, the system delivers a new, state-of-the-art experience for interacting with large-scale message archives, such as those used by open source projects. MarkMail's powerful functionality and fluid user experience were built utilizing the unique capabilities of MarkLogic Server for handling semi-structured content. With MarkMail, users can seamlessly query across the structured and unstructured parts of email, including attachments, unlocking the value trapped inside millions of emails. By observing structure in the seemingly free-form content of the message body and automatically weighting query terms appropriately, MarkMail delivers results far superior to simple full-text search. In addition, MarkMail presents analytic information based on header information and other message metadata, enabling user drill down and query refinement. The results are a single interactive experience that lets users rapidly focus in on the answers they are seeking. For more information, visit http://markmail.org or visit the MarkMail blog at http://markmail.blogspot.com.
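The faceted navigation and drill-down described in the release can be pictured with a toy in-memory example. This is only a sketch of the general technique: the message fields, the sample data, and the naive substring match are assumptions for illustration and say nothing about how MarkMail or MarkLogic Server implement search.

```python
from collections import Counter

# Made-up sample messages; "list" and "sender" play the role of header metadata.
messages = [
    {"list": "dev", "sender": "alice@example.com", "subject": "Build failure on trunk"},
    {"list": "dev", "sender": "bob@example.com", "subject": "Re: Build failure on trunk"},
    {"list": "users", "sender": "alice@example.com", "subject": "How to build from source"},
    {"list": "users", "sender": "carol@example.com", "subject": "Install question"},
]


def search(msgs, term):
    """Naive full-text match on the subject line (stand-in for real search)."""
    return [m for m in msgs if term.lower() in m["subject"].lower()]


def facet_counts(msgs, facet):
    """Count the given messages by one metadata field, e.g. per list or per sender."""
    return Counter(m[facet] for m in msgs)


def drill_down(msgs, facet, value):
    """Refine a result set by fixing one facet value."""
    return [m for m in msgs if m[facet] == value]


if __name__ == "__main__":
    hits = search(messages, "build")
    print(facet_counts(hits, "list"))      # counts per mailing list
    refined = drill_down(hits, "list", "dev")
    print(facet_counts(refined, "sender"))  # counts per sender within one list
```

Each refinement step recomputes the counts over the narrowed result set, which is what lets a user start broad and focus in without retyping the query.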
About the MarkLogic Server
MarkLogic Server is an XML content platform that provides the agility you need to build and deploy next-generation content applications. To help you unlock the value of your content, it includes the capabilities of a traditional database management system for storing content, a search engine for accessing it, and a dynamic content server to deliver it. As a platform built for XML, it empowers your organization with the agility you need to quickly adapt to changing market conditions and new product requirements.
Whether you are looking to build new applications for your organization or embed XML content capabilities into your existing products, MarkLogic Server provides the single infrastructure necessary to build and deploy applications. It includes an XML repository, full-text and XML search capabilities, XQuery engine and a web server, giving you everything you need to meet your XML content delivery needs. This means that you can more quickly develop applications, and those applications run more efficiently and effectively because they are on a single platform, saving you time and money.
MarkLogic Server supports loading content 'as-is', which means you can avoid making costly, time-consuming and sometimes impossible transformations with your content. Instead, you can easily combine XML content, documents, books, messages, user generated content and more into a single centralized repository. You can also get started more quickly when creating applications, because you don't need to try to convert your content to its final form before you start experimenting with new applications and new business models...
Greater understanding of content and use patterns, including how users are working with your content, through content analytics. MarkLogic Server lets you understand and find new patterns, relationships and other details within your content. This knowledge of how your content is related and how users interact with it enables you to evolve and refine your products to better meet user needs...
About Mark Logic Corporation
Mark Logic Corporation is a leading provider of information access and delivery solutions used by publishers, government agencies, and other large enterprises. The company's flagship product, MarkLogic Server, is an XML content platform that includes a unique set of capabilities to store, aggregate, enrich, search, navigate, and dynamically deliver content. The company has two patents on its innovative technology, is privately held, and is backed by Sequoia Capital and Lehman Brothers. To read the Mark Logic CEO Blog, visit marklogic.blogspot.com. To learn more about Mark Logic, or to download a free community edition of MarkLogic Server, go to www.marklogic.com.
MarkMail and the MarkLogic Server have been referenced earlier in the Cover Pages:
Brian d'foy (use Perl; List): "A couple of months ago, MarkLogic imported a bunch of Perl mailing lists into their MarkMail service. After only a couple of minutes playing with the service, I really liked it. Instead of making a query, getting results, then trying again, MarkMail lets me start with a broad topic and refine the search by looking at intermediate results. I wanted to find out more. For 'The Perl Review', I interviewed Jason Hunter, the Principal Technologist at Mark Logic, to find out more. I also made a screencast of me using the service, as well as pounding my laptop with a mallet. You'll see that Jason listed some corpus sizes for different subjects. Although Perl has half a million messages, other subjects are huge too. We can boost Perl's numbers by getting more lists into MarkMail. Although MarkMail initially imported 75 of the Perl lists, they want to import more. In the interview, Jason has instructions on what to do. Basically, you write to them and tell them to import your list. If you have dead, historical lists, they can import those archives too..."
... The blogosphere warmed up a bit when veteran (SGML/XML) markup language experts learned that Norm Walsh is joining Mark Logic. The Mark Logic flagship product, MarkLogic Server, includes a unique set of capabilities to store, aggregate, enrich, search, navigate and dynamically deliver content. On top of this platform, partners and customers build information access and delivery solutions used by publishers, government agencies and other large enterprises to accelerate the creation of content applications. MarkLogic Server was architected top-to-bottom for XML content and provides the most extensive implementation of XQuery, the W3C standard query language for accessing XML documents, on the market today. Open interfaces allow enhancement and enrichment of content without the need to modify the original source files, and real-time update capabilities eliminate the need to re-load as metadata changes...
"Mark Logic Corp. has submitted its Extensible Markup Language (XML) database server for Common Criteria certification. Version 4.0 of MarkLogic Server Enterprise Edition will be tested at Evaluation Assurance Level 3. In addition, the certification will be augmented with ALC_FLR.3, an assurance on the part of the vendor that it has a process in place to track and fix flaws found in the software after the certification is issued. Overseen in the U.S. by the National Information Assurance Partnership (NIAP), Common Criteria is a set of security requirements set by government agencies and private companies. To get their products certified, vendors provide a set of security attributes for each product, which are verified by an independent laboratory. The Defense Department uses the Common Criteria as a baseline for purchasing IT products for secure networks. NIAP is a partnership between the National Institute of Standards and Technology and the National Security Agency... MarkLogic Server is database server software for handling XML data, one that uses the XQuery and XPath standards. To date, no other XML databases have achieved Common Criteria certification, though the latest releases of some widely-used relational databases such as Oracle and IBM DB2 do support XML parsing..."
HTML 4.0, XML, PNG, CSS, DOM, and XQuery: These are but a few of the technologies to come out of the World Wide Web Consortium, commonly referred to as the W3C. Mark Logic Corporation is proud to announce that MarkMail has loaded the full W3C public mailing lists. MarkMail in fact uses all of those W3C technologies. The W3C mailing list archives start in 1994 and cover 400,000 emails across 200 mailing lists. MarkMail is a free service for searching mailing list archives, with huge advantages over traditional search engines. It is powered by MarkLogic Server: Each email is stored internally as an XML document, and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed on a single MarkLogic Server machine running against millions of messages. The MarkLogic Server is a commercial enterprise-class XML Content Server built to load, query, manipulate, and render large amounts of XML using the W3C's XQuery language. In MarkMail every email is represented and held as an XML document. MarkMail lets you search millions of emails across thousands of mailing lists.
NB: W3C public mailing lists, on 2008-07-22: "Searching 197 lists and 422,780 messages. First list started in June 1994. There are 107 active lists, recently accumulating 113 messages per day..."
Mark Logic Corporation announced that Princeton Theological Seminary has implemented MarkLogic Server as the new basis for the library's new digital collection. The library has launched a system for publishing digital content to give users better access to and navigation through more than 100,000 digital objects, including digitized representations of historic photographs, portraits, artifacts, and journals. This provides library members — both seminary students pursuing advanced degrees in divinity or theology, as well as the general public — with new levels of access and interactivity with historical and modern theological works. The Seminary Library implemented MarkLogic Server to enhance the library's existing browsing services with search and faceted navigation including the Web 2.0 concept of user-tagging...
Tim O'Reilly (Blog): "I've been meaning to write for a while about MarkLogic's awesome new search tool for trolling through open source mailing lists, MarkMail. Let's face it. While there may be a new generation that thinks that email is for old fogies, for many of us, email is a primary online tool, at least as important to us as the web. Many of us no longer file documents or attachments — we just search for them again in our email. Perhaps most importantly, email is a primary collaboration tool, and as many of us have figured out, collaboration is one of the internet's killer apps. Searching our shared memory in a collaborative space is REALLY useful — with open source mailing lists being a great example...
Developers at Mark Logic Corporation have announced support for a number of new mailing lists and significant growth of existing lists, increasing the base of content being archived and searched with MarkMail. Launched in November 2007, MarkMail is a community-focused searchable message archive service. MarkMail allows people to leverage the immense amount of collective knowledge accumulated over time through email discussions. Users can find technical information, research historical decision making, analyze and understand trends, and locate experts for a wide range of technical topics...
Prepared by Robin Cover for The XML Cover Pages archive.