From owner-meta2@mrrl.lut.ac.uk Tue Mar 26 15:36:17 +0000 1996 Return-path: Received: from avarice.mrrl.lut.ac.uk [158.125.220.8] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u1anA-0006jE-00; Tue, 26 Mar 1996 15:36:16 +0000 Received: (majordom@localhost) by avarice.mrrl.lut.ac.uk (8.7.4/8.6.9) id PAA26115 for meta2-outgoing; Tue, 26 Mar 1996 15:33:40 GMT Received: from goggins.bath.ac.uk (goggins.bath.ac.uk [138.38.32.13]) by avarice.mrrl.lut.ac.uk (8.7.4/8.6.9) with ESMTP id PAA26110 for ; Tue, 26 Mar 1996 15:33:38 GMT From: H.A.Gott@bath.ac.uk Received: from bath.ac.uk (actually host midge.bath.ac.uk) by goggins.bath.ac.uk with SMTP (PP); Tue, 26 Mar 1996 15:33:26 +0000 Subject: Programme for metadata workshop II To: meta2@mrrl.lut.ac.uk Date: Tue, 26 Mar 1996 15:33:22 +0000 (GMT) Cc: Hazel Gott X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9603261533.aa06340@midge.bath.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Dear Participant I hope you all received your joining notes. As I said, the programme will be distributed at registration on 1 April, but meanwhile, an e-mail version follows. This will also be available on the OCLC server. 
Hazel

Monday 1 April

09.30-11.00  Registration                  Reception Area
11.00-12.30  Welcome and Introduction      Lecture Room 4
             Lorcan Dempsey (UKOLN) and Stuart Weibel (OCLC Office of Research)
             Delegates will be invited to introduce themselves and their work
             Some points on conference logistics: Hazel Gott (UKOLN)
12.30-14.00  Buffet Lunch                  Private Dining Room
14.00-15.40  Plenary Session               Lecture Room 4
             State of the Art Reports: Some Dublin Core Developments
             Juha Hakala (Helsinki University Library)
             Bemal Rajapatirana (National Library of Australia)
             Eric Miller (OCLC Office of Research)
             Renato Iannella (DSTC Australia)
             Priscilla Caplan (University of Chicago)
15.40-16.10  Refreshments                  The Lounge
16.10-17.10  Plenary Session               Lecture Room 4
             Related Metadata Initiatives: NCSTRL Project, the ROADS Project,
             and the IEEE Metadata Initiative
             Rebecca Lasher (Stanford University)
             Rachel Heery (UKOLN, University of Bath)
             Terry Smith (University of California at Santa Barbara)
17.10-18.10  Open Discussion               Lecture Room 4
19.30        Dinner                        Private Dining Room

Tuesday 2 April

07.30-08.30  Breakfast                     The Restaurant
09.00-10.00  Plenary Session               Lecture Room 4
             Objects and their Description: Granularity and Aggregation
             Bill Arms (CNRI)
             Michael Heaney (Bodleian Library)
10.00-10.30  Follow-on Discussion
10.30-11.00  Refreshments                  The Lounge
11.00-13.00  Break-out Groups              Rooms to be assigned
             Topics and Case Study
13.00-14.00  Buffet Lunch                  Private Dining Room
14.00-15.00  Plenary Session               Lecture Room 4
             Internationalisation Issues: Update on the IETF
             Led by Chris Weider (CNIDR)
15.00-15.30  Refreshments                  The Lounge
15.30-17.30  Break-out Groups              Rooms to be assigned
             Topics and Case Study
19.30        Dinner                        Private Dining Room

Wednesday 3 April

07.30-08.30  Breakfast                     The Restaurant
09.00-10.30  Break-out Groups' Reports     Lecture Room 4
10.30-11.00  Refreshments                  The Lounge
11.00-12.30  General Plenary Discussion    Lecture Room 4
             Identification of Issues
12.30-13.30  Buffet Lunch                  Private Dining Room
13.30-15.00  Planning Future Deployment    Lecture Room 4
             Is there a consensus on deployment issues emerging from these
             discussions that can influence the direction of projects now in
             progress or being planned, and that will support the deployment
             of interoperable, core resource description?
15.00        Refreshments and Departure    The Lounge

Hazel
------------------------------
Hazel Gott, Promotions Officer
UKOLN: The UK Office for Library and Information Networking
University of Bath, Claverton Down, Bath, BA2 7AY, UK
h.a.gott@bath.ac.uk / Tel: +44 1225 826256 / Fax: +44 1225 826838

From owner-meta2@net.lut.ac.uk Thu Apr 04 16:07:55 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u4qdf-00005a-00; Thu, 4 Apr 1996 16:07:55 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id QAA12256 for meta2-outgoing; Thu, 4 Apr 1996 16:06:30 +0100 (BST) Received: from gizmo.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id QAA12220 for ; Thu, 4 Apr 1996 16:06:22 +0100 (BST) Received: from goggins.bath.ac.uk [138.38.32.13] by gizmo.lut.ac.uk with esmtp (Exim 0.42 #1) id E0u4kxX-0001hR-00; Thu, 4 Apr 1996 10:04:03 +0100 Received: from ukoln.bath.ac.uk by goggins.bath.ac.uk with SMTP (PP); Thu, 4 Apr 1996 10:02:02 +0100 Date: Thu, 4 Apr 1996 10:02:00 +0100 (BST) From: Lorcan Dempsey To: meta2@mrrl.lut.ac.uk Subject: on from warwick Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

Hello people

This is a quick note to thank everybody for excellent participation in what looks like a significant event. I think it is fair to say that it has moved forward somewhat differently to how Stu and I might have anticipated, but in a very exciting way ...
I would like to add a special note of thanks to Hazel Gott who looked after most of the hard work of arranging the event and did a very good job. I look forward to productive discussion on the list. Lorcan Lorcan Dempsey ---------------------- ----------------------------------------------- ph: +44 (0)1225 826254 UKOLN (UK Office for Library & Info Networking) fx: +44 (0)1225 826838 University of Bath, Bath BA2 7AY, UK From owner-meta2@net.lut.ac.uk Thu Apr 04 20:31:20 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u4ukZ-0000PF-00; Thu, 4 Apr 1996 20:31:19 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id UAA14440 for meta2-outgoing; Thu, 4 Apr 1996 20:30:54 +0100 (BST) Received: from kbm.konbib.nl (root@kbm.konbib.nl [192.87.31.198]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id UAA14435 for ; Thu, 4 Apr 1996 20:30:50 +0100 (BST) Received: from python.konbib.nl (python.konbib.nl [192.87.31.11]) by kbm.konbib.nl (8.6.11/8.6.9) with SMTP id TAA02405; Thu, 4 Apr 1996 19:32:10 +0200 Received: by python.konbib.nl; (5.65v3.0/1.1.8.2/17Jan96-1124AM) id AA16113; Thu, 4 Apr 1996 21:31:38 +0200 Date: Thu, 4 Apr 1996 21:31:37 +0200 (MET DST) From: Titia van der Werf To: Lorcan Dempsey Cc: meta2@mrrl.lut.ac.uk Subject: Re: on from warwick In-Reply-To: Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Thu, 4 Apr 1996, Lorcan Dempsey wrote: > > This is a quick note to thank everybody for excellent participation in > what looks like a significant event. I think it is fair to say that it > has moved forward somewhat differently to how Stu and I might have > anticipated, but in a very exciting way ... > > I would like to add a special note of thanks to Hazel Gott who looked > after most of the hard work of arranging the event and did a very good job. 
> > I look forward to productive discussion on the list.

It was a great event to participate in: a nice mix of people and quality brainstorming in a pleasant (confined :-)) environment! Thanks to Lorcan (I'm sorry I missed you to say goodbye and thanks), Stu and everyone who was there.

I would like to ask those of you who are writing up Warwick reports for various purposes to consider putting them on the Web and notifying the list, so others can benefit from them. I will in any case do so with the paper on URIs and the metadata issue to be presented at the ELAG seminar in Berlin at the end of April.

I also think that some early, pre-processed statements about our results, on which we can all agree, would help. The rest we can each flesh out to fit our individual contexts.

Have a nice Easter holiday!

gr., Titia

From owner-meta2@net.lut.ac.uk Sat Apr 06 20:23:32 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u5da8-0003Lx-00; Sat, 6 Apr 1996 20:23:32 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id UAA25344 for meta2-outgoing; Sat, 6 Apr 1996 20:22:52 +0100 (BST) Received: from mrrl.lut.ac.uk (martin@localhost.mrrl.lut.ac.uk [127.0.0.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id UAA25337 for ; Sat, 6 Apr 1996 20:22:47 +0100 (BST) Message-Id: <199604061922.UAA25337@gizmo.lut.ac.uk> To: meta2@mrrl.lut.ac.uk X-URI: Subject: those pesky meta tags Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 06 Apr 1996 20:22:46 +0100 From: Martin Hamilton Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

After all the META tag discussions at Warwick, I thought I'd do a bit of background reading. Here's what I have so far, in case it's useful to anyone else. Comments welcome!

Martin

1.
Starting points

There was much interest in defining a minimal set of metadata which could be embedded into HTML documents, and in establishing a convention for doing this using the "META" tag in the document's "HEAD". I think it was also seen as desirable to have a way of pointing at a richer collection of metadata, in whatever format (Somebody Else's Problem?), e.g. via UR[LN]s.

2. What we actually have to play with

The following attributes of META are defined by the HTML DTD in RFC 1866:

  HTTP-EQUIV   HTTP response header name (optional)
  NAME         metainformation name (optional)
  CONTENT      associated information (required)

Note that if a META element has an HTTP-EQUIV attribute, this may be turned into an HTTP header at the discretion of the WWW server (but this isn't implemented by anyone? :-)

We also have the following for the LINK tag, which wasn't discussed (that I recall) at Warwick, but seems to be very relevant: HREF (required), REL, REV, URN, TITLE and METHODS.

And finally, the "TITLE" element!

3. Related work

These two documents, currently out as Internet Drafts, propose formalisations for the values of the META and LINK attributes, respectively.

  The META Tag of HTML
  draft-musella-html-metatag-02.txt
  Davide Musella, National Research Council (Italy)
  January 1996

This suggests ...

  keywords:      to indicate the keywords of the document
  author:        to indicate the author of the document
  timestamp:     to indicate when the document is authored (HTTP-date format)
  expire:        to indicate the expire date of the document (HTTP-date format)
  language:      to indicate the language of the document (using ISO 3166 code or ISO 639 code)
  abstract:      to indicate the abstract of the document
  organization:  to indicate the organization of the author
  revision:      to indicate the revision number of the document

  Hypertext links in HTML
  draft-ietf-html-relrev-00.txt
  Murray Maloney & Liam Quin, SoftQuad Inc.
  December 1995

This suggests (amongst other things!) ...

  REV=MADE           identify the author or "maker" of an HTML document
  REL=AUTHOR         hypertext link to an author.
  REL=COPYRIGHT      hypertext link to a copyright notice.
  REL=DISCLAIMER     hypertext link to a legal disclaimer.
  REL=EDITOR         hypertext link to an editor.
  REL=META           hypertext link to a node which contains meta-information related to the current document.
  REL=PUBLISHER      hypertext link to a publisher.
  REL=TRADEMARK      hypertext link to a trademark notice.
  REL=TRANSLATION    the target is a translation to another language.
  REL=LANG           indicates language of the target document.
  REL=OBSOLETES      the target document is a later version of the current document
  REV=OBSOLETES      the target document is obsoleted by the current document.
  REL=UPDATES        the target document contains revisions to the current document.
  REL=DERIVED-FROM   the target document was derived from the current document
  REV=DERIVED-FROM   the current document was derived from the target document, perhaps by automatic processing or by manual editing.

Think I got the RELs and REVs the right way round :-)

4. Current usage (of META)

Altavista - "It is however possible for you to control how your page is indexed by using the META tag to specify additional keywords to index, and a short abstract." Understands "keywords" and "description" values for the META NAME attribute. Says it will index both fields as words, and return the description along with the URL in any search results.

MOMspider - "As shipped, MOMspider only stores the META elements tagged as "Expires", "Owner", and "Reply-To". However, it is very easy to extend MOMspider so that it will look for and store other named metainfo. Possibilities include IAFA index items for building site description files, graphical coordinates for building spacial maps of webspace, etc."

ALIWEB - cf. Robert S. Thau's WWW site indexing tool, which generates IAFA templates automatically for ingestion into ALIWEB. It uses the following values for the META NAME attribute:

  "description    The value of this attribute should be a description of the document which makes sense out of context (as it will be seen by people who retrieve it from a global index).
  keywords       The value of this attribute should be a few keywords describing the content areas addressed by the document.
  resource-type  Indicates what sort of object this is. Currently recognized values are document and service, the latter being appropriate for search engine cover sheets and the like. If the document contains no resource-type meta-tag, document is the default.
  distribution   Indicates to what groups of users this document is of interest. If the script is properly configured (see below), then this meta-variable will determine how the document is indexed. If you are preparing a single index for global distribution, you don't have to worry about this."

"In addition, the <TITLE>s of your documents are used to fill in the Title: fields of the IAFA templates; you must give your documents HTML titles if they are to appear in your index."

<<any more web crawler type usage examples for META and LINK?? info.webcrawler.com was down when I wrote this, so I couldn't check up on the robots list archive>>

5. Problems

META and LINK tags being empty apparently confuses some software which has to parse the HTML.

Buffers may be too small for the values being stuffed into them, e.g. abstracts and descriptions.

Generating extra HTTP headers via HTTP-EQUIV isn't a widely implemented feature.

<<are these serious problems? any others??>>

6. Conclusions

Use LINK and REL=META (or was it REV=META? :-) to point at external meta info.

Take a grab bag from the above to form the minimal set of embedded meta info!
From owner-meta2@net.lut.ac.uk Sun Apr 07 21:36:24 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u61CB-0004ON-00; Sun, 7 Apr 1996 21:36:23 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id VAA05278 for meta2-outgoing; Sun, 7 Apr 1996 21:35:59 +0100 (BST) Received: from mrrl.lut.ac.uk (martin@localhost.mrrl.lut.ac.uk [127.0.0.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id VAA05271 for <meta2@mrrl.lut.ac.uk>; Sun, 7 Apr 1996 21:35:54 +0100 (BST) Message-Id: <199604072035.VAA05271@gizmo.lut.ac.uk> X-Mailer: exmh version 1.6.6 3/24/96 To: meta2@mrrl.lut.ac.uk X-URI: <URL:http://www.roads.lut.ac.uk/~martin> Subject: mime as container format Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 07 Apr 1996 21:35:53 +0100 From: Martin Hamilton <martin@mrrl.lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

Just a couple of notes on that poster that Jon and I did showing how MIME could be used as a container format for those blobs^H^H^H^H^Hpackages of metadata.

MIME seemed kind of neat as the deployment mechanism:

 . already specified
 . widely implemented
 . widely deployed
 . basis for multi-media support in HTTP

What's more:

 . object types, charsets, encodings, etc. are tagged in a standardised way
 . supports nesting via multipart/{mixed,alternative}
 . somebody is already looking after tag registrations!

I guess that last one maybe isn't so good, if you subscribe to the view that the registration process for new content types isn't working... :-(

Another fly in the ointment is that the use of MIME in HTTP and its (original) use in email are slightly disjoint - e.g. CRLF canonicalisation, 8 bit clean versus ASCII, and different header names. In our example we were using the mail variant of MIME, rather than the HTTP one.
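A minimal sketch of the container idea, assuming Python's standard `email` package (this is not the code behind the poster): each metadata package becomes a text/plain bodypart carrying a Content-Location header (from the Palme draft) and a Content-MD5 header computed as RFC 1864 describes, and packages nest by attaching one multipart/mixed inside another. The function names, the package bodies and the example.org locations are hypothetical.

```python
import base64
import hashlib
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def content_md5(text):
    """RFC 1864 value: base64 of the MD5 digest of the body in canonical
    form (CRLF line endings), before any Content-Transfer-Encoding."""
    canonical = text.replace("\r\n", "\n").replace("\n", "\r\n")
    digest = hashlib.md5(canonical.encode("us-ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def metadata_package(location, body):
    """One package of metadata as a text/plain bodypart."""
    part = MIMEText(body, "plain", "us-ascii")
    part["Content-Location"] = location   # where this package can also be found
    part["Content-MD5"] = content_md5(body)
    return part


def build_container(items):
    """Wraps packages in multipart/mixed. Each item is either a
    (location, body) pair or an existing Message, which is nested as-is."""
    container = MIMEMultipart("mixed")
    for item in items:
        if isinstance(item, Message):
            container.attach(item)
        else:
            location, body = item
            container.attach(metadata_package(location, body))
    return container


DC_RECORD = "Title: Example page\nCreator: A. N. Other\n"
IAFA_RECORD = "Template-Type: DOCUMENT\nTitle: Example page\n"

inner = build_container([("http://www.example.org/page.iafa", IAFA_RECORD)])
outer = build_container([("http://www.example.org/page.dc", DC_RECORD), inner])

if __name__ == "__main__":
    print(outer.as_string())
```

Like the poster's example, this uses the mail variant of MIME; an HTTP variant would differ in the ways listed above, for instance in how line endings are canonicalised.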
Note that the "Content-Location:" header we refer to is drawn from an Internet Draft by Jacob Palme, "The Text/HTML Content Type and the Content-Location MIME Header", aka draft-palme-text-html-02.txt. This suggests borrowing the "Location:" header from HTTP for use in email as a way of referring to external bodyparts. This was just a private submission, but now that the MHTML working group has been started up we can perhaps expect it to become a standard (eventually).

Likewise the "Content-MD5:" header isn't a standard part of MIME, but is on the standards track - see RFC 1864. Some mailers have some support for it, but WWW usage seems to have been caught up in the wrangling over HTTP 1.1.

In HTTP we already have the important header - "Location:", but I don't believe there is much experience with deploying multipart-aware clients and servers. Does anyone else know better? This probably wouldn't be much of a problem, since plenty of code has already been written to support it in mailers and newsreaders? :-)

Martin

From owner-meta2@net.lut.ac.uk Mon Apr 08 17:32:28 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u6Jrf-0005Dr-00; Mon, 8 Apr 1996 17:32:27 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id RAA08955 for meta2-outgoing; Mon, 8 Apr 1996 17:32:08 +0100 (BST) Received: from mrrl.lut.ac.uk (martin@localhost.mrrl.lut.ac.uk [127.0.0.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id RAA08948 for <meta2@mrrl.lut.ac.uk>; Mon, 8 Apr 1996 17:32:05 +0100 (BST) Message-Id: <199604081632.RAA08948@gizmo.lut.ac.uk> To: meta2@mrrl.lut.ac.uk X-URI: <URL:http://www.roads.lut.ac.uk/~martin> Subject: one more thing...
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 08 Apr 1996 17:32:05 +0100 From: Martin Hamilton <martin@mrrl.lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

This is something which wasn't discussed at the workshop, as far as I recall, but it seems to be relevant once we start to think about the uses to which that metadata might be put.

The question is how one might go about writing the metadata for a whole WWW site in such a way as to minimize the number of network transactions required to grab it, and perhaps minimize the bandwidth used up in the grabbing.

Embedding the metadata related to an object within the object itself effectively means that the object has to be snarfed in order to extract the info - unless some process extracts it and stashes it in a safe place, for the Web crawlers to find later. Separate metadata (whether extracted from the original object, generated by hand, or...) implies separate network transactions to retrieve the parcel of metadata associated with each object.

What I don't recall us discussing was bundling all the metadata for (say) a WWW site into a single object, even if only at a protocol level? This is the approach adopted by the likes of ALIWEB and Harvest, using IAFA templates and SOIF respectively.
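To illustrate bundling, here is a minimal Python sketch that writes the descriptions for a whole site (or collection) as one stream of SOIF-style records and gzips it, with an optional "changed since" filter in the spirit of Harvest's SEND-UPDATE command. The record layout (`@FILE { URL`, then `Attribute{size}:`, a tab and the value, with the size counted in bytes) is our reading of SOIF and should be checked against the Harvest documentation; the `ObjectDescription` type, its `modified` field and the example URLs are hypothetical.

```python
import gzip
from dataclasses import dataclass, field


@dataclass
class ObjectDescription:
    url: str
    modified: int                    # last change, seconds since the epoch (assumed field)
    attributes: dict = field(default_factory=dict)


def to_soif(desc, template_type="FILE"):
    """Serialises one description as a SOIF-style record."""
    lines = ["@%s { %s" % (template_type, desc.url)]
    for name, value in desc.attributes.items():
        size = len(value.encode("utf-8"))       # sizes count bytes, not characters
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines) + "\n"


def bundle(descriptions, since=None):
    """All descriptions as one object; if `since` is given, only those
    modified after it (cf. SEND-UPDATE)."""
    return "".join(to_soif(d) for d in descriptions
                   if since is None or d.modified > since)


def compressed_bundle(descriptions, since=None):
    """The same bundle, gzip-compressed for transfer."""
    return gzip.compress(bundle(descriptions, since).encode("utf-8"))


SITE = [
    ObjectDescription("http://www.example.org/a.html", 100, {"Title": "Page A"}),
    ObjectDescription("http://www.example.org/b.html", 200,
                      {"Title": "Page B", "Keywords": "metadata, SOIF"}),
]

if __name__ == "__main__":
    print(bundle(SITE, since=150))
```

One request then fetches everything, or only what has changed, instead of one network transaction per object.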
Harvest actually has a neat little protocol of its own which makes it possible to get just the stuff you're interested in, with compression thrown in for good measure:

  HELLO <hostname>         - Friendly Greeting
  HELP                     - This message
  SEND-OBJECT <oid>        - Send an Object Description
  SEND-UPDATE <timestamp>  - Send all Object Descriptions that have been changed/created since timestamp
  SET compression          - Enable GNU zip compressed transfers
  QUIT                     - Close session

This might not seem relevant to our discussions, but now that the Harvest software has been re-badged as the Netscape Catalog Server [1] this will presumably generate a great deal of interest in the SOIF format and the Harvest protocols. One might even go so far as to suggest that SOIF be "fixed up" (if you believe this is necessary) rather than a whole new metadata format developed... ;-)

<ducks>

Martin

[1] <URL:http://home.netscape.com/newsref/pr/newsrelease97.html>

From owner-meta2@net.lut.ac.uk Mon Apr 08 19:19:46 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u6LXW-0005L4-00; Mon, 8 Apr 1996 19:19:46 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id TAA09569 for meta2-outgoing; Mon, 8 Apr 1996 19:19:41 +0100 (BST) Received: from newton.ncsa.uiuc.edu (newton.ncsa.uiuc.edu [141.142.2.2]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id TAA09564 for <meta2@mrrl.lut.ac.uk>; Mon, 8 Apr 1996 19:19:37 +0100 (BST) Received: from void.ncsa.uiuc.edu (void.ncsa.uiuc.edu [141.142.103.20]) by newton.ncsa.uiuc.edu (8.6.11/8.6.12) with SMTP id NAA15430 for <meta2@mrrl.lut.ac.uk>; Mon, 8 Apr 1996 13:19:16 -0500 Received: by void.ncsa.uiuc.edu (4.1/NCSA-4.1) id AA01490; Mon, 8 Apr 96 13:17:03 CDT Date: Mon, 8 Apr 96 13:17:03 CDT From: liberte@ncsa.uiuc.edu (Daniel LaLiberte) Message-Id: <9604081817.AA01490@void.ncsa.uiuc.edu> To: meta2@mrrl.lut.ac.uk Subject: one more thing...
In-Reply-To: <199604081632.RAA08948@gizmo.lut.ac.uk> References: <199604081632.RAA08948@gizmo.lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

Martin Hamilton writes:
> The question is how one might go about writing the metadata for a whole
> WWW site in such a way as to minimize the number of network
> transactions required to grab it, and perhaps minimize the bandwidth
> used up in the grabbing.

This is a good idea, but I would amend what you are suggesting to be specific to a collection rather than to a site. A site, or server, or repository, is more of an administrative unit than a semantic unit. A single site could provide several unrelated collections, and often does. What one wants when grabbing metadata is typically *not* everything at a site, unless you are particularly interested in the site itself. Instead, one wants everything in some related set or collection of resources. (BTW, web crawlers will soon find it impractical to try grabbing everything on the web, and will instead have to focus on selecting related pages.)

> Embedding the metadata related to an object within the object itself
> effectively means that the object has to be snarfed in order to extract
> the info - unless some process extracts it and stashes it in a safe
> place, for the Web crawlers to find later.

This automatic extraction process should be supported by smart servers.

> What I don't recall us discussing was bundling all the metadata for
> (say) a WWW site into a single object, even if only at a protocol level
> ?

In addition to an object representing the bundling of all metadata for everything in a collection, there might be a separate set of metadata for the collection as a whole, and other metadata that is inherited by every item in a collection (those are different things).
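The distinction between collection-level and inherited metadata can be sketched as a small data model. In this minimal Python sketch (all names and values hypothetical), a collection carries its own description, a set of defaults inherited by every item, and each item's own metadata; item values override inherited ones, and the collection's own description is never copied onto its items. The `bundle` method returns the whole collection's metadata as a single object.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    own: dict = field(default_factory=dict)        # describes the collection itself
    inherited: dict = field(default_factory=dict)  # defaults applied to every item
    items: dict = field(default_factory=dict)      # item URL -> item's own metadata

    def effective(self, url):
        """Item metadata with inherited defaults filled in; item values win."""
        merged = dict(self.inherited)
        merged.update(self.items[url])
        return merged

    def bundle(self):
        """The whole collection's metadata as a single object."""
        return {
            "collection": dict(self.own),
            "items": {url: self.effective(url) for url in self.items},
        }


papers = Collection(
    own={"Title": "Workshop papers", "Description": "Reports from a workshop"},
    inherited={"Publisher": "Example Publisher", "Language": "en"},
    items={
        "http://www.example.org/report.html": {"Title": "Report"},
        "http://www.example.org/notes.html": {"Title": "Notes", "Language": "fr"},
    },
)

if __name__ == "__main__":
    print(papers.bundle())
```

Keeping `own` and `inherited` as separate fields is what stops the collection's title or description from leaking onto every item, which is the difference the message draws.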
Daniel LaLiberte (liberte@ncsa.uiuc.edu)
National Center for Supercomputing Applications
http://union.ncsa.uiuc.edu/~liberte/

From owner-meta2@net.lut.ac.uk Tue Apr 09 20:56:11 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u6jWN-0006Xe-00; Tue, 9 Apr 1996 20:56:11 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id UAA15450 for meta2-outgoing; Tue, 9 Apr 1996 20:55:14 +0100 (BST) Received: from fssun09.dev.oclc.org (fssun09-24.dev.oclc.org [132.174.24.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id UAA15444 for <meta2@mrrl.lut.ac.uk>; Tue, 9 Apr 1996 20:55:06 +0100 (BST) Received: from ws02-00.rsch.oclc.org by fssun09.dev.oclc.org (4.1/SMI-4.1) id AA13891; Tue, 9 Apr 96 15:54:33 EDT From: weibel@oclc.org (Stu Weibel) Received: (weibel@localhost) by ws02-00.rsch.oclc.org (8.6.10/8.6.9) for meta2@mrrl.lut.ac.uk id PAA02366; Tue, 9 Apr 1996 15:54:31 -0400 Date: Tue, 9 Apr 1996 15:54:31 -0400 Message-Id: <199604091954.PAA02366@ws02-00.rsch.oclc.org> To: meta2@mrrl.lut.ac.uk Subject: Outline: Please review X-Sun-Charset: US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

Fellow Warwick Workers,

The following is an outline for the top-level description piece for the Workshop. I'd love to have feedback about it before I get too deep.

For those who gave presentations about implementations (sections III and IV), it would be great if you could give me a paragraph or two of synopsis.

Please note the outline does not reference the main plenary talks. It struck me that this should be a narrative about where we are rather than how we got there, but I also see these important pieces as an integral component of the discussion. Suggestions welcome.

More to come soon.
stu
-----------------------------------------------------------------------

The Warwick Metadata Workshop:
A Framework for the Deployment of Resource Description

I. INTRODUCTION

II. INTENDED USES FOR THE DUBLIN CORE
    A. Content Self-description
       HTML is the strategic application
    B. Semantic Interoperability
       Unifying disparate description models

III. THE DUBLIN CORE: EARLY IMPLEMENTERS
    A. The Nordic Core
    B. The Dublin Core Down Under
       Two projects in resource description from Australia
       1. TURNIP
       2. National Library of Australia
    C. OCLC's Dublin Core Initiatives
    D. Mapping between The Dublin Core and MARC Records
    E. Deployment of Dublin Core records in the Alexandria Project DLI

IV. RELATED DESCRIPTION MODELS
    A. IAFA templates
    B. RFC 1807

V. RESOLVING IMPEDIMENTS TO DEPLOYMENT
    A. Syntax
    B. An Architecture for Metadata: The Warwick Framework
       1. Recursive packages of metadata
       2. Extensibility
       3. Modularity
       4. Registered Metadata types
          Resource discovery eg: D-C, IAFA, 1807
          Archiving and Provenance
          Administrative Metadata
          Terms and Conditions
          local extensions
    C. User Guides
       1. Guide to Authors for generating resource description
       2. Guide to administrators of collections
    D. Internationalization
       Yep... this is a tough problem

VI. SUMMARY AND FUTURE DIRECTIONS

From owner-meta2@net.lut.ac.uk Wed Apr 10 12:07:53 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u6xkf-0007FY-00; Wed, 10 Apr 1996 12:07:53 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id MAA17551 for meta2-outgoing; Wed, 10 Apr 1996 12:07:02 +0100 (BST) Received: from oxmail3.ox.ac.uk (oxmail3.ox.ac.uk [163.1.2.9]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id MAA17546 for <META2@MRRL.LUT.AC.UK>; Wed, 10 Apr 1996 12:07:00 +0100 (BST) Received: from vax.ox.ac.uk (actually host vax) by oxmail3 with SMTP (PP); Wed, 10 Apr 1996 12:06:42 +0100 Received: by vax.ox.ac.uk (MX V4.2 VAX) id 9; Wed, 10 Apr 1996 12:06:39 +0100 Date: Wed, 10 Apr 1996 12:06:39 +0100 From: MIKE HEANEY <heaney@vax.ox.ac.uk> To: META2@mrrl.lut.ac.uk CC: heaney@vax.ox.ac.uk Message-ID: <009A0A5E.A2F017BE.9@vax.ox.ac.uk> Subject: RE: Top-level description Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk

A comment on:

>II. INTENDED USES FOR THE DUBLIN CORE
>
>  A. Content Self-description
>
>     HTML is the strategic application

Do we need to elucidate here the difference we discussed between embedded metadata (i.e. meta or equivalent tags embedded in an HTML document) and separate metadata, pointing to and allowing description of a wide range of objects (html, ps, pdf, jpeg, gif, mpeg, avi &c)?
I don't see this point explicitly covered in section V IMPEDIMENTS (Syntax/Architecture) Mike Heaney Bodleian Library michael.heaney@bodley.ox.ac.uk From owner-meta2@net.lut.ac.uk Wed Apr 10 12:18:06 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u6xuX-0007GP-00; Wed, 10 Apr 1996 12:18:05 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id MAA17586 for meta2-outgoing; Wed, 10 Apr 1996 12:17:58 +0100 (BST) Received: from oxmail3.ox.ac.uk (oxmail3.ox.ac.uk [163.1.2.9]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id MAA17581 for <meta2@mrrl.lut.ac.uk>; Wed, 10 Apr 1996 12:17:56 +0100 (BST) Received: from vax.ox.ac.uk (actually host vax) by oxmail3 with SMTP (PP); Wed, 10 Apr 1996 12:17:12 +0100 Received: by vax.ox.ac.uk (MX V4.2 VAX) id 11; Wed, 10 Apr 1996 12:17:08 +0100 Date: Wed, 10 Apr 1996 12:17:07 +0100 From: Lou Burnard <lou@vax.ox.ac.uk> To: meta2@mrrl.lut.ac.uk CC: lou@vax.ox.ac.uk Message-ID: <009A0A60.19CA7543.11@vax.ox.ac.uk> Subject: RE: Top-level description Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Re Mike Heaney's comment: I am currently drafting a document setting out the various options discussed in the "syntax" context which touches on this issue (a better metaphor might be "sits on its head"). This should be available in the next few days. 
Lou From owner-meta2@net.lut.ac.uk Wed Apr 10 12:59:09 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u6yYG-0007Ib-00; Wed, 10 Apr 1996 12:59:08 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id MAA17744 for meta2-outgoing; Wed, 10 Apr 1996 12:59:01 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id MAA17739 for <meta2@mrrl.lut.ac.uk>; Wed, 10 Apr 1996 12:58:59 +0100 (BST) Received: by weeble.lut.ac.uk with local (Exim 0.42 #1) id E0u6yY6-0007IX-00; Wed, 10 Apr 1996 12:58:58 +0100 Date: Wed, 10 Apr 1996 12:58:58 +0100 (BST) From: Jon Knight <J.P.Knight@lut.ac.uk> To: meta2@mrrl.lut.ac.uk Subject: RE: Top-level description In-Reply-To: <009A0A5E.A2F017BE.9@vax.ox.ac.uk> Message-ID: <Pine.SUN.3.91.960410124124.8252Z-100000@weeble.lut.ac.uk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Wed, 10 Apr 1996, MIKE HEANEY wrote: > Do we need to elucidate here the difference > we discussed between embedded metadata (i.e. > meta or equivalent tags embedded in an HTML > document) and separate metadata, pointing to > and allowing description of a wide range of > objects (html, ps, pdf, jpeg, gif, mpeg, avi > &c). 
> I don't see this point explicitly covered
> in section V IMPEDIMENTS (Syntax/Architecture)

I think this is an important point, as towards the end of the workshop we seemed (from where I was sitting anyway) to be settling in to solving three different but interlinked problems: the embedding of Dublin Core Element Set attributes into HTML documents (a specific instance of embedded metadata), the construction of a concrete canonical syntax for the Dublin Core Element Set, and the design of the "Warwick Framework" for the exchange of arbitrary packages of metadata between consenting programs.

For my part I see the embedding of DCES attributes within HTML as a particular mapping between the concrete syntax that the SGML boys came up with and the limitations of the HTML DTD. I can imagine other mappings might appear in time (PostScript structured comments, PNG extensions, VRML headers, IAFA templates, MARC records etc, etc) if the DCES proves to be popular and useful. It might be worth pointing out in section IIA that although HTML is the file format that is going to get the first mapping, other mappings to other formats should be encouraged as well.

Section IIB seems to cover the other two problems (i.e. pointing out that we need a canonical concrete representation of DCES and also the WF for interoperability between systems).

Tatty bye,

Jim'll

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU.
* I've found I now dream in Perl. More worryingly, I enjoy those dreams.
* From owner-meta2@net.lut.ac.uk Thu Apr 11 18:28:27 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u7QAU-00010S-00; Thu, 11 Apr 1996 18:28:26 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id SAA25013 for meta2-outgoing; Thu, 11 Apr 1996 18:27:24 +0100 (BST) Received: from oxmail3.ox.ac.uk (oxmail3.ox.ac.uk [163.1.2.9]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id SAA25008 for <meta2@mrrl.lut.ac.uk>; Thu, 11 Apr 1996 18:27:21 +0100 (BST) Received: from vax.ox.ac.uk (actually host vax) by oxmail3 with SMTP (PP); Thu, 11 Apr 1996 18:27:10 +0100 Received: by vax.ox.ac.uk (MX V4.2 VAX) id 3; Thu, 11 Apr 1996 18:27:07 +0100 Date: Thu, 11 Apr 1996 18:27:07 +0100 From: Lou Burnard <lou@vax.ox.ac.uk> To: meta2@mrrl.lut.ac.uk CC: lou@vax.ox.ac.uk Message-ID: <009A0B5C.F417559A.3@vax.ox.ac.uk> Subject: RE: Top-level description Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Jon "Jim'll" Knight writes >For my part I see the embedding of DCES attributes within HTML as a >particular mapping between the concrete syntax that the SGML boys came up >with and the limitations of the HTML DTD. I can imagine other mappings >might appear in time (PostScript structured comments, PNG extensions, >VRML headers, IAFA templates, MARC records etc, etc) if the DCES proves to >be popular and useful. Some of these "mappings" are going to be more useful than others. One kind of mapping of the DCES to PostScript might be to just print it out, for example; and one kind of mapping to HTML would say "well, if we wanted to print this Dublin Core out so it looked nice, what tags would we use to do that?" (that's option 2b in the working paper I'm hoping to announce here shortly).
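The distinction Lou draws can be made concrete. A presentational mapping only decides how the record looks; a descriptive mapping keeps every element and value recoverable from the target. Below is a minimal Python sketch of a descriptive DCES-to-HTML mapping with one <META> tag per element value. The record layout, the sample values and the "DC." NAME prefix are assumptions chosen for illustration, not the syntax defined in the working paper.

```python
from html import escape

# Hypothetical record: element name -> list of values (elements may repeat).
record = {
    "title": ["A Syntax for Dublin Core Metadata"],
    "author": ["Lou Burnard", "Eric Miller"],
    "date": ["1996-04-22"],
}


def to_html_meta(rec):
    """Descriptive mapping: one <META> tag per element value.

    The element name goes in NAME (with an assumed "DC." prefix) and the
    value in CONTENT, so a program can recover the original record; a
    presentational mapping (headings, bold text) would not allow that."""
    tags = []
    for element, values in rec.items():
        for value in values:
            tags.append('<META NAME="DC.%s" CONTENT="%s">'
                        % (escape(element), escape(value, quote=True)))
    return "\n".join(tags)


print(to_html_meta(record))
```

Repeated elements such as the two authors simply become repeated tags, which keeps the mapping reversible without inventing any internal syntax inside CONTENT.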
A useful mapping, in my book, implies that the syntax of the target system is at least as powerful as that of the source, in the sense that it allows you to represent all the inter-relations, structure, distinctions etc. that a human reader might wish to identify in the set of information carried by the DC. MARC records probably do that but I'm less sure of the others. Mappings which are lossy are, I opine, to be deprecated. Or why bother with em? Lou, on behalf of The SGML Boys From owner-meta2@net.lut.ac.uk Thu Apr 11 19:38:15 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u7RG3-00014s-00; Thu, 11 Apr 1996 19:38:15 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id TAA25606 for meta2-outgoing; Thu, 11 Apr 1996 19:38:05 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id TAA25599 for <meta2@mrrl.lut.ac.uk>; Thu, 11 Apr 1996 19:38:01 +0100 (BST) Received: by weeble.lut.ac.uk with local (Exim 0.42 #1) id E0u7RFo-00014o-00; Thu, 11 Apr 1996 19:38:00 +0100 Date: Thu, 11 Apr 1996 19:38:00 +0100 (BST) From: Jon Knight <J.P.Knight@lut.ac.uk> To: meta2@mrrl.lut.ac.uk Subject: RE: Top-level description In-Reply-To: <009A0B5C.F417559A.3@vax.ox.ac.uk> Message-ID: <Pine.SUN.3.91.960411192145.8252l-100000@weeble.lut.ac.uk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Thu, 11 Apr 1996, Lou Burnard wrote: > Mappings which are > lossy are, I opine, to be deprecated. Or why bother with em? Because there _may_ be some file formats that we'd like to embed metadata in that constrain the amount of information that we can pack in. 
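Lou's criterion can be phrased as a round-trip test: a mapping is lossless for a record if decoding the encoded form gives back exactly the same record. The Python sketch below assumes a hypothetical target format whose only metadata slot is a 60-byte field, the kind of constraint Jon describes, and packs elements into it in an assumed priority order, skipping any that do not fit. It also assumes values never contain ";".

```python
# Hypothetical constrained target: a single metadata field of at most
# LIMIT bytes holding "element=value" pairs separated by ";".
# LIMIT and PRIORITY are assumptions made for this illustration.
LIMIT = 60
PRIORITY = ["title", "author", "date", "subject"]


def encode(rec, limit=LIMIT):
    """Pack as many pairs as fit, most important element first.

    Elements that would overflow the field, or that are not in
    PRIORITY at all, are silently dropped: this mapping is lossy."""
    packed = ""
    for element in PRIORITY:
        if element not in rec:
            continue
        pair = "%s=%s" % (element, rec[element])
        candidate = pair if not packed else packed + ";" + pair
        if len(candidate.encode("utf-8")) <= limit:
            packed = candidate  # fits; a later, shorter pair may still fit too
    return packed


def decode(packed):
    """Inverse of encode(); assumes values never contain ';'."""
    return dict(pair.split("=", 1) for pair in packed.split(";") if pair)


def is_lossless_for(rec):
    """The round-trip criterion: nothing lost, nothing altered."""
    return decode(encode(rec)) == rec


short = {"title": "Warwick", "date": "1996-04"}
long_ = {"title": "A Syntax for Dublin Core Metadata",
         "author": "Lou Burnard", "date": "1996-04-22"}
print(is_lossless_for(short))   # True
print(is_lossless_for(long_))   # False: the date did not fit
print(decode(encode(long_)))
```

The priority list is where "something is better than nothing" gets encoded: the title and author survive even when the date cannot.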
Obviously lossless mappings are an ideal we should aspire to (and maybe they're possible in all situations and I'm just being overly pessimistic) but I don't think we should discount lossy mappings in all circumstances. There may be occasions in the future where there are constraints imposed by the target format in which we wish to embed DCES metadata that prevent us from including all the metadata we'd like. In these situations it might be useful to define a lossy mapping that crams as much of the useful metadata as possible into the object's file format. I seem to remember a number of people saying that (paraphrased) "something is better than nothing", which is a view with which I heartily concur. Anyway, this is a minor side point that we shouldn't get too distracted with. Getting that lossless mapping for embedding DCES into HTML is the important thing and I don't want to distract you from that! :-) Tatty bye, Jim'll -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU. * I've found I now dream in Perl. More worryingly, I enjoy those dreams.
* From owner-meta2@net.lut.ac.uk Mon Apr 15 17:32:52 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u8rCt-0000J9-00; Mon, 15 Apr 1996 17:32:51 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id RAA18149 for meta2-outgoing; Mon, 15 Apr 1996 17:31:26 +0100 (BST) Received: from ns.onet.on.ca (ns.onet.on.ca [130.185.89.125]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id RAA18140 for <meta2@mrrl.lut.ac.uk>; Mon, 15 Apr 1996 17:31:03 +0100 (BST) Received: from sqarc.sq.com ([192.31.6.128]) by ns.onet.on.ca with SMTP id <252201>; Mon, 15 Apr 1996 12:31:07 -0400 Received: from sqrex.sq.com by sqarc.sq.com with smtp (Smail3.1.29.1 #4) id m0u8rAQ-000OjVC; Mon, 15 Apr 96 12:30 EDT Received: by sqrex.sq.com (4.1//ident-1.0) id AA20564; Mon, 15 Apr 96 12:30:18 EDT Date: Mon, 15 Apr 96 12:30:18 EDT From: lee@sq.com Message-Id: <9604151630.AA20564@sqrex.sq.com> To: meta2@mrrl.lut.ac.uk Subject: [fwd] Microsoft enters the metadata fray? Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I am not sure what difference, if any, this makes to what was discussed in Coventry-- er, at Warwick, but I'm forwarding it for information. Lee Liam Quin, SoftQuad Inc., lee@sq.com ----- Begin Included Message ----- From: rongus@tiac.net (Ron Gustavson) To: ebook-list@aros.net Subject: Copyright and Cryptolopes Date: Sat, 30 Mar 1996 08:06:57 GMT Reply-To: ebook-list@aros.net I just read about this in Clari News--it concerns a technology that IBM is promoting for publishers to control their rights, royalties, etc. IBM cryptolopes, or "encrypted envelopes," provide information in sealed containers that include a description of the contents, the size of the file, any coupons or other promotions associated with the information, and use and pricing info... 
These are supposed to be available for trial from the IBM web site and infoMarket in April. Other URLs you may want to check out regarding intellectual property rights are: http://www.hotwired.com/wired/4.01/features/whitepaper.html gopher://iitf.doc.gov/00/papers/documents/files/ipnii.txt http://home.worldweb.net/dfc/ ____________________________________________________ Ron Gustavson <rongus@tiac.net> NO-8-DO <75144.1333@compuserve.com> ----------------------------------------------------------- Thanks for using EBOOK-List, Discussion on Electronic Books Post Message: ebook-list@aros.net Get Commands: majordomo@aros.net "help" Administrator: noring@netcom.com Unsubscribe: majordomo@aros.net "unsubscribe ebook-list" ----- End Included Message ----- From owner-meta2@net.lut.ac.uk Wed Apr 17 21:26:14 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0u9dnp-0002vV-00; Wed, 17 Apr 1996 21:26:13 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id VAA29907 for meta2-outgoing; Wed, 17 Apr 1996 21:25:42 +0100 (BST) Received: from sulmail.Stanford.EDU (sulmail.Stanford.EDU [36.31.0.12]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id VAA29902 for <meta2@mrrl.lut.ac.uk>; Wed, 17 Apr 1996 21:25:38 +0100 (BST) Received: from [36.120.0.23] (SUL-Math-RL.Stanford.EDU [36.120.0.23]) by sulmail.Stanford.EDU (8.6.12/8.6.6) with SMTP id NAA121924; Wed, 17 Apr 1996 13:20:19 -0700 From: Rebecca Lasher <rlasher@sulmail.Stanford.EDU> To: meta2@mrrl.lut.ac.uk cc: rlasher@sulmail.Stanford.EDU Subject: Ballads of Dublin Core and Warwick Framework Message-ID: <SIMEON.9604171251.F@muahost.sulmail> Date: Wed, 17 Apr 1996 12:26:51 -0800 Priority: NORMAL X-Mailer: Simeon for Macintosh X-Authentication: none MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk TO: MetaData Workshop 
Participants FROM: Rebecca Lasher Chris Weider and I were building stories in London after the workshop. He came up with the concept of metaographer and the tale was built from that. My husband, Gary Wesley, helped with the meter. The ballads are meant to be fun. I never realized how hard it is to write poetry on a technical subject. Difficult if not impossible words include, interoperability, integration, implementation, and convergence. Enjoy. --------------------------------------------------- Ballad of the DUBLIN Core There once was a group of metaographers Trying to imitate bibliographers They met in Dublin with one thing in mind To find out which fields are the best kind They inspected each element, tore them apart And argued til sleep forced them to part They declared 13 fields they saw were the best For resource description just they pass the test The metaographers were happy as they went out the door Their work will be known as the Dublin Core ***** Ballad of the Warwick Framework One year later the group met once more Their task to extend the great Dublin Core They met in Warwick, diverse goals in mind Implementation details of all kinds Some wanted to focus on metadata syntax Others to extend the Core to the max By end of Day Three consensus was spied Concept of container with packages inside A new architecture with packages and types Intellectual content not just the bytes Our leaders were the best, they were not jerks The result is known as Warwick Framework From owner-meta2@net.lut.ac.uk Mon Apr 22 09:22:31 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uBGtC-0000iF-00; Mon, 22 Apr 1996 09:22:30 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id JAA25057 for meta2-outgoing; Mon, 22 Apr 1996 09:19:02 +0100 (BST) Received: from oxmail3.ox.ac.uk (oxmail3.ox.ac.uk [163.1.2.9]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id 
JAA25050 for <meta2@mrrl.lut.ac.uk>; Mon, 22 Apr 1996 09:18:59 +0100 (BST) Received: from vax.ox.ac.uk (actually host vax) by oxmail3 with SMTP (PP); Mon, 22 Apr 1996 09:18:49 +0100 Received: by vax.ox.ac.uk (MX V4.2 VAX) id 12; Mon, 22 Apr 1996 09:18:43 +0100 Date: Mon, 22 Apr 1996 09:18:42 +0100 From: Lou Burnard <lou@vax.ox.ac.uk> To: meta2@mrrl.lut.ac.uk CC: lou@vax.ox.ac.uk Message-ID: <009A13B5.29DEE3E7.12@vax.ox.ac.uk> Subject: Syntax for Dublin Core: paper available Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk As promised here a week or two ago, a document is now available which presents the recommendations on syntax for metadata discussed at Warwick at the beginning of this month. The document is titled "A Syntax for Dublin Core Metadata: Recommendations from the Second Metadata Workshop" and its authors are Lou Burnard, Eric Miller, Liam Quin, and C.M. Sperberg-McQueen. You can find it in html at http://info.ox.ac.uk/~lou/wip/metadata.syntax.html OR http://www.uic.edu/~cmsmcq/tech/metadata.syntax.html It is also available in SGML (try Panorama on it!) at http://info.ox.ac.uk/~lou/wip/metadata.syntax.tei OR http://info.ox.ac.uk/~cmsmcq/tech/metadata.syntax.tei The document summarizes a set of recommendations concerning the representation of metadata, derived from discussion within the syntax working group which met at the second Metadata Workshop, held at Warwick University in April 1996. The discussion begun in Warwick has been continued electronically by the current authors, and this paper presents both the recommendations agreed on by the syntax working group in Warwick and some further developments for which the authors alone are responsible. Here's the executive summary: * that recommendations be made showing how to use the HTML <meta> element for Dublin-Core metadata; examples are included in the document. * that a standard canonical syntax be defined for Dublin-Core metadata, using SGML syntax. 
The working group defined no DTD, but a possible DTD devised by the authors is included in the document. Discussions in Warwick also led to an informal demonstration of how SGML could be used as the mechanism for encoding the containers and metadata packages foreseen in the Warwick Framework. A sample DTD for such packages is included in the document. From owner-meta2@net.lut.ac.uk Mon Apr 22 13:28:47 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uBKjW-0000xD-00; Mon, 22 Apr 1996 13:28:46 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id NAA25587 for meta2-outgoing; Mon, 22 Apr 1996 13:27:41 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id NAA25582 for <meta2@mrrl.lut.ac.uk>; Mon, 22 Apr 1996 13:27:38 +0100 (BST) Received: from fssun09.dev.oclc.org [132.174.24.10] by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uBKiN-0000x2-00; Mon, 22 Apr 1996 13:27:36 +0100 Received: from ws02-00.rsch.oclc.org by fssun09.dev.oclc.org (4.1/SMI-4.1) id AA23699; Mon, 22 Apr 96 08:09:50 EDT From: weibel@oclc.org (Stu Weibel) Received: (weibel@localhost) by ws02-00.rsch.oclc.org (8.6.10/8.6.9) for meta2@mrrl.lut.ac.uk id IAA01496; Mon, 22 Apr 1996 08:09:47 -0400 Date: Mon, 22 Apr 1996 08:09:47 -0400 Message-Id: <199604221209.IAA01496@ws02-00.rsch.oclc.org> To: meta2@mrrl.lut.ac.uk Subject: W3C indexing workshop X-Sun-Charset: US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Metafolk, I'm very pleased to be seeing the good stuff coming across the list... looks like we have some good beginnings on documents to support the work we did in Warwick. Also, a note about the W3C Distributed Indexing workshop. 
I have submitted a position paper (http://www.oclc.org:5046/~weibel/dist_indexing.html) to participate in the workshop, but I encourage others who are inclined to attend to also do so. I received a revised call this past weekend indicating the deadline had been extended: > ***POSITION PAPER DEADLINE HAS BEEN EXTENDED*** > > Position papers due: May 6, 1996 > Acceptance notifications: May 13, 1996 stu From owner-meta2@net.lut.ac.uk Mon Apr 22 20:58:45 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uBRkz-0001NK-00; Mon, 22 Apr 1996 20:58:45 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id UAA29492 for meta2-outgoing; Mon, 22 Apr 1996 20:58:24 +0100 (BST) Received: from UICVM.UIC.EDU (UICVM-ETH1.CC.UIC.EDU [128.248.2.150]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id UAA29487 for <meta2@MRRL.LUT.AC.UK>; Mon, 22 Apr 1996 20:58:15 +0100 (BST) Message-Id: <199604221958.UAA29487@gizmo.lut.ac.uk> Received: from UICVM.CC.UIC.EDU by UICVM.UIC.EDU (IBM VM SMTP V2R2) with BSMTP id 4462; Mon, 22 Apr 96 14:57:42 CDT Received: from UICVM (NJE origin U35395@UICVM) by UICVM.CC.UIC.EDU (LMail V1.2a/1.8a) with BSMTP id 3363; Mon, 22 Apr 1996 14:57:42 -0500 Date: Mon, 22 Apr 96 14:55:27 CDT From: "C. M. Sperberg-McQueen" <U35395@UICVM.CC.UIC.EDU> Organization: ACH/ACL/ALLC Text Encoding Initiative Subject: Re: Syntax for Dublin Core: paper available To: meta2@mrrl.lut.ac.uk In-Reply-To: Message of Mon, 22 Apr 1996 09:18:42 +0100 from <lou@vax.ox.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Mon, 22 Apr 1996 09:18:42 +0100 <lou@vax.ox.ac.uk> said: >As promised here a week or two ago, a document is now available which >presents the recommendations on syntax for metadata discussed at Warwick >at the beginning of this month. > ... >It is also available in SGML (try Panorama on it!) 
at > > http://info.ox.ac.uk/~lou/wip/metadata.syntax.tei >OR > http://info.ox.ac.uk/~cmsmcq/tech/metadata.syntax.tei For the latter, read http://www.uic.edu/~cmsmcq/tech/metadata.syntax.tei I'm having a little bit of trouble with the style sheets just at the moment, but hope to iron it out soon. -Michael Sperberg-McQueen From owner-meta2@net.lut.ac.uk Tue Apr 23 18:14:45 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uBlfp-0002OT-00; Tue, 23 Apr 1996 18:14:45 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id SAA05710 for meta2-outgoing; Tue, 23 Apr 1996 18:12:36 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id SAA05705 for <meta2@mrrl.lut.ac.uk>; Tue, 23 Apr 1996 18:12:33 +0100 (BST) Received: by weeble.lut.ac.uk with local (Exim 0.42 #1) id E0uBldg-0002OG-00; Tue, 23 Apr 1996 18:12:32 +0100 Date: Tue, 23 Apr 1996 18:12:32 +0100 (BST) From: Jon Knight <J.P.Knight@lut.ac.uk> To: meta2@mrrl.lut.ac.uk cc: lou@vax.ox.ac.uk Subject: Re: Syntax for Dublin Core: paper available In-Reply-To: <009A13B5.29DEE3E7.12@vax.ox.ac.uk> Message-ID: <Pine.SUN.3.91.960423174914.5037M-100000@weeble.lut.ac.uk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Mon, 22 Apr 1996, Lou Burnard wrote: > As promised here a week or two ago, a document is now available which > presents the recommendations on syntax for metadata discussed at Warwick > at the beginning of this month. I like the look of the unstructured <META> tags for HTML2, and the standalone DC DTD also looks good.
Just one minor point struck me when reading through the first example in section 2.1; why does the "scheme" part of otheragent (transcriber) get separated from the name by a colon whereas the "scheme" part of date, form and language are surrounded by brackets? Surely we're attaching some internal syntax into the value of the name attribute to the <META> element and so we should be consistent? However, I'm still a little unclear how the last part of the proposal will work (the SGML DTD for the Warwick Framework). Am I right in thinking that all the contents of an instance of the WF would have to be "SGML friendly" in order to be included? If not, how would a non-SGML metadata format (say PICS or IAFA templates or even a binary file) be embedded into a document conforming to the WF DTD without breaking SGML parsers? Is there a way in SGML of including variable value boundary markers (like those found in MIME) so that the content of the metadata packages can be distinguished from the WF stuff that surrounds them? Using external references might not be an option (for example if you want to drop a whole load of WF containers onto a laptop to work on during a flight). I can see how MIME can handle these but I can't see how SGML can do it (from what I know of SGML and I'm certainly no expert in that complex and wacky world :-) ). Maybe there is some special content encoding that can protect the SGML elements from the vagaries of the metadata packages that they surround? Tatty bye, Jim'll -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU. * I've found I now dream in Perl. More worryingly, I enjoy those dreams.
* From owner-meta2@net.lut.ac.uk Tue Apr 23 19:16:42 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uBmdm-0002Rv-00; Tue, 23 Apr 1996 19:16:42 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id TAA06027 for meta2-outgoing; Tue, 23 Apr 1996 19:16:04 +0100 (BST) Received: from ns.onet.on.ca (ns.onet.on.ca [130.185.89.125]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id TAA06022 for <meta2@mrrl.lut.ac.uk>; Tue, 23 Apr 1996 19:15:57 +0100 (BST) Received: from sqarc.sq.com ([192.31.6.128]) by ns.onet.on.ca with SMTP id <253145>; Tue, 23 Apr 1996 14:15:34 -0400 Received: from sqrex.sq.com by sqarc.sq.com with smtp (Smail3.1.29.1 #4) id m0uBmbw-000Ok4C; Tue, 23 Apr 96 14:14 EDT Received: by sqrex.sq.com (4.1//ident-1.0) id AA03446; Tue, 23 Apr 96 14:14:48 EDT Date: Tue, 23 Apr 96 14:14:48 EDT From: lee@sq.com Message-Id: <9604231814.AA03446@sqrex.sq.com> To: meta2@mrrl.lut.ac.uk Subject: Re: Syntax for Dublin Core: paper available Cc: lou@vax.ox.ac.uk Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I have a quick implementation question... Where does Author's Affiliation (or, Company) go, using the HTML META tag method? I am more and more coming to like the idea of supporting separate files using <LINK REL="metadata/dublinCore" href="theFile.meta"> <LINK REL="metadata/IAFA" href="theFile.iafa"> and so on. For a non-HTML document, you could also have <LINK REL="instance" type="application/PostScript" href="file.ps"> and simply link everything with an HTML `document' containing only the metadata. This is partly because I think that the HTTP people are also working on ways of shipping bundles of stuff, and probably we shouldn't be trying to solve compound documents. It's very rude of me to be attacking a document with my name on it, I know. 
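With Lee's separate-file approach, a robot only has to find the <LINK> elements in an HTML page. A minimal sketch using Python's standard html.parser module follows; the "metadata/<scheme>" and "instance" REL values and the file names come from Lee's example and are his proposal, not link types defined by any HTML specification.

```python
from html.parser import HTMLParser


class MetadataLinkFinder(HTMLParser):
    """Collect the metadata and instance links from an HTML page.

    Follows the REL conventions in Lee's example ("metadata/<scheme>"
    and "instance"); they are proposals, not standard link types."""

    def __init__(self):
        super().__init__()
        self.metadata = {}    # scheme name -> href of the metadata file
        self.instances = []   # (MIME type, href) of the described object

    def handle_starttag(self, tag, attrs):
        if tag != "link":     # html.parser lower-cases tag and attribute names
            return
        a = dict(attrs)
        rel, href = a.get("rel") or "", a.get("href")
        if not href:
            return
        if rel.lower().startswith("metadata/"):
            self.metadata[rel.split("/", 1)[1]] = href
        elif rel.lower() == "instance":
            self.instances.append((a.get("type"), href))


page = """<HTML><HEAD><TITLE>Metadata only</TITLE>
<LINK REL="metadata/dublinCore" href="theFile.meta">
<LINK REL="metadata/IAFA" href="theFile.iafa">
<LINK REL="instance" type="application/PostScript" href="file.ps">
</HEAD></HTML>"""

finder = MetadataLinkFinder()
finder.feed(page)
print(finder.metadata)   # {'dublinCore': 'theFile.meta', 'IAFA': 'theFile.iafa'}
print(finder.instances)  # [('application/PostScript', 'file.ps')]
```

An indexer that only understands IAFA templates could then fetch theFile.iafa directly, with no special unpacker, which is the property Lee is after.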
But I am coming to think that the approach I am suggesting is much easier for WWW robots to handle, and allows binary formats to exist, and doesn't _require_ a special unpacker -- you can leave the files loose if that's convenient for you, and software that uses (say) IAFA templates to build an FTP archive index (or whatever) can find them without being modified. Having said that... SGML can easily point to external files. A recent modification to SGML (the `corrigendum') allows you to include foreign objects, too, although they mustn't contain the 2-character sequence "</" in them. MIME base64 is probably OK, but I haven't checked. (the restriction is for backwards compatibility) Sorry to include so many issues in one letter. Lee -- Liam Quin, SoftQuad Inc +1 416 239 4801 lee@sq.com From owner-meta2@net.lut.ac.uk Wed Apr 24 21:10:45 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCAth-0004Hb-00; Wed, 24 Apr 1996 21:10:45 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id VAA13042 for meta2-outgoing; Wed, 24 Apr 1996 21:10:00 +0100 (BST) Received: from UICVM.UIC.EDU (UICVM.CC.UIC.EDU [128.248.100.50]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id VAA13035 for <meta2@MRRL.LUT.AC.UK>; Wed, 24 Apr 1996 21:09:55 +0100 (BST) Message-Id: <199604242009.VAA13035@gizmo.lut.ac.uk> Received: from UICVM.CC.UIC.EDU by UICVM.UIC.EDU (IBM VM SMTP V2R2) with BSMTP id 2754; Wed, 24 Apr 96 15:09:30 CDT Received: from UICVM (NJE origin U35395@UICVM) by UICVM.CC.UIC.EDU (LMail V1.2a/1.8a) with BSMTP id 3572; Wed, 24 Apr 1996 15:09:30 -0500 Date: Wed, 24 Apr 96 14:54:23 CDT From: "C. M.
Sperberg-McQueen" <U35395@UICVM.CC.UIC.EDU> Organization: ACH/ACL/ALLC Text Encoding Initiative Subject: Re: Syntax for Dublin Core: paper available To: meta2@mrrl.lut.ac.uk In-Reply-To: Message of Tue, 23 Apr 1996 18:12:32 +0100 (BST) from <J.P.Knight@lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Tue, 23 Apr 1996 18:12:32 +0100 (BST) Jon Knight said: > ... Just one minor point struck me when reading >through the first example in section 2.1; why does the "scheme" part of >otheragent (transcriber) get separated from the name by a colon whereas >the "scheme" part of date, form and language are surrounded by brackets? >Surely we're attaching some internal syntax into the value of the name >attribute to the <META> element and so we should be consistent? I don't believe that 'transcriber' is a SCHEME; it specifies the role of the other agent being named, not the rule set governing the identification and/or transcription of the name. Hence the variant improvised syntax. >However I'm a little unclear how the last part of the proposal will work >still (the SGML DTD for the Warwick Framework). Am I right in thinking >that all the contents of an instance of the WF would have to be "SGML >friendly" in order to be included? If not, how would a non-SGML metadata >format (say PICS or IAFA templates or even a binary file) be embedded into >a document conforming to the WF DTD without breaking SGML parsers? Is The SGML DTD for the Warwick Framework provides a method of naming a lot of packages in a single container. Since the packages may not be contained physically within the SGML document entity, the DTD per se does not address the problem I think you are concerned with, namely packaging it all up in a single data stream for shipment over a network. 
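Lee's remark above that MIME base64 is "probably OK" as SGML foreign-object content can be checked directly. The base64 alphabet consists of A-Z, a-z, 0-9, "+" and "/", plus the "=" pad character, so an encoded stream (line breaks included) can never contain "<", and therefore never the sequence "</". The Python sketch below confirms this for sample data that itself contains "</"; it says nothing about whether a particular SGML parser implements the foreign-object feature.

```python
import base64
import string

# Characters that can appear in MIME-style base64 output (RFC 2045):
# the 64-symbol alphabet, the "=" pad, and the newlines that
# base64.encodebytes() inserts after every 76 output characters.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=\n")


def encode_for_sgml(data: bytes) -> str:
    """Base64-encode arbitrary bytes for inclusion as foreign content."""
    encoded = base64.encodebytes(data).decode("ascii")
    # Every output character comes from the base64 set, which has no "<".
    assert set(encoded) <= BASE64_CHARS
    return encoded


# Sample payload that deliberately contains the forbidden sequence.
payload = b"</metadata>" + bytes(range(256))
text = encode_for_sgml(payload)
print(b"</" in payload, "</" in text)   # True False
assert base64.decodebytes(text.encode("ascii")) == payload  # lossless round trip
```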
There are a number of ways people go about packing up the entities referred to from an SGML document; unlike the Warwick Framework DTD, they do address the problem of shipment over the net. There is the SGML Document Interchange Format (SDIF) defined by ISO a few years back, but I don't know whether anyone uses it or not. The SGML Open industry consortium has also been working on this problem, but I don't know the state of play. I hope Lee can tell us all a bit more, or point toward the right documentation. As far as I can tell, MIME could also be used as a packing tool for packaging sets of entities. (But I don't know enough about MIME to say for sure, or to know what would still need solving.) >there a way in SGML of including variable value boundary markers (like >those found in MIME) so that the content of the metadata packages can be >distinguished from the WF stuff that surrounds them? Using external >references might not be an option (for example if you want to drop a >whole load of WF containers onto a laptop to work on during a flight). SGML itself abstracts away from transmission media and does NOT attempt to deal with this, or require a particular method of dealing with it. Any software that can map from a system-dependent name to a data stream and can interact with the SGML parser can, in principle, serve as an SGML entity manager. (Which is not to say making the parser and entity manager talk to each other will necessarily be easy.) -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago u35395@uicvm.uic.edu / u35395@uicvm All opinions expressed in this note (except those I have quoted with a view to refuting them) are mine. They are not necessarily those of the Text Encoding Initiative, its executive committee or other participants, its sponsors, or its funders. Anyone who says otherwise is wrong. 
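The suggestion above that MIME could serve as the packing tool also covers Jon's laptop scenario. Below is a minimal Python sketch, assuming a container/package model in which each package has a type and is either carried inline (in any format, binary included) or referenced indirectly, and containers may nest. It serializes the tree as nested multipart/mixed parts, so MIME boundaries rather than SGML markup separate the packages. The X-WF-* header names, the package type labels and the example.org URL are all hypothetical.

```python
import email
from dataclasses import dataclass, field
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List, Optional, Union


@dataclass
class Package:
    """A typed metadata package, carried inline or by reference."""
    pkg_type: str                    # e.g. "dublin-core" (hypothetical label)
    content: Optional[bytes] = None  # inline bytes: any format, even binary
    ref: Optional[str] = None        # or an indirect reference such as a URL


@dataclass
class Container:
    """A container of packages; containers may nest recursively."""
    items: List[Union[Package, "Container"]] = field(default_factory=list)


def to_mime(node: Union[Package, Container]) -> Message:
    """Serialize a container tree as nested multipart/mixed MIME parts.
    The X-WF-* header names are invented for this sketch."""
    if isinstance(node, Container):
        outer = MIMEMultipart("mixed")
        for item in node.items:
            outer.attach(to_mime(item))
        return outer
    if node.ref is not None:
        part = MIMEText(node.ref, "plain")       # by reference: just the URL
        part["X-WF-Reference"] = "yes"
    else:
        part = MIMEApplication(node.content or b"")  # inline, base64-encoded
    part["X-WF-Package-Type"] = node.pkg_type
    return part


wf = Container([
    Package("dublin-core", content=b"title=A Syntax for Dublin Core Metadata"),
    Package("iafa", ref="http://example.org/theFile.iafa"),
    Container([Package("thumbnail", content=bytes([0, 255, 60, 47]))]),
])
stream = to_mime(wf).as_string()          # one self-contained text stream

# Reading it back: walk() visits parts depth-first, nested containers included.
parsed = email.message_from_string(stream)
types = [p["X-WF-Package-Type"] for p in parsed.walk() if p["X-WF-Package-Type"]]
print(types)   # ['dublin-core', 'iafa', 'thumbnail']
```

This only shows that bundling heterogeneous packages into one stream is straightforward; whether MIME can express the full abstraction, including strong typing, is a separate question.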
From owner-meta2@net.lut.ac.uk Wed Apr 24 21:35:02 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCBHC-0004Ii-00; Wed, 24 Apr 1996 21:35:02 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id VAA13245 for meta2-outgoing; Wed, 24 Apr 1996 21:34:57 +0100 (BST) Received: from simon.cs.cornell.edu (SIMON.CS.CORNELL.EDU [128.84.154.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id VAA13239 for <meta2@mrrl.lut.ac.uk>; Wed, 24 Apr 1996 21:34:54 +0100 (BST) Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id QAA00537 for <meta2@mrrl.lut.ac.uk>; Wed, 24 Apr 1996 16:34:53 -0400 Received: from CARL-LAPTOP.CS.CORNELL.EDU (CARL-LAPTOP.CS.CORNELL.EDU [128.84.211.11]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id QAA28011 for <meta2@mrrl.lut.ac.uk>; Wed, 24 Apr 1996 16:34:51 -0400 Received: by CARL-LAPTOP.CS.CORNELL.EDU with Microsoft Mail id <01BB31FB.4DA79040@CARL-LAPTOP.CS.CORNELL.EDU>; Wed, 24 Apr 1996 16:30:09 -0400 Message-ID: <01BB31FB.4DA79040@CARL-LAPTOP.CS.CORNELL.EDU> From: Carl Lagoze <lagoze@cs.cornell.edu> To: "meta2@mrrl.lut.ac.uk" <meta2@mrrl.lut.ac.uk> Subject: RE: Syntax for Dublin Core: paper available Date: Wed, 24 Apr 1996 16:30:06 -0400 Encoding: 67 TEXT Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Since I am right in the middle of writing up the description of the Warwick Framework abstraction for the workshop report, I thought that it might be appropriate to throw my two cents in here. Two points: - I think that it is important at this point to pay most attention to the abstraction offered by the Warwick Framework before debating/discussing its implementation.
The concept of recursive containers of arbitrarily complex, typed objects (possibly referenced indirectly) is a powerful abstraction that might be implemented in a variety of ways. - Now I will violate my first point. While I don't pretend to be an expert about SGML or MIME, my intuition is that neither of these technologies is sufficiently powerful to fully express the abstraction. I think for some relatively simple examples, SGML and MIME might be entirely appropriate. My prejudice, however, is to model this using CORBA or ILU and rely on the strong typing provided by the distributed object model. Carl Carl Lagoze Project Leader, Digital Library Research Group Department of Computer Science, Cornell University Ithaca, NY 14853 phone: 607-255-6046 FAX: 607-255-4428 From owner-meta2@net.lut.ac.uk Wed Apr 24 23:45:33 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCDJU-0004Oj-00; Wed, 24 Apr 1996 23:45:32 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id XAA13709 for meta2-outgoing; Wed, 24 Apr 1996 23:45:28 +0100 (BST) Received: from simon.cs.cornell.edu (SIMON.CS.CORNELL.EDU [128.84.154.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id XAA13704 for <meta2@mrrl.lut.ac.uk>; Wed, 24 Apr 1996 23:45:26 +0100 (BST) Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id SAA05894 for <meta2@mrrl.lut.ac.uk>; Wed, 24 Apr 1996 18:45:25 -0400 Received: from CARL-LAPTOP (CARL-LAPTOP.CS.CORNELL.EDU [128.84.211.11]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id SAA02070 for <meta2@mrrl.lut.ac.uk>; Wed, 24 Apr 1996 18:45:23 -0400 Received: by CARL-LAPTOP with Microsoft Mail id <01BB320D.89F32E80@CARL-LAPTOP >; Wed, 24 Apr 1996 18:40:41 -0400 Message-ID: <01BB320D.89F32E80@CARL-LAPTOP > From: Carl Lagoze <lagoze@cs.cornell.edu> To: "'meta2'" <meta2@mrrl.lut.ac.uk> Subject: Metadata Council Date: Wed, 24 Apr 1996 18:40:35 -0400 Encoding: 15 TEXT Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk This afternoon I stumbled into something called the metadata council, that is an industry effort for
defining metadata standards across enterprise data management tools. You can look at http://www.evtech.com/newmeta.html for more information. Does anybody have any contact with these folks? Are they doing anything relevant to our efforts? Carl Carl Lagoze Project Leader, Digital Library Research Group Department of Computer Science, Cornell University Ithaca, NY 14853 phone: 607-255-6046 FAX: 607-255-4428 From owner-meta2@net.lut.ac.uk Thu Apr 25 01:38:06 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCF4Q-0004Tu-00; Thu, 25 Apr 1996 01:38:06 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id BAA14225 for meta2-outgoing; Thu, 25 Apr 1996 01:37:57 +0100 (BST) Received: from ns.onet.on.ca (ns.onet.on.ca [130.185.89.125]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id BAA14220 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 01:37:54 +0100 (BST) Received: from sqarc.sq.com ([192.31.6.128]) by ns.onet.on.ca with SMTP id <253287>; Wed, 24 Apr 1996 20:37:54 -0400 Received: from sqrex.sq.com by sqarc.sq.com with smtp (Smail3.1.29.1 #4) id m0uCF3W-000Ok4C; Wed, 24 Apr 96 20:37 EDT Received: by sqrex.sq.com (4.1//ident-1.0) id AA26457; Wed, 24 Apr 96 20:37:09 EDT Date: Wed, 24 Apr 96 20:37:09 EDT From: lee@sq.com Message-Id: <9604250037.AA26457@sqrex.sq.com> To: meta2@mrrl.lut.ac.uk Subject: Re: Metadata Council Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Carl Lagoze <lagoze@cs.cornell.edu> wrote: > [...] the metadata council, [...] > an industry effort for defining metadata standards across enterprise > data management tools. You can look at http://www.evtech.com/newmeta.html > for more information. Does anybody have any contact with these folks? Are > they doing anything relevant to our efforts? I don't know anything more than the web page... 
I note that it's a consortium you pay $2,500 per year to join (which is fairly cheap as these things go, for what that's worth), and that as far as I can tell from the web pages, they've done little more than make press releases about how each of the `Six leading industry vendors' will work together. Oh, actually they mention a white paper, which I couldn't see online. So it's really rather hard to tell. I suspect it's actually only very peripherally related to the Dublin Core stuff, though. The Web page also says: Organizations interested in joining the Metadata Coalition should contact either of the Council co-chairs: Patricia Nghiem (408-973-9300) or Katherine Hammer (512-327-6994). For more information about the META Group, please contact Heather Whiteman (203-973-6700). Lee From owner-meta2@net.lut.ac.uk Thu Apr 25 01:55:01 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCFKm-0004Ul-00; Thu, 25 Apr 1996 01:55:00 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id BAA14326 for meta2-outgoing; Thu, 25 Apr 1996 01:54:52 +0100 (BST) Received: from ns.onet.on.ca (ns.onet.on.ca [130.185.89.125]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id BAA14321 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 01:54:49 +0100 (BST) Received: from sqarc.sq.com ([192.31.6.128]) by ns.onet.on.ca with SMTP id <253292>; Wed, 24 Apr 1996 20:54:52 -0400 Received: from sqrex.sq.com by sqarc.sq.com with smtp (Smail3.1.29.1 #4) id m0uCFK2-000Ok4C; Wed, 24 Apr 96 20:54 EDT Received: by sqrex.sq.com (4.1//ident-1.0) id AA26508; Wed, 24 Apr 96 20:54:13 EDT Date: Wed, 24 Apr 96 20:54:13 EDT From: lee@sq.com Message-Id: <9604250054.AA26508@sqrex.sq.com> To: meta2@mrrl.lut.ac.uk Subject: RE: Syntax for Dublin Core: paper available Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Carl Lagoze <lagoze@cs.cornell.edu> wrote: > The
concept of recursive containers of arbitrarily > complex, typed objects (possibly referenced indirectly) is a powerful > abstraction that might be implemented in a variety of ways. Agreed. This is, in fact, what SGML is for :-) > - Now I will violate my first point. While I don't pretend to be an expert > about SGML or MIME, my intuition is that both of these technologies are not > sufficiently powerful to fully express the abstraction. I think for some > relatively simple examples, SGML and MIME would be entirely appropriate. My > prejudice, however, is to model this using CORBA or ILU and rely on the > strong typing provided by the distributed object model. The trouble here is that you're likely to end up with something that the grass-roots barefoot-programmer software on the Web can't deal with. Unfortunately, if that happens, we've failed. E.g. Windows 95 doesn't come with CORBA, but it does come with HTML software. I actually think that packaging objects up in any way at all is a little risky, and that we should certainly allow a one-top-level-object-per-file granularity for those people for whom it makes sense. It's the simple cases that we have to solve. We're not reinventing a distributed version of MARC here :-) SGML does, however, have object types. It's not too hot on methods, but it's not clear that many explicit methods are needed -- people will interpret the data in the way that's most useful for their indexing software.
Lee -- Liam Quin, SoftQuad Inc +1 416 239 4801 lee@sq.com From owner-meta2@net.lut.ac.uk Thu Apr 25 02:08:41 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCFY0-0004VY-00; Thu, 25 Apr 1996 02:08:40 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id CAA14426 for meta2-outgoing; Thu, 25 Apr 1996 02:08:37 +0100 (BST) Received: from ns.onet.on.ca (ns.onet.on.ca [130.185.89.125]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id CAA14421 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 02:08:33 +0100 (BST) Received: from sqarc.sq.com ([192.31.6.128]) by ns.onet.on.ca with SMTP id <253286>; Wed, 24 Apr 1996 21:08:04 -0400 Received: from sqrex.sq.com by sqarc.sq.com with smtp (Smail3.1.29.1 #4) id m0uCFWq-000Ok4C; Wed, 24 Apr 96 21:07 EDT Received: by sqrex.sq.com (4.1//ident-1.0) id AA26589; Wed, 24 Apr 96 21:07:28 EDT Date: Wed, 24 Apr 96 21:07:28 EDT From: lee@sq.com Message-Id: <9604250107.AA26589@sqrex.sq.com> To: meta2@mrrl.lut.ac.uk Subject: Re: Syntax for Dublin Core: paper available Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk > There are a number of ways people go about packing up the entities > referred to from an SGML document; unlike the Warwick Framework DTD, > they do address the problem of shipment over the net. I just want to point out that there's an IETF working group charged with coming up with a way to send SGML over a MIME stream. There are two proposals. The Mimiest (is that a word?) seems to involve modifying all MIME software, to add the multipart-related concept. The less general approach involves a modification to the SGML OPEN TR for CATALOG interchange. Hmm, that's too much jargon for one paragraph. If you want to know more, you can look at the IETF archives or feel free to mail me. Neither proposal is implemented yet I think, although James Clark is trying to do both, as is EBT. 
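A minimal sketch of the multipart/related packaging Lee mentions, assuming Python's standard email package rather than any of the tools under discussion in 1996. The function names pack_container and unpack_container, the package names, and the Content-ID domain container.example are all hypothetical. Each metadata package becomes one text part with its own Content-ID, so that a catalog or another part could refer to it, and the whole container travels as a single MIME stream.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def pack_container(packages):
    """Bundle metadata packages into one multipart/related MIME stream.

    `packages` maps a (hypothetical) package name to a (subtype, body)
    pair, e.g. {"dublin-core": ("sgml", "<DC>...</DC>")}.
    """
    container = MIMEMultipart("related")
    for name, (subtype, body) in packages.items():
        part = MIMEText(body, subtype)  # e.g. Content-Type: text/sgml
        # Content-ID lets other parts, or a catalog, point at this package.
        part["Content-ID"] = "<%s@container.example>" % name
        container.attach(part)
    return container.as_string()


def unpack_container(raw):
    """Recover {name: (subtype, body)} from a stream made by pack_container."""
    msg = email.message_from_string(raw)
    packages = {}
    for part in msg.get_payload():
        name = part["Content-ID"].strip("<>").split("@")[0]
        packages[name] = (part.get_content_subtype(), part.get_payload())
    return packages


if __name__ == "__main__":
    stream = pack_container({
        "dublin-core": ("sgml", "<DC><TITLE>An example</TITLE></DC>"),
        "terms": ("plain", "Free for non-commercial use."),
    })
    print(stream)
    print(unpack_container(stream))
```

Using Content-ID as the package label is only a convenience here; a fuller container format would more likely carry package names and types in a catalog part of its own.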
In the meantime, the problem of packing the Warwick Framework sort of SGML over a MIME stream is much simpler, thankfully, and I don't see it as a problem. Also, if the HTML <LINK> element is used to point to external data, existing unmodified HTML and HTTP software can do the transfer. Lee From owner-meta2@net.lut.ac.uk Thu Apr 25 11:04:29 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCNuX-0004rg-00; Thu, 25 Apr 1996 11:04:29 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id LAA17301 for meta2-outgoing; Thu, 25 Apr 1996 11:03:57 +0100 (BST) Received: from mrrl.lut.ac.uk (martin@localhost.mrrl.lut.ac.uk [127.0.0.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id LAA17294 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 11:03:53 +0100 (BST) Message-Id: <199604251003.LAA17294@gizmo.lut.ac.uk> X-Mailer: exmh version 1.6.6 3/24/96 To: meta2@mrrl.lut.ac.uk Subject: Re: Syntax for Dublin Core: paper available X-URI: <URL:http://www.roads.lut.ac.uk/~martin> In-reply-to: Your message of "Wed, 24 Apr 1996 16:30:06 EDT." <01BB31FB.4DA79040@CARL-LAPTOP.CS.CORNELL.EDU> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 25 Apr 1996 11:03:52 +0100 From: Martin Hamilton <martin@mrrl.lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Carl Lagoze writes: | - Now I will violate my first point. While I don't pretend to be an expert | about SGML or MIME, my intuition is that both of these technologies are not | sufficiently powerful to fully express the abstraction. I think for some | relatively simple examples, SGML and MIME would be entirely appropriate. My | prejudice, however, is to model this using CORBA or ILU and rely on the | strong typing provided by the distributed object model.
Power is the enemy of deployment :-) From owner-meta2@net.lut.ac.uk Thu Apr 25 12:43:22 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCPSD-0004w5-00; Thu, 25 Apr 1996 12:43:21 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id MAA17762 for meta2-outgoing; Thu, 25 Apr 1996 12:42:34 +0100 (BST) Received: from fssun09.dev.oclc.org (fssun09.dev.oclc.org [132.174.19.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id MAA17757 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 12:42:24 +0100 (BST) Received: from ws02-00.rsch.oclc.org by fssun09.dev.oclc.org (4.1/SMI-4.1) id AA10570; Thu, 25 Apr 96 07:41:41 EDT From: weibel@oclc.org (Stu Weibel) Received: (weibel@localhost) by ws02-00.rsch.oclc.org (8.6.10/8.6.9) for meta2@mrrl.lut.ac.uk id HAA00408; Thu, 25 Apr 1996 07:41:39 -0400 Date: Thu, 25 Apr 1996 07:41:39 -0400 Message-Id: <199604251141.HAA00408@ws02-00.rsch.oclc.org> To: meta2@mrrl.lut.ac.uk Subject: Re: Syntax for Dublin Core: paper available X-Sun-Charset: US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I'm inclined to come down on the side of Martin, though I would tweak his admonition slightly... COMPLEXITY is the enemy of deployment. I hope we won't get bogged down in syntax wars here, and be distracted from the central point of Michael's message: Syntax for deployment of the Dublin Core on the web. As for the larger framework... Is it reasonable to propose that a relatively simple, multi-package set of metadata might be deployed in an SGML framework for inclusion in documents as a proof of concept, with alternate models being developed in parallel? That is, are these two approaches really in conflict?
stu From owner-meta2@net.lut.ac.uk Thu Apr 25 13:16:15 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCPy2-0004zC-00; Thu, 25 Apr 1996 13:16:14 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id NAA17989 for meta2-outgoing; Thu, 25 Apr 1996 13:16:00 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id NAA17978 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 13:15:56 +0100 (BST) Received: from simon.cs.cornell.edu [128.84.154.10] by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCPxh-0004z0-00; Thu, 25 Apr 1996 13:15:54 +0100 Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id HAA21831 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 07:45:57 -0400 Received: from CS-ANNEX-1-02.CS.CORNELL.EDU (CS-ANNEX-1-02.CS.CORNELL.EDU [128.84.254.7]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id HAA16011 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 07:45:54 -0400 Received: by CS-ANNEX-1-02.CS.CORNELL.EDU with Microsoft Mail id <01BB327A.91BE7E40@CS-ANNEX-1-02.CS.CORNELL.EDU>; Thu, 25 Apr 1996 07:41:09 -0400 Message-ID: <01BB327A.91BE7E40@CS-ANNEX-1-02.CS.CORNELL.EDU> From: Carl Lagoze <lagoze@cs.cornell.edu> To: "meta2@mrrl.lut.ac.uk" <meta2@mrrl.lut.ac.uk> Subject: RE: Syntax for Dublin Core: paper available Date: Thu, 25 Apr 1996 07:41:07 -0400 Encoding: 77 TEXT Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Liam, Again, I claim only a top-level understanding of SGML. If it can fully express the container abstraction then that's great. It would be great to see all sorts of implementations. As for your second point, I think we have a different focus, and I try to express this in my yet-to-be-finished report. 
Yes, I completely agree that we need a solution to the metadata problem within the context (tools, protocols, languages) of the current WWW. Quick deployment is a high priority if we are going to make a dent in the problem. However, I think that it is VERY important that we look beyond the web (as it exists now) and propose an open metadata framework for a less constrained (crippled!?) infrastructure. What is so important about this workshop and its results is the mixture of .edu and .com types and the mixed focus on what works now and what we really might want for the future (both are a high priority and both groups play an important role in plotting that future course). Finally, I'm a little uncomfortable with your last point "people will interpret the data in the way that's most useful for their indexing software". My feeling is that the architecture should aim towards allowing the originator to embed as much explicit meaning in data as possible, and not leave it "up to the reader" to figure out what it's all about. Regards, Carl From owner-meta2@net.lut.ac.uk Thu Apr 25 13:50:51 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCQVX-00051Y-00; Thu, 25 Apr 1996 13:50:51 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id NAA18200 for meta2-outgoing; Thu, 25 Apr 1996 13:50:38 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id NAA18195 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 13:50:36 +0100 (BST) Received: from simon.cs.cornell.edu [128.84.154.10] by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCQSi-00051J-00; Thu, 25 Apr 1996 13:47:58 +0100 Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id IAA22408 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 08:24:54 -0400 Received: from CS-ANNEX-1-02.CS.CORNELL.EDU (CS-ANNEX-1-02.CS.CORNELL.EDU [128.84.254.7]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id IAA16713 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 08:24:49 -0400
Received: by CS-ANNEX-1-02.CS.CORNELL.EDU with Microsoft Mail id <01BB327F.FAA0D3E0@CS-ANNEX-1-02.CS.CORNELL.EDU>; Thu, 25 Apr 1996 08:19:52 -0400 Message-ID: <01BB327F.FAA0D3E0@CS-ANNEX-1-02.CS.CORNELL.EDU> From: Carl Lagoze <lagoze@cs.cornell.edu> To: "'meta2@mrrl.lut.ac.uk'" <meta2@mrrl.lut.ac.uk> Subject: RE: Syntax for Dublin Core: paper available Date: Thu, 25 Apr 1996 08:19:49 -0400 Encoding: 36 TEXT, 51 UUENCODE X-MS-Attachment: WINMAIL.DAT 0 00-00-1980 00:00 Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk You know, e-mail sucks as a form of human communication!! I think if we were all sitting over a few pints rather than in front of these stupid screens we'd realize that we all agree!!! Yes, yes, yes, let's keep a focus on immediate deployment issues. Yes, yes, yes, let's pay attention to the longer-term issues. Yes, yes, yes, let's be open to multiple implementations and multiple levels of power and complexity. Cheers, Carl From owner-meta2@net.lut.ac.uk Thu Apr 25 14:23:04 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uCR0h-00053E-00; Thu, 25 Apr 1996 14:23:04 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id OAA18351 for meta2-outgoing; Thu, 25 Apr 1996 14:22:08 +0100 (BST) Received: from simon.cs.cornell.edu (SIMON.CS.CORNELL.EDU [128.84.154.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id OAA18346 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 14:21:54 +0100 (BST) Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id JAA24055 for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 09:21:39 -0400 Received: from CS-ANNEX-1-02.CS.CORNELL.EDU (CS-ANNEX-1-02.CS.CORNELL.EDU [128.84.254.7]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id JAA18502
for <meta2@mrrl.lut.ac.uk>; Thu, 25 Apr 1996 09:21:36 -0400 Received: by CS-ANNEX-1-02.CS.CORNELL.EDU with Microsoft Mail id <01BB3287.F13B7500@CS-ANNEX-1-02.CS.CORNELL.EDU>; Thu, 25 Apr 1996 09:16:53 -0400 Message-ID: <01BB3287.F13B7500@CS-ANNEX-1-02.CS.CORNELL.EDU> From: Carl Lagoze <lagoze@cs.cornell.edu> To: "'meta2'" <meta2@mrrl.lut.ac.uk> Subject: SGML and MIME implementations of containers Date: Thu, 25 Apr 1996 09:16:50 -0400 Encoding: 12 TEXT Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I expect to have a workable draft of the container document by the end of this week or the beginning of next. The final section is an Implementation Issues section that should discuss a variety of possible implementations. I'm prepared to do a write-up of a CORBA-like implementation. I need help from the SGML crowd and the MIME crowd for those parts. Can I get a couple of volunteers to work with me on this? I'm not looking for anything extremely detailed, just something that will give a flavor of the implementation and its possible limitations. I will pass on the body of the document to these volunteers as soon as possible.
Carl From owner-meta2@net.lut.ac.uk Fri Apr 26 07:50:28 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uChMJ-00066c-00; Fri, 26 Apr 1996 07:50:27 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id HAA24449 for meta2-outgoing; Fri, 26 Apr 1996 07:50:02 +0100 (BST) Received: from trapdoor.dstc.edu.au (root@trapdoor.dstc.edu.au [130.102.176.12]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id HAA24444 for <meta2@mrrl.lut.ac.uk>; Fri, 26 Apr 1996 07:49:57 +0100 (BST) Received: from fatcat.dstc.edu.au (fatcat.dstc.edu.au [130.102.176.7]) by trapdoor.dstc.edu.au (8.6.9/8.6.12) with ESMTP id QAA09783 for <meta2@mrrl.lut.ac.uk>; Fri, 26 Apr 1996 16:49:29 +1000 Received: (from renato@localhost) by fatcat.dstc.edu.au (8.6.10/8.6.12) id QAA31096 for meta2@mrrl.lut.ac.uk; Fri, 26 Apr 1996 16:48:49 +1000 From: Renato Iannella <renato@dstc.edu.au> Message-Id: <199604260648.QAA31096@fatcat.dstc.edu.au> Date: Fri, 26 Apr 1996 16:48:49 +1000 (EST) To: meta2@mrrl.lut.ac.uk Subject: DSTC Slides X-Mailer: Ishmail 1.2-960125-osf1 MIME-Version: 1.0 Content-Type: text/plain Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Dear all, an HTML version of the slides I used at Warwick for my presentation can be found at: http://www.dstc.edu.au/RDU/pres/warwick/ Cheers...
Renato _______________________________________________________________________ Dr Renato Iannella http://www.dstc.edu.au/RDU/staff/ri Research Data Network CRC urn:inet:dstc.edu.au:renato:home DSTC Pty Ltd, Gehrmann Laboratories phone/fax: +61 7 3365 4310/11 University of Queensland, 4072, AUSTRALIA email: renato@dstc.edu.au From owner-meta2@net.lut.ac.uk Mon Apr 29 20:18:33 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uDySv-0002RV-00; Mon, 29 Apr 1996 20:18:33 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id UAA11072 for meta2-outgoing; Mon, 29 Apr 1996 20:17:19 +0100 (BST) Received: from simon.cs.cornell.edu (SIMON.CS.CORNELL.EDU [128.84.154.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id UAA11067 for <meta2@mrrl.lut.ac.uk>; Mon, 29 Apr 1996 20:17:01 +0100 (BST) Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id MAA14096 for <meta2@mrrl.lut.ac.uk>; Mon, 29 Apr 1996 12:21:30 -0400 Received: from CARL-LAPTOP (CARL-LAPTOP.CS.CORNELL.EDU [128.84.211.11]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id MAA04211 for <meta2@mrrl.lut.ac.uk>; Mon, 29 Apr 1996 12:21:28 -0400 Received: by CARL-LAPTOP with Microsoft Mail id <01BB35C6.7C5EB980@CARL-LAPTOP >; Mon, 29 Apr 1996 12:22:08 -0400 Message-ID: <01BB35C6.7C5EB980@CARL-LAPTOP > From: Carl Lagoze <lagoze@cs.cornell.edu> To: "'meta2'" <meta2@mrrl.lut.ac.uk> Subject: Draft of Warwick framework description Date: Mon, 29 Apr 1996 12:22:04 -0400 Encoding: 20 TEXT Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I have placed a draft version of the Warwick Framework description at http://cs-tr.cs.cornell.edu/~lagoze/warwick.html. It is missing the SGML and MIME implementation sections. 
Lou Burnard has volunteered to do the SGML section and Jon Knight and Martin Hamilton have volunteered to do the MIME section. As soon as those folks send me those sections, I will plug them in and send notification to the list about the revised version. Regards, Carl Carl Lagoze Project Leader, Digital Library Research Group Department of Computer Science, Cornell University Ithaca, NY 14853 phone: 607-255-6046 FAX: 607-255-4428 From owner-meta2@net.lut.ac.uk Tue Apr 30 22:31:43 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEN1K-0003fv-00; Tue, 30 Apr 1996 22:31:42 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id WAA16444 for meta2-outgoing; Tue, 30 Apr 1996 22:29:51 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id WAA16439 for <meta2@mrrl.lut.ac.uk>; Tue, 30 Apr 1996 22:29:49 +0100 (BST) Received: by weeble.lut.ac.uk with local (Exim 0.42 #1) id E0uEMzU-0003fn-00; Tue, 30 Apr 1996 22:29:48 +0100 Date: Tue, 30 Apr 1996 22:29:48 +0100 (BST) From: Jon Knight <J.P.Knight@lut.ac.uk> To: meta2@mrrl.lut.ac.uk Subject: Re: Draft of Warwick framework description In-Reply-To: <01BB35C6.7C5EB980@CARL-LAPTOP > Message-ID: <Pine.SUN.3.91.960430222707.11298e-100000@weeble.lut.ac.uk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I've knocked up a very, very first draft of the MIME section which you can take a look at by pointing your favourite WWW browser at <URL:http://weeble.lut.ac.uk/MIME-WF.html>. It's certainly not ready for prime time (I've only been working on it this evening!) but it'll give you a taster for what MIME could do. Comments, criticisms, queries and disgustingly large bars of chocolate welcome.
Tatty bye, Jim'll -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU. * I've found I now dream in Perl. More worryingly, I enjoy those dreams. * From owner-meta2@net.lut.ac.uk Wed May 01 17:05:16 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEeOx-0004UE-00; Wed, 1 May 1996 17:05:15 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id RAA21219 for meta2-outgoing; Wed, 1 May 1996 17:04:08 +0100 (BST) Received: from CNRI.Reston.VA.US (CNRI.Reston.VA.US [132.151.1.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id RAA21214 for <meta2@mrrl.lut.ac.uk>; Wed, 1 May 1996 17:03:48 +0100 (BST) Received: from newcnri.cnri.reston.va.us by CNRI.Reston.VA.US id ab07535; 1 May 96 12:01 EDT Received: from [132.151.1.217] (warmsmc) by newcnri.CNRI.Reston.Va.US (5.x/SMI-SVR4) id AA01192; Wed, 1 May 1996 12:01:22 -0400 X-Sender: warms@newcnri.cnri.reston.va.us Message-Id: <v02130501adad3a967e94@[132.151.1.217]> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Wed, 1 May 1996 12:01:38 -0400 To: Carl Lagoze <lagoze@cs.cornell.edu> From: "William Y. Arms" <warms@cnri.reston.va.us> Subject: Your metadata draft Cc: meta2@mrrl.lut.ac.uk Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Carl, I had a quick look at your paper today. More detailed comments will follow, but my first reaction was that it lacks an overview to motivate the concepts that follow. Here is my attempt at an introduction. Please feel free to extract anything that is useful. Bill ======= Overview In March 1995, OCLC hosted a meeting to discuss metadata for items of digital information.
The major result of that meeting was a list of thirteen metadata elements that can describe a wide variety of items. This list has become known as "the Dublin Core". A year later a follow-up meeting was held at the University of Warwick to review progress and plan future steps. Three key concepts came out of this meeting. Collectively, they have been nicknamed "the Warwick Framework." Metadata Packages Although many groups are building information services with metadata drawn from the Dublin Core, every group is adding extra metadata elements. The additions may be subject-specific (e.g., for geo-spatial data), technical (e.g., formats or protocols), structural (e.g., links to show relationships between complex objects), or business-related (e.g., terms and conditions for usage). To handle this need, the Warwick meeting proposes a set of metadata packages. For example, the Dublin Core is one package; another might be the terms and conditions package. An information service can select one or more packages to provide metadata for a set of objects. This approach has several advantages over selecting individual metadata elements from a very long list of elements. Packages can be very different. For example, a package that expresses relationships among objects might use abstract data structures. It is hoped that a reasonably small list of well-defined packages will enhance interoperation and lead towards standardization of practices. In addition, as described below, packages allow flexibility in the development of a security architecture. Security When a digital object is accessed over a network, there are many occasions when a supplier wishes to make only part of the metadata accessible to specific users. For example, an organization may need to have access to technical metadata in order to store and transmit information, but, to avoid potential liability, may explicitly desire not to have access to metadata that describes content.
A commercial organization may wish to provide some metadata openly, but require authorization before giving access to other metadata. These objectives can be achieved by providing each metadata package with its own security. Access controls on each package can be different.

Representation of Metadata

There will undoubtedly be many different representations of metadata within repositories. For example, the metadata for a digital item can be embedded within the item, or held externally but associated with it. Much of the work on repositories uses the concept of a "digital object", in which the metadata and the data are both stored within a repository without the details of the storage mechanism being known externally. Formats for exchanging metadata between systems need to be clearly defined and flexible, yet easy to use. Preliminary work carried out during the Warwick meeting convinced many of the attendees that SGML provides a suitable format for representing metadata packages. The meeting considered Web pages in HTML format to be such an important special case that they deserve special attention, and it proposes a syntax based on the HTML "meta" tag.
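The package idea and the proposed HTML "meta" syntax can be illustrated with a small Python sketch. It is not the syntax the meeting agreed on: the PACKAGE.element naming (e.g. "DC.title"), the package names and the sample values are all assumptions made for illustration. Each package is kept as a separate mapping, a service picks the packages it wants, and only those are rendered as meta tags.

```python
from html import escape

# Hypothetical container: each metadata package is kept separate, as a
# mapping from element names to values. Names and values are illustrative.
container = {
    "DC": {"title": "An example document", "creator": "A. N. Author"},
    "Terms": {"usage": "Non-commercial use only"},
}

def select_packages(container, wanted):
    """Keep only the packages an information service has chosen to use."""
    return {name: pkg for name, pkg in container.items() if name in wanted}

def to_meta_tags(container):
    """Render every element as <meta name="PACKAGE.element" content="...">.

    The PACKAGE.element naming is an assumption of this sketch. Values are
    HTML-escaped so quotes or ampersands cannot break the attribute.
    """
    tags = []
    for package, elements in container.items():
        for element, value in elements.items():
            tags.append('<meta name="%s.%s" content="%s">'
                        % (escape(package), escape(element), escape(value)))
    return "\n".join(tags)

print(to_meta_tags(select_packages(container, {"DC"})))
```

Keeping packages as separate units is what lets a service add or drop a whole package (here, the terms and conditions) without touching the Dublin Core elements.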
From owner-meta2@net.lut.ac.uk Wed May 01 18:05:17 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEfL2-0004Xm-00; Wed, 1 May 1996 18:05:16 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id SAA21570 for meta2-outgoing; Wed, 1 May 1996 18:04:52 +0100 (BST) Received: from yscydion.ansa.co.uk (yscydion.ansa.co.uk [192.5.254.44]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id SAA21564 for <meta2@mrrl.lut.ac.uk>; Wed, 1 May 1996 18:04:48 +0100 (BST) From: msm@ansa.co.uk Received: by yscydion.ansa.co.uk; Wed, 1 May 96 18:04:17 +0100 Received: from localhost by euclid.ansa.co.uk; Wed, 1 May 96 18:04:16 +0100 Message-Id: <9605011704.AA10814@euclid.ansa.co.uk> X-Mailer: exmh version 1.6.5 12/11/95 To: "William Y. Arms" <warms@cnri.reston.va.us> Cc: Carl Lagoze <lagoze@cs.cornell.edu>, meta2@mrrl.lut.ac.uk Subject: Re: Your metadata draft In-Reply-To: Your message of "Wed, 01 May 1996 12:01:38 EDT." <v02130501adad3a967e94@[132.151.1.217]> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 01 May 96 18:04:16 BST Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Bill, Excuse me for butting in here, but I want to add to what you wrote as a potential overview, under the heading of security: > Security > > When a digital object is accessed over a network, there are many occasions > when a supplier wishes to make only part of the metadata accessible to > specific users. For example, an organization may need to have access to > technical metadata in order to store and transmit information, but, to > avoid potential liability, may explicitly desire not to have access to > metadata that describes content. A commercial organization may wish to > provide some metadata openly, but require authorization before giving > access to other metadata.
> > These objectives can be achieved by providing each metadata package with > its own security. Access controls on each package can be different. I don't see that this last paragraph can stand on its own. Whether metadata packages are delivered at all is decided by the server, so the only way that selective access can be maintained at the package level, independently of the server, is by multikey encryption of packages. So one has a choice between:

1. Enforcing access controls in the _service_ that delivers the metadata objects/packages, in which case security is an infrastructure issue and _not_ part of the metadata framework, or

2. Building sophisticated key management mechanisms into the metadata framework. Astoundingly sophisticated key management mechanisms.

I don't believe the second is a viable approach unless metadata is built in an environment with self-aware objects. Also, proving issues such as liability will require (a) certification authorit(y|ies) as a minimal backbone for a nonrepudiation mechanism. Without those in place, liability cannot be proven, so there's no need for a defence against liability claims. This is not to say that security is not important for metadata - it is _very_ important - but rather that it's important to impose security at the right layer of the infrastructure, architecturally speaking. Sorry to be critical, but we need to start by solving simple issues before tackling the hard ones.
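Mark's first option, enforcing access in the delivering service, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a design: it assumes the service already knows which roles a requester holds (authentication is out of scope), and the role names, package names and the deliver function are hypothetical. The packages themselves carry no keys; the service simply withholds the ones the requester is not entitled to.

```python
# Hypothetical per-package access rules: each package records the role a
# requester needs in order to see it (None means the package is public).
PACKAGES = {
    "DC": (None, {"title": "An example document"}),
    "Technical": ("archive", {"format": "text/html"}),
    "Terms": ("subscriber", {"price": "on request"}),
}

def deliver(packages, roles):
    """Return only the packages the requester is entitled to see.

    The check runs in the service that delivers the metadata, so no
    encryption or key management is needed inside the packages themselves.
    """
    return {name: content
            for name, (required, content) in packages.items()
            if required is None or required in roles}

# An anonymous requester sees only the public package; an archive service
# also sees technical metadata, but still not the commercial terms.
print(sorted(deliver(PACKAGES, set())))
print(sorted(deliver(PACKAGES, {"archive"})))
```

Here the access decision lives entirely in the infrastructure layer, which is the separation the message argues for.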
Mark -- Mark Madsen <msm@ansa.co.uk> Telephone: +44-1223-568934 APM Ltd Poseidon House Castle Park Cambridge CB3 0RD UK <URL:http://www.ansa.co.uk/><URL:mailto:apm@ansa.co.uk> Reception: +44-1223-515010; Facsimile: +44-1223-359779 From owner-meta2@net.lut.ac.uk Wed May 01 18:21:40 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEfat-0004Ym-00; Wed, 1 May 1996 18:21:39 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id SAA21662 for meta2-outgoing; Wed, 1 May 1996 18:21:26 +0100 (BST) Received: from simon.cs.cornell.edu (SIMON.CS.CORNELL.EDU [128.84.154.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id SAA21652 for <meta2@mrrl.lut.ac.uk>; Wed, 1 May 1996 18:21:02 +0100 (BST) Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15]) by simon.cs.cornell.edu (8.6.10/R1.4) with ESMTP id MAA07327; Wed, 1 May 1996 12:55:15 -0400 Received: from CARL-LAPTOP (CARL-LAPTOP.CS.CORNELL.EDU [128.84.211.11]) by cloyd.cs.cornell.edu (8.6.10/M1.8) with SMTP id MAA10650; Wed, 1 May 1996 12:55:12 -0400 Received: by CARL-LAPTOP with Microsoft Mail id <01BB375D.7FC4A2E0@CARL-LAPTOP >; Wed, 1 May 1996 12:55:39 -0400 Message-ID: <01BB375D.7FC4A2E0@CARL-LAPTOP > From: Carl Lagoze <lagoze@cs.cornell.edu> To: Carl Lagoze <lagoze@cs.cornell.edu>, "'William Y. Arms'" <warms@cnri.reston.va.us> Cc: "meta2@mrrl.lut.ac.uk" <meta2@mrrl.lut.ac.uk> Subject: RE: Your metadata draft Date: Wed, 1 May 1996 12:55:34 -0400 Encoding: 127 TEXT Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Bill, Thanks for the initial comments. I agree with your critique; however, the omission of a leading context was intentional. I intended my Warwick Framework section to be embedded in a larger document.
Sorry that I didn't make this clear, but my vision of a larger document was:

- An overview of the metadata effort as carried out by the Dublin workshop and then this follow-on workshop. This is the context that you have made a stab at, I assume.
- A review of the notion of a core metadata set as represented by the Dublin Core.
- A description of how to incorporate core metadata in the existing WWW framework (I have a back reference to this in my section). I believe that Lou Burnard et al. have already made an effort at this at http://info.ox.ac.uk/~lou/wip/metadata.syntax.html.
- The Warwick Framework section as I wrote it.

Sorry if I misunderstood my "assignment". To be honest, I never find it easy to write part of something when I don't yet have a sense of the "whole". I'd be glad to make a stab at the earlier sections, but I think that Stu and/or others who started this whole effort might be better "context setters". As for your text, I really like parts of it, but I think it gets too "architectural" (read: Kahn/Wilensky-ish) towards the end. I think the context should be free of architectural prejudice and stick to higher-level motivations. Which all brings up the issue of who is driving this bus (Stu, are you there?). I'd be glad to play the lead authorship role for the entire workshop report, but I feel uncomfortable grabbing that role through "chutzpah" rather than agreement. I think at this point we need to define the "authorship committee". We have a number of parts and they feel at least workable. Can we put together a crew of no more than four people (I will be the first volunteer) and decide on some clear goals for when/what we want to get out? Carl
From owner-meta2@net.lut.ac.uk Wed May 01 18:58:08 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEgAB-0004ah-00; Wed, 1 May 1996 18:58:07 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id SAA21936 for meta2-outgoing; Wed, 1 May 1996 18:57:47 +0100 (BST) Received: from fssun09.dev.oclc.org (fssun09.dev.oclc.org [132.174.19.10]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id SAA21931 for <meta2@mrrl.lut.ac.uk>; Wed, 1 May 1996 18:57:37 +0100 (BST) Received: from ws02-00.rsch.oclc.org by fssun09.dev.oclc.org (4.1/SMI-4.1) id AA29712; Wed, 1 May 96 13:48:36 EDT From: weibel@oclc.org (Stu Weibel) Received: (weibel@localhost) by ws02-00.rsch.oclc.org (8.6.10/8.6.9) for meta2@mrrl.lut.ac.uk id NAA00865; Wed, 1 May 1996 13:48:34 -0400 Date: Wed, 1 May 1996 13:48:34 -0400 Message-Id: <199605011748.NAA00865@ws02-00.rsch.oclc.org> To: meta2@mrrl.lut.ac.uk Subject: Workshop report and subparts X-Sun-Charset: US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Yep, Stu is here... working on my WWW5 presentation. Lorcan and I plan on co-authoring the overview document for the workshop, but I would like to see the three sub-parts be able to stand alone as discrete objects of scholarship (and authorship). So, I hope we will publish them together as related papers, but the sort of introduction that Bill suggested for Carl's paper is, I think, just fine. back to my slides... 
stu From owner-meta2@net.lut.ac.uk Wed May 01 21:11:31 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEiFG-0004he-00; Wed, 1 May 1996 21:11:30 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id VAA22951 for meta2-outgoing; Wed, 1 May 1996 21:10:48 +0100 (BST) Received: from mrrl.lut.ac.uk (martin@localhost.mrrl.lut.ac.uk [127.0.0.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id VAA22942 for <meta2@mrrl.lut.ac.uk>; Wed, 1 May 1996 21:10:44 +0100 (BST) Message-Id: <199605012010.VAA22942@gizmo.lut.ac.uk> To: meta2@mrrl.lut.ac.uk X-URI: <URL:http://www.roads.lut.ac.uk/~martin> Subject: stirring things up a bit Mime-Version: 1.0 Content-Type: multipart/mixed ; boundary="===_0_Wed_May__1_21:09:32_BST_1996" Date: Wed, 01 May 1996 21:10:43 +0100 From: Martin Hamilton <martin@mrrl.lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk This is a multipart MIME message. --===_0_Wed_May__1_21:09:32_BST_1996 Content-Type: text/plain; charset=us-ascii I thought I'd bung this out as an Internet Draft and see if there was any interest among HTTP implementors. Comments welcome... ! Martin --===_0_Wed_May__1_21:09:32_BST_1996 Content-Type: text/plain; charset=us-ascii Content-Description: harvesting.txt INTERNET-DRAFT Martin Hamilton draft-???-00.txt Loughborough University Expires in six months April 1996 Experimental HTTP methods to support indexing and searching Filename: draft-XXXX.txt Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. 
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document proposes some experimental mechanisms which may be deployed within HTTP [1] to provide a local search capability on the information being made available by an HTTP server, and to reduce both the bandwidth consumed by indexing agents and the amount of work done by HTTP servers during the indexing process.

1. Introduction

As the number of HTTP servers deployed has increased, providing searchable indexes of the information which they make available has itself become a growth industry. As a result there are now a large number of "web crawlers", "web wanderers" and suchlike. These indexing agents typically act independently of each other, and do not share the information which they retrieve from the servers being indexed. This can be a major cause of frustration for the server maintainer, who sees multiple requests for the same information coming from different indexers. It also results in a large amount of redundant network traffic - with these repeated requests for the same objects, and the objects themselves, often travelling over the same physical infrastructure. It can be conjectured that the volume of indexing-related traffic will in some cases be responsible for degraded network performance, but the author does not have any statistics with which to back up this supposition...

The HTTP protocol has supported the "conditional GET" feature for some time.
This allows clients to request that an object be returned only if it has been modified since a particular date and time, hence the name of the HTTP header used to request it, "If-Modified-Since". It is hoped that all indexing agents deployed on the Internet at large will make use of conditional GET when gathering the information they index. Whether or not conditional GET is used, the normal approach to indexing an HTTP server is to transfer the full content of each object being indexed back to the indexer; conditional GET only avoids transferring objects which have not changed. Typically the only objects which the index server is interested in will be those from which plain text can readily be extracted - perhaps only HTML [2] documents, or those documents which are served up with a top-level Internet Media Type of "text". The web crawler's data-gathering process normally uses hyperlinks in HTML documents to discover the existence of new objects and new servers, so a single link to your server from another server which is already being indexed may be enough to make the index server aware of its existence.

To get around some of the problems associated with this brute-force approach to indexing, the robots exclusion convention [3] has been widely adopted. This takes the form of an object, referred to by the HTTP path name "/robots.txt", which server maintainers can use to indicate which objects they consider it acceptable for agents to retrieve. The robots.txt convention provides a finer-grained alternative to simply allowing or denying HTTP access from the indexing hosts. It is hoped that all indexing agents deployed on the Internet at large will support this feature.

2. Additional HTTP methods

It would also be useful if the HTTP servers being indexed were capable of generating indexing information themselves, and making this information available in a bandwidth-friendly manner - e.g.
with compression, and sending only the indexing information for those objects which have changed since the indexing agent's last visit. Furthermore, HTTP servers should support a native search method so that, where a suitable search back end is available, HTTP clients may search the information provided by an HTTP server in a standardised manner.

In the following examples, "C:" is used to indicate the client side of the conversation, and "S:" the server side.

2.1 The COLLECT method

The COLLECT method is drawn from the Collector/Gatherer protocol used by the Harvest software [4]. It represents a request for the indexing information about either all of the information being made available by the HTTP server, or only the information in a particular collection made available by the HTTP server.

In COLLECT requests, the Request-URI (to use the jargon of [1]) should be an asterisk "*" if the request is for all of the indexing information the HTTP server can provide, or a symbolic name which refers to a particular collection. Implementors should note that this collection selection is in addition to the virtual host selection provided by the "Host:" HTTP header.

The normal HTTP content negotiation features may be used in any request/response pair. In particular, the "If-Modified-Since:" request header should be used to indicate that the indexing agent is only interested in objects which have been created or modified since the date specified, and the "Accept-Encoding:" request header and "Content-Encoding:" response header should be used to indicate whether compression is desired and, if so, the preferred compression algorithm. e.g.

   C: COLLECT * HTTP/1.1
   C: Accept: application/soif
   C: Accept-Encoding: gzip, compress
   C: If-Modified-Since: Mon, 01 Apr 1996 07:34:31 GMT
   C: Host: www.lut.ac.uk
   C:
   S: HTTP/1.1 200 OK indexing data follows
   S: Content-type: application/soif
   S:
   S: [...etc...]
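A client-side sketch of the COLLECT example above, in Python. It only builds the request text and performs no network I/O; the helper name collect_request and its defaults are hypothetical. It shows the conditional-GET style "If-Modified-Since:" header in the RFC 1123 date format, produced here by the standard library's email.utils.format_datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def collect_request(host, collection="*", since=None,
                    encodings=("gzip", "compress")):
    """Build the text of an experimental COLLECT request (hypothetical helper).

    collection is "*" for all indexing information, or a collection name.
    since, if given, must be a UTC-aware datetime; it becomes the
    If-Modified-Since header so only new or changed objects are described.
    """
    lines = ["COLLECT %s HTTP/1.1" % collection,
             "Accept: application/soif",
             "Accept-Encoding: " + ", ".join(encodings)]
    if since is not None:
        # usegmt=True requires a UTC datetime and yields the RFC 1123 form,
        # e.g. "Mon, 01 Apr 1996 07:34:31 GMT".
        lines.append("If-Modified-Since: " + format_datetime(since, usegmt=True))
    lines.append("Host: " + host)
    # HTTP uses CRLF line endings; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"

req = collect_request("www.lut.ac.uk",
                      since=datetime(1996, 4, 1, 7, 34, 31, tzinfo=timezone.utc))
print(req)
```

Leaving out since asks for all indexing information, which an agent would do on its first visit.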
2.2 The SEARCH method

The SEARCH method embeds a query in the Request-URI component of the request, using the search syntax defined for the WHOIS++ protocol [5]. Any characters in the Request-URI which fall outside the legal character set for Request-URI, such as spaces, should be hex-escaped (a space, for example, becomes "%20"). This is so that SEARCH requests may readily be written as URLs in HTML documents. e.g.

   C: SEARCH keywords=venona HTTP/1.1
   C: Accept: application/whois, text/html
   C: Host: www.lut.ac.uk
   C:
   S: HTTP/1.1 200 OK search results follow
   S: Content-type: application/whois
   S:
   S: [...etc...]

WHOIS++ requests normally fit onto a single line, and no state is preserved between requests. Consequently, embedding WHOIS++ requests within HTTP requests does not add greatly to implementation complexity.

3. Discussion

There is no widespread agreement on the form which the indexing information retrieved by web crawlers should take, and different web crawlers may be looking for different types of information. As the number of indexing agents deployed on the Internet continues to grow, it seems likely that they will eventually proliferate to the point where it becomes infeasible to retrieve the full content of each and every indexed object from each and every HTTP server. That said, distributing the indexing load amongst a number of servers which pooled their results - splitting the load along geographical and topological lines - would be one way around this problem. To put this discussion in perspective, the need to do this does not yet appear to have arisen.

On the format of indexing information there is something of a dichotomy between those who see the indexing information as a long-term catalogue entry, perhaps to be generated by hand, and those who see it merely as an interchange format between two programs - which may be generated automatically.
Ideally the same format would be useful in both situations, but in practice it may be difficult to isolate a sufficiently small subset of a rich cataloguing format for machine use. Consequently, this document will not make any proposals about the format of the indexing information. By extension, it will not propose a default format for search results. However, it seems reasonable that clients be able to request that search results be returned formatted as HTML, though this in itself is not a particularly meaningful request, since there are a variety of languages which all claim to be HTML-based. A tractable approach for implementors would be to return HTML 2.0 [2] unless the server is aware of more advanced HTML features supported by the client. Currently, much of this feature negotiation is based upon the value of the HTTP "User-Agent:" header, but it is hoped that a more sophisticated mechanism will eventually be developed.

The use of the WHOIS++ search syntax is based on the observation that most search and retrieval protocols provide little more than an attribute/value-based search capability, and that WHOIS++ manages to do this in arguably the simplest and most readily implemented manner. Other protocols typically add complexity to the delivery of requests and responses, along with management features which are rarely exercised over wide area networks.

This document has suggested that search requests be presented using a new HTTP method, primarily so as to avoid confusion when dealing with servers which do not support searching. This approach has the disadvantage that there is a large installed base of clients which would not understand the new method, a large proportion of which have no way of supporting new HTTP methods. An alternative strategy would be to implement searches embedded within GET requests. This would complicate processing of the GET request, but not require any changes on the part of the client.
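The two ways of carrying a query discussed in this section, the SEARCH method with a hex-escaped Request-URI and the alternative of embedding the search in an ordinary GET URL, can be compared with a short Python sketch. Both helper names, the choice to leave "=" unescaped in the SEARCH Request-URI, and the "search" query parameter name are assumptions for illustration, not part of the proposal.

```python
from urllib.parse import quote, urlencode

def search_request_line(query):
    """Request line for the experimental SEARCH method (hypothetical helper).

    Characters that are not legal in a Request-URI, such as spaces, are
    hex-escaped; '=' is kept so attribute=value terms stay readable.
    """
    return "SEARCH %s HTTP/1.1" % quote(query, safe="=")

def search_get_url(base, query):
    """The GET-embedded alternative: the query travels as an ordinary URL
    parameter (the name 'search' is hypothetical), so it can be written
    into an HTML anchor with no new syntax."""
    return base + "?" + urlencode({"search": query})

query = "keywords=venona and title=cold"
print(search_request_line(query))
print(search_get_url("http://www.lut.ac.uk/search", query))
```

With the GET form an unmodified client can simply follow the link, at the cost of the server having to tell searches apart from ordinary retrievals, which is the trade-off described above.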
It would also allow searches to be written in HTML documents without any changes to the HTML syntax - they would simply appear as regular URLs. Searches which required a new HTTP method would presumably have to be marked by an additional attribute in the HTML anchor tag. This problem does not arise with the collection of indexing information, since the number of agents performing the collection will be comparatively small, and there is no perceived benefit from being able to write HTML documents which include pointers to indexing information - rather the opposite, in fact.

4. Security considerations

Most Internet protocols which deal with distributed indexing and searching are careful to note the dangers of allowing unrestricted access to the server. This is normally on the grounds that unscrupulous clients may make off with the entire collection of information - perhaps resulting in a breach of users' privacy, in the case of White Pages servers. In the web crawler environment, these general considerations do not apply, since the entire collection of information is already "up for grabs" to any person or agent willing to perform a traversal of the server. Similarly, it is not likely to be a privacy problem if searches yield a large number of results.

One exception, which should be noted by implementors, is that it is common practice to keep some private information on public HTTP servers - perhaps limiting access to it on the basis of passwords, IP addresses, network numbers, or domain names. These restrictions should be considered when preparing indexing information or search results, so as to avoid revealing private information to the Internet as a whole. It should also be noted that many of these access control mechanisms are too weak to be used over wide area networks such as the Internet. Domain names and IP addresses are readily forged, passwords are readily sniffed, and connections are readily hijacked.
Strong cryptographic authentication and session-level encryption should be used in any cases where security is a major concern.

5. Conclusions

There can be no doubt that the measures proposed in this document are implementable - in fact they have already been implemented and deployed, though on nothing like the scale of HTTP. It is a matter for debate whether they are needed or desirable as additions to HTTP, but it is clear that adding search support to HTTP would carry some implementation cost. Indexing support would be trivial to implement, once the issue of formatting had been resolved.

6. Acknowledgements

Thanks to <<your name here!!>> for comments on draft versions of this document.

This work was supported by grants from the UK Electronic Libraries Programme (eLib) and the European Commission's Telematics for Research Programme. The Harvest software was developed by the Internet Research Task Force Research Group on Resource Discovery, with support from the Advanced Research Projects Agency, the Air Force Office of Scientific Research, the National Science Foundation, Hughes Aircraft Company, Sun Microsystems' Collaborative Research Program, and the University of Colorado.

7. References

Request For Comments (RFC) and Internet Draft documents are available from <URL:ftp://ftp.internic.net> and numerous mirror sites.

[1] R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys, J. C. Mogul. "Hypertext Transfer Protocol -- HTTP/1.1", Internet Draft (work in progress). April 1996.

[2] T. Berners-Lee, D. Connolly. "Hypertext Markup Language - 2.0", RFC 1866. November 1995.

[3] M. Koster. "A Standard for Robot Exclusion." Last updated March 1996. <URL:http://info.webcrawler.com/mak/projects/robots/norobots.html>

[4] C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, M. F. Schwartz, and D. P. Wessels. "Harvest: A Scalable, Customizable Discovery and Access System", Technical Report CU-CS-732-94, Department of Computer Science, University of Colorado, Boulder, August 1994. <URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/HarvestJour.ps.Z>

[5] P. Deutsch, R. Schoultz, P. Faltstrom & C. Weider. "Architecture of the WHOIS++ service", RFC 1835. August 1995.

8. Author's Address

Martin Hamilton
Department of Computer Studies
Loughborough University of Technology
Leics. LE11 3TU, UK
Email: m.t.hamilton@lut.ac.uk

This Internet Draft expires XXXX, 1996.

--===_0_Wed_May__1_21:09:32_BST_1996-- From owner-meta2@net.lut.ac.uk Thu May 02 06:50:29 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uErHY-00054q-00; Thu, 2 May 1996 06:50:28 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id GAA25372 for meta2-outgoing; Thu, 2 May 1996 06:49:36 +0100 (BST) Received: from newton.ncsa.uiuc.edu (newton.ncsa.uiuc.edu [141.142.2.2]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id GAA25367 for <meta2@mrrl.lut.ac.uk>; Thu, 2 May 1996 06:49:32 +0100 (BST) Received: from void.ncsa.uiuc.edu (void.ncsa.uiuc.edu [141.142.103.20]) by newton.ncsa.uiuc.edu (8.6.11/8.6.12) with SMTP id AAA19213 for <meta2@mrrl.lut.ac.uk>; Thu, 2 May 1996 00:49:31 -0500 Received: by void.ncsa.uiuc.edu (4.1/NCSA-4.1) id AA14927; Thu, 2 May 96 00:47:06 CDT Date: Thu, 2 May 96 00:47:06 CDT From: liberte@ncsa.uiuc.edu (Daniel LaLiberte) Message-Id: <9605020547.AA14927@void.ncsa.uiuc.edu> To: meta2@mrrl.lut.ac.uk Subject: boil and trouble In-Reply-To: <199605012010.VAA22942@gizmo.lut.ac.uk> References: <199605012010.VAA22942@gizmo.lut.ac.uk> Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk I comment on some aspects of Martin's draft, and then launch into how I think HTTP should deal with metadata.
Martin Hamilton writes: > I thought I'd bung this out as an Internet Draft and see if there was > any interest among HTTP implementors. Comments welcome... ! I've been stewing for a while, so thanks for stirring. I'm not sure about this particular forum, but I don't know of a better one. The http-wg is busy cranking out 1.1, so this would be a diversion to them. > Experimental HTTP methods to support indexing and searching > This document proposes some experimental mechanisms which may be > deployed within HTTP [1] to provide a local search capability on the > information being made available by an HTTP server, and reduce both > the bandwidth consumed by indexing agents, and the amount of work > done by HTTP servers during the indexing process. This is an excellent goal. Are you perhaps planning (hoping?) to attend the distributed searching and indexing workshop? > 1. Introduction > > As the number of HTTP servers deployed has increased, providing > searchable indexes of the information which they make available has > itself become a growth industry. As a result there are now a large > number of "web crawlers", "web wanderers" and suchlike. There are other reasons why the goal is worth pursuing, even if one does not want to actively support more web crawling. On the other hand, I don't know if there is a strong enough case for the argument that web crawling is excessively loading the network and servers. Some alternative rationales are the desire for constrained replication of indexing services within an intranet, and client-directed searching of distributed indexes. > 2. Additional HTTP methods > > It would also be useful if the HTTP servers being indexed were > capable of generating indexing information themselves, and making > this information available in a bandwidth friendly manner Another alternative to keep in mind is that some servers might want indexing to be done by an associated server, perhaps one they contract with for this service.
So a request for indexing info or searching services might reasonably be redirected to another server. > 2.1 The COLLECT method > > The COLLECT method is drawn from the Collector/Gatherer protocol used > by the Harvest software [4]. It represents a request for the > indexing information about either all of the information being made > available by the HTTP server, or the indexing information > pertaining to a particular collection of information being made > available by the HTTP server. This much is great, although I am skeptical of the utility of any request having to do with everything on a server. Frequently there are many disjoint collections on one server, so it might make more sense to first ask for the list of collections. > In COLLECT requests, the Request-URI (to use the jargon of [1]) > should be an asterisk "*" if the request is for all of the indexing > information the HTTP server can provide, or a symbolic name which > refers to a particular collection. Use of the "*" is fine for a request of all indexing information. But I don't like the symbolic name to identify the collection, unless this name is in the same name space as other identifiers associated with the server. I think it is perfectly reasonable to expect that collections would be identified by a URI such as is allowed by the general syntax of the Request-URI. A server might implement such a collection as a directory, or perhaps as a special object represented by a file or database. A GET request for a collection might return not the collection itself but only metadata for it, such as its size and the query used to produce it. We also have to consider virtual collections and temporary collections that are the result of other requests. More on these issues below. Identifying collections with general Request-URIs suggests that there might be collections within collections, and these relationships could be discerned merely by comparing identifiers.
Although nested collections should be allowed and perhaps supported by a set of methods that deal with collections, clients should not infer containment based solely on the identifiers. > 2.2 The SEARCH method > > The SEARCH method embeds a query in the Request-URI component of the > request, using the search syntax defined for the WHOIS++ protocol. Just as COLLECT was based on either everything in the server or everything in a particular collection, so should SEARCH be. So the Request-URI for a SEARCH request should be either "*" or the URI of a collection. The parameters of the search should be in additional header lines specific to the search request, just as the COLLECT request used additional header lines to parameterize it. The particular syntax and semantics for the SEARCH parameters could be WHOIS++ or perhaps other protocols. The protocol extension protocol (PEP) will have mechanisms that allow multiple protocols to be used, and negotiation between client and server as to what protocols are supported or required. I don't have my PEP spec here to frob together something that looks right, so consume the following example with some salsa. Rewriting your example, I might do it something like this: C: SEARCH /vips HTTP/1.1 C: Accept: application/whois, text/html C: Host: www.lut.ac.uk C: Protocol: whois++ C: Query: keywords=venona C: S: 200 OK search results follow S: Content-type: application/whois S: S: [...etc...] > This document has suggested that search requests be presented using a > new HTTP method, primarily so as to avoid confusion when dealing with > servers which do not support searching. This approach has the > disadvantage that there is a large installed base of clients which > would not understand the new method, a large proportion of which have > no way of supporting new HTTP methods. Deployment is an interesting hard problem. > An alternative strategy would be to implement searches embedded > within GET requests. 
This would complicate processing of the GET > request, but not require any changes on the part of the client. It > would also allow searches to be written in HTML documents without any > changes to the HTML syntax - they would simply appear as regular > URLs. Searches which required a new HTTP method would presumably > have to be delineated by an additional component in the HTML anchor > tag. Changes to HTML would not necessarily be needed even to support new methods. In addition to the method name, where are the additional parameters of the request? One solution is to package up the whole request, including the method name, the URI, and additional parameters into a new URI. I've been calling this the "call" URI scheme. The above example might appear as: call:SEARCH;Protocol='whois++';Query='keywords=venona';http://www.lut.ac.uk/vips The URI in the call URI could itself be another call URI, so call URIs could potentially get very long since they are general expressions. Making them readable is therefore a relevant goal. If whitespace were ignored in the use of the URI, such as in HTML, we could do: <A HREF="call:SEARCH; Protocol='whois++'; Query='keywords=venona'; http://www.lut.ac.uk/vips"> Returning to the subject of metadata, and how I believe it should be dealt with in HTTP, we need a few extensions. There are three areas that need extension to deal with the following questions: 1) how to request metadata for a resource, 2) how to request that some new metadata should be associated with a resource, and 3) how to signal that metadata is being returned instead of the requested resource. 1) How to request metadata for a resource. Two things to consider are a META method and a Meta header. But before choosing, we have to address (ha ha) how a resource is referenced. In HTTP at least, it is not sufficient to say that a resource is identified by an http URL.
The reason it is not sufficient is that it is the full request, including the method and relevant header lines, and maybe the phase of the moon, that are all factors in determining what is returned by the server. It is this full request that maps to the result, and from the client's perspective, the result is the resource "identified" in some sense by the full request. Consider the collection which results from doing a SEARCH. The result collection may be stored temporarily on the server for subsequent refinement (given sessions and such in the future), and the server just returns metadata about the result collection. The result collection doesn't necessarily have an identifier. Some systems might assign a URI to even such a temporary object, and this provides a partial solution. But consider if a client wants to *ask* for the metadata up front that is associated with the result of processing a request. The client has no identifier yet. So what this argument is leading up to is that the request for metadata must be essentially a wrapper around another request. The metadata for the result of processing the wrapped request is what should be returned by the metadata request. Now this wrapping might be done either using a META method or a Meta header in the request. As a header, the Meta header would always be recognized as being effectively outside of the request it is contained in. This is weird, and it also prevents us from asking for meta-metadata, which itself is weird, but useful anyway. Using a META method, the wrapped request could either be stuffed into a header, or in the body of the META request. It is not sufficient to only allow META requests for resources identified only by the Request-URI unless we also allow another means for wrapping requests. Another way of wrapping requests is in a "call" URI, as described above. 2) How to request that some new metadata should be associated with a resource. This is something like a PUT relative to a GET. 
But the issue gets more complex again when we consider wrapping of requests. I have nothing more intelligent to add at this hour. 3) How to signal that metadata is being sent instead of the requested resource. Any time a server wants to return metadata instead of the resource itself, it should be allowed to do so. Such metadata might be the list of variants that are available in the case that the request was not sufficient for the server to make that choice. The server should signal that metadata is being returned with either a particular status code, or with a new header line. (Other alternatives?) Notice it is not sufficient to claim that a result must be metadata just because it looks like metadata. This would be wrong because metadata may be requested specifically, and the server may instead return meta-metadata. (Apologies to Stu.) It is also wrong because the very same format and type of data that is used for metadata in one case might be used for something else that is not intended as metadata. It is sufficient for some requests to always return metadata, if they have been declared by the protocol to do so, so that clients know what is coming. That's enough for now. Comments appreciated, but please try to respond before the web conference next week as I plan to present some of this in the URC panel. 
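The "call" URI idea can be sketched in a few lines of Python. This is a minimal sketch, not a specification: the syntax (method first, then semicolon-separated Name='value' parameters, then the target URI, which may itself be a call URI) is inferred from the single example above, and the `Call` class and the `build_call_uri`/`parse_call_uri` helpers are hypothetical names. Whitespace after each semicolon is skipped, as in the HTML anchor variant.

```python
# Hypothetical helpers for the proposed "call" URI scheme.
# Assumed syntax, inferred from the example in the message:
#   call:METHOD;Name='value';...;TARGET
# where TARGET is a plain URI or another call: URI (nesting).
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Call:
    method: str
    params: Dict[str, str] = field(default_factory=dict)
    target: Union["Call", str] = ""


def build_call_uri(call: Call) -> str:
    """Serialise a (possibly nested) Call into a call: URI."""
    parts = [call.method]
    parts += [f"{name}='{value}'" for name, value in call.params.items()]
    target = call.target
    parts.append(build_call_uri(target) if isinstance(target, Call) else target)
    return "call:" + ";".join(parts)


def parse_call_uri(uri: str) -> Call:
    """Parse a call: URI, skipping whitespace after each ';'."""
    if not uri.startswith("call:"):
        raise ValueError(f"not a call URI: {uri!r}")
    method, _, rest = uri[len("call:"):].partition(";")
    params: Dict[str, str] = {}
    while True:
        rest = rest.lstrip()
        name, eq, after = rest.partition("=")
        # A parameter is Name='value'; anything else (e.g. "http://...") is the target.
        if not eq or not after.startswith("'") or ":" in name or ";" in name:
            break
        value, _, rest = after[1:].partition("';")
        params[name] = value
    target: Union[Call, str] = parse_call_uri(rest) if rest.startswith("call:") else rest
    return Call(method.strip(), params, target)


if __name__ == "__main__":
    uri = "call:SEARCH;Protocol='whois++';Query='keywords=venona';http://www.lut.ac.uk/vips"
    call = parse_call_uri(uri)
    print(call.method, call.params, call.target)
    # Wrapping one request in another, e.g. asking for metadata about a search result.
    print(build_call_uri(Call("META", {}, call)))
```

Nesting falls out of the recursion: a META request about the result of a SEARCH is just a call URI whose target is another call URI, which is the "wrapping" discussed under point 1.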
-- Daniel LaLiberte (liberte@ncsa.uiuc.edu) National Center for Supercomputing Applications http://union.ncsa.uiuc.edu/~liberte/ From owner-meta2@net.lut.ac.uk Thu May 02 09:01:11 +0100 1996 Return-path: <owner-meta2@net.lut.ac.uk> Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uEtK2-00059m-00; Thu, 2 May 1996 09:01:10 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id JAA25711 for meta2-outgoing; Thu, 2 May 1996 09:00:56 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id JAA25706 for <meta2@mrrl.lut.ac.uk>; Thu, 2 May 1996 09:00:54 +0100 (BST) Received: by weeble.lut.ac.uk with local (Exim 0.42 #1) id E0uEtJN-00059f-00; Thu, 2 May 1996 09:00:29 +0100 Date: Thu, 2 May 1996 09:00:29 +0100 (BST) From: Jon Knight <J.P.Knight@lut.ac.uk> To: Daniel LaLiberte <liberte@ncsa.uiuc.edu> cc: meta2@mrrl.lut.ac.uk Subject: Re: boil and trouble In-Reply-To: <9605020547.AA14927@void.ncsa.uiuc.edu> Message-ID: <Pine.SUN.3.91.960502084001.11298n-100000@weeble.lut.ac.uk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Thu, 2 May 1996, Daniel LaLiberte wrote: > 1) How to request metadata for a resource. > > So what this argument is leading up to is that the request for metadata > must be essentially a wrapper around another request. The metadata for > the result of processing the wrapped request is what should be returned > by the metadata request. Now this wrapping might be done either using a > META method or a Meta header in the request. As a header, the Meta > header would always be recognized as being effectively outside of the > request it is contained in. This is weird, and it also prevents us from > asking for meta-metadata, which itself is weird, but useful anyway. 
If we use MIME as the container syntax then we can include metadata IMTs in the Accept: header. Plus possibly add in a Metadata: header to allow the client to request that just the metadata for an object, or just the object itself or both are returned (if no Metadata: header is provided, assume that only the object is required for backwards compatibility reasons). This could work nicely with some of the existing HTTP methods. For example, you might get something like (excluding all the other great HTTP headers!):

C: GET /foo/bar/yoghurt.html HTTP/1.0
C: Accept: text/html, text/plain, x-metadata/x-dcessgml, x-metadata/x-pics, x-metadata/x-usmarc
C: Metadata: both
C:
S: 200 OK
S: Content-type: multipart/related; boundary="qxyzqxyzqxyzqxyzqxyz"
S:
S: --qxyzqxyzqxyzqxyzqxyz
S: Content-type: x-metadata/x-dcessgml
S:
S: <!DOCTYPE dublinCore PUBLIC '-//OCLC//DTD Dublin core v.1//EN'>
S: <dublinCore>
S: <title>Yoghurt is nice cos Simon Spero says so.
S: Jon Knight
S: 1996-05-02
S: Toy example
S: text/html
S:
S: --qxyzqxyzqxyzqxyzqxyz
S: Content-type: text/html
S:
S: Yoghurt is nice!
S: Yoghurt is nice cos Simon Spero says so.
S: So there.
S: --qxyzqxyzqxyzqxyzqxyz--

Now of course this means that the WWW browser needs to understand multipart MIME stuff but that shouldn't be too tricky to do. > 2) How to request that some new metadata should be associated with a resource. > > This is something like a PUT relative to a GET. But the issue gets more > complex again when we consider wrapping of requests. I have nothing > more intelligent to add at this hour. Sounds very similar to the form uploading idea - maybe we can use the same sort of idea? > 3) How to signal that metadata is being sent instead of the requested resource. > The server should signal that metadata is being returned with > either a particular status code, or with a new header line. > (Other alternatives?) I was going to say MIME types but you can't use that because you might be requesting an x-metadata type as a first class object and also be getting other x-metadata typed objects as the metadata about it. Which I guess is a bit of a problem if you group the metadata with the object in a multipart MIME object. Hmm. Tatty bye, Jim'll -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU. * I've found I now dream in Perl. More worryingly, I enjoy those dreams.
* From owner-meta2@net.lut.ac.uk Thu May 02 21:52:40 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uF5Md-0005oG-00; Thu, 2 May 1996 21:52:39 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id VAA00438 for meta2-outgoing; Thu, 2 May 1996 21:52:00 +0100 (BST) Received: from CNRI.Reston.VA.US (CNRI.Reston.VA.US [132.151.1.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id VAA00433 for ; Thu, 2 May 1996 21:51:55 +0100 (BST) Received: from newcnri.cnri.reston.va.us by CNRI.Reston.VA.US id aa29798; 2 May 96 16:45 EDT Received: from [132.151.1.217] (warmsmc) by newcnri.CNRI.Reston.Va.US (5.x/SMI-SVR4) id AA07796; Thu, 2 May 1996 16:45:07 -0400 X-Sender: warms@newcnri.cnri.reston.va.us Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Thu, 2 May 1996 16:45:38 -0400 To: msm@ansa.co.uk From: "William Y. Arms" Subject: Re: Your metadata draft Cc: Carl Lagoze , meta2@mrrl.lut.ac.uk Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Mark, My draft is clearly not very clearly written. The intention was to make one simple observation in the section on security. The observation is that there are many occasions when differing access controls and security will be applied to various metadata elements. Metadata packages partition the metadata elements. Hence they provide a way of partitioning questions of security and access. 
Bill From owner-meta2@net.lut.ac.uk Sat May 04 20:53:18 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uFnOH-0007hZ-00; Sat, 4 May 1996 20:53:17 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id UAA26619 for meta2-outgoing; Sat, 4 May 1996 20:51:46 +0100 (BST) Received: from mrrl.lut.ac.uk (martin@localhost.mrrl.lut.ac.uk [127.0.0.1]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id UAA26611 for ; Sat, 4 May 1996 20:51:42 +0100 (BST) Message-Id: <199605041951.UAA26611@gizmo.lut.ac.uk> X-Mailer: exmh version 1.6.6 3/24/96 To: meta2@mrrl.lut.ac.uk Subject: Re: boil and trouble X-URI: In-reply-to: Your message of "Thu, 02 May 1996 00:47:06 CDT." <9605020547.AA14927@void.ncsa.uiuc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 04 May 1996 20:51:36 +0100 From: Martin Hamilton Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk Daniel LaLiberte writes: | I comment on some aspects of Martin's draft, and then launch into how I | think HTTP should deal with metadata. :-)) | > This document proposes some experimental mechanisms which may be | > deployed within HTTP [1] to provide a local search capability on the | > information being made available by an HTTP server, and reduce both | > the bandwidth consumed by indexing agents, and the amount of work | > done by HTTP servers during the indexing process. | | This is an excellent goal. Are you perhaps planning (hoping?) to attend | the distributed searching and indexing workshop? I'm not real big on workshops and conferences! | There are a number of other rationales for why the goal is worth seeking | even if one does not want to actively support more web crawling. On the | other hand, I don't know if there is a strong enough case for the | argument that web crawling is excessively loading the network and | servers. 
Some alternative rationales are the desire for constrained | replication of indexing services within an intranet, and client directed | searching of distributed indexes. It's an interesting one, for sure. To get a quick snapshot I had a look at the usage stats on our main WWW server for the last few months. Most months it looks like this:

  %Reqs %Byte   Bytes Sent Requests | Reversed Subdomain
  ----- ----- ------------ -------- |--------------------
  56.67 35.89    772653477   261273 | uk.ac.lut
  12.78 13.23    284833493    58939 | Unresolved
   4.02  0.86     18421219    18550 | uk.co.spice
   1.53  0.47     10035051     7037 | com.lycos.srv
   0.63  0.34      7308669     2924 | net.ja.lut
   0.58  0.41      8734050     2653 | com.mckinley
   0.54  0.74     15900604     2504 | uk.co.demon
   0.49  0.57     12309822     2242 | uk.ac.hensa
   0.38  0.23      5053358     1773 | com.atext
   0.37  0.52     11151840     1726 | com.compuserve

i.e. most web crawlers account for less than 1% of the requests and bytes delivered every month. Those "spice" people seem to be a bit more aggressive than most ;-) This might look quite reasonable, but when you add up the known and suspected robots' entries we start to head up towards the 10% mark. I don't want my server to spend 10% of its time servicing requests from web crawlers, and I don't want to tie up anything like that much bandwidth talking to them. | Another alternative to keep in mind is that some servers might want | indexing to be done by an associated server, perhaps one they contract | with for this service. So a request for indexing info or searching | services might reasonably be redirected to another server. Good one! Perhaps via an HTTP "Location:" header and a redirect response code ? | This much is great, although I am skeptical of the utility of any | request having to do with everything on a server. Frequently there | are many disjoint collections on one server, so it might make | more sense to first ask for the list of collections. Yes, it begs the question of how you discover what collections of info the server offers...!
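A hypothetical server-side dispatch for COLLECT, pulling together the points above: "*" returns all of the server's local indexing information, a collection URI returns that collection's records, and a collection whose indexing has been contracted out gets a redirect with a Location: header. The collection tables, the choice of 302 and the indexer.example.org host are assumptions for illustration only. Request-URIs are matched exactly, so the server, like the client, infers no containment from the shape of the identifiers.

```python
# Hypothetical dispatch for the proposed COLLECT method; not a real server API.
from typing import Dict, List, NamedTuple


class Response(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: List[str]


# Assumed server state: indexing records per local collection URI, and
# collections whose indexing has been contracted out to another server.
LOCAL_COLLECTIONS: Dict[str, List[str]] = {
    "/vips": ["record for /vips/venona.html"],
    "/depts": ["record for /depts/compsci.html"],
}
DELEGATED: Dict[str, str] = {
    "/library": "http://indexer.example.org/collections/library",
}


def handle_collect(request_uri: str) -> Response:
    if request_uri == "*":
        # All of the indexing information this server holds locally.
        records = [r for recs in LOCAL_COLLECTIONS.values() for r in recs]
        return Response(200, {}, records)
    if request_uri in DELEGATED:
        # Indexing is done by an associated server: redirect the indexer there.
        return Response(302, {"Location": DELEGATED[request_uri]}, [])
    if request_uri in LOCAL_COLLECTIONS:
        # Exact match only: "/vips/sub" is not assumed to be inside "/vips".
        return Response(200, {}, LOCAL_COLLECTIONS[request_uri])
    return Response(404, {}, [])


if __name__ == "__main__":
    for uri in ("*", "/vips", "/library", "/vips/sub"):
        print(uri, handle_collect(uri))
```

Collection discovery is deliberately left out; how a crawler learns that "/vips" exists is exactly the open question raised here.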
In the context of current web crawler technology, I think "*" is the only thing they'll be interested in ? What's important is not to make it hard to introduce other more advanced indexing scenarios in the future - e.g. I will only let you index my server if you pay me $$$ derived from your advertising revenue | Just as COLLECT was based on either everything in the server or | everything in a particular collection, so should SEARCH be. So the | Request-URI for a SEARCH request should be either "*" or the URI of a | collection. The parameters of the search should be in additional | header lines specific to the search request, just as the COLLECT request | used additional header lines to parameterize it. Yep! Arguable whether Request-URI: should actually be used for anything ? ...or just there as a filler to make up the HTTP request :-) | Rewriting your example, I might do it something like this: | | C: SEARCH /vips HTTP/1.1 | C: Accept: application/whois, text/html | C: Host: www.lut.ac.uk | C: Protocol: whois++ | C: Query: keywords=venona | C: | S: 200 OK search results follow | S: Content-type: application/whois | S: | S: [...etc...] I think the Protocol attribute wants to include a URI to the spec, and mandate that it needs to be supported, in which case the header would end up looking something like this ... ? Protocol: {ftp://ftp.internic.net/rfc/rfc1835.txt {str req}} And a PEP aware server's response would use one of the ?2? response code series? e.g. 220 Umm, OK, I think I understand... Question: with PEP, is there really any point in using separate methods ? In any case, for the COLLECT operation at least, it would seem to be desirable to have something which could be used straight away with GET to retrieve the entire collection of indexing info for a server, or with a couple of PEP headers to retrieve a subset of the available info - a la Harvest. Hmm! 
This would still have the drawback that server admins would need to run something like Robert Thau's site-index.pl to generate the indexing info. With support built into the server, we can make the server generate this automagically - and take them out of the loop! | > This document has suggested that search requests be presented using a | > new HTTP method, primarily so as to avoid confusion when dealing with | > servers which do not support searching. This approach has the | > disadvantage that there is a large installed base of clients which | > would not understand the new method, a large proportion of which have | > no way of supporting new HTTP methods. | | Deployment is an interesting hard problem. And that's before you get onto choosing between metadata formats... ;-) | Changes to HTML would not necessarily be needed even to support new | methods. In addition to the method name, where are the additional | parameters of the request? One solution is to package up the whole | request, including the method name, the URI, and additional parameters | into a new URI. I've been calling this the "call" URI scheme. The | above example might appear as: | | call:SEARCH;Protocol='whois++';Query='keywords=venona';http://www.lut.ac.uk/vips Question: should HTTP clients need fixing up in order to be capable of supporting (albeit perhaps not in the most sophisticated way) a common search scheme ? [...] | That's enough for now. Comments appreciated, but please try to respond | before the web conference next week as I plan to present some of this in | the URC panel. I was going to say "see you on the MBONE" but I see this isn't one of the sessions being multicast. Awww, shucks!
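For concreteness, LaLiberte's SEARCH example, with the Protocol header in the form Martin suggests, can be generated as below. This is a minimal sketch only: the Protocol and Query headers and the {uri {str req}} layout come from the messages above, but none of this is a settled PEP or HTTP syntax, and the helper names are hypothetical.

```python
# Hypothetical builder for the experimental SEARCH request discussed above.
from typing import Dict

CRLF = "\r\n"


def pep_protocol_header(spec_uri: str) -> str:
    """Protocol value in the {uri {str req}} form quoted in the thread."""
    return "{" + spec_uri + " {str req}}"


def build_search_request(host: str, collection: str, query: str,
                         protocol: str) -> str:
    """Serialise a SEARCH request; the query travels in a header, not the URI."""
    headers: Dict[str, str] = {
        "Accept": "application/whois, text/html",
        "Host": host,
        "Protocol": protocol,
        "Query": query,
    }
    lines = [f"SEARCH {collection} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line ends the header section; this request has no body.
    return CRLF.join(lines) + CRLF + CRLF


if __name__ == "__main__":
    proto = pep_protocol_header("ftp://ftp.internic.net/rfc/rfc1835.txt")
    print(build_search_request("www.lut.ac.uk", "/vips", "keywords=venona", proto))
```

Keeping the query out of the Request-URI leaves that slot free for "*" or a collection URI, which is the split LaLiberte argues for.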
From owner-meta2@net.lut.ac.uk Tue May 07 01:59:34 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uGb7k-00027u-00; Tue, 7 May 1996 01:59:32 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id BAA28178 for meta2-outgoing; Tue, 7 May 1996 01:58:42 +0100 (BST) Received: from ns.onet.on.ca (ns.onet.on.ca [130.185.89.125]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with ESMTP id BAA28170 for ; Tue, 7 May 1996 01:58:35 +0100 (BST) Received: from sqarc.sq.com ([192.31.6.128]) by ns.onet.on.ca with SMTP id <254375>; Mon, 6 May 1996 20:58:24 -0400 Received: from sqrex.sq.com by sqarc.sq.com with smtp (Smail3.1.29.1 #4) id m0uGb6N-000OkuC; Mon, 6 May 96 20:58 EDT Received: by sqrex.sq.com (4.1//ident-1.0) id AA29524; Mon, 6 May 96 20:58:07 EDT Date: Mon, 6 May 96 20:58:07 EDT From: lee@sq.com Message-Id: <9605070058.AA29524@sqrex.sq.com> To: meta2@mrrl.lut.ac.uk Subject: Re: boil and trouble Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk > Now of course this means that the WWW browser needs to understand > multipart MIME stuff but that shouldn't be too tricky to do. Anything involving getting 30 or 100 software vendors to change their products needs a far stronger business case than the metadata group can ever provide. `your users will benefit when they use your product to search the web, if everyone else implements this, people use it, and the webcrawlers index it, and ...' Forget it. Note that Netscape Navigator already understands multipart MIME messages, but with completely different semantics -- this is how `server push' animations are done. You won't get that to change easily, since it's *extremely* widely deployed. I know I keep saying this, but I'm not very interested in a metadata standard for the year 2010. Nor even in one for the year 1998.
And if you require coordinated multi-vendor software changes, you'll be lucky to get it by then. The MIME stuff has a certain elegance. But anything that alters existing non-DublinCore metadata (e.g. by wrapping it), or that alters the way an HTML file is delivered, is doomed to failure. Two years of hard experience on the various IETF mailing lists and meetings has demonstrated this clearly. Lee From owner-meta2@net.lut.ac.uk Tue May 07 03:38:24 +0100 1996 Return-path: Received: from gizmo.lut.ac.uk [158.125.96.46] (majordom) by weeble.lut.ac.uk with smtp (Exim 0.42 #1) id E0uGcXL-0002CM-00; Tue, 7 May 1996 03:30:03 +0100 Received: (majordom@localhost) by gizmo.lut.ac.uk (8.7.5/8.6.9) id DAA28530 for meta2-outgoing; Tue, 7 May 1996 03:29:52 +0100 (BST) Received: from weeble.lut.ac.uk (exim@weeble.lut.ac.uk [158.125.96.47]) by gizmo.lut.ac.uk (8.7.5/8.6.9) with SMTP id DAA28525 for ; Tue, 7 May 1996 03:29:48 +0100 (BST) Received: by weeble.lut.ac.uk with local (Exim 0.42 #1) id E0uGcWt-0002CA-00; Tue, 7 May 1996 03:29:35 +0100 Date: Tue, 7 May 1996 03:29:34 +0100 (BST) From: Jon Knight To: lee@sq.com cc: meta2@mrrl.lut.ac.uk Subject: Re: boil and trouble In-Reply-To: <9605070058.AA29524@sqrex.sq.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-meta2@mrrl.lut.ac.uk Precedence: bulk Reply-To: meta2@mrrl.lut.ac.uk On Mon, 6 May 1996 lee@sq.com wrote: > > Now of course this means that the WWW browser needs to understand > > multipart MIME stuff but that shouldn't be too tricky to do. > > Anything involving getting 30 or 100 software vendors to change their > products needs a far stronger business case than the metadata group > can ever provide. `your users will benefit when they use your product > to search the web, if everyone else implements this, people use it, and > the webcrawlers index it, and ...' Forget it. Well, I disagree completely with the "forget it" bit.
Otherwise we'd all still be using HTML 1.0 and the CERN line mode browser and we might as well go home now. WWW browser implementors _do_ implement neat new features all the time. Not all of them eye-candy stuff like and ; how about proxy support, multilingual support, user agent spoofing (a great X Mosaic feature :-) ), active objects, etc? And we know the browser writers/index generators are interested in metadata already because of the rumblings from W3C and the various hype, erm, I mean press announcements we've seen recently (eg: Netscape using Harvest). Indexing is getting important and people are realising that. However, metadata generation is something that is going to take time; even if you let people stick in HTML files, someone has to create it, someone has to index it, etc, etc. Someone has to write/glue together some code somewhere, be it web browser vendors or webcrawler writers. That's something I think that we've got to accept. Look how long GILS is taking even with the US Federal Government giving it a boot up the botty. :-) > Note that Netscape Navigator already understands multipart MIME messages, > but with completely different semantics -- this is how `server push' > animations are done. You won't get that to change easily, since it's > *extremely* widely deployed. Different semantics to multipart/mixed and multipart/alternative??? That isn't MIME in that case (the RFCs tell you what the semantics of the multipart content types are and I wouldn't be at all surprised if Netscrape have broken yet another standard along with HTML/SGML. Unfortunately the Netscrape site is as ever too slow to get to at the moment (and it's 3am!) so I can't check this). Or am I misunderstanding what you mean by semantics here? > I know I keep saying this, but I'm not very interested in a metadata standard > for the year 2010. Nor even in one for the year 1998. And if you require > coordinated multi-vendor software changes, you'll be lucky to get it by then.
Well, years 1996 and 1997 are pretty much out of the question. I doubt that we'll all be able to write and ship the multiple, independent implementations of code required to progress a standards track protocol in under a year. We _might_ be able to get a few, rough and ready implementations out there for folks to try out in a year or so, but there won't be much data in it for a while. Even the WWW took a while to get going (I remember when it was in the same league as Hyper-G alongside the then rapidly growing gopher). I would say at least 1998 for the _start_ of a new metadata standard. If you're after a standard today, there's always MARC and Z39.2/Z39.50. And it's GILS compliant. :-) I don't think we want or even need coordinated multi-vendor software changes. What is needed is a killer app, Mosaic style, that gets everybody generating and using metadata. Something that makes it worth their while creating the stuff (just like Mosaic's GUI made it worthwhile bothering with inserting HTML tags into documents). It might be something like Silk or Harvest modified to understand WF, DCES embedded in HTML files and the DCES-SGML DTD for example. Maybe agent (yuck, nasty word) technology that sits on your desktop and gives you a nice, user friendly search mechanism that is constantly updated for your favourite search term; I don't know. However, once one killer-app is out there, the sheep will follow (or perish - definition of a killer app :-)). Coordinating multi-vendor changes is ISO committee thinking, not free market thinking. Making a kick-ass application that everyone wants and is willing to put the extra effort into in order to gain some benefits is what we should be thinking about. If it worked for Marc A.... $$$ :-) > The MIME stuff has a certain elegance. But anything that alters existing > non-DublinCore metadata (e.g. by wrapping it), or that alters the way an > HTML file is delivered, is doomed to failure.
Two years of hard experience > on the various IETF mailing lists and meetings has demonstrated this clearly. Err, the point of wrapping the non-DCES metadata is that you don't have to alter it, you just wrap it. To my mind altering the metadata means going in and fiddling with bits inside the unencoded packages which I didn't see anyone proposing (maybe I missed it?). At the moment few browsers understand any metadata at all and none of them generate the accept headers proposed for MIME wrapping of metadata, so it's not like proposing wrapping of metadata formats in MIME is going to break stuff. Library OPACs are going to carry on exchanging raw MARC between systems without MIME encoding it. The world isn't going to fall apart if we decide to wrap up metadata for use in applications we don't have yet. And we don't alter the way HTML files are delivered to non-WF aware browsers; they don't generate the correct accept headers to get MIME encoded metadata wrapped round their objects and so just get the objects with no multipart content-types (which is, after all, all they know how to process). WF aware browsers also wouldn't break existing HTTP servers as the server would just return them the object (or no match if all they wanted was the metadata). Allowing metadata to be transported with HTML files without breaking them was the whole point of developing the cool DCES-embedded-in-HTML stuff (or so I thought - was I wrong?). The most likely "browsers" to implement a MIME (or SGML or whatever) based WF initially aren't likely to be end user browsers anyway; IMHO the WWW client side of the indexing engines is the place this code will appear first (if anywhere). There are far fewer of these deployed than Netscapes and their owners/implementors are likely to have a vested interest in getting their hands on any and all metadata that they can in order to improve their service.
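The backwards-compatibility argument can be made concrete with a sketch of the server side of the Accept:/Metadata: proposal. All of the following are assumptions for illustration: the Metadata: values "metadata" and "both" (only "both" appears in the earlier example), a 406 status when only metadata is wanted and no acceptable format exists, and metadata records held as pre-serialised strings. A request with no Metadata: header gets exactly the plain object it would get today.

```python
# Sketch of the Accept:/Metadata: negotiation proposed in this thread (hypothetical).
from typing import Dict, List, Tuple

BOUNDARY = "qxyzqxyzqxyzqxyzqxyz"
Part = Tuple[str, str]  # (content type, body)


def accepted_types(accept: str) -> List[str]:
    # Keep only the media types, in order, ignoring any ";q=" parameters.
    return [t.split(";")[0].strip() for t in accept.split(",") if t.strip()]


def multipart_body(parts: List[Part]) -> str:
    chunks = [f"--{BOUNDARY}\r\nContent-type: {ctype}\r\n\r\n{body}\r\n"
              for ctype, body in parts]
    return "".join(chunks) + f"--{BOUNDARY}--\r\n"


def respond(headers: Dict[str, str], obj: Part,
            metadata: Dict[str, str]) -> Tuple[int, str, str]:
    """Return (status, content type, body) for a GET on one resource.

    metadata maps a metadata content type to a serialised record for obj.
    """
    mode = headers.get("Metadata", "").strip().lower()
    if mode not in ("metadata", "both"):
        # No (or unrecognised) Metadata: header, so behave exactly as today.
        return 200, obj[0], obj[1]
    wanted = accepted_types(headers.get("Accept", ""))
    parts = [(t, metadata[t]) for t in wanted if t in metadata]
    if not parts:
        if mode == "metadata":
            return 406, "text/plain", "no acceptable metadata format\r\n"
        return 200, obj[0], obj[1]
    if mode == "both":
        parts.append(obj)  # metadata first, then the object, as in the example
    return 200, f'multipart/related; boundary="{BOUNDARY}"', multipart_body(parts)
```

An old browser never sends Metadata:, so it always takes the first branch; only a client that asks explicitly ever sees a multipart/related reply.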
Of course, if even these people aren't interested in getting metadata, one has to start questioning who _is_ interested in this outside of the 50 of us that met in Warwick! :-) Apologies for any rambling but it's very late (or is that early?) and my MBONE recording tools _still_ don't want to record Stu's session at the Paris WWW conference. Time for bed methinks... Tatty bye, Jim'll -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Jon "Jim'll" Knight, Researcher, Sysop and General Dogsbody, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU. * I've found I now dream in Perl. More worryingly, I enjoy those dreams. *