Re: API command set (revisited) - SGMLS::Handler

Re: code to modularize SGMLS.pm's commandline driven sgmlspl tool
From: Ingo Macherius
WWW: http://home.tu-clausthal.de/~inim/
------------

From null@ActiveState.com Mon Mar 30 07:04:42 1998
Date: Mon, 30 Mar 1998 14:54:40 +0000
From: Ingo Macherius
Subject: Re: API command set (revisited)
Cc: perl-xml@ActiveState.com

> From: "Robert Hanson"
> Subject: API command set (revisited)
> Date: Sun, 29 Mar 1998 17:00:43 -0500

> It seems to me that SAX sends events based on the starting and ending of
> tags... but shouldn't these be included in the innards of the parser and in
> my opinion should have nothing to do with the "application".

I've written a small piece of code to modularize SGMLS.pm's command-line
driven sgmlspl tool. Recently I received mail from Tony Graham, who had
done something very similar. The idea is to register references to
functions with the API, open the resource and pass the events through.
So after installing the callbacks, my main program looks like this:

    #!/usr/local/bin/perl
    require SGMLS;
    require SGMLS::Handler;

    $handler = new SGMLS::Handler;
    $parse = new SGMLS(STDIN);

    $handler->start_document;
    while ($event = $parse->next_event) {
        $handler->do_event($event);
    }
    $handler->end_document;

> I see the API to the application as more of a tree than event driven. I
> can't understand why my application would care if a new tag has started or
> not... I am more interested in a higher level like "give me the contents of
> this entity".

The overhead of doing so is enormous. Maybe your application just wants
to find out whether a 10MB document contains an element with ID="foo".
Building a tree? Never! Of course higher-level ("grove level") functions
are also needed, but the simple ones are needed, too. A grove can only be
built on simple events like , , ...

> Who does SAX really apply to? The processor or the top level application?

Right in the middle.
If you look at the Python SAX-lookalike modules, you'll find that either
a parser written in Python or an interface to jjc's "nsgmls" ESIS output
may be used, much as SGMLS.pm does. To me that is the point of SAX: it is
implementable even with such limited information as a stream of ESIS
events. SAX simply hides the parser behind it. Limitations in the parser
may result in limitations in the functions provided by the SAX-alike API.

SGMLS.pm is not too nice, as all entity management etc. is done outside
Perl. On the other hand, integrated all-in-one modules like Quilt are a
pain to install and maintain. So any Perl-XML API should, in its base
layer, not be more complex than what can be implemented on nsgmls ESIS
output. If there is a more abstract level built on this, that's just fine.

Maybe the code is of interest, so I have appended it to this mail.
Comments welcome.

++im

---- snip ----
#!/usr/local/bin/perl -w
########################################################################
# SGMLS::Handler -- the sgmlspl process model as a perl module
#
# Revision: $Id: Handler.pm,v 1.2 1996/09/12 00:00:58 inim Exp $
#
# Heavily based on version 1.8 (as of 1995/12/03) of David Megginson's
# sgmlspl.pl script. Those portions of this code are Copyright (c)
# 1995 by David Megginson
#
# Modified by Ingo Macherius.
# Modifications copyright 1996 by Ingo Macherius.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
########################################################################

package SGMLS::Handler;
use strict;
use Carp;

require SGMLS::Output;
require SGMLS;

########################################################################

sub new {
    my $class = shift;
    my $self = {
        # Hashes of functions, keyed by name; '' holds the generic handler
        'start_element'  => { '' => sub {} },
        'end_element'    => { '' => sub {} },
        'entity'         => { '' => sub {} },
        'start_subdoc'   => { '' => sub {} },
        'end_subdoc'     => { '' => sub {} },
        'sdata'          => { '' => sub {} },
        # Single functions. 'pi' belongs here: sgml() stores it and
        # do_event() calls it as a single code reference.
        'pi'             => sub {},
        'start_document' => sub {},
        'end_document'   => sub {},
        'conforming'     => sub {},
        'cdata'          => sub {},
        're'             => sub {},
    };
    return bless $self, $class;
}

####
#
# Document start and end events were automatically called by sgmlspl.
# This module still registers handlers for them, but won't call them by
# itself. Use the methods below from your script instead.
#
####
sub start_document { my $self = shift; &{ $self->{'start_document'} } }
sub end_document   { my $self = shift; &{ $self->{'end_document'} } }

####
#
# Main access point: declare handlers for different SGML events.
#
# Usage: $handler->sgml(event, handler);
#
# The event may be one of the following strings, or a special pattern.
# The generic events are as follows:
#
#   'start'             The beginning of the document.
#   'end'               The end of the document.
#   'start_element'     The beginning of an element.
#   'end_element'       The end of an element.
#   'cdata'             Regular character data.
#   'sdata'             Special system-specific data.
#   're'                A record-end.
#   'pi'                A processing instruction.
#   'entity'            An external-entity reference.
#   'start_subdoc'      The beginning of a subdocument entity.
#   'end_subdoc'        The end of a subdocument entity.
#   'conforming'        The document is conforming.
#
# In addition to these generic events, it is possible to declare
# handlers for certain specific, named events, as follows:
#
#   '<GI>'              The beginning of element GI.
#   '</GI>'             The end of element GI.
#   '|SDATA|'           The system-specific data SDATA.
#   '&ENAME;'           A reference to the external entity ENAME.
#   '{ENAME}'           The beginning of the subdocument-entity ENAME.
#   '{/ENAME}'          The end of the subdocument-entity ENAME.
#
#
# The handler may be a string, which will simply be printed when the
# event occurs (this is usually useful only for the specific, named
# events), or a reference to an anonymous subroutine, which will
# receive two arguments: the event data and the event itself. For
# example,
#   $handler->sgml('<FOO>', "\n\\begin{foo}\n");
# and
#   $handler->sgml('<FOO>', sub { output("\n\\begin{foo}\n"); });
# will have identical results.
#
####
sub sgml {
    my $self = shift;
    my ($spec, $handler) = @_;

    # A plain string is turned into a subroutine that prints it.
    if (ref($handler) ne 'CODE') {
        $handler =~ s/\\/\\\\/g;
        $handler =~ s/'/\\'/g;
        if ($handler eq '') {
            $handler = sub {};
        } else {
            $handler = eval "sub { SGMLS::Output::output('$handler'); };";
        }
    }

  SWITCH: {
        # start-document handler
        $spec eq 'start' && do {
            $self->{'start_document'} = $handler;
            last SWITCH;
        };
        # end-document handler
        $spec eq 'end' && do {
            $self->{'end_document'} = $handler;
            last SWITCH;
        };
        # start-element handler
        $spec =~ /^<([^\/].*|)>$/ && do {
            $self->{'start_element'}->{$1} = $handler;
            last SWITCH;
        };
        # generic start-element handler
        $spec eq 'start_element' && do {
            $self->{'start_element'}->{''} = $handler;
            last SWITCH;
        };
        # end-element handler
        $spec =~ /^<\/(.*)>$/ && do {
            $self->{'end_element'}->{$1} = $handler;
            last SWITCH;
        };
        # generic end-element handler (exact match, not a pattern match)
        $spec eq 'end_element' && do {
            $self->{'end_element'}->{''} = $handler;
            last SWITCH;
        };
        # cdata handler
        $spec eq 'cdata' && do {
            $self->{'cdata'} = $handler;
            last SWITCH;
        };
        # sdata handler
        $spec =~ /^\|(.*)\|$/ && do {
            $self->{'sdata'}->{$1} = $handler;
            last SWITCH;
        };
        # generic sdata handler
        $spec eq 'sdata' && do {
            $self->{'sdata'}->{''} = $handler;
            last SWITCH;
        };
        # record-end handler
        $spec eq 're' && do {
            $self->{'re'} = $handler;
            last SWITCH;
        };
        # processing-instruction handler
        $spec eq 'pi' && do {
            $self->{'pi'} = $handler;
            last SWITCH;
        };
        # entity-reference handler
        $spec =~ /^\&(.*);$/ && do {
            $self->{'entity'}->{$1} = $handler;
            last SWITCH;
        };
        # generic entity-reference handler
        $spec eq 'entity' && do {
            $self->{'entity'}->{''} = $handler;
            last SWITCH;
        };
        # start-subdoc handler
        $spec =~ /^\{([^\/].*|)\}$/ && do {
            $self->{'start_subdoc'}->{$1} = $handler;
            last SWITCH;
        };
        # generic start-subdoc handler
        $spec eq 'start_subdoc' && do {
            $self->{'start_subdoc'}->{''} = $handler;
            last SWITCH;
        };
        # end-subdoc handler
        $spec =~ /^\{\/(.*)\}$/ && do {
            $self->{'end_subdoc'}->{$1} = $handler;
            last SWITCH;
        };
        # generic end-subdoc handler
        $spec eq 'end_subdoc' && do {
            $self->{'end_subdoc'}->{''} = $handler;
            last SWITCH;
        };
        # conforming handler
        $spec eq 'conforming' && do {
            $self->{'conforming'} = $handler;
            last SWITCH;
        };

        croak "Bad SGML handler pattern: $spec\n";
    }
}

####
#
# This method iterates along the ESIS stream. Just pass the events
# through to do_event() and it will call the proper handler for you.
# Goodbye, SWITCH ;-)
#
#   #!/usr/local/bin/perl
#   require SGMLS;
#   require SGMLS::Handler;
#
#   $handler = new SGMLS::Handler;
#   $parse = new SGMLS(STDIN);
#
#   $handler->start_document;
#   while ($event = $parse->next_event) {
#       $handler->do_event($event);
#   }
#   $handler->end_document;
#
####
sub do_event {
    my ($self, $event) = @_;
    my $type = $event->type;

    # Events looked up by name, falling back to the generic ('') handler.
    $type =~ m/^(?:start_element|end_element|entity|sdata|start_subdoc|end_subdoc)$/ && do {
        my $data = $event->data;
        # Element and entity events carry an object with a name; for
        # sdata the data is a plain string, which is itself the key.
        my $name = ref($data) ? $data->name : $data;
        my $code = $self->{$type}->{$name} || $self->{$type}->{''} || sub {};
        $code->($data, $event);
        return;
    };

    # Events with a single handler each.
    $type =~ m/^(?:cdata|re|pi|conforming)$/ && do {
        $self->{$type}->($event->data, $event);
        return;
    };

    croak "Unknown SGML event type: $type\n";
}

1;

--
Snail  : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de    http://home.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)
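
The lookup rule that do_event() relies on (try the handler registered
for the specific name first, then fall back to the generic handler
stored under the empty key '') can be tried in isolation. The following
is only a minimal sketch under stated assumptions: it does not load
SGMLS.pm, and %handlers, @seen and dispatch() are hypothetical names
made up for illustration, with plain strings standing in for real
SGMLS event objects.

```perl
#!/usr/local/bin/perl -w
use strict;

# Hypothetical handler table, shaped like the one SGMLS::Handler keeps:
# event type => { name => code ref }, with '' as the generic fallback.
my @seen;
my %handlers = (
    start_element => {
        ''      => sub { push @seen, "generic start of $_[0]" },
        'TITLE' => sub { push @seen, "specific start of $_[0]" },
    },
    end_element => {
        '' => sub { push @seen, "generic end of $_[0]" },
    },
);

# dispatch() is a hypothetical stand-in for do_event(); it takes a plain
# type and name instead of an SGMLS event object.
sub dispatch {
    my ($type, $name) = @_;
    my $table = $handlers{$type}
        or die "Unknown event type: $type\n";
    # Specific handler first, then the generic one, then a no-op.
    my $code = $table->{$name} || $table->{''} || sub {};
    $code->($name);
}

dispatch('start_element', 'TITLE');   # a handler is registered for TITLE
dispatch('start_element', 'PARA');    # no PARA handler: generic one runs
dispatch('end_element',   'TITLE');   # only a generic handler exists

print "$_\n" for @seen;
```

Keeping the generic handler in the same hash as the named ones means a
single expression covers both cases, which is why no per-event SWITCH
is needed at dispatch time.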