org.newsml.toolkit
Interface NewsMLFactory

All Known Implementing Classes:
DOMNewsMLFactory

public interface NewsMLFactory

Factory interface for NewsML documents.

This is an abstract factory interface for creating NewsML documents and nodes from external resources: it contains methods that create a top-level NewsML object or any arbitrary NewsML node (implementing the BaseNode interface) from a URL or a character stream. Normally, this will be the only public point of access to subpackages that contain concrete implementations, such as DOMNewsMLFactory.

HINT: a Java Reader can read from any source, not just a file. For example, if you need to read a NewsML document from a string, try using a StringReader; if you need to parse data coming from somewhere else in your program, try a PipedReader.
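The hint above can be illustrated with a minimal sketch. Because the toolkit itself may not be on the classpath, the example uses the JDK's own DOM parser as a stand-in for the factory; the class name StringReaderSketch and the helper rootElementName are hypothetical names chosen for illustration. With the toolkit, the corresponding call would be createNewsML(Reader, String) with the same StringReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class StringReaderSketch {

    // Parse XML from any Reader and return the name of the root element.
    // The JDK parser is a stand-in here (an assumption of this sketch);
    // with the toolkit you would pass the same Reader to
    // factory.createNewsML(reader, baseURL) instead.
    static String rootElementName(Reader reader) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The document lives in a String, not in a file.
        String xml = "<NewsML><NewsEnvelope/></NewsML>";
        System.out.println(rootElementName(new StringReader(xml)));
    }
}
```

Any other Reader (a PipedReader, an InputStreamReader wrapping a socket, and so on) can be passed in the same way.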

This interface includes setValidation(boolean) and getValidation() methods for enabling or disabling DTD validation. If the validation property is true, the factory will process the XML source with DTD validation, or throw an exception if the underlying implementation does not support validation; if it is false, the factory will process the XML source without DTD validation, or throw an exception if the underlying implementation does not support non-validating parsing. The default value is system-dependent, but will usually be false for reasons explained below.
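Because a factory may reject either setting, callers probably want to request the state they need and handle the exception. The following is a minimal sketch of that pattern. ValidationSwitch is a stand-in interface that copies only the two method signatures documented on this page, and NonValidatingOnly is a hypothetical implementation invented for the example; neither is part of the toolkit.

```java
import java.io.IOException;

final class ValidationContractSketch {

    // Stand-in mirroring the two validation methods documented
    // for NewsMLFactory (an assumption of this sketch).
    interface ValidationSwitch {
        void setValidation(boolean validation) throws IOException;
        boolean getValidation();
    }

    // Hypothetical implementation that supports only non-validating parsing.
    static final class NonValidatingOnly implements ValidationSwitch {
        private boolean validation = false;

        @Override
        public void setValidation(boolean validation) throws IOException {
            if (validation) {
                throw new IOException("DTD validation is not supported");
            }
            this.validation = false;
        }

        @Override
        public boolean getValidation() {
            return validation;
        }
    }

    // Request a validation state and report whether the factory accepted it.
    static boolean tryConfigure(ValidationSwitch factory, boolean wanted) {
        try {
            factory.setValidation(wanted);
            return factory.getValidation() == wanted;
        } catch (IOException e) {
            // The implementation does not support the requested state.
            return false;
        }
    }

    public static void main(String[] args) {
        ValidationSwitch factory = new NonValidatingOnly();
        System.out.println("non-validating accepted: " + tryConfigure(factory, false));
        System.out.println("validating accepted: " + tryConfigure(factory, true));
    }
}
```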

If validation is false (recommended), the implementation will not load any external XML entities, including the external DTD subset, and will not report validation errors; if validation is true (dangerous: see below), the implementation will attempt to load all externally-referenced XML entities, and will fail with an exception if there are any validation errors.
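To make the non-validating behaviour concrete, here is a sketch of one common way to get a similar effect with the JDK's own parser: an EntityResolver that answers every external entity request with an empty stream, so that nothing is fetched from the network. This is not the toolkit's implementation, which is not shown on this page; the class name, the helper parseOffline, and the DTD URL are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NoExternalEntitiesSketch {

    // Parse a document without validating and without fetching any
    // external entity, including the external DTD subset.
    static String parseOffline(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false); // no validation errors are reported
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // Substitute an empty stream for every external entity,
            // so the parser never contacts the host named in systemId.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical DTD location; the host does not need to exist.
        String xml = "<!DOCTYPE NewsML SYSTEM \"http://dtd.invalid/NewsML.dtd\">"
                + "<NewsML/>";
        System.out.println(parseOffline(xml));
    }
}
```

One consequence of skipping the DTD, with this approach or any other, is that attribute defaults declared only in the DTD are not applied.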

While DTD-validation is frequently promoted in XML books and workshops, it is almost always a bad idea for production applications: it has the potential to cause serious problems and should be used only with extreme caution (or better yet, not at all). DTD validation opens your applications and your organization to the following serious risks:

  1. Performance degradation: a DTD hosted at a remote site could take seconds or minutes to load because of network congestion or a heavy load on the host. For a system processing thousands of NewsML packages daily, the delays may be fatal.
  2. Denial of service: if the system hosting the DTD is down, or if the system administrator changes the location of the DTD file, your system will be unable to process any NewsML packages. Malicious parties may crack the system hosting the DTD and make subtle (and difficult-to-find) changes that cause all of your NewsML packages to be rejected, or the DTD may be legitimately upgraded to a newer version with the same effect. In essence, the security of your whole system is no better than the security of the external computer hosting the DTD.
  3. Sabotage: in addition to modifying the DTD to force your NewsML documents to be rejected, a malicious user could modify the DTD to change default values for attributes, causing your system to process the NewsML packages correctly but produce incorrect output and mis-categorization.
  4. Unintentional disclosure: since a DTD-validating system hits the external host every time it processes a NewsML package, the external host can do traffic analysis to learn about how your internal system works.

Version:
1.1beta
Author:
Reuters PLC
See Also:
NewsMLException

Method Summary
 NewsML createNewsML(Reader input, String baseURL)
          Create a top-level NewsML object from a character stream.
 NewsML createNewsML(String url)
          Create a top-level NewsML object from a URL.
 BaseNode createNode(Reader input, String baseURL)
          Create a NewsML node from a character stream.
 BaseNode createNode(String url)
          Create a NewsML node from a URL.
 boolean getValidation()
          Get the validation flag.
 void setValidation(boolean validation)
          Set the validation flag.
 

Method Detail

setValidation

public void setValidation(boolean validation)
                   throws IOException
Set the validation flag.
Parameters:
validation - true if the factory is required to perform DTD validation, false if it is required not to.
Throws:
IOException - if the implementation does not support the requested state.
See Also:
getValidation()

getValidation

public boolean getValidation()
Get the validation flag.
Returns:
true if the factory is required to perform DTD validation, false if it is required not to.
See Also:
setValidation(boolean)

createNewsML

public NewsML createNewsML(String url)
                    throws IOException
Create a top-level NewsML object from a URL.
Parameters:
url - A string in URL format.
Returns:
A top-level NewsML node.
Throws:
IOException - if there is some kind of error reading the document, including a validation error if validation is requested.
NewsMLException - if the root element of the document is not NewsML, or if the underlying implementation does not support the requested validation type.

createNewsML

public NewsML createNewsML(Reader input,
                           String baseURL)
                    throws IOException
Create a top-level NewsML object from a character stream.
Parameters:
input - The character stream.
baseURL - The base URL for resolving relative links; if null, will default to a file: URL based on the current directory.
Returns:
The top-level NewsML node.
Throws:
IOException - if there is some kind of error reading the document.
NewsMLException - if the root element of the document is not NewsML.
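The baseURL parameter above is used to resolve relative links, with a file: URL for the current directory as the fallback when it is null. A rough sketch of that rule using only JDK classes follows. How the toolkit actually computes the default is not documented here, so currentDirectoryUrl and resolve are hypothetical helpers, and the example URLs are made up.

```java
import java.net.URI;
import java.nio.file.Paths;

final class BaseUrlSketch {

    // One way to build a file: URL for the current working directory.
    static String currentDirectoryUrl() {
        return Paths.get("").toAbsolutePath().toUri().toString();
    }

    // Resolve a relative link against baseURL, falling back to the
    // current directory when baseURL is null.
    static String resolve(String baseURL, String relativeLink) {
        String base = (baseURL != null) ? baseURL : currentDirectoryUrl();
        return URI.create(base).resolve(relativeLink).toString();
    }

    public static void main(String[] args) {
        // Explicit base: the link is resolved next to the package file.
        System.out.println(resolve("http://example.com/feeds/package.xml", "photo.jpg"));
        // Null base: the link is resolved against the working directory.
        System.out.println(resolve(null, "photo.jpg"));
    }
}
```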

createNode

public BaseNode createNode(String url)
                    throws IOException
Create a NewsML node from a URL.
Parameters:
url - A string in URL format.
Returns:
A new base node, or null if the root element type is not recognized or not supported.
Throws:
IOException - if there is some kind of error reading the document.

createNode

public BaseNode createNode(Reader input,
                           String baseURL)
                    throws IOException
Create a NewsML node from a character stream.
Parameters:
input - The character stream.
baseURL - The base URL for resolving relative links; if null, will default to a file: URL based on the current directory.
Returns:
A new base node, or null if the root element type is not recognized or not supported.
Throws:
IOException - if there is some kind of error reading the document.