fi.karppinen.gnu.xml.util
Class XMLWriter

java.lang.Object
  extended by fi.karppinen.gnu.xml.util.XMLWriter
All Implemented Interfaces:
XmlDeclarationHandler, ContentHandler, DTDHandler, DeclHandler, LexicalHandler
Direct Known Subclasses:
TextConsumer

public class XMLWriter
extends Object
implements ContentHandler, LexicalHandler, DTDHandler, DeclHandler, XmlDeclarationHandler

This class is a SAX handler which writes all its input as a well formed XML or XHTML document. If driven using SAX2 events, this output may include a recreated document type declaration, subject to limitations of SAX (no internal subset exposed) or DOM (the important declarations, with their documentation, are discarded).
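Because this handler implements several SAX2 interfaces, each one has to be registered with the parser separately before the doctype, comments, and declarations can reach it. The following is a minimal sketch using only JDK classes; HandlerWiring, connect, and lexicalEvents are hypothetical names, and DefaultHandler2 stands in for an XMLWriter instance. The non-standard XmlDeclarationHandler is not covered here, since how it is registered depends on the parser in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

final class HandlerWiring {
    // Standard SAX2 property names for the two extension handlers.
    static final String LEXICAL = "http://xml.org/sax/properties/lexical-handler";
    static final String DECL = "http://xml.org/sax/properties/declaration-handler";
    static final String SAMPLE = "<!DOCTYPE r [<!ELEMENT r EMPTY>]><!--hi--><r/>";

    /** Registers 'handler' for every standard SAX2 interface it implements. */
    static void connect(XMLReader reader, Object handler) throws Exception {
        if (handler instanceof ContentHandler) reader.setContentHandler((ContentHandler) handler);
        if (handler instanceof DTDHandler) reader.setDTDHandler((DTDHandler) handler);
        if (handler instanceof LexicalHandler) reader.setProperty(LEXICAL, handler);
        if (handler instanceof DeclHandler) reader.setProperty(DECL, handler);
    }

    /** Parses 'xml' and records the doctype, declaration, and comment events. */
    static List<String> lexicalEvents(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        // Stand-in for the real handler: it only records what it receives.
        DefaultHandler2 recorder = new DefaultHandler2() {
            @Override public void startDTD(String name, String publicId, String systemId) {
                seen.add("startDTD:" + name);
            }
            @Override public void elementDecl(String name, String model) {
                seen.add("elementDecl:" + name);
            }
            @Override public void comment(char[] ch, int start, int length) {
                seen.add("comment:" + new String(ch, start, length));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        connect(reader, recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lexicalEvents(SAMPLE));
    }
}
```

If only setContentHandler() is called, the lexical and declaration events never arrive, and a handler like this one would have no doctype or comments to write.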

By default, text is generated "as-is", but some optional modes are supported. Pretty-printing is supported, to make life easier for people reading the output. XHTML (1.0) output can be made particularly pretty. Canonical XML can also be generated, assuming the input is properly formed.


Some of the methods on this class are intended for applications to use directly, rather than as pure SAX2 event callbacks. Some of those methods access the JavaBeans properties (used to tweak output formats, for example canonicalization and pretty printing). Subclasses are expected to add new behaviors, not to modify current behavior, so many such methods are final.

The write*() methods may be slightly simpler for some applications to use than direct callbacks. For example, they support a simple policy for encoding data items as the content of a single element.

To reuse an XMLWriter you must provide it with a new Writer, since this handler closes the writer it was given as part of its endDocument() handling. (XML documents have an end of input, and the way to encode that on a stream is to close it.)


Note that any relative URIs in the source document, as found in entity and notation declarations, ought to have been fully resolved by the parser providing events to this handler. This means that the output text should only have fully resolved URIs, which may not be the desired behavior in cases where later binding is desired.

Note that due to SAX2 defaults, you may need to manually ensure that the input events are XML-conformant with respect to namespace prefixes and declarations. NSFilter is one solution to this problem, in the context of processing pipelines. Something as simple as connecting this handler to a parser might not generate the correct output. Another workaround is to ensure that the namespace-prefixes feature is always set to true, if you're hooking this directly up to some XMLReader implementation.
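The namespace-prefixes workaround mentioned above can be sketched with JDK classes alone. NamespacePrefixes and attributeNames are hypothetical names, and the recording DefaultHandler stands in for this handler. The sketch shows that xmlns declarations are only delivered as attributes, and so only available to be written back out, when the feature is set to true.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class NamespacePrefixes {
    static final String FEATURE = "http://xml.org/sax/features/namespace-prefixes";
    static final String SAMPLE = "<x:r xmlns:x=\"urn:example\" id=\"1\"/>";

    /** Returns the qualified names of all attributes reported for 'xml'. */
    static List<String> attributeNames(String xml, boolean reportPrefixes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // With 'false' (the SAX2 default) xmlns attributes are withheld.
        reader.setFeature(FEATURE, reportPrefixes);

        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add(atts.getQName(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("feature on:  " + attributeNames(SAMPLE, true));
        System.out.println("feature off: " + attributeNames(SAMPLE, false));
    }
}
```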

Author:
David Brownell, Henri Sivonen
See Also:
TextConsumer

Field Summary
private  boolean canonical
           
private  int column
           
private static int CTX_ATTRIBUTE
           
private static int CTX_CONTENT
           
private static int CTX_ENTITY
           
private static int CTX_NAME
           
private static int CTX_UNPARSED
           
private  int elementNestLevel
           
private  int entityNestLevel
           
private static String eol
           
private  ErrorHandler errHandler
           
private  boolean expandingEntities
           
private  boolean inCDATA
           
private  boolean inDoctype
           
private  boolean inEpilogue
           
private static int lineLength
           
private  Locator locator
           
private  boolean noWrap
           
private  Writer out
           
private  boolean prettyPrinting
           
private  Stack<String> space
           
private  boolean startedDoctype
           
private  StringBuilder stringBuf
           
private  boolean xhtml
           
 
Fields inherited from interface fi.karppinen.xml.XmlDeclarationHandler
XML_DECLARATION_HANDLER
 
Constructor Summary
XMLWriter()
          Constructs this handler with System.out used to write SAX events using the UTF-8 encoding.
XMLWriter(OutputStream out)
          Constructs a handler which writes all input to the output stream in the UTF-8 encoding, and closes it when endDocument is called.
XMLWriter(Writer writer)
          Constructs a handler which writes all input to the writer, and then closes the writer when the document ends.
 
Method Summary
 void attributeDecl(String eName, String aName, String type, String mode, String value)
          SAX2 : called on attribute declarations
 void characters(char[] ch, int start, int length)
          SAX1 : reports content characters
 void comment(char[] ch, int start, int length)
          SAX2 : called when comments are parsed.
private  void doIndent()
           
 void elementDecl(String name, String model)
          SAX2 : called on element declarations
 void endCDATA()
          SAX2 : called after parsing CDATA characters
 void endDocument()
          SAX1 : indicates the completion of a parse.
 void endDTD()
          SAX2 : called after the doctype is parsed
 void endElement(String uri, String localName, String qName)
          SAX2 : indicates the end of an element
 void endEntity(String name)
          SAX2 : called after parsing a general entity in content
 void endPrefixMapping(String prefix)
          SAX2 : ignored.
private  void escapeChars(char[] buf, int off, int len, int code)
           
 void externalEntityDecl(String name, String publicId, String systemId)
          SAX2 : called on external entity declarations
protected  void fatal(String message, Exception e)
          Used internally and by subclasses, this encapsulates the logic involved in reporting fatal errors.
 void flush()
          Flushes the output stream.
 void ignorableWhitespace(char[] ch, int start, int length)
          SAX1 : reports ignorable whitespace
private static boolean indentBefore(String tag)
           
 void internalEntityDecl(String name, String value)
          SAX2 : called on internal entity declarations
 boolean isCanonical()
          Returns value of flag controlling canonical output.
private static boolean isEmptyElementTag(String tag)
           
 boolean isExpandingEntities()
          Returns true if the output will have no entity references; returns false (the default) otherwise.
 boolean isPrettyPrinting()
          Returns value of flag controlling pretty printing.
 boolean isXhtml()
          Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.
private  void newline()
           
 void notationDecl(String name, String publicId, String systemId)
          SAX1 : called on notation declarations
 void processingInstruction(String target, String data)
          SAX1 : reports a PI.
private  void rawWrite(char c)
           
private  void rawWrite(char[] buf, int offset, int length)
           
private  void rawWrite(String s)
           
 void setCanonical(boolean value)
          Sets the output style to be canonicalized.
 void setDocumentLocator(Locator l)
          SAX1 : provides parser status information
 void setErrorHandler(ErrorHandler handler)
          Assigns the error handler to be used to present most fatal errors.
 void setExpandingEntities(boolean value)
          Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.
 void setPrettyPrinting(boolean value)
          Controls pretty-printing, which by default is not enabled (and currently is most useful for XHTML output).
 void setWriter(Writer writer)
          Resets the handler to write a new text document.
 void setXhtml(boolean value)
          Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification.
 void skippedEntity(String name)
          SAX1 : indicates a non-expanded entity reference
private static boolean spaceBefore(String tag)
           
private static boolean spacePreserve(String tag)
           
 void startCDATA()
          SAX2 : called before parsing CDATA characters
 void startDocument()
          SAX1 : indicates the beginning of a document parse.
 void startDTD(String name, String publicId, String systemId)
          SAX2 : called when the doctype is partially parsed. Note that this, like other doctype related calls, is ignored when XHTML is in use.
 void startElement(String uri, String localName, String qName, Attributes atts)
          SAX2 : indicates the start of an element.
 void startEntity(String name)
          SAX2 : called before parsing a general entity in content
 void startPrefixMapping(String prefix, String uri)
          SAX2 : ignored.
 void unparsedEntityDecl(String name, String publicId, String systemId, String notationName)
          SAX1 : called on unparsed entity declarations
 void write(String data)
          Writes the string as if characters() had been called on the contents of the string.
 void writeElement(String uri, String localName, String qName, Attributes atts, int content)
          Writes an element that has content consisting of a single integer, encoded as a decimal string.
 void writeElement(String uri, String localName, String qName, Attributes atts, String content)
          Writes an element that has content consisting of a single string.
 void writeEmptyElement(String uri, String localName, String qName, Attributes atts)
          Writes an empty element.
private  void writeQuotedValue(String value, int code)
           
private  void writeStartTag(String name, Attributes atts, boolean isEmpty)
           
 void xmlDecl(String version, String encoding, String standalone)
          Receive a notification of the XML declaration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CTX_ENTITY

private static final int CTX_ENTITY
See Also:
Constant Field Values

CTX_ATTRIBUTE

private static final int CTX_ATTRIBUTE
See Also:
Constant Field Values

CTX_CONTENT

private static final int CTX_CONTENT
See Also:
Constant Field Values

CTX_UNPARSED

private static final int CTX_UNPARSED
See Also:
Constant Field Values

CTX_NAME

private static final int CTX_NAME
See Also:
Constant Field Values

out

private Writer out

inCDATA

private boolean inCDATA

elementNestLevel

private int elementNestLevel

eol

private static final String eol
See Also:
Constant Field Values

stringBuf

private StringBuilder stringBuf

locator

private Locator locator

errHandler

private ErrorHandler errHandler

expandingEntities

private boolean expandingEntities

entityNestLevel

private int entityNestLevel

xhtml

private boolean xhtml

startedDoctype

private boolean startedDoctype

canonical

private boolean canonical

inDoctype

private boolean inDoctype

inEpilogue

private boolean inEpilogue

prettyPrinting

private boolean prettyPrinting

column

private int column

noWrap

private boolean noWrap

space

private Stack<String> space

lineLength

private static final int lineLength
See Also:
Constant Field Values
Constructor Detail

XMLWriter

public XMLWriter()
          throws IOException
Constructs this handler with System.out used to write SAX events using the UTF-8 encoding. Avoid using this except when you know it's safe to close System.out at the end of the document.

Throws:
IOException
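One possible way to keep a shared stream such as System.out open is to hand the handler a Writer whose close() only flushes. This is a sketch, not part of this class: NonClosingWriter and targetStaysOpen are hypothetical names, and it assumes the Writer constructor documented on this page is used in place of the no-argument one.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** A Writer decorator that flushes on close() and leaves its target open. */
final class NonClosingWriter extends FilterWriter {
    NonClosingWriter(Writer target) {
        super(target);
    }

    @Override
    public void close() throws IOException {
        // Push out anything pending, but never close the wrapped writer.
        flush();
    }

    /** Demonstrates that closing the wrapper leaves the target usable. */
    static boolean targetStaysOpen() throws IOException {
        final boolean[] closed = {false};
        StringWriter target = new StringWriter() {
            @Override
            public void close() throws IOException {
                closed[0] = true;
                super.close();
            }
        };
        Writer wrapper = new NonClosingWriter(target);
        wrapper.write("<doc/>");
        wrapper.close(); // what a handler would do at the end of the document
        return !closed[0] && target.toString().equals("<doc/>");
    }

    public static void main(String[] args) throws IOException {
        System.out.println("target still open: " + targetStaysOpen());
    }
}
```

A caller could then wrap an OutputStreamWriter over System.out in a NonClosingWriter and pass that to XMLWriter(Writer). Each later document would still need a fresh writer via setWriter().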

XMLWriter

public XMLWriter(OutputStream out)
          throws IOException
Constructs a handler which writes all input to the output stream in the UTF-8 encoding, and closes it when endDocument is called. (Yes it's annoying that this throws an exception -- but there's really no way around it, since it's barely possible a JDK may exist somewhere that doesn't know how to emit UTF-8.)

Throws:
IOException

XMLWriter

public XMLWriter(Writer writer)
Constructs a handler which writes all input to the writer, and then closes the writer when the document ends.

See the description of the constructor which takes an encoding name for important information about selection of encodings.

Parameters:
writer - XML text is written to this writer.
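With this constructor the caller chooses the encoding when building the Writer. The sketch below shows one way to create a UTF-8 writer over an arbitrary stream using only JDK classes; Utf8Writers, utf8, and encode are hypothetical names. Passing a Charset instead of an encoding name avoids the checked exception the OutputStream constructor has to declare.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8Writers {
    /** Wraps 'out' in a buffered writer that always encodes as UTF-8. */
    static Writer utf8(OutputStream out) {
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    /** Encodes 'text' through utf8() and returns the resulting bytes. */
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = utf8(bytes)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // U+00E9 (e with acute accent) takes two bytes in UTF-8.
        System.out.println(encode("\u00e9").length + " bytes");
    }
}
```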
Method Detail

setWriter

public final void setWriter(Writer writer)
Resets the handler to write a new text document.

Parameters:
writer - XML text is written to this writer.
Throws:
IllegalStateException - if the current document hasn't yet ended (with endDocument())

setErrorHandler

public void setErrorHandler(ErrorHandler handler)
Assigns the error handler to be used to present most fatal errors.


fatal

protected void fatal(String message,
                     Exception e)
              throws SAXException
Used internally and by subclasses, this encapsulates the logic involved in reporting fatal errors. It uses locator information for good diagnostics, if available, and gives the application's ErrorHandler the opportunity to handle the error before throwing an exception.

Throws:
SAXException
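The reporting logic described above (locator for diagnostics, the ErrorHandler notified first, then an exception) can be sketched as follows. This is an illustration under those stated assumptions, not the actual body of fatal(). FatalReporting and reportedLine are hypothetical names, and the locator and error handler are passed as parameters here rather than read from fields.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

final class FatalReporting {
    /** Builds a located exception, offers it to the handler, then throws it. */
    static void fatal(String message, Exception cause, Locator locator, ErrorHandler errHandler)
            throws SAXException {
        SAXParseException e = (locator == null)
                ? new SAXParseException(message, null, null, -1, -1, cause)
                : new SAXParseException(message, locator, cause);
        if (errHandler != null) {
            // The application's handler may log the error or throw its own exception.
            errHandler.fatalError(e);
        }
        throw e;
    }

    /** Returns the line number carried by the exception the handler saw. */
    static int reportedLine() {
        LocatorImpl locator = new LocatorImpl();
        locator.setLineNumber(7);
        final SAXParseException[] seen = new SAXParseException[1];
        ErrorHandler handler = new DefaultHandler() {
            @Override
            public void fatalError(SAXParseException e) {
                seen[0] = e; // record instead of rethrowing
            }
        };
        try {
            fatal("unexpected event", null, locator, handler);
        } catch (SAXException thrown) {
            return (thrown == seen[0]) ? seen[0].getLineNumber() : -1;
        }
        return -1; // not reached: fatal() always throws
    }

    public static void main(String[] args) {
        System.out.println("line " + reportedLine());
    }
}
```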

setXhtml

public final void setXhtml(boolean value)
Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification. XHTML empty elements are printed specially.

When this option is enabled, it is the caller's responsibility to ensure that the input is otherwise valid as XHTML. Things to be careful of in all cases, as described in the appendix referenced above, include:

Additionally, some of the oldest browsers have additional quirks, to address with guidelines such as:

Also, some characteristics of the resulting output may be a function of whether the document is later given a MIME content type of text/html rather than one indicating XML (application/xml or text/xml). Worse, some browsers ignore MIME content types and prefer to rely on URI name suffixes -- so an "index.xml" could always be XML, never XHTML, no matter its MIME type.


isXhtml

public final boolean isXhtml()
Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.


setExpandingEntities

public final void setExpandingEntities(boolean value)
Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.


isExpandingEntities

public final boolean isExpandingEntities()
Returns true if the output will have no entity references; returns false (the default) otherwise.
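Whether an entity reference can be preserved at all depends on the parser reporting entity boundaries through LexicalHandler. The sketch below uses only JDK classes to show the event sequence a handler receives for a general entity in content. EntityBoundaries and events are hypothetical names, and the recording DefaultHandler2 stands in for this handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class EntityBoundaries {
    static final String SAMPLE =
            "<!DOCTYPE r [<!ENTITY who \"world\">]><r>hello &who;</r>";

    /** Parses 'xml' and logs character data plus entity start/end events. */
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler2 recorder = new DefaultHandler2() {
            @Override public void startEntity(String name) { log.add("startEntity:" + name); }
            @Override public void endEntity(String name) { log.add("endEntity:" + name); }
            @Override public void characters(char[] ch, int start, int length) {
                log.add("characters:" + new String(ch, start, length));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(recorder);
        // Entity boundaries are lexical events, so they need the extension handler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```

A writer that is not expanding entities can emit the reference when startEntity arrives and suppress the characters reported until the matching endEntity. A writer that is expanding entities simply writes those characters.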


setPrettyPrinting

public final void setPrettyPrinting(boolean value)
Controls pretty-printing, which by default is not enabled (and currently is most useful for XHTML output). Pretty printing enables structural indentation, sorting of attributes by name, line wrapping, and potentially other mechanisms for making output more or less readable.

At this writing, structural indentation and line wrapping are enabled when pretty printing is enabled and the xml:space attribute has the value default (its other legal value is preserve, as defined in the XML specification). The three XHTML element types which use another value are recognized by their names (namespaces are ignored).

Also, for the record, the "pretty" aspect of printing here is more to provide basic structure on outputs that would otherwise risk being a single long line of text. For now, expect the structure to be ragged ... unless you'd like to submit a patch to make this be more strictly formatted!

Throws:
IllegalStateException - thrown if this method is invoked after output has begun.

isPrettyPrinting

public final boolean isPrettyPrinting()
Returns value of flag controlling pretty printing.


setCanonical

public final void setCanonical(boolean value)
Sets the output style to be canonicalized. Input events must meet requirements that are slightly more stringent than the basic well-formedness ones, and include:

Note that fragments of XML documents, as specified by an XPath node set, may be canonicalized. In such cases, elements may need some fixup (for xml:* attributes and application-specific context).

Throws:
IllegalArgumentException - if the output encoding is anything other than UTF-8.

isCanonical

public final boolean isCanonical()
Returns value of flag controlling canonical output.


flush

public final void flush()
                 throws IOException
Flushes the output stream. When this handler is used in long lived pipelines, it can be important to flush buffered state, for example so that it can reach the disk as part of a state checkpoint.

Throws:
IOException
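To illustrate why an explicit flush matters, the following sketch uses a plain BufferedWriter as a stand-in for any buffering layer between this handler and its final destination. FlushCheckpoint and beforeAndAfterFlush are hypothetical names. Nothing reaches the sink until flush() is called.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

final class FlushCheckpoint {
    /** Returns the sink contents observed before and after flushing. */
    static String[] beforeAndAfterFlush() throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter buffered = new BufferedWriter(sink);

        buffered.write("<event/>");      // small write: held in the buffer
        String before = sink.toString();

        buffered.flush();                // checkpoint: force buffered text out
        String after = sink.toString();

        return new String[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        String[] result = beforeAndAfterFlush();
        System.out.println("before flush: '" + result[0] + "'");
        System.out.println("after flush:  '" + result[1] + "'");
    }
}
```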

write

public final void write(String data)
                 throws SAXException
Writes the string as if characters() had been called on the contents of the string. This is particularly useful when applications act as producers and write data directly to event consumers.

Throws:
SAXException
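The equivalence with characters() can be sketched against any ContentHandler. CharactersBridge and roundTrip are hypothetical names, and the static write below is a stand-alone illustration of the documented behavior rather than this class's own code.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class CharactersBridge {
    /** Delivers 'data' to the handler as a single characters() event. */
    static void write(ContentHandler handler, String data) throws SAXException {
        char[] buf = data.toCharArray();
        handler.characters(buf, 0, buf.length);
    }

    /** Sends 'data' through write() and returns what the handler received. */
    static String roundTrip(String data) throws SAXException {
        StringBuilder seen = new StringBuilder();
        write(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        }, data);
        return seen.toString();
    }

    public static void main(String[] args) throws SAXException {
        // The string is passed unescaped; escaping is the consumer's job.
        System.out.println(roundTrip("a < b"));
    }
}
```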

writeElement

public void writeElement(String uri,
                         String localName,
                         String qName,
                         Attributes atts,
                         String content)
                  throws SAXException
Writes an element that has content consisting of a single string.

Throws:
SAXException
See Also:
writeEmptyElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes), startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)

writeElement

public void writeElement(String uri,
                         String localName,
                         String qName,
                         Attributes atts,
                         int content)
                  throws SAXException
Writes an element that has content consisting of a single integer, encoded as a decimal string.

Throws:
SAXException
See Also:
writeEmptyElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes), startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
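In terms of plain SAX events, the two writeElement variants amount to a start tag, one run of character data, and an end tag. The sketch below spells that sequence out against a generic ContentHandler. SingleValueElements and trace are hypothetical names, and the real methods may differ in details such as indentation handling.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class SingleValueElements {
    /** Emits an element whose only content is the given string. */
    static void writeElement(ContentHandler h, String uri, String localName, String qName,
                             Attributes atts, String content) throws SAXException {
        h.startElement(uri, localName, qName, atts);
        char[] buf = content.toCharArray();
        h.characters(buf, 0, buf.length);
        h.endElement(uri, localName, qName);
    }

    /** Emits an element whose only content is the integer in decimal form. */
    static void writeElement(ContentHandler h, String uri, String localName, String qName,
                             Attributes atts, int content) throws SAXException {
        writeElement(h, uri, localName, qName, atts, Integer.toString(content));
    }

    /** Records the events produced for a 'count' element holding 'value'. */
    static String trace(int value) throws SAXException {
        StringBuilder log = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                log.append('<').append(qName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                log.append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.append("</").append(qName).append('>');
            }
        };
        writeElement(recorder, "", "count", "count", new AttributesImpl(), value);
        return log.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(trace(42));
    }
}
```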

setDocumentLocator

public final void setDocumentLocator(Locator l)
SAX1 : provides parser status information

Specified by:
setDocumentLocator in interface ContentHandler

startDocument

public void startDocument()
                   throws SAXException
SAX1 : indicates the beginning of a document parse. If you're writing (well formed) fragments of XML, neither this nor endDocument should be called.

Specified by:
startDocument in interface ContentHandler
Throws:
SAXException

endDocument

public void endDocument()
                 throws SAXException
SAX1 : indicates the completion of a parse. Note that all complete SAX event streams make this call, even if an error is reported during a parse.

Specified by:
endDocument in interface ContentHandler
Throws:
SAXException

isEmptyElementTag

private static final boolean isEmptyElementTag(String tag)

indentBefore

private static boolean indentBefore(String tag)

spaceBefore

private static boolean spaceBefore(String tag)

spacePreserve

private static boolean spacePreserve(String tag)

startPrefixMapping

public final void startPrefixMapping(String prefix,
                                     String uri)
SAX2 : ignored.

Specified by:
startPrefixMapping in interface ContentHandler

endPrefixMapping

public final void endPrefixMapping(String prefix)
SAX2 : ignored.

Specified by:
endPrefixMapping in interface ContentHandler

writeStartTag

private void writeStartTag(String name,
                           Attributes atts,
                           boolean isEmpty)
                    throws SAXException,
                           IOException
Throws:
SAXException
IOException

startElement

public final void startElement(String uri,
                               String localName,
                               String qName,
                               Attributes atts)
                        throws SAXException
SAX2 : indicates the start of an element. When XHTML is in use, avoid attribute values with line breaks or multiple whitespace characters, since not all user agents handle them correctly.

Specified by:
startElement in interface ContentHandler
Throws:
SAXException

writeEmptyElement

public void writeEmptyElement(String uri,
                              String localName,
                              String qName,
                              Attributes atts)
                       throws SAXException
Writes an empty element.

Throws:
SAXException
See Also:
startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)

endElement

public final void endElement(String uri,
                             String localName,
                             String qName)
                      throws SAXException
SAX2 : indicates the end of an element

Specified by:
endElement in interface ContentHandler
Throws:
SAXException

characters

public final void characters(char[] ch,
                             int start,
                             int length)
                      throws SAXException
SAX1 : reports content characters

Specified by:
characters in interface ContentHandler
Throws:
SAXException

ignorableWhitespace

public final void ignorableWhitespace(char[] ch,
                                      int start,
                                      int length)
                               throws SAXException
SAX1 : reports ignorable whitespace

Specified by:
ignorableWhitespace in interface ContentHandler
Throws:
SAXException

processingInstruction

public final void processingInstruction(String target,
                                        String data)
                                 throws SAXException
SAX1 : reports a PI. This doesn't check for illegal target names, such as "xml" or "XML", or namespace-incompatible ones like "big:dog"; the caller is responsible for ensuring those names are legal.

Specified by:
processingInstruction in interface ContentHandler
Throws:
SAXException
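Since target names are not checked here, a caller may want a small guard before emitting a processing instruction. The following sketch covers only the two cases named above (the reserved name "xml" in any letter case, and names containing a colon). PiTargets and isAcceptable are hypothetical names, and the check does not validate the full XML Name production.

```java
final class PiTargets {
    /** Returns false for the reserved target "xml" and for names with colons. */
    static boolean isAcceptable(String target) {
        if (target == null || target.isEmpty()) {
            return false;
        }
        if (target.equalsIgnoreCase("xml")) {
            return false; // reserved for the XML declaration, in any case
        }
        return target.indexOf(':') < 0; // colons are not namespace-compatible
    }

    public static void main(String[] args) {
        String[] samples = {"xml", "XML", "big:dog", "xml-stylesheet"};
        for (String target : samples) {
            System.out.println(target + " -> " + isAcceptable(target));
        }
    }
}
```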

skippedEntity

public void skippedEntity(String name)
                   throws SAXException
SAX1 : indicates a non-expanded entity reference

Specified by:
skippedEntity in interface ContentHandler
Throws:
SAXException

startCDATA

public final void startCDATA()
                      throws SAXException
SAX2 : called before parsing CDATA characters

Specified by:
startCDATA in interface LexicalHandler
Throws:
SAXException

endCDATA

public final void endCDATA()
                    throws SAXException
SAX2 : called after parsing CDATA characters

Specified by:
endCDATA in interface LexicalHandler
Throws:
SAXException

startDTD

public final void startDTD(String name,
                           String publicId,
                           String systemId)
                    throws SAXException
SAX2 : called when the doctype is partially parsed. Note that this, like other doctype related calls, is ignored when XHTML is in use.

Specified by:
startDTD in interface LexicalHandler
Throws:
SAXException

endDTD

public final void endDTD()
                  throws SAXException
SAX2 : called after the doctype is parsed

Specified by:
endDTD in interface LexicalHandler
Throws:
SAXException

startEntity

public final void startEntity(String name)
                       throws SAXException
SAX2 : called before parsing a general entity in content

Specified by:
startEntity in interface LexicalHandler
Throws:
SAXException

endEntity

public final void endEntity(String name)
                     throws SAXException
SAX2 : called after parsing a general entity in content

Specified by:
endEntity in interface LexicalHandler
Throws:
SAXException

comment

public final void comment(char[] ch,
                          int start,
                          int length)
                   throws SAXException
SAX2 : called when comments are parsed. When XHTML is used, the old HTML tradition of using comments for inline CSS or for JavaScript code is discouraged. This is because XML processors are encouraged to discard comments, on the grounds that comments are for users (and perhaps text editors), not programs. Instead, use external scripts.

Specified by:
comment in interface LexicalHandler
Throws:
SAXException

notationDecl

public final void notationDecl(String name,
                               String publicId,
                               String systemId)
                        throws SAXException
SAX1 : called on notation declarations

Specified by:
notationDecl in interface DTDHandler
Throws:
SAXException

unparsedEntityDecl

public final void unparsedEntityDecl(String name,
                                     String publicId,
                                     String systemId,
                                     String notationName)
                              throws SAXException
SAX1 : called on unparsed entity declarations

Specified by:
unparsedEntityDecl in interface DTDHandler
Throws:
SAXException

attributeDecl

public final void attributeDecl(String eName,
                                String aName,
                                String type,
                                String mode,
                                String value)
                         throws SAXException
SAX2 : called on attribute declarations

Specified by:
attributeDecl in interface DeclHandler
Throws:
SAXException

elementDecl

public final void elementDecl(String name,
                              String model)
                       throws SAXException
SAX2 : called on element declarations

Specified by:
elementDecl in interface DeclHandler
Throws:
SAXException

externalEntityDecl

public final void externalEntityDecl(String name,
                                     String publicId,
                                     String systemId)
                              throws SAXException
SAX2 : called on external entity declarations

Specified by:
externalEntityDecl in interface DeclHandler
Throws:
SAXException

internalEntityDecl

public final void internalEntityDecl(String name,
                                     String value)
                              throws SAXException
SAX2 : called on internal entity declarations

Specified by:
internalEntityDecl in interface DeclHandler
Throws:
SAXException

xmlDecl

public void xmlDecl(String version,
                    String encoding,
                    String standalone)
             throws SAXException
Description copied from interface: XmlDeclarationHandler
Receive a notification of the XML declaration.

Specified by:
xmlDecl in interface XmlDeclarationHandler
Parameters:
version - the XML version number
encoding - the encoding pseudo-attribute or null if not present
standalone - the standalone pseudo-attribute or null if not present
Throws:
SAXException - any SAX exception, possibly wrapping another exception
See Also:
XmlDeclarationHandler.xmlDecl(java.lang.String, java.lang.String, java.lang.String)

writeQuotedValue

private void writeQuotedValue(String value,
                              int code)
                       throws SAXException,
                              IOException
Throws:
SAXException
IOException

escapeChars

private void escapeChars(char[] buf,
                         int off,
                         int len,
                         int code)
                  throws SAXException,
                         IOException
Throws:
SAXException
IOException

newline

private void newline()
              throws SAXException,
                     IOException
Throws:
SAXException
IOException

doIndent

private void doIndent()
               throws SAXException,
                      IOException
Throws:
SAXException
IOException

rawWrite

private void rawWrite(char c)
               throws IOException
Throws:
IOException

rawWrite

private void rawWrite(String s)
               throws SAXException,
                      IOException
Throws:
SAXException
IOException

rawWrite

private void rawWrite(char[] buf,
                      int offset,
                      int length)
               throws SAXException,
                      IOException
Throws:
SAXException
IOException