fi.iki.hsivonen.xml.checker
Class NormalizationChecker

java.lang.Object
  extended by fi.iki.hsivonen.xml.checker.Checker
      extended by fi.iki.hsivonen.xml.checker.NormalizationChecker
All Implemented Interfaces:
ContentHandler

public final class NormalizationChecker
extends Checker

Checks that the following constructs do not start with a composing character:

Checks that the following constructs are in the Unicode Normalization Form C. (It is assumed the normalization of the rest of the constructs is enforced by other means, such as checking the document source for normalization.)

All Strings must be valid UTF-16!

This class can also be used in a source text mode, in which the source text of the document is fed to characters(). In this mode the error messages are worded to refer to the source text.
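As a rough illustration of the Normalization Form C requirement described above, the following sketch tests two spellings of the same word. It is an assumption-laden stand-in: it uses the JDK's java.text.Normalizer rather than the ICU4J classes this checker depends on, and the class name NfcSketch and method isNfc are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of fi.iki.hsivonen.xml.checker.
final class NfcSketch {

    /** Returns true if the text is already in Unicode Normalization Form C. */
    static boolean isNfc(CharSequence text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00E9";  // last letter is a single precomposed code point
        String decomposed = "cafe\u0301";  // plain "e" followed by a combining acute accent

        System.out.println(isNfc(precomposed)); // prints "true"
        System.out.println(isNfc(decomposed));  // prints "false"
    }
}
```

Both strings render identically, which is why a checker of this kind reports the second form as an error instead of leaving it to visual inspection.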

Version:
$Id: NormalizationChecker.java,v 1.6 2006/12/01 12:34:31 hsivonen Exp $
Author:
hsivonen

Field Summary
private  boolean alreadyComplainedAboutThisRun
          Indicates whether the current run has already caused an error.
private  boolean atStartOfRun
          Indicates whether the next call to characters() is the first call in a run.
private  char[] buf
          A buffer for holding sequences that overlap the SAX buffer boundary.
private  char[] bufHolder
          A holder for the original buffer (for the memory leak prevention mechanism).
private static com.ibm.icu.text.UnicodeSet COMPOSING_CHARACTERS
          A thread-safe set of composing characters as per Charmod Norm.
private  int pos
          The current used length of the buffer, i.e. the index of the first slot that does not hold current data.
private  boolean sourceTextMode
          Indicates whether error messages related to source code checking should be used.
 
Constructor Summary
NormalizationChecker()
          Constructor for non-source mode.
NormalizationChecker(boolean sourceTextMode)
          Constructor with mode selection.
 
Method Summary
private  void appendToBuf(char[] ch, int start, int end)
          Appends a slice of a UTF-16 code unit array to the internal buffer.
 void characters(char[] ch, int start, int length)
          In the normal mode, this method has the usual SAX semantics.
 void endElement(String uri, String localName, String qName)
           
private  void errAboutTextRun()
          Emits an error stating that the current text run or the source text is not in NFC.
 void flush()
          Called to indicate the end of a run of characters.
private static boolean isComposingChar(int c)
          Returns true if the argument is a composing character and false otherwise.
private static boolean isComposingCharOrSurrogate(char c)
          Returns true if the argument is a composing BMP character or a surrogate and false otherwise.
 void processingInstruction(String target, String data)
           
 void reset()
          Does nothing.
 void startElement(String uri, String localName, String qName, Attributes atts)
           
 void startPrefixMapping(String prefix, String uri)
           
static boolean startsWithComposingChar(String str)
          Returns true if the argument starts with a composing character and false otherwise.
 
Methods inherited from class fi.iki.hsivonen.xml.checker.Checker
endDocument, endPrefixMapping, err, getDocumentLocator, getErrorHandler, ignorableWhitespace, setDocumentLocator, setErrorHandler, skippedEntity, startDocument, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMPOSING_CHARACTERS

private static final com.ibm.icu.text.UnicodeSet COMPOSING_CHARACTERS
A thread-safe set of composing characters as per Charmod Norm.


buf

private char[] buf
A buffer for holding sequences that overlap the SAX buffer boundary.


bufHolder

private char[] bufHolder
A holder for the original buffer (for the memory leak prevention mechanism).


pos

private int pos
The current used length of the buffer, i.e. the index of the first slot that does not hold current data.


atStartOfRun

private boolean atStartOfRun
Indicates whether the next call to characters() is the first call in a run.


alreadyComplainedAboutThisRun

private boolean alreadyComplainedAboutThisRun
Indicates whether the current run has already caused an error.


sourceTextMode

private final boolean sourceTextMode
Indicates whether error messages related to source code checking should be used.

Constructor Detail

NormalizationChecker

public NormalizationChecker()
Constructor for non-source mode.


NormalizationChecker

public NormalizationChecker(boolean sourceTextMode)
Constructor with mode selection.

Parameters:
sourceTextMode - whether the source text-related messages should be enabled.
Method Detail

isComposingCharOrSurrogate

private static boolean isComposingCharOrSurrogate(char c)
Returns true if the argument is a composing BMP character or a surrogate and false otherwise.

Parameters:
c - a UTF-16 code unit
Returns:
true if the argument is a composing BMP character or a surrogate and false otherwise

isComposingChar

private static boolean isComposingChar(int c)
Returns true if the argument is a composing character and false otherwise.

Parameters:
c - a Unicode code point
Returns:
true if the argument is a composing character and false otherwise

startsWithComposingChar

public static boolean startsWithComposingChar(String str)
                                       throws SAXException
Returns true if the argument starts with a composing character and false otherwise.

Parameters:
str - a string
Returns:
true if the argument starts with a composing character and false otherwise.
Throws:
SAXException - on malformed UTF-16
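The contract above (inspect the first code point, reject malformed UTF-16) can be sketched as follows. This is only an approximation under stated assumptions: the real method consults an ICU4J UnicodeSet built from the Charmod Norm definition of composing characters, whereas this sketch treats the general categories Mn, Mc and Me as "composing". It also throws an unchecked IllegalArgumentException instead of SAXException so that it has no SAX dependency. The names ComposingSketch, isCombiningMark and startsWithCombiningMark are hypothetical.

```java
// Hypothetical stand-in; the real checker uses an ICU4J UnicodeSet.
final class ComposingSketch {

    /** Approximation: general categories Mn, Mc and Me count as composing. */
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    /** Returns true if the first code point of str is a combining mark. */
    static boolean startsWithCombiningMark(String str) {
        if (str.isEmpty()) {
            return false;
        }
        char first = str.charAt(0);
        if (Character.isLowSurrogate(first)) {
            // A low surrogate cannot begin a well-formed UTF-16 string.
            throw new IllegalArgumentException("Malformed UTF-16: unpaired low surrogate");
        }
        if (Character.isHighSurrogate(first)
                && (str.length() < 2 || !Character.isLowSurrogate(str.charAt(1)))) {
            throw new IllegalArgumentException("Malformed UTF-16: unpaired high surrogate");
        }
        // codePointAt joins a valid surrogate pair into one code point.
        return isCombiningMark(str.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithCombiningMark("\u0301e")); // prints "true"
        System.out.println(startsWithCombiningMark("e\u0301")); // prints "false"
    }
}
```

Working on code points rather than individual char values matters here because a composing character outside the BMP is stored as a surrogate pair.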

reset

public void reset()
Description copied from class: Checker
Does nothing. Subclasses are expected to override this method with an implementation that clears the state of the checker and releases objects the checker might hold references to.

Overrides:
reset in class Checker
See Also:
Checker.reset()

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
In the normal mode, this method has the usual SAX semantics. In the source text mode, this method is used for reporting the source text.

Specified by:
characters in interface ContentHandler
Overrides:
characters in class Checker
Throws:
SAXException
See Also:
Checker.characters(char[], int, int)

errAboutTextRun

private void errAboutTextRun()
                      throws SAXException
Emits an error stating that the current text run or the source text is not in NFC.

Throws:
SAXException - if the ErrorHandler throws

appendToBuf

private void appendToBuf(char[] ch,
                         int start,
                         int end)
Appends a slice of a UTF-16 code unit array to the internal buffer.

Parameters:
ch - the array from which to copy
start - the index of the first element that is copied
end - the index of the first element that is not copied
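The half-open slice convention described by these parameters (start inclusive, end exclusive) can be sketched with a small growable buffer. The private method's actual growth policy is not documented here, so the doubling strategy, the initial capacity of 4 and the class name GrowableCharBuffer are assumptions made purely for illustration; the field names buf and pos mirror the field summary.

```java
// Hypothetical illustration; not the checker's actual implementation.
final class GrowableCharBuffer {
    private char[] buf = new char[4]; // assumed initial capacity
    private int pos = 0;              // index of the first slot not holding current data

    /** Copies ch[start] up to, but not including, ch[end] onto the end of the buffer. */
    void append(char[] ch, int start, int end) {
        int len = end - start;
        if (pos + len > buf.length) {
            // Grow to at least the required size (assumed doubling policy).
            char[] grown = new char[Math.max(buf.length * 2, pos + len)];
            System.arraycopy(buf, 0, grown, 0, pos);
            buf = grown;
        }
        System.arraycopy(ch, start, buf, pos, len);
        pos += len;
    }

    int length() {
        return pos;
    }

    String contents() {
        return new String(buf, 0, pos);
    }

    public static void main(String[] args) {
        GrowableCharBuffer b = new GrowableCharBuffer();
        b.append("abcdef".toCharArray(), 1, 4); // copies "bcd"
        b.append("xyz".toCharArray(), 0, 3);    // forces the buffer to grow
        System.out.println(b.contents());       // prints "bcdxyz"
    }
}
```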

endElement

public void endElement(String uri,
                       String localName,
                       String qName)
                throws SAXException
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class Checker
Throws:
SAXException
See Also:
Checker.endElement(java.lang.String, java.lang.String, java.lang.String)

processingInstruction

public void processingInstruction(String target,
                                  String data)
                           throws SAXException
Specified by:
processingInstruction in interface ContentHandler
Overrides:
processingInstruction in class Checker
Throws:
SAXException
See Also:
Checker.processingInstruction(java.lang.String, java.lang.String)

startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes atts)
                  throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class Checker
Throws:
SAXException
See Also:
Checker.startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)

startPrefixMapping

public void startPrefixMapping(String prefix,
                               String uri)
                        throws SAXException
Specified by:
startPrefixMapping in interface ContentHandler
Overrides:
startPrefixMapping in class Checker
Throws:
SAXException
See Also:
Checker.startPrefixMapping(java.lang.String, java.lang.String)

flush

public void flush()
           throws SAXException
Called to indicate the end of a run of characters. When this class is used for checking source text, this method should be called after all the calls to characters().

Throws:
SAXException - if the ErrorHandler throws.
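The reason a run is closed with flush() instead of being judged per characters() call can be shown with a deliberately simplified stand-in. A SAX parser may split text at arbitrary points, so a base letter and its combining mark can arrive in separate chunks that each look normalized in isolation. The sketch below assumes the simplest possible strategy (buffer the whole run, test it on flush) and uses java.text.Normalizer; the real class keeps only the sequences that overlap a buffer boundary and reports through an ErrorHandler. RunCheckSketch and errorCount are hypothetical names.

```java
import java.text.Normalizer;

// Hypothetical, simplified stand-in for the run-handling idea.
final class RunCheckSketch {
    private final StringBuilder run = new StringBuilder();
    private int errors = 0;

    /** Accumulates one chunk of the current run, as a SAX parser would deliver it. */
    void characters(char[] ch, int start, int length) {
        run.append(ch, start, length);
    }

    /** Ends the current run and records an error if the run as a whole is not in NFC. */
    void flush() {
        if (run.length() > 0 && !Normalizer.isNormalized(run, Normalizer.Form.NFC)) {
            errors++;
        }
        run.setLength(0);
    }

    int errorCount() {
        return errors;
    }

    public static void main(String[] args) {
        RunCheckSketch checker = new RunCheckSketch();
        char[] base = {'e'};
        char[] mark = {'\u0301'};
        checker.characters(base, 0, 1); // chunk 1: the base letter
        checker.characters(mark, 0, 1); // chunk 2: the combining acute accent
        checker.flush();                // the combined run is not in NFC
        System.out.println(checker.errorCount()); // prints "1"
    }
}
```

In source text mode the same pattern applies to the whole document: every chunk of source text goes to characters(), followed by a single flush() at the end.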