Package org.w3c.tidy
Class TidyUtils
- java.lang.Object
-
- org.w3c.tidy.TidyUtils
-
public final class TidyUtils extends java.lang.Object
Utility class with handy methods, mainly for String handling or for reproducing C behaviours.
- Version:
- $Revision $ ($Author $)
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private static short | DIGIT | char type: digit.
private static short | LETTER | char type: letter.
private static short[] | lexmap | used to classify chars for lexical purposes.
private static short | LOWERCASE | char type: lowercase.
private static short | NAMECHAR | char type: namechar.
private static short | NEWLINE | char type: newline.
private static short | UPPERCASE | char type: uppercase.
private static short | WHITE | char type: whitespace.
-
Constructor Summary
Constructors
Modifier | Constructor | Description
private | TidyUtils() | utility class, don't instantiate.
-
Method Summary
Methods
Modifier and Type | Method | Description
static boolean | findBadSubString(java.lang.String s, java.lang.String p, int len) | Return true if substring s is in p and isn't all in upper case.
static char | foldCase(char c, boolean tocaps, boolean xmlTags) | Fold case of a char.
static byte[] | getBytes(java.lang.String str) | Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
static java.lang.String | getString(byte[] bytes, int offset, int length) | Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
static boolean | isCharEncodingSupported(java.lang.String name) | Is the given character encoding supported?
static boolean | isDigit(char c) | Is the given char a digit?
(package private) static boolean | isInValuesIgnoreCase(java.lang.String[] validValues, java.lang.String valueToCheck) | Check if the string valueToCheck is contained in validValues array (case insensitive comparison).
static boolean | isLetter(char c) | Is the given char a letter?
static boolean | isLower(char c) | Determines if the specified character is a lowercase character.
static boolean | isNamechar(char c) | Is the given char valid in name? (letter, digit or "-", ".", ":", "_")
(package private) static boolean | isQuote(int c) | Is the given character a single or double quote?
static boolean | isUpper(char c) | Determines if the specified character is an uppercase character.
static boolean | isWhite(char c) | Determines if the specified character is whitespace.
(package private) static boolean | isxdigit(char c) | Is the character a hex digit?
(package private) static boolean | isXMLLetter(char c) | Is the given char a valid xml letter?
(package private) static boolean | isXMLNamechar(char c) | Is the given char valid in xml name?
static int | lastChar(java.lang.String str) | Return the last char in string.
private static short | map(char c) | Returns the constant which defines the classification of char in lexmap.
private static void | mapStr(java.lang.String str, short code) | Classify chars in String and put them in lexmap.
(package private) static boolean | toBoolean(int value) | Converts an int to a boolean.
static char | toLower(char c) | Maps the given character to its lowercase equivalent.
(package private) static int | toUnsigned(int c) | Convert an int to unsigned (& 0xFF).
static char | toUpper(char c) | Maps the given character to its uppercase equivalent.
(package private) static int | wstrnchr(java.lang.String s1, int len1, char cc) | Return offset of cc from beginning of s1, -1 if not found.
(package private) static boolean | wsubstr(java.lang.String s1, java.lang.String s2) | Same as wsubstrn, but without a specified length.
(package private) static boolean | wsubstrn(java.lang.String s1, int len1, java.lang.String s2) | Check if the first String contains the second one.
(package private) static boolean | wsubstrncase(java.lang.String s1, int len1, java.lang.String s2) | Check if the first String contains the second one (ignore case).
-
-
-
Field Detail
-
DIGIT
private static final short DIGIT
char type: digit.
- See Also:
- Constant Field Values
-
LETTER
private static final short LETTER
char type: letter.
- See Also:
- Constant Field Values
-
NAMECHAR
private static final short NAMECHAR
char type: namechar.
- See Also:
- Constant Field Values
-
WHITE
private static final short WHITE
char type: whitespace.
- See Also:
- Constant Field Values
-
NEWLINE
private static final short NEWLINE
char type: newline.
- See Also:
- Constant Field Values
-
LOWERCASE
private static final short LOWERCASE
char type: lowercase.
- See Also:
- Constant Field Values
-
UPPERCASE
private static final short UPPERCASE
char type: uppercase.
- See Also:
- Constant Field Values
-
lexmap
private static short[] lexmap
used to classify chars for lexical purposes.
-
-
Method Detail
-
toBoolean
static boolean toBoolean(int value)
Converts an int to a boolean.
- Parameters:
- value - int value
- Returns:
- true if value is != 0
-
toUnsigned
static int toUnsigned(int c)
Convert an int to unsigned (& 0xFF).
- Parameters:
- c - signed int
- Returns:
- unsigned int
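Both toBoolean and toUnsigned reproduce C idioms that Java does not offer directly. A minimal sketch of the documented behaviour follows; the class name `CStyleConversions` is hypothetical, and the method bodies are an assumption derived from the descriptions above rather than a copy of JTidy's source.

```java
final class CStyleConversions {

    private CStyleConversions() {
        // utility class, not meant to be instantiated
    }

    // C-style truthiness: any non-zero int counts as true.
    static boolean toBoolean(int value) {
        return value != 0;
    }

    // Keep only the low 8 bits, so a sign-extended byte such as -1 becomes 255.
    static int toUnsigned(int c) {
        return c & 0xFF;
    }
}
```

The mask matters mostly when a Java `byte` (which is signed) has been widened to `int` and needs to be treated as a value in the range 0 to 255.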
-
wsubstrn
static boolean wsubstrn(java.lang.String s1, int len1, java.lang.String s2)
Check if the first String contains the second one.
- Parameters:
- s1 - full String
- len1 - maximum position in String
- s2 - String to search for
- Returns:
- true if s1 contains s2 in the range 0-len1
-
wsubstrncase
static boolean wsubstrncase(java.lang.String s1, int len1, java.lang.String s2)
Check if the first String contains the second one (ignore case).
- Parameters:
- s1 - full String
- len1 - maximum position in String
- s2 - String to search for
- Returns:
- true if s1 contains s2 in the range 0-len1
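The bounded searches wsubstrn and wsubstrncase could be sketched as follows. This is an illustration only: the class name `BoundedSubstringSearch` is hypothetical, and reading "in the range 0-len1" as "the first match must start at an index no greater than len1" is an assumption, since the description does not say whether the bound applies to the start or the end of the match.

```java
import java.util.Locale;

final class BoundedSubstringSearch {

    private BoundedSubstringSearch() {
        // utility class, not meant to be instantiated
    }

    // Assumed reading: the first occurrence of s2 must START at an index <= len1.
    static boolean wsubstrn(String s1, int len1, String s2) {
        int index = s1.indexOf(s2);
        return index > -1 && index <= len1;
    }

    // Case-insensitive variant: lower-case both operands, then reuse wsubstrn.
    static boolean wsubstrncase(String s1, int len1, String s2) {
        return wsubstrn(s1.toLowerCase(Locale.ROOT), len1, s2.toLowerCase(Locale.ROOT));
    }
}
```

`Locale.ROOT` is used so that the case folding does not depend on the default locale of the machine running the code.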
-
wstrnchr
static int wstrnchr(java.lang.String s1, int len1, char cc)
Return offset of cc from beginning of s1, -1 if not found.
- Parameters:
- s1 - String
- len1 - maximum offset (offsets greater than len1 are ignored and -1 is returned)
- cc - character to search for
- Returns:
- index of cc in s1
-
wsubstr
static boolean wsubstr(java.lang.String s1, java.lang.String s2)
Same as wsubstrn, but without a specified length.
- Parameters:
- s1 - full String
- s2 - String to search for
- Returns:
- true if s2 is found in s1 (case insensitive search)
-
isxdigit
static boolean isxdigit(char c)
Is the character a hex digit?
- Parameters:
- c - char
- Returns:
- true if the given character is a hex digit
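A hex-digit test in the spirit of C's `isxdigit` might look like the sketch below. The class name `HexDigits` is hypothetical, and restricting the check to ASCII digits and the letters a-f / A-F is an assumption; the description does not say how non-ASCII digits are handled.

```java
final class HexDigits {

    private HexDigits() {
        // utility class, not meant to be instantiated
    }

    // Accepts only ASCII 0-9, a-f and A-F (assumed scope of "hex digit").
    static boolean isxdigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }
}
```

Explicit ranges are used here instead of `Character.digit(c, 16)`, because the latter also accepts non-ASCII Unicode digits.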
-
isInValuesIgnoreCase
static boolean isInValuesIgnoreCase(java.lang.String[] validValues, java.lang.String valueToCheck)
Check if the string valueToCheck is contained in validValues array (case insensitive comparison).
- Parameters:
- validValues - array of valid values
- valueToCheck - value to search for
- Returns:
- true if valueToCheck is found in validValues
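The lookup described above amounts to a linear scan with a case-insensitive comparison. A minimal sketch, assuming that behaviour (the class name `ValueLookup` is hypothetical and this is not JTidy's own code):

```java
final class ValueLookup {

    private ValueLookup() {
        // utility class, not meant to be instantiated
    }

    // Returns true as soon as one allowed value matches, ignoring case.
    static boolean isInValuesIgnoreCase(String[] validValues, String valueToCheck) {
        for (String candidate : validValues) {
            // equalsIgnoreCase(null) is false, so a null valueToCheck never matches.
            if (candidate.equalsIgnoreCase(valueToCheck)) {
                return true;
            }
        }
        return false;
    }
}
```

A check of this kind is convenient for validating enumerated attribute or option values such as "yes", "no" and "auto", whatever their capitalisation.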
-
findBadSubString
public static boolean findBadSubString(java.lang.String s, java.lang.String p, int len)
Return true if substring s is in p and isn't all in upper case. This is used to check the case of SYSTEM, PUBLIC, DTD and EN.
- Parameters:
- s - substring
- p - full string
- len - how many chars to check in p
- Returns:
- true if substring s is in p and isn't all in upper case
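To make the case check concrete, the sketch below scans the first len characters of p for a case-insensitive occurrence of s and reports whether the text actually found is not entirely upper case. The class name `DoctypeCaseCheck` is hypothetical, and details such as stopping at the first occurrence are assumptions, not a description of JTidy's implementation.

```java
import java.util.Locale;

final class DoctypeCaseCheck {

    private DoctypeCaseCheck() {
        // utility class, not meant to be instantiated
    }

    static boolean findBadSubString(String s, String p, int len) {
        // Never look past the end of p, even if len is larger.
        int limit = Math.min(len, p.length());
        for (int i = 0; i + s.length() <= limit; i++) {
            // regionMatches(true, ...) compares the two regions ignoring case.
            if (p.regionMatches(true, i, s, 0, s.length())) {
                String found = p.substring(i, i + s.length());
                // "Bad" means the keyword is present but not written in upper case.
                return !found.equals(found.toUpperCase(Locale.ROOT));
            }
        }
        return false;
    }
}
```

With this sketch, a doctype fragment containing `public` or `Public` is flagged, while one containing `PUBLIC`, or no such keyword at all, is not.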
-
isXMLLetter
static boolean isXMLLetter(char c)
Is the given char a valid xml letter?
- Parameters:
- c - char
- Returns:
- true if the char is a valid xml letter
-
isXMLNamechar
static boolean isXMLNamechar(char c)
Is the given char valid in xml name?
- Parameters:
- c - char
- Returns:
- true if the char is a valid xml name char
-
isQuote
static boolean isQuote(int c)
Is the given character a single or double quote?
- Parameters:
- c - char
- Returns:
- true if c is " or '
-
getBytes
public static byte[] getBytes(java.lang.String str)
Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
- Parameters:
- str - String
- Returns:
- utf8 bytes
- See Also:
- String.getBytes()
-
getString
public static java.lang.String getString(byte[] bytes, int offset, int length)
Should always be able to convert to/from UTF-8, so encoding exceptions are converted to an Error to avoid adding throws declarations in lots of methods.
- Parameters:
- bytes - byte array
- offset - starting offset in byte array
- length - length in byte array starting from offset
- Returns:
- same as new String(bytes, offset, length, "UTF8")
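The pattern shared by getBytes and getString, turning a checked encoding exception into an unchecked Error, can be sketched as follows. The class name `Utf8Conversions` and the error messages are hypothetical; only the overall approach is taken from the descriptions above.

```java
import java.io.UnsupportedEncodingException;

final class Utf8Conversions {

    private Utf8Conversions() {
        // utility class, not meant to be instantiated
    }

    static byte[] getBytes(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this should be unreachable.
            throw new Error("String to UTF-8 conversion failed", e);
        }
    }

    static String getString(byte[] bytes, int offset, int length) {
        try {
            return new String(bytes, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new Error("UTF-8 to String conversion failed", e);
        }
    }
}
```

On Java 7 and later, the overloads that take `java.nio.charset.StandardCharsets.UTF_8` declare no checked exception, so new code can avoid the wrapping altogether.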
-
lastChar
public static int lastChar(java.lang.String str)
Return the last char in string. This is useful when a trailing quotation mark is missing on an attribute.
- Parameters:
- str - String
- Returns:
- last char in String
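The use case mentioned for lastChar, spotting an attribute value whose closing quote is missing, can be illustrated by combining it with a quote test like isQuote. In this sketch the class name `TrailingQuoteCheck` and the helper `endsWithQuote` are hypothetical, and returning 0 for a null or empty String is an assumption.

```java
final class TrailingQuoteCheck {

    private TrailingQuoteCheck() {
        // utility class, not meant to be instantiated
    }

    // Last char as an int; 0 for a null or empty String (assumed convention).
    static int lastChar(String str) {
        if (str != null && str.length() > 0) {
            return str.charAt(str.length() - 1);
        }
        return 0;
    }

    // True for a single or a double quote.
    static boolean isQuote(int c) {
        return c == '\'' || c == '"';
    }

    // Hypothetical helper: does the raw attribute text end with a quote character?
    static boolean endsWithQuote(String rawAttributeValue) {
        return isQuote(lastChar(rawAttributeValue));
    }
}
```

Returning an `int` rather than a `char` leaves room for a sentinel value (0 here) that cannot be confused with a quote character.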
-
isWhite
public static boolean isWhite(char c)
Determines if the specified character is whitespace.
- Parameters:
- c - char
- Returns:
- true if char is whitespace.
-
isDigit
public static boolean isDigit(char c)
Is the given char a digit?
- Parameters:
- c - char
- Returns:
- true if the given char is a digit
-
isLetter
public static boolean isLetter(char c)
Is the given char a letter?
- Parameters:
- c - char
- Returns:
- true if the given char is a letter
-
isNamechar
public static boolean isNamechar(char c)
Is the given char valid in name? (letter, digit or "-", ".", ":", "_")
- Parameters:
- c - char
- Returns:
- true if char is a name char.
-
isLower
public static boolean isLower(char c)
Determines if the specified character is a lowercase character.
- Parameters:
- c - char
- Returns:
- true if char is lower case.
-
isUpper
public static boolean isUpper(char c)
Determines if the specified character is an uppercase character.
- Parameters:
- c - char
- Returns:
- true if char is upper case.
-
toLower
public static char toLower(char c)
Maps the given character to its lowercase equivalent.
- Parameters:
- c - char
- Returns:
- lowercase char.
-
toUpper
public static char toUpper(char c)
Maps the given character to its uppercase equivalent.
- Parameters:
- c - char
- Returns:
- uppercase char.
-
foldCase
public static char foldCase(char c, boolean tocaps, boolean xmlTags)
Fold case of a char.
- Parameters:
- c - char
- tocaps - convert to caps
- xmlTags - use xml tags? If true no change will be performed
- Returns:
- folded char
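The interaction of the two flags is easy to misread, so a small sketch may help: when xmlTags is true the char is returned unchanged; otherwise tocaps selects upper or lower case. The class name `CaseFolding` is hypothetical, and delegating to `Character.toUpperCase` / `Character.toLowerCase` is an assumption standing in for the class's own toUpper and toLower.

```java
final class CaseFolding {

    private CaseFolding() {
        // utility class, not meant to be instantiated
    }

    static char foldCase(char c, boolean tocaps, boolean xmlTags) {
        if (xmlTags) {
            // XML names are case sensitive, so nothing is folded.
            return c;
        }
        return tocaps ? Character.toUpperCase(c) : Character.toLowerCase(c);
    }
}
```

For example, `foldCase('a', true, false)` yields `'A'`, whereas `foldCase('a', true, true)` leaves the char as `'a'`.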
-
mapStr
private static void mapStr(java.lang.String str, short code)
Classify chars in String and put them in lexmap.
- Parameters:
- str - String
- code - code associated to chars in the String
-
map
private static short map(char c)
Returns the constant which defines the classification of char in lexmap.
- Parameters:
- c - char
- Returns:
- char type
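Taken together, the char-type constants, lexmap, mapStr and map suggest a lookup table of bit flags indexed by character code. The sketch below shows one way such a table could work. It is a reconstruction under stated assumptions: the class name `LexMapSketch`, the flag values, the 128-entry table size and the strings registered in the static initializer are all illustrative, not taken from JTidy's source.

```java
final class LexMapSketch {

    // Bit flags; the concrete values are assumptions chosen so flags can be OR-ed together.
    private static final short DIGIT = 1;
    private static final short LETTER = 2;
    private static final short NAMECHAR = 4;
    private static final short WHITE = 8;
    private static final short NEWLINE = 16;
    private static final short LOWERCASE = 32;
    private static final short UPPERCASE = 64;

    // One entry per ASCII char (assumed table size).
    private static final short[] LEXMAP = new short[128];

    static {
        mapStr("\r\n\f", (short) (NEWLINE | WHITE));
        mapStr(" \t", WHITE);
        mapStr("-.:_", NAMECHAR);
        mapStr("0123456789", (short) (DIGIT | NAMECHAR));
        mapStr("abcdefghijklmnopqrstuvwxyz", (short) (LOWERCASE | LETTER | NAMECHAR));
        mapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short) (UPPERCASE | LETTER | NAMECHAR));
    }

    private LexMapSketch() {
        // utility class, not meant to be instantiated
    }

    // Adds the given flags to every char that occurs in str.
    private static void mapStr(String str, short code) {
        for (int i = 0; i < str.length(); i++) {
            LEXMAP[str.charAt(i)] |= code;
        }
    }

    // Flags for c; chars outside the table have no classification in this sketch.
    private static short map(char c) {
        return c < 128 ? LEXMAP[c] : 0;
    }

    static boolean isDigit(char c) {
        return (map(c) & DIGIT) != 0;
    }

    static boolean isLetter(char c) {
        return (map(c) & LETTER) != 0;
    }

    static boolean isNamechar(char c) {
        return (map(c) & NAMECHAR) != 0;
    }

    static boolean isWhite(char c) {
        return (map(c) & WHITE) != 0;
    }
}
```

Because each class is a separate bit, a single table entry can record that a char is, say, both a letter and a name char, and every predicate reduces to one array read and one mask.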
-
isCharEncodingSupported
public static boolean isCharEncodingSupported(java.lang.String name)
Is the given character encoding supported?
- Parameters:
- name - character encoding name
- Returns:
- true if encoding is supported, false otherwise.
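A check of this kind can be approximated with the standard `java.nio.charset.Charset.isSupported` call, as in the sketch below. This is a stand-in, not JTidy's implementation: the class name `EncodingSupport` is hypothetical, and treating null or syntactically invalid names as "not supported" is an assumption.

```java
import java.nio.charset.Charset;

final class EncodingSupport {

    private EncodingSupport() {
        // utility class, not meant to be instantiated
    }

    static boolean isCharEncodingSupported(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException (a subclass) signals a syntactically invalid name.
            return false;
        }
    }
}
```

The sketch only recognises names known to the Java runtime; any extra encoding aliases that the real method may accept are not covered here.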
-
-