org.apache.poi.util
Class StringUtil

java.lang.Object
  extended by org.apache.poi.util.StringUtil

@Internal
public class StringUtil
extends java.lang.Object

Collection of string-handling utilities.


Nested Class Summary
static class StringUtil.StringsIterator
          An Iterator over an array of Strings.
 
Field Summary
static java.nio.charset.Charset BIG5
           
protected static java.nio.charset.Charset ISO_8859_1
           
static java.nio.charset.Charset UTF16LE
           
static java.nio.charset.Charset UTF8
           
static java.nio.charset.Charset WIN_1252
           
 
Method Summary
static int countMatches(java.lang.CharSequence haystack, char needle)
          Count the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
static boolean endsWithIgnoreCase(java.lang.String haystack, java.lang.String suffix)
          Tests if the string ends with the specified suffix, ignoring case consideration.
static int getEncodedSize(java.lang.String value)
           
static java.lang.String getFromCompressedUnicode(byte[] string, int offset, int len)
          Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return.
static java.lang.String getFromUnicodeLE(byte[] string)
          Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
static java.lang.String getFromUnicodeLE(byte[] string, int offset, int len)
          Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
static java.lang.String getPreferredEncoding()
           
static byte[] getToUnicodeLE(java.lang.String string)
          Convert a String to 16-bit Unicode characters in little-endian format.
static boolean hasMultibyte(java.lang.String value)
          Checks whether the parameter contains a multibyte character.
static boolean isUnicodeString(java.lang.String value)
          Checks to see if a given String needs to be represented as Unicode
static java.lang.String join(java.lang.Object[] array)
           
static java.lang.String join(java.lang.Object[] array, java.lang.String separator)
           
static java.lang.String join(java.lang.String separator, java.lang.Object... array)
           
static void mapMsCodepoint(int msCodepoint, int unicodeCodepoint)
           
static java.lang.String mapMsCodepointString(java.lang.String string)
          Some strings may contain encoded characters of the unicode private use area.
static void putCompressedUnicode(java.lang.String input, byte[] output, int offset)
          Writes a Unicode (Java) string into the supplied byte array as 8-bit data (in the ISO-8859-1 codepage).
static void putCompressedUnicode(java.lang.String input, LittleEndianOutput out)
           
static void putUnicodeLE(java.lang.String input, byte[] output, int offset)
          Writes a Unicode string into the supplied byte array as little-endian (most significant byte last) bytes.
static void putUnicodeLE(java.lang.String input, LittleEndianOutput out)
           
static java.lang.String readCompressedUnicode(LittleEndianInput in, int nChars)
           
static java.lang.String readUnicodeLE(LittleEndianInput in, int nChars)
           
static java.lang.String readUnicodeString(LittleEndianInput in)
          The LittleEndianInput in is expected to contain: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
static java.lang.String readUnicodeString(LittleEndianInput in, int nChars)
          The LittleEndianInput in is expected to contain: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
static boolean startsWithIgnoreCase(java.lang.String haystack, java.lang.String prefix)
          Tests if the string starts with the specified prefix, ignoring case consideration.
static void writeUnicodeString(LittleEndianOutput out, java.lang.String value)
          The LittleEndianOutput out will receive: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
static void writeUnicodeStringFlagAndData(LittleEndianOutput out, java.lang.String value)
          The LittleEndianOutput out will receive: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ISO_8859_1

protected static final java.nio.charset.Charset ISO_8859_1

UTF16LE

public static final java.nio.charset.Charset UTF16LE

UTF8

public static final java.nio.charset.Charset UTF8

WIN_1252

public static final java.nio.charset.Charset WIN_1252

BIG5

public static final java.nio.charset.Charset BIG5

Method Detail

getFromUnicodeLE

public static java.lang.String getFromUnicodeLE(byte[] string,
                                                int offset,
                                                int len)
                                         throws java.lang.ArrayIndexOutOfBoundsException,
                                                java.lang.IllegalArgumentException
Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. For example, the bytes { 0x16, 0x00 } decode to the character 0x16.

Parameters:
string - the byte array to be converted
offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
len - the length of the final string, in characters (2 * len bytes are read)
Returns:
the converted string, never null.
Throws:
java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
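As a rough illustration of this contract, the JDK-only sketch below decodes len UTF-16LE characters starting at offset. The class UnicodeLeDecodeSketch and its method are hypothetical names, and the bounds checks only mirror the documented exceptions; this is not POI's source code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch mirroring the documented contract of
// StringUtil.getFromUnicodeLE(byte[], int, int); not POI's source code.
class UnicodeLeDecodeSketch {

    static String fromUnicodeLE(byte[] string, int offset, int len) {
        // Documented: offset must satisfy 0 <= offset < string.length.
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // Each character occupies two bytes, so len characters need 2 * len bytes.
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        return new String(string, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // { 0x16, 0x00 } is U+0016; then 'H' and 'i', each low byte first.
        byte[] data = {0x16, 0x00, 0x48, 0x00, 0x69, 0x00};
        System.out.println(fromUnicodeLE(data, 2, 2)); // prints Hi
    }
}
```

Because every character takes two bytes, an offset of 2 skips exactly the leading { 0x16, 0x00 } character.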

getFromUnicodeLE

public static java.lang.String getFromUnicodeLE(byte[] string)
Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. For example, the bytes { 0x16, 0x00 } decode to the character 0x16.

Parameters:
string - the byte array to be converted
Returns:
the converted string, never null

getToUnicodeLE

public static byte[] getToUnicodeLE(java.lang.String string)
Convert a String to 16-bit Unicode characters in little-endian format.

Parameters:
string - the string
Returns:
the byte array of 16-bit unicode characters
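The same conversion can be approximated with the JDK alone, since the UTF-16LE charset writes each char as its low byte followed by its high byte and emits no byte-order mark. UnicodeLeEncodeSketch is a hypothetical name; this is a sketch of the documented behaviour, not POI's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical JDK-only sketch of the conversion documented for
// StringUtil.getToUnicodeLE(String); not POI's source code.
class UnicodeLeEncodeSketch {

    static byte[] toUnicodeLE(String string) {
        // UTF_16LE emits no byte-order mark: each char becomes low byte, then high byte.
        return string.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toUnicodeLE("Hi"))); // prints [72, 0, 105, 0]
    }
}
```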

getFromCompressedUnicode

public static java.lang.String getFromCompressedUnicode(byte[] string,
                                                        int offset,
                                                        int len)
Read 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String and return it. (In Excel terms, read compressed 8-bit Unicode as a string.)

Parameters:
string - byte array to read
offset - the offset in the byte array at which to start reading
len - the number of bytes to read
Returns:
the String generated by reading the byte array
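"Compressed" here simply means one byte per character. A JDK-only sketch of the read, assuming the ISO-8859-1 decoding described above (CompressedDecodeSketch is a hypothetical name, not POI's code):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of the behaviour documented for
// StringUtil.getFromCompressedUnicode(byte[], int, int); not POI's source code.
class CompressedDecodeSketch {

    static String fromCompressedUnicode(byte[] string, int offset, int len) {
        // ISO-8859-1 maps each byte 0x00-0xFF to the char with the same numeric value.
        return new String(string, offset, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = {0x41, (byte) 0xE9, 0x42}; // 'A', U+00E9 (e-acute), 'B'
        String s = fromCompressedUnicode(data, 1, 2);
        System.out.println(Integer.toHexString(s.charAt(0)) + " " + s.charAt(1)); // prints e9 B
    }
}
```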

readCompressedUnicode

public static java.lang.String readCompressedUnicode(LittleEndianInput in,
                                                     int nChars)

readUnicodeString

public static java.lang.String readUnicodeString(LittleEndianInput in)
The LittleEndianInput in is expected to contain:
  1. ushort nChars
  2. byte is16BitFlag
  3. byte[]/char[] characterData
For this encoding, the is16BitFlag is always present even if nChars==0. This structure is also known as an XLUnicodeString.
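To make the three-part layout concrete, here is a hypothetical sketch that parses an XLUnicodeString from a java.nio.ByteBuffer rather than POI's LittleEndianInput. It assumes that only bit 0 of is16BitFlag is significant (1 = UTF-16LE data, 0 = compressed ISO-8859-1 data); the class name is illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of parsing an XLUnicodeString from a ByteBuffer instead of
// POI's LittleEndianInput. Assumption: only bit 0 of is16BitFlag matters
// (1 = UTF-16LE character data, 0 = compressed 8-bit ISO-8859-1 data).
class XLUnicodeStringReadSketch {

    static String readUnicodeString(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = Short.toUnsignedInt(in.getShort()); // 1. ushort nChars
        byte flag = in.get();                            // 2. is16BitFlag (present even if nChars == 0)
        boolean is16Bit = (flag & 0x01) != 0;
        byte[] data = new byte[is16Bit ? nChars * 2 : nChars];
        in.get(data);                                    // 3. characterData
        return new String(data, is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] compressed = {0x02, 0x00, 0x00, 0x48, 0x69}; // nChars=2, 8-bit "Hi"
        byte[] wide = {0x01, 0x00, 0x01, 0x2D, 0x4E};       // nChars=1, 16-bit U+4E2D
        System.out.println(readUnicodeString(ByteBuffer.wrap(compressed))); // prints Hi
        System.out.println(Integer.toHexString(readUnicodeString(ByteBuffer.wrap(wide)).charAt(0))); // prints 4e2d
    }
}
```

For the readUnicodeString(LittleEndianInput, int) variant, the same logic would apply with step 1 skipped and nChars passed in by the caller.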


readUnicodeString

public static java.lang.String readUnicodeString(LittleEndianInput in,
                                                 int nChars)
The LittleEndianInput in is expected to contain:
  1. byte is16BitFlag
  2. byte[]/char[] characterData
For this encoding, the is16BitFlag is always present even if nChars==0.
This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.


writeUnicodeString

public static void writeUnicodeString(LittleEndianOutput out,
                                      java.lang.String value)
The LittleEndianOutput out will receive:
  1. ushort nChars
  2. byte is16BitFlag
  3. byte[]/char[] characterData
For this encoding, the is16BitFlag is always present even if nChars==0.
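The write side can be sketched the same way. This hypothetical example writes into a ByteArrayOutputStream instead of POI's LittleEndianOutput and assumes the flag is set to 16-bit only when some char lies above 0xFF; otherwise the compressed 8-bit form is used. It is an illustration of the layout, not POI's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of emitting ushort nChars / is16BitFlag / characterData.
// Assumption: 16-bit data is written only when some char exceeds 0xFF.
class XLUnicodeStringWriteSketch {

    static byte[] writeUnicodeString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nChars = value.length();
        out.write(nChars & 0xFF);          // ushort nChars, low byte first
        out.write((nChars >>> 8) & 0xFF);
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        out.write(is16Bit ? 0x01 : 0x00);  // flag is written even when nChars == 0
        byte[] data = value.getBytes(is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        out.write(data, 0, data.length);   // characterData
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(writeUnicodeString("Hi"))); // prints [2, 0, 0, 72, 105]
    }
}
```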


writeUnicodeStringFlagAndData

public static void writeUnicodeStringFlagAndData(LittleEndianOutput out,
                                                 java.lang.String value)
The LittleEndianOutput out will receive:
  1. byte is16BitFlag
  2. byte[]/char[] characterData
For this encoding, the is16BitFlag is always present even if nChars==0.
This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.


getEncodedSize

public static int getEncodedSize(java.lang.String value)
Returns:
the number of bytes that would be written by writeUnicodeString(LittleEndianOutput, String)
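The size follows directly from the layout: 2 bytes for nChars, 1 byte for is16BitFlag, then nChars bytes (compressed) or 2 * nChars bytes (16-bit). The sketch below works that arithmetic out, assuming the 16-bit form is chosen whenever a char exceeds 0xFF; EncodedSizeSketch is a hypothetical name.

```java
// Worked arithmetic: 2 bytes (ushort nChars) + 1 byte (is16BitFlag) + nChars * bytesPerChar.
// Assumption: bytesPerChar is 2 when any char exceeds 0xFF, otherwise 1.
class EncodedSizeSketch {

    static int encodedSize(String value) {
        int bytesPerChar = value.chars().anyMatch(c -> c > 0xFF) ? 2 : 1;
        return 2 + 1 + value.length() * bytesPerChar;
    }

    public static void main(String[] args) {
        System.out.println(encodedSize("Hello"));        // 2 + 1 + 5 * 1, prints 8
        System.out.println(encodedSize("\u4e2d\u6587")); // 2 + 1 + 2 * 2, prints 7
    }
}
```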

putCompressedUnicode

public static void putCompressedUnicode(java.lang.String input,
                                        byte[] output,
                                        int offset)
Takes a Unicode (Java) string and writes it into the supplied byte array as 8-bit data (in the ISO-8859-1 codepage). (In Excel terms, write compressed 8-bit Unicode.)

Parameters:
input - the String containing the data to be written
output - the byte array to which the data is to be written
offset - the offset into the byte array at which writing of the data starts
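A JDK-only sketch of this write, for illustration: encode to ISO-8859-1 and copy into the caller's array at offset. Note that String.getBytes substitutes '?' for characters with no ISO-8859-1 mapping, so non-Latin-1 text is lossy in this sketch. CompressedEncodeSketch is a hypothetical name, not POI's code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch: write a String as 8-bit ISO-8859-1 bytes into
// an existing array. Unmappable chars become '?' (0x3F) via String.getBytes.
class CompressedEncodeSketch {

    static void putCompressedUnicode(String input, byte[] output, int offset) {
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        putCompressedUnicode("Hi", buf, 1);
        System.out.println(java.util.Arrays.toString(buf)); // prints [0, 72, 105, 0]
    }
}
```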

putCompressedUnicode

public static void putCompressedUnicode(java.lang.String input,
                                        LittleEndianOutput out)

putUnicodeLE

public static void putUnicodeLE(java.lang.String input,
                                byte[] output,
                                int offset)
Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, write uncompressed Unicode.)

Parameters:
input - the String containing the unicode data to be written
output - the byte array to hold the uncompressed Unicode data; it needs room for 2 * input.length() bytes starting at offset
offset - the offset to start writing into the byte array
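Spelling the byte order out by hand shows what "most significant byte last" means. This hypothetical sketch writes each char's low byte and then its high byte into the caller's array; it illustrates the documented layout rather than reproducing POI's code.

```java
// Hypothetical JDK-only sketch of writing a String as little-endian 16-bit units
// into a caller-supplied array; not POI's source code.
class UnicodeLePutSketch {

    static void putUnicodeLE(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) c;             // least significant byte first
            output[offset + 2 * i + 1] = (byte) (c >>> 8); // most significant byte last
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[5];
        putUnicodeLE("\u4e2dA", buf, 1);
        System.out.println(java.util.Arrays.toString(buf)); // prints [0, 45, 78, 65, 0]
    }
}
```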

putUnicodeLE

public static void putUnicodeLE(java.lang.String input,
                                LittleEndianOutput out)

readUnicodeLE

public static java.lang.String readUnicodeLE(LittleEndianInput in,
                                             int nChars)

getPreferredEncoding

public static java.lang.String getPreferredEncoding()
Returns:
the encoding we want to use, currently hardcoded to ISO-8859-1

hasMultibyte

public static boolean hasMultibyte(java.lang.String value)
Checks whether the parameter contains a multibyte character.

Parameters:
value - string to check
Returns:
true if the string has at least one multibyte character
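A minimal sketch of such a check, assuming that "multibyte" means a char whose value exceeds 0xFF (so it cannot be stored as one ISO-8859-1 byte) and that a null string counts as having none. Both assumptions and the class name MultibyteSketch are illustrative, not taken from POI.

```java
// Hypothetical sketch. Assumptions: a char is "multibyte" when its value exceeds
// 0xFF, and a null string is treated as containing no multibyte chars.
class MultibyteSketch {

    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMultibyte("caf\u00e9"));    // prints false (U+00E9 fits in one byte)
        System.out.println(hasMultibyte("\u4e2d\u6587")); // prints true
    }
}
```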

isUnicodeString

public static boolean isUnicodeString(java.lang.String value)
Checks to see if a given String needs to be represented as Unicode

Parameters:
value - The string to look at.
Returns:
true if string needs Unicode to be represented.
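One way to phrase "needs Unicode" is: the string does not survive a round trip through ISO-8859-1, the 8-bit compressed encoding. The sketch below uses that round-trip test; it is an assumed formulation, not necessarily how POI implements isUnicodeString, and NeedsUnicodeSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a String needs Unicode if encoding to ISO-8859-1 and
// decoding back changes it (unmappable chars are replaced with '?').
class NeedsUnicodeSketch {

    static boolean isUnicodeString(String value) {
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        return !value.equals(new String(latin1, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeString("caf\u00e9")); // prints false
        System.out.println(isUnicodeString("\u4e2d"));    // prints true
    }
}
```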

startsWithIgnoreCase

public static boolean startsWithIgnoreCase(java.lang.String haystack,
                                           java.lang.String prefix)
Tests if the string starts with the specified prefix, ignoring case consideration.


endsWithIgnoreCase

public static boolean endsWithIgnoreCase(java.lang.String haystack,
                                         java.lang.String suffix)
Tests if the string ends with the specified suffix, ignoring case consideration.
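Both startsWithIgnoreCase and endsWithIgnoreCase can be expressed with the standard String.regionMatches(boolean ignoreCase, ...) overload. The sketch below is a hypothetical illustration (IgnoreCaseSketch is not a POI class); it relies on regionMatches returning false when the region would fall outside either string.

```java
// Hypothetical sketch of case-insensitive prefix/suffix tests using
// String.regionMatches(ignoreCase, toffset, other, ooffset, len).
class IgnoreCaseSketch {

    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        // A negative start (suffix longer than haystack) makes regionMatches return false.
        int start = haystack.length() - suffix.length();
        return haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Workbook.XLSX", "work")); // prints true
        System.out.println(endsWithIgnoreCase("Workbook.XLSX", ".xlsx"));  // prints true
    }
}
```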


mapMsCodepointString

public static java.lang.String mapMsCodepointString(java.lang.String string)
Some strings may contain encoded characters of the Unicode Private Use Area. Currently, the characters of the symbol fonts are mapped to the corresponding characters in the normal Unicode range.

Parameters:
string - the original string
Returns:
the string with mapped characters
See Also:
Private Use Area (symbol), Symbol font - Unicode alternatives for Greek and special characters in HTML
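The idea behind mapMsCodepoint and mapMsCodepointString can be sketched as a code-point lookup table. In this hypothetical example, Symbol-font glyphs are assumed to sit in the Private Use Area at U+F000 plus the font's byte code, and the two table entries are illustrative only; POI's actual table and registration logic may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: replace Private Use Area code points (assumed here at
// U+F000 + Symbol-font byte) with the ordinary Unicode characters they depict.
// The two-entry table is illustrative only.
class MsCodepointSketch {
    private static final Map<Integer, Integer> MS_TO_UNICODE = new HashMap<>();

    static void mapMsCodepoint(int msCodepoint, int unicodeCodepoint) {
        MS_TO_UNICODE.put(msCodepoint, unicodeCodepoint);
    }

    static {
        mapMsCodepoint(0xF061, 0x03B1); // Symbol 'a' -> GREEK SMALL LETTER ALPHA
        mapMsCodepoint(0xF070, 0x03C0); // Symbol 'p' -> GREEK SMALL LETTER PI
    }

    static String mapMsCodepointString(String string) {
        StringBuilder sb = new StringBuilder(string.length());
        // Unmapped code points are copied through unchanged.
        string.codePoints().forEach(cp -> sb.appendCodePoint(MS_TO_UNICODE.getOrDefault(cp, cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        String mapped = mapMsCodepointString("2\uF070r");
        System.out.println(Integer.toHexString(mapped.charAt(1))); // prints 3c0
    }
}
```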

mapMsCodepoint

public static void mapMsCodepoint(int msCodepoint,
                                  int unicodeCodepoint)

join

@Internal
public static java.lang.String join(java.lang.Object[] array,
                                             java.lang.String separator)

join

@Internal
public static java.lang.String join(java.lang.Object[] array)

join

@Internal
public static java.lang.String join(java.lang.String separator,
                                             java.lang.Object... array)
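The three join overloads above carry no description. As a rough, hypothetical illustration of what joining an Object[] typically means, the sketch below renders each element with String.valueOf (so null becomes "null") and assumes the one-argument form uses an empty separator; both behaviours are assumptions, not documented here.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of the join overloads. Assumptions: elements are rendered
// with String.valueOf, and the one-argument form uses an empty separator.
class JoinSketch {

    static String join(Object[] array, String separator) {
        return Arrays.stream(array).map(String::valueOf).collect(Collectors.joining(separator));
    }

    static String join(Object[] array) {
        return join(array, "");
    }

    static String join(String separator, Object... array) {
        return join(array, separator);
    }

    public static void main(String[] args) {
        System.out.println(join(", ", 1, 2, 3));                  // prints 1, 2, 3
        System.out.println(join(new Object[]{"a", null, 3}, "-")); // prints a-null-3
    }
}
```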

countMatches

public static int countMatches(java.lang.CharSequence haystack,
                               char needle)
Count the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.

Parameters:
haystack - the CharSequence to check, may be null
needle - the character to count
Returns:
the number of occurrences, 0 if the CharSequence is null
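A straightforward sketch of this contract, including the documented null case (CountMatchesSketch is a hypothetical name, not POI's code):

```java
// Hypothetical sketch following the documented contract: a null haystack yields 0.
class CountMatchesSketch {

    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("a,b,,c", ',')); // prints 3
        System.out.println(countMatches(null, ','));     // prints 0
    }
}
```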