StringUtil (POI API Documentation)

java.lang.Object
- org.apache.poi.util.StringUtil

@Internal
public class StringUtil
extends java.lang.Object

Collection of string handling utilities

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`StringUtil.StringsIterator` An Iterator over an array of Strings.

Field Summary

Fields
Modifier and Type	Field and Description
`static java.nio.charset.Charset`	`BIG5`
`protected static java.nio.charset.Charset`	`ISO_8859_1`
`static java.nio.charset.Charset`	`UTF16LE`
`static java.nio.charset.Charset`	`UTF8`
`static java.nio.charset.Charset`	`WIN_1252`

Method Summary

Modifier and Type	Method and Description
`static int`	`countMatches(java.lang.CharSequence haystack, char needle)` Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
`static boolean`	`endsWithIgnoreCase(java.lang.String haystack, java.lang.String suffix)` Tests if the string ends with the specified suffix, ignoring case consideration.
`static int`	`getEncodedSize(java.lang.String value)`
`static java.lang.String`	`getFromCompressedUnicode(byte[] string, int offset, int len)` Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return.
`static java.lang.String`	`getFromUnicodeLE(byte[] string)` Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
`static java.lang.String`	`getFromUnicodeLE(byte[] string, int offset, int len)` Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
`static java.lang.String`	`getFromUnicodeLE0Terminated(byte[] string, int offset, int len)` Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
`static java.lang.String`	`getPreferredEncoding()`
`static byte[]`	`getToUnicodeLE(java.lang.String string)` Convert String to 16-bit unicode characters in little endian format
`static boolean`	`hasMultibyte(java.lang.String value)` Checks whether the parameter contains a multibyte character.
`static boolean`	`isUnicodeString(java.lang.String value)` Checks to see if a given String needs to be represented as Unicode
`static boolean`	`isUpperCase(char c)`
`static java.lang.String`	`join(java.lang.Object[] array)`
`static java.lang.String`	`join(java.lang.Object[] array, java.lang.String separator)`
`static java.lang.String`	`join(java.lang.String separator, java.lang.Object... array)`
`static void`	`mapMsCodepoint(int msCodepoint, int unicodeCodepoint)`
`static java.lang.String`	`mapMsCodepointString(java.lang.String string)` Some strings may contain encoded characters of the unicode private use area.
`static void`	`putCompressedUnicode(java.lang.String input, byte[] output, int offset)` Takes a Unicode (Java) string and writes it as 8-bit data (in ISO-8859-1 codepage) into the supplied byte array.
`static void`	`putCompressedUnicode(java.lang.String input, LittleEndianOutput out)`
`static void`	`putUnicodeLE(java.lang.String input, byte[] output, int offset)` Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array.
`static void`	`putUnicodeLE(java.lang.String input, LittleEndianOutput out)`
`static java.lang.String`	`readCompressedUnicode(LittleEndianInput in, int nChars)`
`static java.lang.String`	`readUnicodeLE(LittleEndianInput in, int nChars)`
`static java.lang.String`	`readUnicodeString(LittleEndianInput in)` InputStream `in` is expected to contain: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
`static java.lang.String`	`readUnicodeString(LittleEndianInput in, int nChars)` InputStream `in` is expected to contain: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
`static boolean`	`startsWithIgnoreCase(java.lang.String haystack, java.lang.String prefix)` Tests if the string starts with the specified prefix, ignoring case consideration.
`static java.lang.String`	`toLowerCase(char c)`
`static java.lang.String`	`toUpperCase(char c)`
`static void`	`writeUnicodeString(LittleEndianOutput out, java.lang.String value)` OutputStream `out` will get: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
`static void`	`writeUnicodeStringFlagAndData(LittleEndianOutput out, java.lang.String value)` OutputStream `out` will get: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - ISO_8859_1
```
protected static final java.nio.charset.Charset ISO_8859_1
```
  - UTF16LE
```
public static final java.nio.charset.Charset UTF16LE
```
  - UTF8
```
public static final java.nio.charset.Charset UTF8
```
  - WIN_1252
```
public static final java.nio.charset.Charset WIN_1252
```
  - BIG5
```
public static final java.nio.charset.Charset BIG5
```
- Method Detail
  - getFromUnicodeLE
```
public static java.lang.String getFromUnicodeLE(byte[] string,
                                                int offset,
                                                int len)
                                         throws java.lang.ArrayIndexOutOfBoundsException,
                                                java.lang.IllegalArgumentException
```
    Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
    { 0x16, 0x00 } -> 0x16
    
    Parameters:
    
    string - the byte array to be converted
    
    offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
    
    len - the length of the final string, in 16-bit characters
    
    Returns:
    
    the converted string, never null.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
    
    java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
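
    The contract above can be sketched with the JDK alone. This is a minimal illustration, not POI's implementation: the class name `Utf16LeDecodeSketch` and the exact bounds checks are assumptions derived from the parameter and exception descriptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: mirrors the documented contract only.
final class Utf16LeDecodeSketch {
    // Decodes len 16-bit code units starting at offset, low byte first.
    static String decode(byte[] data, int offset, int len) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        // Each character occupies two bytes, hence len * 2.
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }
}
```

    With the bytes `{ 0x16, 0x00, 0x41, 0x00, 0x42, 0x00 }`, `decode(data, 2, 2)` yields `"AB"`.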
  - getFromUnicodeLE
```
public static java.lang.String getFromUnicodeLE(byte[] string)
```
    Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
    { 0x16, 0x00 } -> 0x16
    
    Parameters:
    
    string - the byte array to be converted
    
    Returns:
    
    the converted string, never null
  - getToUnicodeLE
```
public static byte[] getToUnicodeLE(java.lang.String string)
```
    Convert String to 16-bit unicode characters in little endian format
    
    Parameters:
    
    string - the string
    
    Returns:
    
    the byte array of 16-bit unicode characters
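
    To make the byte order concrete, here is a small JDK-only sketch of the same conversion. The class name `Utf16LeEncodeSketch` is hypothetical, and the loop is written out by hand only to show where each byte lands; it is not taken from POI's source.

```java
// Hypothetical helper, not part of POI: shows the little-endian byte layout.
final class Utf16LeEncodeSketch {
    static byte[] encode(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) (c & 0xFF);             // least significant byte first
            out[2 * i + 1] = (byte) ((c >>> 8) & 0xFF); // most significant byte last
        }
        return out;
    }
}
```

    For the input `"A\u20AC"` the result is `{ 0x41, 0x00, 0xAC, 0x20 }`, the same bytes that `String.getBytes(StandardCharsets.UTF_16LE)` produces.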
  - getFromCompressedUnicode
```
public static java.lang.String getFromCompressedUnicode(byte[] string,
                                                        int offset,
                                                        int len)
```
    Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return. (In Excel terms, read compressed 8 bit unicode as a string)
    
    Parameters:
    
    string - byte array to read
    
    offset - the offset into the byte array at which to start reading
    
    len - the number of bytes to read from the byte array
    
    Returns:
    
    the String instance generated by reading the byte array
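
    In ISO-8859-1 every byte maps directly to the code point with the same value (U+0000 to U+00FF), which is why this form is called "compressed". A minimal JDK-only sketch, assuming no clamping of `len`; the class name `CompressedUnicodeSketch` is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: one byte per character.
final class CompressedUnicodeSketch {
    static String decode(byte[] data, int offset, int len) {
        // Assumption: the caller passes a len that fits within the array.
        return new String(data, offset, len, StandardCharsets.ISO_8859_1);
    }
}
```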
  - readCompressedUnicode
```
public static java.lang.String readCompressedUnicode(LittleEndianInput in,
                                                     int nChars)
```
  - readUnicodeString
```
public static java.lang.String readUnicodeString(LittleEndianInput in)
```
    InputStream in is expected to contain:
    1. ushort nChars
    2. byte is16BitFlag
    3. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
    This structure is also known as an XLUnicodeString.
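
    The three-field layout can be parsed as follows. This is a sketch under stated assumptions: it reads from a `java.nio.ByteBuffer` instead of a `LittleEndianInput`, it treats only the lowest bit of the flag byte as the 16-bit indicator, and the class name `XlUnicodeStringReadSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: parses nChars, is16BitFlag, characterData.
final class XlUnicodeStringReadSketch {
    static String read(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = in.getShort() & 0xFFFF;      // 1. ushort nChars
        boolean is16Bit = (in.get() & 0x01) != 0; // 2. byte is16BitFlag (assumed: bit 0)
        // 3. characterData: two bytes per char when the flag is set, otherwise one
        byte[] raw = new byte[is16Bit ? nChars * 2 : nChars];
        in.get(raw);
        return new String(raw, is16Bit ? StandardCharsets.UTF_16LE
                                       : StandardCharsets.ISO_8859_1);
    }
}
```

    For example, the bytes `{ 2, 0, 0, 'H', 'i' }` decode to `"Hi"`, and `{ 0, 0, 0 }` decodes to the empty string because the flag byte is present even when nChars is 0.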
  - readUnicodeString
```
public static java.lang.String readUnicodeString(LittleEndianInput in,
                                                 int nChars)
```
    InputStream in is expected to contain:
    1. byte is16BitFlag
    2. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
    This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
  - writeUnicodeString
```
public static void writeUnicodeString(LittleEndianOutput out,
                                      java.lang.String value)
```
    OutputStream out will get:
    1. ushort nChars
    2. byte is16BitFlag
    3. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
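
    The write side of the same layout might look like this. It is a JDK-only sketch, not POI's code: it writes to a `ByteArrayOutputStream` rather than a `LittleEndianOutput`, and it assumes the 16-bit form is chosen whenever any character is above 0xFF. The class name `XlUnicodeStringWriteSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: emits nChars, is16BitFlag, characterData.
final class XlUnicodeStringWriteSketch {
    static byte[] write(String value) {
        // Assumption: any char above 0xFF forces the 16-bit representation.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        byte[] data = value.getBytes(is16Bit ? StandardCharsets.UTF_16LE
                                             : StandardCharsets.ISO_8859_1);
        int nChars = value.length();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(nChars & 0xFF);          // ushort nChars, low byte
        out.write((nChars >>> 8) & 0xFF);  // ushort nChars, high byte
        out.write(is16Bit ? 1 : 0);        // is16BitFlag, written even if nChars == 0
        out.write(data, 0, data.length);   // characterData
        return out.toByteArray();
    }
}
```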
  - writeUnicodeStringFlagAndData
```
public static void writeUnicodeStringFlagAndData(LittleEndianOutput out,
                                                 java.lang.String value)
```
    OutputStream out will get:
    1. byte is16BitFlag
    2. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
    This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
  - getEncodedSize
```
public static int getEncodedSize(java.lang.String value)
```
    Returns:
    
    the number of bytes that would be written by writeUnicodeString(LittleEndianOutput, String)
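
    Given the layout written by writeUnicodeString (a 2-byte nChars, a 1-byte flag, then the character data), the size can be computed without writing anything. The sketch below assumes one byte per character unless some character exceeds 0xFF; `EncodedSizeSketch` is a hypothetical name, not a POI class.

```java
// Hypothetical helper, not part of POI: size of nChars + flag + characterData.
final class EncodedSizeSketch {
    static int encodedSize(String value) {
        // Assumption: chars above 0xFF force two bytes per character.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        int header = 2 + 1; // ushort nChars + byte is16BitFlag
        return header + value.length() * (is16Bit ? 2 : 1);
    }
}
```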
  - putCompressedUnicode
```
public static void putCompressedUnicode(java.lang.String input,
                                        byte[] output,
                                        int offset)
```
    Takes a Unicode (Java) string and writes it as 8-bit data (in ISO-8859-1 codepage) into the supplied byte array. (In Excel terms, write compressed 8-bit Unicode)
    
    Parameters:
    
    input - the String containing the data to be written
    
    output - the byte array to which the data is to be written
    
    offset - the offset into the byte array at which writing of the data starts
  - putCompressedUnicode
```
public static void putCompressedUnicode(java.lang.String input,
                                        LittleEndianOutput out)
```
  - putUnicodeLE
```
public static void putUnicodeLE(java.lang.String input,
                                byte[] output,
                                int offset)
```
    Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, write uncompressed Unicode)
    
    Parameters:
    
    input - the String containing the unicode data to be written
    
    output - the byte array to hold the uncompressed unicode, should be twice the length of the String
    
    offset - the offset to start writing into the byte array
  - putUnicodeLE
```
public static void putUnicodeLE(java.lang.String input,
                                LittleEndianOutput out)
```
  - readUnicodeLE
```
public static java.lang.String readUnicodeLE(LittleEndianInput in,
                                             int nChars)
```
  - getPreferredEncoding
```
public static java.lang.String getPreferredEncoding()
```
    Returns:
    
    the encoding we want to use, currently hardcoded to ISO-8859-1
  - hasMultibyte
```
public static boolean hasMultibyte(java.lang.String value)
```
    Checks whether the parameter contains a multibyte character.
    
    Parameters:
    
    value - the string to check
    
    Returns:
    
    true if the string has at least one multibyte character
  - isUnicodeString
```
public static boolean isUnicodeString(java.lang.String value)
```
    Checks to see if a given String needs to be represented as Unicode
    
    Parameters:
    
    value - The string to look at.
    
    Returns:
    
    true if string needs Unicode to be represented.
  - startsWithIgnoreCase
```
public static boolean startsWithIgnoreCase(java.lang.String haystack,
                                           java.lang.String prefix)
```
    Tests if the string starts with the specified prefix, ignoring case consideration.
  - endsWithIgnoreCase
```
public static boolean endsWithIgnoreCase(java.lang.String haystack,
                                         java.lang.String suffix)
```
    Tests if the string ends with the specified suffix, ignoring case consideration.
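
    Both checks can be expressed with `String.regionMatches`, which compares a region of one string against another and accepts an ignore-case flag. This is a sketch of one way to get the described behavior, not a copy of POI's code; the class name `IgnoreCaseAffixSketch` is hypothetical and null handling is left out.

```java
// Hypothetical helper, not part of POI: case-insensitive prefix and suffix tests.
final class IgnoreCaseAffixSketch {
    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        // A negative start (suffix longer than haystack) makes regionMatches return false.
        int start = haystack.length() - suffix.length();
        return haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }
}
```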
  - toLowerCase
```
@Internal
public static java.lang.String toLowerCase(char c)
```
  - toUpperCase
```
@Internal
public static java.lang.String toUpperCase(char c)
```
  - isUpperCase
```
@Internal
public static boolean isUpperCase(char c)
```
  - mapMsCodepointString
```
public static java.lang.String mapMsCodepointString(java.lang.String string)
```
    Some strings may contain encoded characters of the unicode private use area. Currently the characters of the symbol fonts are mapped to the corresponding characters in the normal unicode range.
    
    Parameters:
    
    string - the original string
    
    Returns:
    
    the string with mapped characters
    
    See Also:
    
    Private Use Area (symbol), Symbol font - Unicode alternatives for Greek and special characters in HTML
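
    The idea of remapping private-use code points can be sketched with a lookup table applied per code point. Everything here is illustrative: the class name `PrivateUseMappingSketch` is hypothetical, and the single table entry (U+F061 to Greek small alpha, U+03B1) is a placeholder chosen for the example, not a statement about the contents of POI's table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of POI: table-driven code point substitution.
final class PrivateUseMappingSketch {
    private static final Map<Integer, Integer> TABLE = new HashMap<>();

    static {
        TABLE.put(0xF061, 0x03B1); // illustrative entry only
    }

    static String map(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // Code points without a table entry pass through unchanged.
        s.codePoints().forEach(cp -> sb.appendCodePoint(TABLE.getOrDefault(cp, cp)));
        return sb.toString();
    }
}
```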
  - mapMsCodepoint
```
public static void mapMsCodepoint(int msCodepoint,
                                  int unicodeCodepoint)
```
  - join
```
@Internal
public static java.lang.String join(java.lang.Object[] array,
                                    java.lang.String separator)
```
  - join
```
@Internal
public static java.lang.String join(java.lang.Object[] array)
```
  - join
```
@Internal
public static java.lang.String join(java.lang.String separator,
                                    java.lang.Object... array)
```
  - countMatches
```
public static int countMatches(java.lang.CharSequence haystack,
                               char needle)
```
    Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
    
    Parameters:
    
    haystack - the CharSequence to check, may be null
    
    needle - the character to count the quantity of
    
    Returns:
    
    the number of occurrences, 0 if the CharSequence is null
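
    A direct reading of this contract, including the null case, is a single loop. The class name `CountMatchesSketch` is hypothetical and the code is a sketch, not POI's source.

```java
// Hypothetical helper, not part of POI: counts a char in a possibly-null sequence.
final class CountMatchesSketch {
    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0; // documented behavior for a null CharSequence
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }
}
```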
  - getFromUnicodeLE0Terminated
```
public static java.lang.String getFromUnicodeLE0Terminated(byte[] string,
                                                           int offset,
                                                           int len)
                                                    throws java.lang.ArrayIndexOutOfBoundsException,
                                                           java.lang.IllegalArgumentException
```
    Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. Scans the byte array for two contiguous 0 bytes and returns the string before them.
    Bug #61881: some programs apparently also write the 0-termination at the beginning of the string. In that case, the method checks whether the next two bytes contain a valid ASCII char and corrects the _recdata with a '?' char.
    
    Parameters:
    
    string - the byte array to be converted
    
    offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
    
    len - the maximum length of the final string, in 16-bit characters
    
    Returns:
    
    the converted string, never null.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
    
    java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
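
    The terminator scan can be sketched as follows. This is a simplified illustration using only the JDK: it checks aligned 16-bit units for a 0x0000 terminator and deliberately omits the #61881 workaround for a leading terminator. The class name `ZeroTerminatedSketch` and the bounds checks are assumptions based on the documented exceptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: stops at the first 0x0000 code unit.
final class ZeroTerminatedSketch {
    static String decode(byte[] data, int offset, int len) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        int chars = 0;
        // Advance one 16-bit unit at a time until both bytes are zero or len is reached.
        while (chars < len
                && !(data[offset + 2 * chars] == 0 && data[offset + 2 * chars + 1] == 0)) {
            chars++;
        }
        return new String(data, offset, chars * 2, StandardCharsets.UTF_16LE);
    }
}
```

    With `{ 'H', 0, 'i', 0, 0, 0, 'x', 0 }`, offset 0 and len 4, the result is `"Hi"`: the trailing `x` after the terminator is ignored.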
