java.lang.Object
  org.apache.poi.util.CodePageUtil
public class CodePageUtil
Utilities for working with Microsoft codepages.
Provides constants for interpreting numeric codepage identifiers, along with utilities to translate them into Java character sets.
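As a rough illustration of the translation this class provides, the sketch below maps a handful of numeric codepages to `java.nio.charset.Charset` objects using only the JDK. `CodepageLookupSketch` and its table are hypothetical and cover just six of Microsoft's standard codepage identifiers; POI's own mapping is much larger and is not reproduced here.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Apache POI: maps a few Microsoft
// codepage identifiers to Java charsets to illustrate the idea.
final class CodepageLookupSketch {
    // A deliberately small, assumed subset of Microsoft's codepage identifiers.
    private static final Map<Integer, String> NAMES = Map.of(
            1200, "UTF-16LE",
            1201, "UTF-16BE",
            1252, "windows-1252",
            20127, "US-ASCII",
            28591, "ISO-8859-1",
            65001, "UTF-8");

    private CodepageLookupSketch() {}

    /** Returns the Java charset for a codepage, or empty if unmapped or unsupported by this JVM. */
    static Optional<Charset> toCharset(int codepage) {
        String name = NAMES.get(codepage);
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }
}
```

Returning an empty `Optional` rather than guessing a default leaves the caller to decide what an unknown codepage means; that is a design choice of this sketch only.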
Field Summary
Modifier and Type | Field | Description |
---|---|---|
static int | CP_037 | Codepage 037, a special case |
static int | CP_EUC_JP | Codepage for EUC-JP |
static int | CP_EUC_KR | Codepage for EUC-KR |
static int | CP_GB18030 | Codepage for GB18030 |
static int | CP_GB2312 | Codepage for GB2312 |
static int | CP_GBK | Codepage for GBK, aka MS936 |
static int | CP_ISO_2022_JP1 | Codepage for ISO-2022-JP |
static int | CP_ISO_2022_JP2 | Another codepage for ISO-2022-JP |
static int | CP_ISO_2022_JP3 | Yet another codepage for ISO-2022-JP |
static int | CP_ISO_2022_KR | Codepage for ISO-2022-KR |
static int | CP_ISO_8859_1 | Codepage for ISO-8859-1 |
static int | CP_ISO_8859_2 | Codepage for ISO-8859-2 |
static int | CP_ISO_8859_3 | Codepage for ISO-8859-3 |
static int | CP_ISO_8859_4 | Codepage for ISO-8859-4 |
static int | CP_ISO_8859_5 | Codepage for ISO-8859-5 |
static int | CP_ISO_8859_6 | Codepage for ISO-8859-6 |
static int | CP_ISO_8859_7 | Codepage for ISO-8859-7 |
static int | CP_ISO_8859_8 | Codepage for ISO-8859-8 |
static int | CP_ISO_8859_9 | Codepage for ISO-8859-9 |
static int | CP_JOHAB | Codepage for Johab |
static int | CP_KOI8_R | Codepage for KOI8-R |
static int | CP_MAC_ARABIC | Codepage for Macintosh Arabic (Java: MacArabic) |
static int | CP_MAC_CENTRAL_EUROPE | Codepage for Macintosh Central Europe (Latin-2) (Java: MacCentralEurope) |
static int | CP_MAC_CHINESE_SIMPLE | Codepage for Macintosh Chinese Simplified (Java: unknown - use EUC_CN, ISO2022_CN_GB, MS936 or cp935) |
static int | CP_MAC_CHINESE_TRADITIONAL | Codepage for Macintosh Chinese Traditional (Java: unknown - use Big5, MS950, or cp937) |
static int | CP_MAC_CROATIAN | Codepage for Macintosh Croatian (Java: MacCroatian) |
static int | CP_MAC_CYRILLIC | Codepage for Macintosh Cyrillic (Java: MacCyrillic) |
static int | CP_MAC_GREEK | Codepage for Macintosh Greek (Java: MacGreek) |
static int | CP_MAC_HEBREW | Codepage for Macintosh Hebrew (Java: MacHebrew) |
static int | CP_MAC_ICELAND | Codepage for Macintosh Iceland (Java: MacIceland) |
static int | CP_MAC_JAPAN | Codepage for Macintosh Japan (Java: unknown - use SJIS, cp942 or cp943) |
static int | CP_MAC_KOREAN | Codepage for Macintosh Korean (Java: unknown - use EUC_KR or cp949) |
static int | CP_MAC_ROMAN | Codepage for Macintosh Roman (Java: MacRoman) |
static int | CP_MAC_ROMAN_BIFF23 | |
static int | CP_MAC_ROMANIA | Codepage for Macintosh Romanian (Java: MacRomania) |
static int | CP_MAC_THAI | Codepage for Macintosh Thai (Java: MacThai) |
static int | CP_MAC_TURKISH | Codepage for Macintosh Turkish (Java: MacTurkish) |
static int | CP_MAC_UKRAINE | Codepage for Macintosh Ukrainian (Java: MacUkraine) |
static int | CP_MS949 | Codepage for MS949 |
static int | CP_SJIS | Codepage for SJIS |
static int | CP_UNICODE | Codepage for Unicode |
static int | CP_US_ACSII | Codepage for US-ASCII |
static int | CP_US_ASCII2 | Another codepage for US-ASCII |
static int | CP_UTF16 | Codepage for UTF-16 |
static int | CP_UTF16_BE | Codepage for UTF-16 big-endian |
static int | CP_UTF8 | Codepage for UTF-8 |
static int | CP_WINDOWS_1250 | Codepage for Windows 1250 |
static int | CP_WINDOWS_1251 | Codepage for Windows 1251 |
static int | CP_WINDOWS_1252 | Codepage for Windows 1252 |
static int | CP_WINDOWS_1252_BIFF23 | |
static int | CP_WINDOWS_1253 | Codepage for Windows 1253 |
static int | CP_WINDOWS_1254 | Codepage for Windows 1254 |
static int | CP_WINDOWS_1255 | Codepage for Windows 1255 |
static int | CP_WINDOWS_1256 | Codepage for Windows 1256 |
static int | CP_WINDOWS_1257 | Codepage for Windows 1257 |
static int | CP_WINDOWS_1258 | Codepage for Windows 1258 |
static java.util.Set<java.nio.charset.Charset> | DOUBLE_BYTE_CHARSETS | |
Constructor Summary
Constructor | Description |
---|---|
CodePageUtil() | |
Method Summary
Modifier and Type | Method | Description |
---|---|---|
static java.lang.String | codepageToEncoding(int codepage) | Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format). |
static java.lang.String | codepageToEncoding(int codepage, boolean javaLangFormat) | Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming. |
static java.lang.String | cp950ToString(byte[] data, int offset, int lengthInBytes) | This tries to convert a LE byte array in cp950 (Microsoft's dialect of Big5) to a String. |
static byte[] | getBytesInCodePage(java.lang.String string, int codepage) | Converts a string into bytes, using the character encoding equivalent to the supplied codepage number. |
static java.lang.String | getStringFromCodePage(byte[] string, int codepage) | Converts the bytes into a String, using the character encoding equivalent to the supplied codepage number. |
static java.lang.String | getStringFromCodePage(byte[] string, int offset, int length, int codepage) | Converts a range of the bytes into a String, using the character encoding equivalent to the supplied codepage number. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
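The `javaLangFormat` flag of `codepageToEncoding` chooses between two naming schemes the JDK documents for each encoding: the java.nio canonical name (for example `windows-1252`) and the older java.io/java.lang canonical name (for example `Cp1252`). The hypothetical sketch below illustrates the distinction for three codepages and mirrors the documented rejection of negative codepage numbers; its table is an assumption, not POI's mapping.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical sketch, not POI's code: returns either the java.nio canonical
// name or the java.io/java.lang canonical name for a few assumed codepages.
final class EncodingNameSketch {
    private EncodingNameSketch() {}

    static String codepageToEncodingName(int codepage, boolean javaLangFormat)
            throws UnsupportedEncodingException {
        if (codepage < 0) {
            throw new UnsupportedEncodingException("Codepage number may not be " + codepage);
        }
        switch (codepage) {
            case 1252:
                return javaLangFormat ? "Cp1252" : "windows-1252";
            case 28591:
                return javaLangFormat ? "ISO8859_1" : "ISO-8859-1";
            case 65001:
                return javaLangFormat ? "UTF8" : "UTF-8";
            default:
                // This sketch only knows three codepages.
                throw new UnsupportedEncodingException("No mapping for codepage " + codepage);
        }
    }
}
```

Both spellings resolve to the same `Charset` through `Charset.forName`, so the choice mainly matters when the name is stored or compared as a string.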
Field Detail |
---|
public static final java.util.Set<java.nio.charset.Charset> DOUBLE_BYTE_CHARSETS
public static final int CP_037
Codepage 037, a special case
public static final int CP_SJIS
Codepage for SJIS
public static final int CP_GBK
Codepage for GBK, aka MS936
public static final int CP_MS949
Codepage for MS949
public static final int CP_UTF16
Codepage for UTF-16
public static final int CP_UTF16_BE
Codepage for UTF-16 big-endian
public static final int CP_WINDOWS_1250
Codepage for Windows 1250
public static final int CP_WINDOWS_1251
Codepage for Windows 1251
public static final int CP_WINDOWS_1252
Codepage for Windows 1252
public static final int CP_WINDOWS_1252_BIFF23
public static final int CP_WINDOWS_1253
Codepage for Windows 1253
public static final int CP_WINDOWS_1254
Codepage for Windows 1254
public static final int CP_WINDOWS_1255
Codepage for Windows 1255
public static final int CP_WINDOWS_1256
Codepage for Windows 1256
public static final int CP_WINDOWS_1257
Codepage for Windows 1257
public static final int CP_WINDOWS_1258
Codepage for Windows 1258
public static final int CP_JOHAB
Codepage for Johab
public static final int CP_MAC_ROMAN
Codepage for Macintosh Roman (Java: MacRoman)
public static final int CP_MAC_ROMAN_BIFF23
public static final int CP_MAC_JAPAN
Codepage for Macintosh Japan (Java: unknown - use SJIS, cp942 or cp943)
public static final int CP_MAC_CHINESE_TRADITIONAL
Codepage for Macintosh Chinese Traditional (Java: unknown - use Big5, MS950, or cp937)
public static final int CP_MAC_KOREAN
Codepage for Macintosh Korean (Java: unknown - use EUC_KR or cp949)
public static final int CP_MAC_ARABIC
Codepage for Macintosh Arabic (Java: MacArabic)
public static final int CP_MAC_HEBREW
Codepage for Macintosh Hebrew (Java: MacHebrew)
public static final int CP_MAC_GREEK
Codepage for Macintosh Greek (Java: MacGreek)
public static final int CP_MAC_CYRILLIC
Codepage for Macintosh Cyrillic (Java: MacCyrillic)
public static final int CP_MAC_CHINESE_SIMPLE
Codepage for Macintosh Chinese Simplified (Java: unknown - use EUC_CN, ISO2022_CN_GB, MS936 or cp935)
public static final int CP_MAC_ROMANIA
Codepage for Macintosh Romanian (Java: MacRomania)
public static final int CP_MAC_UKRAINE
Codepage for Macintosh Ukrainian (Java: MacUkraine)
public static final int CP_MAC_THAI
Codepage for Macintosh Thai (Java: MacThai)
public static final int CP_MAC_CENTRAL_EUROPE
Codepage for Macintosh Central Europe (Latin-2) (Java: MacCentralEurope)
public static final int CP_MAC_ICELAND
Codepage for Macintosh Iceland (Java: MacIceland)
public static final int CP_MAC_TURKISH
Codepage for Macintosh Turkish (Java: MacTurkish)
public static final int CP_MAC_CROATIAN
Codepage for Macintosh Croatian (Java: MacCroatian)
public static final int CP_US_ACSII
Codepage for US-ASCII
public static final int CP_KOI8_R
Codepage for KOI8-R
public static final int CP_ISO_8859_1
Codepage for ISO-8859-1
public static final int CP_ISO_8859_2
Codepage for ISO-8859-2
public static final int CP_ISO_8859_3
Codepage for ISO-8859-3
public static final int CP_ISO_8859_4
Codepage for ISO-8859-4
public static final int CP_ISO_8859_5
Codepage for ISO-8859-5
public static final int CP_ISO_8859_6
Codepage for ISO-8859-6
public static final int CP_ISO_8859_7
Codepage for ISO-8859-7
public static final int CP_ISO_8859_8
Codepage for ISO-8859-8
public static final int CP_ISO_8859_9
Codepage for ISO-8859-9
public static final int CP_ISO_2022_JP1
Codepage for ISO-2022-JP
public static final int CP_ISO_2022_JP2
Another codepage for ISO-2022-JP
public static final int CP_ISO_2022_JP3
Yet another codepage for ISO-2022-JP
public static final int CP_ISO_2022_KR
Codepage for ISO-2022-KR
public static final int CP_EUC_JP
Codepage for EUC-JP
public static final int CP_EUC_KR
Codepage for EUC-KR
public static final int CP_GB2312
Codepage for GB2312
public static final int CP_GB18030
Codepage for GB18030
public static final int CP_US_ASCII2
Another codepage for US-ASCII
public static final int CP_UTF8
Codepage for UTF-8
public static final int CP_UNICODE
Codepage for Unicode
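Microsoft assigns codepage 1200 to little-endian UTF-16 and codepage 1201 to big-endian UTF-16. Assuming `CP_UTF16` and `CP_UTF16_BE` follow that convention (their numeric values are not listed on this page), the practical difference is byte order, shown here with the JDK's explicit-endian charsets in a hypothetical `Utf16ByteOrderSketch` class.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: the same text in both UTF-16 byte orders.
final class Utf16ByteOrderSketch {
    private Utf16ByteOrderSketch() {}

    /** Encodes as UTF-16LE, without a byte-order mark. */
    static byte[] littleEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    /** Encodes as UTF-16BE, without a byte-order mark. */
    static byte[] bigEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    /**
     * Decodes little-endian data explicitly; Java's plain "UTF-16" charset
     * would assume big-endian when no byte-order mark is present.
     */
    static String fromLittleEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }
}
```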
Constructor Detail |
---|
public CodePageUtil()
Method Detail |
---|
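public static byte[] getBytesInCodePage(java.lang.String string, int codepage) throws java.io.UnsupportedEncodingException
Converts a string into bytes, using the character encoding equivalent to the supplied codepage number.
Parameters:
string - The string to convert
codepage - The codepage number
Throws:
java.io.UnsupportedEncodingException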
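A JDK-only sketch of the same idea as `getBytesInCodePage`: resolve the codepage to a Java encoding name, then call `String.getBytes(String)`. `EncodeSketch` and its two-entry table are hypothetical; unknown codepages raise `UnsupportedEncodingException`, the exception this method declares.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical sketch, not POI's implementation.
final class EncodeSketch {
    private EncodeSketch() {}

    static byte[] bytesInCodepage(String text, int codepage)
            throws UnsupportedEncodingException {
        String encoding;
        switch (codepage) {
            case 1252:
                encoding = "windows-1252"; // Windows Western European
                break;
            case 65001:
                encoding = "UTF-8";
                break;
            default:
                throw new UnsupportedEncodingException("No mapping for codepage " + codepage);
        }
        // String.getBytes(String) also throws UnsupportedEncodingException for unknown names.
        return text.getBytes(encoding);
    }
}
```

Under codepage 1252, "café" becomes four bytes (é is 0xE9); under 65001 (UTF-8) it becomes five, because é takes two bytes.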
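public static java.lang.String getStringFromCodePage(byte[] string, int codepage) throws java.io.UnsupportedEncodingException
Converts the bytes into a String, using the character encoding equivalent to the supplied codepage number.
Parameters:
string - The bytes of the string to convert
codepage - The codepage number
Throws:
java.io.UnsupportedEncodingException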
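Decoding is where choosing the right codepage matters most, because the same byte can mean different characters. In this hypothetical sketch (`DecodeSketch` is not POI's code), byte 0x80 decodes to the euro sign under Windows codepage 1252 but to the C1 control character U+0080 under ISO-8859-1 (Microsoft codepage 28591).

```java
import java.io.UnsupportedEncodingException;

// Hypothetical sketch, not POI's implementation: decodes bytes using the Java
// encoding that a tiny, assumed table associates with a codepage.
final class DecodeSketch {
    private DecodeSketch() {}

    static String stringFromCodepage(byte[] bytes, int codepage)
            throws UnsupportedEncodingException {
        String encoding;
        switch (codepage) {
            case 1252:
                encoding = "windows-1252"; // Windows Western European
                break;
            case 28591:
                encoding = "ISO-8859-1";   // Latin-1
                break;
            default:
                throw new UnsupportedEncodingException("No mapping for codepage " + codepage);
        }
        // new String(byte[], String) throws UnsupportedEncodingException for unknown names.
        return new String(bytes, encoding);
    }
}
```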
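public static java.lang.String getStringFromCodePage(byte[] string, int offset, int length, int codepage) throws java.io.UnsupportedEncodingException
Converts a range of the bytes into a String, using the character encoding equivalent to the supplied codepage number.
Parameters:
string - The bytes of the string to convert
offset - The index of the first byte to convert
length - The number of bytes to convert
codepage - The codepage number
Throws:
java.io.UnsupportedEncodingException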
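The offset/length overload decodes only part of a buffer, which is what a reader needs when a string sits inside a larger binary record. The sketch uses the JDK constructor `new String(byte[], int, int, Charset)`; `SliceDecodeSketch` and its one-byte length prefix are a hypothetical record layout chosen purely for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: decodes a length-prefixed string embedded in a record.
final class SliceDecodeSketch {
    private SliceDecodeSketch() {}

    /** Reads one unsigned length byte at {@code offset}, then decodes that many bytes. */
    static String readPrefixedString(byte[] record, int offset, Charset charset) {
        int length = record[offset] & 0xFF; // treat the prefix as unsigned
        // Only bytes [offset + 1, offset + 1 + length) are decoded; the rest of
        // the record is left untouched. Out-of-range values throw IndexOutOfBoundsException.
        return new String(record, offset + 1, length, charset);
    }
}
```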
public static java.lang.String codepageToEncoding(int codepage) throws java.io.UnsupportedEncodingException
Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format).
Parameters:
codepage - The codepage number
Throws:
java.io.UnsupportedEncodingException - if the specified codepage is less than zero.
public static java.lang.String codepageToEncoding(int codepage, boolean javaLangFormat) throws java.io.UnsupportedEncodingException
Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming.
Parameters:
codepage - The codepage number
javaLangFormat - If true, use Java Lang naming; otherwise use Java NIO naming
Throws:
java.io.UnsupportedEncodingException - if the specified codepage is less than zero.
public static java.lang.String cp950ToString(byte[] data, int offset, int lengthInBytes)
This tries to convert a LE byte array in cp950 (Microsoft's dialect of Big5) to a String.
Parameters:
data - The bytes to convert
offset - The index of the first byte to convert
lengthInBytes - The number of bytes to convert
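For cp950 data, the JDK can decode Microsoft's Big5 dialect directly when its optional `x-windows-950` charset is installed. The sketch below is a hypothetical approximation of `cp950ToString` under that assumption: it falls back to the JDK's `Big5` charset and replaces malformed bytes with U+FFFD, and it does not reproduce whatever cp950-specific handling POI itself performs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical approximation, not POI's algorithm.
final class Cp950Sketch {
    private Cp950Sketch() {}

    /** Prefers the JDK's cp950 charset, which lives in an optional module, else plain Big5. */
    static Charset cp950OrBig5() {
        return Charset.isSupported("x-windows-950")
                ? Charset.forName("x-windows-950")
                : Charset.forName("Big5");
    }

    static String decode(byte[] data, int offset, int lengthInBytes) {
        CharsetDecoder decoder = cp950OrBig5().newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(data, offset, lengthInBytes));
            return chars.toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the API declares it.
            throw new IllegalStateException(e);
        }
    }
}
```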