public final class ExtractorFactory
extends java.lang.Object
Note 1 - will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.
Note 2 - for text extractor creation across all formats, use POIXMLExtractorFactory contained within the OOXML jar.
Note 3 - rather than using this, for most cases you would be better off switching to Apache Tika instead!
Modifier and Type | Field and Description |
---|---|
static java.lang.String | OOXML_PACKAGE: Some OPCPackages are packed inside an OLE2 container. |
Modifier and Type | Method and Description |
---|---|
static void | addProvider(ExtractorProvider provider) |
static POITextExtractor | createExtractor(DirectoryNode root): Create the Extractor, if possible. |
static POITextExtractor | createExtractor(DirectoryNode root, java.lang.String password): Create the Extractor, if possible. |
static POITextExtractor | createExtractor(java.io.File file): Create an extractor that can be used to read text from the given file. |
static POITextExtractor | createExtractor(java.io.File file, java.lang.String password): Create an extractor that can be used to read text from the given file. |
static POITextExtractor | createExtractor(java.io.InputStream input): Create an extractor that can be used to read text from the given file. |
static POITextExtractor | createExtractor(java.io.InputStream input, java.lang.String password): Create an extractor that can be used to read text from the given file. |
static POITextExtractor | createExtractor(POIFSFileSystem fs): Create an extractor that can be used to read text from the given file. |
static POITextExtractor | createExtractor(POIFSFileSystem fs, java.lang.String password): Create an extractor that can be used to read text from the given file. |
static java.lang.Boolean | getAllThreadsPreferEventExtractors(): Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false. |
static POITextExtractor[] | getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext): Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). |
static boolean | getPreferEventExtractor(): Should this thread use event based extractors if available? Checks the all-threads one first, then thread specific. |
static boolean | getThreadPrefersEventExtractors(): Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false. |
static void | removeProvider(java.lang.Class<? extends ExtractorProvider> provider) |
static void | setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors): Should all threads prefer event based over usermodel based extractors? If set, will take precedence over the thread level setting. |
static void | setThreadPrefersEventExtractors(boolean preferEventExtractors): Should this thread prefer event based over usermodel based extractors? Will only be used if the all-threads setting is null. |
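As a minimal usage sketch of the createExtractor(java.io.File ...) overloads listed above: it assumes a POI 5.x release in which ExtractorFactory and POITextExtractor live in the org.apache.poi.extractor package, and that the POI jars needed for the file format are on the classpath. The class name ExtractTextExample and the helper extractText are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.extractor.POITextExtractor;

// Hypothetical example class; not part of POI.
public class ExtractTextExample {

    /**
     * Returns the text of the given document. A null password selects
     * the single-argument overload, intended for unprotected files.
     */
    public static String extractText(File file, String password) throws IOException {
        // POITextExtractor is Closeable, so try-with-resources releases it.
        try (POITextExtractor extractor = (password == null)
                ? ExtractorFactory.createExtractor(file)
                : ExtractorFactory.createExtractor(file, password)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ExtractTextExample <file> [password]");
            return;
        }
        String password = args.length > 1 ? args[1] : null;
        System.out.println(extractText(new File(args[0]), password));
    }
}
```

Per Note 1, formats handled by the Scratchpad jar will fail here if that jar is missing, so a caller would typically catch java.io.IOException and java.lang.IllegalArgumentException around extractText.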
public static final java.lang.String OOXML_PACKAGE
Some OPCPackages are packed inside an OLE2 container. If encrypted, the DirectoryNode is called "EncryptedPackage", otherwise the node is called "Package".

public static boolean getThreadPrefersEventExtractors()

public static java.lang.Boolean getAllThreadsPreferEventExtractors()

public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
Parameters:
preferEventExtractors - If this thread should prefer event based extractors.

public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
Parameters:
preferEventExtractors - If all threads should prefer event based extractors.

public static boolean getPreferEventExtractor()
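The precedence that getPreferEventExtractor() is documented to apply (the all-threads setting first, then the thread-specific one) can be sketched in plain Java. This models only the documented behaviour and is not POI's implementation; the class PreferenceSketch and all of its members are hypothetical stand-ins.

```java
// Hypothetical model of the documented precedence; not POI code.
public class PreferenceSketch {

    // null means "not set", so the per-thread value decides.
    private static volatile Boolean allThreads = null;

    // Per-thread flag; the documented default is false.
    private static final ThreadLocal<Boolean> perThread =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    public static void setAllThreads(Boolean value) {
        allThreads = value;
    }

    public static void setThread(boolean value) {
        perThread.set(value);
    }

    /** The all-threads value wins when it is non-null. */
    public static boolean effective() {
        Boolean all = allThreads;
        return (all != null) ? all : perThread.get();
    }

    public static void main(String[] args) {
        System.out.println(effective());   // false: nothing set yet
        setThread(true);
        System.out.println(effective());   // true: thread setting applies
        setAllThreads(Boolean.FALSE);
        System.out.println(effective());   // false: all-threads overrides
        setAllThreads(null);
        System.out.println(effective());   // true: back to thread setting
    }
}
```

Using a nullable Boolean for the global value is what lets "unset" be distinguished from an explicit false, which matches the java.lang.Boolean type of the all-threads getter and setter.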
public static POITextExtractor createExtractor(POIFSFileSystem fs) throws java.io.IOException
Parameters:
fs - The file-system which wraps the data of the file.
Throws:
java.io.IOException - If reading the file-data fails

public static POITextExtractor createExtractor(POIFSFileSystem fs, java.lang.String password) throws java.io.IOException
Parameters:
fs - The file-system which wraps the data of the file.
password - The password that is necessary to open the file
Throws:
java.io.IOException - If reading the file-data fails

public static POITextExtractor createExtractor(java.io.InputStream input) throws java.io.IOException
Parameters:
input - A stream which wraps the data of the file.
Throws:
java.io.IOException - If reading the file-data fails
EmptyFileException - If the given file is empty

public static POITextExtractor createExtractor(java.io.InputStream input, java.lang.String password) throws java.io.IOException
Parameters:
input - A stream which wraps the data of the file.
password - The password that is necessary to open the file
Throws:
java.io.IOException - If reading the file-data fails
EmptyFileException - If the given file is empty

public static POITextExtractor createExtractor(java.io.File file) throws java.io.IOException
Parameters:
file - The file to read
Throws:
java.io.IOException - If reading the file-data fails
EmptyFileException - If the given file is empty

public static POITextExtractor createExtractor(java.io.File file, java.lang.String password) throws java.io.IOException
Parameters:
file - The file to read
password - The password that is necessary to open the file
Throws:
java.io.IOException - If reading the file-data fails
EmptyFileException - If the given file is empty

public static POITextExtractor createExtractor(DirectoryNode root) throws java.io.IOException
Create the Extractor, if possible. Generally needs the Scratchpad jar. Note that this won't check for embedded OOXML resources either, use POIXMLExtractorFactory for that.
Parameters:
root - The DirectoryNode pointing to a document.
Returns:
The resulting POITextExtractor; an exception is thrown if no TextExtractor can be created for some reason.
Throws:
java.io.IOException - If converting the DirectoryNode into a HSSFWorkbook fails
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
java.lang.IllegalArgumentException - If creating the Extractor fails

public static POITextExtractor createExtractor(DirectoryNode root, java.lang.String password) throws java.io.IOException
Create the Extractor, if possible. Generally needs the Scratchpad jar. Note that this won't check for embedded OOXML resources either, use POIXMLExtractorFactory for that.
Parameters:
root - The DirectoryNode pointing to a document.
password - The password that is necessary to open the file
Returns:
The resulting POITextExtractor; an exception is thrown if no TextExtractor can be created for some reason.
Throws:
java.io.IOException - If converting the DirectoryNode into a HSSFWorkbook fails
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
java.lang.IllegalArgumentException - If creating the Extractor fails

public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext) throws java.io.IOException
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
Parameters:
ext - The extractor to look at for embedded documents
Throws:
java.io.IOException - If converting the DirectoryNode into a HSSFWorkbook fails
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
java.lang.IllegalArgumentException - If creating the Extractor fails

public static void addProvider(ExtractorProvider provider)

public static void removeProvider(java.lang.Class<? extends ExtractorProvider> provider)
Copyright 2022 The Apache Software Foundation or its licensors, as applicable.