public final class ExtractorFactory
extends java.lang.Object
Note 1 - extraction will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.
Note 2 - in most cases you would be better off using Apache Tika instead of this class.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | CORE_DOCUMENT_REL |
Modifier and Type | Method and Description |
---|---|
static <T extends POITextExtractor> T | createExtractor(DirectoryNode poifsDir) |
static <T extends POITextExtractor> T | createExtractor(java.io.File f) |
static POITextExtractor | createExtractor(java.io.InputStream inp) |
static POITextExtractor | createExtractor(OPCPackage pkg) - Tries to determine the actual type of the file and produces a matching text extractor for it. |
static <T extends POITextExtractor> T | createExtractor(POIFSFileSystem fs) |
static java.lang.Boolean | getAllThreadsPreferEventExtractors() - Should all threads prefer event-based over usermodel-based extractors? Usermodel extractors tend to be more accurate, but use more memory. The default is to use the thread-level setting, which defaults to false. |
static POITextExtractor[] | getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext) - Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). |
static POITextExtractor[] | getEmbeddedDocsTextExtractors(POIXMLTextExtractor ext) - Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). |
static POITextExtractor[] | getEmbededDocsTextExtractors(POIOLE2TextExtractor ext) - Deprecated. Use the correctly spelled getEmbeddedDocsTextExtractors(POIOLE2TextExtractor) instead. |
static POITextExtractor[] | getEmbededDocsTextExtractors(POIXMLTextExtractor ext) - Deprecated. Use the correctly spelled getEmbeddedDocsTextExtractors(POIXMLTextExtractor) instead. |
static boolean | getPreferEventExtractor() - Should this thread use event-based extractors if available? Checks the all-threads setting first, then the thread-specific one. |
static boolean | getThreadPrefersEventExtractors() - Should this thread prefer event-based over usermodel-based extractors? Usermodel extractors tend to be more accurate, but use more memory. The default is false. |
static void | setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors) - Should all threads prefer event-based over usermodel-based extractors? If set, takes precedence over the thread-level setting. |
static void | setThreadPrefersEventExtractors(boolean preferEventExtractors) - Should this thread prefer event-based over usermodel-based extractors? Only used if the all-threads setting is null. |
public static final java.lang.String CORE_DOCUMENT_REL
public static boolean getThreadPrefersEventExtractors()
public static java.lang.Boolean getAllThreadsPreferEventExtractors()
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
public static boolean getPreferEventExtractor()
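The precedence between the two preference settings can be easier to follow in code. The following is a minimal sketch that only models the rules stated on this page (the all-threads value wins when it is set, otherwise the thread-level value applies, and the thread-level value defaults to false). The class EventExtractorPreference and its method names are hypothetical and are not POI's implementation.

```java
/**
 * Hypothetical model of the documented precedence rules.
 * Not POI code; it only mirrors the behaviour described on this page.
 */
final class EventExtractorPreference {
    // All-threads setting: null means "not set, defer to the thread-level setting".
    private static volatile Boolean allThreads = null;

    // Thread-level setting, defaulting to false as documented.
    private static final ThreadLocal<Boolean> perThread =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    static void setAllThreads(Boolean prefer) {
        allThreads = prefer;
    }

    static void setThread(boolean prefer) {
        perThread.set(prefer);
    }

    /** The all-threads value wins when non-null; otherwise this thread's own value is used. */
    static boolean effective() {
        Boolean global = allThreads;
        return global != null ? global : perThread.get();
    }

    public static void main(String[] args) {
        System.out.println(effective());   // false: nothing set yet
        setThread(true);
        System.out.println(effective());   // true: thread-level value applies
        setAllThreads(Boolean.FALSE);
        System.out.println(effective());   // false: all-threads value overrides
        setAllThreads(null);
        System.out.println(effective());   // true: back to the thread-level value
    }
}
```

In this model, passing null to setAllThreads clears the override, which corresponds to the java.lang.Boolean parameter type of setAllThreadsPreferEventExtractors.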
public static <T extends POITextExtractor> T createExtractor(java.io.File f) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Throws:
java.io.IOException
OpenXML4JException
org.apache.xmlbeans.XmlException
public static POITextExtractor createExtractor(java.io.InputStream inp) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Throws:
java.io.IOException
OpenXML4JException
org.apache.xmlbeans.XmlException
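A stream carries no file name, so choosing an extractor for it means recognising the container format from its content. The sketch below illustrates one common way to do that, by peeking at the leading magic bytes: OLE2 compound files begin with D0 CF 11 E0 A1 B1 1A E1, and ZIP-based OOXML packages begin with PK\003\004. The class ContainerSniffer, its Kind enum, and the sniff method are hypothetical names for this illustration. This page does not describe how POI itself performs the detection, so treat this as an assumption rather than POI's logic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: classifies a stream by its leading magic bytes. Not POI code. */
final class ContainerSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (legacy binary Office formats).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature; OOXML packages are ZIP archives.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    /** Peeks at the first bytes and resets the stream so it can still be read from the start. */
    static Kind sniff(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[OLE2_MAGIC.length];
        int n = 0;
        try {
            in.mark(head.length);
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break;          // stream shorter than the signature
                n += r;
            }
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(head, n, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, n, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    // True when the first 'len' valid bytes of 'data' begin with 'prefix'.
    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipLike = { 'P', 'K', 3, 4, 0, 0, 0, 0 };
        System.out.println(sniff(new ByteArrayInputStream(zipLike)));   // ZIP
    }
}
```

A stream that does not support mark/reset can be wrapped in a java.io.BufferedInputStream before it is passed to sniff.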
public static POITextExtractor createExtractor(OPCPackage pkg) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Parameters:
pkg - An OPCPackage.
Returns:
A POIXMLTextExtractor for the given file.
Throws:
java.io.IOException - If an error occurs while reading the file.
OpenXML4JException - If an error parsing the OpenXML file format is found.
org.apache.xmlbeans.XmlException - If an XML parsing error occurs.
java.lang.IllegalArgumentException - If no matching file type could be found.
public static <T extends POITextExtractor> T createExtractor(POIFSFileSystem fs) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Throws:
java.io.IOException
OpenXML4JException
org.apache.xmlbeans.XmlException
public static <T extends POITextExtractor> T createExtractor(DirectoryNode poifsDir) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Throws:
java.io.IOException
OpenXML4JException
org.apache.xmlbeans.XmlException
@Deprecated @Removal(version="4.2") public static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Returns:
A POITextExtractor for each embedded file.
Throws:
java.io.IOException
OpenXML4JException
org.apache.xmlbeans.XmlException
public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext) throws java.io.IOException, OpenXML4JException, org.apache.xmlbeans.XmlException
Returns:
A POITextExtractor for each embedded file.
Throws:
java.io.IOException
OpenXML4JException
org.apache.xmlbeans.XmlException
@Deprecated @Removal(version="4.2") @NotImplemented public static POITextExtractor[] getEmbededDocsTextExtractors(POIXMLTextExtractor ext)
Returns:
A POITextExtractor for each embedded file.
@NotImplemented public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIXMLTextExtractor ext)
Returns:
A POITextExtractor for each embedded file.
Copyright 2020 The Apache Software Foundation or its licensors, as applicable.