public final class POIXMLExtractorFactory extends java.lang.Object implements ExtractorProvider
Note 1 - extraction will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath
Note 2 - for most use cases, you would be better off switching to Apache Tika instead of using this class
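Note 1 can be checked up front rather than discovered at extraction time. The following is a minimal sketch, assuming that `org.apache.poi.hwpf.extractor.WordExtractor` (a class shipped in the Scratchpad jar) is a reasonable marker for the jar's presence; the helper class `ScratchpadCheck` and its methods are hypothetical and not part of POI.

```java
final class ScratchpadCheck {
    // Assumption: this Scratchpad class is used as a marker for the whole jar.
    static final String SCRATCHPAD_MARKER = "org.apache.poi.hwpf.extractor.WordExtractor";

    private ScratchpadCheck() {}

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ScratchpadCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Convenience wrapper for the marker class above.
    static boolean isScratchpadPresent() {
        return isClassPresent(SCRATCHPAD_MARKER);
    }

    public static void main(String[] args) {
        System.out.println("Scratchpad on classpath: " + isScratchpadPresent());
    }
}
```

Calling `ScratchpadCheck.isScratchpadPresent()` before creating an extractor lets an application report a missing dependency clearly.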
Constructor and Description |
---|
POIXMLExtractorFactory() |
Modifier and Type | Method and Description |
---|---|
boolean | accepts(FileMagic fm) |
POITextExtractor | create(DirectoryNode poifsDir, java.lang.String password) Create Extractor from POIFS node |
POITextExtractor | create(java.io.File f, java.lang.String password) Create Extractor via file |
POITextExtractor | create(java.io.InputStream inp, java.lang.String password) Create Extractor via InputStream |
POIXMLTextExtractor | create(OPCPackage pkg) Tries to determine the actual type of file and produces a matching text-extractor for it. |
POITextExtractor | create(POIFSFileSystem fs) |
static java.lang.Boolean | getAllThreadsPreferEventExtractors() Should all threads prefer event-based over usermodel-based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false. |
static boolean | getPreferEventExtractor() Should this thread use event-based extractors if available? Checks the all-threads one first, then thread specific. |
static boolean | getThreadPrefersEventExtractors() Should this thread prefer event-based over usermodel-based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false. |
static void | setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors) Should all threads prefer event-based over usermodel-based extractors? If set, will take precedence over the Thread level setting. |
static void | setThreadPrefersEventExtractors(boolean preferEventExtractors) Should this thread prefer event-based over usermodel-based extractors? Will only be used if the All Threads setting is null. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ExtractorProvider: identifyEmbeddedResources
public boolean accepts(FileMagic fm)
Specified by: accepts in interface ExtractorProvider
public static boolean getThreadPrefersEventExtractors()
public static java.lang.Boolean getAllThreadsPreferEventExtractors()
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
public static boolean getPreferEventExtractor()
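The precedence between the all-threads setting and the thread-level setting can be modelled as follows. This is a minimal, self-contained sketch and not POI's actual source: the class name `EventExtractorPreference` and its fields are hypothetical, and only the method names mirror the ones listed on this page. It assumes the all-threads value is a nullable `Boolean` where `null` means "not set", as the summary describes.

```java
final class EventExtractorPreference {
    // Thread-level setting; each thread starts with false.
    private static final ThreadLocal<Boolean> THREAD_PREFERS =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // All-threads setting; null means "not set, defer to the thread-level value".
    private static volatile Boolean allThreadsPrefer = null;

    private EventExtractorPreference() {}

    static boolean getThreadPrefersEventExtractors() {
        return THREAD_PREFERS.get();
    }

    static Boolean getAllThreadsPreferEventExtractors() {
        return allThreadsPrefer;
    }

    static void setThreadPrefersEventExtractors(boolean prefer) {
        THREAD_PREFERS.set(prefer);
    }

    static void setAllThreadsPreferEventExtractors(Boolean prefer) {
        allThreadsPrefer = prefer;
    }

    // Checks the all-threads value first; falls back to the thread-level value.
    static boolean getPreferEventExtractor() {
        Boolean all = allThreadsPrefer;
        return (all != null) ? all : THREAD_PREFERS.get();
    }

    public static void main(String[] args) {
        setThreadPrefersEventExtractors(true);
        System.out.println(getPreferEventExtractor());

        setAllThreadsPreferEventExtractors(Boolean.FALSE);
        System.out.println(getPreferEventExtractor());
    }
}
```

In this model, a non-null all-threads value always wins, so a thread that asked for event-based extractors still gets `false` once the all-threads value is set to `Boolean.FALSE`. Setting it back to `null` restores the per-thread behaviour.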
public POITextExtractor create(java.io.File f, java.lang.String password) throws java.io.IOException
Description copied from interface: ExtractorProvider
Create Extractor via file
Specified by: create in interface ExtractorProvider
Parameters:
f - the file
password - the password or null if not encrypted
Throws:
java.io.IOException - if file can't be read or parsed
public POITextExtractor create(java.io.InputStream inp, java.lang.String password) throws java.io.IOException
Description copied from interface: ExtractorProvider
Create Extractor via InputStream
Specified by: create in interface ExtractorProvider
Parameters:
inp - the stream
password - the password or null if not encrypted
Throws:
java.io.IOException - if stream can't be read or parsed
public POIXMLTextExtractor create(OPCPackage pkg) throws java.io.IOException
Parameters:
pkg - An OPCPackage.
Returns:
A POIXMLTextExtractor for the given file.
Throws:
java.io.IOException - If an error occurs while reading the file
java.lang.IllegalArgumentException - If no matching file type could be found.
public POITextExtractor create(POIFSFileSystem fs) throws java.io.IOException
Throws:
java.io.IOException
public POITextExtractor create(DirectoryNode poifsDir, java.lang.String password) throws java.io.IOException
Description copied from interface: ExtractorProvider
Create Extractor from POIFS node
Specified by: create in interface ExtractorProvider
Parameters:
poifsDir - the node
password - the password or null if not encrypted
Throws:
java.io.IOException - if node can't be parsed
Copyright 2021 The Apache Software Foundation or its licensors, as applicable.