
Apache POI™ - Configuration

Overview

The best way to learn about using Apache POI is to read through the feature documentation and other examples online.

To keep the feature documentation focused on the APIs, it says little about configuration settings that may prove useful to users who need to handle very large documents or very high throughput.

Configuration via Java code when calling Apache POI

These API methods let you configure the behavior of Apache POI for special needs, e.g. when processing very large files.

Configuration setting / Description

org.apache.poi.ooxml.POIXMLTypeLoader.DEFAULT_XML_OPTIONS
POI's support for the XSSF APIs relies heavily on XMLBeans, and this XmlOptions instance can be configured. Take care if you change any of its settings. Starting with POI 5.1.0, Doc Type parsing is disallowed by default in the XML files embedded in xlsx/docx/pptx/etc. files; calling DEFAULT_XML_OPTIONS.setDisallowDocTypeDeclaration(false) undoes this change.

org.apache.poi.util.IOUtils.setByteArrayMaxOverride(int maxOverride)
If this value is set to more than 0, IOUtils.safelyAllocate(long, int) ignores its maximum-record-length parameter and applies this value instead. This lets users bypass the hard-coded maximum record lengths if they are willing to accept the risk of allocating memory up to the specified size. It can also impose a lower limit than the default on systems with very little memory.

Note: This is a per-allocation limit; it cannot limit the total of all allocations! Use -1 to apply the limits specified per record type.
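
To make the per-allocation semantics concrete, here is a simplified model of the rule described above. AllocationLimitModel is a hypothetical class, not POI's implementation: it assumes a positive override fully replaces the per-record maximum, and it throws IllegalArgumentException where POI raises its own exception type.

```java
// Hypothetical, simplified model of the rule described for
// IOUtils.setByteArrayMaxOverride(int); this is NOT POI's source code.
final class AllocationLimitModel {
    /** Modelled override value; -1 (or any value <= 0) means "use per-record limits". */
    private final int byteArrayMaxOverride;

    AllocationLimitModel(int byteArrayMaxOverride) {
        this.byteArrayMaxOverride = byteArrayMaxOverride;
    }

    /** Limit for a single allocation: the override if > 0, else the per-record maximum. */
    int effectiveLimit(int maxRecordLength) {
        return byteArrayMaxOverride > 0 ? byteArrayMaxOverride : maxRecordLength;
    }

    /** Each request is checked on its own; earlier allocations are not counted. */
    byte[] allocate(long length, int maxRecordLength) {
        int limit = effectiveLimit(maxRecordLength);
        if (length < 0 || length > limit) {
            throw new IllegalArgumentException(
                    "Tried to allocate " + length + " bytes, limit is " + limit);
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        AllocationLimitModel perRecord = new AllocationLimitModel(-1);
        AllocationLimitModel raised = new AllocationLimitModel(5_000_000);
        System.out.println(perRecord.effectiveLimit(1_000_000)); // 1000000
        System.out.println(raised.effectiveLimit(1_000_000));    // 5000000
        // Three 4 MB allocations all pass: the limit is per allocation, not cumulative.
        for (int i = 0; i < 3; i++) {
            raised.allocate(4_000_000, 1_000_000);
        }
    }
}
```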

org.apache.poi.openxml4j.util.ZipSecureFile.setMinInflateRatio(double ratio)
Sets the minimum ratio of compressed (deflated) to uncompressed (inflated) bytes, used to detect zip bombs. It defaults to 1% (0.01d): if any package part that is read compresses to less than 1% of its uncompressed size, parsing fails with an error indicating a zip bomb.

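The ratio comparison can be sketched in plain Java as follows. InflateRatioCheck is a hypothetical helper that only compares byte counts; POI performs its check on the streams it reads and may apply additional conditions not shown here.

```java
// Hypothetical sketch of the ratio check described for
// ZipSecureFile.setMinInflateRatio(double); not POI's implementation.
final class InflateRatioCheck {
    /** Default minimum ratio stated in the text: 1% (0.01d). */
    static final double DEFAULT_MIN_INFLATE_RATIO = 0.01d;

    /** True if compressed/uncompressed is below the minimum, i.e. "too well" compressed. */
    static boolean looksLikeZipBomb(long compressedBytes, long uncompressedBytes,
                                    double minInflateRatio) {
        if (uncompressedBytes <= 0) {
            return false; // nothing inflated yet, so there is no ratio to judge
        }
        double ratio = (double) compressedBytes / uncompressedBytes;
        return ratio < minInflateRatio;
    }

    public static void main(String[] args) {
        // 5 KB inflating to 1 MB: ratio 0.005 < 0.01, so it is flagged.
        System.out.println(looksLikeZipBomb(5_000, 1_000_000, DEFAULT_MIN_INFLATE_RATIO));   // true
        // 200 KB inflating to 1 MB: ratio 0.2, so it passes.
        System.out.println(looksLikeZipBomb(200_000, 1_000_000, DEFAULT_MIN_INFLATE_RATIO)); // false
        // Lowering the minimum to 0.001 accepts the first part as well.
        System.out.println(looksLikeZipBomb(5_000, 1_000_000, 0.001));                       // false
    }
}
```

Lowering the ratio therefore admits more highly compressible parts (for example, very repetitive spreadsheet XML) at the cost of weaker zip-bomb protection.
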
org.apache.poi.openxml4j.util.ZipSecureFile.setMaxEntrySize(long maxEntrySize)
Sets the maximum size of a single zip entry. It defaults to 4GB, the maximum of the 32-bit zip format. This can be used to limit memory consumption and to protect against security vulnerabilities when documents are supplied by users. POI 5.1.0 removes the previous 4GB cap on this setting, so larger values can be set.

org.apache.poi.openxml4j.util.ZipSecureFile.setMaxTextSize(long maxTextSize)
Sets the maximum number of characters that are extracted before an exception is thrown during text extraction. This can be used to limit memory consumption and to protect against security vulnerabilities when documents are supplied by users. The default is approximately 10 million characters. Before POI 5.1.0, the largest value allowed was approximately 4 billion characters.

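The text-size guard behaves roughly like the following collector, which refuses to grow past a fixed number of characters. BoundedTextCollector and its IllegalStateException are illustrative assumptions; POI enforces the real limit inside its own extractors.

```java
// Hypothetical model of the guard described for ZipSecureFile.setMaxTextSize(long):
// collecting text fails once the total would exceed the configured limit.
final class BoundedTextCollector {
    private final long maxTextSize;
    private final StringBuilder text = new StringBuilder();

    BoundedTextCollector(long maxTextSize) {
        this.maxTextSize = maxTextSize;
    }

    /** Appends one chunk of extracted text, failing if the total would exceed the limit. */
    void append(CharSequence chunk) {
        long newLength = (long) text.length() + chunk.length();
        if (newLength > maxTextSize) {
            throw new IllegalStateException(
                    "Extracted text would be " + newLength + " chars, limit is " + maxTextSize);
        }
        text.append(chunk);
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        BoundedTextCollector collector = new BoundedTextCollector(10);
        collector.append("Hello, ");
        collector.append("POI");
        System.out.println(collector.text()); // Hello, POI
        try {
            collector.append("!"); // would make 11 characters
        } catch (IllegalStateException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```
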
org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(int thresholdBytes)
Added in POI 5.1.0. The number of bytes at which a zip entry is regarded as too large to hold in memory, so its data is written to a temp file instead. The default of -1 means temp files are not used, and zip entries with more than 2GB of data after decompression will fail; 0 means all zip entries are stored in temp files. A threshold such as 50000000 (approx. 50MB) is recommended.

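The three threshold cases can be modelled as a small decision function. TempFileThreshold is hypothetical, and treating an entry whose size exactly equals the threshold as "too large" is an assumption based on the wording "at which".

```java
// Hypothetical model of the decision described for
// ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(int); not POI code.
final class TempFileThreshold {
    enum Storage { MEMORY, TEMP_FILE }

    /**
     * thresholdBytes < 0: never use temp files (the default is -1).
     * thresholdBytes == 0: every entry goes to a temp file.
     * Otherwise: entries whose uncompressed size reaches the threshold go to a temp file.
     */
    static Storage storageFor(long uncompressedEntrySize, int thresholdBytes) {
        if (thresholdBytes < 0) {
            return Storage.MEMORY;
        }
        return uncompressedEntrySize >= thresholdBytes ? Storage.TEMP_FILE : Storage.MEMORY;
    }

    public static void main(String[] args) {
        int recommended = 50_000_000; // approx. 50MB, as suggested in the text
        System.out.println(storageFor(10_000_000L, recommended)); // MEMORY
        System.out.println(storageFor(80_000_000L, recommended)); // TEMP_FILE
        System.out.println(storageFor(10L, 0));                   // TEMP_FILE
        // MEMORY: per the text, such an entry (over 2GB) then fails to load.
        System.out.println(storageFor(3_000_000_000L, -1));
    }
}
```
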
org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setEncryptTempFiles(boolean encrypt)
Added in POI 5.1.0. Whether temp files should be encrypted (default: false). This only affects temp files related to zip entries.

org.apache.poi.openxml4j.opc.ZipPackage.setUseTempFilePackageParts(boolean tempFilePackageParts)
Added in POI 5.1.0. Whether to save package part data in temp files to save memory (default: false).

org.apache.poi.openxml4j.opc.ZipPackage.setEncryptTempFilePackageParts(boolean encryptTempFiles)
Added in POI 5.1.0. Whether to encrypt package part temp files (default: false).

org.apache.poi.extractor.ExtractorFactory.setThreadPrefersEventExtractors(boolean preferEventExtractors) and org.apache.poi.extractor.ExtractorFactory.setAllThreadsPreferEventExtractors(Boolean preferEventExtractors)
When creating text extractors for documents, these choose a different type of extractor that parses documents with an event-based parser; the first applies to the current thread, the second to all threads.

Various classes: setMaxRecordLength(int length)
Overrides the default maximum record length for various classes that parse input data, e.g. XMLSlideShow, XSSFBParser, HSLFSlideShow, HWPFDocument, HSSFWorkbook, EmbeddedExtractor, StringUtil, ...
This may be useful when processing very large files that would otherwise trigger Apache POI's protection against excessive memory allocation.

org.apache.poi.xslf.usermodel.XSLFPictureData.setMaxImageSize(int length)
Overrides the default maximum image size allowed for XSLF pictures.

org.apache.poi.xssf.usermodel.XSSFPictureData.setMaxImageSize(int length)
Overrides the default maximum image size allowed for XSSF pictures.

org.apache.poi.xwpf.usermodel.XWPFPictureData.setMaxImageSize(int length)
Overrides the default maximum image size allowed for XWPF pictures.
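
Combining several of the settings above, a startup routine for an application that processes very large OOXML files might look like the following sketch. It assumes POI 5.1.0 or later with the poi and poi-ooxml artifacts on the classpath; the class name PoiStartupConfig and all numeric values except the ~50MB temp-file threshold are illustrative assumptions. Apart from the per-thread extractor preference, these setters change static, JVM-wide state, so call them once before opening any documents.

```java
import java.io.File;

import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.extractor.POITextExtractor;
import org.apache.poi.openxml4j.opc.ZipPackage;
import org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource;
import org.apache.poi.openxml4j.util.ZipSecureFile;
import org.apache.poi.util.IOUtils;

// Sketch: configure POI once at startup, before any documents are opened.
// Values are illustrative assumptions, not official recommendations.
public class PoiStartupConfig {

    static void configurePoi() {
        // Zip-bomb and size limits.
        ZipSecureFile.setMinInflateRatio(0.01d);                // documented default, made explicit
        ZipSecureFile.setMaxEntrySize(8L * 1024 * 1024 * 1024); // 8GB; above 4GB needs POI 5.1.0+
        ZipSecureFile.setMaxTextSize(50_000_000L);              // allow ~50 million extracted chars

        // Spill large zip entries and package parts to encrypted temp files.
        ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(50_000_000);
        ZipInputStreamZipEntrySource.setEncryptTempFiles(true);
        ZipPackage.setUseTempFilePackageParts(true);
        ZipPackage.setEncryptTempFilePackageParts(true);

        // -1 keeps the per-record-type limits; a positive value would replace them.
        IOUtils.setByteArrayMaxOverride(-1);
    }

    public static void main(String[] args) throws Exception {
        configurePoi();
        // Prefer event-based extractors on the current thread only.
        ExtractorFactory.setThreadPrefersEventExtractors(true);
        try (POITextExtractor extractor = ExtractorFactory.createExtractor(new File(args[0]))) {
            System.out.println(extractor.getText());
        }
    }
}
```

Run it with the path of a document as the first program argument.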

Observed Java System Properties

Apache POI observes the following Java system properties.

System property / Description

java.io.tmpdir
Apache POI uses the JDK's default mechanism for specifying the location of temporary files, so by default this property determines where POI creates them.

org.apache.poi.hwpf.preserveBinTables and org.apache.poi.hwpf.preserveTextTable
Adjust how HWPF handles the bin tables and the text table when parsing Word documents.

org.apache.poi.ss.ignoreMissingFontSystem
Added in POI 5.2.3. Instructs Apache POI to ignore some errors caused by missing fonts, so more functionality works even when no fonts are installed.
Note: Some functionality will still not be possible because it cannot fall back to default values, e.g. rendering slides, drawing, ...
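
System properties are usually passed on the command line, e.g. java -Djava.io.tmpdir=/var/tmp -Dorg.apache.poi.ss.ignoreMissingFontSystem=true ... They can also be set from code, as in the following sketch. It assumes the font property is a boolean flag set to "true", and that POI may read it only once when the relevant classes initialize, so it should be set before any POI class is used. PoiSystemProperties is a hypothetical helper; java.io.tmpdir is left to the command line because the JDK may cache it after first use.

```java
// Hypothetical helper that sets a POI-related system property programmatically.
// Assumption: POI may read the property once, when its classes are first
// initialized, so configure() should run before any POI class is used.
final class PoiSystemProperties {
    static final String IGNORE_MISSING_FONT_SYSTEM =
            "org.apache.poi.ss.ignoreMissingFontSystem";

    static void configure() {
        // Assumed boolean flag (added in POI 5.2.3): tolerate missing fonts.
        System.setProperty(IGNORE_MISSING_FONT_SYSTEM, "true");
    }

    public static void main(String[] args) {
        configure();
        System.out.println(System.getProperty(IGNORE_MISSING_FONT_SYSTEM)); // true
        // Temp-file location; prefer -Djava.io.tmpdir=... over changing it at runtime.
        System.out.println(System.getProperty("java.io.tmpdir"));
    }
}
```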

by POI Developers