Apache POI™ - Security guidance
Overview
This page provides some guidance on how Apache POI can be used in security-sensitive areas.
Information about related security vulnerabilities
Information about security issues is included in the Project News.
Reporting security vulnerabilities
Apache POI will try to fix security-related bugs with priority.
Please follow the general Apache Security Guidelines for proper handling.
Please note, however, that processing external files is inherently risky, so you should design your application to limit the impact of malicious documents as much as possible. The higher your security requirements are, the more you will likely need to invest in your application to contain the effects.
Architecting your Application
If you are processing documents from an untrusted source, you should add a number of safeguards to your application to contain any unexpected side effects.
Apache POI cannot fully protect the current process against the effects of some documents. We therefore suggest the following additional layers of security.
- Expect any type of Exception when processing documents
Parsing the various formats is complex, so unexpected types of exceptions and errors can be thrown, for example StackOverflowError or many different kinds of RuntimeException. Note that StackOverflowError is an Error rather than an Exception, so catching Exception alone does not cover it.
Put a broad catch statement around your document-parsing code and be prepared to handle all of these gracefully.
- Expect long parsing time
Because parsing the various formats is complex, some documents might cause prolonged CPU usage and long parsing times.
If this is a concern, make sure you have a way to stop processing after some time, for example via the sandboxing approach described below.
- Memory use can be very high
The data in Microsoft format files is usually compressed, so even small files can expand to a lot of data.
The core POI APIs are not optimized to avoid excessive memory use. POI has streaming APIs for reading and writing xlsx files, so if you are working with large xlsx files, you should consider using them.
- Consider sandboxing document-parsing
If you operate in a highly sensitive environment and want to avoid any side effects of document parsing on your application, consider extracting the parsing logic into a separate process that is configured with appropriate memory settings and that you stop after a timeout. It is a good idea to be able to restart the process automatically in case of a crash.
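The sandboxing approach can be sketched with the standard java.lang.ProcessBuilder API. This is a minimal illustration under stated assumptions, not a recommended implementation: the class SandboxedParse, its methods, and the worker main class passed to it are hypothetical names, and the heap limit and timeout are placeholders you would choose for your own workload.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a parsing worker in a separate JVM with a heap cap and a time limit.
final class SandboxedParse {

    /** Builds the command line for a child JVM whose heap is capped at maxHeapMb megabytes. */
    static List<String> buildCommand(String mainClass, String classPath,
                                     int maxHeapMb, String documentPath) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Xmx" + maxHeapMb + "m"); // memory limit for the child only
        command.add("-cp");
        command.add(classPath);
        command.add(mainClass);                // assumed to parse the file named in its first argument
        command.add(documentPath);
        return command;
    }

    /**
     * Starts the child process and waits for it.
     * Returns the exit code, or an empty value if the process was killed after the timeout.
     */
    static OptionalInt runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard output so the child cannot block on a full pipe (Java 9+).
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // stop a runaway parse
            process.waitFor();         // wait until the process has exited
            return OptionalInt.empty();
        }
        return OptionalInt.of(process.exitValue());
    }
}
```

A caller might treat an empty result or a non-zero exit code as a failed parse. Because each document gets a fresh process in this sketch, a crash of the child does not affect the parent application, and the next document simply starts a new child.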
- Keep up to date with releases
Apache POI occasionally issues CVEs for security issues. Each release also contains other bug fixes and improvements. Some of these fixes make POI more robust against malicious input, even if they are not explicitly security-related.
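Staying current does not replace the in-process safeguards listed earlier. As a rough sketch of the first two (a broad catch and a time limit), the helper below runs any parsing task on a worker thread. The class GuardedParse and its parse method are hypothetical names, and the parser is passed in as a plain Callable so the example does not depend on any particular POI call.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a parsing task with a time limit and a broad failure handler.
final class GuardedParse {

    /** Returns the parse result, or an empty Optional if parsing failed or timed out. */
    static <T> Optional<T> parse(Callable<T> parser, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "document-parser");
            thread.setDaemon(true); // a stuck parse must not keep the JVM alive
            return thread;
        });
        Future<T> future = executor.submit(parser);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption; the parser may ignore it
            return Optional.empty();
        } catch (ExecutionException e) {
            // e.getCause() is whatever the parser threw: any RuntimeException,
            // a checked exception, or an Error such as StackOverflowError.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With POI, the task might be something like `() -> WorkbookFactory.create(file)`, with a timeout suited to your documents. Interruption is only a request: CPU-bound parsing code that never checks the interrupt flag keeps running on its thread, and an OutOfMemoryError in the worker still affects the whole JVM. A separate process gives a harder guarantee for both cases.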
by Dominik Stadler