Apache POI - Security guidance
This page provides guidance on using Apache POI in security-sensitive areas.
Information about related security vulnerabilities
Information about security issues is included in the Project News.
Reporting security vulnerabilities
Apache POI will try to fix security-related bugs with priority.
Please follow the general Apache Security Guidelines for proper handling.
Please note, however, that because Apache POI processes external files, you should design your application to limit the impact of malicious documents as much as possible. The stricter your security requirements, the more you will likely need to invest in your application to contain the effects.
Architecting your Application
If you are processing documents from an untrusted source, you should add a number of safeguards to your application to contain any unexpected side effects.
Apache POI cannot fully prevent a document from affecting the current process, so we suggest the following additional layers of security.
Expect any type of Exception when processing documents
Because parsing the various formats is complex, unexpected types of exceptions and errors can be thrown, for example StackOverflowError or many different types of RuntimeException.
Make sure to have a broad catch statement around your document-parsing functionality and be prepared to handle all of them gracefully.
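The advice above can be sketched as a small wrapper. This is a minimal illustration, not part of Apache POI: the class `SafeParsing` and the method `parseSafely` are hypothetical names, and the `Callable` stands in for whatever POI call your application makes. Note that StackOverflowError is an Error rather than an Exception, so `catch (Exception e)` alone would not cover it.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical helper (not a POI class): turns any parsing failure into an empty result. */
final class SafeParsing {

    private SafeParsing() {}

    /**
     * Runs the given parsing step and returns its result,
     * or Optional.empty() if the step failed or returned null.
     */
    static <T> Optional<T> parseSafely(Callable<T> parser) {
        try {
            return Optional.ofNullable(parser.call());
        } catch (Exception e) {
            // Covers checked exceptions as well as every RuntimeException.
            System.err.println("Document rejected: " + e);
            return Optional.empty();
        } catch (StackOverflowError e) {
            // An Error, not an Exception, so it needs its own catch clause.
            System.err.println("Document rejected: nesting too deep");
            return Optional.empty();
        }
    }
}
```

A caller would pass the actual parsing step as a lambda and treat an empty result as "document could not be processed". OutOfMemoryError is deliberately not caught in this sketch, because recovering inside a process that has run out of memory is unreliable.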
Expect long parsing time
Because parsing the various formats is complex, some documents might cause prolonged CPU usage and long parsing times.
If this is a concern, make sure you have a way to stop processing after some time, for example with the sandboxing approach described below.
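One in-process way to stop waiting after some time is to run the parsing step on a worker thread and bound the wait with `Future.get`. The sketch below uses only `java.util.concurrent`; `TimedParsing` and `parseWithTimeout` are hypothetical names, and the `Callable` stands in for your own parsing code. An important limitation: cancelling only interrupts the worker thread, so parsing code that never checks the interrupt flag may keep consuming CPU in the background until it finishes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not a POI class): bounds how long the caller waits for a parse. */
final class TimedParsing {

    private TimedParsing() {}

    /**
     * Runs the parsing step on a worker thread and waits at most the given time.
     * Throws TimeoutException if the step did not finish in time, and
     * ExecutionException if the step itself failed.
     */
    static <T> T parseWithTimeout(Callable<T> parser, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "document-parser");
            thread.setDaemon(true); // a stuck parser must not keep the JVM alive
            return thread;
        });
        Future<T> result = executor.submit(parser);
        try {
            return result.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; code that ignores interruption keeps running.
            result.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Because a thread cannot be forcibly killed, this approach limits how long the caller waits rather than how much CPU the document consumes. A separate process is the stronger option when that distinction matters.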
Memory use can be very high
The data in Microsoft Office files is usually compressed, so even a small file can expand into a large amount of data.
The core POI APIs are not optimized to avoid excessive memory use. POI also has streaming APIs for reading and writing xlsx files (SXSSF for writing, the XSSF SAX event API for reading), so consider using them if you work with large xlsx files.
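Since xlsx and the other OOXML formats are zip containers, the point about small files holding a lot of data can be checked before any parsing starts. The following sketch uses only `java.util.zip` and is not a POI API: `ExpansionCheck`, `inflatesWithin` and the byte limit are hypothetical and would need tuning for your application. It counts the bytes actually inflated instead of trusting the sizes declared in the zip headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

/** Hypothetical pre-check (not a POI class) for how far a zip-based document expands. */
final class ExpansionCheck {

    private ExpansionCheck() {}

    /**
     * Returns true if all entries of the zip data inflate to at most maxBytes in total.
     * Stops reading as soon as the limit is exceeded, so memory use stays constant.
     */
    static boolean inflatesWithin(InputStream zipData, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            while (zip.getNextEntry() != null) {
                int read;
                // read() returns -1 at the end of the current entry.
                while ((read = zip.read(buffer)) != -1) {
                    total += read;
                    if (total > maxBytes) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

This check only measures size. It does not validate the document, and input that is not a zip archive at all simply reports zero entries and passes, so it complements rather than replaces the other safeguards on this page.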
Consider sandboxing document-parsing
If you operate in a highly sensitive environment and would like to avoid any side effect from parsing documents on your application, then consider extracting the parsing logic into a separate process which is configured with appropriate memory settings and which you stop after some timeout.
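A separate process with a memory limit and a timeout could be started roughly as follows. This is a sketch under stated assumptions, not a recommended implementation: it requires Java 11 or later, `SandboxedParser` and its methods are hypothetical names, and the worker main class, classpath, heap size and timeout are placeholders you would supply. The child JVM's heap is capped with the standard `-Xmx` option, and the child is killed if it does not exit in time.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical launcher (not a POI class) for parsing in a child JVM. */
final class SandboxedParser {

    private SandboxedParser() {}

    /** Builds the command line for a child JVM with a capped heap. */
    static List<String> childCommand(int maxHeapMb, String classpath,
                                     String mainClass, Path document) {
        List<String> command = new ArrayList<>();
        // Reuse the java binary of the currently running JVM.
        command.add(Path.of(System.getProperty("java.home"), "bin", "java").toString());
        command.add("-Xmx" + maxHeapMb + "m");
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        command.add(document.toString());
        return command;
    }

    /**
     * Runs the command and returns its exit code.
     * Kills the child and throws TimeoutException if it does not exit in time.
     */
    static int runWithTimeout(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        Process child = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid blocking on a full pipe
                .start();
        if (!child.waitFor(timeout, unit)) {
            child.destroyForcibly();
            child.waitFor(); // reap the killed process
            throw new TimeoutException("parser process exceeded its time limit");
        }
        return child.exitValue();
    }
}
```

For example, `runWithTimeout(childCommand(256, "app.jar", "com.example.ParseWorker", Path.of("upload.xlsx")), 30, TimeUnit.SECONDS)` would run a hypothetical worker class with a 256 MB heap for at most 30 seconds. How the worker reports its results back (a file, standard output, a socket) is left open here, and a non-zero exit code should be treated as a failed parse.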
by Dominik Stadler