Apache POI - Case Studies
A number of people are using POI for a variety of purposes. As with any new API or technology, the first question people generally ask is not "How can I?" but rather "Who else is doing what I'm about to do?" This is understandable given the abysmal success rate in the software business. These case studies are meant to help create confidence and understanding.
Submitting a Case Study
We are actively seeking case studies for this page (after all it just started). To submit a case study, either submit a patch for this page or email it to the mailing list (with [PATCH] prefixed subject, please).
Processing biometric scanner logs - Glassbeam
As a small startup, Glassbeam has no attendance management system in place, so attendance is recorded in a manual register. A biometric scanner also controls entry through the office gates and keeps its own logs of entries. Instead of setting up an attendance management system, the company decided to use these biometric scanner logs to generate an Excel report.
A blog post describes how the startup uses Apache POI to generate reports about attendance of employees based on biometric scanner logs.
A fully working solution can be found on GitHub.
REWOO Scope is a modern and easy to use web-based enterprise content management system. It supports knowledge workers and managers in making the right decisions based upon all relevant information.
The system uses Apache POI to extract information stored within Excel files and use it transparently within REWOO Scope. Thus, POI allows our customers to work in their standard office environment while also having all important information in the REWOO Scope system.
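Extracting information from an Excel file with POI can be sketched roughly as follows. This is an illustrative example only, not REWOO's implementation: the class name `ExcelExtractSketch`, the method `readRows`, and the assumption that the first row of the first sheet holds column headers are all hypothetical.

```java
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads the first sheet of a workbook into header-keyed records.
final class ExcelExtractSketch {

    // Assumes row 0 of the first sheet contains the column headers.
    static List<Map<String, String>> readRows(InputStream in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        DataFormatter formatter = new DataFormatter(); // renders cells as Excel would display them
        try (Workbook workbook = WorkbookFactory.create(in)) {
            Sheet sheet = workbook.getSheetAt(0);
            Row header = sheet.getRow(0);
            if (header == null) {
                return records; // empty sheet, nothing to extract
            }
            int columns = header.getLastCellNum();
            for (int r = 1; r <= sheet.getLastRowNum(); r++) {
                Row row = sheet.getRow(r);
                if (row == null) {
                    continue; // skip rows that were never created
                }
                Map<String, String> record = new LinkedHashMap<>();
                for (int c = 0; c < columns; c++) {
                    String key = formatter.formatCellValue(header.getCell(c));
                    // formatCellValue returns "" for missing cells
                    record.put(key, formatter.formatCellValue(row.getCell(c)));
                }
                records.add(record);
            }
        }
        return records;
    }
}
```

`WorkbookFactory` detects the file format itself; reading `.xlsx` files this way additionally requires the poi-ooxml module on the classpath.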
QuestionPro is an online service allowing businesses and individuals to create, deploy and do in-depth analysis of online surveys. The technology is built on open-source frameworks like Struts, Velocity, POI, Lucene ... the list goes on. The application is deployed on a Linux application cluster farm with a MySQL database.
There are quite a few competitors delivering similar solutions using Microsoft technologies like ASP and .NET. One of the distinct advantages our competitors had over us was the ability to generate Excel spreadsheets, Access databases (MDB) etc. on the fly using the Component Object Model (COM), since their servers were running IIS and they had access to the COM registry and such.
QuestionPro's initial solution was to generate CSV files. This was easy; however, it was a cumbersome process for our clients to download the CSV files and then import them into Excel. Moreover, formatting information could not be preserved or captured using the CSV format. This is where POI came to our rescue. With a POI-based solution, we could generate a full report with multiple sheets and all the analytical reports. To keep the solution scalable, we had a dedicated cluster for generating the reports.
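A multi-sheet report with basic formatting, the two things CSV could not provide, might look roughly like the sketch below. It is not QuestionPro's code: the class `SurveyReportSketch`, the method `buildReport`, and the data shape (question text mapped to answer counts) are assumptions made for illustration.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.util.Map;

// Hypothetical report builder: one sheet per survey question.
final class SurveyReportSketch {

    // countsByQuestion maps question text -> (answer option -> number of responses).
    static Workbook buildReport(Map<String, Map<String, Integer>> countsByQuestion) {
        Workbook workbook = new HSSFWorkbook(); // legacy .xls format

        // A bold style for the question title, which plain CSV cannot express.
        Font bold = workbook.createFont();
        bold.setBold(true);
        CellStyle titleStyle = workbook.createCellStyle();
        titleStyle.setFont(bold);

        int questionNumber = 1;
        for (Map.Entry<String, Map<String, Integer>> question : countsByQuestion.entrySet()) {
            // Short generated names avoid Excel's sheet-name length and character limits.
            Sheet sheet = workbook.createSheet("Q" + questionNumber++);

            Cell title = sheet.createRow(0).createCell(0);
            title.setCellValue(question.getKey());
            title.setCellStyle(titleStyle);

            Row header = sheet.createRow(1);
            header.createCell(0).setCellValue("Answer");
            header.createCell(1).setCellValue("Responses");

            int rowIndex = 2;
            for (Map.Entry<String, Integer> answer : question.getValue().entrySet()) {
                Row row = sheet.createRow(rowIndex++);
                row.createCell(0).setCellValue(answer.getKey());
                row.createCell(1).setCellValue(answer.getValue()); // stored as a numeric cell
            }
        }
        return workbook;
    }
}
```

The caller would then write the workbook to a file or HTTP response with `workbook.write(outputStream)`.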
The Apache POI project has helped QuestionPro compete with the other players in the marketplace that use proprietary technology. It leveled the playing field with respect to reporting and data analysis solutions. It helped in opening doors into closed solutions like Microsoft's CDF. Today about 100 Excel reports are generated daily, each with about 10-30 sheets.
POI In Action - http://www.questionpro.com/marketing/SurveyReport-289.xls
Sunshine Systems developed a POI based reporting solution for a price optimization software package which is used by major retail chains.
The solution allowed the retailer's merchandise planners and managers to request markdown decision support reports and price change reports using a standard browser. The users could specify report type, report options, as well as company, division, and department filter criteria. Report generation took place in the multi-threaded application server and was capable of supporting many simultaneous report requests.
The reporting application collected business information from the price optimization application's Oracle database. The data was aggregated and summarized based upon the specific report type and filter criteria requested by the user. The final report was rendered as a Microsoft Excel spreadsheet using the POI HSSF API and was stored on the report database server for that specific user as a BLOB. Reports could be seamlessly and easily viewed using the same browser.
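Rendering a report in memory so that it can be stored as a BLOB could be sketched as follows. This is an assumption-laden illustration rather than Sunshine Systems' implementation: `ReportBlobSketch`, `renderReport`, and `parseReport` are hypothetical names, and the numeric-only row data is a simplification.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper: turns tabular data into .xls bytes and back.
final class ReportBlobSketch {

    // Renders a header row plus numeric data rows into an in-memory .xls file.
    static byte[] renderReport(String[] header, double[][] rows) throws IOException {
        try (Workbook workbook = new HSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            Sheet sheet = workbook.createSheet("Report");
            Row headerRow = sheet.createRow(0);
            for (int c = 0; c < header.length; c++) {
                headerRow.createCell(c).setCellValue(header[c]);
            }
            for (int r = 0; r < rows.length; r++) {
                Row row = sheet.createRow(r + 1); // data starts below the header
                for (int c = 0; c < rows[r].length; c++) {
                    row.createCell(c).setCellValue(rows[r][c]);
                }
            }
            workbook.write(out);
            return out.toByteArray();
        }
    }

    // Parses stored bytes back into a workbook, e.g. after loading the BLOB.
    static Workbook parseReport(byte[] blob) throws IOException {
        return new HSSFWorkbook(new ByteArrayInputStream(blob));
    }
}
```

With JDBC, the resulting byte array could be bound to a BLOB column through `PreparedStatement.setBytes`; the table layout and the per-user keying are left out here because the source does not describe them. Each call creates its own workbook, so concurrent report requests do not share POI objects.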
The retailers liked the solution because they had instantaneous access to critical business data through an extremely easy to use browser interface. They did not need to train the broader user community on all the complexities of the optimization application. Furthermore, the reports were generated in an Excel spreadsheet format, which everyone was familiar with and which also allowed further data analysis using standard Excel features.
Rob Stevenson (rstevenson at sunshinesys dot com)
Bank of Lithuania
The Bank of Lithuania reports financial statistical data to Excel format using the Apache POI project's HSSF API. The system is based on Oracle JServer and utilizes a Java stored procedure that outputs to XLS format using the HSSF API. - Arian Lashkov (alaskov at lbank.lt)
Edwards And Kelcey Technology
Edwards and Kelcey Technology (http://www.ekcorp.com/) developed a Facility Management and Maintenance System for the Telecommunications industry based on Turbine and Velocity. Originally the invoicing was done with a simple CSV sheet which was then marked up by accounts and customized for each client. As growth has been consistent with the application, the requirement for invoices that need not be touched by hand increased. POI provided the solution to this issue, integrating easily and transparently into the system. POI HSSF was used to create the invoices directly from the server in Excel 97 format and now services over 150 unique invoices per month.
Cameron Riley (crileyNO@ SPAMekmail.com)
ClickFind Inc. used the POI project's HSSF API to provide their medical research clients with an Excel export from their electronic data collection web service Data Collector 3.0. The POI team's assistance allowed ClickFind to give their clients a data format that requires less technical expertise than the XML format used by the Data Collector application. This was important to ClickFind as many of their current and potential clients are already using Excel in their day-to-day operations and in established procedures for handling their generated clinical data. - Jared Walker (jared.walker at clickfind.com)
IKAN Software NV
In addition to Change Management and Database Modelling, IKAN Software NV (http://www.ikan.be/) develops and supports its own ETL (Extract/Transform/Load) tools.
IKAN's latest product in this domain is called ETL4ALL (http://www.ikan.be/etl4all/). ETL4ALL is an open source tool allowing data transfer from and to virtually any data source. Users can combine and examine data stored in relational databases, XML databases, PDF files, EDI, CSV files, etc.
Naturally, Microsoft Excel files are also supported. POI has been used to successfully implement this support in ETL4ALL.
JM Lafferty Associates, Inc.
On its ForecastWorks website JM Lafferty Associates, Inc. produces dynamic on demand financial analyses of companies and institutional funds. The pages produced are selected and exported in several file formats including PPT and XLS.
- The PPT files produced are of high quality which is on a par with similar PDF files.
- The XLS files produced contain a complex forecasting model built from a template with a VBA Macro.
David Fisher (firstname.lastname@example.org)
iDATA Development Ltd (IDD)
IDD have developed the iEXL product to generate Excel spreadsheets directly on the iSeries/AS400 (IBM i on Power) platform.
Professional spreadsheets are created via a menu system. Some basic programming is required for more complex options. When programming is required, it can be carried out using RPG, SQL, QUERY, Java, COBOL etc. In other words, it draws on your existing staff's knowledge.
Design spreadsheets with:
- Fonts down to cell level
- Colours (Background and text) down to cell level
- Shading down to cell level
- Cell patterns down to cell level
- Cell initialization
- Freeze Panes
- Images/Pictures both static and dynamic
- Page breaks
- Sheet breaks
- Text insertion and much more
- Merge cells
- Row Height
- Cell text alignment
- Text Rotation
- 50 Database files per workbook.
- E-mail the spreadsheet
The product name is 'iEXL' and has been live on both European and North American systems for over four years. It is being used in preference to more established commercial products which our clients have already purchased. This is due to cost and ease of use.
All spreadsheets can be archived if required so that historical spreadsheets can be retrieved.
The system has benefits for all departments within an organisation. Examples include the accounts department for things such as aged trial balances, the distribution department for ASNs, warehousing for stock figures, IS for security reporting etc.
Clients have at this point (June 2012) created over 300 spreadsheets, which in turn have generated over 500,000 emails. iEXL has a menu-driven email system.
Thanks to the Apache POI project, IDD have been able to create the iEXL product. This is a well-priced product which gives companies of all sizes access to a product that opens up their reporting capabilities.
Within the iEXLSOFTWARE.COM website you will find a full user manual, installation instructions, a call log (Ticket) system and a downloadable 45 day trial version.
Ugly Duckling focus on Software, Management and Finance. We have recently been using Apache POI to create tools for the mortgage group of ABN AMRO in the Netherlands. During this project we created a number of what we call 'Robots' using the HSSF API.
These robots run as services on the network and help automate the processing of large amounts of data. Our Robots can be used to spot problems that a human might not, and also to automate repetitive tasks.
We found Apache POI to be extremely useful. We took the base API, wrapped it in a builder pattern and thus created a DSL with a fluent interface. Throughout the project we very much enjoyed working with Apache POI and found it to be very reliable.
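Wrapping the base API in a builder with a fluent interface can be illustrated with a small sketch. This is not Ugly Duckling's DSL, which the source does not show: the class `SheetBuilder` and its methods `workbook`, `sheet`, `row`, and `build` are hypothetical names chosen for the example.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

// Hypothetical fluent builder over the POI usermodel API.
final class SheetBuilder {
    private final Workbook workbook;
    private Sheet sheet;   // sheet currently being filled
    private int nextRow;   // index of the next row to create on that sheet

    private SheetBuilder(Workbook workbook) {
        this.workbook = workbook;
    }

    // Entry point of the DSL: starts a new .xls workbook.
    static SheetBuilder workbook() {
        return new SheetBuilder(new HSSFWorkbook());
    }

    // Starts a new sheet; subsequent row(...) calls append to it.
    SheetBuilder sheet(String name) {
        this.sheet = workbook.createSheet(name);
        this.nextRow = 0;
        return this;
    }

    // Appends one row, choosing the cell type from each value's Java type.
    SheetBuilder row(Object... values) {
        if (sheet == null) {
            throw new IllegalStateException("call sheet(...) before row(...)");
        }
        Row row = sheet.createRow(nextRow++);
        for (int c = 0; c < values.length; c++) {
            Cell cell = row.createCell(c);
            Object value = values[c];
            if (value instanceof Number) {
                cell.setCellValue(((Number) value).doubleValue());
            } else if (value instanceof Boolean) {
                cell.setCellValue(((Boolean) value).booleanValue());
            } else if (value != null) {
                cell.setCellValue(value.toString());
            } // null leaves the cell blank
        }
        return this;
    }

    Workbook build() {
        return workbook;
    }
}
```

A call chain then reads almost like a description of the sheet, for example `SheetBuilder.workbook().sheet("Loans").row("Id", "Amount").row(1, 2500.0).build()`.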
Deutsche Bahn uses POI's HWPF component to process complex specification documents stored in the legacy Microsoft Word file format.
In a joint effort with other international partners, Deutsche Bahn Netz AG, the owner of the German rail infrastructure, developed a novel software toolchain to facilitate the creation of an interoperable on-board component for a pan-European train protection system. One part of this toolchain is a domain-specific specification processor which reads the relevant requirements documents using Apache POI, enhances them and ultimately stores their contents as ReqIF. Contrary to DOC, this XML-based file format allows for proper traceability and versioning in a multi-tenant environment. Thus, it lends itself much better to the management and interchange of large sets of system requirements. The resulting ReqIF files are then consumed by the various tools in the later stages of the software development process.
Currently available, off-the-shelf software for requirement import performed very poorly on the original specification documents due to their structural complexity and heterogeneous formatting. Thanks to its rich API, POI helped to create a superior solution. Because of its open-source nature, it also plays a key role in ensuring the maintainability of the resulting system, which is expected to stay in operation for many decades to come.
POI has seen various enhancements for this challenging application. Most notably, these include the addition of extensive list numbering support, a feature which is now part of Apache Tika. Numerous smaller improvements, such as support for table cell background shadings, interpretation of certain kinds of OfficeDrawings, and proper conversion of special characters, also helped to derive meaning from the input files. See here for details.
This work was funded by the German Federal Ministry of Education and Research (Grant No. 01IS12021) in the context of the ITEA2 project openETCS.
by Andrew C. Oliver, Cameron Riley, David Fisher, Dominik Stadler