org.apache.poi.hssf.extractor
Class EventBasedExcelExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
      extended by org.apache.poi.POIOLE2TextExtractor
          extended by org.apache.poi.hssf.extractor.EventBasedExcelExtractor
All Implemented Interfaces:
java.io.Closeable, ExcelExtractor

public class EventBasedExcelExtractor
extends POIOLE2TextExtractor
implements ExcelExtractor

A text extractor for Excel files that is based on the HSSF EventUserModel API. It typically uses less memory than ExcelExtractor, but may not provide the same richness of formatting. It returns the textual content of the file, suitable for indexing by something like Lucene, but not really intended for display to the user.

To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.
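
As a rough illustration of the usage described above, the following sketch opens an .xls file, configures the two options documented as supported on this page, and returns the text. The class name ExtractXlsText and the helper extractText are hypothetical, and the sketch assumes a POI version in which POIFSFileSystem can be constructed from an InputStream.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.extractor.EventBasedExcelExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, not part of POI.
public class ExtractXlsText {

    /** Returns the plain text of the .xls file at the given path. */
    static String extractText(String path) throws IOException {
        try (InputStream in = new FileInputStream(path);
             // The extractor is Closeable, so try-with-resources closes it for us.
             EventBasedExcelExtractor extractor =
                     new EventBasedExcelExtractor(new POIFSFileSystem(in))) {
            extractor.setIncludeSheetNames(true);    // documented default, set explicitly
            extractor.setFormulasNotResults(false);  // documented default, set explicitly
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of an .xls file as the first argument.
        System.out.println(extractText(args[0]));
    }
}
```

The returned string is meant to be handed to an indexer rather than shown to a user, in line with the description above.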

See Also:
XLS2CSVmra

Field Summary
 
Fields inherited from class org.apache.poi.POIOLE2TextExtractor
document
 
Constructor Summary
EventBasedExcelExtractor(DirectoryNode dir)
           
EventBasedExcelExtractor(POIFSFileSystem fs)
           
 
Method Summary
 DocumentSummaryInformation getDocSummaryInformation()
          Would return the document summary information metadata for the document, if this extractor supported it
 SummaryInformation getSummaryInformation()
          Would return the summary information metadata for the document, if this extractor supported it
 java.lang.String getText()
          Retrieves the text contents of the file
 void setFormulasNotResults(boolean formulasNotResults)
          Whether to return the formula itself rather than the result it produces. Defaults to false
 void setIncludeCellComments(boolean includeComments)
          Would control the inclusion of cell comments from the document, if this extractor supported it
 void setIncludeHeadersFooters(boolean includeHeadersFooters)
          Would control the inclusion of headers and footers from the document, if this extractor supported it
 void setIncludeSheetNames(boolean includeSheetNames)
          Whether sheet names are included. Defaults to true
 
Methods inherited from class org.apache.poi.POIOLE2TextExtractor
getDocument, getMetadataTextExtractor, getRoot
 
Methods inherited from class org.apache.poi.POITextExtractor
close, setFilesystem
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

EventBasedExcelExtractor

public EventBasedExcelExtractor(DirectoryNode dir)

EventBasedExcelExtractor

public EventBasedExcelExtractor(POIFSFileSystem fs)
Method Detail

getDocSummaryInformation

public DocumentSummaryInformation getDocSummaryInformation()
Would return the document summary information metadata for the document, if this extractor supported it

Overrides:
getDocSummaryInformation in class POIOLE2TextExtractor
Returns:
The Document Summary Information or null if it could not be read for this document.

getSummaryInformation

public SummaryInformation getSummaryInformation()
Would return the summary information metadata for the document, if this extractor supported it

Overrides:
getSummaryInformation in class POIOLE2TextExtractor
Returns:
The Summary information for the document or null if it could not be read for this document.

setIncludeCellComments

public void setIncludeCellComments(boolean includeComments)
Would control the inclusion of cell comments from the document, if this extractor supported it

Specified by:
setIncludeCellComments in interface ExcelExtractor
Parameters:
includeComments - true if cell comments should be included

setIncludeHeadersFooters

public void setIncludeHeadersFooters(boolean includeHeadersFooters)
Would control the inclusion of headers and footers from the document, if this extractor supported it

Specified by:
setIncludeHeadersFooters in interface ExcelExtractor
Parameters:
includeHeadersFooters - true if headers and footers should be included

setIncludeSheetNames

public void setIncludeSheetNames(boolean includeSheetNames)
Whether sheet names are included. Defaults to true

Specified by:
setIncludeSheetNames in interface ExcelExtractor
Parameters:
includeSheetNames - true if the sheet names should be included

setFormulasNotResults

public void setFormulasNotResults(boolean formulasNotResults)
Whether to return the formula itself rather than the result it produces. Defaults to false

Specified by:
setFormulasNotResults in interface ExcelExtractor
Parameters:
formulasNotResults - true if the formula itself is returned
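
To make the effect of this flag concrete, the sketch below extracts the same file twice, once with each setting, so the two outputs can be compared. The class name FormulaTextDemo and the helper extract are hypothetical. How a formula's result is rendered when the flag is false depends on the value cached in the file, so no particular output is claimed here.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.extractor.EventBasedExcelExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical demo class, not part of POI.
public class FormulaTextDemo {

    /** Extracts text with formula cells rendered as formulas (true) or as results (false). */
    static String extract(String path, boolean formulasNotResults) throws IOException {
        try (InputStream in = new FileInputStream(path);
             EventBasedExcelExtractor extractor =
                     new EventBasedExcelExtractor(new POIFSFileSystem(in))) {
            extractor.setFormulasNotResults(formulasNotResults);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of an .xls file as the first argument.
        System.out.println("--- results ---");
        System.out.println(extract(args[0], false));
        System.out.println("--- formulas ---");
        System.out.println(extract(args[0], true));
    }
}
```

For indexing, the default (results) is usually the more useful of the two, since readers tend to search for the values a sheet displays rather than for formula text.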

getText

public java.lang.String getText()
Retrieves the text contents of the file

Specified by:
getText in interface ExcelExtractor
Specified by:
getText in class POITextExtractor
Returns:
All the text from the document