org.apache.poi.hssf.extractor
Class ExcelExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
      extended by org.apache.poi.POIOLE2TextExtractor
          extended by org.apache.poi.hssf.extractor.ExcelExtractor
All Implemented Interfaces:
java.io.Closeable, org.apache.poi.ss.extractor.ExcelExtractor

public class ExcelExtractor
extends POIOLE2TextExtractor
implements org.apache.poi.ss.extractor.ExcelExtractor

A text extractor for Excel files.

Returns the textual content of the file in a form suitable for indexing by a tool such as Lucene, but not intended for display to the user.

To convert an Excel file to CSV or a similar format, see the XLS2CSVmra example.

See Also:
XLS2CSVmra
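
A minimal usage sketch for the class description above, assuming a POI release matching this Javadoc is on the classpath. The class name `ExtractTextSketch`, the sheet name and the cell values are hypothetical and exist only for illustration; the workbook is built in memory so that no input file is needed.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class ExtractTextSketch {

    /** Builds a tiny in-memory workbook and returns its extracted text. */
    static String extract(boolean includeSheetNames) throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("Inventory"); // hypothetical sheet name
        HSSFRow row = sheet.createRow(0);
        row.createCell(0).setCellValue("Widget");
        row.createCell(1).setCellValue("In stock");

        // The extractor is Closeable, so try-with-resources releases it.
        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            extractor.setIncludeSheetNames(includeSheetNames);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Raw text intended for an indexer, not a user-facing rendering.
        System.out.print(extract(true));
    }
}
```

The returned string is meant to be handed to an indexer; its exact cell and row separators are implementation specific and should not be parsed as a stable format.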

Field Summary
 
Fields inherited from class org.apache.poi.POIOLE2TextExtractor
document
 
Constructor Summary
ExcelExtractor(DirectoryNode dir)
           
ExcelExtractor(HSSFWorkbook wb)
           
ExcelExtractor(POIFSFileSystem fs)
           
 
Method Summary
static java.lang.String _extractHeaderFooter(HeaderFooter hf)
           
 java.lang.String getText()
          Retrieves all the text from the document.
static void main(java.lang.String[] args)
          Command line extractor.
 void setFormulasNotResults(boolean formulasNotResults)
          Should the formula itself be returned, rather than the result it produces? Default is false.
 void setIncludeBlankCells(boolean includeBlankCells)
          Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.
 void setIncludeCellComments(boolean includeCellComments)
          Should cell comments be included? Default is false
 void setIncludeHeadersFooters(boolean includeHeadersFooters)
          Should headers and footers be included in the output? Default is true
 void setIncludeSheetNames(boolean includeSheetNames)
          Should sheet names be included? Default is true
 
Methods inherited from class org.apache.poi.POIOLE2TextExtractor
getDocSummaryInformation, getDocument, getMetadataTextExtractor, getRoot, getSummaryInformation
 
Methods inherited from class org.apache.poi.POITextExtractor
close, setFilesystem
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExcelExtractor

public ExcelExtractor(HSSFWorkbook wb)

ExcelExtractor

public ExcelExtractor(POIFSFileSystem fs)
               throws java.io.IOException
Throws:
java.io.IOException

ExcelExtractor

public ExcelExtractor(DirectoryNode dir)
               throws java.io.IOException
Throws:
java.io.IOException
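
The two IOException-throwing constructors can be exercised as sketched below. This is an illustration under assumptions, not the library's prescribed usage: the class name `ConstructorSketch` and its helper methods are hypothetical, and the .xls bytes are produced in memory rather than read from a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ConstructorSketch {

    /** Serialises a one-cell workbook to .xls bytes (stand-in for a file). */
    static byte[] sampleXlsBytes() throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        wb.createSheet("Data").createRow(0).createCell(0).setCellValue("hello");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        wb.write(out);
        return out.toByteArray();
    }

    /** Uses ExcelExtractor(POIFSFileSystem). */
    static String viaFileSystem(byte[] xls) throws IOException {
        POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(xls));
        try (ExcelExtractor extractor = new ExcelExtractor(fs)) {
            return extractor.getText();
        }
    }

    /** Uses ExcelExtractor(DirectoryNode), passing the file system root. */
    static String viaDirectoryNode(byte[] xls) throws IOException {
        POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(xls));
        try (ExcelExtractor extractor = new ExcelExtractor(fs.getRoot())) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xls = sampleXlsBytes();
        System.out.print(viaFileSystem(xls));
        System.out.print(viaDirectoryNode(xls));
    }
}
```

The DirectoryNode form is mainly useful when the workbook is embedded inside a larger OLE2 container; for a standalone file the root node is the natural argument.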

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Command line extractor.

Parameters:
args - the command line parameters
Throws:
java.io.IOException - if the file can't be read or contains errors

setIncludeSheetNames

public void setIncludeSheetNames(boolean includeSheetNames)
Description copied from interface: ExcelExtractor
Should sheet names be included? Default is true

Specified by:
setIncludeSheetNames in interface ExcelExtractor
Parameters:
includeSheetNames - true if the sheet names should be included

setFormulasNotResults

public void setFormulasNotResults(boolean formulasNotResults)
Description copied from interface: ExcelExtractor
Should the formula itself be returned, rather than the result it produces? Default is false.

Specified by:
setFormulasNotResults in interface ExcelExtractor
Parameters:
formulasNotResults - true if the formula itself is returned
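
A short sketch of the flag's effect, assuming POI is on the classpath. The class name `FormulaSketch` and the formula `A1+B1` are hypothetical. The sketch never evaluates the formula, so with the flag left at false the output reflects whatever result is stored with the cell, which may not be the computed sum.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class FormulaSketch {

    /** Extracts text from a one-row workbook whose third cell is a formula. */
    static String extract(boolean formulasNotResults) throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFRow row = wb.createSheet("Calc").createRow(0);
        row.createCell(0).setCellValue(2);
        row.createCell(1).setCellValue(3);
        row.createCell(2).setCellFormula("A1+B1"); // stored, not evaluated here

        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            extractor.setIncludeSheetNames(false);
            extractor.setFormulasNotResults(formulasNotResults);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(extract(true));  // formula text requested
        System.out.print(extract(false)); // stored result requested (default)
    }
}
```

Returning formula text can be useful when indexing spreadsheets for audit or search over their logic; the default suits indexing of the values a reader would see.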

setIncludeCellComments

public void setIncludeCellComments(boolean includeCellComments)
Description copied from interface: ExcelExtractor
Should cell comments be included? Default is false

Specified by:
setIncludeCellComments in interface ExcelExtractor
Parameters:
includeCellComments - true if cell comments should be included

setIncludeBlankCells

public void setIncludeBlankCells(boolean includeBlankCells)
Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.

Parameters:
includeBlankCells - true if blank cells should be included
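
The sketch below compares the two settings on a row with a gap, assuming POI is on the classpath. The class name `BlankCellSketch` and the cell values are hypothetical. It only prints the lengths of the two outputs, since the exact placeholder emitted for a blank cell is an implementation detail.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class BlankCellSketch {

    /** Extracts a row that has values in columns A and D only. */
    static String extract(boolean includeBlankCells) throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFRow row = wb.createSheet("Gaps").createRow(0);
        row.createCell(0).setCellValue("first");
        // Columns B and C are never created, so they are absent from the file.
        row.createCell(3).setCellValue("last");

        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            extractor.setIncludeSheetNames(false);
            extractor.setIncludeBlankCells(includeBlankCells);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without blanks: " + extract(false).length() + " chars");
        System.out.println("with blanks:    " + extract(true).length() + " chars");
    }
}
```

Including blank cells keeps column positions aligned across rows, which matters if the extracted text is later split into fields.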

setIncludeHeadersFooters

public void setIncludeHeadersFooters(boolean includeHeadersFooters)
Description copied from interface: ExcelExtractor
Should headers and footers be included in the output? Default is true

Specified by:
setIncludeHeadersFooters in interface ExcelExtractor
Parameters:
includeHeadersFooters - true if headers and footers should be included
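
A sketch of switching header and footer output on and off, assuming POI is on the classpath. The class name `HeaderFooterSketch` and the header text are hypothetical.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class HeaderFooterSketch {

    /** Extracts text from a sheet that carries a centred page header. */
    static String extract(boolean includeHeadersFooters) throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("Report");
        sheet.getHeader().setCenter("Quarterly figures"); // hypothetical header
        sheet.createRow(0).createCell(0).setCellValue("body text");

        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            extractor.setIncludeHeadersFooters(includeHeadersFooters);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(extract(true));  // header text included (default)
        System.out.print(extract(false)); // header text omitted
    }
}
```

Turning headers and footers off may be preferable when the same boilerplate header appears on every sheet and would otherwise dominate the indexed text.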

getText

public java.lang.String getText()
Description copied from class: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation specific; see the Javadocs for the specific project for details.

Specified by:
getText in interface ExcelExtractor
Specified by:
getText in class POITextExtractor
Returns:
All the text from the document

_extractHeaderFooter

public static java.lang.String _extractHeaderFooter(HeaderFooter hf)