org.apache.poi
Class POIOLE2TextExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
      extended by org.apache.poi.POIOLE2TextExtractor
All Implemented Interfaces:
java.io.Closeable
Direct Known Subclasses:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor

public abstract class POIOLE2TextExtractor
extends POITextExtractor

Common parent for OLE2-based text extractors of POI documents, such as .doc and .xls. You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.

See Also:
ExcelExtractor, PowerPointExtractor, VisioTextExtractor, WordExtractor

Field Summary
protected  POIDocument document
          The POIDocument that's open
 
Constructor Summary
  POIOLE2TextExtractor(POIDocument document)
          Creates a new text extractor for the given document
protected POIOLE2TextExtractor(POIOLE2TextExtractor otherExtractor)
          Creates a new text extractor, using the same document as another text extractor.
 
Method Summary
 DocumentSummaryInformation getDocSummaryInformation()
          Returns the document summary information metadata for the document.
 POIDocument getDocument()
          Returns the underlying POIDocument.
 POITextExtractor getMetadataTextExtractor()
          Returns an HPSF powered text extractor for the document properties metadata, such as title and author.
 DirectoryEntry getRoot()
          Returns the underlying DirectoryEntry of this document.
 SummaryInformation getSummaryInformation()
          Returns the summary information metadata for the document.
 
Methods inherited from class org.apache.poi.POITextExtractor
close, getText, setFilesystem
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

document

protected POIDocument document
The POIDocument that's open

Constructor Detail

POIOLE2TextExtractor

public POIOLE2TextExtractor(POIDocument document)
Creates a new text extractor for the given document

Parameters:
document - The POIDocument to use in this extractor.

POIOLE2TextExtractor

protected POIOLE2TextExtractor(POIOLE2TextExtractor otherExtractor)
Creates a new text extractor, using the same document as another text extractor. Normally only used by properties extractors.

Parameters:
otherExtractor - the extractor whose document is to be used.
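
The relationship between the two constructors can be illustrated with a minimal sketch. It does not use the POI classes: Doc, BaseExtractor, BodyExtractor and PropertiesExtractor are hypothetical stand-ins chosen so that the example compiles on its own, and they only mirror the shape of the constructors documented above. The point shown is that an extractor built through the protected constructor holds the same document instance as the extractor it was created from.

```java
class SharedDocumentSketch {

    /** Hypothetical stand-in for POIDocument. */
    static class Doc {
        final String name;
        Doc(String name) { this.name = name; }
    }

    /** Hypothetical stand-in mirroring the two documented constructors. */
    static abstract class BaseExtractor {
        protected final Doc document;

        // Counterpart of the public constructor: wrap a given document.
        BaseExtractor(Doc document) { this.document = document; }

        // Counterpart of the protected constructor: reuse another extractor's document.
        protected BaseExtractor(BaseExtractor other) { this.document = other.document; }

        Doc getDocument() { return document; }

        abstract String getText();
    }

    /** Stand-in for a format-specific extractor. */
    static class BodyExtractor extends BaseExtractor {
        BodyExtractor(Doc document) { super(document); }
        @Override String getText() { return "body of " + document.name; }
    }

    /** Stand-in for a properties extractor that shares the main extractor's document. */
    static class PropertiesExtractor extends BaseExtractor {
        PropertiesExtractor(BaseExtractor main) { super(main); }
        @Override String getText() { return "properties of " + document.name; }
    }

    public static void main(String[] args) {
        BodyExtractor body = new BodyExtractor(new Doc("report.xls"));
        PropertiesExtractor props = new PropertiesExtractor(body);

        System.out.println(body.getText());
        System.out.println(props.getText());
        // Both extractors refer to the same document object.
        System.out.println(body.getDocument() == props.getDocument());
    }
}
```

Sharing the document this way means the file is not parsed a second time just to read its properties.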
Method Detail

getDocSummaryInformation

public DocumentSummaryInformation getDocSummaryInformation()
Returns the document summary information metadata for the document.

Returns:
The Document Summary Information or null if it could not be read for this document.

getSummaryInformation

public SummaryInformation getSummaryInformation()
Returns the summary information metadata for the document.

Returns:
The Summary information for the document or null if it could not be read for this document.
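
Because both getDocSummaryInformation() and getSummaryInformation() may return null, callers usually need a guard before reading any property. The following is a minimal sketch of such a guard. Summary and SummarySource are hypothetical stand-ins (not the POI types) so that the example compiles without the library, and titleOrDefault is an illustrative helper name. With POI on the classpath, the extractor and the SummaryInformation object would presumably take their place.

```java
class SummaryNullCheckSketch {

    /** Hypothetical stand-in for SummaryInformation; only a title is modelled. */
    interface Summary {
        String getTitle();
    }

    /** Hypothetical stand-in for an extractor exposing getSummaryInformation(). */
    interface SummarySource {
        Summary getSummaryInformation(); // may return null, as documented above
    }

    /** Returns the title, or the fallback when the summary or its title is missing. */
    static String titleOrDefault(SummarySource source, String fallback) {
        Summary summary = source.getSummaryInformation();
        if (summary == null) {
            return fallback; // summary stream could not be read
        }
        String title = summary.getTitle();
        return title != null ? title : fallback;
    }

    public static void main(String[] args) {
        SummarySource readable = () -> () -> "Quarterly Report";
        SummarySource unreadable = () -> null;

        System.out.println(titleOrDefault(readable, "(untitled)"));
        System.out.println(titleOrDefault(unreadable, "(untitled)"));
    }
}
```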

getMetadataTextExtractor

public POITextExtractor getMetadataTextExtractor()
Returns an HPSF powered text extractor for the document properties metadata, such as title and author.

Specified by:
getMetadataTextExtractor in class POITextExtractor
Returns:
an instance of POITextExtractor that can extract metadata.

getRoot

public DirectoryEntry getRoot()
Returns the underlying DirectoryEntry of this document.

Returns:
the DirectoryEntry that is associated with the POIDocument of this extractor.

getDocument

public POIDocument getDocument()
Returns the underlying POIDocument.

Returns:
the underlying POIDocument