
Package org.apache.poi.hpsf

Processes streams in the Horrible Property Set Format (HPSF) in POI filesystems.


Package org.apache.poi.hpsf Description

Processes streams in the Horrible Property Set Format (HPSF) in POI filesystems. Microsoft Office documents, i.e. POI filesystems, usually contain metadata such as the author, the title, and the time of the last save. These items are called properties and are stored in property set streams alongside the document itself. These streams are commonly named \005SummaryInformation and \005DocumentSummaryInformation. However, a POI filesystem may contain further property sets with other names or types.

To extract the properties from a POI filesystem, the contents of a property set stream must be parsed into a PropertySet instance. Its subclasses SummaryInformation and DocumentSummaryInformation deal with the well-known property set streams \005SummaryInformation and \005DocumentSummaryInformation. (However, the streams' names are irrelevant. What counts is the format ID of the property set's first section.)
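To illustrate why the format ID, rather than the stream name, identifies a property set, the following is a minimal sketch that inspects the fixed-size stream header described in [MS-OLEPS] using only the JDK. The class FormatIdSniffer and its methods are hypothetical helpers and not part of POI; the offsets assume the standard header layout (byte order mark, version, system identifier, CLSID, section count, then the first section's format ID and offset).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of POI: inspects the header of a property set stream.
class FormatIdSniffer {

    // FMTID_SummaryInformation {F29F85E0-4FF9-1068-AB91-08002B27B3D9} in its on-disk byte order.
    static final byte[] SUMMARY_INFORMATION_FMTID = {
        (byte) 0xE0, (byte) 0x85, (byte) 0x9F, (byte) 0xF2, (byte) 0xF9, 0x4F, 0x68, 0x10,
        (byte) 0xAB, (byte) 0x91, 0x08, 0x00, 0x2B, 0x27, (byte) 0xB3, (byte) 0xD9
    };

    // Returns the 16-byte format ID of the first section.
    static byte[] firstSectionFormatId(byte[] stream) {
        if (stream.length < 48) {
            throw new IllegalArgumentException("Too short for a property set stream header");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrderMark = buf.getShort(0) & 0xFFFF;   // offset 0: always 0xFFFE
        if (byteOrderMark != 0xFFFE) {
            throw new IllegalArgumentException("Missing 0xFFFE byte order mark");
        }
        int sectionCount = buf.getInt(24);              // offset 24: number of sections
        if (sectionCount < 1) {
            throw new IllegalArgumentException("Property set has no sections");
        }
        return Arrays.copyOfRange(stream, 28, 44);      // offset 28: first section's format ID
    }

    // True if the first section carries the Summary Information format ID.
    static boolean isSummaryInformation(byte[] stream) {
        return Arrays.equals(firstSectionFormatId(stream), SUMMARY_INFORMATION_FMTID);
    }

    public static void main(String[] args) {
        // Build a synthetic 48-byte header purely for demonstration.
        byte[] header = new byte[48];
        header[0] = (byte) 0xFE;
        header[1] = (byte) 0xFF;
        header[24] = 1;
        System.arraycopy(SUMMARY_INFORMATION_FMTID, 0, header, 28, 16);
        System.out.println(isSummaryInformation(header));
    }
}
```

A stream with any name whose first section carries this format ID would be treated as Summary Information under this check; a stream named \005SummaryInformation with a different format ID would not.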

The factory method PropertySetFactory.create(org.apache.poi.poifs.filesystem.DirectoryEntry, java.lang.String) creates a PropertySet instance. This method always returns the most specific property set: if it identifies the stream data as Summary Information or as Document Summary Information, it returns an instance of the corresponding class; otherwise it returns a general PropertySet.

A PropertySet contains a list of Sections which can be retrieved with PropertySet.getSections(). Each Section contains a Property array which can be retrieved with Section.getProperties(). Since the vast majority of PropertySets contain only a single Section, the convenience method PropertySet.getProperties() returns the properties of the PropertySet's only Section (throwing a NoSingleSectionException if the PropertySet does not contain exactly one Section).

Each Property has an ID, a type, and a value which can be retrieved with Property.getID(), Property.getType(), and Property.getValue(), respectively. The value's class depends on the property's type. The current implementation does not yet support all property types and restricts the values' classes to String, Integer and Date. A value of a type that is not yet supported is returned as a byte array containing the value's original bytes from the property set stream.
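Because the value comes back as a plain Object, callers typically branch on its runtime class. The following is a minimal sketch of such a dispatch; PropertyValueFormatter and describe are hypothetical names and not part of POI. The final fallback branch is an assumption added for safety in case a value of some other class is returned.

```java
import java.util.Date;

// Hypothetical helper, not part of POI: renders a property value by its runtime class.
class PropertyValueFormatter {

    static String describe(Object value) {
        if (value == null) {
            return "(no value)";
        }
        if (value instanceof String) {
            return "String: " + value;
        }
        if (value instanceof Integer) {
            return "Integer: " + value;
        }
        if (value instanceof Date) {
            // Render as an ISO-8601 instant to avoid locale-dependent output.
            return "Date: " + ((Date) value).toInstant();
        }
        if (value instanceof byte[]) {
            // Unsupported property type: show the raw bytes from the stream as hex.
            StringBuilder hex = new StringBuilder();
            for (byte b : (byte[]) value) {
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%02X", b & 0xFF));
            }
            return "Unparsed bytes: " + hex;
        }
        // Assumption: any other class is reported generically.
        return value.getClass().getName() + ": " + value;
    }

    public static void main(String[] args) {
        System.out.println(describe("Quarterly report"));
        System.out.println(describe(42));
        System.out.println(describe(new byte[] {(byte) 0xFE, 0x01}));
    }
}
```

In real use, the argument would be the result of Property.getValue().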

To retrieve the value of a specific Property, use Section.getProperty(long) or Section.getPropertyIntValue(long).

The SummaryInformation and DocumentSummaryInformation classes provide convenience methods for retrieving well-known properties. For example, an application that wants to retrieve a document's title string just calls SummaryInformation.getTitle() instead of going through the hassle of first finding out what the title's property ID is and then using this ID to get the property's value.
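The steps described above can be sketched end to end as follows. This is an illustrative example, not a definitive implementation: it assumes a recent Apache POI release on the classpath, that the file passed on the command line is an OLE2 document containing a \005SummaryInformation stream, and that catching failures generically is acceptable. SummaryDump and readTitle are hypothetical names.

```java
import java.io.File;

import org.apache.poi.hpsf.Property;
import org.apache.poi.hpsf.PropertySet;
import org.apache.poi.hpsf.PropertySetFactory;
import org.apache.poi.hpsf.Section;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical example class: lists all properties, then reads the title.
class SummaryDump {

    static String readTitle(DirectoryEntry root) throws Exception {
        // The factory returns the most specific PropertySet subclass it can identify.
        PropertySet ps = PropertySetFactory.create(root, SummaryInformation.DEFAULT_STREAM_NAME);

        // Generic access: walk every section and every property it contains.
        for (Section section : ps.getSections()) {
            for (Property p : section.getProperties()) {
                System.out.println("ID " + p.getID() + ", type " + p.getType() + ": " + p.getValue());
            }
        }

        // Convenience access: no need to know the title's property ID.
        if (ps instanceof SummaryInformation) {
            return ((SummaryInformation) ps).getTitle();
        }
        return null; // Stream did not carry the Summary Information format ID.
    }

    public static void main(String[] args) throws Exception {
        // Open the document read-only; the path is supplied by the caller.
        try (POIFSFileSystem fs = new POIFSFileSystem(new File(args[0]), true)) {
            System.out.println("Title: " + readTitle(fs.getRoot()));
        }
    }
}
```

The instanceof check reflects the point made earlier: the stream name alone does not guarantee that the factory returns a SummaryInformation instance.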

See Also:
[MS-OLEPS] Object Linking and Embedding (OLE) Property Set Data Structures

Copyright 2022 The Apache Software Foundation or its licensors, as applicable.