POI-XWPF - A Quick Guide
XWPF has a fairly stable core API that provides read and write access to the main parts of a Word .docx file, but it is not complete. For some tasks, you may need to drop down to the low-level XMLBeans objects and manipulate the OOXML structure directly. If you find yourself having to do this, please consider sending in a patch to enhance XWPF; see the "Contribution to POI" page.
Basic Text Extraction
For basic text extraction, use org.apache.poi.xwpf.extractor.XWPFWordExtractor. It is constructed from an XWPFDocument or an OPCPackage, so to read from an input stream, open the stream as an XWPFDocument first. The getText() method returns the text of all the paragraphs, along with tables, headers etc.
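As a minimal sketch of this, the class below wraps an input stream in an XWPFDocument, hands it to XWPFWordExtractor, and returns the result of getText(). The class name BasicExtraction and the sampleDocx() helper, which builds a tiny document in memory so the example needs no file on disk, are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical example class, not part of POI.
class BasicExtraction {

    // Reads a .docx from the stream and returns all of its text.
    // Closing the extractor also releases the underlying document.
    static String extractText(InputStream in) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
            return extractor.getText();
        }
    }

    // Illustration-only helper: builds a one-paragraph .docx in memory.
    static byte[] sampleDocx() throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Hello from XWPF");
            doc.write(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(extractText(new ByteArrayInputStream(sampleDocx())));
    }
}
```

For a real file, the same extractText method could be given a FileInputStream instead of the in-memory stream.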
Specific Text Extraction
To get specific bits of text, first create an org.apache.poi.xwpf.usermodel.XWPFDocument. Select the IBodyElement of interest (table, paragraph etc.), and from there get an XWPFRun. Finally, fetch the text and properties from that run.
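The walk from document to body element to run might look roughly like the sketch below. It visits each IBodyElement, descends into paragraphs directly and into tables cell by cell, and records each run's text together with one property (bold). The class RunWalker, its methods, and the "bold:"/"plain:" labels are hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Hypothetical example class, not part of POI.
class RunWalker {

    // Returns one entry per run, in document order, tagged with its bold state.
    static List<String> describeRuns(XWPFDocument doc) {
        List<String> out = new ArrayList<>();
        for (IBodyElement element : doc.getBodyElements()) {
            if (element instanceof XWPFParagraph) {
                collect((XWPFParagraph) element, out);
            } else if (element instanceof XWPFTable) {
                for (XWPFTableRow row : ((XWPFTable) element).getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph p : cell.getParagraphs()) {
                            collect(p, out);
                        }
                    }
                }
            }
        }
        return out;
    }

    private static void collect(XWPFParagraph paragraph, List<String> out) {
        for (XWPFRun run : paragraph.getRuns()) {
            String text = run.getText(0); // null when the run holds no text element
            if (text != null) {
                out.add((run.isBold() ? "bold:" : "plain:") + text);
            }
        }
    }

    // Illustration-only helper: a paragraph with two runs, followed by a 1x1 table.
    static XWPFDocument sample() {
        XWPFDocument doc = new XWPFDocument();
        XWPFParagraph p = doc.createParagraph();
        p.createRun().setText("Plain text, ");
        XWPFRun bold = p.createRun();
        bold.setBold(true);
        bold.setText("bold text");
        XWPFTable table = doc.createTable(1, 1);
        table.getRow(0).getCell(0).addParagraph().createRun().setText("cell text");
        return doc;
    }
}
```

Other run properties, such as isItalic() or getFontSize(), can be read in the same place as isBold().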
Headers and Footers
To get at the headers and footers of a Word document, first create an org.apache.poi.xwpf.usermodel.XWPFDocument. Next, create an org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy, passing it your XWPFDocument. The XWPFHeaderFooterPolicy then gives you access to the headers and footers, including the first, even and odd page ones if they are defined in your document.
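A sketch of reading the default header and footer is shown below. It assumes a reasonably recent POI release in which the policy class is org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy and XWPFDocument.createHeader(HeaderFooterType) is available for building the sample; the class HeaderFooterReader and its methods are hypothetical. The methods declare throws Exception because older releases declare checked exceptions on the policy constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;

// Hypothetical example class, not part of POI.
class HeaderFooterReader {

    // Text of the default (odd page) header, or "" if the document has none.
    static String defaultHeaderText(XWPFDocument doc) throws Exception {
        XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(doc);
        XWPFHeader header = policy.getDefaultHeader();
        return header == null ? "" : header.getText();
    }

    // Text of the default (odd page) footer, or "" if the document has none.
    static String defaultFooterText(XWPFDocument doc) throws Exception {
        XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(doc);
        XWPFFooter footer = policy.getDefaultFooter();
        return footer == null ? "" : footer.getText();
    }

    // Illustration-only helper: writes a document with a header and footer
    // to memory and reads it back, as if it had been loaded from a file.
    static XWPFDocument sampleDocument() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Body text");
            doc.createHeader(HeaderFooterType.DEFAULT)
               .createParagraph().createRun().setText("Report header");
            doc.createFooter(HeaderFooterType.DEFAULT)
               .createParagraph().createRun().setText("Report footer");
            doc.write(out);
        }
        return new XWPFDocument(new ByteArrayInputStream(out.toByteArray()));
    }
}
```

The first page and even page variants are reached the same way, through getFirstPageHeader(), getEvenPageHeader() and their footer counterparts, each of which may return null.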
To change text, start from an XWPFParagraph, from which you can fetch the existing XWPFRun elements that make up the text. To add new text, the createRun() method adds a new XWPFRun to the end of the list. insertNewRun(int) can be used instead to add a new XWPFRun at a specific position in the paragraph.
Once you have an XWPFRun, you can use the setText(String) method to add text to it, or setText(String, int) to replace the text at a given position. Whitespace elements such as tabs and line breaks are not written as part of the text; add them with methods like addTab() and addCarriageReturn().
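The run methods described above can be combined as in the following sketch, which appends a run, inserts another in front of it, and uses addTab() and addCarriageReturn() for whitespace. The class ChangingText and its buildParagraph method are hypothetical names for this example.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical example class, not part of POI.
class ChangingText {

    // Builds a paragraph of three runs:
    // "Hello" + tab, then "world", then a line break + "Second line".
    static XWPFParagraph buildParagraph(XWPFDocument doc) {
        XWPFParagraph paragraph = doc.createParagraph();

        // createRun() appends to the end of the paragraph's run list.
        XWPFRun world = paragraph.createRun();
        world.setText("world");

        // insertNewRun(0) places a new run before the existing one.
        XWPFRun hello = paragraph.insertNewRun(0);
        hello.setText("Hello");
        hello.addTab(); // the tab follows the text inside this run

        // A carriage return added before the text starts the run on a new line.
        XWPFRun second = paragraph.createRun();
        second.addCarriageReturn();
        second.setText("Second line");

        return paragraph;
    }
}
```

The order of calls on a run matters: elements are stored in the order they are added, so calling addTab() before setText() would put the tab in front of the text instead of after it.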