Description
We are using Tika 1.11 to extract text from MS Word documents, and processing some of them fails with errors.
This ticket is related to https://issues.apache.org/jira/browse/TIKA-1733; however, in this case the failure is an unexpected NullPointerException with no clear indication of the underlying error.
Processing a saved copy of the document avoids the error altogether. One difference found between the original document and the saved copy is that (HWPFDocument)document.getRange() returned different values.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@58a306e2
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.Tika.parseToString(Tika.java:496)
    at org.apache.tika.Tika.parseToString(Tika.java:610)
Caused by: java.lang.NullPointerException
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:311)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:169)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 10 more
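
Because the NullPointerException only appears as the cause of the wrapping TikaException, a caller may want to unwrap the cause chain to see which frame actually failed. The following is a minimal sketch using only the JDK; the class name RootCause and the helper rootCause are hypothetical and not part of Tika, and the wrapped exception in main is a stand-in for what Tika.parseToString(...) threw in the trace above.

```java
class RootCause {

    /** Follows getCause() until the innermost throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the TikaException above (assumption: a real caller
        // would catch the exception thrown by Tika.parseToString instead).
        Exception wrapped = new RuntimeException(
                "Unexpected RuntimeException from OfficeParser",
                new NullPointerException());

        Throwable root = rootCause(wrapped);
        System.out.println("Root cause: " + root.getClass().getName());

        // The top frame of the root cause is where the failure occurred;
        // in the trace above that would be WordExtractor.handleParagraph.
        StackTraceElement[] frames = root.getStackTrace();
        if (frames.length > 0) {
            System.out.println("Thrown at: " + frames[0]);
        }
    }
}
```

Logging the root cause and its top frame alongside the document name makes it easier to group failing files by the line that threw, even when the wrapping message stays generic.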