Description
The following PowerPoint file opens fine in PowerPoint:
https://dl.dropboxusercontent.com/u/92341073/TCM%202012_DR_5.ppt
but the Tika parser throws the following error when parsing it:
org.apache.poi.hslf.exceptions.HSLFException: java.util.zip.ZipException: invalid stored block lengths
    at org.apache.poi.hslf.blip.WMF.getData(WMF.java:64)
    at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedPictures(HSLFExtractor.java:324)
    at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:193)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
Caused by: java.util.zip.ZipException: invalid stored block lengths
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.poi.hslf.blip.WMF.getData(WMF.java:58)
    ... 6 more
EDIT: the attached "Research forum" file produces a similar error, "invalid block type".
EDIT2: the attached "Jankovic final Retreat 2002" file produces a similar error, "invalid literal/length code".
EDIT3: the attached "paperfigures" file produces "invalid distance too far back". Something is going wrong when the compressed (zlib) data in these PowerPoint files is inflated.
EDIT4: for "Lab meeting", the error is "Unexpected end of ZLIB input stream".
The "suba" file exhibits a similar error, "invalid distance too far back", but in a different exception.
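The messages listed above are all reported while inflating zlib data through java.util.zip, as the stack trace shows for WMF.getData. The following is a minimal sketch, not POI or Tika code, of how such failures can be reproduced and tolerated with the JDK alone. The class InflateProbe, its Result type, and the inflateLeniently and deflate helpers are hypothetical names introduced only for illustration, and the sample inputs are synthetic rather than taken from the attached files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, for illustration only; not part of POI or Tika.
final class InflateProbe {

    /** Outcome of one inflate attempt: bytes recovered, plus the failure if any. */
    static final class Result {
        final byte[] data;
        final IOException error; // null when the stream inflated cleanly

        Result(byte[] data, IOException error) {
            this.data = data;
            this.error = error;
        }
    }

    /** Inflates a zlib stream, keeping whatever was decoded before a failure. */
    static Result inflateLeniently(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
            return new Result(out.toByteArray(), null);
        } catch (IOException e) {
            // ZipException (corrupt data) and EOFException (truncated data)
            // are both IOExceptions, so one catch covers both failure modes.
            return new Result(out.toByteArray(), e);
        }
    }

    /** Compresses bytes into a zlib stream, used here only to build test input. */
    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "placeholder picture bytes".getBytes(StandardCharsets.UTF_8);
        byte[] good = deflate(raw);
        // Cutting the stream short models a truncated embedded picture.
        byte[] truncated = Arrays.copyOf(good, good.length / 2);
        // Valid zlib header (0x78 0x9C), then a final stored block whose
        // LEN (0x0005) and NLEN (0x0000) fields are not complements.
        byte[] badStored = {0x78, (byte) 0x9C, 0x01, 0x05, 0x00, 0x00, 0x00};

        for (byte[] input : new byte[][] {good, truncated, badStored}) {
            Result r = inflateLeniently(input);
            System.out.println(r.data.length + " bytes recovered, error=" + r.error);
        }
    }
}
```

Whether a caller should keep the partial bytes or skip the picture entirely is a design choice; the sketch only shows that both the corrupt-data and the truncated-data cases can be caught at the point of inflation instead of aborting the whole parse.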
Attachments
Issue Links
- relates to TIKA-2159 Handle pre-parse embedded object exceptions uniformly and more robustly (Resolved)