Tika / TIKA-2215

TikaException about "Invalid embedded resource" on a valid PPT file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15, 2.0.0
    • Component/s: parser
    • Labels: None
    • Environment: Windows 7 x64, JVM 1.8.0_101

    Description

      The attached file opens without problems in PowerPoint, but the Tika parser throws the following exception on it:

      org.apache.tika.exception.TikaException: Invalid embedded resource
      at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc:243
      at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources:390
      at org.apache.tika.parser.microsoft.HSLFExtractor.parse:142
      at org.apache.tika.parser.microsoft.OfficeParser.parse:172
      at org.apache.tika.parser.microsoft.OfficeParser.parse:130
      Caused by: java.lang.IndexOutOfBoundsException: Block 32630271 not found
      at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt:486
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next:169
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next:142
      at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully:248
      at org.apache.poi.poifs.filesystem.DocumentInputStream.readFully:165
      at org.apache.poi.poifs.filesystem.DocumentInputStream.readFully:160
      at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc:226
      at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources:390
      at org.apache.tika.parser.microsoft.HSLFExtractor.parse:142
      at org.apache.tika.parser.microsoft.OfficeParser.parse:172
      at org.apache.tika.parser.microsoft.OfficeParser.parse:130
      Caused by: java.lang.IndexOutOfBoundsException: Unable to read 512 bytes from 16706699264 in stream of length 164352
      at org.apache.poi.poifs.nio.ByteArrayBackedDataSource.read:42
      at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt:484
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next:169
      at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next:142
      at org.apache.poi.poifs.filesystem.NDocumentInputStream.readFully:248
      at org.apache.poi.poifs.filesystem.DocumentInputStream.readFully:165
      at org.apache.poi.poifs.filesystem.DocumentInputStream.readFully:160
      at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc:226
      at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources:390
      at org.apache.tika.parser.microsoft.HSLFExtractor.parse:142
      at org.apache.tika.parser.microsoft.OfficeParser.parse:172
      at org.apache.tika.parser.microsoft.OfficeParser.parse:130
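
The two numbers in the nested exceptions appear to be related: assuming the usual OLE2 layout of 512-byte sectors preceded by one 512-byte header, block 32630271 starts at byte (32630271 + 1) * 512 = 16706699264, which is the offset in the innermost message and lies far beyond the 164352-byte stream. The sketch below only reproduces that arithmetic; the class and method names are hypothetical and are not part of Tika or POI.

```java
public class SectorOffset {

    // Assumption: sector n starts right after a header that occupies
    // exactly one sector, so its byte offset is (n + 1) * sectorSize.
    public static long byteOffset(long blockIndex, int sectorSize) {
        return (blockIndex + 1L) * sectorSize;
    }

    // True if the whole sector lies inside a stream of the given length.
    public static boolean fitsIn(long blockIndex, int sectorSize, long streamLength) {
        return byteOffset(blockIndex, sectorSize) + sectorSize <= streamLength;
    }

    public static void main(String[] args) {
        long block = 32630271L;   // from "Block 32630271 not found"
        int sectorSize = 512;     // from "Unable to read 512 bytes"
        long length = 164352L;    // from "stream of length 164352"

        System.out.println(byteOffset(block, sectorSize));      // 16706699264
        System.out.println(fitsIn(block, sectorSize, length));  // false
        System.out.println(length / sectorSize);                // 321
    }
}
```

Under these assumptions the stream holds 321 sectors including the header, so only block indices 0 through 319 are readable, and the requested index looks like a corrupt or misinterpreted pointer rather than a truncated file.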

      Attachments

        1. Iverson.ppt
          8.25 MB
          Seva Alekseyev

            People

              Assignee: Unassigned
              Reporter: Seva Alekseyev (sevaa)