Tika / TIKA-2290

PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13, 1.14
    • Fix Version/s: 1.15, 2.0.0
    • Component/s: ocr, parser
    • Labels: None

    Description

      I have created a Stack Overflow question on this topic, but I'll reiterate the main issue here.

      I am trying to use Tika JAXRS and add headers to set PDFParser properties, specifically the ocrStrategy property. However, when I add the X-Tika-PDFocrStrategy header, I get an error stating that it is an invalid X-Tika-OCR header.
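
      For reference, a request of the kind described above might be built as in the following sketch. It is not taken from the report: the server URL (tika-server's default port 9998), the /tika endpoint, the placeholder body, and the header value "ocr_only" are all assumptions, and the class and method names are hypothetical. The request is only constructed and printed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds (but does not send) a PUT request to a Tika server
// that carries the PDFParser ocrStrategy setting as an X-Tika-PDF* header.
class OcrStrategyHeaderRequest {

    static HttpRequest build(String serverUrl, byte[] pdfBytes, String strategy) {
        return HttpRequest.newBuilder()
                .uri(URI.create(serverUrl + "/tika"))        // assumed endpoint
                .header("Accept", "text/plain")
                .header("X-Tika-PDFocrStrategy", strategy)   // header named in this issue
                .PUT(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real PDF file.
        byte[] placeholder = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        // "ocr_only" is an assumed value for the strategy.
        HttpRequest request = build("http://localhost:9998", placeholder, "ocr_only");
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) ->
                System.out.println(name + ": " + values));
    }
}
```

      Sending this request with java.net.http.HttpClient against an affected server (1.13 or 1.14) would be expected to produce the error described above.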

      After looking into the source code, I believe the issue might be with the 'fillParseContext' method in the TikaResource.java file.

      The if statement first checks for a key that starts with the OCR header prefix. Since the PDFParser property name contains 'ocr', the code appears to look for a property named 'ocrStrategy' in the OCRParser class, where no such property exists.
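
      To illustrate the routing being discussed, here is a minimal sketch of prefix-based header dispatch. It is not the code from TikaResource.java: the class, the method names, and the "target:property" return format are hypothetical, and only the two header prefixes come from this issue. The idea is to match the complete prefix at the start of the header name, so that an 'ocr' substring later in the name cannot influence which configuration is chosen.

```java
// Hypothetical sketch of routing X-Tika-* headers to a parser configuration.
// Not Tika's implementation; names and return format are illustrative only.
class HeaderRouting {
    static final String OCR_PREFIX = "X-Tika-OCR";
    static final String PDF_PREFIX = "X-Tika-PDF";

    /** Returns "<target>:<property>", or null if the header is not a config header. */
    static String route(String headerName) {
        if (startsWithIgnoreCase(headerName, PDF_PREFIX)) {
            return "pdf:" + uncapitalize(headerName.substring(PDF_PREFIX.length()));
        }
        if (startsWithIgnoreCase(headerName, OCR_PREFIX)) {
            return "ocr:" + uncapitalize(headerName.substring(OCR_PREFIX.length()));
        }
        return null;
    }

    // HTTP header names are case-insensitive, so compare the prefix ignoring case.
    static boolean startsWithIgnoreCase(String value, String prefix) {
        return value.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    // Lower-cases the first character, e.g. "Language" becomes "language".
    static String uncapitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(route("X-Tika-PDFocrStrategy"));
        System.out.println(route("X-Tika-OCRLanguage"));
        System.out.println(route("Content-Type"));
    }
}
```

      With this routing, X-Tika-PDFocrStrategy resolves to the property ocrStrategy on the PDF side and is never looked up on the OCR side.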

          People

            Assignee: Tim Allison (tallison)
            Reporter: Kevin Oberlag (koberlag)