Version: Current

Extract PDF Text With OCR

Description

This Activity uses an OCR engine to read the contents of a PDF file, including headers, and extracts the text.

Properties

Input

From Page Number – Set the page extraction mode to "Range" and specify the page number at which the extraction starts.
Image Format – Specify the image format in which the extracted images are saved.
Image Resize Percentage – Rescales the image by the specified percentage.
OCR Engine – An instance of an OCR engine returned by one of the following activities.
Page Extraction Mode – Set the page extraction mode to "All," "Single," or "Range" to choose which pages are extracted.
Password – The password for the PDF file, if one is required.
PDF File Path – The path of the PDF file from which you want to extract the text.
Single Page Number – Set the page extraction mode to "Single" and specify the page number from which to extract the text.
To Page Number – Set the page extraction mode to "Range" and specify the page number at which the extraction ends.
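
The way the three page extraction modes relate to the page number properties can be sketched in a few lines of Python. This is only an illustration and not the Activity's actual implementation: the function name `resolve_pages`, its parameters, and the assumptions that pages are numbered from 1 and that a range includes both end pages are all hypothetical.

```python
def resolve_pages(mode, total_pages, single_page=None, from_page=None, to_page=None):
    """Return the page numbers selected by a page extraction mode.

    Assumptions (not confirmed by the product documentation):
    pages are 1-based and a "Range" includes both end pages.
    """
    mode = mode.lower()
    if mode == "all":
        # Every page of the document.
        return list(range(1, total_pages + 1))
    if mode == "single":
        if single_page is None or not 1 <= single_page <= total_pages:
            raise ValueError("single_page must be a page inside the document")
        return [single_page]
    if mode == "range":
        if from_page is None or to_page is None:
            raise ValueError("range mode needs both from_page and to_page")
        if not 1 <= from_page <= to_page <= total_pages:
            raise ValueError("expected 1 <= from_page <= to_page <= total_pages")
        return list(range(from_page, to_page + 1))
    raise ValueError(f"unknown page extraction mode: {mode!r}")


# Usage: pick pages 3 to 5 of a 10-page document.
selected = resolve_pages("Range", total_pages=10, from_page=3, to_page=5)
```

In this sketch, "Single Page Number" only matters in "Single" mode, and "From Page Number" and "To Page Number" only matter in "Range" mode, which mirrors how the properties above are described.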

Misc

DisplayName – Add a display name to your Activity.
Private – By default, the Activity logs the values of its properties inside your workflow. If Private is selected, the logging stops.

Optional

Continue On Error – Specifies whether the automation should continue even when the Activity throws an error. If True, the Activity continues without throwing any exceptions. If False, the Activity throws an exception. The default value is False.

Note

If this Activity is placed inside a Try-Catch block and the value of this property is True, the Catch block catches no error.
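
The interaction between Continue On Error and a Try-Catch block can be illustrated with a small Python analogy. The names `run_activity` and `failing_extract` are hypothetical stand-ins, and the sketch only models the behavior described above; it is not the product's code.

```python
def run_activity(activity, continue_on_error=False):
    """Run an activity, optionally suppressing any error it raises."""
    try:
        return activity()
    except Exception:
        if continue_on_error:
            # The error is swallowed here, so callers never see it.
            return None
        raise


def failing_extract():
    # Stand-in for an extraction step that fails.
    raise RuntimeError("OCR failed")


caught = []
try:
    # Analogous to placing the Activity inside a Try-Catch block
    # with Continue On Error set to True.
    run_activity(failing_extract, continue_on_error=True)
except RuntimeError as exc:
    caught.append(exc)
```

Because the error is suppressed inside `run_activity`, the outer `except` branch never runs and `caught` stays empty. With `continue_on_error=False`, the same call would reach the outer handler.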

Tesseract

Page Segmentation Mode – Set the page segmentation mode that Tesseract uses for extracting data.
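
For readers who know Tesseract from the command line, page segmentation modes are passed there as `--psm N`, with N from 0 to 13. The sketch below shows how such a setting could be handed to Tesseract from Python through the `pytesseract` wrapper. How this Activity maps its own option names to these numbers is not documented here, so the helper names and the selection of modes are assumptions for illustration only.

```python
# A few of Tesseract's page segmentation modes (values for --psm).
PSM_MODES = {
    3: "Fully automatic page segmentation, no OSD (Tesseract default)",
    4: "Single column of text of variable sizes",
    6: "Single uniform block of text",
    7: "Single text line",
    11: "Sparse text",
}


def tesseract_config(psm=3):
    """Build the Tesseract config string for a page segmentation mode."""
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes range from 0 to 13")
    return f"--psm {psm}"


def ocr_image(image_path, psm=3):
    """OCR one image file; requires pytesseract, Pillow, and a Tesseract install."""
    # Imported lazily so the helpers above work without these packages.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), config=tesseract_config(psm))
```

A mode such as 6 tends to suit pages with one dense block of text, while 11 is aimed at pages where text is scattered; the best choice depends on the layout of the PDF pages being processed.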

Output

Result – The text extracted from the PDF file using the OCR engine.

Example

Download Example
