Extract Tables From PDF

Description

Extracts tables from a PDF file.

Design Time Configuration

At design time, you can configure the following properties:

  • PDF File Path—The PDF file path from which you want to extract tables.
  • Page(s) to read - Use the options below to read the tables on one or more pages.
    Option   Description
    All      Choose this option to read the tables on all pages.
    Single   Choose this option to read the tables on one page.
    Range    Choose this option to read the tables in the specified page range.
  • From page - This option is visible when you choose Range in Page(s) to read. Specify the first page of the range from which to read tables.
  • To page - This option is visible when you choose Range in Page(s) to read. Specify the last page of the range from which to read tables.
  • Page number - This option is visible when you choose Single in Page(s) to read. Specify the page number from which to read tables.
  • Table Style - Specify whether the table you are extracting is in Spreadsheet format or is a Basic table with an irregular structure.
    Option        Description
    Spreadsheet   Choose this option if the table is in Spreadsheet format.
    Basic         Choose this option if the table has an irregular structure.
  • Result – Returns Tables of type Dataset extracted from the PDF file.
note

If a table spans multiple pages, the activity merges the data into one data table only if the header row is repeated at the start of every page. Otherwise, the activity returns a separate data table per page.

note

If two consecutive tables in the document have identical header rows, the activity merges them into one data table.

  • Table Data Preview dialog - Click the Preview button to open a dialog in which you can test different settings and preview the data. The process fetches the same data when it executes.
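
The merge behavior described in the notes above can be pictured with a small sketch. This is not the activity's implementation; it only assumes that each extracted table is a list of rows whose first row is the header, and the function name merge_page_tables is hypothetical.

```python
from typing import List

Table = List[List[str]]  # assumption: a table is a list of rows, row 0 is the header


def merge_page_tables(page_tables: List[Table]) -> List[Table]:
    """Merge consecutive tables that start with the same header row."""
    merged: List[Table] = []
    for table in page_tables:
        if not table:
            continue  # nothing to merge for an empty table
        if merged and merged[-1][0] == table[0]:
            # Header repeated: append only the data rows to the previous table.
            merged[-1].extend(list(row) for row in table[1:])
        else:
            # Different header: start a new table (copy rows so inputs stay untouched).
            merged.append([list(row) for row in table])
    return merged


page1 = [["Item", "Qty"], ["Bolt", "10"]]
page2 = [["Item", "Qty"], ["Nut", "25"]]
page3 = [["Region", "Total"], ["North", "7"]]

result = merge_page_tables([page1, page2, page3])
print(len(result))  # 2
print(result[0])    # [['Item', 'Qty'], ['Bolt', '10'], ['Nut', '25']]
```

In this sketch the first two tables share a header and collapse into one, while the third keeps its own table because its header differs.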

Properties

Input

  • Include Column Names—Specifies whether the first row in the tables should be considered a header row containing the column names.
note

If Include Column Names is false, the columns receive default names such as Column0, Column1, and so on.

  • Password - Specify the password if the PDF file is password protected.
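
As a rough illustration of the Include Column Names option, the sketch below shows how the same extracted rows could be interpreted in both settings. It is an assumption about the behavior described above, not the activity's code; split_header is a hypothetical helper and rows are modeled as plain lists of strings.

```python
from typing import List, Tuple

Row = List[str]


def split_header(rows: List[Row], include_column_names: bool) -> Tuple[List[str], List[Row]]:
    """Return (column names, data rows) for one extracted table."""
    if not rows:
        return [], []
    if include_column_names:
        # The first row is treated as the header row.
        return list(rows[0]), [list(r) for r in rows[1:]]
    # No header row: generate Column0, Column1, ... and keep every row as data.
    width = max(len(r) for r in rows)
    names = [f"Column{i}" for i in range(width)]
    return names, [list(r) for r in rows]


rows = [["Name", "Age"], ["Ann", "31"]]

print(split_header(rows, True))   # (['Name', 'Age'], [['Ann', '31']])
print(split_header(rows, False))  # (['Column0', 'Column1'], [['Name', 'Age'], ['Ann', '31']])
```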

Misc

  • DisplayName – Add a display name to your activity.
  • Private - By default, the activity logs the values of its properties inside your workflow. If Private is selected, logging stops.

Optional

  • Continue On Error - Specifies whether the automation should continue even when the activity throws an error. If True, the activity continues without throwing any exceptions. If False, the activity throws an exception. The default value is False.
note

If this activity is placed inside a Try-Catch block and the value of this property is True, the Catch block does not catch any error.
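
The interaction between Continue On Error and an enclosing Try-Catch block can be sketched as follows. This is only an analogy in Python, assuming the semantics stated above; run_activity and failing_step are hypothetical names, not part of the product.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_activity(action: Callable[[], T], continue_on_error: bool = False) -> Optional[T]:
    """Run an action; swallow its error when continue_on_error is True."""
    try:
        return action()
    except Exception:
        if continue_on_error:
            return None  # error is suppressed, so the caller never sees it
        raise  # default behavior: the error propagates to the caller


def failing_step() -> str:
    raise ValueError("could not open the PDF file")


caught = []
for flag in (False, True):
    try:
        run_activity(failing_step, continue_on_error=flag)
    except ValueError as error:
        # Only reached when continue_on_error is False.
        caught.append((flag, str(error)))

print(caught)  # [(False, 'could not open the PDF file')]
```

With the flag set to True, the surrounding handler has nothing to catch, which mirrors the note above.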

Example

In this example, we show how to extract table data from a PDF document and write it to CSV files. The PDF file shipped with the example process contains multiple tables, which are extracted into CSV files.
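
Outside the designer, the last step of this example (one CSV file per extracted table) could look roughly like the sketch below. It uses only the Python standard library and assumes the tables have already been extracted into lists of rows; write_tables_to_csv, the sample data, and the table_N.csv naming are illustrative assumptions, not what the downloadable example does.

```python
import csv
import tempfile
from pathlib import Path
from typing import List

Table = List[List[str]]


def write_tables_to_csv(tables: List[Table], out_dir: Path, prefix: str = "table") -> List[Path]:
    """Write each table to its own CSV file and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, rows in enumerate(tables, start=1):
        path = out_dir / f"{prefix}_{index}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths


# Hypothetical stand-in for the tables returned in the Result property.
tables = [
    [["Item", "Qty"], ["Bolt", "10"]],
    [["Region", "Total"], ["North", "7"]],
]

with tempfile.TemporaryDirectory() as tmp:
    written = write_tables_to_csv(tables, Path(tmp))
    print([p.name for p in written])  # ['table_1.csv', 'table_2.csv']
```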

Download Example
