Extract Data

Description

The Extract Data activity enables data extraction from multiple web pages, applies filtering, and structures the data into a specified format for further processing.

Design-Time Configuration

The Extract Data Selector Configuration dialog allows the following properties to be configured.

Extract Data Selector

Property	Description
Row Selector	XPath expression that selects each row (record) to extract.
columnSelector	Defines the properties of each column: Column Name (the identifying name of the column), XPath (the XPath that selects the column's element), and Attribute (the attribute whose value is extracted).
Next Page	XPath expression for the element that navigates to the next page.
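As a rough illustration of how a row selector and per-column XPaths combine, here is a minimal Python sketch. It is not the activity's implementation: the markup, the selector strings, and the `extract_rows` helper are all hypothetical, and it relies on the limited XPath subset of Python's standard `xml.etree.ElementTree` module rather than a real browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a product listing page.
PAGE = """
<html><body>
  <div class="product"><h2>Kettle</h2><span class="price">$24.99</span></div>
  <div class="product"><h2>Toaster</h2><span class="price">$31.50</span></div>
</body></html>
"""

# Assumed selectors: one XPath for the rows, one row-relative XPath per column.
ROW_SELECTOR = ".//div[@class='product']"
COLUMNS = {
    "Product Name": "./h2",
    "Price": "./span[@class='price']",
}


def extract_rows(markup, row_selector, columns):
    """Return one dict per matched row, keyed by column name."""
    root = ET.fromstring(markup)
    rows = []
    for row in root.findall(row_selector):
        record = {}
        for name, xpath in columns.items():
            cell = row.find(xpath)  # evaluated relative to the current row
            has_text = cell is not None and cell.text
            record[name] = cell.text.strip() if has_text else None
        rows.append(record)
    return rows


rows = extract_rows(PAGE, ROW_SELECTOR, COLUMNS)
for record in rows:
    print(record)
```

Evaluating each column XPath relative to its row keeps the values of one record together, which is why a row selector is paired with the column selectors.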

The configuration of this activity is easiest to understand through an example use case.

For this example, we will extract the Product Name and Price from this Amazon URL.

Click Configure and select Browser and URL. This opens the Extract Data Selector dialog.

Extract Data Step1

Enter the column name, then click the button to activate element selection.

Extract Data Step2

Select the column element whose data you want to extract. All matching elements are then highlighted.

Extract Data Step3_1

tip

If the process fails to highlight certain elements, manually select them and click Finish to save the selector. The Row Selector and column XPaths will then populate automatically.

Extract Data Step3_2

Retrieve specific attribute values from elements by configuring custom attributes.

Extract Data Attribute
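The difference between extracting an element's text and extracting one of its attributes can be sketched as follows. This is an illustration only: the markup, the `href` value, and the `read_cell` helper are assumptions made for the example, not part of the activity.

```python
import xml.etree.ElementTree as ET

# Hypothetical row element with a link whose attributes carry extra data.
ROW = ET.fromstring(
    '<div class="product">'
    '<a class="title" href="/dp/EXAMPLE1" title="Kettle, 1.7 L">Kettle</a>'
    "</div>"
)


def read_cell(row, xpath, attribute=None):
    """Return the element's text, or the named attribute when one is given."""
    cell = row.find(xpath)
    if cell is None:
        return None
    if attribute is None:
        return (cell.text or "").strip()
    return cell.get(attribute)  # None when the attribute is absent


text = read_cell(ROW, "./a[@class='title']")
link = read_cell(ROW, "./a[@class='title']", attribute="href")
print(text, link)
```

The same XPath can therefore feed two columns, one holding the visible text and one holding an attribute such as the link target.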

Add all required columns by selecting the corresponding elements.

Extract Data Step4

To extract data across pages, specify the element that must be clicked to move to the next page. Alternatively, click the button next to Next Page to activate element selection.
Click any column item to highlight all elements in that column.

Extract Data Preview1

note

Ensure the selector works across multiple pages. Manually edit XPaths if needed for accuracy.

Preview

Click the Preview button to display the extracted data in a tabular format.

Extract Data Preview2

Properties

Input

Delay (in MS) – Specifies the delay in milliseconds before proceeding to the next page, ensuring adequate page load time.
Max Records – Defines the maximum number of records to extract. Set to -1 to extract all available records.
Web Page – References the currently opened web page object.
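One way to picture how Delay and Max Records interact with next-page navigation is the loop below. It is a sketch under stated assumptions: the in-memory `PAGES` dictionary stands in for real web pages, and `extract_all`, `max_records`, and `delay_ms` are hypothetical names, not the activity's API.

```python
import time

# Hypothetical in-memory "site": each page lists its records and the next page.
PAGES = {
    "page1": {"records": ["A", "B", "C"], "next": "page2"},
    "page2": {"records": ["D", "E", "F"], "next": "page3"},
    "page3": {"records": ["G"], "next": None},
}


def extract_all(start, max_records=-1, delay_ms=0):
    """Collect records page by page; max_records=-1 means no limit."""
    collected = []

    def limit_reached():
        return max_records != -1 and len(collected) >= max_records

    current = start
    while current is not None:
        for record in PAGES[current]["records"]:
            if limit_reached():
                break
            collected.append(record)
        next_page = PAGES[current]["next"]
        if limit_reached() or next_page is None:
            break
        time.sleep(delay_ms / 1000)  # wait before moving to the next page
        current = next_page
    return collected


print(extract_all("page1"))
print(extract_all("page1", max_records=4, delay_ms=10))
```

In this sketch the limit is checked before each record and again before navigating, so no extra page is visited once enough records have been collected.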

Misc

DisplayName – Assigns a display name for the activity.
Private – When enabled, prevents logging of property values within the workflow.

Optional

Continue On Error – Defines whether automation continues in case of errors:
- True: The workflow proceeds without throwing an exception.
- False (Default): The activity throws an exception upon encountering an error.
Timeout – Specifies how long (in Timespan format) the activity may run before execution is aborted. The default is 10 minutes.
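The behavior described for Continue On Error and Timeout can be approximated with the Python sketch below. Everything in it is an assumption for illustration: `run_activity` and the step functions are hypothetical, `datetime.timedelta` stands in for the Timespan value, and returning `None` when an error is suppressed is a choice made for this example rather than documented behavior.

```python
import time
from datetime import timedelta

DEFAULT_TIMEOUT = timedelta(minutes=10)  # mirrors the documented default


def run_activity(step, continue_on_error=False, timeout=DEFAULT_TIMEOUT):
    """Run step(deadline); on failure either re-raise or return None."""
    deadline = time.monotonic() + timeout.total_seconds()
    try:
        return step(deadline)
    except Exception:
        if continue_on_error:
            return None  # assumed: the workflow proceeds without a result
        raise


def failing_step(deadline):
    # Hypothetical failure, e.g. a selector that matches nothing.
    raise RuntimeError("selector matched no elements")


def slow_step(deadline):
    # Cooperative timeout: the step checks the deadline while it works.
    while True:
        if time.monotonic() > deadline:
            raise TimeoutError("activity exceeded its timeout")
        time.sleep(0.01)


print(run_activity(failing_step, continue_on_error=True))
```

With `continue_on_error=False`, the same call raises the `RuntimeError` instead, and a step that runs past its deadline surfaces as a `TimeoutError`.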

Output

Result – Stores the extracted data in a structured data table format.

Example

Download Example
