Getting Started

Creating New Project

To create a new document training model project from IntelliTrainer:

  1. Launch IntelliTrainer.exe from the IntelliBuddies installation folder.
  2. In the File tab, click New and select Blank Project.
  3. A new project dialog opens with the following fields:

Field: Description
Project Name: Specify a name for this project. The name must not contain any of the following characters: " < > : / \ | ? *
Location: Specify a location in your file system to save this project. A new folder named after the Project Name specified above is created under this location. All project-related files and resources are stored in this folder.
Image Resize Percent: Select the image resize percentage from the dropdown. This helps the OCR Engine extract text more accurately.
Description: Enter a description of this project for future reference.
  4. Click the Create button to create the project with the details provided above.
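
The character restriction on Project Name can be checked before typing a name into the dialog, for example when project names are generated by a script. The following is a minimal sketch, not part of IntelliTrainer: the function names are hypothetical, and rejecting blank names is an added assumption. Only the list of forbidden characters comes from the table above.

```python
# Characters that must not appear in a project name (taken from the table above).
FORBIDDEN_CHARS = set('"<>:/\\|?*')


def invalid_chars(project_name: str) -> list:
    """Return the forbidden characters found in the name, in order of first appearance."""
    found = []
    for ch in project_name:
        if ch in FORBIDDEN_CHARS and ch not in found:
            found.append(ch)
    return found


def is_valid_project_name(project_name: str) -> bool:
    """ASSUMPTION: a usable name is non-blank and free of forbidden characters."""
    return bool(project_name.strip()) and not invalid_chars(project_name)


if __name__ == "__main__":
    for name in ["Invoices 2024", "Invoices: Q1/Q2"]:
        print(name, is_valid_project_name(name), invalid_chars(name))
```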

Document Templates

You can add, modify, and delete document templates for training your model from IntelliTrainer. To manage document templates, first open the corresponding project in IntelliTrainer.

Adding new document template

  1. Click the Batch tab in the Ribbon Tabs panel.
  2. Click the Add menu in the Ribbon Menu panel.
  3. Select the document template file that you want to train as part of this project.
  4. A new node, named after the selected document template file, is added under the Batch panel.

For example, if the selected document template file is invoice01.pdf, a new node named invoice01.pdf appears under the Batch panel.

Document Template Properties

Once you have added a new document template to the project, you can view and configure the document template properties from the Properties panel.

Property: Description
Name: The name of this document template. By default, it is set to the file name of the document template added. You can modify the name according to your project needs.
Selection Mode: Select the document identification mode. You have the following options:
  • File Name Pattern: Identify the document based on the file name pattern
  • Keywords: Identify the document based on the presence of the specified keywords in the content of the document
  • Select All: Identify the document based on both of the above options
Keywords: The keywords to be matched when the Selection Mode is Keywords. You can add, edit, and remove keywords from here.
Match All Keywords: Check this if all the specified keywords must match to identify a document. By default, the document is identified as belonging to this template if any one of the keywords matches.
Tolerance: The tolerance to be used while matching the keywords inside the document content. The following options are available:
  • Weak: Enables exact matching with zero (0%) tolerance
  • Medium: Enables matching with mild tolerance of up to 25%
  • Strong: Enables matching with strong tolerance of up to 50%
  • Custom: Enables matching with a custom tolerance
Custom Tolerance: The custom tolerance, in percent, to be used when the selected Tolerance is Custom.

Once you configure the document template properties, the changes are reflected in the Batch and Properties panels.
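
To make the interaction between Keywords, Match All Keywords, and Tolerance concrete, here is a rough sketch of keyword-based identification. It is an illustration only and not IntelliTrainer's implementation: the documentation does not say how tolerance is measured, so the sketch assumes that a tolerance of N% accepts text whose similarity ratio (computed with Python's difflib) is at least 100% minus N%. All function names are hypothetical, and only the Keywords selection mode is modeled.

```python
from difflib import SequenceMatcher

# Preset tolerance percentages listed under the Tolerance property.
TOLERANCE_PRESETS = {"Weak": 0, "Medium": 25, "Strong": 50}


def keyword_found(keyword: str, text: str, tolerance_pct: float) -> bool:
    """Compare the keyword against every window of words of the same length.

    ASSUMPTION: a window matches when its similarity ratio is at least
    1 - tolerance_pct / 100. The real matching metric is not documented.
    """
    kw_words = keyword.lower().split()
    if not kw_words:
        return False
    words = text.lower().split()
    target = " ".join(kw_words)
    threshold = 1.0 - tolerance_pct / 100.0
    for i in range(len(words) - len(kw_words) + 1):
        window = " ".join(words[i:i + len(kw_words)])
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False


def matches_template(text, keywords, match_all=False, tolerance="Weak", custom_pct=0):
    """Decide whether a document's text belongs to a template (Keywords mode only)."""
    if not keywords:
        return False
    pct = custom_pct if tolerance == "Custom" else TOLERANCE_PRESETS[tolerance]
    hits = (keyword_found(k, text, pct) for k in keywords)
    # Match All Keywords checked: every keyword must hit; otherwise one is enough.
    return all(hits) if match_all else any(hits)
```

Under these assumptions, a document containing the OCR misread "Invoce Number" is rejected for the keyword "invoice number" at Weak tolerance but accepted at Medium.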

Page Templates

Once you add a new document template to the project, it automatically lists all the pages of this template under the corresponding document template node inside the Batch panel. You can view all the pages by expanding the corresponding document template node in the Batch panel.

Context Menu

You can manage the pages from the Batch panel. IntelliTrainer provides a context menu to manage the pages to be utilized for training under a corresponding document template.

Menu: Description
Add Region: Adds a new region node under this page.
Disable: Disables this page in the training project. A disabled page keeps its node in the Batch panel, so you can enable it again later.
Delete: Deletes this page from the training project. A deleted page cannot be restored.

Page Template Properties

You can view and configure the page properties from the Properties panel for the page selected in the Batch panel.

Property: Description
Name: The name of this page. By default, the name is set to Page #, where # is the page number of the corresponding page inside the document template.
Title - Patterns: The patterns used to identify this document page. The page is identified by matching the patterns specified here.
Title - Match All Pattern: Check this flag to require all the specified patterns to match when identifying this page.
Title - Region: The region on this page in which to search for the Title Patterns.

Regions

The performance of the OCR Engine depends on the size of the image being processed: the smaller the image, the faster the processing. In some cases, extraction accuracy also improves when an accurate Clipping Region is provided to the OCR Engine. By defining regions inside your pages, you can make data extraction both faster and more accurate. You can add a new region under a page using the Add Region option in the context menu of the corresponding page.

Context Menu

The region node inside the Batch panel provides the following context menu options:

Menu: Description
Add Field: Adds a new field under this region.
Copy: Copies the entire region onto the clipboard so that you can paste it to re-use this region under a different page or document template.
Delete: Deletes this region.

Region Properties

You can view and configure the region properties from the Properties panel for the selected region under the Batch panel.

Property: Description
Name: Specify a name for this region. By default, a name is assigned to this region in the format Region #, where # is the region index.
Region: Specify the BoundingRect for this region. By default, the entire image is selected as the region. To select your own region, click the [...] button of the Region property inside the Properties panel. This brings up the region selection dialog on the Image panel. Specify the region by holding the left mouse button, dragging, and releasing the button. Then press the Apply button inside the region selection dialog to set the specified region in the Region property.
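
The effect of a smaller region on the amount of image data handed to the OCR Engine can be estimated with simple rectangle arithmetic. The sketch below is illustrative only: the BoundingRect class, its field names, and the pixel coordinate convention (top-left corner plus width and height) are assumptions, since the stored format of the Region property is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingRect:
    """Hypothetical rectangle in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int

    def clamp_to(self, page_width: int, page_height: int) -> "BoundingRect":
        """Trim the rectangle so it lies entirely inside the page image."""
        left = min(max(self.x, 0), page_width)
        top = min(max(self.y, 0), page_height)
        right = min(max(self.x + self.width, 0), page_width)
        bottom = min(max(self.y + self.height, 0), page_height)
        return BoundingRect(left, top, right - left, bottom - top)

    def area_fraction(self, page_width: int, page_height: int) -> float:
        """Share of the page's pixels covered by the clamped rectangle."""
        clamped = self.clamp_to(page_width, page_height)
        return (clamped.width * clamped.height) / (page_width * page_height)


def rect_from_drag(x0: int, y0: int, x1: int, y1: int) -> BoundingRect:
    """Build a rectangle from mouse press and release points, in any drag direction."""
    return BoundingRect(min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))
```

For instance, a region covering the top-left quarter of a page has an area fraction of 0.25, so the OCR Engine would process a quarter of the pixels compared with the default whole-image region.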

Fields

A Field is the leaf node in the Batch panel. It represents specific information that needs to be extracted from the document. You can add a new field under a region through the region context menu.

Field Properties

You can view and configure the field properties in the Properties panel for the selected field inside the Batch panel.

Property: Description
Name: Specify a name for this field. By default, a name is assigned to the field in the format Field #, where # is the index of this field.
Default Value: The default value to be assigned to this field.
OCR Parameters: The OCR parameters to be used by the OCR Engine while extracting this field's value.
Region: The bounding rectangle on the page where this field's value is located.
Type: The positioning type of the field. It must be one of the following:
  • Absolute Position: The specified region represents the absolute position of the field value
  • Relative to Anchor: The specified region represents the relative position of the field value from the specified anchor text
  • Relative to Field: The specified region represents the relative position of the field value from the specified field
  • Relative to Title: The specified region represents the relative position of the field value from the page title
Relative Anchor Patterns: The anchor patterns to be used when the Type is Relative to Anchor.
Relative Field: The name of the field to be used when the Type is Relative to Field.
Tolerance: The tolerance to be used while matching the anchor patterns inside the document content. The following options are available:
  • Weak: Enables exact matching with zero (0%) tolerance
  • Medium: Enables matching with mild tolerance of up to 25%
  • Strong: Enables matching with strong tolerance of up to 50%
  • Custom: Enables matching with a custom tolerance
Custom Percentage: The custom tolerance, in percent, to be used when the selected Tolerance is Custom.
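
The difference between the absolute and relative field types can be pictured as a coordinate translation. The following sketch is a guess at the general idea, not IntelliTrainer's actual logic: it assumes that for the relative types the field's Region holds an offset from the top-left corner of the located reference (the anchor text, the other field, or the page title). The Rect type and the function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Hypothetical rectangle in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


RELATIVE_TYPES = {"Relative to Anchor", "Relative to Field", "Relative to Title"}


def resolve_field_rect(field_type: str, region: Rect,
                       reference: Optional[Rect] = None) -> Rect:
    """Return the page rectangle to read for a field.

    ASSUMPTION: for relative types, region.x and region.y are offsets from
    the reference rectangle's top-left corner.
    """
    if field_type == "Absolute Position":
        return region
    if field_type not in RELATIVE_TYPES:
        raise ValueError(f"Unknown field type: {field_type}")
    if reference is None:
        raise ValueError(f"{field_type} needs the located reference rectangle")
    return Rect(reference.x + region.x, reference.y + region.y,
                region.width, region.height)
```

The practical consequence is that a relative field follows its reference: if the anchor text shifts down the page from one document to the next, the resolved rectangle shifts with it, whereas an Absolute Position field always reads the same pixels.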

In this way, you can add all the fields that belong to a region. Once a region is complete, you can add further regions under the same page, then move on to the other pages of the current document template, and finally add more document templates to the model.
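
The nesting described so far (document template, page, region, field) can be modeled as a small tree, which may help when planning a training project on paper. This is a hypothetical model for illustration; the class and attribute names are not taken from IntelliTrainer, and skipping disabled pages reflects the Disable menu description rather than any documented internal behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    name: str


@dataclass
class Region:
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Page:
    name: str
    enabled: bool = True
    regions: List[Region] = field(default_factory=list)


@dataclass
class DocumentTemplate:
    name: str
    pages: List[Page] = field(default_factory=list)


def trained_field_paths(template: DocumentTemplate) -> List[str]:
    """List 'page/region/field' paths for every field on an enabled page."""
    return [
        f"{page.name}/{region.name}/{fld.name}"
        for page in template.pages if page.enabled
        for region in page.regions
        for fld in region.fields
    ]
```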

Validating Document Training Model

Once you have completed training for all the document templates, you can validate the document training model by clicking the Validate button inside the Batch ribbon tab menu. Any errors during the validation process will be reported under the Error panel.

Resolve all the errors before publishing or exporting the document training model.

Publishing Document Training Model

Once the validation of your document training model is successful, you can publish the training model so that it can be consumed as part of IntelliBuddies OCR Activities.

You can publish by clicking the Publish button in the Batch ribbon tab menu. This will bring up the publish dialog, asking you to select the location to publish this document training model.

On publishing, the document training model is serialized to a JSON file under the specified location. The name of the model is derived from the project name. The Output panel displays a message indicating the name of the published training model.

You can now use this training model in activities such as Identify Document With OCR and Extract PDF Data With OCR. The JSON file published by IntelliTrainer is the serialized version of DocumentQueries, which is passed as input to these activities.
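
Because the published model is a plain JSON file, it can be inspected with any JSON tool before being handed to an activity. The sketch below uses Python's standard json module to print the nesting of keys and value types. It deliberately assumes nothing about the DocumentQueries schema, which is not documented here; the function names are hypothetical.

```python
import json


def summarize(node, depth=0, max_depth=3):
    """Return indented lines describing keys and value types, without assuming a schema."""
    lines = []
    if depth >= max_depth:
        return lines
    indent = "  " * depth
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Only the first element is described, assuming list items share a shape.
        lines.append(f"{indent}[{len(node)} items, first shown]")
        lines.extend(summarize(node[0], depth + 1, max_depth))
    return lines


def summarize_model_file(path):
    """Parse a published model file and summarize its structure."""
    with open(path, encoding="utf-8") as handle:
        return summarize(json.load(handle))
```

Calling summarize_model_file with the path of the published file and printing the returned lines gives a quick overview of the file's contents. A parse error at this stage indicates that the file was modified or truncated after publishing.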