
Website Ingestion

Overview

Website ingestion allows organizations to crawl, process, and semantically index website content for use within IB-X Conversational Agents.

The ingestion process extracts content from the configured website, generates semantic embeddings, and stores the processed knowledge so that the Agent can retrieve it during conversations.

Knowledge ingested through this workflow is available only to the Agent from which the ingestion was configured.

Website ingestion is performed through the Knowledge Base Configuration wizard available inside the Agent Designer.

The wizard guides users through the following stages:

Select the source type
Configure website details
Select content and collections
Review and process ingestion

Configuring Website Ingestion

To configure website ingestion:

Open the required Agent in the Agent Designer
Click the Ingestions button on the Designer canvas
Click Add Ingestion

The Knowledge Base Configuration wizard is displayed.

Step 1 — Source

The Source step allows users to choose the type of knowledge source to ingest.

Select:

Website URL

This option enables website crawling and semantic ingestion of website content.

The wizard may offer the following source types:

Source Type	Description
Website URL	Crawl and ingest website content
File Upload	Upload and ingest supported files
Video URL	Extract transcript and metadata from video content

Once the Website URL option is selected, continue to the next step.

Step 2 — Details

The Details step captures the primary website ingestion configuration.

Data Source Name

Specify a friendly display name for the ingestion source.

This name is displayed in the Ingestions dashboard and helps identify the configured knowledge source.

Example:

IB Website

Website URL

Specify the website address that should be crawled and ingested.

Example:

https://www.intellibuddies.com

The ingestion engine uses this URL as the starting point for website discovery and extraction.
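
As a rough illustration of what using a URL as a starting point can mean, the Python sketch below resolves the links found on a page against the page URL and keeps only those on the same host as the start URL. The names `LinkCollector` and `discover_links`, and the same-host scoping rule, are assumptions made for illustration; the ingestion engine's actual discovery rules are not described in this document.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(start_url: str, page_url: str, html: str) -> list[str]:
    """Return absolute, fragment-free links on the same host as start_url.

    Hypothetical helper: same-host scoping is an assumed rule, not the
    documented behavior of the ingestion engine.
    """
    collector = LinkCollector()
    collector.feed(html)
    start_host = urlparse(start_url).netloc.lower()
    found: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        same_host = parsed.netloc.lower() == start_host
        if parsed.scheme in ("http", "https") and same_host:
            if absolute not in found:  # keep first occurrence only
                found.append(absolute)
    return found
```

Links to other hosts and non-web schemes such as `mailto:` are dropped, and duplicate URLs are reported once.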

Business Category

Optionally select the business category that best describes the website content.

The selected category helps the ingestion engine optimize semantic structuring and retrieval behavior for the processed content.

Supported categories currently include:

Category	Description
Customer Support	Support portals and troubleshooting content
E-commerce	Product catalogs and shopping websites
Documentation	Technical and product documentation
Blog	Article and blog-based content
Knowledge Base	Structured help and knowledge systems
Forum	Community discussion content
Other	General-purpose websites

Note: Selecting a Business Category helps improve semantic organization, retrieval quality, and search efficiency when the Conversational Agent retrieves content.

After configuring the website details, click Continue.
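
The three fields captured in this step can be thought of as a small configuration record. The sketch below models them in Python and applies basic checks. The class name, field names, and validation rules (for example, treating Data Source Name as required) are hypothetical and do not represent a documented IB-X API; only the category names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Category names taken from the Business Category table.
SUPPORTED_CATEGORIES = {
    "Customer Support", "E-commerce", "Documentation",
    "Blog", "Knowledge Base", "Forum", "Other",
}


@dataclass
class WebsiteSourceDetails:
    """Hypothetical model of the values entered in the Details step."""

    data_source_name: str
    website_url: str
    business_category: Optional[str] = None  # optional in the wizard

    def validation_errors(self) -> list[str]:
        """Return a list of problems; an empty list means the record is valid."""
        errors: list[str] = []
        if not self.data_source_name.strip():
            # Assumption: a non-empty display name is required.
            errors.append("Data Source Name is required")
        parsed = urlparse(self.website_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("Website URL must be an absolute http(s) URL")
        if (self.business_category is not None
                and self.business_category not in SUPPORTED_CATEGORIES):
            errors.append(f"Unknown Business Category: {self.business_category}")
        return errors


details = WebsiteSourceDetails("IB Website", "https://www.intellibuddies.com")
print(details.validation_errors())
```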

Step 3 — Content

The Content step allows users to control what content should be extracted and processed from the selected website.

This step includes:

Content Types
Additional Collections
Website Structure
Content Summary

Content Types

Select the types of content that should be extracted during ingestion.

Supported content types currently include:

Content Type	Description
text	Website text and written content
image	Images discovered on the website
video	Embedded or linked video content
audio	Audio-based content
document	Downloadable or linked documents

Multiple content types can be selected depending on the ingestion requirements.

Additional Collections

Additional Collections allow the ingestion engine to create specialized semantic groupings in addition to the primary text content.

Supported collections currently include:

Collection	Description
Products	Product-related content and metadata
FAQs	Frequently asked questions
API References	Technical API documentation
Code Snippets	Source code and implementation examples

These collections help improve downstream semantic retrieval and contextual relevance.

Website Structure

The Website Structure section displays the discovered website hierarchy and URLs extracted from the configured website.

Users can:

Select individual URLs
Select specific sections
Select nested website structures
Select all discovered URLs

Only the selected content will be processed during ingestion.

This allows organizations to precisely control which sections of a website become part of the Agent knowledge store.
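
To make the idea of selecting a section together with its nested URLs concrete, the sketch below picks every discovered URL that sits at or under a chosen section path. The function name `select_section` and the path-prefix rule are assumptions for illustration; the wizard's internal selection logic is not documented here, and the sample paths are invented.

```python
from urllib.parse import urlparse


def select_section(discovered: list[str], section_url: str) -> list[str]:
    """Return the discovered URLs at or below section_url on the same host.

    Hypothetical helper: nesting is decided by URL path segments, so
    "/products-old" is not treated as part of the "/products" section.
    """
    section = urlparse(section_url)
    prefix = section.path.rstrip("/")
    selected: list[str] = []
    for url in discovered:
        parsed = urlparse(url)
        path = parsed.path.rstrip("/")
        under_section = path == prefix or path.startswith(prefix + "/")
        if parsed.netloc == section.netloc and under_section:
            selected.append(url)
    return selected


# Invented example paths under the documented example domain.
discovered = [
    "https://www.intellibuddies.com/",
    "https://www.intellibuddies.com/products",
    "https://www.intellibuddies.com/products/ib-x",
    "https://www.intellibuddies.com/products-old",
    "https://www.intellibuddies.com/blog/post-1",
]
for url in select_section(discovered, "https://www.intellibuddies.com/products"):
    print(url)
```

Passing the site root as `section_url` corresponds to selecting all discovered URLs.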

Content Summary

The Content Summary section displays ingestion metrics based on the selected content.

Typical metrics include:

Metric	Description
Selected URLs	Number of URLs selected for ingestion
Content Types	Number of enabled content types

The summary dynamically updates based on the current content selection.
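
A minimal sketch of how these two metrics could be derived from the current selection is shown below. The function name and the dictionary shape are hypothetical; only the metric names come from the table above.

```python
def content_summary(selected_urls: set[str], content_types: set[str]) -> dict[str, int]:
    """Recompute the summary metrics from the current selection (hypothetical)."""
    return {
        "Selected URLs": len(selected_urls),
        "Content Types": len(content_types),
    }


selected = {"https://www.intellibuddies.com/"}
types = {"text", "image"}
print(content_summary(selected, types))

# Changing the selection and recomputing reflects the update.
selected.add("https://www.intellibuddies.com/products")
print(content_summary(selected, types))
```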

Starting Ingestion

After completing content selection, click Process to begin ingestion.

The ingestion engine performs:

Website crawling
Content extraction
Content normalization
Semantic chunk generation
Embedding creation
Vector storage
Metadata persistence

The processed knowledge becomes available to the current Conversational Agent after ingestion completes successfully.
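
The stages listed above, from normalization through storage, can be sketched end to end in a few lines of Python. This is a toy illustration only: the word-window chunker, the hash-based `toy_embed` function, and the in-memory list standing in for a vector store are all assumptions, and none of them reflect the models or storage that the ingestion engine actually uses. Crawling and extraction are assumed to have already produced plain text per URL.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class StoredChunk:
    """One stored record: the vector plus the metadata needed to trace it."""
    url: str
    index: int
    text: str
    vector: list[float]


def normalize(raw: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a stand-in for semantic chunking)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Toy embedding: hash each word into a bucket, then L2-normalize.

    A real pipeline would call an embedding model here.
    """
    vector = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector] if norm else vector


def ingest(pages: dict[str, str]) -> list[StoredChunk]:
    """Run normalize -> chunk -> embed -> store for already-extracted page text."""
    store: list[StoredChunk] = []
    for url, raw in pages.items():
        for i, piece in enumerate(chunk(normalize(raw))):
            store.append(StoredChunk(url, i, piece, toy_embed(piece)))
    return store
```

Keeping the source URL and chunk index next to each vector is what makes URL-level operations, such as removing everything ingested from one page, straightforward in a design like this.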

Monitoring Ingestion

Once ingestion begins, the configured source appears in the Ingestions dashboard.

The dashboard provides visibility into:

Ingestion status
Processed URLs
Ingestion runs
Runtime metrics
URL-level operations

Users can later:

Review ingestion runs
Rerun ingestion
Delete URLs
Delete ingestion runs
Edit ingestion configuration

Notes

Website ingestion is Agent-specific: knowledge ingested for one Agent is not shared with other Agents.
Ingestion usage counts toward the environment-level ingestion quota.
The Characters Limit is governed by the DATA_INGESTION_LIMIT subscription entitlement.
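
As an illustration of a character-based limit, the sketch below checks whether a batch of extracted text would fit within the remaining allowance. It assumes the limit is a simple count of characters across ingested text; how the platform actually measures usage against DATA_INGESTION_LIMIT is not specified here, and the function names and numbers are hypothetical.

```python
def remaining_characters(limit: int, used: int) -> int:
    """Characters still available under the limit (never negative)."""
    return max(limit - used, 0)


def fits_quota(texts: list[str], limit: int, used: int = 0) -> bool:
    """Return True if the combined length of texts fits the remaining allowance.

    Assumption: usage is measured as a plain character count.
    """
    needed = sum(len(text) for text in texts)
    return needed <= remaining_characters(limit, used)


# Illustrative numbers only; the real limit comes from the subscription.
print(fits_quota(["abc", "de"], limit=10, used=5))
print(fits_quota(["abc", "de"], limit=10, used=6))
```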
