Website Ingestion
Overview
Website ingestion allows organizations to crawl, process, and semantically index website content for use within IB-X Conversational Agents.
The ingestion process extracts content from the configured website, generates semantic embeddings, and stores the processed knowledge so that the Agent can retrieve it during conversations.
Knowledge ingested through this workflow is available only to the Agent from which the ingestion was configured.
Website ingestion is performed through the Knowledge Base Configuration wizard available inside the Agent Designer.
The wizard guides users through the following stages:
- Select the source type
- Configure website details
- Select content and collections
- Review and process ingestion
Configuring Website Ingestion
To configure website ingestion:
- Open the required Agent in the Designer
- Click the Ingestions button on the Designer canvas
- Click Add Ingestion
The Knowledge Base Configuration wizard is displayed.
Step 1 — Source
The Source step allows users to choose the type of knowledge source to ingest.
Select Website URL.
This option enables website crawling and semantic ingestion of website content.
The wizard may offer the following source types:
| Source Type | Description |
|---|---|
| Website URL | Crawl and ingest website content |
| File Upload | Upload and ingest supported files |
| Video URL | Extract transcript and metadata from video content |
Once the Website URL option is selected, continue to the next step.
Step 2 — Details
The Details step captures the primary website ingestion configuration.
Data Source Name
Specify a friendly display name for the ingestion source.
This name is displayed in the Ingestions dashboard and helps identify the configured knowledge source.
Example:
- IB Website
Website URL
Specify the website address that should be crawled and ingested.
Example:
The ingestion engine uses this URL as the starting point for website discovery and extraction.
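To make "starting point for website discovery" concrete, the following is a minimal sketch of how links on a fetched page could be resolved against the starting URL and limited to the same host. It is an illustration only, not the IB-X crawler: the `LinkCollector` class and `discover_links` function are hypothetical names, and the same-host rule is an assumption about typical crawler behavior.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(start_url, html):
    """Return same-host URLs linked from the page, in first-seen order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(start_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(start_url, href))
        # Assumption: only pages on the starting host are candidates.
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

In this sketch, links to other hosts and non-web links such as `mailto:` are skipped because their host does not match the starting URL.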
Business Category
Optionally select the business category that best describes the website content.
The selected category helps the ingestion engine optimize semantic structuring and retrieval behavior for the processed content.
Supported categories currently include:
| Category | Description |
|---|---|
| Customer Support | Support portals and troubleshooting content |
| E-commerce | Product catalogs and shopping websites |
| Documentation | Technical and product documentation |
| Blog | Article and blog-based content |
| Knowledge Base | Structured help and knowledge systems |
| Forum | Community discussion content |
| Other | General-purpose websites |
Business Category selection helps improve semantic organization, retrieval quality, and search efficiency during Conversational Agent retrieval operations.
After configuring the website details, click Continue.
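The fields captured in this step can be pictured as a small record with a few sanity checks. The sketch below is hypothetical: `WebsiteDetails` and its field names are not part of any documented IB-X API, the category names are copied from the table above, and treating the Data Source Name as mandatory is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Category labels as listed in the table above; internal identifiers may differ.
BUSINESS_CATEGORIES = {
    "Customer Support", "E-commerce", "Documentation", "Blog",
    "Knowledge Base", "Forum", "Other",
}


@dataclass
class WebsiteDetails:
    """Hypothetical record mirroring the Details step fields."""
    data_source_name: str
    website_url: str
    business_category: Optional[str] = None  # optional in the wizard

    def validation_errors(self):
        """Return a list of human-readable problems; empty when the record looks valid."""
        errors = []
        if not self.data_source_name.strip():
            # Assumption: a display name is required.
            errors.append("Data Source Name is required")
        parsed = urlparse(self.website_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("Website URL must be an absolute http(s) URL")
        if (self.business_category is not None
                and self.business_category not in BUSINESS_CATEGORIES):
            errors.append("Unknown Business Category")
        return errors
```

A record such as `WebsiteDetails("IB Website", "https://www.example.com")` passes these checks; the URL here is a placeholder, not a value from the product.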
Step 3 — Content
The Content step allows users to control what content should be extracted and processed from the selected website.
This step includes:
- Content Types
- Additional Collections
- Website Structure Selection
- Content Summary
Content Types
Select the types of content that should be extracted during ingestion.
Supported content types currently include:
| Content Type | Description |
|---|---|
| text | Website text and written content |
| image | Images discovered on the website |
| video | Embedded or linked video content |
| audio | Audio-based content |
| document | Downloadable or linked documents |
Multiple content types can be selected depending on the ingestion requirements.
Additional Collections
Additional Collections allow the ingestion engine to create specialized semantic groupings in addition to the primary text content.
Supported collections currently include:
| Collection | Description |
|---|---|
| Products | Product-related content and metadata |
| FAQs | Frequently asked questions |
| API References | Technical API documentation |
| Code Snippets | Source code and implementation examples |
These collections help improve downstream semantic retrieval and contextual relevance.
Website Structure
The Website Structure section displays the discovered website hierarchy and URLs extracted from the configured website.
Users can:
- Select individual URLs
- Select specific sections
- Select nested website structures
- Select all discovered URLs
Only the selected content will be processed during ingestion.
This allows organizations to precisely control which sections of a website become part of the Agent knowledge store.
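The selection options above can be thought of as a filter over the discovered URLs. The following sketch assumes that selecting a section means selecting every discovered URL at or below that path; `select_urls` is a hypothetical helper, not a product function.

```python
def select_urls(discovered, sections=(), individual=()):
    """Return discovered URLs picked individually or located under a selected section."""
    prefixes = [section.rstrip("/") for section in sections]
    chosen = []
    for url in discovered:
        # A URL belongs to a section if it is the section root or sits below it.
        in_section = any(url == p or url.startswith(p + "/") for p in prefixes)
        if url in individual or in_section:
            chosen.append(url)
    return chosen
```

Matching on `prefix + "/"` rather than on the bare prefix keeps a section such as `/docs` from accidentally including a sibling path such as `/docs-old`.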
Content Summary
The Content Summary section displays ingestion metrics based on the selected content.
Typical metrics include:
| Metric | Description |
|---|---|
| Selected URLs | Number of URLs selected for ingestion |
| Content Types | Number of enabled content types |
The summary dynamically updates based on the current content selection.
Starting Ingestion
After completing content selection, click Process to begin ingestion.
The ingestion engine performs:
- Website crawling
- Content extraction
- Content normalization
- Semantic chunk generation
- Embedding creation
- Vector storage
- Metadata persistence
The processed knowledge becomes available to the current Conversational Agent after ingestion completes successfully.
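The middle stages of this pipeline (normalization, chunking, embedding, and storage with metadata) can be sketched as follows. This is a toy illustration under stated assumptions, not the IB-X ingestion engine: the chunk size, the overlap, and the hashed bag-of-words `toy_embedding` are stand-ins for whatever chunking strategy and embedding model the platform actually uses, and a plain list plays the role of the vector store.

```python
import hashlib
import math
import re


def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embedding(text, dims=16):
    """Hashed bag-of-words vector, L2-normalized. A stand-in for a real model."""
    vector = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def ingest_page(url, raw_text, store):
    """Normalize, chunk, and embed one page, appending records to `store`."""
    for index, chunk in enumerate(chunk_words(normalize(raw_text))):
        store.append({
            "url": url,              # metadata kept alongside the vector
            "chunk_index": index,
            "text": chunk,
            "embedding": toy_embedding(chunk),
        })
    return store
```

Overlapping windows are a common choice because they reduce the chance that a sentence relevant to a query is split across two chunks with no shared context.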
Monitoring Ingestion
Once ingestion begins, the configured source appears in the Ingestions dashboard.
The dashboard provides visibility into:
- Ingestion status
- Processed URLs
- Ingestion runs
- Runtime metrics
- URL-level operations
Users can later:
- Review ingestion runs
- Rerun ingestion
- Delete URLs
- Delete ingestion runs
- Edit ingestion configuration
Notes
- Website ingestion is Agent-specific.
- Knowledge ingested for one Agent is not shared with other Agents.
- Ingestion usage contributes to the environment-level ingestion quota.
- The Characters Limit is governed by the DATA_INGESTION_LIMIT subscription entitlement.
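As a rough way to reason about the quota before starting a run, the sketch below checks whether a batch of extracted text would fit within a character limit. It assumes that usage is measured as the number of characters of extracted text, which this page does not confirm; `fits_within_limit` and its parameters are hypothetical, and the limit value would come from the DATA_INGESTION_LIMIT entitlement.

```python
def fits_within_limit(page_texts, characters_limit, characters_used=0):
    """Return (fits, required), where `required` is the total characters in page_texts."""
    # Assumption: quota usage is counted in characters of extracted text.
    required = sum(len(text) for text in page_texts)
    return characters_used + required <= characters_limit, required
```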