Website Ingestion

Overview

Website ingestion allows organizations to crawl, process, and semantically index website content for use within IB-X Conversational Agents.

The ingestion process extracts content from the configured website, generates semantic embeddings, and stores the processed knowledge so that the Agent can retrieve it during conversations.

Knowledge ingested through this workflow is available only to the Agent from which the ingestion was configured.

Website ingestion is performed through the Knowledge Base Configuration wizard available inside the Agent Designer.

The wizard guides users through the following stages:

  1. Select the source type
  2. Configure website details
  3. Select content and collections
  4. Review and process ingestion

Configuring Website Ingestion

To configure website ingestion:

  1. Open the required Agent in the Designer
  2. Click the Ingestions button on the Designer canvas
  3. Click Add Ingestion

The Knowledge Base Configuration wizard is displayed.


Step 1 — Source

The Source step allows users to choose the type of knowledge source to ingest.

Select Website URL. This option enables website crawling and semantic ingestion of website content.

Supported source types may include:

| Source Type | Description |
| --- | --- |
| Website URL | Crawl and ingest website content |
| File Upload | Upload and ingest supported files |
| Video URL | Extract transcript and metadata from video content |

Once the Website URL option is selected, continue to the next step.


Step 2 — Details

The Details step captures the primary website ingestion configuration.


Data Source Name

Specify a friendly display name for the ingestion source.

This name is displayed in the Ingestions dashboard and helps identify the configured knowledge source.

Example:

  • IB Website

Website URL

Specify the website address that should be crawled and ingested.

Example:

The ingestion engine uses this URL as the starting point for website discovery and extraction.
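
As an illustration of what "starting point for website discovery" can mean, here is a minimal sketch that collects same-host links from a single page. It is not the IB-X crawler: `LinkCollector` and `discover_links` are hypothetical names, and restricting discovery to the starting host is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(start_url, html):
    """Return same-host URLs linked from html, resolved against start_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(start_url).netloc  # assumption: stay on the starting host
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(start_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in found:
                found.append(absolute)
    return found


page = (
    '<a href="/docs/">Docs</a> '
    '<a href="/docs/#top">Top</a> '
    '<a href="https://other.example/x">Elsewhere</a> '
    '<a href="mailto:team@example.com">Mail</a>'
)
print(discover_links("https://www.example.com/", page))
# prints ['https://www.example.com/docs/']
```

A full crawler would repeat this for each discovered URL while tracking visited pages; the sketch shows only one hop.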


Business Category

Optionally select the business category that best describes the website content.

The selected category helps the ingestion engine optimize semantic structuring and retrieval behavior for the processed content.

Supported categories currently include:

| Category | Description |
| --- | --- |
| Customer Support | Support portals and troubleshooting content |
| E-commerce | Product catalogs and shopping websites |
| Documentation | Technical and product documentation |
| Blog | Article and blog-based content |
| Knowledge Base | Structured help and knowledge systems |
| Forum | Community discussion content |
| Other | General-purpose websites |
Note: Business Category selection helps improve semantic organization, retrieval quality, and search efficiency during Conversational Agent retrieval operations.

After configuring the website details, click Continue.
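
The fields captured in the Details step can be pictured as a small record. The sketch below is only a mental model, not the platform's schema: the class name, field names, and validation rules are assumptions, while the category names come from the list of supported categories.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Category names as listed in this section; the set itself is illustrative.
BUSINESS_CATEGORIES = {
    "Customer Support", "E-commerce", "Documentation", "Blog",
    "Knowledge Base", "Forum", "Other",
}


@dataclass
class WebsiteIngestionDetails:
    """Hypothetical container for the Step 2 fields."""
    data_source_name: str
    website_url: str
    business_category: Optional[str] = None  # optional, as in the wizard

    def validate(self):
        """Return a list of problems; an empty list means the details look usable."""
        problems = []
        if not self.data_source_name.strip():
            problems.append("Data Source Name is required")
        parsed = urlparse(self.website_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("Website URL must be an absolute http(s) URL")
        if (self.business_category is not None
                and self.business_category not in BUSINESS_CATEGORIES):
            problems.append("Unknown Business Category")
        return problems


details = WebsiteIngestionDetails("IB Website", "https://www.example.com", "Documentation")
print(details.validate())  # prints []
```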


Step 3 — Content

The Content step allows users to control what content should be extracted and processed from the selected website.

This step includes:

  • Content Types
  • Additional Collections
  • Website Structure Selection
  • Content Summary

Content Types

Select the types of content that should be extracted during ingestion.

Supported content types currently include:

| Content Type | Description |
| --- | --- |
| text | Website text and written content |
| image | Images discovered on the website |
| video | Embedded or linked video content |
| audio | Audio-based content |
| document | Downloadable or linked documents |

Multiple content types can be selected depending on the ingestion requirements.
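
To make the effect of the content type selection concrete, the following sketch sorts discovered URLs into the five content types by file extension and keeps only the enabled ones. How the ingestion engine actually detects content types is not documented here; `classify_url` and `filter_by_types` are hypothetical helpers, and guessing from the extension is an assumption.

```python
import mimetypes
from urllib.parse import urlparse


def classify_url(url):
    """Map a URL to one of: text, image, video, audio, document."""
    path = urlparse(url).path  # ignore query strings and fragments
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None or mime == "text/html":
        return "text"  # assumption: pages without a known extension are HTML
    top_level = mime.split("/")[0]
    if top_level in ("image", "video", "audio"):
        return top_level
    return "document"  # PDFs and other downloadable files


def filter_by_types(urls, enabled_types):
    """Keep only URLs whose guessed content type is enabled."""
    return [url for url in urls if classify_url(url) in enabled_types]


urls = [
    "https://www.example.com/about",
    "https://www.example.com/media/logo.png",
    "https://www.example.com/files/guide.pdf",
]
print(filter_by_types(urls, {"text", "document"}))
```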


Additional Collections

Additional Collections allow the ingestion engine to create specialized semantic groupings in addition to the primary text content.

Supported collections currently include:

| Collection | Description |
| --- | --- |
| Products | Product-related content and metadata |
| FAQs | Frequently asked questions |
| API References | Technical API documentation |
| Code Snippets | Source code and implementation examples |

These collections help improve downstream semantic retrieval and contextual relevance.


Website Structure

The Website Structure section displays the discovered website hierarchy and URLs extracted from the configured website.

Users can:

  • Select individual URLs
  • Select specific sections
  • Select nested website structures
  • Select all discovered URLs

Only the selected content will be processed during ingestion.

This allows organizations to precisely control which sections of a website become part of the Agent knowledge store.
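
The selection options above can be thought of as operations on a tree of URL paths. The sketch below builds such a tree and selects everything under one section. It illustrates the idea only; `build_tree` and `select_section` are hypothetical helpers, and the wizard's internal representation may differ.

```python
from urllib.parse import urlparse


def build_tree(urls):
    """Group URLs into a nested dict keyed by path segment."""
    tree = {}
    for url in urls:
        node = tree
        for segment in [s for s in urlparse(url).path.split("/") if s]:
            node = node.setdefault(segment, {})
    return tree


def select_section(urls, section_path):
    """Return URLs at or below section_path, for example "/docs"."""
    prefix = section_path.rstrip("/")
    selected = []
    for url in urls:
        path = urlparse(url).path.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            selected.append(url)
    return selected


discovered = [
    "https://www.example.com/",
    "https://www.example.com/docs/",
    "https://www.example.com/docs/setup",
    "https://www.example.com/docs/api/auth",
    "https://www.example.com/blog/launch",
]
print(build_tree(discovered))
print(select_section(discovered, "/docs"))  # the three URLs under /docs
print(len(select_section(discovered, "/")))  # prints 5 (select all)
```

Matching on `prefix + "/"` rather than on the bare prefix keeps a section such as `/docs` from also capturing an unrelated path like `/docs-archive`.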


Content Summary

The Summary section displays ingestion metrics based on the selected content.

Typical metrics include:

| Metric | Description |
| --- | --- |
| Selected URLs | Number of URLs selected for ingestion |
| Content Types | Number of enabled content types |

The summary dynamically updates based on the current content selection.


Starting Ingestion

After completing content selection, click Process to begin ingestion.

The ingestion engine performs:

  1. Website crawling
  2. Content extraction
  3. Content normalization
  4. Semantic chunk generation
  5. Embedding creation
  6. Vector storage
  7. Metadata persistence

The processed knowledge becomes available to the current Conversational Agent after ingestion completes successfully.
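
The stages listed above can be sketched end to end in a few dozen lines. This is a toy model under several assumptions, not the IB-X implementation: crawling is skipped (pages are passed in as already-fetched HTML), chunking uses fixed-size word windows instead of semantic boundaries, `embed` is a hashed bag-of-words stand-in for a real embedding model, and the vector store is a plain Python list. All function names are hypothetical.

```python
import hashlib
import math
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Content extraction: keep visible text, skip script and style bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def normalize(text):
    """Content normalization: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def chunk(text, max_words=50):
    """Chunk generation: fixed-size word windows (stand-in for semantic chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text, dims=64):
    """Toy embedding: hashed bag of words, L2-normalized. Not a semantic model."""
    vec = [0.0] * dims
    for word in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(pages, store):
    """pages maps URL -> HTML; store is a list acting as the vector store."""
    for url, html in pages.items():
        for index, piece in enumerate(chunk(normalize(extract_text(html)))):
            # Vector storage and metadata persistence, side by side.
            store.append({"vector": embed(piece), "text": piece,
                          "metadata": {"url": url, "chunk": index}})


def search(store, query, top_k=1):
    """Rank stored chunks by cosine similarity (vectors are unit length)."""
    q = embed(query)
    ranked = sorted(store, key=lambda r: -sum(a * b for a, b in zip(q, r["vector"])))
    return ranked[:top_k]
```

A typical use is to call `ingest` once with a dictionary of pages and then call `search(store, "some question")` to get the closest chunks along with the URL each came from. Keeping the URL and chunk index next to each vector is what allows URL-level operations, such as deleting one page's chunks, later on.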


Monitoring Ingestion

Once ingestion begins, the configured source appears in the Ingestions dashboard.

The dashboard provides visibility into:

  • Ingestion status
  • Processed URLs
  • Ingestion runs
  • Runtime metrics
  • URL-level operations

Users can later:

  • Review ingestion runs
  • Rerun ingestion
  • Delete URLs
  • Delete ingestion runs
  • Edit ingestion configuration

Notes

  • Website ingestion is Agent-specific.
  • Knowledge ingested for one Agent is not shared with other Agents.
  • Ingestion usage contributes to the environment-level ingestion quota.
  • The Characters Limit is governed by the DATA_INGESTION_LIMIT subscription entitlement.
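
As a rough way to reason about the Characters Limit before starting a large ingestion, the sketch below compares the size of the text to be ingested with the remaining allowance. How the platform counts characters and how the DATA_INGESTION_LIMIT value is exposed are not specified in this section, so the limit is passed in as a plain number and `check_character_quota` is a hypothetical helper.

```python
def check_character_quota(documents, used_characters, character_limit):
    """Report whether ingesting documents would stay within character_limit.

    documents: iterable of extracted text strings
    used_characters: characters already consumed in the environment (assumed known)
    character_limit: value of the entitlement, supplied by the caller
    """
    requested = sum(len(text) for text in documents)
    remaining = character_limit - used_characters
    return {
        "requested": requested,
        "remaining": remaining,
        "within_limit": requested <= remaining,
    }


print(check_character_quota(["abc", "de"], used_characters=90, character_limit=100))
# prints {'requested': 5, 'remaining': 10, 'within_limit': True}
```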