Knowledge Ingestion Service

Overview

The IB-X Knowledge Ingestion Service is an enterprise-grade semantic ingestion and retrieval platform that enables AI-powered activities and Agents to consume organization-specific knowledge through semantic search and Retrieval-Augmented Generation (RAG).

The platform transforms structured, semi-structured, and unstructured content into semantically searchable knowledge that can later be retrieved by AI Agents, Conversational Agents, and workflow activities during execution.

The Knowledge Ingestion Service supports:

Website ingestion
File-based ingestion
Semantic chunk generation
Embedding generation
Vector-based retrieval
Relationship mapping
Agent-specific knowledge isolation

The service operates as a dedicated background processing component within the IB-X ecosystem and integrates directly with AI-powered activities such as AI Agent, Conversational Agent, and Knowledge Base.

Purpose

The Knowledge Ingestion Service allows organizations to build AI-driven conversational experiences grounded in their own enterprise knowledge.

Instead of relying only on foundational model knowledge, Conversational Agents can retrieve semantically relevant information from:

Websites
Documents
Technical references
FAQs
Product information
Knowledge repositories

This enables organizations to build:

Enterprise AI assistants
Knowledge-driven conversational experiences
AI-powered support systems
Product knowledge assistants
Technical documentation assistants
Semantic search experiences

Agent-Specific Knowledge Model

Knowledge ingestion in IB-X is Agent-specific.

Each Agent maintains its own:

Knowledge sources
Ingestion runs
Semantic embeddings
Retrieval context
Specialized collections

Knowledge ingested for one Agent is isolated from other Agents and is accessible only to the owning Conversational Agent.

This architecture improves:

Retrieval relevance
Knowledge ownership
Domain specialization
Security boundaries
Conversational accuracy
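The isolation rule described above can be illustrated with a small in-memory sketch. Everything here is hypothetical and for illustration only: the `Chunk` and `AgentKnowledgeStore` names, the Agent IDs, and the sample content are assumptions, not part of the IB-X API. The point is only that each lookup is scoped to a single Agent's collection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One ingested piece of knowledge (hypothetical shape)."""
    source: str
    text: str


class AgentKnowledgeStore:
    """Toy in-memory store that keeps one private collection per Agent."""

    def __init__(self) -> None:
        # Maps an Agent ID to that Agent's own list of chunks.
        self._collections = defaultdict(list)

    def ingest(self, agent_id: str, chunk: Chunk) -> None:
        self._collections[agent_id].append(chunk)

    def chunks_for(self, agent_id: str) -> list:
        # Only the requesting Agent's collection is read. Using .get
        # avoids creating empty collections for unknown Agents.
        return list(self._collections.get(agent_id, []))


store = AgentKnowledgeStore()
store.ingest("support-agent", Chunk("faq.html", "Resets take 24 hours."))
store.ingest("sales-agent", Chunk("pricing.pdf", "Plans start at tier A."))

print([c.source for c in store.chunks_for("support-agent")])
print(store.chunks_for("unknown-agent"))
```

Because every read takes an Agent ID, content ingested for one Agent never appears in another Agent's results in this sketch.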

High-Level Architecture

The Knowledge Ingestion platform processes content through multiple semantic processing stages.

To provide semantic retrieval, the platform combines:

Relational metadata storage
Semantic vector storage
Graph relationship storage
AI-powered retrieval orchestration
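As a rough illustration of one early processing stage, the sketch below splits text into overlapping chunks before embedding. It is a plain fixed-size word-window chunker standing in for the platform's semantic chunk generation, whose actual algorithm is not described here. The function name `chunk_text` and the `max_words` and `overlap` parameters are assumptions.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list:
    """Split text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for piece in chunk_text(sample, max_words=5, overlap=2):
    print(piece)
```

The overlap keeps some shared words between neighbouring chunks, so a sentence that straddles a boundary is still partly present in both.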

For more information about the processing pipeline, storage model, retrieval architecture, and infrastructure components, refer to the Architecture documentation.

Supported Ingestion Sources

The platform currently supports the following ingestion source types.

Source Type	Description
Website URL	Crawl and ingest website content
File Upload	Upload and process supported files

Supported Content Types

The ingestion engine currently processes the following content types:

text
image
document

Semantic Retrieval

During a conversation, Conversational Agents perform semantic retrieval against their associated knowledge space.

The retrieval process typically includes:

Semantic similarity search
Embedding retrieval
Context extraction
AI grounding
Response generation

This architecture enables Retrieval-Augmented Generation (RAG) experiences within the IB-X platform.
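The retrieval steps listed above can be sketched end to end with toy components. This is a minimal sketch under stated assumptions, not the platform's implementation: `embed` is a bag-of-words counter standing in for a real embedding model, the knowledge list stands in for a Vector Store, and all function names and sample sentences are hypothetical. Response generation is left out; the sketch stops at the grounded prompt that would be passed to a model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, top_k: int = 2) -> list:
    """Semantic similarity search: rank chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query: str, context: list) -> str:
    """AI grounding: place the retrieved context ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


knowledge = [
    "password resets are processed within one business day",
    "invoices are issued on the first day of each month",
    "the mobile app supports offline mode",
]
question = "when are invoices issued"
top = retrieve(question, knowledge, top_k=1)
print(build_grounded_prompt(question, top))
```

In a real deployment the ranking would run inside the configured Vector Store against the owning Agent's embeddings rather than over an in-memory list.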

For more information about semantic retrieval, Agent-specific knowledge isolation, and Retrieval-Augmented Generation (RAG) workflows, refer to the Conversational Agent Integration documentation.

Infrastructure Components and Administration

The Knowledge Ingestion platform consists of multiple infrastructure components that work together to support semantic ingestion, embedding generation, retrieval orchestration, and relationship mapping.

The infrastructure configuration is managed globally from the root tenant while the ingested knowledge itself remains Agent-specific.

Component	Responsibility
Relational Database	Operational metadata storage
Vector Store	Semantic embedding storage
Graph Database	Relationship and traversal storage
External Services	Embedding and orchestration services
Image Extraction Engine	OCR and semantic image extraction

Administrative configuration in the root tenant covers:

Relational Database
Vector Store
Graph Database
External Services
Image Extraction

For more information about infrastructure configuration, Vector Store setup, Graph Database configuration, and external service integration, refer to the Administration documentation.

Licensing and Usage

Knowledge ingestion usage is governed through subscription entitlements.

The platform tracks:

Embeddings created
Bytes ingested
Characters ingested

The ingestion quota is controlled by DATA_INGESTION_LIMIT, which applies at the environment level.
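The three tracked counters and the quota check can be sketched as follows. This is a hypothetical model only: the `IngestionUsage` class and `within_limit` function are illustrative names, and the sketch assumes the limit is expressed in bytes, which the documentation does not state.

```python
from dataclasses import dataclass


@dataclass
class IngestionUsage:
    """Running totals for one environment (hypothetical shape)."""
    embeddings_created: int = 0
    bytes_ingested: int = 0
    characters_ingested: int = 0

    def record(self, text: str, embeddings: int) -> None:
        # Characters and bytes differ for non-ASCII text, so both are counted.
        self.characters_ingested += len(text)
        self.bytes_ingested += len(text.encode("utf-8"))
        self.embeddings_created += embeddings


def within_limit(usage: IngestionUsage, incoming_bytes: int,
                 data_ingestion_limit: int) -> bool:
    # Assumption: the limit is a byte count; the real unit is not documented here.
    return usage.bytes_ingested + incoming_bytes <= data_ingestion_limit


usage = IngestionUsage()
usage.record("h\u00e9llo", embeddings=1)  # 5 characters, 6 UTF-8 bytes
print(usage)
print(within_limit(usage, incoming_bytes=4, data_ingestion_limit=10))
```

Checking the quota before accepting new content, rather than after processing it, avoids spending embedding work on content that would exceed the entitlement.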

Documentation Guide

The following documents provide detailed information about the Knowledge Ingestion platform.

Document	Description
Architecture	Core platform architecture and processing model
Administration	Infrastructure and service configuration
User Interface	Ingestion dashboard and management experience
Website Ingestion	Website crawling and ingestion workflow
File Upload Ingestion	File-based ingestion workflow
Conversational Agent Integration	Semantic retrieval and Agent grounding
Operations	Monitoring and operational management

Notes

Knowledge ingestion is Agent-specific.
Infrastructure configuration is managed globally from the root tenant.
Semantic embeddings are stored in the configured Vector Store.
Relationship-based retrieval scenarios may use the configured Graph Database.
Image extraction behavior can be customized using configurable extraction prompts.
