Knowledge Ingestion Service
Overview
The IB-X Knowledge Ingestion Service is an enterprise-grade semantic ingestion and retrieval platform that enables Conversational Agents to consume organization-specific knowledge through semantic search and Retrieval-Augmented Generation (RAG).
The platform transforms structured, semi-structured, and unstructured content into semantically searchable knowledge that Conversational Agents can retrieve during a conversation.
The Knowledge Ingestion Service supports:
- Website ingestion
- File-based ingestion
- Semantic chunk generation
- Embedding generation
- Vector-based retrieval
- Relationship mapping
- Agent-specific knowledge isolation
The service operates as a dedicated background processing component within the IB-X ecosystem and integrates directly with Conversational Agents.
Purpose
The Knowledge Ingestion Service allows organizations to build AI-driven conversational experiences grounded in their own enterprise knowledge.
Instead of relying only on foundational model knowledge, Conversational Agents can retrieve semantically relevant information from:
- Websites
- Documents
- Technical references
- FAQs
- Product information
- Knowledge repositories
This enables organizations to build:
- Enterprise AI assistants
- Knowledge-driven conversational experiences
- AI-powered support systems
- Product knowledge assistants
- Technical documentation assistants
- Semantic search experiences
Agent-Specific Knowledge Model
Knowledge ingestion in IB-X is Agent-specific.
Each Agent maintains its own:
- Knowledge sources
- Ingestion runs
- Semantic embeddings
- Retrieval context
- Specialized collections
Knowledge ingested for one Agent is isolated from other Agents and is accessible only to the owning Conversational Agent.
This architecture improves:
- Retrieval relevance
- Knowledge ownership
- Domain specialization
- Security boundaries
- Conversational accuracy
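The isolation rule described above can be pictured with a minimal sketch. The `AgentKnowledgeStore` class, the `Chunk` type, and the Agent identifiers are hypothetical names chosen for illustration and are not part of the IB-X API. The sketch only assumes that every piece of ingested knowledge is keyed by its owning Agent and that lookups never cross that key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # where the text came from, e.g. a URL or file name
    text: str


class AgentKnowledgeStore:
    """Hypothetical in-memory store that keeps one collection per Agent."""

    def __init__(self) -> None:
        self._collections: dict[str, list[Chunk]] = defaultdict(list)

    def ingest(self, agent_id: str, chunk: Chunk) -> None:
        # Knowledge is always written under the owning Agent's key.
        self._collections[agent_id].append(chunk)

    def chunks_for(self, agent_id: str) -> list[Chunk]:
        # Only the owning Agent's collection is ever returned.
        return list(self._collections.get(agent_id, []))


store = AgentKnowledgeStore()
store.ingest("support-agent", Chunk("faq.html", "Reset your password from the login page."))
store.ingest("sales-agent", Chunk("pricing.pdf", "The Pro plan includes priority support."))
```

In this sketch, `store.chunks_for("support-agent")` returns only the FAQ chunk, and an Agent with no ingested knowledge gets an empty list rather than another Agent's content.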
High-Level Architecture
The Knowledge Ingestion platform processes content through multiple semantic processing stages.
Source Content
↓
Knowledge Ingestion Service
↓
Content Extraction
↓
Semantic Chunking
↓
Embedding Generation
↓
Vector Storage
↓
Conversational Agent Retrieval
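The stages above can be sketched end to end in a few lines of Python. This is an illustrative toy, not the service's implementation: the function names are hypothetical, fixed-size word windows stand in for semantic chunking, a hashed bag of words stands in for a real embedding model, and a plain list stands in for the Vector Store.

```python
import math
import re
import zlib

DIM = 64  # illustrative embedding size, not the platform's


def extract_text(raw_html: str) -> str:
    """Content extraction: strip tags and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", no_tags).strip()


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Chunking stand-in: fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> list[float]:
    """Embedding stand-in: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ingest(raw_html: str) -> list[tuple[str, list[float]]]:
    """Run extraction, chunking, and embedding; return (chunk, vector) rows."""
    return [(c, embed(c)) for c in chunk_text(extract_text(raw_html))]


# The returned rows play the role of the Vector Store in this sketch.
vector_store = ingest("<h1>Returns</h1><p>Items can be returned within 30 days of delivery.</p>")
```

Each row pairs a chunk of extracted text with its vector, which is the shape a retrieval step needs in order to compare a query against stored content.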
To provide enterprise-grade semantic retrieval experiences, the platform combines:
- Relational metadata storage
- Semantic vector storage
- Graph relationship storage
- AI-powered retrieval orchestration
For more information about the processing pipeline, storage model, retrieval architecture, and infrastructure components, refer to the Architecture documentation.
Supported Ingestion Sources
The platform currently supports the following ingestion source types.
| Source Type | Description |
|---|---|
| Website URL | Crawl and ingest website content |
| File Upload | Upload and process supported files |
Supported Content Types
The ingestion engine can process multiple content formats.
Supported content types currently include:
- text
- image
- document
Semantic Retrieval
During a conversation, Conversational Agents perform semantic retrieval against their associated knowledge space.
The retrieval process typically includes:
- Semantic similarity search
- Embedding retrieval
- Context extraction
- AI grounding
- Response generation
This architecture enables Retrieval-Augmented Generation (RAG) experiences within the IB-X platform.
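As a rough illustration of the retrieval steps listed above, the following sketch ranks stored chunks by cosine similarity to a query and places the best match into a grounded prompt. All names here are hypothetical, word counts stand in for model-generated embeddings, and the final response-generation call to a language model is left out because it depends on the configured external service.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: sparse word counts (a real system uses a model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Semantic similarity search: rank chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query: str, context: list[str]) -> str:
    """Context extraction and grounding: place retrieved text ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


knowledge = [
    "Orders ship within two business days.",
    "Refunds are issued to the original payment method.",
    "Support is available by email and chat.",
]
top = retrieve("How are refunds issued?", knowledge, top_k=1)
prompt = build_grounded_prompt("How are refunds issued?", top)
```

Here the refund chunk is the only one that shares terms with the query, so it is the one placed in the prompt. A production retriever would compare dense embeddings held in the Vector Store rather than word counts.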
For more information about semantic retrieval, Agent-specific knowledge isolation, and Retrieval-Augmented Generation (RAG) workflows, refer to the Conversational Agent Integration documentation.
Infrastructure Components and Administration
The Knowledge Ingestion platform consists of multiple infrastructure components that work together to support semantic ingestion, embedding generation, retrieval orchestration, and relationship mapping.
The infrastructure configuration is managed globally from the root tenant while the ingested knowledge itself remains Agent-specific.
| Component | Responsibility |
|---|---|
| Relational Database | Operational metadata storage |
| Vector Store | Semantic embedding storage |
| Graph Database | Relationship and traversal storage |
| External Services | Embedding and orchestration services |
| Image Extraction Engine | OCR and semantic image extraction |
Administrative configuration, performed through the root tenant administration experience, covers:
- Relational Database
- Vector Store
- Graph Database
- External Services
- Image Extraction
For more information about infrastructure configuration, Vector Store setup, Graph Database configuration, and external service integration, refer to the Administration documentation.
Licensing and Usage
Knowledge ingestion usage is governed through subscription entitlements.
The platform tracks:
- Embeddings created
- Bytes ingested
- Characters ingested
The ingestion quota is controlled using DATA_INGESTION_LIMIT, which applies at the environment level.
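The tracked metrics and the quota check can be sketched as follows. The `IngestionUsage` class and `within_limit` function are hypothetical. The text above does not say which metric DATA_INGESTION_LIMIT is measured against, so comparing it with bytes ingested is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class IngestionUsage:
    """Counters mirroring the three tracked metrics."""
    embeddings_created: int = 0
    bytes_ingested: int = 0
    characters_ingested: int = 0

    def record(self, text: str, embeddings: int) -> None:
        # Bytes and characters differ for non-ASCII text, so both are counted.
        self.embeddings_created += embeddings
        self.bytes_ingested += len(text.encode("utf-8"))
        self.characters_ingested += len(text)


def within_limit(usage: IngestionUsage, data_ingestion_limit: int) -> bool:
    # Assumption: the limit is compared against bytes ingested.
    return usage.bytes_ingested <= data_ingestion_limit


usage = IngestionUsage()
usage.record("Caf\u00e9 menu", embeddings=1)
```

The sample text has 9 characters but occupies 10 bytes in UTF-8, which shows why the two counters can diverge for the same content.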
Documentation Guide
The following documents provide detailed information about the Knowledge Ingestion platform.
| Document | Description |
|---|---|
| Architecture | Core platform architecture and processing model |
| Administration | Infrastructure and service configuration |
| User Interface | Ingestion dashboard and management experience |
| Website Ingestion | Website crawling and ingestion workflow |
| File Upload Ingestion | File-based ingestion workflow |
| Conversational Agent Integration | Semantic retrieval and Agent grounding |
| Operations | Monitoring and operational management |
Notes
- Knowledge ingestion is Agent-specific.
- Infrastructure configuration is managed globally from the root tenant.
- Semantic embeddings are stored in the configured Vector Store.
- Relationship-based retrieval scenarios may use the configured Graph Database.
- Image extraction behavior can be customized using configurable extraction prompts.