
Knowledge Ingestion Service

Overview

The IB-X Knowledge Ingestion Service is an enterprise-grade semantic ingestion and retrieval platform that enables Conversational Agents to consume organization-specific knowledge through semantic search and Retrieval-Augmented Generation (RAG).

The platform transforms structured, semi-structured, and unstructured content into semantically searchable knowledge that Conversational Agents can retrieve during a conversation.

The Knowledge Ingestion Service supports:

  • Website ingestion
  • File-based ingestion
  • Semantic chunk generation
  • Embedding generation
  • Vector-based retrieval
  • Relationship mapping
  • Agent-specific knowledge isolation

The service operates as a dedicated background processing component within the IB-X ecosystem and integrates directly with Conversational Agents.


Purpose

The Knowledge Ingestion Service allows organizations to build AI-driven conversational experiences grounded in their own enterprise knowledge.

Instead of relying only on the knowledge built into the foundation model, Conversational Agents can retrieve semantically relevant information from:

  • Websites
  • Documents
  • Technical references
  • FAQs
  • Product information
  • Knowledge repositories

This enables organizations to build:

  • Enterprise AI assistants
  • Knowledge-driven conversational experiences
  • AI-powered support systems
  • Product knowledge assistants
  • Technical documentation assistants
  • Semantic search experiences

Agent-Specific Knowledge Model

Knowledge ingestion in IB-X is Agent-specific.

Each Agent maintains its own:

  • Knowledge sources
  • Ingestion runs
  • Semantic embeddings
  • Retrieval context
  • Specialized collections

Knowledge ingested for one Agent is isolated from other Agents and is accessible only to the owning Conversational Agent.

This architecture improves:

  • Retrieval relevance
  • Knowledge ownership
  • Domain specialization
  • Security boundaries
  • Conversational accuracy
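
The isolation rule above can be illustrated with a minimal sketch. Everything here is hypothetical: `KnowledgeRegistry`, `AgentKnowledgeSpace`, and the Agent IDs are names invented for illustration, not part of IB-X, and plain keyword matching stands in for semantic search. The point is only that every lookup is scoped by the owning Agent's ID.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AgentKnowledgeSpace:
    """Hypothetical per-Agent container; not the IB-X data model."""
    sources: list[str] = field(default_factory=list)
    chunks: list[str] = field(default_factory=list)


class KnowledgeRegistry:
    """Keeps one knowledge space per Agent ID so lookups never cross Agents."""

    def __init__(self) -> None:
        self._spaces: dict[str, AgentKnowledgeSpace] = defaultdict(AgentKnowledgeSpace)

    def ingest(self, agent_id: str, source: str, chunks: list[str]) -> None:
        space = self._spaces[agent_id]
        space.sources.append(source)
        space.chunks.extend(chunks)

    def search(self, agent_id: str, term: str) -> list[str]:
        # .get() avoids creating an empty space for an unknown Agent.
        space = self._spaces.get(agent_id)
        if space is None:
            return []
        # Only the owning Agent's chunks are considered.
        return [c for c in space.chunks if term.lower() in c.lower()]


registry = KnowledgeRegistry()
registry.ingest("support-agent", "https://example.com/faq", ["Refunds take 5 days."])
registry.ingest("sales-agent", "pricing.pdf", ["The Pro plan costs 20 USD."])

print(registry.search("support-agent", "refunds"))  # ['Refunds take 5 days.']
print(registry.search("sales-agent", "refunds"))    # []
```

The second search returns nothing because the refund content belongs to a different Agent, even though both Agents share the same registry.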

High-Level Architecture

The Knowledge Ingestion platform processes content through multiple semantic processing stages.

Source Content
  ↓
Knowledge Ingestion Service
  ↓
Content Extraction
  ↓
Semantic Chunking
  ↓
Embedding Generation
  ↓
Vector Storage
  ↓
Conversational Agent Retrieval

To provide semantic retrieval, the platform combines:

  • Relational metadata storage
  • Semantic vector storage
  • Graph relationship storage
  • AI-powered retrieval orchestration
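
As a rough illustration of the processing stages above, the following sketch chains extraction, chunking, embedding, and storage in a few lines. It rests on loud assumptions: the function names are invented, fixed-size word windows stand in for semantic chunking, a hashed bag-of-words vector stands in for a real embedding model, and a Python list stands in for the Vector Store. It is not how the service is implemented.

```python
import math
import re
import zlib
from dataclasses import dataclass

DIM = 64  # toy embedding size; real embedding models use far larger vectors


def extract_text(raw: str) -> str:
    """Content extraction stand-in: strip tags and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Chunking stand-in: fixed-size word windows, not semantic boundaries."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> list[float]:
    """Embedding stand-in: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class VectorRecord:
    agent_id: str  # every stored vector is tagged with its owning Agent
    chunk: str
    vector: list[float]


def ingest(agent_id: str, raw: str, store: list[VectorRecord]) -> int:
    """Run extraction, chunking, embedding, and storage; return chunks stored."""
    chunks = chunk_text(extract_text(raw))
    store.extend(VectorRecord(agent_id, c, embed(c)) for c in chunks)
    return len(chunks)


vector_store: list[VectorRecord] = []
html = "<h1>Returns</h1><p>Items can be returned within 30 days of delivery for a full refund.</p>"
stored = ingest("support-agent", html, vector_store)
print(stored, [r.chunk for r in vector_store])
```

Each stage is a plain function, so any one of them could be swapped for a real extractor, chunker, or embedding call without touching the others.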

For more information about the processing pipeline, storage model, retrieval architecture, and infrastructure components, refer to the Architecture documentation.


Supported Ingestion Sources

The platform currently supports the following ingestion source types.

Source Type | Description
Website URL | Crawl and ingest website content
File Upload | Upload and process supported files

Supported Content Types

The ingestion engine can process multiple content formats.

Supported content types currently include:

  • text
  • image
  • document
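
A client that submits ingestion jobs might validate the source and content type before doing any work. The sketch below is only an assumption about how that could look: `SourceType`, `ContentType`, `IngestionRequest`, and `parse_content_type` are hypothetical names, and the string values for source types are invented. Only the three content type values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    # Hypothetical identifiers for the two supported source types.
    WEBSITE_URL = "website_url"
    FILE_UPLOAD = "file_upload"


class ContentType(Enum):
    # Values taken from the supported content types listed above.
    TEXT = "text"
    IMAGE = "image"
    DOCUMENT = "document"


@dataclass(frozen=True)
class IngestionRequest:
    """Hypothetical request shape; not an IB-X API payload."""
    agent_id: str
    source_type: SourceType
    content_type: ContentType
    location: str


def parse_content_type(value: str) -> ContentType:
    """Map user input to a supported content type or fail with a clear message."""
    try:
        return ContentType(value.strip().lower())
    except ValueError:
        allowed = ", ".join(c.value for c in ContentType)
        raise ValueError(
            f"Unsupported content type {value!r}; expected one of: {allowed}"
        ) from None


request = IngestionRequest(
    agent_id="support-agent",
    source_type=SourceType.WEBSITE_URL,
    content_type=parse_content_type(" Text "),
    location="https://example.com/docs",
)
print(request.content_type.value)  # text
```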

Semantic Retrieval

During a conversation, each Conversational Agent performs semantic retrieval against its own knowledge space.

The retrieval process typically includes:

  1. Semantic similarity search
  2. Embedding retrieval
  3. Context extraction
  4. AI grounding
  5. Response generation

This architecture enables Retrieval-Augmented Generation (RAG) experiences within the IB-X platform.
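
The retrieval steps above can be sketched as follows. This is a simplified illustration under stated assumptions: term-count vectors stand in for real embeddings, the function names are invented, and the final response generation step is left to whatever model receives the prompt. It shows the general RAG pattern (rank by cosine similarity, keep the top matches, ground the prompt), not the IB-X retrieval orchestration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (assumption, not a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank one Agent's chunks by similarity to the query and keep the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(query: str, context: list[str]) -> str:
    """Place retrieved context ahead of the question so the model answers from it."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {query}"


agent_chunks = [
    "Refunds are issued within 5 business days.",
    "The office is closed on public holidays.",
    "Refunds require the original receipt.",
]
question = "How long do refunds take?"
context = retrieve(question, agent_chunks, top_k=2)
prompt = build_grounded_prompt(question, context)
print(prompt)
```

Only the two refund-related chunks reach the prompt; the unrelated chunk scores zero and is dropped, which is what keeps the generated answer grounded in relevant content.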

For more information about semantic retrieval, Agent-specific knowledge isolation, and Retrieval-Augmented Generation (RAG) workflows, refer to the Conversational Agent Integration documentation.


Infrastructure Components and Administration

The Knowledge Ingestion platform consists of multiple infrastructure components that work together to support semantic ingestion, embedding generation, retrieval orchestration, and relationship mapping.

Infrastructure configuration is managed globally from the root tenant, while the ingested knowledge itself remains Agent-specific.

Component               | Responsibility
Relational Database     | Operational metadata storage
Vector Store            | Semantic embedding storage
Graph Database          | Relationship and traversal storage
External Services       | Embedding and orchestration services
Image Extraction Engine | OCR and semantic image extraction

Knowledge Ingestion Service infrastructure is configured globally through the root tenant administration experience.

Administrative configuration includes:

  • Relational Database
  • Vector Store
  • Graph Database
  • External Services
  • Image Extraction

For more information about infrastructure configuration, Vector Store setup, Graph Database configuration, and external service integration, refer to the Administration documentation.


Licensing and Usage

Knowledge ingestion usage is governed through subscription entitlements.

The platform tracks:

  • Embeddings created
  • Bytes ingested
  • Characters ingested

The ingestion quota is controlled using:

DATA_INGESTION_LIMIT

This quota applies at the environment level.
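
To make the metering concrete, here is a minimal sketch of a per-environment usage meter that tracks the three counters listed above and rejects ingestion once a limit is reached. The unit of DATA_INGESTION_LIMIT is not stated in this document, so treating it as a byte budget is an assumption, as are the `UsageMeter` class and its `record` method.

```python
from dataclasses import dataclass


@dataclass
class UsageMeter:
    """Hypothetical meter holding the three tracked counters for one environment."""
    data_ingestion_limit: int  # assumed to be a byte budget; the real unit is not stated
    embeddings_created: int = 0
    bytes_ingested: int = 0
    characters_ingested: int = 0

    def record(self, text: str, embeddings: int) -> None:
        size = len(text.encode("utf-8"))
        # Check the quota before updating any counter so a rejected
        # ingestion leaves the usage figures unchanged.
        if self.bytes_ingested + size > self.data_ingestion_limit:
            raise RuntimeError("DATA_INGESTION_LIMIT exceeded for this environment")
        self.bytes_ingested += size
        self.characters_ingested += len(text)
        self.embeddings_created += embeddings


meter = UsageMeter(data_ingestion_limit=20)
meter.record("café menu", embeddings=1)  # 9 characters, 10 bytes in UTF-8

try:
    meter.record("x" * 11, embeddings=1)  # 10 + 11 bytes would exceed the limit of 20
except RuntimeError as exc:
    print(exc)

print(meter.bytes_ingested, meter.characters_ingested, meter.embeddings_created)
```

Bytes and characters are counted separately because they diverge for non-ASCII content, which is one plausible reason the platform tracks both.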


Documentation Guide

The following documents provide detailed information about the Knowledge Ingestion platform.

Document                         | Description
Architecture                     | Core platform architecture and processing model
Administration                   | Infrastructure and service configuration
User Interface                   | Ingestion dashboard and management experience
Website Ingestion                | Website crawling and ingestion workflow
File Upload Ingestion            | File-based ingestion workflow
Conversational Agent Integration | Semantic retrieval and Agent grounding
Operations                       | Monitoring and operational management

Notes

  • Knowledge ingestion is Agent-specific.
  • Infrastructure configuration is managed globally from the root tenant.
  • Semantic embeddings are stored in the configured Vector Store.
  • Relationship-based retrieval scenarios may use the configured Graph Database.
  • Image extraction behavior can be customized using configurable extraction prompts.