
Architecture

Overview

The IB-X Knowledge Ingestion Service is an enterprise-grade semantic ingestion and retrieval platform that gives each Conversational Agent retrieval over its own Agent-specific knowledge.

The platform transforms structured, semi-structured, and unstructured content into semantically searchable knowledge by combining:

  • Content extraction
  • Semantic chunking
  • Embedding generation
  • Vector storage
  • Graph relationships
  • AI-powered retrieval

The architecture is designed around isolated Agent knowledge boundaries where each Agent maintains its own semantic knowledge space.


High-Level Architecture

The Knowledge Ingestion pipeline follows the processing architecture below:

Source Content
      ↓
Knowledge Ingestion Service
      ↓
Content Discovery & Extraction
      ↓
Normalization & Semantic Chunking
      ↓
Embedding Generation
      ↓
Vector Storage
      ↓
Relationship Mapping
      ↓
Conversational Agent Retrieval

Ingestion Pipeline

The ingestion engine processes content through the seven stages described below.


1. Source Discovery

The ingestion engine discovers and loads content from supported sources.

Supported source types include:

  • Website URLs
  • Uploaded files
  • Video URLs

For website ingestion, the engine can:

  • Crawl website structures
  • Discover linked pages
  • Parse sitemap structures
  • Build hierarchical content maps
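
As an illustration of link discovery during a crawl, the sketch below collects same-host links from a page using only the Python standard library. It is not the IB-X crawler; `LinkCollector`, `discover_links`, and the example URLs are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(page_url, html):
    """Return same-host absolute URLs linked from the page, in first-seen order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen = []
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.append(absolute)
    return seen


sample = (
    '<a href="/docs/a">A</a> '
    '<a href="b.html#top">B</a> '
    '<a href="https://other.example/x">X</a>'
)
links = discover_links("https://example.com/docs/index.html", sample)
```

Restricting discovery to the starting host is one possible crawl boundary; a real crawler would also need to handle robots rules, depth limits, and sitemaps.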

2. Content Extraction

The ingestion engine extracts semantic content from discovered sources.

Supported extraction types include:

Type      Description
Text      Written content extraction
Image     Image analysis and OCR
Audio     Audio transcript extraction
Video     Video transcript and metadata extraction
Document  Structured document parsing

Image-based extraction may use configurable image extraction prompts.


3. Content Normalization

Extracted content is normalized into a standardized semantic representation.

Normalization may include:

  • Text cleanup
  • Structural parsing
  • Metadata enrichment
  • Content classification
  • Semantic grouping
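
To make text cleanup and metadata enrichment concrete, here is a minimal sketch assuming a normalized record is a dictionary holding cleaned text plus a few metadata fields. The record shape and the function names are assumptions for illustration, not the IB-X representation.

```python
import re
import unicodedata


def normalize_text(raw):
    """Apply Unicode normalization and collapse redundant whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs, then trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Collapse three or more newlines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return cleaned.strip()


def normalize_document(raw, source_url, content_type):
    """Return a normalized record with simple metadata attached (hypothetical shape)."""
    text = normalize_text(raw)
    return {
        "text": text,
        "metadata": {
            "source": source_url,
            "content_type": content_type,
            "char_count": len(text),
        },
    }


record = normalize_document(
    "  Hello\u00a0  world \r\n\r\n\r\n\r\nNext   line\t",
    "https://example.com/a",
    "text",
)
```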

4. Semantic Chunking

Normalized content is divided into semantic chunks optimized for retrieval operations.

Chunking improves:

  • Retrieval precision
  • Context relevance
  • Embedding efficiency
  • Conversational grounding
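
The source does not describe the chunking algorithm. As a simple stand-in, the sketch below packs whole paragraphs into chunks up to a character budget, so that chunk boundaries fall on paragraph breaks. `chunk_paragraphs` and `max_chars` are hypothetical names.

```python
def chunk_paragraphs(text, max_chars=200):
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            # The next paragraph does not fit: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Alpha one.\n\nBeta two.\n\nGamma three is a longer paragraph."
chunks = chunk_paragraphs(doc, max_chars=25)
```

Here the first two short paragraphs share a chunk and the longer third paragraph gets its own. Smaller budgets tend to give more precise matches, while larger ones carry more surrounding context per chunk.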

5. Embedding Generation

The ingestion engine generates semantic embeddings for processed content chunks.

Embeddings are numerical vector representations used for semantic similarity search.

The embedding pipeline is orchestrated through the configured AI service infrastructure.
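
To show what "numerical vector representation" means in practice, the toy sketch below hashes tokens into a fixed-length vector and compares two texts by dot product. It is only an illustration: a real embedding model captures meaning rather than token overlap, and `toy_embed` is a hypothetical function, not part of IB-X.

```python
import hashlib
import math


def toy_embed(text, dims=16):
    """Hash each lowercase token into one of `dims` buckets, then L2-normalize."""
    vector = [0.0] * dims
    for token in text.lower().split():
        # sha256 keeps the bucket assignment stable across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


# The same tokens in a different order map to the same toy vector.
same = dot(toy_embed("reset my password"), toy_embed("password reset my"))
```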


6. Vector Storage

Generated embeddings are persisted in the configured Vector Store.

IB-X currently supports:

  • Qdrant

The Vector Store enables:

  • Semantic similarity search
  • Context retrieval
  • Embedding indexing
  • Vector-based ranking
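
The following in-memory sketch illustrates what similarity search and vector-based ranking involve. It deliberately does not use the Qdrant client API; `InMemoryVectorStore`, its methods, and the sample vectors are hypothetical stand-ins.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store; not the Qdrant API."""

    def __init__(self):
        self._points = {}  # point id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        """Insert a point, or overwrite the point with the same id."""
        self._points[point_id] = (list(vector), dict(payload))

    def search(self, query, top_k=3):
        """Return (score, point_id, payload) tuples ranked by cosine similarity."""
        results = [
            (_cosine(query, vector), point_id, payload)
            for point_id, (vector, payload) in self._points.items()
        ]
        results.sort(key=lambda item: item[0], reverse=True)
        return results[:top_k]


store = InMemoryVectorStore()
store.upsert("c1", [1.0, 0.0, 0.0], {"text": "Billing overview"})
store.upsert("c2", [0.0, 1.0, 0.0], {"text": "Password reset steps"})
store.upsert("c3", [0.0, 0.9, 0.1], {"text": "Account recovery"})
hits = store.search([0.0, 1.0, 0.0], top_k=2)
```

A production vector store replaces this linear scan with an approximate nearest-neighbor index, which is what makes search over large collections practical.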

7. Relationship Mapping

Relationship mapping is optional. When enabled, it stores semantic relationships in the configured Graph Database.

IB-X currently supports:

  • Neo4j

Relationship mapping enables:

  • Knowledge traversals
  • Entity relationships
  • Semantic linking
  • Graph-based retrieval scenarios
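
A knowledge traversal can be pictured as a bounded walk over entity relationships. The sketch below uses a plain adjacency dictionary rather than Neo4j or Cypher; the entity names, relationship labels, and `related_within` are invented for illustration.

```python
from collections import deque


def related_within(graph, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop count from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


# Hypothetical graph: node -> list of (relationship, target) edges.
graph = {
    "Invoice": [("BELONGS_TO", "Billing")],
    "Billing": [("DOCUMENTED_IN", "Billing FAQ"), ("RELATED_TO", "Payments")],
    "Payments": [("DOCUMENTED_IN", "Payments Guide")],
}
nearby = related_within(graph, "Invoice", max_hops=2)
```

With a two-hop limit, "Payments Guide" (three hops away) is excluded, which is how a hop bound keeps graph-based retrieval focused on closely related knowledge.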

Conversational Retrieval Flow

During a conversation, the Conversational Agent performs semantic retrieval against its associated knowledge space.

The retrieval flow typically includes:

  1. User submits query
  2. Semantic search is performed
  3. Relevant embeddings are identified
  4. Related semantic chunks are retrieved
  5. Context is supplied to the AI model
  6. Grounded response is generated

This architecture enables Retrieval-Augmented Generation (RAG) experiences within IB-X.
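
Steps 4 and 5 of the flow above amount to assembling retrieved chunks into the context sent to the model. The sketch below shows one way this could look, assuming the chunks arrive already ranked; the prompt wording, the character budget, and `build_grounded_prompt` are assumptions, not the IB-X implementation.

```python
def build_grounded_prompt(query, ranked_chunks, max_context_chars=500):
    """Assemble a prompt from the highest-ranked chunks that fit the character budget."""
    selected = []
    used = 0
    for chunk in ranked_chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += len(chunk)
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_grounded_prompt(
    "How do I reset my password?",
    ["Open Settings and choose Reset Password.", "Billing runs monthly."],
    max_context_chars=50,
)
```

In this example only the first chunk fits the 50-character budget, so the lower-ranked billing chunk is left out of the context.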


Agent Knowledge Isolation

Knowledge ingestion in IB-X is Agent-specific.

Each Agent maintains:

  • Independent ingestion sources
  • Isolated semantic embeddings
  • Agent-specific ingestion runs
  • Dedicated retrieval context

This architecture prevents an Agent from accessing or retrieving knowledge that belongs to another Agent.

The isolation model improves:

  • Security
  • Contextual relevance
  • Knowledge ownership
  • Retrieval quality
  • Domain specialization
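
One way to picture the isolation model is a store in which every read and write is scoped by an Agent identifier. The sketch below is a simplified illustration using keyword matching instead of vector search; `AgentKnowledgeSpaces` and the agent IDs are hypothetical.

```python
class AgentKnowledgeSpaces:
    """Hypothetical registry that keeps each Agent's chunks in a separate collection."""

    def __init__(self):
        self._spaces = {}  # agent_id -> list of chunk texts

    def ingest(self, agent_id, chunk):
        """Add a chunk to the given Agent's knowledge space."""
        self._spaces.setdefault(agent_id, []).append(chunk)

    def retrieve(self, agent_id, term):
        """Search only the requesting Agent's own space; other spaces are never read."""
        own = self._spaces.get(agent_id, [])
        return [chunk for chunk in own if term.lower() in chunk.lower()]


spaces = AgentKnowledgeSpaces()
spaces.ingest("support-agent", "Refund policy: refunds within 30 days.")
spaces.ingest("hr-agent", "Leave policy: 20 days per year.")
support_hits = spaces.retrieve("support-agent", "policy")
```

Both Agents hold a chunk containing "policy", yet the support Agent's query returns only its own chunk because retrieval never looks outside the caller's space.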

Core Components

Knowledge Ingestion Service

The Knowledge Ingestion Service is responsible for:

  • Source discovery
  • Content extraction
  • Semantic chunking
  • Embedding generation
  • Metadata persistence
  • Retrieval preparation
  • Ingestion orchestration

Relational Database

The Relational Database stores operational ingestion metadata.

Stored metadata includes:

  • Ingestion definitions
  • Runtime metadata
  • URL tracking
  • Processing state
  • Ingestion runs
  • Operational records

IB-X currently supports:

  • PostgreSQL

Vector Store

The Vector Store stores semantic embeddings and vector indexes.

Responsibilities include:

  • Embedding persistence
  • Semantic search
  • Similarity ranking
  • Retrieval optimization

IB-X currently supports:

  • Qdrant

Graph Database

The Graph Database stores semantic relationships and traversal structures.

Responsibilities include:

  • Relationship modeling
  • Semantic linking
  • Graph traversals
  • Connected retrieval scenarios

IB-X currently supports:

  • Neo4j

External Services

External Services provide orchestration and AI infrastructure integration.

Responsibilities include:

  • Embedding generation
  • AI inference
  • Pipeline orchestration
  • Operational coordination

Image Extraction Engine

The Image Extraction Engine processes image-based content discovered during ingestion.

Capabilities include:

  • OCR extraction
  • Semantic interpretation
  • Context generation
  • Image-based knowledge extraction

The extraction behavior can be customized using configurable extraction prompts.


Storage Architecture

The platform separates storage responsibilities across specialized infrastructure components.

Component            Responsibility
Relational Database  Operational metadata
Vector Store         Semantic embeddings
Graph Database       Relationships and traversals

This separation improves scalability, retrieval efficiency, and operational maintainability.


Licensing and Usage Model

Knowledge ingestion usage is controlled through subscription entitlements.

The platform tracks:

  • Embeddings created
  • Bytes ingested
  • Characters ingested

The ingestion quota is governed through:

DATA_INGESTION_LIMIT

This quota applies at the environment level.
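
The source does not say which of the tracked metrics DATA_INGESTION_LIMIT is measured in or how it is enforced. Purely as an illustration, the sketch below assumes a byte-based limit checked before each ingestion; `IngestionUsage`, `QuotaExceeded`, and `limit_bytes` are hypothetical names.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when an ingestion would exceed the configured limit."""


@dataclass
class IngestionUsage:
    # Assumed unit: the source does not state how DATA_INGESTION_LIMIT is measured.
    limit_bytes: int
    embeddings_created: int = 0
    bytes_ingested: int = 0
    characters_ingested: int = 0

    def record(self, text, embedding_count):
        """Update all three counters, or raise without changing them if over quota."""
        size = len(text.encode("utf-8"))
        if self.bytes_ingested + size > self.limit_bytes:
            raise QuotaExceeded(f"{size} bytes would exceed the remaining quota")
        self.bytes_ingested += size
        self.characters_ingested += len(text)
        self.embeddings_created += embedding_count


usage = IngestionUsage(limit_bytes=20)
usage.record("héllo", embedding_count=1)  # 5 characters, 6 bytes in UTF-8
```

Bytes and characters are tracked separately because they diverge for non-ASCII text, as the accented character in the example shows.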


Administration Scope

Knowledge Ingestion Service infrastructure is configured globally from the root tenant.

Configuration includes:

  • Relational Database
  • Vector Store
  • Graph Database
  • External Services
  • Image Extraction

These configurations are shared across the environment while the ingested knowledge itself remains Agent-specific.



Notes

  • Knowledge ingestion in IB-X is Agent-specific and isolated to the owning Conversational Agent.
  • Infrastructure configuration for the Knowledge Ingestion Service is performed globally from the root tenant.
  • Ingestion usage limits are enforced using the DATA_INGESTION_LIMIT subscription entitlement.
  • The Vector Store, Graph Database, and Relational Database are independently configurable infrastructure components.
  • Image extraction behavior can be customized using configurable extraction prompts.


See Also

  • Conversational Agents
  • Agent Designer
  • AI Command Center
  • Agent Health Model
  • Integration Gateway