
Global Agent Health Configuration

Overview

The Global Agent Health Configuration page allows administrators to define the global thresholds, tolerances, and evaluation settings used by the IB-X Agent Health monitoring system.

These settings influence how Agent Health Scores are calculated across the platform and determine how agents are classified into health states such as:

Healthy
Degraded
Unhealthy
Inactive

The configured thresholds are applied during the periodic health evaluation process performed by the AI Command Center.

Note

The settings configured on this page act as the global default Agent Health configuration for the AI Command Center environment.

IB-X also supports Agent-level Health Configuration, where specific Agents can override these global settings with custom thresholds and tolerances.

When Agent-level configuration is defined, the Agent Health evaluation system uses the Agent-specific settings instead of the global configuration.
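The override rule can be sketched in a few lines of Python. This is an illustrative sketch only: `HealthConfig`, `effective_config`, the agent IDs, and the choice of fields are hypothetical. It assumes that an Agent-level configuration replaces the global one as a whole, as described above, rather than being merged field by field.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HealthConfig:
    # Hypothetical subset of the settings described on this page.
    evaluation_time_period_minutes: int
    expected_success_rate: float


def effective_config(
    agent_id: str,
    global_config: HealthConfig,
    agent_configs: Dict[str, HealthConfig],
) -> HealthConfig:
    """Return the configuration used to evaluate one agent.

    Assumption: an Agent-level configuration, when present, is used
    instead of the global configuration as a whole.
    """
    return agent_configs.get(agent_id, global_config)


global_cfg = HealthConfig(evaluation_time_period_minutes=1440, expected_success_rate=95)
agent_cfgs = {
    # Hypothetical agent with stricter, agent-specific thresholds.
    "invoice-agent": HealthConfig(evaluation_time_period_minutes=60, expected_success_rate=99),
}

print(effective_config("invoice-agent", global_cfg, agent_cfgs).expected_success_rate)  # 99
print(effective_config("report-agent", global_cfg, agent_cfgs).expected_success_rate)   # 95
```

If the product instead merged settings per field, the lookup would fall back to the global value for each field the Agent-level configuration leaves unset.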

How to Access

To access the Agent Health Configuration page:

Open the Administration Panel by clicking the Settings (gear) icon in the top navigation bar of the AI Command Center.
Navigate to: Additional → Agent Health Configuration

Configuration Sections

The Agent Health Configuration page is organized into the following sections:

Evaluation Settings
Performance Thresholds
Fault Tolerance Settings
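The three sections map naturally onto a nested configuration object. The following Python sketch is illustrative only: the class and field names are hypothetical, only the 1440-minute default comes from this page, and the validation rules are common-sense assumptions rather than documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationSettings:
    evaluation_time_period_minutes: int = 1440  # documented default: 24 hours


@dataclass(frozen=True)
class PerformanceThresholds:
    expected_success_rate: float             # percent, e.g. 98
    expected_average_runtime_seconds: float  # e.g. 45
    max_runtime_deviation_percent: float     # e.g. 30


@dataclass(frozen=True)
class FaultToleranceSettings:
    max_stuck_instances: int
    sla_breach_tolerance: int  # 0 = no breaches tolerated
    max_failures_per_hour: int


@dataclass(frozen=True)
class AgentHealthConfig:
    evaluation: EvaluationSettings
    performance: PerformanceThresholds
    fault_tolerance: FaultToleranceSettings

    def validate(self) -> None:
        # Assumed sanity checks for illustration; the product's own rules may differ.
        if self.evaluation.evaluation_time_period_minutes <= 0:
            raise ValueError("evaluation window must be positive")
        if not 0 <= self.performance.expected_success_rate <= 100:
            raise ValueError("expected success rate must be a percentage")
        if self.performance.expected_average_runtime_seconds <= 0:
            raise ValueError("expected average runtime must be positive")
        if self.performance.max_runtime_deviation_percent < 0:
            raise ValueError("runtime deviation cannot be negative")
        ft = self.fault_tolerance
        if min(ft.max_stuck_instances, ft.sla_breach_tolerance, ft.max_failures_per_hour) < 0:
            raise ValueError("tolerances cannot be negative")


# Illustrative values: 98, 45, 30 and 0 are the examples from this page;
# 2 stuck instances and 5 failures per hour are hypothetical.
config = AgentHealthConfig(
    evaluation=EvaluationSettings(),
    performance=PerformanceThresholds(98, 45, 30),
    fault_tolerance=FaultToleranceSettings(
        max_stuck_instances=2, sla_breach_tolerance=0, max_failures_per_hour=5
    ),
)
config.validate()  # raises ValueError if any setting is out of range
```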

Evaluation Settings

The Evaluation Settings section controls the global evaluation period used for Agent Health calculations.

Evaluation Time Period (minutes)

Specify the evaluation window, in minutes, used for computing Agent Health metrics.

The system evaluates agent execution statistics, failures, SLA breaches, runtime behavior, and operational metrics within this rolling evaluation window.

Example:

1440 minutes = last 24 hours

This configuration directly impacts:

Health score calculations
Reliability analysis
Runtime trend evaluation
Stability measurements
Operational health classification
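The rolling window can be illustrated with a small Python sketch. `Execution`, its fields, and `executions_in_window` are hypothetical names; the sketch only shows which execution records fall inside a window of the configured length that ends at the evaluation time.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Execution(NamedTuple):
    # Hypothetical execution record; field names are illustrative.
    finished_at: datetime
    succeeded: bool


def executions_in_window(
    executions: List[Execution], window_minutes: int, now: datetime
) -> List[Execution]:
    """Keep only executions that finished within the last `window_minutes`."""
    window_start = now - timedelta(minutes=window_minutes)
    return [e for e in executions if window_start <= e.finished_at <= now]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
history = [
    Execution(now - timedelta(hours=2), True),    # inside a 1440-minute window
    Execution(now - timedelta(hours=23), False),  # inside
    Execution(now - timedelta(hours=25), True),   # outside: older than 24 hours
]
print(len(executions_in_window(history, 1440, now)))  # 2
```

Because the window moves with each evaluation cycle, a record drops out of the calculation once it is older than the configured period.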

Note

The default configuration uses a 24-hour evaluation window (1440 minutes).

Performance Thresholds

The Performance Thresholds section defines the expected execution quality and runtime characteristics used during Agent Health evaluation.

Expected Success Rate

Specify the minimum expected execution success percentage for agents.

This value is used by the Reliability Score calculation to determine whether the actual success rate meets operational expectations.

Higher values indicate stricter reliability expectations.

Recommended value:

>= 95

Example:

98 indicates the system expects at least 98% successful executions.
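A minimal sketch of the success-rate comparison, assuming the Reliability Score check reduces to comparing the observed success percentage with Expected Success Rate. The function names are hypothetical. How IB-X treats an agent with no executions in the window is not stated on this page, so the sketch returns `None` for that case.

```python
from typing import Optional


def success_rate_percent(successful: int, total: int) -> Optional[float]:
    """Percentage of executions in the evaluation window that succeeded."""
    if total == 0:
        return None  # no executions in the window: nothing to measure
    return successful * 100 / total


def meets_expected_success_rate(
    successful: int, total: int, expected_rate: float
) -> Optional[bool]:
    rate = success_rate_percent(successful, total)
    return None if rate is None else rate >= expected_rate


print(meets_expected_success_rate(490, 500, 98))  # True  (98.0%)
print(meets_expected_success_rate(485, 500, 98))  # False (97.0%)
```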

Expected Average Runtime (seconds)

Specify the expected average execution duration for agents.

This value is used during Performance Score calculations to compare actual runtime behavior against expected execution duration.

Example:

45 seconds

Agents consistently exceeding the expected runtime may receive lower performance scores.

Max Runtime Deviation Percent

Specify the maximum acceptable runtime deviation percentage before runtime penalties are applied.

This setting determines how much runtime variance is tolerated before the Agent Health system considers the execution behavior degraded.

Example:

30 allows runtime deviations up to 30% from the expected average runtime.

Higher deviation percentages provide greater runtime flexibility.

Lower values enforce stricter runtime consistency expectations.
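The following sketch computes the deviation of the observed average runtime from Expected Average Runtime and checks it against Max Runtime Deviation Percent. The function names are hypothetical. The sketch assumes the deviation is measured in both directions (faster or slower than expected); because this page emphasizes agents that exceed the expected runtime, the actual rule may only penalize slower runs.

```python
from statistics import mean
from typing import List


def runtime_deviation_percent(runtimes_seconds: List[float], expected_seconds: float) -> float:
    """Deviation of the actual average runtime from the expected one, in percent."""
    actual = mean(runtimes_seconds)
    return abs(actual - expected_seconds) / expected_seconds * 100


def within_runtime_tolerance(
    runtimes_seconds: List[float], expected_seconds: float, max_deviation_percent: float
) -> bool:
    return runtime_deviation_percent(runtimes_seconds, expected_seconds) <= max_deviation_percent


# Expected 45 s with a 30% tolerance: the accepted average is 31.5 s to 58.5 s.
print(within_runtime_tolerance([50, 55, 53], 45, 30))  # True  (average ~52.7 s, ~17% over)
print(within_runtime_tolerance([60, 62, 61], 45, 30))  # False (average 61 s, ~35.6% over)
```

With a one-sided rule, `abs(actual - expected_seconds)` would become `max(0, actual - expected_seconds)`.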

Fault Tolerance Settings

The Fault Tolerance Settings section defines operational tolerance levels for failures, SLA breaches, and runtime instability.

These settings are used by the Stability Score and Operational Score calculations; Max Failures per Hour also contributes to the Reliability Score.

Max Stuck Instances

Specify the maximum number of long-running or stuck workflow instances tolerated within the evaluation period.

Instances exceeding expected execution behavior may be classified as unstable or stuck.

This setting contributes to:

Stability Score calculations
Runtime stability analysis
Unhealthy override rule evaluation

Lower values enforce stricter stability expectations.
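This page does not say how an instance is judged to be stuck, so the sketch below assumes a hypothetical cutoff, `stuck_after`, on the elapsed time of still-running instances. `count_stuck` and `within_stuck_tolerance` are illustrative names, and the comparison treats Max Stuck Instances as the largest count that is still tolerated.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def count_stuck(start_times: List[datetime], now: datetime, stuck_after: timedelta) -> int:
    """Count still-running instances whose elapsed time exceeds `stuck_after`.

    `stuck_after` is a hypothetical cutoff; how IB-X actually decides that an
    instance is stuck is not specified on this page.
    """
    return sum(1 for started in start_times if now - started > stuck_after)


def within_stuck_tolerance(stuck_count: int, max_stuck_instances: int) -> bool:
    # The setting is the maximum tolerated, so reaching it is still acceptable.
    return stuck_count <= max_stuck_instances


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
running = [now - timedelta(minutes=m) for m in (5, 90, 240)]  # start times of running instances
stuck = count_stuck(running, now, stuck_after=timedelta(hours=1))
print(stuck)                             # 2
print(within_stuck_tolerance(stuck, 1))  # False
```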

SLA Breach Tolerance

Specify the maximum number of SLA breaches tolerated during the evaluation window before penalties are applied.

This setting contributes to the Operational Score calculation.

Recommended value:

<= 3

Example:

0 indicates that no SLA breaches are tolerated.

Higher breach counts negatively impact the Agent Health Score.
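A sketch of the SLA Breach Tolerance check, assuming penalties start once the breach count in the evaluation window exceeds the configured tolerance. `excess_breaches` is a hypothetical helper showing one way the count beyond the tolerance could feed a penalty; the actual scoring formula is not described on this page.

```python
def sla_breaches_exceeded(breach_count: int, tolerance: int) -> bool:
    """True when breaches in the evaluation window exceed the tolerance.

    With a tolerance of 0, a single breach already exceeds it.
    """
    return breach_count > tolerance


def excess_breaches(breach_count: int, tolerance: int) -> int:
    # Hypothetical penalty input: breaches beyond the tolerated number.
    return max(0, breach_count - tolerance)


print(sla_breaches_exceeded(1, 0))  # True
print(sla_breaches_exceeded(2, 3))  # False
print(excess_breaches(5, 3))        # 2
```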

Max Failures per Hour

Specify the maximum number of execution failures tolerated per hour before reliability penalties are applied.

This setting contributes to the Reliability Score calculation.

Agents exceeding the configured failure tolerance may be classified as Degraded or Unhealthy.
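Because the evaluation window is configured in minutes while this tolerance is expressed per hour, some conversion is needed. The sketch below assumes the failure rate is averaged over the whole evaluation window; the function names are hypothetical.

```python
def failures_per_hour(failure_count: int, window_minutes: int) -> float:
    """Average failure rate over the evaluation window."""
    return failure_count / (window_minutes / 60)


def exceeds_failure_tolerance(
    failure_count: int, window_minutes: int, max_failures_per_hour: float
) -> bool:
    return failures_per_hour(failure_count, window_minutes) > max_failures_per_hour


print(failures_per_hour(12, 1440))             # 0.5
print(exceeds_failure_tolerance(12, 1440, 1))  # False
print(exceeds_failure_tolerance(60, 1440, 2))  # True (2.5 failures per hour)
```

Averaging over a 24-hour window smooths out short bursts of failures. If the product instead checks each hour separately, a burst within a single hour could trigger the penalty even when the daily average stays low.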

Save Configuration

Click Save Configuration to persist the updated Agent Health settings.

The updated configuration is applied to subsequent Agent Health evaluation cycles.

Notes

Agent Health Configuration settings apply globally across the AI Command Center environment, except for Agents that define their own Agent-level Health Configuration.
Changes to thresholds may affect Agent Health classifications and dashboard indicators.
Very strict thresholds may increase the number of degraded or unhealthy agents.
Recommended values help maintain balanced operational sensitivity while avoiding excessive false positives.
