Global Agent Health Configuration
Overview
The Global Agent Health Configuration page allows administrators to define the global thresholds, tolerances, and evaluation settings used by the IB-X Agent Health monitoring system.
These settings influence how Agent Health Scores are calculated across the platform and determine how agents are classified into health states such as:
- Healthy
- Degraded
- Unhealthy
- Inactive
The configured thresholds are applied during the periodic health evaluation process performed by the AI Command Center.
The settings configured on this page act as the global default Agent Health configuration for the AI Command Center environment.
IB-X also supports Agent-level Health Configuration, where specific Agents can override these global settings with custom thresholds and tolerances.
When Agent-level configuration is defined, the Agent Health evaluation system uses the Agent-specific settings instead of the global configuration.
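The precedence rule above can be sketched as a simple lookup. This is a minimal illustration, not the IB-X implementation: the `HealthConfig` class, its field names, and the `effective_config` function are hypothetical, and the sketch assumes an Agent-level configuration replaces the global configuration as a whole rather than field by field.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HealthConfig:
    """Hypothetical container for the settings described on this page."""
    evaluation_period_minutes: int
    expected_success_rate: float
    expected_avg_runtime_seconds: float
    max_runtime_deviation_percent: float
    max_stuck_instances: int
    sla_breach_tolerance: int
    max_failures_per_hour: int


def effective_config(agent_id: str,
                     global_config: HealthConfig,
                     agent_overrides: Dict[str, HealthConfig]) -> HealthConfig:
    """Return the Agent-level configuration if one is defined, else the global one."""
    return agent_overrides.get(agent_id, global_config)


# Global values reuse the examples on this page; the agent ID and its
# override values are made up for illustration.
global_config = HealthConfig(1440, 98, 45, 30, 1, 0, 2)
overrides = {"invoice-agent": HealthConfig(720, 95, 120, 50, 2, 3, 5)}

invoice_cfg = effective_config("invoice-agent", global_config, overrides)
other_cfg = effective_config("other-agent", global_config, overrides)
```

Here `invoice_cfg` is the Agent-specific configuration, while `other_cfg` falls back to the global defaults.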
How to Access
To access the Agent Health Configuration page:
- In the AI Command Center, click the Settings (gear) icon in the top navigation bar to open the Administration Panel.
- Navigate to Additional → Agent Health Configuration.
Configuration Sections
The Agent Health Configuration page is organized into the following sections:
- Evaluation Settings
- Performance Thresholds
- Fault Tolerance Settings
Evaluation Settings
The Evaluation Settings section controls the global evaluation period used for Agent Health calculations.
Evaluation Time Period (minutes)
Specify the evaluation window, in minutes, used for computing Agent Health metrics.
The system evaluates agent execution statistics, failures, SLA breaches, runtime behavior, and operational metrics within this rolling evaluation window.
Example:
1440 minutes = last 24 hours
This configuration directly impacts:
- Health score calculations
- Reliability analysis
- Runtime trend evaluation
- Stability measurements
- Operational health classification
The default configuration uses a 24-hour evaluation window (1440 minutes).
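To make the rolling window concrete, the following sketch checks whether an execution event falls inside the evaluation period. The function name `in_evaluation_window` and its parameters are hypothetical, and treating both window edges as inclusive is an assumption; the page does not specify how IB-X handles events exactly on the boundary.

```python
from datetime import datetime, timedelta


def in_evaluation_window(event_time: datetime, now: datetime,
                         period_minutes: int = 1440) -> bool:
    """Return True if event_time lies within the last period_minutes before now."""
    window_start = now - timedelta(minutes=period_minutes)
    return window_start <= event_time <= now


now = datetime(2024, 1, 2, 12, 0)
# An event 23 hours old is inside the default 1440-minute window.
recent = in_evaluation_window(datetime(2024, 1, 1, 13, 0), now)
# An event 25 hours old is outside it.
stale = in_evaluation_window(datetime(2024, 1, 1, 11, 0), now)
```

With a shorter period, such as 60 minutes, the same check would count only the last hour of activity.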
Performance Thresholds
The Performance Thresholds section defines the expected execution quality and runtime characteristics used during Agent Health evaluation.
Expected Success Rate
Specify the minimum expected execution success percentage for agents.
This value is used by the Reliability Score calculation to determine whether the actual success rate meets operational expectations.
Higher values indicate stricter reliability expectations.
Recommended value:
>= 95
Example:
98 indicates the system expects at least 98% successful executions.
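The comparison behind this setting can be sketched as follows. This shows only the threshold check, not the Reliability Score formula, which this page does not describe. The function names are hypothetical, and returning `None` when an agent has no executions in the window is an assumption.

```python
from typing import Optional


def success_rate_percent(successes: int, total: int) -> Optional[float]:
    """Return the success rate as a percentage, or None if nothing ran."""
    if total == 0:
        return None
    return 100.0 * successes / total


def meets_expected_success_rate(successes: int, total: int,
                                expected_percent: float) -> bool:
    """Return True if the observed success rate is at or above the expectation."""
    rate = success_rate_percent(successes, total)
    return rate is not None and rate >= expected_percent


# 196 of 200 executions succeeded, which is exactly 98%.
ok = meets_expected_success_rate(196, 200, expected_percent=98)
# 190 of 200 is 95%, below a 98% expectation.
below = meets_expected_success_rate(190, 200, expected_percent=98)
```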
Expected Average Runtime (seconds)
Specify the expected average execution duration for agents, in seconds.
This value is used during Performance Score calculations to compare actual runtime behavior against expected execution duration.
Example:
45 seconds
Agents consistently exceeding the expected runtime may receive lower performance scores.
Max Runtime Deviation Percent
Specify the maximum acceptable runtime deviation percentage before runtime penalties are applied.
This setting determines how much runtime variance is tolerated before the Agent Health system considers the execution behavior degraded.
Example:
30 allows runtime deviations up to 30% from the expected average runtime.
Higher deviation percentages provide greater runtime flexibility.
Lower values enforce stricter runtime consistency expectations.
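As a worked example, with an expected average runtime of 45 seconds and a maximum deviation of 30%, the tolerated margin is 13.5 seconds, so runtimes up to 58.5 seconds stay within tolerance. The sketch below shows that check. The function names are hypothetical, and it assumes only runtimes slower than expected count against the agent; whether IB-X also penalizes unusually fast runs is not stated on this page.

```python
def runtime_deviation_percent(actual_avg_seconds: float,
                              expected_avg_seconds: float) -> float:
    """Signed deviation of the actual average runtime from the expected one, in percent."""
    return (actual_avg_seconds - expected_avg_seconds) / expected_avg_seconds * 100.0


def within_runtime_tolerance(actual_avg_seconds: float,
                             expected_avg_seconds: float,
                             max_deviation_percent: float) -> bool:
    """Return True if the agent is not slower than the tolerated deviation allows."""
    deviation = runtime_deviation_percent(actual_avg_seconds, expected_avg_seconds)
    return deviation <= max_deviation_percent


# 54 seconds is 20% over the 45-second expectation: tolerated at 30%.
tolerated = within_runtime_tolerance(54, 45, 30)
# 63 seconds is 40% over: outside the 30% tolerance.
penalized = not within_runtime_tolerance(63, 45, 30)
```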
Fault Tolerance Settings
The Fault Tolerance Settings section defines operational tolerance levels for failures, SLA breaches, and runtime instability.
These settings are used by Stability Score and Operational Score calculations.
Max Stuck Instances
Specify the maximum number of long-running or stuck workflow instances tolerated within the evaluation period.
Instances that run well beyond their expected execution time may be classified as unstable or stuck.
This setting contributes to:
- Stability Score calculations
- Runtime stability analysis
- Unhealthy override rule evaluation
Example:
1
Lower values enforce stricter stability expectations.
SLA Breach Tolerance
Specify the maximum number of SLA breaches tolerated during the evaluation window before penalties are applied.
This setting contributes to the Operational Score calculation.
Recommended value:
<= 3
Example:
0 indicates that no SLA breaches are tolerated.
Higher breach counts negatively impact the Agent Health Score.
Max Failures per Hour
Specify the maximum number of execution failures tolerated per hour before reliability penalties are applied.
This setting contributes to the Reliability Score calculation.
Example:
2
Agents exceeding the configured failure tolerance may be classified as degraded or unhealthy.
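The three fault tolerance settings can be read as simple limit checks. The sketch below reports which limits an agent has exceeded. It is an illustration under stated assumptions, not the IB-X scoring logic: the function and parameter names are hypothetical, and "failures per hour" is taken to mean the average over the whole evaluation period, whereas IB-X may instead evaluate each hour separately.

```python
from typing import List


def exceeded_tolerances(stuck_instances: int,
                        sla_breaches: int,
                        failures: int,
                        period_minutes: int,
                        max_stuck_instances: int,
                        sla_breach_tolerance: int,
                        max_failures_per_hour: float) -> List[str]:
    """Return the names of the fault tolerance settings the agent has exceeded."""
    exceeded = []
    if stuck_instances > max_stuck_instances:
        exceeded.append("Max Stuck Instances")
    if sla_breaches > sla_breach_tolerance:
        exceeded.append("SLA Breach Tolerance")
    # Assumption: average failure rate across the evaluation period.
    failures_per_hour = failures / (period_minutes / 60)
    if failures_per_hour > max_failures_per_hour:
        exceeded.append("Max Failures per Hour")
    return exceeded


# Example values from this page: 1 stuck instance, 0 SLA breaches, and
# 2 failures per hour tolerated over a 1440-minute window.
result = exceeded_tolerances(stuck_instances=1, sla_breaches=1, failures=48,
                             period_minutes=1440, max_stuck_instances=1,
                             sla_breach_tolerance=0, max_failures_per_hour=2)
```

In this example only the SLA breach limit is exceeded: one stuck instance is within tolerance, and 48 failures over 24 hours average exactly 2 per hour.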
Save Configuration
Click Save Configuration to persist the updated Agent Health settings.
The updated configuration is applied to subsequent Agent Health evaluation cycles.
Notes
- Agent Health Configuration settings are applied globally across the AI Command Center environment, except for Agents that define their own Agent-level Health Configuration.
- Changes to thresholds may affect Agent Health classifications and dashboard indicators.
- Very strict thresholds may increase the number of degraded or unhealthy agents.
- Recommended values help maintain balanced operational sensitivity while avoiding excessive false positives.