Agent Health Model
Objective
Agent Health measures how well an Agent is performing within an evaluation window.
The model evaluates multiple aspects of an Agent and converts each into a normalized score between 0 and 1, where:
- 1 = Fully Healthy
- 0 = Fully Unhealthy
The health evaluation process uses configurable thresholds and tolerance settings to determine operational expectations for reliability, performance, stability, and fault tolerance.
These settings can be configured at:
- Global level — applied system-wide through Global Agent Health Configuration
- Agent level — overrides the global configuration for specific Agents
This allows organizations to tailor health evaluation behavior based on workload characteristics, business criticality, operational SLAs, and runtime expectations.
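As a rough illustration of how agent-level settings could override global ones, the sketch below assumes a flat set of named settings where an unset agent value falls back to the global value. The class names `HealthConfig` and `AgentOverrides`, the function `resolve_config`, and all field names are hypothetical; the defaults mirror the recommended values given later in this document.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class HealthConfig:
    # Hypothetical field names; defaults mirror the recommended values in this document.
    n_full: int = 20
    deviation_allowed: float = 0.20
    s_threshold: int = 15
    b_threshold: int = 5
    t_threshold: int = 5


@dataclass(frozen=True)
class AgentOverrides:
    # None means "not set for this agent; inherit the global value".
    n_full: Optional[int] = None
    deviation_allowed: Optional[float] = None
    s_threshold: Optional[int] = None
    b_threshold: Optional[int] = None
    t_threshold: Optional[int] = None


def resolve_config(global_cfg: HealthConfig, overrides: AgentOverrides) -> HealthConfig:
    """Return the effective config: agent-level values win, otherwise global values apply."""
    changes = {
        f.name: getattr(overrides, f.name)
        for f in fields(overrides)
        if getattr(overrides, f.name) is not None
    }
    return replace(global_cfg, **changes)


global_cfg = HealthConfig()
critical_agent = AgentOverrides(deviation_allowed=0.10, b_threshold=2)
effective = resolve_config(global_cfg, critical_agent)
print(effective)
# HealthConfig(n_full=20, deviation_allowed=0.1, s_threshold=15, b_threshold=2, t_threshold=5)
```

Only the fields set on the agent change; everything else is inherited, so a later change to a global value still reaches agents that did not override that setting.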
Evaluation Window
All health calculations are performed using a fixed evaluation window of the last 24 hours.
The Agent Health Overview continuously evaluates agent behavior, execution reliability, operational stability, and performance metrics based on data collected during the most recent 24-hour period.
The evaluation window is system-defined and is not configurable for the Agent Health Overview.
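A minimal sketch of selecting executions for the fixed 24-hour window, assuming each execution is identified by a timezone-aware timestamp and that both window boundaries are inclusive (neither detail is specified here). `executions_in_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed at 24 hours; the window is not a tunable parameter.
EVALUATION_WINDOW = timedelta(hours=24)


def executions_in_window(execution_times, now):
    """Keep only execution timestamps within the last 24 hours before `now`."""
    window_start = now - EVALUATION_WINDOW
    return [t for t in execution_times if window_start <= t <= now]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
times = [
    now - timedelta(hours=1),   # inside the window
    now - timedelta(hours=23),  # inside the window
    now - timedelta(hours=25),  # outside the window
]
recent = executions_in_window(times, now)
print(len(recent))  # 2
```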
Health States
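The final Health Score, on a 0 to 100 scale, maps to a health state as follows:
| Condition | State |
|---|---|
| Score >= 80 | Healthy |
| Score >= 50 and < 80 | Degraded |
| Score < 50 | Unhealthy |
| N = 0 (no executions in the evaluation window) | Inactive |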
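The state mapping can be sketched as a small function. It assumes the Inactive check (N = 0) takes precedence over the score bands, since an agent with no executions has no meaningful score; `health_state` is a hypothetical name.

```python
def health_state(score: float, executions: int) -> str:
    """Map a 0-100 health score to a state; agents with no executions are Inactive."""
    if executions == 0:
        return "Inactive"  # checked first: no executions means nothing to score
    if score >= 80:
        return "Healthy"
    if score >= 50:
        return "Degraded"
    return "Unhealthy"


print(health_state(85.0, 40))  # Healthy
print(health_state(79.9, 40))  # Degraded
print(health_state(49.0, 40))  # Unhealthy
print(health_state(0.0, 0))    # Inactive
```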
Health Dimensions
The overall Agent Health Score is derived from multiple operational dimensions that collectively evaluate the reliability, efficiency, stability, and operational readiness of an agent.
Each dimension focuses on a specific aspect of agent behavior and contributes a weighted score toward the final health calculation.
The following health dimensions are evaluated:
- Reliability Score
- Performance Score
- Stability Score
- Operational Score
Reliability Score (Weight: 40)
What it represents
Reliability measures how consistently the agent produces successful outcomes.
Inputs
- ESR = Expected Success Rate
- ASR = Actual Success Rate
- EFH = Expected Failures per Hour
- AFH = Actual Failures per Hour
- N = Total Executions
Success Score
Measures how well the actual success rate meets the expected success rate.
- If actual success rate meets or exceeds expectation, the score is 1
- Otherwise, the score decreases linearly with the shortfall and equals the ratio of the actual to the expected success rate
SuccessScore = Min(1, ASR / ESR)
Failure Score
Measures whether the actual failure rate stays within the expected limit.
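- If actual failures are within expected limits (including when there are no failures), the score is 1
- Otherwise, the score is the ratio of expected to actual failures per hour, so it falls as failures rise
FailureScore = Min(1, EFH / AFH), with FailureScore = 1 when AFH = 0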
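Both scores can be sketched directly from the formulas. The sketch assumes rates are expressed as fractions (for example 0.95 for 95%) and that ESR is greater than zero; it treats AFH = 0 as fully within limits to avoid dividing by zero. The function names are hypothetical.

```python
def success_score(asr: float, esr: float) -> float:
    """SuccessScore = Min(1, ASR / ESR); assumes esr > 0."""
    return min(1.0, asr / esr)


def failure_score(efh: float, afh: float) -> float:
    """FailureScore = Min(1, EFH / AFH), with zero actual failures scoring 1."""
    if afh == 0:
        return 1.0  # no failures are within any expected limit
    return min(1.0, efh / afh)


print(success_score(asr=0.99, esr=0.95))  # 1.0 (expectation exceeded)
print(failure_score(efh=2.0, afh=4.0))    # 0.5 (twice the expected failure rate)
print(failure_score(efh=2.0, afh=0.0))    # 1.0 (no failures)
```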
Combine
The reliability score is derived by combining success and failure behavior, with higher emphasis on success.
- Success contributes 70%
- Failure contributes 30%
ReliabilityRaw =
0.7 * SuccessScore
+ 0.3 * FailureScore
Confidence Adjustment
To avoid misleading results when execution volume is low, a confidence factor is applied.
- N = total number of executions in the evaluation window
- N_full = minimum number of executions required for full confidence
ConfidenceFactor = Min(1, N / N_full)
Recommended:
N_full = 20
The final reliability score blends the observed reliability with a neutral baseline (1), based on confidence:
- When execution count is low → score stays closer to 1 (neutral)
- When execution count is sufficient → score reflects actual reliability
ReliabilityScore =
ConfidenceFactor * ReliabilityRaw
+ (1 - ConfidenceFactor) * 1
In low-volume scenarios, the score is weighted toward healthy until enough executions are available to trust the observed reliability.
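The reliability steps can be combined into one function, including the confidence adjustment with the recommended N_full = 20. The sketch repeats the success and failure formulas so it runs on its own and uses hypothetical parameter names; it assumes ESR > 0 and treats AFH = 0 as a FailureScore of 1.

```python
def reliability_score(asr: float, esr: float, efh: float, afh: float,
                      n: int, n_full: int = 20) -> float:
    """Reliability in [0, 1], blended toward 1 when execution volume is low."""
    success = min(1.0, asr / esr)
    failure = 1.0 if afh == 0 else min(1.0, efh / afh)
    raw = 0.7 * success + 0.3 * failure
    confidence = min(1.0, n / n_full)
    # Low confidence pulls the score toward the neutral baseline of 1.
    return confidence * raw + (1 - confidence) * 1.0


# Same observed behaviour (raw reliability 0.5) at two execution volumes.
print(round(reliability_score(0.5, 1.0, 1.0, 2.0, n=10), 2))  # 0.75
print(round(reliability_score(0.5, 1.0, 1.0, 2.0, n=40), 2))  # 0.5
```

With only 10 executions the confidence factor is 0.5, so the agent scores 0.75 instead of its observed 0.5.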
Performance Score (Weight: 20)
What it represents
Performance measures how efficiently the agent executes compared to expected runtime.
Inputs
- ER = Expected Runtime
- AR = Actual Runtime
- DeviationAllowed = Allowed Deviation (default 20%)
Deviation
Deviation measures how much the actual runtime differs from the expected runtime.
Deviation = (AR - ER) / ER
Score
PerformanceScore evaluates how closely execution time matches expectations.
- If the agent runs within expected time → score is 1
- If the runtime exceeds expectation → score decreases linearly with the deviation
- The score reaches 0 once the deviation equals or exceeds the allowed deviation, and never goes below 0
PerformanceScore =
if AR <= ER then 1
else 1 - Min(1, Deviation / DeviationAllowed)
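A direct sketch of the performance formula, assuming ER and AR use the same unit (for example seconds), ER > 0, and DeviationAllowed is a fraction (0.20 for 20%). The function name is hypothetical.

```python
def performance_score(er: float, ar: float, deviation_allowed: float = 0.20) -> float:
    """1 within expected runtime; falls linearly to 0 at ER * (1 + deviation_allowed)."""
    if ar <= er:
        return 1.0
    deviation = (ar - er) / er
    return 1.0 - min(1.0, deviation / deviation_allowed)


print(performance_score(er=100, ar=90))   # 1.0 (faster than expected)
print(performance_score(er=100, ar=110))  # 0.5 (10% over, half the allowed 20%)
print(performance_score(er=100, ar=130))  # 0.0 (30% over, beyond the allowed 20%)
```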
Stability Score (Weight: 20)
What it represents
Stability measures whether the agent is producing abnormal or long-running executions.
It focuses on identifying signs of instability such as:
- executions taking significantly longer than expected
- potential hangs, retries, or resource contention
Input
S = Count of long-running instances (> 5x expected runtime)
Threshold
The threshold defines the level at which instability is considered significant.
- Below the threshold → impact on score is gradual
- At or beyond the threshold → maximum penalty is applied
This ensures that occasional delays do not overly affect the agent's health.
S_threshold = 15
Score
The score decreases as the number of long-running instances increases.
- If there are no long-running instances → score is 1
- As long-running instances increase → score gradually decreases
- The impact grows logarithmically: each additional long-running instance lowers the score by less than the previous one
- The score reaches 0 once S reaches the threshold, and never goes below 0
StabilityScore =
1 - Min(1, log2(S + 1) / log2(S_threshold + 1))
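The shape of the logarithmic curve is easiest to see with a few sample values. This sketch uses the default S_threshold = 15, for which log2(S_threshold + 1) = 4; the function name is hypothetical.

```python
import math


def stability_score(s: int, s_threshold: int = 15) -> float:
    """StabilityScore = 1 - Min(1, log2(S + 1) / log2(S_threshold + 1))."""
    return 1.0 - min(1.0, math.log2(s + 1) / math.log2(s_threshold + 1))


for s in (0, 1, 3, 7, 15, 30):
    print(s, stability_score(s))
# 0 -> 1.0, 1 -> 0.75, 3 -> 0.5, 7 -> 0.25, 15 -> 0.0, 30 -> 0.0
```

Each doubling of S + 1 removes another quarter of the score, so later instances cost less than earlier ones, and the score bottoms out at 0 from 15 long-running instances onward.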
Operational Score (Weight: 20)
What it represents
Operational score measures how reliably the agent functions in real-world conditions.
It captures:
- Ability to meet defined SLAs (timeliness)
- Reliability of execution triggers
- Overall operational readiness of the agent
Inputs
- B = SLA breaches
- T = Trigger failures
Thresholds
The thresholds define the acceptable limits for operational deviations.
- Below the threshold → score decreases in proportion to the count
- At or beyond the threshold → maximum penalty is applied
B_threshold = 5
T_threshold = 5
Scores
Each component is normalized to a value between 0 and 1:
- No breaches or failures → score is 1
- Increasing breaches or failures → score decreases proportionally
- Penalty is capped at the defined threshold
SLAScore = 1 - Min(1, B / B_threshold)
TriggerScore = 1 - Min(1, T / T_threshold)
Combine
Operational score combines SLA compliance and trigger reliability, with higher weight given to SLA adherence.
OperationalScore =
0.6 * SLAScore
+ 0.4 * TriggerScore
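A sketch of the operational score with the default thresholds of 5. B and T are assumed to be counts within the evaluation window; the function name is hypothetical.

```python
def operational_score(b: int, t: int, b_threshold: int = 5, t_threshold: int = 5) -> float:
    """0.6 * SLAScore + 0.4 * TriggerScore, each capped at its threshold."""
    sla = 1.0 - min(1.0, b / b_threshold)
    trigger = 1.0 - min(1.0, t / t_threshold)
    return 0.6 * sla + 0.4 * trigger


print(round(operational_score(b=1, t=0), 2))  # 0.88
print(round(operational_score(b=0, t=5), 2))  # 0.6
print(round(operational_score(b=5, t=5), 2))  # 0.0
```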
Final Health Score
The overall health of an agent is computed as a weighted combination of four dimensions, producing a score between 0 and 100:
- Reliability (40%) — correctness of outcomes
- Performance (20%) — execution efficiency
- Stability (20%) — runtime consistency
- Operational (20%) — SLA adherence and trigger reliability
HealthScore =
40 * ReliabilityScore
+ 20 * PerformanceScore
+ 20 * StabilityScore
+ 20 * OperationalScore
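The final combination can be sketched as follows. Each input is a dimension score in [0, 1]; the example values are illustrative only, and the function name is hypothetical.

```python
def health_score(reliability: float, performance: float,
                 stability: float, operational: float) -> float:
    """Weighted sum on a 0-100 scale (weights 40/20/20/20)."""
    return (40 * reliability
            + 20 * performance
            + 20 * stability
            + 20 * operational)


score = health_score(reliability=0.9, performance=1.0, stability=0.75, operational=0.88)
print(round(score, 1))  # 88.6
```

A score of 88.6 falls in the Healthy band (>= 80).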
Hard Override Rules
Certain critical operational conditions immediately force the agent health state to Unhealthy, regardless of the computed health score.
When any hard override rule is satisfied:
- The final health score is forcibly set to 49
- The agent health state is marked as Unhealthy
- The normal weighted score calculation is ignored
These rules are intended to identify severe operational instability, complete execution failures, or critical system reliability issues that require immediate attention.
The following conditions trigger the unhealthy override behavior:
| Rule Name | Condition | Description |
|---|---|---|
| Absolute Failure Rule | No successful executions while total executions > 0 | Indicates complete execution failure during the evaluation window. |
| Critical SLA Breach Rule | SLA breaches > 2 × SLA breach threshold | Indicates severe operational or performance degradation. |
| Trigger Reliability Failure Rule | Trigger failures > 2 × trigger failure threshold | Indicates major trigger reliability, connectivity, or execution initiation issues. |
| Severe Stability Degradation Rule | Long-running or stuck instances > 3 × stability threshold | Indicates severe runtime instability or potential execution deadlocks. |
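The override rules can be sketched as a check that runs after the weighted score is computed. The function `hard_override` and its parameter names are hypothetical, and the thresholds default to the values given earlier (B_threshold = 5, T_threshold = 5, S_threshold = 15).

```python
def hard_override(score: float, executions: int, successes: int,
                  sla_breaches: int, trigger_failures: int, long_running: int,
                  b_threshold: int = 5, t_threshold: int = 5, s_threshold: int = 15):
    """Return (final_score, fired_rules); any fired rule forces the score to 49."""
    rules = {
        "Absolute Failure Rule": executions > 0 and successes == 0,
        "Critical SLA Breach Rule": sla_breaches > 2 * b_threshold,
        "Trigger Reliability Failure Rule": trigger_failures > 2 * t_threshold,
        "Severe Stability Degradation Rule": long_running > 3 * s_threshold,
    }
    fired = [name for name, hit in rules.items() if hit]
    return (49.0 if fired else score), fired


print(hard_override(72.0, executions=30, successes=28,
                    sla_breaches=11, trigger_failures=0, long_running=0))
# (49.0, ['Critical SLA Breach Rule'])
```

Read literally, the rule also replaces a computed score that is already below 49 with 49; the sketch follows that reading.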
Inactive Agents
An agent is considered Inactive when it has no executions within the evaluation window.
- N represents the total number of executions of the agent in the evaluation window.
If:
N = 0
Then:
- The agent state is marked as Inactive
- The agent is excluded from system health aggregation
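How system health is aggregated is not specified here. Purely as an illustration, the sketch below assumes a simple average of agent health scores and shows only the exclusion of Inactive agents; `system_health` and its input shape are hypothetical.

```python
from statistics import mean
from typing import Optional


def system_health(agents: list[tuple[float, int]]) -> Optional[float]:
    """agents: (health_score, executions) pairs; agents with N = 0 are excluded."""
    active_scores = [score for score, n in agents if n > 0]
    # With no active agents there is nothing to aggregate.
    return mean(active_scores) if active_scores else None


print(system_health([(90.0, 50), (60.0, 10), (0.0, 0)]))  # 75.0
```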