Version: Current

Agent Health Model

Objective

Agent Health measures how well an Agent is performing within an evaluation window.

The model evaluates multiple aspects of an Agent and converts each into a normalized score between 0 and 1, where:

1 = Fully Healthy
0 = Fully Unhealthy

The health evaluation process uses configurable thresholds and tolerance settings to determine operational expectations for reliability, performance, stability, and fault tolerance.

These settings can be configured at:

Global level — applied system-wide through Global Agent Health Configuration
Agent level — overrides the global configuration for specific Agents

This allows organizations to tailor health evaluation behavior based on workload characteristics, business criticality, operational SLAs, and runtime expectations.
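The two configuration levels can be pictured as a simple merge. The sketch below is illustrative only: the setting names are hypothetical, the values are the ones quoted elsewhere in this document, and it assumes that an Agent-level entry replaces the matching global entry one setting at a time, which this document does not confirm.

```python
# Hypothetical setting names; values are the ones quoted in this document.
GLOBAL_DEFAULTS = {
    "allowed_deviation": 0.20,
    "stability_threshold": 15,
    "sla_breach_threshold": 5,
    "trigger_failure_threshold": 5,
}


def effective_config(global_cfg: dict, agent_overrides: dict) -> dict:
    """Return the settings used for one Agent.

    Assumption: overrides apply per setting; anything the Agent does not
    override falls back to the global value.
    """
    merged = dict(global_cfg)       # copy so the global config is untouched
    merged.update(agent_overrides)  # Agent-level values win
    return merged


# A business-critical Agent that tolerates fewer SLA breaches.
critical_agent = effective_config(GLOBAL_DEFAULTS, {"sla_breach_threshold": 2})
```

Here `critical_agent` keeps the global runtime tolerance but uses the stricter SLA breach threshold.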

Evaluation Window

All health calculations are performed using a fixed evaluation window of the last 24 hours.

The Agent Health Overview continuously evaluates agent behavior, execution reliability, operational stability, and performance metrics based on data collected during the most recent 24-hour period.

The evaluation window is system-defined and is not configurable for the Agent Health Overview.

Health States

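The final health score, on a 0 to 100 scale, maps to a state as follows:

Health Score (0 to 100)	State
>= 80	Healthy
>= 50 and < 80	Degraded
< 50	Unhealthy
No executions (N = 0)	Inactive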
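The mapping from score to state can be sketched as a small function. The function name and signature are illustrative assumptions; the boundaries come from the table above, and the sketch assumes the Inactive check (no executions) takes precedence over the score.

```python
def health_state(score: float, executions: int) -> str:
    """Map a 0-100 health score to a state name (illustrative helper)."""
    if executions == 0:
        return "Inactive"   # no data in the evaluation window
    if score >= 80:
        return "Healthy"
    if score >= 50:
        return "Degraded"
    return "Unhealthy"
```

For example, a score of 79.9 with at least one execution is Degraded, while any Agent with zero executions is Inactive whatever its score.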

Health Dimensions

The overall Agent Health Score is derived from multiple operational dimensions that collectively evaluate the reliability, efficiency, stability, and operational readiness of an agent.

Each dimension focuses on a specific aspect of agent behavior and contributes a weighted score toward the final health calculation.

The following health dimensions are evaluated:

Reliability Score
Performance Score
Stability Score
Operational Score

Reliability Score (Weight: 40)

What it represents

Reliability measures how consistently the agent produces successful outcomes.

Inputs

ESR = Expected Success Rate
ASR = Actual Success Rate
EFH = Expected Failures per Hour
AFH = Actual Failures per Hour
N = Total Executions

Success Score

Measures how well the actual success rate meets the expected success rate.

If the actual success rate meets or exceeds the expectation, the score is 1
Otherwise, the score is the ratio of the actual to the expected success rate

SuccessScore = Min(1, ASR / ESR)

Failure Score

Measures whether the actual failure rate stays within the expected limit.

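If actual failures are within the expected limit (including AFH = 0, where the ratio below is undefined), the score is 1
Otherwise, the score is the ratio of expected to actual failures per hour, so it falls as failures increase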

FailureScore = Min(1, EFH / AFH)

Combine

The reliability score is derived by combining success and failure behavior, with higher emphasis on success.

Success contributes 70%
Failure contributes 30%

ReliabilityRaw =
  0.7 * SuccessScore
+ 0.3 * FailureScore
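The three formulas above can be sketched in a few lines. The function names are illustrative, and the guards for a zero denominator are assumptions: the document does not say how ESR = 0 is handled, and AFH = 0 is treated here as "within limits".

```python
def success_score(asr: float, esr: float) -> float:
    """SuccessScore = Min(1, ASR / ESR)."""
    if esr <= 0:
        return 1.0  # assumption: no expectation set, nothing to fall short of
    return min(1.0, asr / esr)


def failure_score(efh: float, afh: float) -> float:
    """FailureScore = Min(1, EFH / AFH)."""
    if afh <= 0:
        return 1.0  # assumption: zero failures is within the expected limit
    return min(1.0, efh / afh)


def reliability_raw(asr: float, esr: float, efh: float, afh: float) -> float:
    """ReliabilityRaw = 0.7 * SuccessScore + 0.3 * FailureScore."""
    return 0.7 * success_score(asr, esr) + 0.3 * failure_score(efh, afh)
```

As an illustration, an Agent that fails twice as often as expected (EFH = 2, AFH = 4) but meets its success-rate target gets 0.7 × 1 + 0.3 × 0.5 = 0.85.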

Confidence Adjustment

To avoid misleading results when execution volume is low, a confidence factor is applied.

N = total number of executions in the evaluation window
N_full = minimum number of executions required for full confidence

ConfidenceFactor = Min(1, N / N_full)

Recommended:

N_full = 20

The final reliability score blends the observed reliability with a neutral baseline (1), based on confidence:

When execution count is low → score stays closer to 1 (neutral)
When execution count is sufficient → score reflects actual reliability

ReliabilityScore =
  ConfidenceFactor * ReliabilityRaw
+ (1 - ConfidenceFactor) * 1

Tip: In low-volume scenarios, the system assumes the agent is healthy until sufficient data is available.
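A short sketch of the confidence adjustment shows how execution volume moves the score. The function names are illustrative; `N_FULL = 20` is the recommended value given above.

```python
N_FULL = 20  # recommended minimum executions for full confidence


def confidence_factor(n: int, n_full: int = N_FULL) -> float:
    """ConfidenceFactor = Min(1, N / N_full)."""
    return min(1.0, n / n_full)


def reliability_score(raw: float, n: int, n_full: int = N_FULL) -> float:
    """Blend the raw reliability with the neutral baseline of 1."""
    c = confidence_factor(n, n_full)
    return c * raw + (1.0 - c) * 1.0


# Same raw reliability of 0.5 at two different volumes.
low_volume = reliability_score(0.5, n=5)    # confidence 0.25
full_volume = reliability_score(0.5, n=20)  # confidence 1.0
```

With only 5 executions the confidence factor is 0.25, so the score is 0.25 × 0.5 + 0.75 × 1 = 0.875; at 20 or more executions the score equals the raw value of 0.5.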

Performance Score (Weight: 20)

What it represents

Performance measures how efficiently the agent executes compared to expected runtime.

Inputs

ER = Expected Runtime
AR = Actual Runtime
DeviationAllowed = Allowed Deviation (default 20%, written as 0.20 in the formula below)

Deviation

Deviation measures how much the actual runtime differs from the expected runtime.

Deviation = (AR - ER) / ER

Score

PerformanceScore evaluates how closely execution time matches expectations.

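If the agent runs within the expected time → score is 1
If the runtime exceeds expectation → score decreases in proportion to the deviation
The score reaches 0 once the deviation equals or exceeds the allowed deviation (with the default of 20%, at a runtime of 1.2 × ER)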

PerformanceScore =
  if AR <= ER then 1
  else 1 - Min(1, Deviation / DeviationAllowed)
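The performance formula can be sketched as follows. The function name is illustrative, and the sketch assumes ER is greater than zero and that both runtimes use the same unit.

```python
def performance_score(ar: float, er: float, deviation_allowed: float = 0.20) -> float:
    """Score runtime against expectation; assumes er > 0."""
    if ar <= er:
        return 1.0
    deviation = (ar - er) / er
    # The penalty is the share of the allowed deviation already used up.
    return 1.0 - min(1.0, deviation / deviation_allowed)
```

With ER = 100 seconds and the default allowed deviation, a runtime of 110 seconds uses half of the allowance and scores 0.5, while 120 seconds or more scores 0.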

Stability Score (Weight: 20)

What it represents

Stability measures whether the agent is producing abnormal or long-running executions.

It focuses on identifying signs of instability such as:

executions taking significantly longer than expected
potential hangs, retries, or resource contention

Input

S = Count of long-running instances (instances whose runtime exceeds 5 × the expected runtime)

Threshold

The threshold defines the level at which instability is considered significant.

Below the threshold → impact on score is gradual
At or beyond the threshold → maximum penalty is applied

This ensures that occasional delays do not overly affect the agent's health.

S_threshold = 15

Score

The score decreases as the number of long-running instances increases.

If there are no long-running instances → score is 1
As long-running instances increase → score gradually decreases
The penalty grows logarithmically, so each additional long-running instance lowers the score by less than the previous one
The penalty reaches its maximum (score 0) at the defined threshold

StabilityScore =
  1 - Min(1, log2(S + 1) / log2(S_threshold + 1))
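A minimal sketch of the stability formula makes the shape of the curve easier to see. The function name is illustrative; the threshold of 15 is the value given above.

```python
import math


def stability_score(s: int, s_threshold: int = 15) -> float:
    """StabilityScore = 1 - Min(1, log2(S + 1) / log2(S_threshold + 1))."""
    return 1.0 - min(1.0, math.log2(s + 1) / math.log2(s_threshold + 1))


# Scores for a few counts of long-running instances.
curve = {s: stability_score(s) for s in (0, 1, 3, 7, 15)}
```

With the threshold of 15, the denominator is log2(16) = 4, so counts of 0, 1, 3, 7, and 15 give scores of 1, 0.75, 0.5, 0.25, and 0. Each step down of 0.25 requires roughly twice as many long-running instances as the previous one.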

Operational Score (Weight: 20)

What it represents

Operational score measures how reliably the agent functions in real-world conditions.

It captures:

Ability to meet defined SLAs (timeliness)
Reliability of execution triggers
Overall operational readiness of the agent

Inputs

B = SLA breaches
T = Trigger failures

Thresholds

The thresholds define the acceptable limits for operational deviations.

Below the threshold → score decreases in proportion to the count
At or beyond the threshold → maximum penalty is applied

B_threshold = 5
T_threshold = 5

Scores

Each component is normalized to a value between 0 and 1:

No breaches or failures → score is 1
Increasing breaches or failures → score decreases proportionally
Penalty is capped at the defined threshold

SLAScore = 1 - Min(1, B / B_threshold)
TriggerScore = 1 - Min(1, T / T_threshold)

Combine

Operational score combines SLA compliance and trigger reliability, with higher weight given to SLA adherence.

OperationalScore =
  0.6 * SLAScore
+ 0.4 * TriggerScore
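The operational formulas can be sketched together. The function names are illustrative; the thresholds of 5 are the values given above.

```python
def capped_penalty_score(count: int, threshold: int) -> float:
    """1 - Min(1, count / threshold), shared by SLAScore and TriggerScore."""
    return 1.0 - min(1.0, count / threshold)


def operational_score(b: int, t: int, b_threshold: int = 5, t_threshold: int = 5) -> float:
    """OperationalScore = 0.6 * SLAScore + 0.4 * TriggerScore."""
    sla = capped_penalty_score(b, b_threshold)
    trigger = capped_penalty_score(t, t_threshold)
    return 0.6 * sla + 0.4 * trigger
```

For example, one SLA breach and no trigger failures gives SLAScore = 0.8 and TriggerScore = 1, for an operational score of 0.6 × 0.8 + 0.4 × 1 = 0.88.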

Final Health Score

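The overall health of an agent is a weighted combination of four dimensions. Each dimension score lies between 0 and 1 and the weights sum to 100, so the final score ranges from 0 to 100. The dimensions are: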

Reliability (40%) — correctness of outcomes
Performance (20%) — execution efficiency
Stability (20%) — runtime consistency
Operational (20%) — SLA adherence and trigger reliability

HealthScore =
  40 * ReliabilityScore
+ 20 * PerformanceScore
+ 20 * StabilityScore
+ 20 * OperationalScore
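The weighted sum can be sketched as below. The function and constant names are illustrative; the weights are the ones listed above.

```python
WEIGHTS = {"reliability": 40, "performance": 20, "stability": 20, "operational": 20}


def health_score(reliability: float, performance: float,
                 stability: float, operational: float) -> float:
    """Combine four 0-1 dimension scores into a 0-100 health score."""
    return (WEIGHTS["reliability"] * reliability
            + WEIGHTS["performance"] * performance
            + WEIGHTS["stability"] * stability
            + WEIGHTS["operational"] * operational)


example = health_score(reliability=0.9, performance=0.5,
                       stability=0.75, operational=0.88)
```

The example works out to 36 + 10 + 15 + 17.6 = 78.6, which falls in the Degraded band (at least 50, below 80).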

Hard Override Rules

Certain critical operational conditions immediately force the agent health state to Unhealthy, regardless of the computed health score.

When any hard override rule is satisfied:

The final health score is forcibly set to 49
The agent health state is marked as Unhealthy
The normal weighted score calculation is ignored

These rules are intended to identify severe operational instability, complete execution failures, or critical system reliability issues that require immediate attention.

The following conditions trigger the unhealthy override behavior:

Rule Name	Condition	Description
Absolute Failure Rule	No successful executions while total executions > 0	Indicates complete execution failure during the evaluation window.
Critical SLA Breach Rule	SLA breaches > 2 × SLA breach threshold	Indicates severe operational or performance degradation.
Trigger Reliability Failure Rule	Trigger failures > 2 × trigger failure threshold	Indicates major trigger reliability, connectivity, or execution initiation issues.
Severe Stability Degradation Rule	Long-running or stuck instances > 3 × stability threshold	Indicates severe runtime instability or potential execution deadlocks.
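The override rules in the table can be sketched as a check that runs after the weighted score is computed. All names are illustrative, and the sketch assumes that the thresholds named in the rules are the same B_threshold, T_threshold, and S_threshold used in the dimension scores.

```python
OVERRIDE_SCORE = 49.0  # score forced when any hard override rule fires


def triggered_overrides(n, successes, sla_breaches, trigger_failures, long_running,
                        b_threshold=5, t_threshold=5, s_threshold=15):
    """Return the names of the hard override rules that are satisfied."""
    rules = []
    if n > 0 and successes == 0:
        rules.append("Absolute Failure Rule")
    if sla_breaches > 2 * b_threshold:
        rules.append("Critical SLA Breach Rule")
    if trigger_failures > 2 * t_threshold:
        rules.append("Trigger Reliability Failure Rule")
    if long_running > 3 * s_threshold:
        rules.append("Severe Stability Degradation Rule")
    return rules


def final_score(computed_score, rules):
    """Replace the weighted score with 49 when any rule fired."""
    return OVERRIDE_SCORE if rules else computed_score
```

Note that every threshold comparison is strict: with the values above, exactly 10 SLA breaches or exactly 45 long-running instances do not trigger an override, while 11 and 46 do.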

Inactive Agents

An agent is considered Inactive when it has no executions within the evaluation window.

N represents the total number of executions of the agent in the evaluation window.

If:

N = 0

Then:

The agent state is marked as Inactive
The agent is excluded from system health aggregation
