Agent Health Configuration

Overview

The Agent Health Configuration dialog lets you define health monitoring thresholds and tolerance levels for an individual Agent. It applies to both Server and Local Agents.

Agent-level configuration overrides the Global Agent Health Configuration defined in the Administration section of the AI Command Center.

This allows organizations to apply custom operational expectations for specific Agents based on their business criticality, runtime characteristics, workload patterns, or SLA requirements.

The dialog also provides the ability to:

  • View the current Agent Health Score
  • Recalculate the Agent Health Score
  • Reset configuration back to system defaults
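
The override relationship described in this Overview can be pictured with a small sketch. It is illustrative only: the `HealthConfig` class, the `effective_config` function, and the global values are hypothetical and are not part of the AI Command Center. The field names simply mirror the dialog labels, and the Agent values reuse the examples given on this page. The sketch assumes the Agent-level configuration replaces the global one as a whole, which is consistent with the reset behavior described later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealthConfig:
    # Hypothetical container; fields mirror the labels in the dialog.
    evaluation_period_minutes: int
    expected_success_rate: float
    expected_avg_runtime_seconds: float
    max_runtime_deviation_percent: float
    max_stuck_instances: int
    sla_breach_tolerance: int
    max_failures_per_hour: int


def effective_config(global_cfg: HealthConfig,
                     agent_cfg: Optional[HealthConfig]) -> HealthConfig:
    """Return the Agent-level configuration if one is saved, else the global one."""
    return agent_cfg if agent_cfg is not None else global_cfg


# Illustrative global values (assumed, not product defaults).
GLOBAL = HealthConfig(1440, 95.0, 60.0, 50.0, 2, 3, 5)
# Agent values taken from the examples on this page.
CRITICAL_AGENT = HealthConfig(1440, 98.0, 45.0, 30.0, 1, 0, 2)

print(effective_config(GLOBAL, CRITICAL_AGENT).expected_success_rate)  # 98.0
print(effective_config(GLOBAL, None).expected_success_rate)            # 95.0
```

Passing `None` for the Agent configuration models an Agent that has no custom settings, or one that has been reset to the system default.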

How to Access

To open the Agent Health Configuration dialog:

  • Navigate to the Agents page or to the Local Agents -> Agents page
  • Locate the required Agent
  • Click the Context Menu (...)
  • Select Health Configuration

The Health Configuration dialog for the selected Agent is displayed.


Configuration Sections

The Agent Health Configuration dialog is organized into the following sections:

  • Evaluation Settings
  • Performance Thresholds
  • Fault Tolerance Settings
  • Health Score Summary

Evaluation Settings

The Evaluation Settings section controls the evaluation window used for computing Agent Health metrics for the selected Agent.


Evaluation Time Period (minutes)

Specify the evaluation window, in minutes, used for computing the Agent Health metrics.

The system evaluates the Agent's behavior within this rolling evaluation period.

This affects:

  • Reliability analysis
  • Runtime measurements
  • Failure calculations
  • Stability analysis
  • Operational scoring

Example:

  • 1440 minutes = Last 24 Hours

Note: When configured at the Agent level, this value overrides the Global Agent Health Configuration.
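
As a rough illustration of a rolling evaluation window, the sketch below checks whether an execution falls inside the last N minutes. The function name `in_evaluation_window` and the inclusive boundaries are assumptions made for the example. The product's actual windowing logic is not documented here.

```python
from datetime import datetime, timedelta


def in_evaluation_window(finished_at: datetime, now: datetime,
                         evaluation_period_minutes: int) -> bool:
    """True if an execution finished inside the rolling window ending at `now`."""
    window_start = now - timedelta(minutes=evaluation_period_minutes)
    return window_start <= finished_at <= now


now = datetime(2024, 1, 2, 12, 0)
recent = datetime(2024, 1, 2, 9, 0)   # 3 hours before `now`
stale = datetime(2024, 1, 1, 11, 0)   # 25 hours before `now`

# With 1440 minutes the window covers the last 24 hours.
print(in_evaluation_window(recent, now, 1440))  # True
print(in_evaluation_window(stale, now, 1440))   # False
```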


Performance Thresholds

The Performance Thresholds section defines the expected runtime and reliability characteristics for the selected Agent.


Expected Success Rate

Specify the minimum expected success percentage for the Agent.

This value is used during Reliability Score calculations.

Recommended value:

  • >= 95

Example:

  • 98

Higher values enforce stricter reliability expectations.
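
To make the threshold concrete, the following sketch compares an observed success rate with the expected value. The function names are hypothetical, and the Reliability Score formula itself is not described on this page, so only the comparison is shown.

```python
def success_rate_percent(succeeded: int, total: int) -> float:
    """Percentage of executions in the evaluation window that succeeded."""
    if total == 0:
        # Assumption: with no executions there is nothing to measure.
        raise ValueError("no executions in the evaluation window")
    return 100.0 * succeeded / total


def meets_expected_success_rate(succeeded: int, total: int,
                                expected_percent: float) -> bool:
    """True if the observed success rate is at or above the expected value."""
    return success_rate_percent(succeeded, total) >= expected_percent


print(success_rate_percent(196, 200))              # 98.0
print(meets_expected_success_rate(196, 200, 98))   # True
print(meets_expected_success_rate(195, 200, 98))   # False
```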


Expected Average Runtime (seconds)

Specify the expected average execution duration for the Agent, in seconds.

This value is used during Performance Score calculations.

Example:

  • 45 seconds

Agents consistently exceeding the expected runtime may receive lower performance scores.


Max Runtime Deviation Percent

Specify the maximum allowed runtime deviation percentage before runtime penalties are applied.

This setting controls how much runtime variance is tolerated.

Example:

  • 30

Lower values enforce stricter runtime consistency expectations.
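
One plausible reading of this setting is sketched below: the deviation is the percentage by which the observed average runtime exceeds the expected average runtime. This is an assumption. The exact formula used by the health evaluation engine is not given here, and both function names are hypothetical.

```python
def runtime_deviation_percent(actual_avg_seconds: float,
                              expected_avg_seconds: float) -> float:
    """Signed deviation of the observed average runtime from the expected one."""
    return 100.0 * (actual_avg_seconds - expected_avg_seconds) / expected_avg_seconds


def exceeds_runtime_tolerance(actual_avg_seconds: float,
                              expected_avg_seconds: float,
                              max_deviation_percent: float) -> bool:
    """True if the Agent runs slower than the tolerated deviation allows."""
    deviation = runtime_deviation_percent(actual_avg_seconds, expected_avg_seconds)
    return deviation > max_deviation_percent


# Expected runtime 45 seconds, maximum deviation 30 percent.
print(runtime_deviation_percent(54, 45))       # 20.0
print(exceeds_runtime_tolerance(54, 45, 30))   # False
print(exceeds_runtime_tolerance(63, 45, 30))   # True
```

Under this reading, an expected runtime of 45 seconds with a 30 percent tolerance allows an average of up to 58.5 seconds before a penalty would apply.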


Fault Tolerance Settings

The Fault Tolerance Settings section defines the operational tolerances for failures, SLA breaches, and runtime instability.


Max Stuck Instances

Specify the maximum number of long-running or stuck instances tolerated for the selected Agent.

This contributes to:

  • Stability Score calculations
  • Runtime stability analysis
  • Unhealthy override evaluation

Example:

  • 1

SLA Breach Tolerance

Specify the maximum number of SLA breaches tolerated during the evaluation period.

This contributes to the Operational Score calculation.

Recommended value:

  • <= 3

Example:

  • 0

Higher breach counts negatively impact the Agent Health Score.


Max Failures per Hour

Specify the maximum number of execution failures tolerated per hour.

This contributes to the Reliability Score calculation.

Example:

  • 2

Agents exceeding the configured tolerance may be classified as degraded or unhealthy.
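
The sketch below shows one way a failure count could be turned into a per-hour rate. It assumes the rate is averaged over the whole evaluation period. The product may instead count failures within each clock hour, so treat the function `failures_per_hour` as a hypothetical illustration.

```python
def failures_per_hour(failure_count: int, evaluation_period_minutes: int) -> float:
    """Average number of failures per hour across the evaluation period."""
    hours = evaluation_period_minutes / 60
    return failure_count / hours


max_failures_per_hour = 2  # example tolerance from this page

# 36 failures over 1440 minutes (24 hours) is within tolerance.
print(failures_per_hour(36, 1440))                           # 1.5
print(failures_per_hour(36, 1440) > max_failures_per_hour)   # False
# 72 failures over the same period exceeds it.
print(failures_per_hour(72, 1440))                           # 3.0
print(failures_per_hour(72, 1440) > max_failures_per_hour)   # True
```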


Health Score Summary

The bottom section of the dialog displays the current health evaluation summary for the selected Agent.

The displayed information includes:

  • Current Agent Health Score
  • Current Health State
  • Health classification indicator

Possible health states include:

  • Healthy
  • Degraded
  • Unhealthy
  • Inactive

Calculate Score

Click Calculate Score to recompute the current Agent Health Score using the latest execution and operational metrics.

This operation allows administrators to immediately validate the effect of configuration changes before saving them.


Reset to System Default

Click Reset to System Default to remove the Agent-specific Health Configuration and revert to the Global Agent Health Configuration.

After the reset, the Agent inherits all threshold and tolerance settings from the system-wide configuration.


Save Configuration

Click Save Configuration to persist the Agent-specific Health Configuration.

Once saved, the selected Agent uses these custom settings during subsequent health evaluation cycles.


Cancel

Click Cancel to close the dialog without saving changes.


Notes

  • Agent-level Health Configuration overrides the Global Agent Health Configuration.
  • Different Agents may require different operational tolerances depending on workload characteristics and business criticality.
  • Excessively strict thresholds may result in more frequent degraded or unhealthy classifications.
  • Health scores are recalculated periodically by the AI Command Center health evaluation engine.