How to Use the AGILITY Active Probe
The AGILITY Active Probe automates triggering analyses, collecting their results, and tracking performance metrics.
Prerequisites:
Grafana access for monitoring and visualization.
The Active Probe’s dependencies are automatically installed during deployment.
Running the Active Probe
Once deployed, the Active Probe runs automatically:
Automatic Triggering: The probe triggers diagnostic analysis on a predefined schedule set during deployment. The schedule is expressed in minutes and is configured via the EXECUTION_SCHEDULE environment variable:
    name: EXECUTION_SCHEDULE
    value: {{ (index .Values "active-probe" "interval") | default "15" | quote }}
Analysis Collection: The Active Probe collects analysis results by accessing AGILITY API endpoints (e.g., /v1/analysis and /v1/analysis/{analysis_id}/summary).
Result Validation: The probe compares the collected results with expected outcomes and logs success or failure based on the analysis status.
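The three steps above can be sketched in Python. This is a minimal illustration, not the probe's actual code: the function names, the "status" field in the summary, and the status values are assumptions made for the example. Only the EXECUTION_SCHEDULE variable, its default of 15 minutes, and the endpoint path come from this page.

```python
import os

DEFAULT_INTERVAL_MINUTES = 15  # mirrors the default "15" in the deployment template


def schedule_seconds(env=None):
    """Return the probe interval in seconds, read from EXECUTION_SCHEDULE (minutes)."""
    env = os.environ if env is None else env
    minutes = int(env.get("EXECUTION_SCHEDULE", DEFAULT_INTERVAL_MINUTES))
    if minutes <= 0:
        raise ValueError("EXECUTION_SCHEDULE must be a positive number of minutes")
    return minutes * 60


def summary_path(analysis_id):
    """Build the summary endpoint path for one analysis."""
    return f"/v1/analysis/{analysis_id}/summary"


def validate(summary, expected_status):
    """Hypothetical check: compare the reported status with the expected one.

    The "status" key is an assumption about the summary payload.
    """
    return summary.get("status") == expected_status
```

Passing a plain dictionary as env, for example schedule_seconds({"EXECUTION_SCHEDULE": "5"}), makes the schedule parsing easy to check without touching the real environment.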
Monitoring and Viewing Results
Real-time monitoring and result tracking are integrated seamlessly with AGILITY:
Dashboard for Results: A dedicated dashboard shows:
Validated Results (successful analyses).
Failed Results, with labels such as model, call ID, and cause for failure.
OpenTelemetry Integration: Metrics such as active_probe_attempt_count and active_probe_analysis_count are forwarded to your monitoring systems, providing real-time insight into probe performance.
Handling Failures and Retries
The probe is designed to handle errors gracefully at every stage of operation:
Retry Logic: The probe automatically retries the analysis collection process if initial attempts fail.
Failure Logging: If the analysis cannot be completed successfully within the retry window, the failure is logged, and counters are updated accordingly.
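The retry behaviour can be pictured with a short Python sketch. The attempt limit, the fixed delay, and the function name are assumptions chosen for illustration; this page does not state how the probe defines its retry window.

```python
import logging
import time

log = logging.getLogger("active_probe_sketch")


def collect_with_retries(fetch, max_attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    Returns (result, attempts_used). Raises RuntimeError once the
    (hypothetical) retry window is exhausted, after logging the failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(), attempt
        except Exception as exc:  # broad on purpose: any collection error triggers a retry
            last_error = exc
            log.warning("collection attempt %d failed: %s", attempt, exc)
            if attempt < max_attempts:
                sleep(delay_seconds)
    log.error("analysis not collected after %d attempts", max_attempts)
    raise RuntimeError(f"analysis not collected after {max_attempts} attempts") from last_error
```

The sleep parameter is injectable so the loop can be exercised without real waiting. A real probe would likely also increment its attempt counter inside the loop.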
Metrics and Analytics
The Active Probe integrates with OpenTelemetry to provide detailed metrics on network performance:
active_probe_attempt_count: Total number of analysis attempts.
active_probe_analysis_count: Total number of completed analyses.
Additionally, the probe sends the status of each analysis (success, failure, and other statuses) as labels within these metrics, allowing for more granular tracking and analysis of probe activity.
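To show why status labels make the two counters more useful, here is a small standard-library model of a labelled counter. It is a stand-in written for this page, not the OpenTelemetry SDK, and the label key "status" and its values are assumed for the example.

```python
from collections import Counter


class LabelledCounter:
    """Toy model of a metric counter whose increments carry labels."""

    def __init__(self, name):
        self.name = name
        self.values = Counter()

    def add(self, amount, labels):
        # Each distinct label set is tracked as its own series.
        key = tuple(sorted(labels.items()))
        self.values[key] += amount

    def total(self, **match):
        """Sum all series whose labels include every key=value in match."""
        return sum(
            value
            for key, value in self.values.items()
            if all(dict(key).get(k) == v for k, v in match.items())
        )


attempts = LabelledCounter("active_probe_attempt_count")
attempts.add(1, {"status": "success"})
attempts.add(1, {"status": "success"})
attempts.add(1, {"status": "failure"})
```

With labels in place, attempts.total() gives the overall count while attempts.total(status="failure") isolates failed attempts, which is the kind of breakdown a dashboard query would perform.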
Troubleshooting
For troubleshooting, follow these steps:
MinIO S3 Configuration: Ensure that file locations are accessible by the Active Probe.
API Connectivity: Verify that the AGILITY API is responding and providing expected data.
Telemetry Monitoring: Confirm that OpenTelemetry metrics are correctly forwarded to monitoring systems.
Log Review: Examine logs for errors related to analysis collection or comparison failures.
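For the API connectivity step, a quick reachability check can be scripted. The sketch below uses only the Python standard library; the base URL is a placeholder, the function name is hypothetical, and a real deployment may require authentication headers that are not shown here.

```python
import urllib.error
import urllib.request


def api_is_reachable(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True if GET <base_url>/v1/analysis answers with a 2xx status."""
    url = base_url.rstrip("/") + "/v1/analysis"
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP error codes.
        return False
```

A False result only shows that the endpoint did not answer successfully from where the script ran; the probe logs remain the place to look for the underlying cause.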