Active Probe

How to Use the AGILITY Active Probe

The AGILITY Active Probe is designed to automate and optimize network diagnostics by triggering analysis, collecting results, and tracking performance metrics. Here’s how to effectively use this feature in your network diagnostic workflows.

Prerequisites:

AGILITY Platform access with appropriate user roles.
Basic understanding of network diagnostics, SFTP, and S3 operations.
Installed dependencies for the Active Probe (see Dependencies section).

1. Setting Up Active Probe

To begin using the Active Probe, ensure that the following components are set up:

AGILITY API Access: Ensure that you have proper access to the AGILITY API endpoints required to collect analysis results. You will need to configure the probe to interact with these endpoints.
File Storage Setup: The Active Probe requires access to file storage systems like MinIO or a dedicated storage CoS for storing network capture files (PCAPs) that will be analyzed.
Telemetry Setup: Make sure your OpenTelemetry (OTEL) integration is correctly configured for metrics collection and forwarding to your monitoring systems.

2. Configuring the Active Probe

A. Setting the File Locations for Analysis

Define File Sources: Choose the set of files (PCAPs) you want to analyze. These files can either be:
- Stored in MinIO or
- Located on dedicated storage.
Selecting Files: Ensure that the files selected for analysis are in a supported format (e.g., PCAP) and accessible to the Active Probe module.
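
As an illustration of file selection, the following is a minimal sketch that lists PCAP objects in an S3-compatible bucket such as MinIO. The function name list_pcap_keys, the suffix list, and the bucket and prefix values are assumptions for this example and not part of the Active Probe itself. The client is passed in, so any boto3 S3 client (for MinIO, one created with an endpoint_url) can be used.

```python
from typing import Iterator

# Extensions treated as supported capture formats (assumption; adjust as needed).
SUPPORTED_SUFFIXES = (".pcap",)


def list_pcap_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield object keys under bucket/prefix whose names end in a supported suffix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no objects match.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(SUPPORTED_SUFFIXES):
                yield key


# Possible usage (endpoint, bucket, and prefix are placeholders):
#   import boto3
#   client = boto3.client("s3", endpoint_url="http://minio.example:9000")
#   for key in list_pcap_keys(client, "pcaps", "daily/"):
#       print(key)
```

Filtering by suffix is only a name check. It does not verify that the object is a valid capture file.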

B. Schedule Configuration

The Active Probe can be configured to run at specific times or intervals. You can set up a schedule for triggering the analysis:

Define a Schedule: Specify how often you want the probe to run. You can set a recurring schedule or trigger the analysis on-demand.
Modify the Schedule: Update the schedule via the configuration interface to align with your network’s diagnostic needs.
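
A recurring schedule could be expressed with APScheduler, which is listed in the Dependencies section. The sketch below is illustrative only: the interval_minutes config key, the job id "active_probe", and the run_probe callable are hypothetical names, and the actual configuration interface may differ.

```python
def interval_job_kwargs(schedule: dict) -> dict:
    """Turn a hypothetical schedule config into keyword arguments for add_job."""
    minutes = int(schedule["interval_minutes"])
    if minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return {
        "trigger": "interval",
        "minutes": minutes,
        "id": "active_probe",
        "replace_existing": True,
    }


def start_scheduler(run_probe, schedule: dict):
    """Start a background scheduler that calls run_probe at the configured interval."""
    # Imported here so the config helper above works without APScheduler installed.
    from apscheduler.schedulers.background import BackgroundScheduler

    scheduler = BackgroundScheduler()
    scheduler.add_job(run_probe, **interval_job_kwargs(schedule))
    scheduler.start()
    return scheduler
```

With this layout, an on-demand run is a direct call to run_probe(), and an existing schedule can be changed with scheduler.reschedule_job("active_probe", trigger="interval", minutes=30).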

C. Setting Up the API for Results Collection

Ensure that your AGILITY platform API is configured to provide results for the /v1/analysis and /v1/analysis/{analysis_id}/summary endpoints. The Active Probe will call these endpoints to retrieve the analysis results and compare them with expected outcomes.
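
The two endpoint calls might look like the sketch below. It assumes both endpoints answer HTTP GET with a JSON body, which this guide does not state. The helper names, the base URL, and the 10-second timeout are likewise placeholders. The session argument is expected to behave like a requests.Session.

```python
def fetch_analyses(session, base_url: str):
    """GET /v1/analysis and return the decoded JSON body."""
    resp = session.get(f"{base_url.rstrip('/')}/v1/analysis", timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return resp.json()


def fetch_summary(session, base_url: str, analysis_id: str):
    """GET /v1/analysis/{analysis_id}/summary and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/v1/analysis/{analysis_id}/summary"
    resp = session.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()


# Possible usage (base URL and analysis id are placeholders):
#   import requests
#   with requests.Session() as session:
#       analyses = fetch_analyses(session, "https://agility.example")
#       summary = fetch_summary(session, "https://agility.example", "1234")
```

Authentication is omitted here. If the API requires credentials, they can be set once on the session before these helpers are called.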

3. Running the Active Probe

Once set up, the Active Probe triggers the analysis on demand or on the defined schedule. It operates as follows:

Trigger the Analysis: When the Active Probe is activated, it triggers the analysis by performing an SFTP or S3 copy operation to initiate the diagnostic process.
Collecting Analysis Results:
- The Active Probe uses the /v1/analysis endpoint to fetch the analysis results.
- It then uses the /v1/analysis/{analysis_id}/summary to gather detailed information about the test.
Comparing Results: The collected results are compared against pre-defined expected results.
- If the status of the analysis is “processing” or “warning”, the probe considers it a success.
- If no call flows are found, or the status is something other than “processing” or “warning,” the probe logs it as a failure.
Incrementing Counters: The probe maintains counters for:
- Validated Results (successful analyses).
- Failed Results (failed analyses).
These counters are labeled with key identifiers such as model, pcap_name, call_id, and cause_for_failure.
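
The comparison and counting steps can be sketched as follows. This is an illustration of the rule as written in this section, not the probe's actual code: the function names, the cause strings, and the use of in-memory Counter objects are assumptions, and the set of success statuses is kept in one constant so it can be adjusted.

```python
from collections import Counter

# Statuses counted as success, taken from the comparison rule in this section.
SUCCESS_STATUSES = frozenset({"processing", "warning"})

validated = Counter()  # keyed by (model, pcap_name, call_id)
failed = Counter()     # keyed by (model, pcap_name, call_id, cause_for_failure)


def classify(status: str, call_flow_count: int):
    """Return (is_success, cause_for_failure) for one analysis result."""
    if call_flow_count == 0:
        return False, "no_call_flows"
    if status not in SUCCESS_STATUSES:
        return False, f"unexpected_status:{status}"
    return True, None


def record(model: str, pcap_name: str, call_id: str,
           status: str, call_flow_count: int) -> bool:
    """Classify one result and increment the matching labeled counter."""
    ok, cause = classify(status, call_flow_count)
    if ok:
        validated[(model, pcap_name, call_id)] += 1
    else:
        failed[(model, pcap_name, call_id, cause)] += 1
    return ok
```

Keeping the failure cause in the counter key mirrors the cause_for_failure label, so failures can later be grouped by reason.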

4. Monitoring and Viewing Results

A. Dashboard for Results

A dedicated dashboard provides a clear view of the Active Probe’s performance. The dashboard displays:

Validated Results: The number of successful analysis attempts.
Failed Results: The number of failed attempts, with labels such as model, call ID, and cause for failure.

You can view these metrics in the AGILITY monitoring interface or in a connected central monitoring system.

B. OpenTelemetry Integration

The Active Probe sends real-time metrics to OpenTelemetry (OTEL). It emits the following metrics:

active_probe_attempt_count: Tracks the total number of analysis attempts.
active_probe_analysis_count: Tracks the total number of analyses completed successfully.

These metrics are forwarded to your central or local monitoring systems, depending on your configuration.

5. Handling Failures and Retries

If the analysis fails or the probe does not retrieve the expected results, the following logic is implemented:

Retry Logic: The Comparator Module retries the analysis fetch operation until the analysis ID is found and the status is no longer "processing". The probe therefore fetches results several times instead of relying on a single attempt.
Failure Marking: If the analysis ID cannot be found within the retry window, the analysis is marked as a failure, and the appropriate failure counters are updated.
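
The retry behavior can be illustrated with a small polling loop. In this sketch the retry window is modeled as a fixed number of attempts with a fixed delay, which is an assumption. The fetch_status callable is hypothetical and is expected to return the status string, or None while the analysis ID is not yet found.

```python
import time


def wait_for_analysis(fetch_status, analysis_id: str, attempts: int = 10,
                      delay_s: float = 5.0, sleep=time.sleep):
    """Poll until the analysis exists and has left "processing".

    Returns the final status, or None if the retry window is exhausted,
    in which case the caller marks the analysis as a failure.
    """
    for attempt in range(attempts):
        status = fetch_status(analysis_id)
        if status is not None and status != "processing":
            return status
        # Do not sleep after the last attempt; the window is over.
        if attempt < attempts - 1:
            sleep(delay_s)
    return None
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.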

6. Advanced Features

A. Pipeline Automation

The Active Probe supports Prefect Pipeline Automation. This feature allows you to:

Automate Pipeline Creation: Automatically generate Prefect pipelines for the Active Probe’s operations, ensuring seamless integration with your broader diagnostic workflows.
Configure Pipelines: Define and schedule the pipelines to automate the process of triggering, collecting, and comparing analysis results without manual intervention.

B. Custom Telemetry Labels

You can customize the telemetry labels used to track and monitor the probe's performance. These labels allow for better classification and filtering of probe results in your monitoring dashboards.

7. Metrics and Analytics

The following metrics are exposed through OpenTelemetry:

active_probe_attempt_count: The total number of attempts made to collect and compare analysis results.
active_probe_analysis_count: The total number of successful analysis comparisons.

These metrics include relevant labels that provide context to the results, such as model, pcap_name, call_id, and cause_for_failure.

8. Troubleshooting

In case of issues with the Active Probe, the following steps can help identify and resolve problems:

Check SFTP/S3 Configuration: Ensure that the file locations are correct and accessible by the Active Probe.
Review API Connectivity: Verify that the AGILITY API endpoints are responding and returning the expected data.
Monitor Telemetry Output: Check the OpenTelemetry output to ensure that metrics are being sent correctly to the monitoring systems.
Examine Logs: Review logs for errors related to file selection, analysis collection, and comparison failures. Logs can be extended to additional outputs like Loki or OTEL if needed.

Dependencies

Ensure that the following libraries are installed for proper operation:

APScheduler==3.10.4
requests==2.31.0
PyYAML==6.0
opentelemetry-api==1.21.0
opentelemetry-sdk==1.21.0
opentelemetry-exporter-otlp==1.21.0
opentelemetry-exporter-otlp-proto-grpc==1.21.0
opentelemetry-instrumentation==0.42b0
boto3==1.26.96

Conclusion

The AGILITY Active Probe automates network diagnostics: it triggers analyses on demand or on a schedule, collects and compares the results, and reports validated and failed counts through OpenTelemetry. With flexible configuration, integrated monitoring, and detailed failure tracking, it helps you maintain network performance and service availability.

By following the steps above, you can quickly set up, configure, and use the Active Probe to automate your network diagnostics and gain deep insights into network health.