Set up automated data ingestion
Note: Autoloader is currently using StreamSets to ingest data. This will be replaced by a custom solution in the future.
Autoloader
Autoloader is a tool for automated data ingestion that provides an alternative to manually uploading data through the UI. This page provides example commands for creating a pipeline in StreamSets, including parameters such as the SFTP or S3 server URL, credentials, file patterns, and the pipeline ID. The commands are intended to be run inline using kubectl exec in a Kubernetes deployment.
Prerequisites
Kubectl
AGILITY application installed
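As a quick sanity check before continuing, you can verify that kubectl is installed and that the AGILITY pods are running. The agility namespace below is the one used in the examples on this page; adjust it if your deployment uses a different namespace.

```bash
# Verify the kubectl client is available
kubectl version --client

# Verify the AGILITY pods, including the autoloader, are running
# (assumes AGILITY is deployed in the "agility" namespace, as in the examples below)
kubectl get pods -n agility
```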
High Level Diagram
Create a new source pipeline
The Autoloader supports two types of sources: SFTP and S3. The following sections describe how to create a new pipeline for each type of source.
Common parameters
| Environment Variable | Description | Default Value |
| --- | --- | --- |
| AGILITY_NS | Namespace where AGILITY is deployed | |
| STREAMSETS_SECRET_NAME | Secret name with the StreamSets admin user password | |
| STREAMSETS_LOGIN | User to access StreamSets | |
| STREAMSETS_PASSWORD | If provided, used as the password to create the pipeline | If not provided, the password is read from the secret named in STREAMSETS_SECRET_NAME |
| STREAMSETS_URL | URL to access the StreamSets API | |
| AGILITY_BACKEND_URL | Internal backend server URL. Leave the default value. | |
| SOURCE_TYPE | Source type AGILITY will pull the file from (values: SFTP, S3) | SFTP |
| PIPELINE_ID | ID of the pipeline. Should contain alphanumeric, dash, or underscore characters. MANDATORY | <empty> |
| AGILITY_SERVICE_KEYS | Comma-separated list of services/models needed for the given auto-loader instance. Note: this property is required when auto service detection is not enabled; if auto service detection is enabled, an empty value for service keys results in automatic service selection. | <empty> |
| AGILITY_USERNAME | User under which predictions will be processed and made visible. Other users will not see these predictions. | <empty> |
Service Keys
Note: Values for these keys might differ depending on customizations. The generic values below can be used as defaults unless otherwise specified by B-Yond customer support.
| service_key | service_name | Description |
| --- | --- | --- |
| 4g_5g_nsa_connectivity | 4G & 5G NSA Connectivity | Supported Protocols: S1AP, GTPv2, DIAMETER, HTTP2, PFCP |
| 5g_sa_connectivity | 5G SA Connectivity | Captures AMF, SMF, UDM and PCF |
| 5g_sa_data | 5G SA Data | Supported Protocols: TCP, ICMPv4, ICMPv6, DNS, HTTP2, PFCP, NGAP, GTPV2 |
| 5g_sa_mobility | 5G SA Mobility | Supported Protocols: NGAP, NAS-5GS, HTTP2, PFCP, GTPV2, S1AP, SIP, DIAMETER |
| volte | Voice & Video over LTE | Captures MME, S-GW, P-GW, OCS, HSS, S-CSCF, TAS, IBCF, MGCF |
| vonr | Voice & Video over New Radio | Captures GNodeB, AMF, SMF, UPF, UDM, PCF, I/S, TAS |
Create a new SFTP source pipeline in cluster
To create an SFTP pipeline in StreamSets, run one of the commands shown under Examples below (all inline).
Pipeline parameters
| Environment Variable | Description | Default Value |
| --- | --- | --- |
| SOURCE_TYPE | Set the value to SFTP | always set |
| SFTP_RESOURCE_URL | Source of files; includes the SFTP server and path | <empty> |
| SFTP_USERNAME | Source SFTP username | <empty> |
| SFTP_PASSWORD | Source SFTP password; if SFTP_PRIVATE_SSH_KEY or SFTP_PRIVATE_SSH_KEY_FILE is provided, this parameter is discarded | <empty> |
| SFTP_PRIVATE_SSH_KEY | Private SSH key for SFTP_USERNAME. When provided, private key authentication is used instead of password authentication | <empty> |
| SFTP_PRIVATE_SSH_KEY_FILE | File containing the private SSH key for SFTP_USERNAME. When provided, private key authentication is used instead of password authentication | <empty> |
| SFTP_FILE_PATTERN | File pattern of the files to download | |
The following table describes the variables used in the example commands:

| Environment Variable | Description |
| --- | --- |
| SOURCE_TYPE | Set the value to SFTP. |
| STREAMSETS_PASSWORD | Internal password for StreamSets. Leave the value as specified in the examples. |
| SFTP_RESOURCE_URL | Full URL of the SFTP server where files to ingest are located. |
| SFTP_USERNAME | User name to connect to the SFTP server. |
| SFTP_PRIVATE_SSH_KEY | Private key to connect to the SFTP server. |
| SFTP_PASSWORD | Password to connect to the SFTP server. |
| SFTP_FILE_PATTERN | File pattern to filter out files that are not PCAP files. Use the pattern *.(pcap\|pcapng\|cap\|zip) as in the examples. |
| PIPELINE_ID | ID of the SFTP pipeline. Should contain alphanumeric, dash, or underscore characters. |
| AGILITY_BACKEND_URL | Internal backend server URL. Leave the default value. |
| AGILITY_USERNAME | User under which predictions will be processed and made visible. Other users will not see these predictions. |
| AGILITY_SERVICE_KEYS | Comma-separated list of services/models needed for the given auto-loader instance. |
Examples
This example connects to an SFTP server using a username and private-key credentials:
kubectl exec -n agility agility-autoloader-0 -- bash -c "cd /byond && \
AGILITY_NS=agility \
STREAMSETS_PASSWORD=$(kubectl get secrets -n agility agility-autoloader-secrets -o jsonpath="{.data.admin\.password}" | base64 -d) \
SOURCE_TYPE=SFTP \
SFTP_RESOURCE_URL=sftp://sftp.b-yond.com:22/private/upload/sample \
SFTP_USERNAME=my-username \
SFTP_PRIVATE_SSH_KEY='$(cat ../sftp_key)' \
SFTP_FILE_PATTERN='*.(pcap|pcapng|cap|zip)' \
PIPELINE_ID=test-pipeline \
AGILITY_BACKEND_URL=http://agility-backend \
AGILITY_USERNAME='agility-admin@b-yond.com' \
./create_replace.sh"
This example connects to an SFTP server using username and password credentials:
kubectl exec -n agility agility-autoloader-0 -- bash -c "cd /byond && \
AGILITY_NS=agility \
STREAMSETS_PASSWORD=$(kubectl get secrets -n agility agility-autoloader-secrets -o jsonpath="{.data.admin\.password}" | base64 -d) \
SOURCE_TYPE=SFTP \
SFTP_RESOURCE_URL=sftp://sftp.b-yond.com:22/private/upload/sample \
SFTP_USERNAME=my-username \
SFTP_PASSWORD=my-password \
AGILITY_SERVICE_KEYS=comma-separated-service-list \
SFTP_FILE_PATTERN='*.(pcap|pcapng|cap|zip)' \
PIPELINE_ID=test-pipeline \
AGILITY_BACKEND_URL=http://agility-backend \
AGILITY_USERNAME='agility-admin@b-yond.com' \
./create_replace.sh"
Adjust the values accordingly, especially SFTP_RESOURCE_URL, SFTP_USERNAME, SFTP_PRIVATE_SSH_KEY, and AGILITY_USERNAME.
Create a new S3 source pipeline in cluster
To create a new S3 source pipeline, ensure you have the necessary AWS S3 configuration details. The script will configure and deploy a pipeline to pull data from a specified S3 bucket.
Pipeline parameters
| Parameter | Description | Default Value |
| --- | --- | --- |
| SOURCE_TYPE | Set the value to S3 | always set |
| S3_ENDPOINT_URL | URL of the S3 service endpoint | <empty> |
| S3_ACCESS_KEY_ID | AWS access key ID for S3 authentication | <empty> |
| S3_SECRET_ACCESS_KEY | AWS secret access key for S3 authentication | <empty> |
| S3_BUCKET | Name of the S3 bucket from which to pull data | <empty> |
| S3_FOLDER_PREFIX | Prefix of the folder in the S3 bucket (optional), examples: US/East/MD/, US/ | <empty> |
| S3_FILE_PATTERN | Pattern to match files in the S3 bucket, for example: **/prod/*.pcap | |
| S3_READ_ORDER | Order to read files (TIMESTAMP or LEXICOGRAPHICAL) | <empty> |
| | JSON template with pipeline definition. | |
Examples
kubectl exec -n cv agility-autoloader-0 -- bash -c "cd /byond && \
AGILITY_NS=cv \
STREAMSETS_PASSWORD=$(kubectl get secrets -n cv agility-autoloader-secrets -o jsonpath="{.data.admin\.password}" | base64 -d) \
SOURCE_TYPE=S3 \
S3_ENDPOINT_URL=https://s3.us-east-2.amazonaws.com \
S3_ACCESS_KEY_ID=your-access-key-id \
S3_SECRET_ACCESS_KEY=your-secret-access-key \
S3_BUCKET=my-bucket \
S3_FOLDER_PREFIX=US/ \
S3_FILE_PATTERN=**/prod/*.pcap \
S3_READ_ORDER=TIMESTAMP \
PIPELINE_ID=test-pipeline \
AGILITY_BACKEND_URL=http://agility-backend.cv \
AGILITY_USERNAME=someone@b-yond.com \
./create_replace.sh"
Adjust the values accordingly, especially S3_ENDPOINT_URL, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_BUCKET, S3_FOLDER_PREFIX, S3_FILE_PATTERN, S3_READ_ORDER, and AGILITY_USERNAME.
Start the pipeline
Currently there is no shell command to start the pipeline.
To start the pipeline, port-forward to the StreamSets UI and click the Start button.
After creating the pipeline, port-forward to agility-autoloader-0 with this command:
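A likely form of the command, assuming AGILITY is deployed in the agility namespace used in the examples above and StreamSets listens on its default port 18630:

```bash
# Forward local port 18630 to the StreamSets UI in the autoloader pod
# (assumes the "agility" namespace, as in the examples above)
kubectl port-forward -n agility agility-autoloader-0 18630:18630
```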
Go to localhost:18630 to open the StreamSets login page. The user name should be admin. Retrieve the password using this command:
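This is the same password lookup used inline in the pipeline-creation examples above:

```bash
# Read the StreamSets admin password from the autoloader secret
kubectl get secrets -n agility agility-autoloader-secrets -o jsonpath="{.data.admin\.password}" | base64 -d
```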
Once logged in, the dashboard is displayed. Go to the pipeline you have created and view the pipeline page.
To start the pipeline, click Start.
View Statistics
While running the pipeline, select Summary to view the pipeline statistics.
Check logs
To check the logs while running the pipeline, run this command:
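A plausible form of the command, assuming you want to follow the logs of the autoloader pod used in the examples above:

```bash
# Stream the autoloader pod logs (assumes the "agility" namespace)
kubectl logs -n agility agility-autoloader-0 -f
```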
View analyses in AGILITY
In the AGILITY UI, the analyses being created under your user name are shown in the My Analyses tab.
Stop the pipeline
To stop the pipeline, click the Stop button.
Update the pipeline parameters
To update the parameters of that pipeline, go to Configuration > Parameters.