1. Baseline Analytics
Activity name | Baseline Analytics |
Activity ID | 69 |
Short Description | Detect anomalies through training and providing a baseline |
Difficulty | Beginner |
Tools used | |
Topology Nodes | PE3 |
References | NSP baseline |
This guided activity walks you through detecting port flaps using telemetry data and statistical anomaly detection techniques. The workflow collects port telemetry from routers, trains a baseline on that data, and then applies Z-score-based anomaly detection.
1.1 Prerequisites
- Basic understanding of NSP
- Understanding of time-series data and statistical algorithms
- Familiarity with the common NSP models and the filters that can be applied to them to query the appropriate router resources
1.2 Key Learnings
By the end of this activity, you'll be able to create and train a baseline for anomaly detection.
1.3 Terminology
Baseline Training and Z-score Detection
Train on historical port status data to establish a statistical norm. Use Z-score to flag deviations beyond a defined threshold, indicating a flap event.
Statistical Terminology
Understand how statistical algorithms are used for anomaly detection.
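For reference, the standard Z-score of a new measurement $x$ against a baseline with mean $\mu$ and standard deviation $\sigma$ is shown below. The cutoff $k$ is a tunable threshold ($k = 3$ is a common rule of thumb). This is the textbook definition; how the NSP "Z-score Absolute" algorithm is computed internally is not stated here, so treat the absolute-value form as an assumption.

```latex
z = \frac{x - \mu}{\sigma}, \qquad \text{flag } x \text{ as an anomaly if } |z| > k
```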
1.4 High-level Data Flow
- Define the collection interval
  This is how often data is collected. Example: every 15 minutes, collect metrics such as received-octets.
- Filter the NSP resources using a path filter
  Select which resources to monitor using a filter path. Example: filter for ports with traffic, specific interfaces, or NE-IDs.
- Aggregate the data based on counter, gauge, or sample
  The metrics you collect fall into one of three types:
  - counters (e.g., received-octets)
  - gauges (e.g., CPU usage)
  - samples (e.g., latency)
- Use seasonality and windows to calculate the mean and variance
  You define a season (e.g., 1 week) and split it into windows (e.g., 15-minute intervals). For a 1-week season with 15-minute windows, there are 7 days x 24 hours x 4 windows/hour = 672 windows. Each window has its own baseline value: the expected metric for that time slot based on past values.
- Update the baseline
  After each new measurement, the system updates the expected value for that window using a baseline algorithm. So the value you see for Tuesday 08:00–08:15 is based on past Tuesdays at the same time.
- Apply an anomaly algorithm to detect anomalies based on the mean and variance
  Once a baseline (expected mean and variance) exists, an anomaly detection algorithm is applied. If a new value deviates too far from the expected value (e.g., by more than 3 standard deviations), it is flagged as an anomaly.
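The data flow above can be sketched in a few lines of Python. This is a minimal illustration, not how NSP implements it: the names `window_index`, `WindowBaseline`, and `observe` are hypothetical, and the running mean and variance use Welford's method as a stand-in for whatever baseline algorithm NSP applies. The season and window sizes match the 1-week, 15-minute example.

```python
import math
from dataclasses import dataclass

SEASON_S = 7 * 24 * 3600                    # 1-week season
WINDOW_S = 15 * 60                          # 15-minute windows
WINDOWS_PER_SEASON = SEASON_S // WINDOW_S   # 7 x 24 x 4 = 672


def window_index(timestamp_s: int) -> int:
    """Map a timestamp (in seconds) to its window slot within the season."""
    return (timestamp_s % SEASON_S) // WINDOW_S


@dataclass
class WindowBaseline:
    """Running mean and variance for one window slot (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def is_anomaly(self, value: float, max_sigma: float = 3.0) -> bool:
        """Flag values more than max_sigma standard deviations from the mean."""
        if self.count < 2 or self.std == 0.0:
            return False  # too little history or zero spread: skip the check
        return abs(value - self.mean) / self.std > max_sigma


def observe(baselines: list, timestamp_s: int, value: float) -> bool:
    """Check a value against its slot's baseline, then fold it into the baseline."""
    slot = baselines[window_index(timestamp_s)]
    anomalous = slot.is_anomaly(value)
    slot.update(value)
    return anomalous


baselines = [WindowBaseline() for _ in range(WINDOWS_PER_SEASON)]
# Same slot on four consecutive weeks (illustrative values), then a sharp drop.
for week, value in enumerate([100.0, 102.0, 98.0, 101.0]):
    observe(baselines, week * SEASON_S, value)
print(observe(baselines, 4 * SEASON_S, 10.0))  # True
```

Keeping one baseline per window slot is what makes the comparison seasonal: a value is judged only against earlier values from the same slot of the season, not against the overall average.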
1.5 Tasks
You should read these tasks from top to bottom before beginning the activity.
It is tempting to skip ahead, but some tasks depend on earlier tasks having been completed.
1.5.1 Configure Baseline Analytics
You can select multiple resources.
Warning
Make sure to select resources from routers that belong to your group. Access control is not enforced.
Click on PLOT
Warning
With the settings provided (collection: 30 s, window: 1 min, season: 5 min), wait around 10 minutes before expecting to see results.
1.5.2 Generate some traffic
Use a high-frequency ping (rapid, large MTU) to generate enough traffic to reach the baseline. Continue monitoring the plotter.
Note
Based on the seasonality and window length, the detector rule will begin to apply. You need to configure the comparator to detect values less than a threshold close to 0. This helps identify anomalies such as port flapping, where there is a sudden drop in the metric value toward zero.
Anomalies are pushed to a Kafka topic, which can be used to trigger email notifications or closed-loop automation.
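The "less than a threshold close to 0" comparator described in the note can be sketched as follows. This is an illustration only: the function names are hypothetical, averaging the samples in a window is an assumption, and the sample values are made up. How NSP combines this comparator with the Z-score algorithm is not stated here.

```python
from statistics import mean

THRESHOLD = 0.0001  # "close to 0", as in the SUMMARY settings


def window_value(samples: list[float]) -> float:
    """Aggregate the samples collected in one window (mean is an assumption)."""
    return mean(samples)


def is_flap(samples: list[float], threshold: float = THRESHOLD) -> bool:
    """'Less than' comparator, evaluated once at the end of the window."""
    return window_value(samples) < threshold


# Two 30 s samples per 1 min window; values are hypothetical traffic rates.
windows = [[1200.0, 1180.0], [1210.0, 1195.0], [0.0, 0.0], [1190.0, 1205.0]]
flagged = [i for i, samples in enumerate(windows) if is_flap(samples)]
print(flagged)  # [2]
```

Only the window in which traffic drops to zero is flagged, which is the signature of a port going down between two otherwise normal windows.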
SUMMARY
Name: Detect Port Flaps
Collection Interval: 30 secs
Seasonality: 5 mins
Window Duration: 1 min
Telemetry Type: /interfaces/interface
Object Filter: /nsp-equipment:network/network-element[ne-id={{router-id}}]
Detector:
  Threshold: 0.0001
  Comparison: Less than
  Algorithm: Z-score Absolute
  Evaluate What: Value
  Evaluate When: End of Window
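To see how these settings relate to each other, the sketch below captures them in a small Python structure and derives the number of samples per window and windows per season. `BaselineConfig` and `render_filter` are hypothetical helpers, not part of NSP, and the router ID `192.0.2.3` is a placeholder from the documentation address range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    name: str
    collection_interval_s: int
    season_s: int
    window_s: int
    telemetry_type: str
    object_filter: str

    @property
    def samples_per_window(self) -> int:
        return self.window_s // self.collection_interval_s

    @property
    def windows_per_season(self) -> int:
        return self.season_s // self.window_s


def render_filter(template: str, router_id: str) -> str:
    """Substitute the {{router-id}} placeholder in the object filter."""
    return template.replace("{{router-id}}", router_id)


config = BaselineConfig(
    name="Detect Port Flaps",
    collection_interval_s=30,
    season_s=5 * 60,
    window_s=60,
    telemetry_type="/interfaces/interface",
    object_filter="/nsp-equipment:network/network-element[ne-id={{router-id}}]",
)

print(config.samples_per_window)   # 2
print(config.windows_per_season)   # 5
print(render_filter(config.object_filter, "192.0.2.3"))  # placeholder router ID
```

With these values, each window holds two samples and each season holds five windows, so the 10-minute wait mentioned earlier equals two full seasons. That is consistent with needing roughly one season of data to train before detection becomes meaningful, although the source does not state this explicitly.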
1.5.3 Next steps
Here are some ideas on how to continue:
- What's the difference between indicators and baselines?
- Add baselines for other KPIs like CPU and memory.