Machine Learning with Time-Series Data: From Notebook to Production

Data Engineering Beginner

Follow a real project, classifying datacenter network traffic into operational zones, from labeling raw data through time-series feature engineering to a deployed pipeline.

This talk follows the path of a real-world project: classifying datacenter network traffic into operational zones (peak, off-peak, super off-peak) using time-series data and gradient-boosted models. The goal of the project is to provide a basis for determining dynamic network traffic thresholds across multiple datacenters.

First comes labeling the data from scratch: developing, comparing, and visualizing automated labeling strategies (capacity thresholds, hour-of-day percentiles, daily percentiles).
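As a rough illustration of percentile-based labeling, the sketch below assigns the three zones to one day of traffic samples. The function names, the nearest-rank percentile method, and the 25th/75th percentile cutoffs are assumptions made for this example, not the project's actual choices.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]

def label_by_percentile(values, low_pct=25, high_pct=75):
    """Label each sample relative to percentiles of the same series.

    The cutoffs are hypothetical: at or below low_pct is "super off-peak",
    above high_pct is "peak", and everything in between is "off-peak".
    """
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    labels = []
    for value in values:
        if value > high:
            labels.append("peak")
        elif value <= low:
            labels.append("super off-peak")
        else:
            labels.append("off-peak")
    return labels

# Illustrative traffic samples for a single day (arbitrary units).
day = [10, 20, 30, 40, 50, 60, 70, 80]
labels = label_by_percentile(day)
```

An hour-of-day variant would apply the same function separately to the samples collected for each hour across many days, so that each hour gets its own thresholds.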

Next is time-series feature engineering in practice: cyclical encoding for spans of time, lag features, rolling statistics using strictly prior data, and normalization strategies.
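The features named above can be sketched in a few lines of plain Python. This is a minimal example under stated assumptions: the helper names are hypothetical, the cyclical encoding uses a 24-hour period, and the rolling mean returns `None` until a full window of prior samples exists.

```python
import math

def encode_hour(hour, period=24):
    """Cyclical encoding: map an hour onto the unit circle so 23 and 0 are neighbors."""
    angle = 2 * math.pi * hour / period
    return math.sin(angle), math.cos(angle)

def lag(values, k):
    """Lag feature: the value observed k steps earlier (None where unavailable)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [None] * min(k, len(values)) + list(values[:-k])

def rolling_mean_prior(values, window):
    """Rolling mean over the `window` samples strictly before each position.

    The current sample is excluded, so the feature never leaks the value
    it will later be used to help predict.
    """
    result = []
    for i in range(len(values)):
        if i < window:
            result.append(None)  # not enough history yet
        else:
            prior = values[i - window:i]
            result.append(sum(prior) / window)
    return result

series = [1.0, 2.0, 3.0, 4.0, 5.0]
lagged = lag(series, 1)
rolled = rolling_mean_prior(series, 2)
```

Slicing `values[i - window:i]` rather than `values[i - window + 1:i + 1]` is what keeps the statistic restricted to strictly prior data.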

Finally comes the transition from experimentation to production: containerizing the development environment, refactoring notebooks into reusable components, building a configurable training pipeline, and deploying trained models to production.

Across all three areas, the focus is on the decisions that determine whether an ML project delivers value, and the lessons learned in the trenches. Attendees will leave with a practical framework for moving machine learning projects from prototype to production.