Ragged Time Series, and the Data Science of Data Science Salaries
AI, ML and Data Science Intermediate
We gathered ~20 salary surveys of data scientists spanning 2009-2023. The data was collected by different stakeholders with different goals, and it forms an excellent case study of ragged time series. Timing irregularities preclude standard, off-the-shelf time series analyses; instead, we modeled the data using neural networks and matrix completion. We will discuss the data challenges, our approach, and our forecasts for 2024-2025.
Ragged time series are data collected over time with irregular start and stop points. We will:
- Describe the challenges of working with ragged data. ARIMA, which assumes regularly spaced observations, is not satisfactory!
- Show how to model it with supervised (neural network) and unsupervised (matrix completion) approaches.
- Apply those methods to real-world surveys of data scientist salaries.
- Explain why DS salaries matter: they help us understand the labor market and how it has shifted over the last fifteen years as the job evolved.
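
To make the matrix completion idea concrete, here is a minimal sketch and not the method used in the talk. It assumes the surveys are arranged as a survey-by-year matrix with NaN wherever a survey did not run, and it fills the gaps with an iterative truncated SVD (one common completion technique, swapped in for illustration). The function name `complete_low_rank`, the rank, and every number below are hypothetical and synthetic, not real salary data.

```python
import numpy as np

def complete_low_rank(X, rank=1, n_iter=200):
    """Fill NaN entries of X by repeatedly projecting onto rank-`rank` matrices.

    Illustrative only: observed entries are held fixed, and missing entries
    are replaced by the current low-rank approximation at each iteration.
    """
    observed = ~np.isnan(X)
    # Initialise the gaps with the mean of all observed values.
    filled = np.where(observed, X, np.nanmean(X))
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries as they are; update only the missing ones.
        filled = np.where(observed, X, low_rank)
    return filled

# Synthetic example: 4 hypothetical surveys observed over 6 years.
years = np.arange(2018, 2024)
trend = 100.0 + 5.0 * (years - 2018)      # made-up shared salary trend
scale = np.array([1.0, 1.1, 0.9, 1.2])    # made-up per-survey offsets
full = np.outer(scale, trend)             # rank-1 "ground truth"

ragged = full.copy()
ragged[0, 4:] = np.nan   # survey 0 stopped after 2021
ragged[1, :2] = np.nan   # survey 1 started in 2020
ragged[2, 5:] = np.nan   # survey 2 skipped 2023
ragged[3, :3] = np.nan   # survey 3 started in 2021

completed = complete_low_rank(ragged, rank=1)
```

Each row of `ragged` starts and stops at a different year, which is what makes the series ragged. The completion borrows strength across surveys that overlap in time, so no single survey needs to cover the full range.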