Ragged Time Series, and the Data Science of Data Science Salaries

AI, ML and Data Science · Intermediate

We gathered about 20 salary surveys of data scientists spanning 2009-2023. The surveys were collected by different stakeholders with different goals, which makes them an excellent case study of ragged time series. Their timing irregularities rule out standard, off-the-shelf time series analyses, so we modeled the data with neural networks and matrix completion instead. We will discuss the data challenges, our approach, and our forecasts for 2024-2025.
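The raggedness is easiest to see once the surveys are stacked into one array, with one row per survey and one column per year. Below is a minimal Python/NumPy sketch of that layout, the natural input for matrix completion. The survey names, the helper `to_ragged_matrix`, and the salary figures are hypothetical and made up for illustration; they are not the collected data.

```python
import numpy as np

# Hypothetical toy data (NOT the real surveys): median salary in $1000s,
# keyed by survey name, then by year. Each survey covers a different span.
surveys = {
    "survey_a": {2009: 90, 2010: 92, 2011: 95},
    "survey_b": {2011: 100, 2012: 104, 2013: 107, 2014: 111},
    "survey_c": {2013: 115, 2014: 118},
}


def to_ragged_matrix(surveys):
    """Stack ragged series into a (survey x year) array with NaN for gaps."""
    years = sorted({year for series in surveys.values() for year in series})
    names = sorted(surveys)
    mat = np.full((len(names), len(years)), np.nan)
    for i, name in enumerate(names):
        for j, year in enumerate(years):
            if year in surveys[name]:
                mat[i, j] = surveys[name][year]
    return names, years, mat


names, years, mat = to_ragged_matrix(surveys)
observed = ~np.isnan(mat)
print(years)
# 1 = observed, 0 = missing: each row starts and stops in a different year.
print(observed.astype(int))
print(f"{observed.mean():.0%} of cells observed")
```

Even in this toy case half the cells are empty and no single year is covered by every survey, so a model that expects one regularly sampled series has nothing to fit directly.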

Ragged time series are collections of series observed over time whose start and end points differ from one series to the next. We will:

  • Describe the challenges of working with ragged data, and why ARIMA, which expects a single regularly sampled series, is not satisfactory.
  • Show how to model it with neural networks (supervised) and matrix completion (unsupervised).
  • Apply those methods to real-world surveys of data scientist salaries.
  • Explain why DS salaries matter: they help us understand the labor market and how it has shifted over the last fifteen years as the job evolved.
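For the unsupervised route, matrix completion fills the empty cells of a survey-by-year matrix with a low-rank estimate. The sketch below uses iterative truncated-SVD imputation (in the spirit of hard-impute) on a synthetic rank-1 matrix with a ragged observation pattern. The function name `svd_impute`, the chosen rank, and the data are illustrative assumptions, not the method or results presented in the talk.

```python
import numpy as np


def svd_impute(X, rank=1, n_iters=1000, tol=1e-9):
    """Fill NaNs in X by alternating a rank-`rank` SVD fit with re-imposing
    the observed entries (iterative truncated-SVD imputation)."""
    mask = ~np.isnan(X)
    # Initialise missing cells with the mean of the observed entries.
    Z = np.where(mask, X, np.nanmean(X))
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed values; use the low-rank estimate only for the gaps.
        Z_new = np.where(mask, X, low_rank)
        converged = np.linalg.norm(Z_new - Z) < tol
        Z = Z_new
        if converged:
            break
    return Z


# Synthetic example (not real salaries): a per-survey level times a shared
# year trend gives a rank-1 "truth" matrix of 4 surveys x 5 years.
survey_level = np.array([1.0, 1.2, 0.9, 1.1])
year_trend = np.array([80.0, 85.0, 90.0, 96.0, 102.0])
truth = np.outer(survey_level, year_trend)

# Ragged observation pattern: each survey covers a different run of years.
mask = np.zeros_like(truth, dtype=bool)
mask[0, 0:4] = True
mask[1, 1:5] = True
mask[2, 0:3] = True
mask[3, 2:5] = True

X = np.where(mask, truth, np.nan)
filled = svd_impute(X, rank=1)
print(np.round(filled, 1))
print("max error on missing cells:", np.abs(filled - truth)[~mask].max())
```

The rank acts as the number of latent factors shared across surveys. Real survey data is noisy, so in practice one would pick the rank (or a soft-thresholding penalty, as in soft-impute) by holding out some observed cells and checking how well they are recovered.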