Is AI Putting Data Science Teams Out of Work, Too?
Bespoke pipeline work has turned many data scientists into data wranglers. Building a video-to-training-data platform with vibe coding tools offers a look at what happens to the role when the custom-code tax disappears.
This talk examines the maturation of platforms serving data science and ML teams, using as a case study the building of one that turns raw video into rights-cleared training data for generative AI, robotics, autonomous vehicles, wearables, and physical AI. Data science teams across industries are reinventing largely the same data engineering pipelines, spending enormous time and money on them, precisely because end-to-end platforms do not yet exist. In practice, these teams are data wranglers, not data scientists.

A new generation of platforms, increasingly built with vibe coding tools, is absorbing that repeated, bespoke work into a single end-to-end process, with both a human UX and automated agents for each step: registering ownership, segmentation, indexing, discovery, human-in-the-loop labeling, automated labeling by models fine-tuned on that output, coverage and diversity analysis, augmentation, normalization, test-script generation, packing and deployment, and benchmarking and gap analysis to identify what additional data is needed. Data scientists are watching tools do in seconds what used to take weeks, and everyone is trying to figure out where they fit.

This talk looks at three things: which parts of the process benefit most from AI and which roles humans must still play; what happens when the custom-code and integration tax goes away; and whether vibe coding is actually good enough to build production data tooling, including where the real gains are and where the LLM quietly fails in ways you have to know enough to catch. Together, these offer a harder, more precise look at last year's conference question: is data science still a job, or is it just getting more fun?