PaCMAP ensembles for occupational specializations in the California Cloud Workforce

Emerging Tech

The California Cloud Workforce, spearheaded by Santa Monica College, is an initiative across LA-area community colleges to develop skills for future employment in Cloud and DevOps roles. Because more than 20 colleges participate and because the technology and required skills evolve rapidly, we have developed an NLP ensemble that uses federal data to identify occupational specializations in Cloud Computing and the relevant coursework across many institutions.

Training and using the model consisted of several phases:

  • Extracting occupational data from the O*NET system and curricula data from Course Outlines of Record
  • Creating component models using DistilBERT, traditional NLP topic models, and Bloom's taxonomy of educational objectives
  • Ensembling the component models using PaCMAP
  • Deploying the model, then aggregating and visualizing the results
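The component and ensembling phases can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline. The Bloom verb lists are a hypothetical, abbreviated subset. The block-weighting scheme in the hypothetical `combine_blocks` helper is an assumption. DistilBERT embeddings and topic-model distributions are assumed to be precomputed arrays with one row per document (course outline or O*NET occupation). Only the final reduction uses the real `pacmap` package, through `PaCMAP(...).fit_transform`.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical, abbreviated verb lists per Bloom level (illustrative only;
# a real model would use a fuller, curated list).
BLOOM_VERBS = {
    "remember": {"define", "list", "identify", "recall"},
    "understand": {"describe", "explain", "summarize", "classify"},
    "apply": {"apply", "implement", "configure", "deploy"},
    "analyze": {"analyze", "compare", "troubleshoot", "differentiate"},
    "evaluate": {"evaluate", "assess", "justify", "critique"},
    "create": {"design", "build", "develop", "create"},
}
BLOOM_LEVELS = list(BLOOM_VERBS)


def bloom_features(text: str) -> np.ndarray:
    """Share of matched Bloom verbs per level (sums to 1, or all zeros if none match).

    Uses exact lowercase token matches, so inflected forms like "deploys" are not counted.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tok in tokens:
        for level, verbs in BLOOM_VERBS.items():
            if tok in verbs:
                counts[level] += 1
    vec = np.array([counts[level] for level in BLOOM_LEVELS], dtype=float)
    total = vec.sum()
    return vec / total if total > 0 else vec


def combine_blocks(blocks, weights=None) -> np.ndarray:
    """Hypothetical helper: z-score each component's feature block, scale it so the
    block's total variance is (weight ** 2), and concatenate the blocks column-wise.

    Dividing by sqrt(width) keeps a 768-dim DistilBERT block from drowning out a
    6-dim Bloom block simply because it has more columns.
    """
    if weights is None:
        weights = [1.0] * len(blocks)
    scaled = []
    for block, w in zip(blocks, weights):
        block = np.asarray(block, dtype=float)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
        z = (block - block.mean(axis=0)) / std
        scaled.append(w * z / np.sqrt(block.shape[1]))
    return np.hstack(scaled)


def pacmap_ensemble(features: np.ndarray, n_components: int = 2, seed: int = 0) -> np.ndarray:
    """Project the combined features with PaCMAP (requires `pip install pacmap`)."""
    import pacmap  # imported lazily so the helpers above work without it

    reducer = pacmap.PaCMAP(
        n_components=n_components,
        n_neighbors=10,  # must be smaller than the number of documents
        MN_ratio=0.5,
        FP_ratio=2.0,
        random_state=seed,
    )
    return reducer.fit_transform(features, init="pca")


# Usage sketch (assumes precomputed arrays with one row per document):
#   bloom = np.vstack([bloom_features(t) for t in outline_texts])
#   X = combine_blocks([distilbert_emb, topic_dist, bloom], weights=[1.0, 1.0, 0.5])
#   coords = pacmap_ensemble(X)
```

Equalizing each block's total variance before the reduction is one plausible way to let a domain-specific component such as the Bloom features carry real weight alongside a high-dimensional transformer embedding. The per-block weights then become the knobs for calibrating the ensemble to Cloud Computing or another program.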

Combining PaCMAP with DistilBERT produced a more parsimonious model that leverages both the transformer architecture and domain-specific knowledge, can be calibrated for Cloud Computing or other programs, and is simple enough to manage that students deploy it themselves as part of their coursework.