Leveraging Neighborhood Socioeconomic Data to Predict Individual Experiences: A Health Care Application

ML & Data Science Intermediate

This presentation will provide an overview of publicly available neighborhood deprivation and vulnerability indices, and will illustrate the use of these data in a machine learning model to predict hospital patients' socioeconomic experiences. Takeaways will be especially relevant to those interested in social equity, algorithmic bias, and fairness.

Understanding individuals’ social and economic experiences is important in fields such as health, education, and marketing. When individual-level data are not available to characterize socioeconomic experiences, organizations often use neighborhood-level data as a proxy. However, little is known about the extent to which neighborhood data are actually predictive of individual experiences. This analysis used machine learning to predict hospital patients' social risk from four publicly available neighborhood deprivation and vulnerability indices. The dataset included approximately ten thousand patients who self-reported their social risk in a screening questionnaire and whose addresses were geocoded to the census block group level. Multiple classification models were compared, and a final model was selected based on accuracy metrics. Model performance was low-to-moderate (AUC < 0.7), and error was higher for certain demographic groups. Additionally, some neighborhood data items were highly predictive of individual-level social risk, while others were less predictive. Results indicate that neighborhood-level data may be useful for predicting individual-level social and economic experiences, but the quality of predictions may depend on the neighborhood-level features included in the model and on the bias measurement and mitigation techniques employed.
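As a rough illustration of the two kinds of evaluation mentioned above (overall AUC and error by demographic group), here is a minimal sketch in plain Python. It is not the analysis code from this study. The function names, the 0.5 decision threshold, the group labels "A" and "B", and the toy data are all hypothetical, chosen only to show how the two quantities can be computed from predicted scores.

```python
from collections import defaultdict


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate_by_group(y_true, scores, groups, threshold=0.5):
    """Misclassification rate per group at a fixed score threshold
    (the threshold value is an assumption, not taken from the study)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for y, s, g in zip(y_true, scores, groups):
        pred = 1 if s >= threshold else 0
        counts[g][0] += int(pred != y)
        counts[g][1] += 1
    return {g: errs / total for g, (errs, total) in counts.items()}


# Toy, made-up data: 1 = patient self-reported social risk, 0 = did not.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
# Hypothetical model scores, e.g. from a classifier trained on neighborhood indices.
scores = [0.8, 0.3, 0.4, 0.6, 0.7, 0.2, 0.55, 0.65]
# Hypothetical demographic group label for each patient.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_auc = roc_auc(y_true, scores)
group_errors = error_rate_by_group(y_true, scores, groups)
print(f"AUC: {overall_auc:.3f}")
for group, err in sorted(group_errors.items()):
    print(f"group {group}: error rate {err:.2f}")
```

Comparing the per-group error rates side by side is one simple way to check whether a model with acceptable overall discrimination still performs unevenly across groups. In practice the same comparison would be repeated for each candidate model and at thresholds appropriate to the screening use case.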