'Humans in the Loop: Incorporating Expert and Crowdsourced Knowledge for Prediction in Social Science'


Kivan Polimis, PhD

Postdoctoral researcher at Università Bocconi's Dondena Centre for Research on Social Dynamics and Public Policy in Milan
Start date

Tuesday, 17 Apr 2018, 16:00

End date

Tuesday, 17 Apr 2018, 17:30

Aula A
International Institute of Social Studies



Development Research Seminar by Kivan Polimis, recent PhD graduate from the University of Washington


Survey datasets are challenging for prediction because they often contain more variables than observations and are riddled with missing values. These characteristics pose problems for traditional machine learning approaches, which are usually applied to largely complete data with more observations than variables.
We investigate techniques to improve feature extraction and the imputation of missing data for the Fragile Families Challenge. We use our own surveys to elicit priors from both experts and laypeople about the importance of different variables for different outcomes.

This strategy makes it possible either to trim features before prediction or to incorporate domain knowledge into the prediction itself. We find that human-informed trimming reduces predictive performance, but that incorporating human priors into machine learning approaches might improve it. Separately, while some form of imputation is essential, complicated imputation approaches do not clearly outperform simple ones. All of the techniques we document are easy to implement, which we hope will encourage further testing of their relative performance.

About the speaker

Kivan Polimis is a postdoctoral researcher at Bocconi University's Dondena Centre for Research on Social Dynamics and Public Policy and a postdoctoral affiliate at the Bocconi Institute for Data Science and Analytics. He received his PhD from the University of Washington in 2017, where he worked with Data Science for Social Good and Microsoft to develop programming solutions to infrastructure problems in transportation and the legal system.

His research focuses on combining computational social science approaches with large-scale social media data to evaluate population dynamics.