An emerging threat to the practical use of machine learning is the presence of bias in the data used to train models. Biased training data can result in models that make incorrect or disproportionately correct decisions, or that reinforce the injustices reflected in their training data. For example, recent work has shown that semantics derived automatically from text corpora contain human biases, and that the accuracy of face and gender recognition systems is systematically lower for people of color and women.
While the root causes of AI bias are difficult to pin down, a common cause is the violation of the pervasive assumption that the data used to train models are unbiased samples of an underlying “test distribution,” which represents the conditions that the trained model will encounter in the future. Overcoming the bias introduced by the discrepancy between train and test distributions has been the focus of a long line of research in truncated statistics. We provide computationally and statistically efficient algorithms for truncated density estimation and for truncated linear, logistic, and probit regression in high dimensions, through a general, practical framework based on Stochastic Gradient Descent. We illustrate the efficacy of our framework through several experiments.
(Based on joint work with Themis Gouleakis, Andrew Ilyas, Vasilis Kontonis, Sujit Rao, Christos Tzamos, and Manolis Zampetakis.)
Constantinos Daskalakis is a Professor of Computer Science and Electrical Engineering at MIT. He holds a Diploma in Electrical and Computer Engineering from the National Technical University of Athens, and a Ph.D. in Electrical Engineering and Computer Sciences from UC Berkeley. His research interests lie in Theoretical Computer Science and its interface with Economics, Probability Theory, Machine Learning and Statistics. He has been honored with the 2007 Microsoft Graduate Research Fellowship, the 2008 ACM Doctoral Dissertation Award, the Game Theory and Computer Science (Kalai) Prize from the Game Theory Society, the 2010 Sloan Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, the 2011 Ruth and Joel Spira Award for Distinguished Teaching, the 2012 Microsoft Research Faculty Fellowship, the 2015 Research and Development Award by the Giuseppe Sciacca Foundation, the 2017 Google Faculty Research Award, the 2018 Simons Investigator Award, the 2018 Rolf Nevanlinna Prize from the International Mathematical Union, the 2018 ACM Grace Murray Hopper Award, and the 2019 Bodossaki Foundation Distinguished Young Scientists Award. He is also a recipient of Best Paper awards at the ACM Conference on Economics and Computation in 2006 and in 2013.