In the video, Andrew Ilyas discusses his research on adversarial examples and data modeling in machine learning, emphasizing the need for a holistic understanding of the entire machine learning pipeline to ensure robustness and reliability. He highlights the importance of analyzing biases in datasets, such as those found in ImageNet, and expresses his commitment to bridging the gap between academic research and industry applications as he prepares for a professorship at Carnegie Mellon University.
In the video, Andrew Ilyas, a PhD student at MIT who is about to start as a professor at CMU, discusses his research on adversarial examples and data modeling. He emphasizes understanding the entire machine learning pipeline, from data collection to model deployment, as a prerequisite for building robust and reliable systems. Ilyas argues for a holistic approach in which researchers consider not only individual components but also how those components interact. He highlights the need for predictability in machine learning systems: knowing in advance when models will perform well and when they will fail.
Ilyas explains the concept of adversarial examples: inputs altered by small, often imperceptible perturbations that cause machine learning models to misbehave. He describes how such examples can be generated in both image and language settings, leading to unintended model behavior. The discussion then shifts to data modeling, where Ilyas presents his work on predicting model behavior from the training data used. He introduces the idea of treating the learning algorithm as a mapping from training datasets to predictions, which lets researchers analyze how changes to the dataset affect model outputs.
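The perturbation idea can be illustrated on a toy linear classifier. The sketch below is a minimal, hypothetical numpy example in the spirit of gradient-sign attacks, not code from Ilyas's work: for a linear model, the gradient of the score with respect to the input is just the weight vector, so nudging each feature by a small amount in the direction of sign(w) is the cheapest way (per unit of max-norm) to push the score across the decision boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier: predict class 1 when w.x + b > 0.
w = rng.normal(size=20)
b = 0.1

def predict(x):
    return int(w @ x + b > 0)

x = rng.normal(size=20)  # a clean input
score = w @ x + b

# Gradient-sign perturbation: for a linear model, the input gradient of the
# score is w itself, so the step eps * sign(w) changes the score fastest per
# unit of max-norm. Pick eps just large enough to cross the boundary.
eps = abs(score) / np.abs(w).sum() + 1e-3
x_adv = x - np.sign(score) * eps * np.sign(w)

print("clean:", predict(x), "adversarial:", predict(x_adv), "eps:", round(eps, 3))
```

The per-feature change `eps` is small relative to the input's scale, yet the prediction flips; deep networks exhibit the same phenomenon, with the gradient computed by backpropagation instead of read off the weights.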
The conversation then turns to the methodology behind data modeling. Ilyas explains how he and his team developed a surrogate model that predicts a trained model's behavior as a function of its training dataset. They found that this approach works surprisingly well, even for complex models, and can surface the importance of specific training examples. Ilyas also discusses connections between data modeling and related methods such as influence functions and Shapley values, highlighting the broader implications of data attribution in machine learning.
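The surrogate idea can be made concrete with a small, hypothetical numpy illustration in the spirit of linear datamodels (the toy model, data, and names here are invented for the example, not taken from the talk): train a trivial classifier on many random subsets of a training set, record its output on one test input, then fit a linear map from each subset's inclusion mask to that output. The fitted coefficients estimate how much each training example moves the prediction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D training set: two classes centered at -1 and +1.
X = np.concatenate([rng.normal(-1, 1, 10), rng.normal(1, 1, 10)])
y = np.array([0] * 10 + [1] * 10)
x_test = 0.3  # a test point near the decision boundary

def margin(mask):
    """'Train' a nearest-class-mean classifier on X[mask] and return its
    margin toward class 1 at x_test (positive => predicts class 1)."""
    m0 = X[mask & (y == 0)].mean() if (mask & (y == 0)).any() else -1.0
    m1 = X[mask & (y == 1)].mean() if (mask & (y == 1)).any() else 1.0
    return (x_test - m0) ** 2 - (x_test - m1) ** 2

# Sample random subsets and record the model's output on x_test for each.
trials = 2000
masks = rng.random((trials, len(X))) < 0.5
outs = np.array([margin(m) for m in masks])

# Fit the linear surrogate: output ~ weights . mask + bias.
A = np.column_stack([masks.astype(float), np.ones(trials)])
coef, *_ = np.linalg.lstsq(A, outs, rcond=None)
weights, bias = coef[:-1], coef[-1]

# Each weight estimates how much including that training example
# changes the model's margin on x_test.
top = np.argsort(-np.abs(weights))[:3]
print("most influential training indices:", top)
```

The retrained model here is deliberately trivial so the whole loop runs in milliseconds; the surprising finding Ilyas describes is that the same linear-in-the-mask surrogate remains predictive even when the retrained model is a deep network.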
Ilyas shares insights from his research on biases present in datasets, focusing on ImageNet. He explains how the data collection process itself can introduce biases that affect model performance and generalization. By studying how ImageNet was constructed, Ilyas and his collaborators uncovered a range of issues, including mislabeled images and ambiguous or overlapping classes, which can lead models to rely on non-robust features. This work underscores the need for careful data collection and analysis so that machine learning models are trained on representative, unbiased datasets.
Finally, Ilyas discusses his future plans as he prepares to start a professorship at Carnegie Mellon University. He expresses his interest in continuing research that advances the understanding of machine learning models and their behavior in production settings. Ilyas is keen on collaborating with domain experts in various fields, such as robotics and the sciences, to address practical challenges in machine learning. The video concludes with Ilyas emphasizing the importance of bridging the gap between academic research and industry applications to create more robust and reliable machine learning systems.