ASA Statistical Learning and Data Science
Weijie Su: How Statistics Can Advance Large Language Models: Fairness Alignment and Watermarking
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS)
December webinar: How Statistics Can Advance Large Language Models: Fairness Alignment and Watermarking
Record: December 3, 2024
Presenter: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Departments of Computer and Information Science and Mathematics at the University of Pennsylvania. He is a co-director of the Penn Research in Machine Learning Center. Prior to joining Penn, he received his Ph.D. in statistics from Stanford University in 2016 and his bachelor's degree in mathematics from Peking University in 2011. His research interests span the statistical foundations of AI, privacy-preserving machine learning, high-dimensional statistics, and optimization. He serves as an associate editor of the Journal of Machine Learning Research, Journal of the American Statistical Association, Foundations and Trends in Statistics, and Operations Research. His work has been recognized with several awards, such as the Stanford Anderson Dissertation Award, NSF CAREER Award, Sloan Research Fellowship, IMS Peter Hall Prize, SIAM Early Career Prize in Data Science, ASA Noether Early Career Award, and the ICBS Frontiers of Science Award in Mathematics.
Abstract: Large language models (LLMs) have rapidly emerged as a transformative innovation in machine learning. However, their increasing influence on human decision-making processes raises critical societal questions. In this talk, we will demonstrate how statistics can help address two key challenges: ensuring fairness for minority groups through alignment and combating misinformation through watermarking. First, we tackle the challenge of creating fair LLMs that equitably represent and serve diverse populations. We derive a regularization term that is both necessary and sufficient for aligning LLMs with human preferences, ensuring equitable outcomes across different demographics. Second, we introduce a general statistical framework to analyze the efficiency of watermarking schemes for LLMs. We develop optimal detection rules for an important watermarking scheme recently developed at OpenAI and empirically demonstrate its superiority over the existing detection method. Throughout the talk, we will showcase how statistical insights can not only address pressing challenges posed by LLMs but also unlock substantial opportunities for the field of statistics to drive responsible generative AI development. This talk is based on arXiv:2405.16455 and arXiv:2404.01245.
For more information about or to join ASA SLDS, visit
community.amstat.org/slds/home
www.amstat.org/
Views: 194

Videos

Nathaniel O’Connell: A Comparison of Methods of Cross-Validation for Small Data
110 views · 21 days ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) November webinar: A Comparison of Methods of Cross-Validation for Small Data - Practical Guidance for Prediction Model Development with limited sample sizes Record: November 19, 2024 Presenter: Dr. Nathaniel (Nate) O’Connell is an assistant professor in the Department of Biostatistics and Data Scienc...
Runze Li: High-Dimensional Statistical Inference
315 views · 1 month ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) October webinar: High-Dimensional Statistical Inference Record: October 29, 2024 Presenter: Runze Li is the Eberly Family Chair Professor in Statistics, The Pennsylvania State University. He served as Co-Editor of Annals of Statistics from 2013 to 2015. Runze Li is a Fellow of IMS, ASA and AAAS. His ...
Jing Lei: Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection
266 views · 2 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) September webinar: Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection Record: September 24, 2024 Presenter: Jing Lei is Professor of Statistics & Data Science at Carnegie Mellon University. He received his Bachelor of Science degree from the School of Mathematic...
Emmanuel Candès: Statistical methods for assessing the factual accuracy of large language models
1.8K views · 3 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) August webinar: Statistical methods for assessing the factual accuracy of large language models Record: August 29, 2024 Presenter: Emmanuel Candès is the Barnum-Simons Chair in Mathematics and Statistics, professor of electrical engineering (by courtesy), and a member of the Institute of Computationa...
Bikram Karmakar: A new paradigm for causal inference in the presence of unmeasured confounders
210 views · 4 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) July webinar: A new paradigm for causal inference in the presence of unmeasured confounders by calibrating a resistant population's variance Record: July 30, 2024 Presenter: Bikram Karmakar is an Assistant Professor in the Statistics Department at University of Florida. Prof Karmakar teaches advanced...
Bei Jiang: Online Local Differential Private Quantile Inference
86 views · 5 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) June webinar: Online Local Differential Private Quantile Inference Record: June 21, 2024 Presenter: Dr. Bei Jiang is an Associate Professor at the Department of Mathematical and Statistical Sciences of the University of Alberta, a Fellow and a Canada CIFAR AI chair affiliated with the Alberta Machine...
Andrej Risteski: The statistical cost of score-based losses
643 views · 6 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) May webinar: The statistical cost of score-based losses Record: May 30, 2024 Presenter: Andrej Risteski is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. Prior to that, he was a Norbert Wiener...
Bin Yu: Why Veridical Data Science? And How?
269 views · 8 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) March webinar: Why Veridical Data Science? And How? Record: April 4, 2023 Presenter: Bin Yu is Chancellor's Distinguished Professor and Class of 1936 Second Chair in Statistics, EECS, and Computational Biology at UC Berkeley. Her research focuses on the practice and theory of statistical machine lear...
Mladen Kolar: Adaptive Stochastic Optimization with Constraints
207 views · 9 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) February webinar: Adaptive Stochastic Optimization with Constraints Record: February 27, 2024 Presenter: Mladen Kolar is a professor in the Department of Data Sciences and Operations at the USC Marshall School of Business. Mladen earned his PhD in Machine Learning from Carnegie Mellon University in 2...
Andrew Gelman: Learning from mistakes
1.5K views · 10 months ago
Links mentioned in the talk: Election poll example: web.archive.org/web/20090326143823/www.fivethirtyeight.com/2009/03/how-did-white-people-vote.html Nudge example: statmodeling.stat.columbia.edu/2009/05/11/discussion_and/ statmodeling.stat.columbia.edu/2022/06/04/pizzagate-and-nudge-an-opportunity-lost/ This talk: statmodeling.stat.columbia.edu/2024/01/23/learning-from-mistakes-my-online-talk-...
Weijie Su: A Regression Approach to Peer Review in NeurIPS/ICML
317 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) December webinar: Can Statistics Save Machine Learning from a Crisis? A Regression Approach to Peer Review in NeurIPS/ICML Record: December 15, 2023 Presenter: Weijie Su is an Associate Professor at the University of Pennsylvania, with an appointment in the Wharton Statistics and Data Science Departm...
Glen Wright Colopy: The Pareto Principle in Data Science: Maximizing Value and Efficiency
389 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) November webinar: The Pareto Principle in Data Science: Maximizing Value and Efficiency Record: November 30, 2023 Presenter: Glen Wright Colopy is the Head of Data Science & Statistics at Wildfell, a startup specializing in custom software and data science solutions for the biotech and life science i...
Mengye Ren: Lifelong Learning in Structured Environments
259 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) October webinar: Lifelong Learning in Structured Environments Record: October 26, 2023 Presenter: Mengye Ren is an assistant professor of computer science and data science at New York University (NYU). Before joining NYU, he was a visiting faculty researcher at Google Brain Toronto working with Prof....
Krishna Balasubramanian: Optimization-based analysis of sampling algorithms
208 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) September webinar: Optimization-based analysis of sampling algorithms Record: September 27, 2023 Presenter: Krishna Balasubramanian is currently an Associate Professor in the Department of Statistics at the University of California, Davis, where he also holds affiliations with the Graduate Group in A...
Jiashun Jin: The statistics triangle
313 views · 1 year ago
Lucas Janson: Exact Conditional Independence Testing and Conformal Inference
696 views · 1 year ago
Hongtu Zhu: Statistical Learning Methods for Neuroimaging Data Analysis with Applications
369 views · 1 year ago
Rina Foygel Barber: Stability of black-box algorithms
555 views · 1 year ago
Qiqi Deng: A Brief introduction for drug development and how biostatistician can contribute
219 views · 1 year ago
Jason Klusowski: Pointwise Behavior of Recursive Partitioning
229 views · 1 year ago
Rui Song: On causal decision making
1.2K views · 1 year ago
George Michailidis: Statistical models for mixed frequency data in forecasting economic indicators
1.2K views · 1 year ago
Hui Zou: Sparse Convoluted Rank Regression in High Dimensions
382 views · 1 year ago
Barbara Day: How to conduct a successful job search and negotiate your best offer
104 views · 2 years ago
Ryan Tibshirani: Delphi's Epidata Project
199 views · 2 years ago
Anderson Ye Zhang: Spectral Clustering
155 views · 2 years ago
Linglong Kong: Exploration and Optimization in Deep Reinforcement Learning
138 views · 2 years ago
Xuan Bi: Data privacy
125 views · 2 years ago
Zhenyu Zhao: From Experimentation to Causal Learning
255 views · 2 years ago

COMMENTS

  • @Love_Kanavi 11 days ago

    Very inspirational!

  • @Love_Kanavi 1 month ago

    👍

  • @alxfgh 3 months ago

    Thanks for sharing!

  • @WaliBayaran 3 months ago

    Mayday Mayday. We are on harsh democracy situation in Indonesia. We need a hand. Do not miss it out message. Thank you Bro. ❤

  • @dangernoodle2868 8 months ago

    On the topic of being an asshole when giving criticism. It's like Gelman says, you're best positioned to try and see through the delivery to the content but on the other hand as someone giving feedback it's important to be clear so that the other party doesn't have to do that work. It means that it's on everybody to try and communicate clearly but also for us to acknowledge that the rough delivery comes not from bad people but from people who are feeling something which motivated them to say something in the first place but that can pollute the message. Being shocking is useful to catch people's attention but especially once the dialogue is going you need to cut it out ASAP. But if we need to rely on shock to cut through noise then ideally you don't rely on that but rather find a way to make the environment less noisy so that consensus is enough to make the right conversation happen.

  • @BikangPan 11 months ago

    Very Good Presentation!

  • @wesley3684 11 months ago

    😒 "Promo sm"

  • @gerryg6439 1 year ago

    Great talk!

  • @johndziak6865 1 year ago

    Thank you for this very helpful overview!