Data valuation for machine learning [PyCon DE & PyData Berlin 2024]

  • Published 2 Oct 2024
  • 🔊 Recorded at PyCon DE & PyData Berlin 2024, 22.04.2024
    2024.pycon.de/...
    🎓 Watch as Miguel de Benito Delgado and Kristof Schröder discuss the importance of data valuation techniques in machine learning and demonstrate how the open-source library pyDVL can help improve data quality and model performance.
    Speakers:
    Miguel de Benito Delgado, Kristof Schröder
    Description:
    In a talk about data valuation for machine learning, Miguel de Benito Delgado, an applied researcher at the appliedAI Initiative, and Kristof Schröder, of the TransferLab team at the appliedAI Institute, discussed how to determine the value of individual training data points for a machine learning model. Data-centric machine learning prioritizes improving data quality over improving the model, especially when data is scarce or costly. The speakers introduced data valuation as the process of assigning a value to each element of a training set according to its impact on the model's performance. Using the open-source library pyDVL, they showed how to identify mislabeled or out-of-distribution samples efficiently. The talk covered practical applications of data valuation in data engineering, model debugging, and model development, emphasizing the benefit of focusing on valuable data points while discarding less useful ones. The speakers also traced the evolution of data valuation methods and recent advances in the field, including strategies for repairing or pruning corrupt data and for optimizing data collection with techniques such as active learning. pyDVL, a library designed to provide robust implementations of data valuation methods, was presented as a tool for detecting problems in data pipelines and improving model performance in machine learning applications.
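    To make "assigning a value to each training point based on its impact on performance" concrete, here is a minimal from-scratch sketch of one classic valuation method. It is Data Shapley estimated by averaging marginal contributions over random permutations of the training set. This is an illustration under stated assumptions and is not necessarily the method demonstrated in the talk. It deliberately does not use pyDVL's API. The model (a 1-nearest-neighbour classifier), the utility (test accuracy, with 0.0 for an empty subset), the toy data, and the function names `one_nn_accuracy` and `monte_carlo_shapley` are all hypothetical choices made for this example.

```python
import numpy as np


def one_nn_accuracy(x_train, y_train, x_test, y_test):
    """Hypothetical utility U(S): test accuracy of a 1-nearest-neighbour
    classifier fit on subset S. Assumption: an empty subset scores 0.0."""
    if len(x_train) == 0:
        return 0.0
    # Euclidean distances between every test and training point: (n_test, n_train)
    dist = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=-1)
    pred = y_train[np.argmin(dist, axis=1)]
    return float(np.mean(pred == y_test))


def monte_carlo_shapley(x, y, x_test, y_test, n_permutations=1000, seed=0):
    """Estimate Data Shapley values by averaging each point's marginal
    contribution over uniformly random orderings of the training set."""
    rng = np.random.default_rng(seed)
    n = len(x)
    values = np.zeros(n)
    empty_utility = one_nn_accuracy(x[:0], y[:0], x_test, y_test)
    for _ in range(n_permutations):
        perm = rng.permutation(n)
        prev = empty_utility
        for k in range(n):
            subset = perm[: k + 1]
            u = one_nn_accuracy(x[subset], y[subset], x_test, y_test)
            # Marginal contribution of perm[k] given the points added before it
            values[perm[k]] += u - prev
            prev = u
    return values / n_permutations


# Toy 1-D data (hypothetical): point 4 sits among class 0 but is labelled 1.
x_train = np.array([[0.0], [0.2], [1.0], [1.2], [0.1]])
y_train = np.array([0, 0, 1, 1, 1])
x_test = np.array([[0.03], [0.13], [1.05]])
y_test = np.array([0, 0, 1])

values = monte_carlo_shapley(x_train, y_train, x_test, y_test, n_permutations=2000)
print(np.round(values, 3))
print("lowest-valued training point:", int(np.argmin(values)))
```

    On this toy set, the deliberately mislabeled point 4 should receive the lowest, negative estimate. A low or negative value is the kind of signal that data valuation uses to flag samples for repair or pruning. Plain permutation sampling needs many model fits. Practical libraries add refinements to reduce that cost, and this sketch omits them.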
    ⭐️ About PyCon DE & PyData Berlin:
    The PyCon DE & PyData conference unites the Python, AI, and data science communities, offering a unique platform for collaboration and innovation. The PyCon DE & PyData Berlin 2024 conference, hosted in partnership with the local Berlin PyData chapter, provided an exceptional experience, fostering deeper connections within the Python community while showcasing advancements in AI and data science. Attendees enjoyed a diverse and engaging program, solidifying the event as a highlight for Python and AI enthusiasts nationwide.
    Follow us:
    • LinkedIn: / 28908640
    • X: www.x.com/pyconde
    • X: www.x.com/pyda...
    Links:
    • Conference website: pycon.de
    • Related sessions: 2024.pycon.de/p...
    The conference is organized by
    • Python Softwareverband e.V.: pysv.org
    • NumFOCUS Inc.: numfocus.org
    • Pioneers Hub gemeinnützige GmbH: pioneershub.org
    If you enjoyed this session, please like, comment, and subscribe to our channel for more insightful talks and discussions.
    Share this video with your network to spread the knowledge!
    Hashtags:
    #Python #PyConDE #PyData #OpenSource #AI #DataScience #MachineLearning #SoftwareDevelopment #LLMs #Community
    Acknowledgements:
    Special thanks to all the volunteers and sponsors who made this event possible.
    About:
    Python Softwareverband e.V.:
    PySV is a non-profit that promotes the use and development of Python in Germany through events, education, and advocacy, fostering an open Python community.
    NumFOCUS Inc.:
    supports open-source scientific computing by providing financial and logistical support to key projects like NumPy and Jupyter, promoting sustainable development and collaboration.
    Pioneers Hub gemeinnützige GmbH:
    is a non-profit fostering innovation in AI and tech by connecting experts and promoting knowledge exchange through events and collaborative initiatives.
    www.pydata.org
    PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
  • Science & Technology