talk-data.com

Event

Data Council 2023

2026-01-10 YouTube Visit website ↗

Sessions & talks

Automatically Fix Data Issues & Label Errors in Most ML Datasets | Cleanlab

2023-05-11
Curtis Northcutt (Cleanlab)

ABOUT THE TALK: In this talk, we discuss cleanlab open-source (github.com/cleanlab/cleanlab) and Cleanlab Studio (https://cleanlab.ai/studio). Cleanlab open-source is a fast-growing Python framework for data-centric AI that automatically detects issues in ML datasets. Cleanlab Studio is a no-code web interface used by universities and Fortune 500 companies to detect and fix dataset issues. Cleanlab's algorithms come with theoretical support for improved accuracy on real-world, messy data.
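
To make the idea of automatically detecting label errors concrete, here is a minimal sketch in plain Python. It is a simplified illustration in the spirit of confident learning, not cleanlab's actual API or implementation: the function name `find_likely_label_issues`, the thresholding rule, and the toy data are all assumptions made for this example. Each class gets a threshold equal to the model's average predicted probability over examples carrying that label, and an example is flagged when the model is confident about a class other than the given one.

```python
from typing import List


def find_likely_label_issues(labels: List[int], pred_probs: List[List[float]]) -> List[int]:
    """Return indices of examples whose given label looks suspicious.

    Hypothetical helper for illustration only; not part of the cleanlab library.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples that are currently labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (probs, given) in enumerate(zip(pred_probs, labels)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(n_classes) if probs[k] >= thresholds[k]]
        if not confident:
            continue
        likely = max(confident, key=lambda k: probs[k])
        if likely != given:
            issues.append(i)
    return issues


# Toy data (made up): out-of-sample predicted probabilities for 6 examples.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly predicts class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspect_indices = find_likely_label_issues(labels, pred_probs)
print(suspect_indices)
```

In this toy run only the third example (index 2) is flagged. In practice the predicted probabilities should come from a model evaluated out of sample, for example via cross-validation, so that the model has not simply memorized the noisy labels.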

ABOUT THE SPEAKER: Curtis Northcutt is an American computer scientist and entrepreneur focusing on machine learning and AI to empower people. He is the CEO and co-founder of Cleanlab, an AI software company that improves machine learning model performance by automatically fixing data and label issues in real-world, messy datasets. Curtis completed his PhD at MIT where he invented Cleanlab’s algorithms for automatically finding and fixing label issues in any dataset.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Hierarchical Forecasting in Python | Nixtla

2023-05-11
Max Mergenthaler (Nixtla)

A vast number of time series datasets are organized into structures with different levels or hierarchies of aggregation.

In this talk, we introduce the open-source Hierarchical Forecast library, which contains different reconciliation algorithms, preprocessed datasets, evaluation metrics, and a compiled set of statistical baseline models. This Python-based framework aims to bridge the gap between statistical modeling and Machine Learning in the time series field.
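
As a rough illustration of what a reconciliation algorithm does, the sketch below implements bottom-up reconciliation, one of the simplest such methods, in plain Python. It does not use the library's API: the function `bottom_up`, the two-region hierarchy, and the forecast numbers are hypothetical and chosen only for this example. Independent base forecasts for a total and its two regions usually do not add up; bottom-up reconciliation keeps the bottom-level forecasts and rebuilds every aggregate from them using a summing matrix.

```python
from typing import List


def bottom_up(summing_matrix: List[List[int]], bottom_forecasts: List[float]) -> List[float]:
    """Rebuild coherent forecasts for every level from the bottom-level forecasts.

    Each row of the summing matrix says which bottom-level series
    add up to the corresponding series in the hierarchy.
    """
    return [
        sum(weight * forecast for weight, forecast in zip(row, bottom_forecasts))
        for row in summing_matrix
    ]


# Hypothetical hierarchy: total = north + south.
series_names = ["total", "north", "south"]
summing_matrix = [
    [1, 1],  # total
    [1, 0],  # north
    [0, 1],  # south
]

# Made-up base forecasts produced independently per series.
# They are incoherent: 40 + 60 != 105.
base_forecasts = {"total": 105.0, "north": 40.0, "south": 60.0}

bottom = [base_forecasts["north"], base_forecasts["south"]]
coherent = bottom_up(summing_matrix, bottom)

for name, value in zip(series_names, coherent):
    print(name, value)
```

After reconciliation the total is 100.0, which equals the sum of the regional forecasts. Other reconciliation methods also draw on the upper-level base forecasts rather than discarding them; bottom-up is shown here only because it is the easiest to follow.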

ABOUT THE SPEAKER: Max Mergenthaler is the CEO and Co-Founder of Nixtla, a time-series research and deployment startup. He is also a seasoned entrepreneur with a proven track record as the founder of multiple technology startups. With a decade of experience in the ML industry, he has extensive expertise in building and leading international data teams. Max has also made notable contributions to the Data Science field through his co-authorship of papers on forecasting algorithms and decision theory.

HuggingFace + Ray AIR Integration: A Python Developer’s Guide to Scaling Transformers | AnyScale

2023-05-11
Antoni Baum (Anyscale), Jules S. Damji (Anyscale Inc)

ABOUT THE TALK: Hugging Face Transformers is a popular open-source project that provides cutting-edge Machine Learning (ML) models. Meeting the computational requirements of its more advanced models, however, often means scaling beyond a single machine. This session explores the integration between Hugging Face and Ray AI Runtime (AIR), which allows users to scale their model training and data loading seamlessly. We will dive deep into the implementation and API and explore how we can use Ray AIR to create an end-to-end Hugging Face workflow, from data ingest through fine-tuning and HPO to inference and serving.

ABOUT THE SPEAKERS: Jules S. Damji is a lead developer advocate at Anyscale Inc, an MLflow contributor, and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems.

Antoni Baum is a software engineer at Anyscale, working on Ray Tune, XGBoost-Ray, Ray AIR, and other ML libraries. In his spare time, he contributes to various open source projects, trying to make machine learning more accessible and approachable.

How to Interpret & Explain Your Black Box Models | Anaconda

2023-05-11
Sophia Yang (Anaconda)

ABOUT THE TALK: There has been increasing interest in machine learning model interpretability and explainability. Researchers and ML practitioners have designed many explanation techniques and tools, such as explainable boosting machines, visual analytics, distillation, prototypes, saliency maps, counterfactuals, feature visualization, LIME, SHAP, InterpretML, and TCAV. In this talk, Sophia Yang provides a high-level overview of the popular model explanation techniques.
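
To give a flavor of one technique on this list, the sketch below computes exact Shapley values, the game-theoretic quantity that SHAP builds on, for a tiny toy model in plain Python. It is not the `shap` library and not necessarily how the talk presents the method: the function `shapley_values`, the `toy_model`, the input, and the all-zeros baseline are assumptions made for this example. "Removing" a feature is modeled here by replacing it with its baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Set


def shapley_values(
    model: Callable[[List[float]], float],
    x: List[float],
    baseline: List[float],
) -> List[float]:
    """Exact Shapley value of each feature for a single prediction.

    Enumerates every feature subset, so it is only practical for a
    handful of features.
    """
    n = len(x)

    def value(present: Set[int]) -> float:
        # Features in `present` take their real value; the rest use the baseline.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z: List[float]) -> float:
    # Hypothetical black box: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
print(phis)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline. The additive feature receives its full contribution, while the two interacting features split the interaction term equally.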

ABOUT THE SPEAKER: Sophia Yang is a Senior Data Scientist and a Developer Advocate at Anaconda. She is passionate about the data science community and the Python open-source community. She is the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce.

Publishing Jupyter Notebooks with Quarto | RStudio

2023-05-11
J.J. Allaire (RStudio)

ABOUT THE TALK: Quarto is a multi-language, open-source toolkit for creating data-driven websites, reports, presentations, and scientific articles, built on Jupyter.

This talk teaches you how to use Quarto to publish Jupyter notebooks as production-quality websites, books, blogs, presentations, PDFs, Office documents, and more. It covers how to publish notebooks within existing content management systems like Hugo, Docusaurus, and Confluence, and also explores how Quarto works under the hood and how the system can be extended to accommodate unique requirements and workflows.

ABOUT THE SPEAKER: J.J. Allaire is the founder of RStudio and the creator of the RStudio IDE. He is an author of several packages in the R Markdown publishing ecosystem and has also worked extensively on the R interfaces to Python and TensorFlow. J.J. is now leading the Quarto project, which is a new Jupyter-based scientific and technical publishing system.
