talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516 tagged

Activity Trend: peak of 68 per quarter, 2020-Q1 to 2026-Q1

Activities

1516 activities · Newest first

Conversation Simulator: A Real Life Case Leveraging OpenAI's API | Crisis Text Line

ABOUT THE TALK: While we will never replace human to human interaction for crisis intervention, there are plenty of opportunities to build intelligence with AI/ML models that crisis responders could greatly benefit from.

In this talk, Maddie Schults and Mateo Garcia introduce their conversation simulator, a tool they built on OpenAI's API. It lets them train crisis responders to support people in crisis using close-to-real-life situations, and it can help reduce anxiety for new crisis responders as they log on to the platform for the first time.

ABOUT THE SPEAKERS: Maddie Schults is the General Manager at Crisis Text Line. She is a product leader and technologist with over 20 years of experience envisioning, building and launching enterprise software products. At Crisis Text Line, Maddie is responsible for building the Global Product for crisis care intervention and its adoption globally in different countries and languages.

Mateo Garcia is Lead Data Scientist at Crisis Text Line, where he oversees all the Analytics & Data Science efforts. He is a data leader with 7+ years of industry experience scaling data teams from the ground up and building data products at different start-ups and consulting firms.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

What it Takes to Support the World's Most Popular Open Source Communities | NumFOCUS

ABOUT THE TALK: This talk walks you through the structure of NumFOCUS, the programs, challenges, and vision for a sustainable, inclusive, and vibrant open source community. This talk will deep dive on sustainability endeavors, including diversity and inclusion, and how you can get involved in the NumFOCUS community.

ABOUT THE SPEAKER: Dr. Katrina Riehl is President of the Board of Directors at NumFOCUS, Head of the Streamlit Data Team at Snowflake, and Adjunct Lecturer at Georgetown University. For almost two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data mining, and visualization. Most notably, she has helped lead data science efforts at the University of Texas at Austin Applied Research Laboratory, Apple, HomeAway (now Vrbo), and Cloudflare.

How Vercel Builds Dozens of Metrics from One Heterogenous Table

ABOUT THE TALK: This talk discusses how Vercel leverages dozens of metrics created from one heterogenous table to drive business, technical, product, and operations decisions across the company. Vercel's approach has empowered technical and non-technical stakeholders to jump into their analytical discovery from the metrics table with more frequent iterations and less involvement from the data team.

Centralizing data and metadata used in creating Vercel's many metrics has increased the number of stakeholders that can participate in analytics, decreased the time needed to troubleshoot outlier events, and removed the data team as a dependency for all data-related tasks.

ABOUT THE SPEAKER: Thomas Mickley-Doyle leads analytics and data science initiatives at Vercel, scaling insights across engineering, product, and design. He focuses on making data modeling, analytics, and decision-making more accessible for all users.

Hot Takes and Tragic Mistakes: How (not) to Integrate Data People in Your App Dev Team Workflows

ABOUT THE TALK: Everyone wants to create new products with AI/ML inside, but you need to integrate your data scientists and data engineers into traditional development teams to do that. But what exactly do they do, and where in the process do they fit? Does their work entirely fall under software engineering, product, or something else? Are you even ready for AI/ML? Has anyone figured this out?

Data-scientist-turned-product-person Noelle Saldana shares her observations and opinions on how companies should (and shouldn't) use their data people, along with her hot takes and tragic mistakes, to help you do it the right way the first time.

ABOUT THE SPEAKER: Noelle Saldana has fifteen years of Data Science experience and is passionate about the value data brings to both products and decision-making. She has led Data Science initiatives at companies across multiple industry verticals, ranging from early startups to Fortune 500 enterprises. Her recent focus has been the intersection of product and data strategy; instrumenting data and eliminating data technical debt to enable robust Data Science and Product Analytics downstream.

A Deep Dive into the dbt Manifest | Squarespace

ABOUT THE TALK: Ever noticed the manifest.json file that dbt puts into your target folder? This little file contains rich information about your dbt project that enables numerous fun use cases! These include complex deployment configurations, quality enforcement, and streamlined development workflows. This talk will go over what the manifest is and how it is produced, along with case studies of how the manifest is used across the community and in Squarespace’s data pipelines.
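As a rough illustration of the kind of information the manifest holds, the sketch below lists each model and the nodes it depends on. It is not taken from the talk. The sample data is a hypothetical, heavily trimmed stand-in for `target/manifest.json` (the project name `shop` and all node names are invented), and it assumes only the top-level `nodes` mapping with `resource_type` and `depends_on.nodes` on each entry.

```python
import json

# Hypothetical, heavily trimmed stand-in for target/manifest.json.
# A real manifest carries many more keys per node.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "name": "stg_orders",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.orders": {
      "resource_type": "model",
      "name": "orders",
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    },
    "test.shop.not_null_orders_id": {
      "resource_type": "test",
      "name": "not_null_orders_id",
      "depends_on": {"nodes": ["model.shop.orders"]}
    }
  }
}
""")


def model_parents(manifest: dict) -> dict[str, list[str]]:
    """Map each model's unique_id to the unique_ids it depends on."""
    return {
        unique_id: node.get("depends_on", {}).get("nodes", [])
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"  # skip tests, seeds, etc.
    }


parents = model_parents(SAMPLE_MANIFEST)
for unique_id, deps in sorted(parents.items()):
    print(unique_id, "<-", deps)
```

Against a real project, the same function could be fed `json.load(open("target/manifest.json"))` in place of the sample; a check such as "every model has at least one test" could be built on the same traversal.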

ABOUT THE SPEAKER: Aaron Richter is a software developer with a passion for all things data. His work involves making sure data is clean and accessible, and that the tools to access it are at peak performance. Aaron is currently a data engineer at Squarespace, where he supports the company’s analytics platform. Previously, he built the data warehouse at Modernizing Medicine, and worked as a data science advocate at Saturn Cloud.

When to Move from Batch to Streaming and how to do it without hiring an entirely new team | Bytewax

ABOUT THE TALK: The growing demand for data pipelines and applications to go real-time can be overwhelming. This talk demystifies the when, why, and how of moving from batch processing to real-time/stream processing. We will look at arguments for and against stream processing, common architectures, common pitfalls, and open source tools used.

ABOUT THE SPEAKER: Zander Matheson is the founder and CEO of Bytewax, an open source software company focused on enabling more developers to work with streaming data. Before starting Bytewax he worked on data infrastructure and data science at GitHub and Heroku.

How to Interpret & Explain Your Black Box Models | Anaconda

ABOUT THE TALK: There has been increasing interest in machine learning model interpretability and explainability. Researchers and ML practitioners have designed many explanation techniques and tools, such as explainable boosting machines, visual analytics, distillation, prototypes, saliency maps, counterfactuals, feature visualization, LIME, SHAP, InterpretML, and TCAV. In this talk, Sophia Yang provides a high-level overview of the popular model explanation techniques.

ABOUT THE SPEAKER: Sophia Yang is a Senior Data Scientist and a Developer Advocate at Anaconda. She is passionate about the data science community and the Python open-source community. She is the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce.

Extinguishing the Garbage Fire of ML Testing | Mailchimp

ABOUT THE TALK:
Our traditional testing and CI methods for Data Science are not working, but we can't just give up on providing guardrails.

As engineers, how do you solve ML testing?

In this talk, Emily Curtin discusses:
- abstracting, decoupling, and separating concerns
- keeping pytest only where it belongs
- substituting testing for observability in appropriate places
- applying data reliability practices and thereby solving some problems at the source
- honoring Data Scientists' mental models and ways of working

ABOUT THE SPEAKER: Emily Curtin is a Staff MLOps Engineer at Intuit Mailchimp. She leads a crazy good team focused on helping Data Scientists do higher quality work faster and more intuitively.

Writing Unit Tests for Data Science Code | Microsoft

ABOUT THE TALK: In Data Science, the small piece of code that you want to test often also needs to take in data, train a model, or evaluate a model, and each of these steps is complicated and consists of many smaller units.

Learn Dr. Nile Wilson's Software Engineering best practices for testing Data Science code, along with some common scenarios, such as mocking calls or mocking data.
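To make "mocking calls" and "mocking data" concrete, here is a minimal sketch using the standard library's `unittest.mock`. It is not from the talk: `mean_score`, the `client` object, and its `predict` method are hypothetical names chosen for illustration. The idea is that the unit under test is the averaging logic, so the remote call is replaced by a mock that returns a small, hand-built list of scores.

```python
from unittest.mock import Mock


def mean_score(client, model_id: str) -> float:
    """Fetch predictions from a (hypothetical) remote service and average them."""
    scores = client.predict(model_id)
    if not scores:
        raise ValueError("no predictions returned")
    return sum(scores) / len(scores)


def test_mean_score_uses_mocked_call():
    # Mock the call: no network, no real model.
    client = Mock()
    # Mock the data: a tiny, hand-built set of scores.
    client.predict.return_value = [0.25, 0.75]
    assert mean_score(client, "demo") == 0.5
    client.predict.assert_called_once_with("demo")


def test_mean_score_rejects_empty_predictions():
    client = Mock()
    client.predict.return_value = []
    try:
        mean_score(client, "demo")
    except ValueError:
        return  # expected path
    raise AssertionError("expected ValueError for empty predictions")
```

The test functions follow the `test_` naming that pytest collects by default, but they only rely on plain `assert`, so they can also be called directly.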

ABOUT THE SPEAKER: Dr. Nile Wilson is a Data Scientist 2 in Industry Solutions Engineering at Microsoft, focused on developing and implementing Machine Learning solutions for enterprise customers. She has worked with interdisciplinary teams across various industries to develop production-ready data science solutions to drive business impact.

Scaling Experimentation to 20 Billion Users | Statsig

ABOUT THE TALK:

Statsig is a product observability platform that helps product teams move faster and make better decisions. Companies like Notion, Flipkart, Eventbrite, Ancestry, and Univision use it to release features, run experiments and measure impact.

After only two years, Statsig supports thousands of experiments across billions of users (unique company-specific userIDs). In this session, you will hear lessons learned from the company's growth.

ABOUT THE SPEAKER: Timothy Chan is an experienced data science professional, currently serving as the Data Science Lead at Statsig. Before joining Statsig, Timothy spent almost 5 years as a Data Scientist at Facebook (now Meta), where he was involved in projects across Facebook App and Reality Labs. His background includes working in biotech, researching treatments for diseases such as Alzheimer’s, Multiple Sclerosis, Lupus, and Cancer.

With the advances in AI products and the explosion of ChatGPT in recent months, it is becoming easier to imagine a world where AI and humans work seamlessly together, revolutionizing how we solve complex problems and transform our daily lives. This is especially the case for data professionals. In this episode of our AI series, we speak to Sarah Schlobohm, Head of AI at Kubrick Group. Dr. Schlobohm leads the training of the next generation of machine learning engineers. With a background in finance and consulting, Sarah has a deep understanding of the intersection between business strategy, data science, and AI. Prior to her work in finance, Sarah became a chartered accountant, where she honed her skills in financial analysis and strategy. Sarah worked for one of the world's largest banks, where she used data science to fight financial crime, making significant contributions to the industry's efforts to combat money laundering and other illicit activities. Sarah shares her extensive knowledge on incorporating AI within data teams for maximum impact, covering a wide array of AI-related topics, including upskilling, productivity, and communication, to help data professionals understand how to integrate generative AI effectively in their daily work. Throughout the episode, Sarah explores the challenges and risks of AI integration, touching on the balance between privacy and utility. She highlights the risks data teams can avoid when using AI products and how to approach using AI products the right way. She also covers how different roles within a data team might make use of generative AI, as well as how it might affect coding ability going forward. Sarah also shares use cases for those in non-data teams, such as marketing, while also highlighting what to consider when using outputs from GPT models. Sarah shares the impact chatbots might have on education, calling attention to the power of AI tutors in schools.
Sarah encourages people to start using AI now, considering the barrier to entry is so low, and how that might not be the case going forward. From automating mundane tasks to enabling human-AI collaboration that makes work more enjoyable, Sarah underscores the transformative power of AI in shaping the future of humanity. Whether you're an AI enthusiast, data professional, or someone with an interest in either, this episode will provide you with a deeper understanding of the practical aspects of AI implementation.

Data Science for Civil Engineering

This book explains use of data science-based techniques for modelling and providing optimal solutions to complex problems in civil engineering. It deals with the basics of data science and essential mathematics and covers pertinent applications in structural and environmental engineering, construction management, and transportation.

Job Ready SQL

Learn the most important SQL skills and apply them in your job, quickly and efficiently! SQL (Structured Query Language) is the modern language that almost every relational database system supports for adding data, retrieving data, and modifying data in a database. Although basic visual tools are available to help end-users input common commands, data scientists, business intelligence analysts, Cloud engineers, Machine Learning programmers, and other professionals routinely need to query a database using SQL. Job Ready SQL provides you with the foundational skills necessary to work with data of any kind. Offering a straightforward ‘learn-by-doing’ approach, this concise and highly practical guide teaches you all the basics of SQL so you can apply your knowledge in real-world environments immediately. Throughout the book, each lesson includes clear explanations of key concepts and hands-on exercises that mirror real-world SQL tasks.

- Teaches the basics of SQL database creation and management using easy-to-understand language
- Helps readers develop an understanding of fundamental concepts and more advanced applications such as data engineering and data science
- Discusses the key types of SQL commands, including Data Definition Language (DDL) commands and Data Manipulation Language (DML) commands
- Includes useful reference information on querying SQL-based databases

Job Ready SQL is a must-have resource for students and working professionals looking to quickly get up to speed with SQL and take their relational database skills to the next level.

Ten years ago, Salesforce was trying to generate $1Bn of revenue in a quarter. Today, they generate over $30Bn of revenue in a year. Simultaneously, over the last decade we have seen huge advances in the world of data and data science. In this episode, Laura Gent Felker, Director of Data Insights and Scalability at Salesforce, talks about her experience in building and leading data teams within the organization over the last ten years. Laura shares her insights on how to create a learning culture within a team, how to prioritize projects while accounting for long-term strategy, and the importance of setting aside time for innovation. Laura also discusses how to ensure that the projects the team works on genuinely provide business value. She suggests creating a two-way street with executive leadership and understanding the collective value across a variety of stakeholders. She also notes that some of the best innovation she has seen from her team came when they had to solve high-priority short-term business problems.

In addition, Laura shares a multi-layered approach to building a learning community within a data team. She explains that a culture of collaboration and trust is important in the direct data team, and the wider community within organizations. 

Laura also talks about the frameworks and mental models that can help develop business acumen. She highlights the importance of dedicating time to this area and being able to communicate insights effectively.

Throughout the episode, Laura's insights provide valuable guidance for both junior and experienced data professionals, consumers and leaders in creating a learning culture, prioritizing projects, and building a strong data community within organizations.

Practical Data Privacy

Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems. Practical Data Privacy answers important questions such as:

- What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases?
- What does "anonymized data" really mean?
- How do I actually anonymize data?
- How do federated learning and analysis work?
- Homomorphic encryption sounds great, but is it ready for use?
- How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help?
- How do I ensure that my data science projects are secure by default and private by design?
- How do I work with governance and infosec teams to implement internal policies appropriately?

Data literacy is becoming increasingly recognized as a valuable skill in today's workforce. We all interact with data on a daily basis, and organizations are now realizing the tremendous benefits of having a workforce that is well-versed in data, from interacting with dashboards to data analysis and data science. But it all starts with data literacy. In this episode, we speak with Valerie Logan, CEO and Founder of The Data Lodge. Valerie is committed to data literacy; she believes that in today's digital society, data literacy is a life skill. With advisory services, bootcamps, a resource library and community services at The Data Lodge, Valerie is certifying the world’s first Data Literacy Program Leads and pioneering the path forward in cracking the data culture code. Valerie is also known for helping popularize the term "Data Literacy." In this episode, she shares insights on what a successful data literacy journey looks like, best practices for evangelizing data literacy programs, how to avoid siloed efforts between departments and much more. Valerie sheds light on the difficulties organizations face when trying to prioritize data literacy and data culture. She suggests that this is because humans are still at the center of organizations, and changing their behaviour is a challenge. She also talks about what data literacy means, and how the definition adapts to use cases. Valerie offers guidance on how to secure executive buy-in for data upskilling programs, explaining that finding a sponsor for the program is the first step. She also talks about the importance of extending buy-in to people who are less directly involved with data and upskilling, emphasizing how the program will help strategic objectives.

Valerie also provides insights on the hallmarks of an effective pilot program for data literacy, suggesting that organizations go where there's already interest and that a good pilot is one where before and after effects can be measured. She also shares tips on how organizations can ensure that their data literacy program helps them achieve their strategic business goals.

Throughout the episode, Valerie outlines the benefits and scope data literacy can have for an organization, with one of the most pertinent pieces of wisdom being a warning to organisations about the risk of ignoring upskilling and investment in data.

Links mentioned in the show:
- RADAR 2023: Building an Enterprise Data Strategy that Puts People First
- The Data Lodge
- The State of Data Literacy in 2023
- What is Data Maturity and Why Does it Matter?

Jupyter Notebooks have been a widely popular tool for data science in recent years due to their ability to combine code, text, and visualizations in a single document.

Despite the tool's popularity, the core functionality and user experience of the Classic Jupyter Notebook interface have remained largely unchanged over the past years.

Recently, the Jupyter Notebook project decided to base its next major version, version 7, on JupyterLab components and extensions, which means many JupyterLab features are also available to Jupyter Notebook users.

In this presentation, we will demo the new features coming in Jupyter Notebook version 7 and how they are relevant to existing users of the Classic Notebook.

Jupyter notebooks are a popular tool for data science and scientific computing, allowing users to mix code, text, and multimedia in a single document. However, sharing Jupyter notebooks can be challenging, as they require installing a specific software environment to be viewed and executed.

JupyterLite is a Jupyter distribution that runs entirely in the web browser without any server components. A significant benefit of this approach is the ease of deployment. With JupyterLite, the only requirement to provide a live computing environment is a collection of static assets. In this talk, we will show how you can create such a static website and deploy it to your users.

Doing data science in international development often means finding the right-sized solution in resource-constrained settings.

This talk walks you through how my team helped answer thousands of questions from pregnant folks and new parents on a South African maternal and child health helpline, which model we ended up choosing and why (hint: resource constraints!), and how we've packaged everything into a service that anyone can start for themselves.

By the end of the talk, I hope you'll know how to start your own FAQ-answering service and learn about one example of doing data science in international development.