
Topic: Data Science

Tags: machine_learning, statistics, analytics

1516 activities tagged

Activity Trend: 68 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1516 activities · Newest first

Summary

The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products as the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: it’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Your host is Tobias Macey and today I'm interviewing Ranjith Raghunath about tactical elements of a data product strategy

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what is encompassed by the idea of a data product strategy?

Which roles in an organization need to be involved in the planning and implementation of that strategy?

Order of operations:

strategy -> platform design -> implementation/adoption
platform implementation -> product strategy -> interface development

Managing the grain of data in products
Team organization to support product development/deployment
Customer communications - what questions to ask? Requirements gathering, helping to understand "the art of the possible"
What are the most interesting, innovative, or unexpected ways that you have seen organizations approach data product strategies?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data product strategies?

How do #DataScience teams answer the questions of business teams? How do #DataScientists determine the right questions to solve for? How do you convince business people to be open to a broader answer than they specifically asked for? Marcello Molinaro helps answer these questions from his unique perspective as a #DataAnalyst for Mozart Data on this latest episode of #DataUnchained!

#data #datascience #technology #MozartData

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Summary

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: it’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Decodable is and the story behind it?

What are the notable changes to the Decodable platform since we last spoke? (October 2021)
What are the industry shifts that have influenced the product direction?

What are the problems that customers are trying to solve when they come to Decodable?
When you launched, your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?
What are the developer experience challenges that are particular to working with streaming data?

How have you worked to address that in the Decodable platform and interfaces?

As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
When is Decodable the wrong choice?
What do you have planned for the future of Decodable?

Contact Info

esammer on GitHub
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Decodable

Podcast Episode

Understanding the Apache Flink Journey Flink

Podcast Episode

Debezium

Podcast Episode

Kafka Redpanda

Podcast Episode

Kinesis PostgreSQL

Podcast Episode

Snowflake

Podcast Episode

Databricks Startree Pinot

Podcast Episode

Rockset

Podcast Episode

Druid InfluxDB Samza Storm Pulsar

Podcast Episode

ksqlDB

Podcast Episode

dbt GitHub Actions Airbyte Singer Splunk Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Neo4j: NODES Conference

NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks:

- Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript
- Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g., graph neural networks, responsible AI)
- Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)

Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!

RudderStack: RudderStack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize: Materialize

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Datafold: Datafold

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…

Summary

The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business around making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

As more people start using AI for projects, two things are clear: it’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Max Cho about the wild world of insurance companies and the challenges of collecting quality data for this opaque industry

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what CoverageCat is and the story behind it?
What are the different sources of data that you work with?

What are the most challenging aspects of collecting that data?
Can you describe the formats and characteristics (3 Vs) of that data?

What are some of the ways that the operational model of insurance companies has contributed to its opacity as an industry from a data perspective?
Can you describe how you have architected your data platform?

How have the design and goals changed since you first started working on it?
What are you optimizing for in your selection and implementation process?

What are the sharp edges/weak points that you worry about in your existing data flows?

How do you guard against those flaws in your day-to-day operations?

What are the

Data Analysts are very passionate about their jobs and are able to work in a variety of different industries, but just how is that possible? Are #DataAnalysts solving the same problems in every industry? Or are Data Analysts solving different problems in different industries? Find out the answer to all these questions and more as Kumarika Sau, Data Analyst for Oxford Nanopore Technologies, joins us on this #podcast episode of Data Unchained!

#data #datascience #technology #industry #oxford

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Hands-On Web Scraping with Python - Second Edition

In "Hands-On Web Scraping with Python," you'll learn how to harness the power of Python libraries to extract, process, and analyze data from the web. This book provides a practical, step-by-step guide for beginners and data enthusiasts alike. What this Book will help me do Master the use of Python libraries like requests, lxml, Scrapy, and Beautiful Soup for web scraping. Develop advanced techniques for secure browsing and data extraction using APIs and Selenium. Understand the principles behind regex and PDF data parsing for comprehensive scraping. Analyze and visualize data using data science tools such as Pandas and Plotly. Build a portfolio of real-world scraping projects to demonstrate your capabilities. Author(s) Anish Chapagain, the author of "Hands-On Web Scraping with Python," is an experienced programmer and instructor who specializes in Python and data-related technologies. With his vast experience in teaching individuals from diverse backgrounds, Anish approaches complex concepts with clarity and a hands-on methodology. Who is it for? This book is perfect for aspiring data scientists, Python beginners, and anyone who wants to delve into web scraping. Readers should have a basic understanding of how websites work but no prior coding experience is required. If you aim to develop scraping skills and understand data analysis, this book is the ideal starting point.

Podcast episode
by Ken Ono, August Lamb, Will Tenpas, and Kate Douglass (University of Virginia (UVA))

This episode is a collaboration between UVA Data Points and Hoos in STEM.

This episode of UVA Data Points features Ken Ono discussing the growth of data science at UVA and its increasing importance in various disciplines, including how he uses it to help swimmers improve performance. Ono is a professor of mathematics and STEM advisor to the provost, as well as a professor of data science by courtesy. He recently supported the women's team at the U.S. Olympic Trials in Japan.

Ono speaks with three UVA swimmers who are pursuing graduate degrees in data science and statistics while also performing as student-athletes: August Lamb, Kate Douglass, and Will Tenpas. They discuss student life, balancing academics with swimming, and how data science and mathematics are helping them win championships.

Summary

Artificial intelligence applications require substantial amounts of high-quality data, which is provided through ETL pipelines. Now that AI has reached the level of sophistication seen in the various generative models, it is being used to build new ETL workflows. In this episode Jay Mishra shares his experiences and insights from building ETL pipelines with the help of generative AI.
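As a rough illustration of the idea (not the specific workflow discussed in the episode), one common pattern is to hand a generative model the source schema and a plain-language description of the desired transformation and let it draft the pipeline code for human review. The sketch below assumes the openai Python package, an OPENAI_API_KEY in the environment, and a hypothetical model name and schema.

# Hedged sketch: ask a generative model to draft an ETL transform.
# The schema, task, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema = "orders(order_id INT, amount_cents INT, placed_at TEXT)"
task = (
    "Write a Python function transform(rows) for records with schema "
    f"{schema}: convert amount_cents to dollars as a float, parse "
    "placed_at as an ISO-8601 timestamp, and drop rows with negative "
    "amounts. Return only the code."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[{"role": "user", "content": task}],
)

# The generated code is a draft: review and test it before it runs in
# production, since model output can contain exactly the error types
# the episode discusses.
print(response.choices[0].message.content)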

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: it’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Jay Mishra about the applications for generative AI in the ETL process

Interview

Introduction
How did you get involved in the area of data management?
What are the different aspects/types of ETL that you are seeing generative AI applied to?

What kind of impact are you seeing in terms of time spent/quality of output/etc.?

What kinds of projects are most likely to benefit from the application of generative AI?
Can you describe what a typical workflow of using AI to build ETL workflows looks like?

What are some of the types of errors that you are likely to experience from the AI?
Once the pipeline is defined, what does the ongoing maintenance look like?
Is the AI required to operate within the pipeline in perpetuity?

For individuals/teams/organizations who are experimenting with AI in their data engineering workflows, what are the concerns/questions that they are trying to address?
What are the most interesting, innovative, or unexpected w

With Grupo Boticário, we have already explored topics ranging from what it's like to work with data to how they use the Modern Data Stack. Now we want to know how AI is changing the way one of the most admired companies in Latin America, per the State of Data Brazil survey, works.

In this episode of Data Hackers — the largest AI and Data Science community in Brazil — meet this team of specialists: Isabella Becker, DPO (Data Protection Officer), and Bruno Gobbet, Senior Data Manager, both working in Grupo Boticário's data organization.

Remember that you can find all of the Data Hackers community podcasts on Spotify, iTunes, Google Podcasts, Castbox, and many other platforms. If you prefer, you can also listen to the episode right here in this post!

Link on Medium: https://medium.com/data-hackers/como-ia-est%C3%A1-mudando-a-forma-do-grupo-botic%C3%A1rio-trabalhar-data-hackers-podcast-74-c45006b64d67

Covered in the episode

Meet our guests:

Isabella Becker — DPO (Data Protection Officer)
Bruno Gobbet — Senior Data Manager

Data Hackers panel:

Paulo Vasconcellos
Monique Femme

Reference links:

GB TECH (Medium): https://medium.com/gbtech
Data Hackers News (weekly news about data, AI, and technology) — https://podcasters.spotify.com/pod/show/datahackers/episodes/Data-Hackers-News-1---Amazon-investe-US-4-bi-na-Anthropic--Microsoft-anuncia-Copilot-para-Windows-11--OpenAI-anuncia-DALL-E-3-e29r06f
Netflix series Coded Bias: https://www.netflix.com/br/title/81328723
Book (Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy): https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815

Streamlit for Data Science - Second Edition

Streamlit for Data Science is your complete guide to mastering the creation of powerful, interactive data-driven applications using Python and Streamlit. With this comprehensive resource, you'll learn everything from foundational Streamlit skills to advanced techniques like integrating machine learning models and deploying apps to cloud platforms, enabling you to significantly enhance your data science toolkit.

What this book will help me do:
Master building interactive applications using Streamlit, including techniques for user interfaces and integrations.
Develop visually appealing and functional data visualizations using Python libraries in Streamlit.
Learn to integrate Streamlit applications with machine learning frameworks and tools like Hugging Face and OpenAI.
Understand and apply best practices to deploy Streamlit apps to cloud platforms such as Streamlit Community Cloud and Heroku.
Improve practical Python skills through implementing end-to-end data applications and prototyping data workflows.

Author(s): Tyler Richards, the author of Streamlit for Data Science, is a senior data scientist with in-depth practical experience in building data-driven applications. With a passion for Python and data visualization, Tyler leverages his knowledge to help data professionals craft effective and compelling tools. His teaching approach combines clarity, hands-on exercises, and practical relevance.

Who is it for? This book is written for data scientists, engineers, and enthusiasts who use Python and want to create dynamic data-driven applications. With a focus on those who have some familiarity with Python and libraries like Pandas or NumPy, it assists readers in building on their knowledge by offering tailored guidance. Perfect for those looking to prototype data projects or enhance their programming toolkit.
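For a sense of how little code an interactive Streamlit app needs, here is a minimal sketch (an illustration, not an example from the book): a slider that re-simulates and re-plots a random walk on every change. The st.title, st.slider, and st.line_chart calls are standard Streamlit; the app itself is invented.

# Save as app.py and run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk demo")

# Streamlit re-runs the whole script whenever the slider moves.
steps = st.slider("Number of steps", min_value=10, max_value=1000, value=200)

walk = pd.DataFrame({"value": np.random.randn(steps).cumsum()})
st.line_chart(walk)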

Data Engineering and Data Science

DATA ENGINEERING and DATA SCIENCE

Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the “one-stop shop” for the concepts and applications of data science and engineering for data scientists across many industries.

The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to be working across the spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, but no individual data engineer needs to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. In this exciting new volume, the team of editors and contributors sketch the broad outlines of data engineering, then walk through more specific descriptions that illustrate specific data engineering roles.

Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

From the dawn of humanity, decisions, both big and small, have shaped our trajectory. Decisions have built civilizations, forged alliances, and even charted the course of our very evolution. And now, as data & AI become more widespread, the potential upside for better decision making is massive. Yet, like any technology, the true value of data & AI is realized by how we wield it. We're often drawn to the allure of the latest tools and techniques, but it's crucial to remember that these tools are only as effective as the decisions we make with them. ChatGPT is only as good as the prompt you decide to feed it and what you decide to do with the output. A dashboard is only as good as the decisions that it influences. Even a data science team is only as effective as the value they deliver to the organization. So in this vast landscape of data and AI, how can we master the art of better decision making? How can we bridge data & AI with better decision intelligence?

Cassie Kozyrkov founded the field of Decision Intelligence at Google where, until recently, she served as Chief Decision Scientist, advising leadership on decision process, AI strategy, and building data-driven organizations. Upon leaving Google, Cassie started her own company, Data Scientific, of which she is the CEO. In almost 10 years at Google, Cassie personally trained over 20,000 Googlers in data-driven decision-making and AI and helped over 500 projects implement decision intelligence best practices. Cassie also previously served in Google's Office of the CTO as Chief Data Scientist, and the rest of her 20 years of experience was split between consulting, data science, lecturing, and academia. Cassie is a top keynote speaker and a beloved personality in the data leadership community, followed by over half a million tech professionals. If you've ever gone on a reading spree about AI, statistics, or decision-making, chances are you've encountered her writing, which has reached millions of readers.

In the episode Cassie and Richie explore misconceptions around data science, stereotypes associated with being a data scientist, what the reality of working in data science is, advice for those starting their career in data science, and the challenges of being a data ‘jack-of-all-trades’. Cassie also shares what decision science and decision intelligence are, what questions to ask future employers in any data science interview, the importance of collaboration between decision-makers and domain experts, the differences between data science models and their real-world implementations, the pros and cons of generative AI in data science, and much more.

Links mentioned in the show:
Data Scientist: The Sexiest Job of the 22nd Century
The Netflix Prize
AI Products: Kitchen Analogy
Type One, Two & Three Errors in Statistics
Course: Data-Driven Decision Making for Business
Radar: Data & AI Literacy...

The Unrealized Opportunities with Real-Time Data

The amount of data generated from various processes and platforms has increased exponentially in the past decade, and the challenges of filtering useful data out of streams of raw data have become even greater. Meanwhile, extracting useful insights from that data has become even more important. In this incisive report, Federico Castanedo examines the challenges companies face when acting on data at rest as well as the benefits you unlock when acting on data as it's generated. Data engineers, enterprise architects, CTOs, and CIOs will explore the tools, processes, and mindset your company needs to process streaming data in real time. Learn how to make quick data-driven decisions to gain an edge on competitors.

This report helps you:
Explore gaps in today's real-time data architectures, including the limitations of real-time analytics to act on data immediately
Examine use cases that can't be served efficiently with real-time analytics
Understand how stream processing engines work with real-time data
Learn how distributed data processing architectures, stream processing, streaming analytics, and event-based architectures relate to real-time data
Understand how to transition from traditional batch processing environments to stream processing

Federico Castanedo is an academic director and adjunct professor at IE University in Spain. A data science and AI leader, he has extensive experience in academia, industry, and startups.
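The difference between acting on data at rest and acting on data as it is generated can be shown in a few lines. This toy sketch (an illustration of the concept, not material from the report) contrasts a batch aggregate, computed once over the full dataset, with a streaming aggregate that stays current after every event.

# Batch vs. streaming aggregation over the same events.
from typing import Iterable, Iterator

def batch_average(values: Iterable[float]) -> float:
    # Batch: wait for all the data, then compute one answer.
    values = list(values)
    return sum(values) / len(values)

def streaming_average(values: Iterable[float]) -> Iterator[float]:
    # Streaming: emit an up-to-date answer after every event.
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
        yield total / count

events = [3.0, 5.0, 4.0, 8.0]
print(batch_average(events))            # 5.0, available only at the end
print(list(streaming_average(events)))  # [3.0, 4.0, 4.0, 5.0], always current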

We dive into the exciting world of data analysis at the largest brewery in Brazil. Get ready for a fascinating journey where a passion for beer meets data science. In a very fun conversation, we explore how data analysis plays a crucial role from planting all the way to distribution and the consistent quality of Ambev's iconic products.

In this episode of Data Hackers — the largest AI and Data Science community in Brazil — meet this team of specialists from Ambev Tech, Ambev's technology arm: Daniel Cassiano, Director of Data & Analytics for South America; Gabriela Madia, Data Governance & Analytics Manager; Daniel Henrique, Data Platform Engineering Manager; and Felipe Contratres, Analytics Center of Excellence Lead.

Remember that you can find all of the Data Hackers community podcasts on Spotify, iTunes, Google Podcasts, Castbox, and many other platforms. If you prefer, you can also listen to the episode right here in this post!

Medium link:

Covered in the episode

Meet our guests:

Daniel Cassiano — Director of Data & Analytics for South America
Gabriela Madia — Data Governance & Analytics Manager
Daniel Henrique — Data Platform Engineering Manager
Felipe Contratres — Analytics Center of Excellence Lead

Data Hackers panel:

Paulo Vasconcellos
Monique Femme
Gabriel Lages

Reference links:

Project that monitors which side of the street the truck parks on (explainer post): https://www.instagram.com/reel/CxatwlqrGuV/?igshid=MzRlODBiNWFlZA==
Diesel Analytics project (presented at the Tech & Cheers meetup in São Paulo): https://www.instagram.com/reel/CuHfmL8rQGI/?igshid=MzRlODBiNWFlZA==
Job openings (Ambev Tech) — https://ambevtech.gupy.io/
Ambev Tech (@ambevtech) — photos and videos on Instagram — https://www.instagram.com/ambevtech/
LinkedIn: https://www.linkedin.com/company/ambevtech/
Ambev Tech Talk (podcast with a monthly episode about data — Papo de Dados): https://open.spotify.com/show/07cPNODgBHWh2JMkHbZxXG?si=2432135d6daa4a18

Today I’m joined by Anthony Deighton, General Manager of Data Products at Tamr. Throughout our conversation, Anthony unpacks his definition of a data product and we discuss whether or not he feels that Tamr itself is actually a data product. Anthony shares his views on why it’s so critical to focus on solving for customer needs and not simply the newest and shiniest technology. We also discuss the challenges that come with building a product that’s designed to facilitate the creation of better internal data products, as well as where we are in this new wave of data product management, and the evolution of the role.

Highlights / Skip to:

I introduce Anthony, General Manager of Data Products at Tamr, and the topics we’ll be discussing today (00:37)
Anthony shares his observations on how BI analytics are an inch deep and a mile wide due to the data that’s being input (02:31)
Tamr’s focus on data products and how that reflects in Anthony’s recent job change from Chief Product Officer to General Manager of Data Products (04:35)
Anthony’s definition of a data product (07:42)
Anthony and I explore whether he feels that decision support is necessary for a data product (13:48)
Whether or not Anthony feels that Tamr qualifies as a data product (17:08)
Anthony speaks to the importance of focusing on outcomes and benefits as opposed to endlessly knitting together features and products (19:42)
The challenges Anthony sees with metrics like Propensity to Churn (21:56)
How Anthony thinks about design in a product like Tamr (30:43)
Anthony shares how data science at Tamr is a tool in his toolkit and not viewed as a “fourth” leg of the product triad/stool (36:01)
Anthony’s views on where we are in the evolution of the DPM role (41:25)
What Anthony would do differently if he could start over at Tamr knowing what he knows now (43:43)

Links:

Tamr: https://www.tamr.com/
Innovating: https://www.amazon.com/Innovating-short-guide-making-things/dp/B0C8R79PVB
The Mom Test: https://www.amazon.com/The-Mom-Test-Rob-Fitzpatrick-audiobook/dp/B07RJZKZ7F
LinkedIn: https://www.linkedin.com/in/anthonydeighton/

In data science, the push for unbiased machine learning models is evident. So much effort goes into ensuring the products we create are done thoughtfully and correctly, but are we investing the same effort in ensuring our teams, the very architects of these models, are diverse and inclusive? Bias in data can lead to skewed results, and similarly, a lack of diversity in teams can result in narrow perspectives. As we prioritize building diversity and inclusion into our data, it's equally crucial to embed these principles within our teams. So, who is best equipped to guide us in integrating DEI from a data perspective?

Tracy Daniels is the Chief Data Officer for Truist Financial Corporation. She leads the team responsible for Truist’s enterprise data capabilities, including strategy, governance, data platform delivery, client, master & reference data, and the centers of excellence for business intelligence visualization and artificial intelligence & machine learning. She is also the executive sponsor for Truist’s Enterprise Technology & Operations Diversity Council. Daniels joined Truist in 2018. She has more than 25 years of banking and technology experience leading high-performing technology portfolio, development, infrastructure, and global operations organizations. Tracy enjoys participating in civic and philanthropic endeavors, including serving on the Georgia State University Foundation Board of Trustees. She has been recognized as a National 2013 WOC STEM Rising Star award recipient, the 2017 Working Mother magazine Mother of the Year recipient, and a 2021 Women In Technology (WIT) Women of the Year in STEAM finalist.

In the episode Tracy and Richie discuss Truist's approach to Diversity, Equity, and Inclusion (DEI) and its alignment with the company's purpose and values, the distinction between diversity and inclusion, the positive outcomes of implementing DEI correctly, the importance of not missing opportunities both externally with customers and internally with talent, the significance of aligning diversity programs with business metrics and hiring to promote DEI, considerations for job advertisements that appeal to a diverse audience, and much more.

Links mentioned in the show:
McKinsey on Diversity and Inclusion
Brookings Piece on Mitigating Bias in Data
Algorithmic Justice League
European Legislation on Data and Diversity
Course: AI Ethics
Radar: Data & AI Literacy Edition

Learning Data Science

As an aspiring data scientist, you appreciate why organizations rely on data for important decisions—whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data.

Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas.

Refine a question of interest to one that can be studied with data
Pursue data collection that may involve text processing, web scraping, etc.
Glean valuable insights about data through data cleaning, exploration, and visualization
Learn how to use modeling to describe the data
Generalize findings beyond the data
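As a taste of the lifecycle the book teaches, here is a minimal pandas sketch of the wrangle-analyze-visualize steps. The CSV path and column names are placeholders for illustration, not the book's datasets.

# Wrangle, analyze, and visualize a small tabular dataset with pandas.
import pandas as pd

# Placeholder: any CSV with a categorical "group" column and a numeric
# "value" column will do.
df = pd.read_csv("measurements.csv")

# Wrangle: drop rows with missing measurements and enforce a numeric dtype.
df = df.dropna(subset=["value"])
df["value"] = df["value"].astype(float)

# Analyze: summary statistics for each group.
print(df.groupby("group")["value"].describe())

# Visualize: the distribution of the measurement (requires matplotlib).
df["value"].plot.hist(bins=30, title="Distribution of value")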

In the realm of AI, GPUs play a crucial role in data science projects. Yet, squeezing the most out of them is a challenge, particularly for data scientists with limited resources and time. This talk explores the often-overlooked world of GPU orchestration and allocation with a dash of humour. We’ll address common mistakes like underutilization, static assignments, and resource sharing pitfalls. Introducing Genv, an open-source tool, we’ll transform these challenges into opportunities for seamless and efficient AI workflows.
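As a flavor of the underutilization problem the talk addresses, here is a hedged sketch that polls each GPU with NVIDIA's NVML bindings (pip install nvidia-ml-py) and flags idle devices. This is illustrative monitoring only, not Genv's API, and the 10% threshold is an arbitrary assumption.

# Flag underutilized GPUs with NVML (not Genv's own interface).
import pynvml

UNDERUTILIZED_PCT = 10  # assumed threshold for "idle enough to reclaim"

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        flag = " <- candidate for reallocation" if util.gpu < UNDERUTILIZED_PCT else ""
        print(f"GPU {i}: {util.gpu}% busy, "
              f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB{flag}")
finally:
    pynvml.nvmlShutdown()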

Join me with data interview expert Nick Singh to discuss how to ace your data science and analytics interviews, from preparation tips to tackling SQL technical questions, in this must-listen episode of the Data Career Podcast—you won't want to miss it!

Connect with Nick Singh:

🎁 Free SQL Tutorial from Data Lemur: https://datalemur.com/sql-tutorial

🤝 Connect on Linkedin

📕 Buy "Ace the Data Science Interview" book

🐵DataLemur

👔 Get a discount on my data interview prep course: https://www.datacareerjumpstart.com//interview

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(16:10) - 📊 Best interview tips

(36:30) - 🔄 Mock interview practice

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get:
✅ A discount on your enrollment
🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa