How Artificial Intelligence Is Changing the Job Market - Data Hackers Podcast 81

2024-03-08 · Data Hackers Listen

podcast_episode

by Yara Mascarenhas (TDC) , Monique Femme (PUCRS) , Paulo Vasconcellos , Gabriel Lages , Ahirton Lopes (Mackenzie; Magna Sistemas)

AI/ML Analytics

If you want to find out how Artificial Intelligence is redefining the rules of the game in the job market, and even the way we work and collaborate…

In this episode of Data Hackers, the largest AI and Data Science community in Brazil, we invited Yara Mascarenhas, CEO of TDC, and Ahirton Lopes, Head of Data at TIVIT, to find out whether there really is a single view of the opportunities and challenges that AI presents for all of us in the job market.

Remember that you can find all the Data Hackers community podcasts on Spotify, iTunes, Google Podcast, Castbox, and many other platforms.

Meet our guests:

Yara Mascarenhas, CEO of TDC
Ahirton Lopes, Head of Data at TIVIT

Our Data Hackers panel:

Monique Femme, Head of Community Management at Data Hackers
Paulo Vasconcellos, Co-founder of Data Hackers and Principal Data Scientist at Hotmart
Gabriel Lages, Co-founder of Data Hackers and Data & Analytics Sr. Director at Hotmart

Mentioned in the episode:

Download the full State of Data Brazil 2023 report: https://stateofdata.datahackers.com.br/
Subscribe to the Data Hackers Newsletter:
TDC 2024 SUMMIT SÃO PAULO (AI): https://thedevconf.com/tdc/2024/summit-sao-paulo/

When And How To Conduct An AI Program

2024-03-03 · Data Engineering Podcast Listen

podcast_episode

by Colleen Tartow (Starburst Data) , Tobias Macey

AI/ML Analytics Cloud Computing Dagster Data Engineering Data Lake Data Lakehouse Data Management Delta Hudi Iceberg Cyber Security +2 more

Summary

Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!

Your host is Tobias Macey and today I'm interviewing Colleen Tartow about the questions to answer before and during the development of an AI program.

Interview

Introduction

How did you get involved in the area of data management?

When you say "AI Program", what are the organizational, technical, and strategic elements that it encompasses?

How does the idea of an "AI Program" differ from an "AI Product"?

What are some of the signals to watch for that indicate an objective for which AI is not a reasonable solution?

Who needs to be involved in the process of defining and developing that program?

What are the skills and systems that need to be in place to effectively execute on an AI program?

"AI" has grown to be an even more overloaded term than it already was. What are some of the useful clarifying/scoping questions to address when deciding the path to deployment for different definitions of "AI"?

Organizations can easily fall into the trap of green-lighting an AI project before they have done the work of ensuring they have the necessary data and the ability to process it. What are the steps to take to build confidence in the availability of the data?

Even if you are sure that you can get the data, what are t

Cracking the Data Science Interview

2024-02-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Aaren Stubberfield , Leondra R. Gonzalez

AI/ML Bash Git Python SQL data data-science

"Cracking the Data Science Interview" is your ultimate resource for preparing for roles in the competitive field of data science. With this book, you'll explore essential topics such as Python, SQL, statistics, and machine learning, as well as learn practical skills for building portfolios and acing interviews. Follow its guidance and you'll be equipped to stand out in any data science interview.

What this Book will help me do
Confidently explain complex statistical and machine learning concepts.
Develop models and deploy them while ensuring version control and efficiency.
Learn and apply scripting skills in shell and Bash for productivity.
Master Git workflows to handle collaborative coding in projects.
Perfectly tailor portfolios and resumes to land data science opportunities.

Author(s)
Leondra R. Gonzalez, with years of data science and mentorship experience, co-authors this book with Aaren Stubberfield, a seasoned expert in technology and machine learning. Together, they integrate their expertise to provide practical advice for navigating the data science job market.

Who is it for?
If you're preparing for data science interviews, this book is for you. It's ideal for candidates with a foundational knowledge of Python, SQL, and statistics looking to refine and expand their technical and professional skills. Professionals transitioning into data science will also find it invaluable for building confidence and succeeding in this rewarding field.

Data Cleaning with Power BI

2024-02-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Gus Frazer

Analytics BI Data Quality DAX Microsoft Power BI business-intelligence data data-science microsoft-power-platform power-bi

Delve into the powerful world of data cleaning with Microsoft Power BI in this detailed guide. You'll learn how to connect, transform, and optimize data from various sources, setting a strong foundation for insightful data-driven decisions. Equip yourself with the skills to master data quality, leverage DAX and Power Query, and produce actionable insights with improved efficiency.

What this Book will help me do
Master connecting to various data sources and importing data effectively into Power BI.
Learn to use the Query Editor to clean and transform data efficiently.
Understand how to use the M language to perform advanced data transformations.
Gain expertise in creating optimized data models and handling relationships within Power BI.
Explore insights-driven exploratory data analysis using Power BI's powerful tools.

Author(s)
Gus Frazer is an experienced data professional with a deep knowledge of business intelligence tools and analytics processes. With a strong background in data science and years of hands-on experience using Power BI, Frazer brings practical advice to help users improve their data preparation and analysis skills. Known for creating resources that are both comprehensive and approachable, Frazer is dedicated to empowering readers in their data journey.

Who is it for?
This book is ideal for data analysts, business intelligence professionals, and business analysts who work regularly with data. If you have a basic understanding of BI tools and concepts and want to deepen your skills, especially in Power BI, this book will guide you effectively. It will also help data scientists and other professionals interested in data cleaning to build a robust basis for data quality and analysis. Whether you're addressing common data challenges or seeking to enhance your BI capabilities, this guide is tailored to accommodate your needs.

Learn Microsoft Fabric

2024-02-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Arshad Ali , Bradley Schacht

AI/ML Analytics Data Analytics Microsoft Fabric Cyber Security Spark SQL analytics-platforms data data-science microsoft-fabric

Dive into the wonders of Microsoft Fabric, the ultimate solution for mastering data analytics in the AI era. Through engaging real-world examples and hands-on scenarios, this book will equip you with all the tools to design, build, and maintain analytics systems for various use cases like lakehouses, data warehouses, real-time analytics, and data science.

What this Book will help me do
Understand and utilize the key components of Microsoft Fabric for modern analytics.
Build scalable and efficient data analytics solutions with medallion architecture.
Implement real-time analytics and machine learning models to derive actionable insights.
Monitor and administer your analytics platform for high performance and security.
Leverage AI-powered assistant Copilot to boost analytics productivity.

Author(s)
Arshad Ali and Bradley Schacht bring years of expertise in data analytics and system architecture to this book. Arshad is a seasoned professional specialized in AI-integrated analytics platforms, while Bradley has a proven track record in deploying enterprise data solutions. Together, they provide deep insights and practical knowledge with a structured and approachable teaching method.

Who is it for?
Ideal for data professionals such as data analysts, engineers, scientists, and AI/ML experts aiming to enhance their data analytics skills and master Microsoft Fabric. It's also suited for students and new entrants to the field looking to establish a firm foundation in analytics systems. Requires a basic understanding of SQL and Spark.

Ensure that your models graduate from Jupyter Notebooks to production - with special guest

2024-02-28 · Designing ML Systems

talk

by Kyle Gallatin (Handshake)

AI/ML Python

Kyle Gallatin is currently a Senior Machine Learning (ML) Engineer at Handshake. As a former Data Scientist and Software Engineer, Kyle has extensive experience engineering data and ML model features, building ML models and ML pipelines, and deploying ML models to production. Kyle is also the author of the O’Reilly report The Framework for ML Governance, 2nd author of O’Reilly’s Machine Learning with Python Cookbook, an instructor at New York City Data Science Academy, and a frequent writer on other ML topics across multiple publications.

Graph Algorithms for Data Science

2024-02-26 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tomaz Bratanic

AI/ML CSV NLP SQL data data-science

Practical methods for analyzing your data with graphs, revealing hidden connections and new insights. Graphs are the natural way to represent and understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with concrete advice on implementation and deployment. You don’t need any graph experience to start benefiting from this insightful guide. These powerful graph algorithms are explained in clear, jargon-free text and illustrations that make them easy to apply to your own projects.

In Graph Algorithms for Data Science you will learn:
Labeled-property graph modeling
Constructing a graph from structured data such as CSV or SQL
NLP techniques to construct a graph from unstructured data
Cypher query language syntax to manipulate data and extract insights
Social network analysis algorithms like PageRank and community detection
How to translate graph structure to a ML model input with node embedding models
Using graph features in node classification and link prediction workflows

Graph Algorithms for Data Science is a hands-on guide to working with graph-based data in applications like machine learning, fraud detection, and business data analysis. It’s filled with fascinating and fun projects, demonstrating the ins-and-outs of graphs. You’ll gain practical skills by analyzing Twitter, building graphs with NLP techniques, and much more.

About the Technology
A graph, put simply, is a network of connected data. Graphs are an efficient way to identify and explore the significant relationships naturally occurring within a dataset. This book presents the most important algorithms for graph data science with examples from machine learning, business applications, natural language processing, and more.

About the Book
Graph Algorithms for Data Science shows you how to construct and analyze graphs from structured and unstructured data. In it, you’ll learn to apply graph algorithms like PageRank, community detection/clustering, and knowledge graph models by putting each new algorithm to work in a hands-on data project. This cutting-edge book also demonstrates how you can create graphs that optimize input for AI models using node embedding.

What's Inside
Creating knowledge graphs
Node classification and link prediction workflows
NLP techniques for graph construction

About the Reader
For data scientists who know machine learning basics. Examples use the Cypher query language, which is explained in the book.

About the Author
Tomaž Bratanič works at the intersection of graphs and machine learning. Arturo Geigel was the technical editor for this book.

Quotes
Undoubtedly the quickest route to grasping the practical applications of graph algorithms. Enjoyable and informative, with real-world business context and practical problem-solving. - Roger Yu, Feedzai
Brilliantly eases you into graph-based applications. - Sumit Pal, Independent Consultant
I highly recommend this book to anyone involved in analyzing large network databases. - Ivan Herreros, talentsconnect
Insightful and comprehensive. The author’s expertise is evident. Be prepared for a rewarding journey. - Michal Štefaňák, Volke

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

2024-02-25 · Data Engineering Podcast Listen

podcast_episode

by Paul Dix (InfluxData) , Tobias Macey

AI/ML Analytics Arrow Cloud Computing Dagster Data Engineering Data Lake Data Lakehouse Data Management Delta Hudi Iceberg +4 more

Summary

Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I'm interviewing Paul Dix about his investment in the Apache Arrow ecosystem and how it led him to create the latest PFAD in database design.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing the FDAP stack and how the components combine to provide a foundational architecture for database engines?

This was the core of your recent re-write of the InfluxDB engine. What were the design goals and constraints that led you to this architecture?

Each of the architectural components are well engineered for their particular scope. What is the engineering work that is involved in building a cohesive platform from those components?

One of the major benefits of using open source components is the network effect of ecosystem integrations. That can also be a risk when the community vision for the project doesn't align with your own goals. How have you worked to mitigate that risk in your specific platform?

Can you describe the

"Beware the simple questions" - A live recording that level sets of Data Science.

2024-02-21 · Making Data Simple Listen

podcast_episode

by Suj Perepa (IBM) , Rachel Reinitz (IBM) , Darrell Reimer (IBM) , Al Martin (IBM)

AI/ML GenAI IBM

A Data Science level set in 2024. This episode is a live recording with Distinguished Engineers at IBM from Client Engineering, Finance, and Research: Rachel Reinitz, Suj Perepa, and "Beware of simple questions" Darrell Reimer, respectively. Pardon a few sound quality issues.

00:16 The Field of Data Science
00:50 Meet Rachel, Suj, and Darrell
03:37 What is Data Science Today?
07:00 Data Science Skills
10:25 How has Data Science Changed
12:07 A Day in the Life
14:25 AI Engineers?
23:45 Fake News and Cost
30:36 What's Next?
33:27 Too Much GenAI?
36:49 Low Barrier Risks
39:29 Deep Science vs Deep Business
42:17 For Fun

LinkedIn: linkedin.com/in/rreinitz, https://www.linkedin.com/in/sperepa/, linkedin.com/in/darrellreimer

Website: https://www.ibm.com/products/watsonx-ai/foundation-models

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Mastering Microsoft Fabric: SAASification of Analytics

2024-02-21 · O'Reilly Data Science Books O'Reilly Amazon

book

by Debananda Ghosh

AI/ML Analytics AWS Azure ADF BI Cloud Computing Data Engineering Data Lakehouse Data Management DWH LLM +9 more

Learn and explore the capabilities of Microsoft Fabric, the latest evolution in cloud analytics suites. This book will help you understand how users can leverage a Microsoft Office-like experience to perform data management and advanced analytics. The book starts with an overview of the analytics evolution from on premises to cloud infrastructure as a service (IaaS), platform as a service (PaaS), and now software as a service (SaaS), and provides an introduction to Microsoft Fabric. You will learn how to provision Microsoft Fabric in your tenant, along with the key capabilities of SaaS analytics products and the advantage of using Fabric in the enterprise analytics platform. OneLake and Lakehouse for data engineering are discussed, as well as OneLake for data science. Author Ghosh teaches you about data warehouse offerings inside Microsoft Fabric and the new data integration experience, which brings Azure Data Factory and the Power Query Editor of Power BI together in a single platform. Also demonstrated is Real-Time Analytics in Fabric, including capabilities such as Kusto query and database. You will understand how the new event stream feature integrates with OneLake and other computations. You will also learn how to configure the real-time alert capability in a zero-code manner and go through the Power BI experience in the Fabric workspace. Fabric pricing and licensing are also covered. After reading this book, you will understand the capabilities of Microsoft Fabric and its integration with current and upcoming Azure OpenAI capabilities.

What You Will Learn
Build OneLake for all data like OneDrive for Microsoft Office
Leverage shortcuts for cross-cloud data virtualization in Azure and AWS
Understand upcoming OpenAI integration
Discover new event streaming and Kusto query inside Fabric real-time analytics
Utilize seamless tooling for machine learning and data science

Who This Book Is For
Citizen users and experts in the data engineering and data science fields, along with chief AI officers

[AI and the Modern Data Stack] #182 How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks

2024-02-20 · DataFramed Listen

podcast_episode

by Ari Kaplan (Databricks) , Richie (DataCamp) , Robin Sutara (Databricks)

AI/ML Analytics Big Data Data Analytics Data Governance Data Lakehouse Databricks DWH GenAI IBM Marketing Modern Data Stack +5 more

Databricks started out as a platform for using Spark, a big data analytics engine, but it's grown a lot since then. Databricks now allows users to leverage their data and AI projects in the same place, ensuring ease of use and consistency across operations. The Databricks platform is converging on the idea of data intelligence, but what does this mean, how will it help data teams and organizations, and where does AI fit in the picture? Ari is Databricks’ Head of Evangelism and "The Real Moneyball Guy" - the popular movie was partly based on his analytical innovations in Major League Baseball. He is a leading influencer in analytics, artificial intelligence, data science, and high-growth business innovation. Ari was previously the Global AI Evangelist at DataRobot, Nielsen’s regional VP of Analytics, Caltech Alumni of the Decade, President Emeritus of the worldwide Independent Oracle Users Group, on Intel’s AI Board of Advisors, Sports Illustrated Top Ten GM Candidate, an IBM Watson Celebrity Data Scientist, and on the Crain’s Chicago 40 Under 40. He's also written 5 books on analytics, databases, and baseball. Robin is the Field CTO at Databricks. She has consulted with hundreds of organizations on data strategy, data culture, and building diverse data teams. Robin has had an eclectic career path in technical and business functions with more than two decades in tech companies, including Microsoft and Databricks. She also has achieved multiple academic accomplishments from her juris doctorate to a masters in law to engineering leadership. From her first technical role as an entry-level consumer support engineer to her current role in the C-Suite, Robin supports creating an inclusive workplace and is the current co-chair of Women in Data Safety Committee. She was also recognized in 2023 as a Top 20 Women in Data and Tech, as well as DataIQ 100 Most Influential People in Data. 
In the episode, Richie, Ari, and Robin explore Databricks, the application of generative AI in improving services operations and providing data insights, data intelligence, and lakehouse technology, the wide-ranging applications of generative AI, how AI tools are changing data democratization, the challenges of data governance and management and how tools like Databricks can help, how jobs in data and AI are changing and much more.

About the AI and the Modern Data Stack DataFramed Series
This week we’re releasing 4 episodes focused on how AI is changing the modern data stack and the analytics profession at large. The modern data stack is often an ambiguous and all-encompassing term, so we intentionally wanted to cover the impact of AI on the modern data stack from different angles. Here’s what you can expect:
Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot: covering how AI will change analytics workflows and tools
How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks: covering Databricks, data intelligence and how AI tools are changing data democratization
Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake: covering Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, and how to improve your data management
Accelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel: covering AI’s impact on marketing analytics, how AI is being integrated into existing products, and the democratization of AI

Links Mentioned in the Show: Databricks, Delta Lake, https://mlflow.org/

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

2024-02-18 · Data Engineering Podcast Listen

podcast_episode

by Dain Sundstrom (Starburst) , Tobias Macey

AI/ML Analytics Cloud Computing Dagster Data Engineering Data Lake Data Lakehouse Data Management Delta Hudi Iceberg Cyber Security +2 more

Summary

A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Join in with the event for the global data community, Data Council Austin. From March 26th-28th 2024, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today.

Your host is Tobias Macey and today I'm interviewing Dain Sundstrom about building a data lakehouse with Trino and Iceberg.

Interview

Introduction

How did you get involved in the area of data management?

To start, can you share your definition of what constitutes a "Data Lakehouse"?

What are the technical/architectural/UX challenges that have hindered the progression of lakehouses?

What are the notable advancements in recent months/years that make them a more viable platform choice?

There are multiple tools and vendors that have adopted the "data lakehouse" terminology. What are the benefits offered by the combination of Trino and Iceberg?

What are the key points of comparison for that combination in relation to other possible selections?

What are the pain points that are still prevalent in lakehouse architectures as compared to warehouse or vertically integrated systems?

What progress is being made (within or across the ecosystem) to address those sharp edges?

For someone who is interested in building a data lakehouse with Trino and Iceberg, how does that influence their selection of other platform elements?

What are the differences in terms of pipeline design/access and usage patterns when using a Trino

97: Winning with Data Science; A Handbook for Business Leaders w/ Akshay Swaminathan

2024-02-14 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith , Akshay Swaminathan

AI/ML Analytics Data Analytics

In this podcast episode, Avery talks with Akshay Swaminathan, co-author of the book 'Winning with Data Science: A Handbook for Business Leaders'.

They discuss the vague terms often thrown around in the data science industry and underline the importance of understanding these terms in order to make effective business decisions.

Swaminathan highlights the need to leverage data science in healthcare and real estate.

The episode covers various aspects of data analysis, including prediction, association, and description.

Connect with Akshay Swaminathan:

🤝 Connect on LinkedIn

📘 Learn About Winning with Data Science

✉️ Discover what we wish we knew about landing the dream job

🤖 Data Analytics Answers At Your Fingertips

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(12:14) - The Value of Data Science in Business
(14:57) - The Role of Data Science in Problem Solving
(20:36) - Understanding the Difference Between Data Science and Data Analytics
(25:39) - The Importance of Framing Business Questions for Data Science
(30:48) - The Power of Meeting Business Needs with Data Science: A Real-Life Example
(38:26) - The Limitations of Data Science

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Bioinformatics, Genomics, & Data Science w/ Eleanor Howe

2024-02-09 · Data Unchained

podcast_episode

by Eleanor Howe (Diamond Age Data Science)

AI/ML LLM

On this #podcast #episode of Data Unchained, Eleanor Howe, #Founder and #CEO of Diamond Age Data Science, joins us to discuss how she is using #ComputationalBiology to broaden the understanding and structure of #datasets within genomics to inform #drugdiscovery in #oncology, #Cardiovasculardisease and other #Rarediseases.

#storage #data #datascience #datasources #datasource #LLM #largelanguagemodels #largelanguagemodel #artificialintelligence #ai

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Learn Python the Hard Way: A Deceptively Simple Introduction to the Terrifyingly Beautiful World of Computers and Data Science, 5th Edition

2024-02-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Zed A. Shaw

Python SQL programming-languages software-development

You Will Learn Python! Zed Shaw has created the world's most reliable system for learning Python. Follow it and you will succeed--just like the millions of beginners Zed has taught to date! You bring the discipline, persistence, and attention; the author supplies the masterful knowledge you need to succeed.

In Learn Python the Hard Way, Fifth Edition, you'll learn Python by working through 60 lovingly crafted exercises. Read them. Type in the code. Run it. Fix your mistakes. Repeat. As you do, you'll learn how a computer works, how to solve problems, and how to enjoy programming . . . even when it's driving you crazy.

Install a complete Python environment
Organize and write code
Fix and break code
Basic mathematics
Strings and text
Interact with users
Work with files
Looping and logic
Object-oriented programming
Data structures using lists and dictionaries
Modules, classes, and objects
Python packaging
Automated testing
Basic SQL for Data Science
Web scraping
Fixing bad data (munging)
The "Data" part of "Data Science"

It'll be frustrating at first. But if you keep trying, you'll get it--and it'll feel amazing! This course will reward you for every minute you put into it. Soon, you'll know one of the world's most powerful, popular programming languages. You'll be a Python programmer.

This Book Is Perfect For

Total beginners with zero programming experience
Junior developers who know one or two languages
Returning professionals who haven't written code in years
Aspiring Data Scientists or academics who need to learn to code
Seasoned professionals looking for a fast, simple crash course in Python for Data Science

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

The AI Playbook | A Conversation with Eric Siegel

2024-02-06 · UVA Data Points Listen

podcast_episode

by Michael Albert (UVA's Darden School) , Eric Siegel (Machine Learning Week; Columbia University) , Marc Ruggiano (University of Virginia’s Collaboratory for Applied Data Science in Business)

AI/ML Analytics CRM

In his new book, The AI Playbook: Mastering the Rare Art of Machine Learning Deployment, Eric Siegel offers a detailed playbook for how business professionals can launch machine learning projects, providing both success stories where private industry got it right as well as cautionary tales others can learn from.

Siegel laid out the key findings of his book in our latest episode during a wide-ranging conversation with Marc Ruggiano, director of the University of Virginia’s Collaboratory for Applied Data Science in Business, and Michael Albert, an assistant professor of business administration at UVA's Darden School. The discussion, featuring three experts in business analytics, takes an in-depth look at the intersection of artificial intelligence, machine learning, business, and leadership.

http://www.bizML.com

https://www.darden.virginia.edu/faculty-research/centers-initiatives/data-analytics/bodily-professor

https://pubsonline.informs.org/do/10.1287/LYTX.2023.03.10/full/

https://www.kdnuggets.com/survey-machine-learning-projects-still-routinely-fail-to-deploy

CRISPDM: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining

CRM: https://en.wikipedia.org/wiki/Customer_relationship_management

Data Science and Machine Learning Applications in Subsurface Engineering

2024-02-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Daniel Asante Otchere

AI/ML ai-ml data machine-learning

This book provides comprehensive research and explores the different applications of data science and machine learning in subsurface engineering.

#179 Why ML Projects Fail, and How to Ensure Success with Eric Siegel, Founder of Machine Learning Week, Former Columbia Professor, and Bestselling Author

2024-02-05 · DataFramed Listen

podcast_episode

by Eric Siegel (Machine Learning Week; Columbia University) , Adel (DataFramed)

AI/ML Analytics Computer Science GenAI MLOps

We are in a generative AI hype cycle. Every executive looking at the potential of generative AI today is probably thinking about how to allocate their department's budget to building some AI use cases. However, many of these use cases won't make it into production. In a similar vein, the excitement around machine learning in the early 2010s generated a lot of hype, but much of the expected value did not pan out. Four years ago, VentureBeat reported that 87% of data science projects did not make it into production. And in a lot of ways, things haven’t gotten much better. If we don't learn why that is the case, generative AI could be destined for a similar fate.

Eric Siegel, Ph.D., is a leading consultant and former Columbia University professor who helps companies deploy machine learning. He is the founder of the long-running Machine Learning Week conference series and its new sister, Generative AI World, the instructor of the acclaimed online course “Machine Learning Leadership and Practice – End-to-End Mastery,” executive editor of The Machine Learning Times, and a frequent keynote speaker. He wrote the bestselling Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, as well as The AI Playbook: Mastering the Rare Art of Machine Learning Deployment. Eric’s interdisciplinary work bridges the stubborn technology/business gap. At Columbia, he won the Distinguished Faculty award while teaching graduate computer science courses in ML and AI. Later, he served as a business school professor at UVA Darden. Eric also publishes op-eds on analytics and social justice.
In the episode, Adel and Eric explore the reasons why machine learning projects don't make it into production; the BizML Framework, or how to bring business stakeholders into the room when building machine learning use cases; the skill gap between business stakeholders and data practitioners; use cases where organizations have leveraged machine learning for operational improvements; what the previous machine learning hype cycle can teach us about generative AI; and a lot more.

Links Mentioned in the Show:

The AI Playbook: Mastering the Rare Art of Machine Learning Deployment by Eric Siegel
Generating ROI with AI
BizML Cheat Sheet
Gooder
Survey: Machine Learning Projects Still Routinely Fail to Deploy
[Skill Track] MLOps Fundamentals

PRIVACY ENGINEERING IN ANALYTICS AND AB TESTING

2024-02-01 · Superweek 2024

talk

by Matt Gershoff (Conductrics, New York - USA)

Analytics

Often in analytics and data science we carry a 'big table' mental picture of data, where we continuously try to append and link new bits of data back to each customer. The issue is that approaches built on this model rarely follow a privacy-by-default design; they are closer to an identify-by-default approach.
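To make the contrast with the 'big table' picture concrete, here is a minimal Python sketch of one possible privacy-leaning alternative for A/B testing: keep only per-variant running aggregates instead of per-customer rows. This is an illustration under stated assumptions, not the method presented in the talk. The `VariantStats` class and the `welch_t` helper are hypothetical names introduced here, and the sample values are made up.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class VariantStats:
    """Running aggregates for one A/B variant (hypothetical helper).

    No per-customer rows or identifiers are stored, only three numbers.
    """
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value: float) -> None:
        # Fold the observation into the aggregates; the raw value is not kept.
        self.n += 1
        self.total += value
        self.total_sq += value * value

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def variance(self) -> float:
        # Unbiased sample variance reconstructed from the stored sums.
        return (self.total_sq - self.n * self.mean ** 2) / (self.n - 1)


def welch_t(a: VariantStats, b: VariantStats) -> float:
    """Welch's t statistic for the difference in means, from aggregates only."""
    standard_error = sqrt(a.variance / a.n + b.variance / b.n)
    return (b.mean - a.mean) / standard_error


# Made-up observations, used only to exercise the sketch.
control, treatment = VariantStats(), VariantStats()
for value in [1.0, 2.0, 3.0]:
    control.add(value)
for value in [2.0, 3.0, 4.0]:
    treatment.add(value)

t_statistic = welch_t(control, treatment)
```

The design choice is that the test statistic needs only the count, sum, and sum of squares per variant, so nothing has to be linked back to an individual customer to run the comparison.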

Principles of Data Science - Third Edition

2024-01-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by Sinan Ozdemir (LoopGenius)

AI/ML Computer Science NLP Python data data-science

Principles of Data Science offers an end-to-end introduction to data science fundamentals, blending key mathematical concepts with practical programming. You'll learn how to clean and prepare data, construct predictive models, and leverage modern tools like pre-trained models for NLP and computer vision. By integrating theory and practice, this book sets the foundation for impactful data-driven decision-making.

What this Book will help me do

Develop a solid understanding of foundational statistics and machine learning.
Learn how to clean, transform, and visualize data for impactful analysis.
Explore transfer learning and pre-trained models for advanced AI tasks.
Understand ethical implications, biases, and governance in AI and ML.
Gain the knowledge to implement complete data pipelines effectively.

Author(s)

Sinan Ozdemir is an experienced data scientist, educator, and author with a deep passion for making complex topics accessible. With a background in computer science and applied statistics, Sinan has taught data science at leading institutions and authored multiple books on the topic. His practical approach to teaching combines real-world examples with insightful explanations, ensuring learners gain both competence and confidence.

Who is it for?

This book is ideal for beginners in data science who want to gain a comprehensive understanding of the field. If you have a background in programming or mathematics and are eager to combine these skills to analyze and extract insights from data, this book will guide you. Individuals working with machine learning or AI who need to solidify their foundational knowledge will find it invaluable. Some familiarity with Python is recommended to follow along seamlessly.
