talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516 tagged

Activity Trend: 68 peak/qtr, 2020-Q1 to 2026-Q1

Activities

1516 activities · Newest first

In this podcast, I give my opinion on whether YOU should do the Masters in Analytics from Georgia Tech (OMSA). I’ll share my experience, what I thought was good and not so good, and help you make your decision!

Watch this episode on YouTube: https://www.youtube.com/watch?v=dpVNRB67-So&t=1s

If you want a free way to kickstart your analytics career, check out my free 33-page PDF giving you an introduction to everything you need to know: https://www.datacareerjumpstart.com/roadmap

If you’re just starting out, you can check out my 21 Day To Data Challenge: https://www.datacareerjumpstart.com/challenge

Want to learn data science while building your portfolio? Check out Data Career Jumpstart: https://www.datacareerjumpstart.com/data-career-jumpstart-course

MORE DATA ANALYTICS CONTENT HERE:

📺 Subscribe YouTube: https://www.youtube.com/c/AverySmithDataCareerJumpstart/videos

🎙Listen to My Podcast: https://podcasts.apple.com/us/podcast/data-career-podcast/id1547386535

👔 Connect with me on LinkedIn: https://www.linkedin.com/in/averyjsmith/

📸 Instagram: https://www.instagram.com/datacareerjumpstart/

👾Join My Discord: https://www.datacareerjumpstart.com/discord

🎵 TikTok: https://www.tiktok.com/@verydata? 

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Summary

Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result it has become a standard tool for data engineers for a wide range of applications. Matt Harrison is a Python expert with a long history of working with data who now spends his time on consulting and training. He recently wrote a book on effective patterns for Pandas code, and in this episode he shares advice on how to write efficient data processing routines that will scale with your data volumes, while being understandable and maintainable.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Today’s episode is sponsored by Prophecy.io, the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design and deploy data pipelines on Apache Spark and Apache Airflow. Now all data users can apply software engineering best practices (git, tests, and continuous deployment) with a simple-to-use visual designer. How does it work? You visually design the pipelines, and Prophecy generates clean Spark code with tests on git; then you visually schedule these pipelines on Airflow. You can observe your pipelines with built-in metadata search and column-level lineage. Finally, if you have existing workflows in AbInitio, Informatica, or other ETL formats that you want to move to the cloud, you can import them automatically into Prophecy, making them run productively on Spark. Create your free account today at dataengineeringpodcast.com/prophecy.

The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.

Your host is Tobias Macey and today I’m interviewing Matt Harrison about useful tips for using Pandas for data engineering projects.

Interview

Introduction
How did you get involved in the area of data management?
What are the main tasks that you have seen Pandas used for in a data engineering context?
What are some of the common mistakes that can lead to poor performance when scaling to large data sets?
What are some of the utility features that you have found most helpful for data processing?
One of the interesting add-ons to Pandas is its integration with Arrow. What are some of the considerations for how and when to use the Arrow capabilities vs. out-of-the-box Pandas?
Pandas is a tool that spans data processing and data science. What are some of the ways that data engineers should think about writing their code to make it accessible to data scientists for supporting collaboration across data workflows?
Pandas is often used for transformation logic. What are some of the ways that engineers should approach the design of their code to make it understandable and maint

In this episode, I interviewed Kyle Pastor (aka @DataStuffPlus, with 70K followers on Instagram). We chatted about how Kyle got started with data, why he runs his Instagram, and why he does fun data projects.

When breaking into data, it’s always important to have a portfolio of projects to show off, and who knows, these projects could turn into businesses, job offers, or sponsorship opportunities.

You can follow Kyle’s writing and tutorials on his Medium.

Also, don’t miss Kyle’s data viz Instagram.

Want a free guide to get your data journey started? Get a free data roadmap here.

Ready to jumpstart your data career? Try the #21DaysToData Challenge.

What Is Causal Inference?

Causal inference lies at the heart of our ability to understand why things happen by helping us predict the results of our actions. This process is vital for businesses that aspire to turn data and information into valuable knowledge. With this report, data scientists and analysts will learn a principled way of thinking about causality, using a suite of causal inference techniques now available. Authors Hugo Bowne-Anderson, a data science consultant, and Mike Loukides, vice president of content strategy at O'Reilly Media, introduce causality and discuss randomized control trials (RCTs), key aspects of causal graph theory, and much-needed techniques from econometrics.

You'll explore:

Techniques from econometrics, including randomized control trials, the causality gold standard used in A/B-testing
The constant-effects model for dealing with all things not being equal across the groups you're comparing
Regression for dealing with confounding variables and selection bias
Instrumental variables to estimate causal relationships in situations where regression won't work
Techniques from causal graph theory including forks and colliders, the graphical tools for representing common causal patterns
Backdoor and front-door adjustments for making causal inferences in the presence of confounders

As we enter the new year, it seems like we’re telescoping into the future of work. Companies embracing remote work and the Great Resignation putting pressure on teams to create more fulfilling roles signal an expanding opportunity for applicants to find their dream roles in data science, and for hiring managers to create awesome candidate experiences.

Today’s guests, Nick Singh and Kevin Huo, authors of Ace The Data Science Interview, discuss how aspiring and current data scientists can stand out from the crowd, and what hiring managers need to change to win over talent today.

Join us as we discuss:

How to wow recruiters and hiring managers with your resume
The type of skills aspiring data scientists need to show on the job hunt
The value of direct email over job listings
What recruiters and hiring managers need to change in an evolving job market

Relevant links from the interview:

Ace the Data Science Interview
Follow Nick Singh on LinkedIn
Follow Kevin Huo on LinkedIn
Noah Gift’s Appearance on DataFramed
Sign up to gain early access to DataCamp Talent, DataCamp’s portal for data science jobs

We talked about:

Alexey’s background
Being a principal data scientist
DataTalks.Club
The beginning and growth of DataTalks.Club
Sustaining the pace
Types of talks
Popular and favorite talks
Making DataTalks.Club self-sufficient
Alexey’s book and course
Advice for people starting in data science and staying motivated
Not keeping up to date with new tools
Staying productive
Learning technical subjects and keeping notes
Inspiration and idea generation for DataTalks.Club

Links:

https://eugeneyan.com/writing/informal-mentors-alexey-grigorev/ 

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Data science and machine learning are integral parts of most large-scale product manufacturing processes and are used to understand customer needs, detect quality issues, automate repetitive tasks and optimise supply chains. It’s an invisible glue that helps us produce more things for less, and in a timely fashion. To learn more about this fascinating topic, I recently spoke to Ranga Ramesh who is Senior Director, Quality Innovation and Transformation at Georgia-Pacific. Georgia-Pacific is one of the world’s largest manufacturers of consumer paper products and uses AI technologies throughout their manufacturing process. In this episode of Leaders of Analytics, we explore how computer vision and machine learning can be used to classify tissue paper softness and instantly detect quality issues that could otherwise render large volumes of product useless. Ranga’s work is featured as a case study in our recently published book, Demystifying AI for the Enterprise.

Data science and machine learning are continuing to evolve as core capabilities across many industries. But high-quality data science output is only half the story. As the data science profession matures from “back office support” to leading from the front, there is an increasing need for more integrated systems that plug into business operations. To get the most out of these capabilities, organisations must move beyond just building robust models, and establish operational processes that can produce, implement and maintain machine learning systems at scale. Enter MLOps. To understand the fundamentals and best practices of MLOps, I recently spoke to Shalini Kurapati who is CEO of Clearbox.ai. Clearbox AI is the data-centric MLOps company that enables trustworthy and human-centred AI. Their AI Control Room automatically produces synthetic data and insights to solve the issues related to data quality, data access and sharing, and privacy aspects that block AI adoption in companies.

In this episode of Leaders of Analytics, we cover:

What MLOps is and why we need it to succeed with advanced data science solutions
How to get beyond the proof-of-concept-to-production gap and get models into operation
The importance of data-centric AI in building MLOps best practices
The most common AI pitfalls to avoid
How Human Centred Design principles can be used to build AI for good, and much more.

Check out Clearbox here: https://clearbox.ai/
Connect with Shalini here: https://www.linkedin.com/in/shalini-kurapati-phd-she-her-06516324/

We talked about:

Mariano’s background
Typical day of a manager
Becoming a manager
Preparing for the transition
Balancing projects and assumptions
Search and recommendations
Dealing with unfamiliar domains
Structuring projects
Connecting product and data science
Rules of Machine Learning
CRISP-DM and deployment
Giving feedback
Dealing with people leaving the team
Doing technical work as a manager
Dealing with bad hires
Keeping up with the industry

Installing and Configuring IBM Db2 AI for IBM z/OS v1.4.0

Artificial intelligence (AI) enables computers and machines to mimic the perception, learning, problem-solving, and decision-making capabilities of the human mind. AI development is made possible by the availability of large amounts of data and the corresponding development and wide availability of computer systems that can process all that data faster and more accurately than humans can. What happens if you infuse AI into a world-class database management system, such as IBM Db2®? IBM® has done just that with Db2 AI for z/OS (Db2ZAI). Db2ZAI is built to infuse AI and data science to assist businesses in the use of AI to develop applications more easily.

With Db2ZAI, the following benefits are realized:

Data science functionality
Better built applications
Improved database performance (and DBA's time and efforts are saved) through simplification and automation of error reporting and routine tasks
Machine learning (ML) optimizer to improve query access paths and reduce the need for manual tuning and query optimization
Integrated data access that makes data available from various vendors including private cloud providers

This IBM Redpaper® publication helps to simplify the installation, tailoring, and configuration of Db2 AI for z/OS®. It was written for system programmers, system administrators, and database administrators.

Welcome to 2022! 🎉 Thank you so much for listening! In this episode, I review 2021, discuss goals, and introduce a new challenge!

Check out The 21 Days To Data Challenge: https://www.datacareerjumpstart.com/Challenge

New Data Career Podcast episodes EVERY Monday morning

Here’s what I did in 2021:

Quit my job
Snow Data Science
Consulted for 15 businesses
Ran 50 miles, 60k elevation = 11 peaks
Ran a marathon
Sold a house, bought a house
Interned with the Utah Jazz
Graduated with masters from Georgia Tech
20 days with youth group in Dominican Republic
Launched Data Career Jumpstart

Please subscribe to the podcast, and leave us a review! It means the world to me!

Numerical Methods Using Java: For Data Science, Analysis, and Engineering

Implement numerical algorithms in Java using NM Dev, an object-oriented and high-performance programming library for mathematics. You’ll see how it can help you easily create a solution for your complex engineering problem by quickly putting together classes. Numerical Methods Using Java covers a wide range of topics, including chapters on linear algebra, root finding, curve fitting, differentiation and integration, solving differential equations, random numbers and simulation, a whole suite of unconstrained and constrained optimization algorithms, statistics, regression and time series analysis. The mathematical concepts behind the algorithms are clearly explained, with plenty of code examples and illustrations to help even beginners get started.

What You Will Learn

Program in Java using a high-performance numerical library
Learn the mathematics for a wide range of numerical computing algorithms
Convert ideas and equations into code
Put together algorithms and classes to build your own engineering solution
Build solvers for industrial optimization problems
Do data analysis using basic and advanced statistics

Who This Book Is For

Programmers, data scientists, and analysts with prior experience with programming in any language, especially Java.

Data Science in Engineering and Management

This book brings insight into Data Science and offers applications and implementation strategies. It includes recent developments and future trends and covers the concept of Data Science along with its origin. It focuses on the mechanisms of extracting data along with classifications, architectural concepts, and predictive analysis.

Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next.

Abstract

Making Data Simple Podcast is hosted by Al Martin, VP, IBM Expert Services Delivery, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun. This week on Making Data Simple, we have Benn Stancil, Chief Analytics Officer + Founder @ Mode. Benn is an accomplished data analyst with deep expertise in collaborative Business Intelligence and Interactive Data Science. Benn is Co-founder, President, and Chief Analytics Officer of Mode, an award-winning SaaS company that combines the best elements of Business Intelligence (ABI), Data Science (DS) and Machine Learning (ML) to empower data teams to answer impactful questions and collaborate on analysis across a range of business functions. Under Benn’s leadership, the Mode platform has evolved to enable data teams to explore, visualize, analyze and share data in a powerful end-to-end workflow. Prior to founding Mode, Benn served in senior Analytics positions at Microsoft and Yammer, and worked as a researcher for the International Economics Program at the Carnegie Endowment for International Peace. Benn also served as an Undergraduate Research Fellow at Wake Forest University, where he received his B.S. in Mathematics and Economics. Benn believes in fostering a shared sense of humility and gratitude.

Show Notes

1:22 – Benn’s history
7:09 – Tell us how you got to where you are today
9:14 – Tell us about Mode
12:08 – What is your definition of the Chief Analytics Officer?
21:53 – Why do we need another BI tool?
24:09 – What’s your secret sauce?
27:48 – Where did the name Mode come from?
28:41 – How do we use Mode?
31:08 – What is your go-to-market strategy?
32:38 – Any client references?
34:58 – “The missing piece in the modern data stack” tell us about this

Mode
Email: [email protected]
Twitter: benn stancil

Connect with the Team

Producer Kate Brown - LinkedIn.
Producer Steve Templeton - LinkedIn.
Host Al Martin - LinkedIn and Twitter.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

We talked about:

Geo’s background
Technical Product Manager
Building ML platform
Working on internal projects
Prioritizing the backlog
Defining the problems
Observability metrics
Avoiding jumping into “solution mode”
Breaking down the problem
Important skills for product managers
The importance of a technical background
Data Lead vs Staff Data Scientist vs Data PM
Approvals and rollout
Engineering/platform teams
Data scientists’ role in the engineering team
Scrum and Agile in data science
Transitioning from Data Scientist to Technical PM
Books to read for the transition
Transitioning for non-technical people
Doing user research
Quality assurance in ML
Advice for supporting an ML team as a Scrum master

Links:

Geo's LinkedIn: https://www.linkedin.com/in/geojolly/
Product School community: https://productschool.com/
http://theleanstartup.com/
Netflix CPO Medium blog: https://gibsonbiddle.medium.com/
Glovo is hiring: https://jobs.glovoapp.com/en/?d=4040726002

Access For Dummies

Become a database boss, and have fun doing it, with this accessible and easy-to-follow guide to Microsoft Access

Databases hold the key to organizing and accessing all your data in one convenient place. And you don’t have to be a data science wizard to build, populate, and organize your own. With Microsoft Access For Dummies, you’ll learn to use the latest version of Microsoft’s Access software to power your database needs. Need to understand the essentials before diving in? Check out our Basic Training in Part 1 where we teach you how to navigate the Access workspace and explore the foundations of databases. Ready for more advanced tutorials? Skip right to the sections on Data Management, Queries, or Reporting where we walk you through Access’s more sophisticated capabilities. Not sure if you have Access via Office 2021 or Office 365? No worries: this book covers Access no matter how you access it.

The book also shows you how to:

Handle the most common problems that Access users encounter
Import, export, and automatically edit data to populate your next database
Write powerful and accurate queries to find exactly what you’re looking for, exactly when you need it

Microsoft Access For Dummies is the perfect resource for anyone expected to understand, use, or administer Access databases at the workplace, classroom, or any other data-driven destination.

My guest on this episode of Leaders of Analytics is Kate Strachnyi. Kate is a well-known figure in the global data community. She is a master educator and prolific content creator who has built an online community of almost 200,000 followers. Through the DATAcated brand she runs online training, seminars, conferences, expos and podcasts while connecting data professionals across the world. She is also the author of four books in the data science genre and a marathon runner. I recently caught up with Kate to learn more about what it takes to keep up with the fast-paced and ever-evolving world of data and analytics.

In this episode we discuss:

The most important data science skills in the next 5-10 years
The most underrated skill in data science
How to make your day productive and enjoyable
Career advice for someone starting out in data science today
Minting NFTs for the global data community, and much more

You can find more from Kate here:

DATAcated: https://datacated.com/
LinkedIn: https://www.linkedin.com/in/kate-strachnyi-data/

We talked about:

CJ’s background
Evolutionary biology
Learning machine learning
Learning on the job and being honest with what you don’t know
Convincing that you will be useful
CJ’s first interview
Transitioning to industry
Tailoring your CV
Data science courses
Moving to Berlin
Being selective vs ‘spray and pray’
Moving on to new jobs
Plan for transitioning to industry
Requirements for getting hired
Publications, portfolios and pet projects
Adjusting to industry
Bad habits from academia
Topics with long-term value
CJ’s textbook

Links:

CJ's LinkedIn: https://www.linkedin.com/in/christina-jenkins/
Positions for master students: one two

Have you ever wondered what challenges a data team faces when disrupting the real estate market? What kind of data do they use, and how do they collect it? What products do they build? And what exactly is Data BizDev? You'll hear about all this and many more curious stories in today's episode, in which we invited the data team at Loft to tell us what day-to-day life is like for data professionals at this Brazilian unicorn. With us today are Guilherme Marmerola (Senior Data Science Manager), Renata Nobre (Group Product Manager), and Daniel Scalli (Head of Data Science). Check out this sensational episode.

Our guests

Daniel Scalli's LinkedIn
Guilherme Marmerola's LinkedIn
Renata Nobre's LinkedIn

See the Medium post for the episode's references: https://medium.com/data-hackers/como-a-loft-utiliza-dados-para-reinventar-o-mercado-imobili%C3%A1rio-data-hackers-podcast-50-9358ded1d7f0