talk-data.com

Topic

Data Science

machine_learning statistics analytics

Activities

1516 activities · Newest first

Data Science at the Command Line, 2nd Edition

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools--useful whether you work with Windows, macOS, or Linux. You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

Obtain data from websites, APIs, databases, and spreadsheets
Perform scrub operations on text, CSV, HTML, XML, and JSON files
Explore data, compute descriptive statistics, and create visualizations
Manage your data science workflow
Create your own tools from one-liners and existing Python or R code
Parallelize and distribute data-intensive pipelines
Model data with dimensionality reduction, regression, and classification algorithms
Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark

Ways to learn more from Lillian: 

Data Science for Dummies Launch Party: Data Science For Dummies, 3rd Edition hits the streets in September 2021 – but not without a proper launch party to celebrate. You’re invited! RSVP here: https://businessgrowth.ai/

The Data Entrepreneur’s Toolkit: A recommendation set for 32 free (or low-cost) tools & processes that'll actually grow your data business (even if you still haven’t put up that website yet!). https://www.data-mania.com/data-entrepreneur-toolkit/

The Data Superhero Quiz: A fun, free 45-second quiz that uncovers the ideal data career path for your personality type and skill set. https://data-mania.com/data-superhero-quiz

Weekly Free Trainings: We currently publish 2 free trainings per week on YouTube! https://www.youtube.com/channel/UCK4MGP0A6lBjnQWAmcWBcKQ

Want to break into data science? Check out my new course coming out on August 18th: Data Career Jumpstart - https://www.datacareerjumpstart.com

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Digital transformation 2.0 is upon us! We have spent the last two decades digitising many products, services and processes to create digital experiences that are consistent, reliable and always on. That’s digital transformation 1.0 stuff. The next decade will be all about creating data-driven personalisation at scale. Rather than treating everyone the same in our digital environment, we will increasingly be using customer data to tailor the customer experience to individual customer needs.

In this episode of Leaders of Analytics, we hear from Prashant Natarajan, Vice President of Strategy & Products at H2O.ai. Prashant has spent more than 15 years helping organisations achieve successful digital transformations through his leadership roles in the sphere of technology and AI. He has made it his career to demystify AI and digital transformation for organisations and their staff across multiple industries and continents.

In this episode, we discuss:

what’s required to do digital transformation 2.0 successfully
how to create data-first organisations
how to use AI to take the robot out of humans
the future of automated machine learning
how organisations can ensure that their data science investments deliver actual business outcomes
our upcoming book, Demystifying AI for the Enterprise, which Prashant and I have co-authored alongside 5 other domain experts

Introduction to Statistical and Machine Learning Methods for Data Science

Boost your understanding of data science techniques to solve real-world problems

Data science is an exciting, interdisciplinary field that extracts insights from data to solve business problems. This book introduces common data science techniques and methods and shows you how to apply them in real-world case studies. From data preparation and exploration to model assessment and deployment, this book describes every stage of the analytics life cycle, including a comprehensive overview of unsupervised and supervised machine learning techniques. The book guides you through the necessary steps to pick the best techniques and models and then implement those models to successfully address the original business need. No software is shown in the book, and mathematical details are kept to a minimum. This allows you to develop an understanding of the fundamentals of data science, no matter what background or experience level you have.

Knowledge Graphs

Applying knowledge in the right context is the most powerful lever businesses can use to become agile, creative, and resilient. Knowledge graphs add context, meaning, and utility to business data. They drive intelligence into data for unparalleled automation and visibility into processes, products, and customers. Businesses use knowledge graphs to anticipate downstream effects, make decisions based on all relevant information, and quickly respond to dynamic markets. In this report for chief information and data officers, Jesus Barrasa, Amy E. Hodler, and Jim Webber from Neo4j show how to use knowledge graphs to gain insights, reveal a flexible and intuitive representation of complex data relationships, and make better predictions based on holistic information.

Explore knowledge graph mechanics and common organizing principles
Build and exploit a connected representation of your enterprise data environment
Use decisioning knowledge graphs to explore the advantages of adding relationships to data analytics and data science
Conduct virtual testing using software versions of real-world processes
Deploy knowledge graphs for more trusted data, higher accuracies, and better reasoning for contextual AI

Data Science Projects with Python - Second Edition

Data Science Projects with Python offers a hands-on, project-based approach to learning data science using real-world data sets and tools. You will explore data using Python libraries like pandas and Matplotlib, build machine learning models with scikit-learn, and apply advanced techniques like XGBoost and SHAP values. This book equips you to confidently extract insights, evaluate models, and deliver results with clarity.

What this Book will help me do

Learn to load, clean, and preprocess data using Python and pandas.
Build and evaluate predictive models, including logistic regression and random forests.
Visualize data effectively using Python libraries like Matplotlib.
Master advanced techniques like XGBoost and algorithmic fairness.
Communicate data-driven insights to aid decision making in practical scenarios.

Author(s)

Stephen Klosterman is an experienced data scientist with a strong focus on practical applications of machine learning in business. Combining a rich academic background with hands-on industry experience, he excels at explaining complex concepts in an approachable way. As the author of 'Data Science Projects with Python,' his goal is to provide learners with the skills needed for real-world data science challenges.

Who is it for?

This book is ideal for beginners in data science and machine learning who have some basic programming knowledge in Python. Aspiring data scientists will benefit from its practical, end-to-end examples. Professionals seeking to expand their skillset in predictive modeling and delivering business insights will find this book invaluable. Some foundation in statistics and programming is recommended.

Connect with Florin Badita: https://www.linkedin.com/in/baditaflorin/

Subscribe on YouTube: https://www.youtube.com/channel/UCuyfszBAd3gUt9vAbC1dfqA

Designing Big Data Platforms

DESIGNING BIG DATA PLATFORMS

Provides expert guidance and valuable insights on getting the most out of Big Data systems

An array of tools are currently available for managing and processing data: some are ready-to-go solutions that can be immediately deployed, while others require complex and time-intensive setups. With such a vast range of options, choosing the right tool to build a solution can be complicated, as can determining which tools work well with each other. Designing Big Data Platforms provides clear and authoritative guidance on the critical decisions necessary for successfully deploying, operating, and maintaining Big Data systems. This highly practical guide helps readers understand how to process large amounts of data with well-known Linux tools and database solutions, use effective techniques to collect and manage data from multiple sources, transform data into meaningful business insights, and much more.

Author Yusuf Aytas, a software engineer with a vast amount of big data experience, discusses the design of the ideal Big Data platform: one that meets the needs of data analysts, data engineers, data scientists, software engineers, and a spectrum of other stakeholders across an organization. Detailed yet accessible chapters cover key topics such as stream data processing, data analytics, data science, data discovery, and data security.

This real-world manual for Big Data technologies:

Provides up-to-date coverage of the tools currently used in Big Data processing and management
Offers step-by-step guidance on building a data pipeline, from basic scripting to distributed systems
Highlights and explains how data is processed at scale
Includes an introduction to the foundation of a modern data platform

Designing Big Data Platforms: How to Use, Deploy, and Maintain Big Data Systems is a must-have for all professionals working with Big Data, as well as researchers and students in computer science and related fields.

We talked about:

Ben’s background
Building solutions for customers
Why projects don’t make it to production
Why do people choose overcomplicated solutions?
The dangers of isolating data science from the business unit
The importance of being able to explain things
Maximizing the chances of making it into production
The IKEA effect
Risks of implementing novel algorithms
If it can be done simply – do that first
Don’t become the guinea pig for someone’s white paper
The importance of stat skills and coding skills
Structuring an agile team for ML work
Timeboxing research
Mentoring
Ben’s book
‘Uncool techniques’ at AI-First companies
Should managers learn data science?
Do data scientists need to specialize to be successful?

Links:

Ben's book: https://www.manning.com/books/machine-learning-engineering-in-action (get 35% off with code "ctwsummer21")

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Connect with Dustin Schimek! https://www.linkedin.com/in/dustinschimek/

Want to break into data science? Check out my new course coming out later this summer: Data Career Jumpstart - https://www.datacareerjumpstart.com

Want to leave a question for the Ask Avery Show?

Written Mailbag: https://forms.gle/78zD544drpDAcTRV9 Audio Mailbag: https://anchor.fm/datacareerpodcast/message

In this episode, I interview Mark Freeman and talk about how he transitioned from public health to data science! We talk about what worked well in his journey, and what didn't, including a $20,000 investment gone sideways. Mark also gives some amazing LinkedIn job hacks! 

Connect with Mark on LinkedIn: https://www.linkedin.com/in/mafreeman2/ 

Check out openings at Humu (Mark's company): https://boards.greenhouse.io/humu

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

Learn about business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration.

No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well.

Every web/retail shop wants to know the products customers tend to buy together. Trying to predict the target discrete or continuous variable with few input variables is important for practically every type of business. This book helps you understand data science and the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining.

Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented.

T-SQL is supported in SQL Server, Azure SQL Database, and Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you are already familiar with, then this is the book for you.

What You Will Learn

Describe distribution of variables with statistical measures
Find associations between pairs of variables
Evaluate the quality of the data you are analyzing
Perform time-series analysis on your data
Forecast values of a continuous variable
Perform market-basket analysis to predict customer purchasing patterns
Predict target variable outcomes from one or more input variables
Categorize passages of text by extracting and analyzing keywords

Who This Book Is For

Database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science. For readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft’s various database platforms such as SQL Server and Azure SQL Database. Also for readers who want to improve their querying by learning new and original optimization techniques.

In this episode, I talk with Andreas Kretz (https://www.linkedin.com/in/andreas-kretz/), who is an amazing resource for the data engineering community. He runs an incredibly affordable data engineering bootcamp called Learn Data Engineering (https://learndataengineering.com) and also has an extensive YouTube channel (https://www.youtube.com/channel/UCY8mzqqGwl5_bTpBY9qLMAA).

We talked about how Andreas got started with data engineering, why he likes it so much, and how others can get started. I also share my story of interviewing with Facebook for a data engineering position.

Want to be on The Ask Avery Show? Sign up for a spot here:

https://calendly.com/datacareer/ask-avery?month=2021-05

In this episode, I talk to Matt Blasa (https://www.linkedin.com/in/mblasa/) about how he does data science freelancing. We also talk about online portfolios, data governance, and why he posts on LinkedIn. Enjoy!

🎙 PLEASE FOLLOW & SUBSCRIBE TO THE POD 

Essentials of Data Science and Analytics

Data science and analytics have emerged as the most desired fields in driving business decisions. Using the techniques and methods of data science, decision makers can uncover hidden patterns in their data and develop algorithms and models that help improve processes and make key business decisions. Data science is a data-driven decision-making approach that draws on several different areas and disciplines to extract insights and knowledge from structured and unstructured data. The algorithms and models of data science, along with machine learning and predictive modeling, are widely used in solving business problems and predicting future outcomes. This book combines the key concepts of data science and analytics to help you gain a practical understanding of these fields. The four different sections of the book are divided into chapters that explain the core of data science. Given the booming interest in data science, this book is timely and informative.

We talked about:

Andreas’s background
Why data engineering is becoming more popular
Who to hire first – a data engineer or a data scientist?
How can I, as a data scientist, learn to build pipelines?
Don’t use too many tools
What is a data pipeline and why do we need it?
What is ingestion?
Can just one person build a data pipeline?
Approaches to building data pipelines for data scientists
Processing frameworks
Common setup for data pipelines: car price prediction
Productionizing the model with the help of a data pipeline
Scheduling
Orchestration
Start simple
Learning DevOps to implement data pipelines
How to choose the right tool
Are Hadoop, Docker, Cloud necessary for a first job/internship?
Is Hadoop still relevant or necessary?
Data engineering academy
How to pick up Cloud skills
Avoid huge datasets when learning
Convincing your employer to do data science
How to find Andreas

Links:

LinkedIn: https://www.linkedin.com/in/andreas-kretz

Data engineering cookbook: https://cookbook.learndataengineering.com/

Course: https://learndataengineering.com/

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Matt Francsis (https://www.linkedin.com/in/matthewfrancsis/) joined the show today and talked about his journey from geology to data science. He talked about how a data science bootcamp and a "stepping-stone" job that utilized his geology background ended up helping him break into the field completely.

Watch The Ask Avery Show live Tuesdays at 8 PM: https://www.datacareerjumpstart.com/AskAvery

Add The Ask Avery Show to your calendar: https://calendar.google.com/calendar/ical/c_u2rk36mj5mgqg5g42glm9a741c%40group.calendar.google.com/public/basic.ics

Data quality has become a much-discussed topic in data engineering and data science, and it has become clear that data validation is crucial to ensuring the reliability of any data products and insights produced by an organization’s data pipelines. This session will outline patterns for combining three popular open source tools in the data ecosystem (dbt, Airflow, and Great Expectations) and use them to build a robust data pipeline with data validation at each critical step.
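To make the idea of "validation at each critical step" concrete, here is a minimal sketch in plain Python. It does not use dbt, Airflow, or Great Expectations, and it is not taken from the session: the helper names (`expect_not_null`, `expect_between`, `run_pipeline`), the two steps, and the column names are all hypothetical, chosen only to show checks running after every step so that bad data stops before it reaches the next one.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]
Check = Callable[[List[Row]], None]


def expect_not_null(column: str) -> Check:
    """Build a check that fails if any row has no value in `column`."""
    def check(rows: List[Row]) -> None:
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing:
            raise ValueError(f"{missing} row(s) have no value for {column!r}")
    return check


def expect_between(column: str, low: float, high: float) -> Check:
    """Build a check that fails if a value in `column` is outside [low, high]."""
    def check(rows: List[Row]) -> None:
        for row in rows:
            if not low <= row[column] <= high:
                raise ValueError(f"{column!r} out of range: {row[column]!r}")
    return check


def run_pipeline(rows: List[Row], steps: List[Tuple[Transform, List[Check]]]) -> List[Row]:
    """Run each step, then its checks, before handing data to the next step."""
    for transform, checks in steps:
        rows = transform(rows)
        for check in checks:
            check(rows)  # raises, so bad data never reaches the next step
    return rows


def load_raw(rows: List[Row]) -> List[Row]:
    """Hypothetical ingestion step: pass the source rows through unchanged."""
    return list(rows)


def to_usd(rows: List[Row]) -> List[Row]:
    """Hypothetical transformation step: convert cents to dollars."""
    return [{"id": r["id"], "amount_usd": r["amount_cents"] / 100} for r in rows]


# Each step is paired with the checks that must pass on its output.
STEPS = [
    (load_raw, [expect_not_null("id"), expect_not_null("amount_cents")]),
    (to_usd, [expect_between("amount_usd", 0, 10_000)]),
]

if __name__ == "__main__":
    print(run_pipeline([{"id": 1, "amount_cents": 1250}], STEPS))
```

In a real deployment each step would presumably be a scheduled task and each check a declarative expectation, but the control flow (transform, validate, then continue or fail) is the part this sketch is meant to illustrate.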

At Near, we work on terabytes of location data with close to real-time modelling to generate key consumer insights and estimates for our clients across the globe. We have hundreds of country-specific models deployed and managed through Airflow to achieve this goal. Some of the workflows we have deployed are schedule-based, some are dynamic, and some are trigger-based. In this session, I will discuss some of the workflows that are scheduled and monitored using Airflow, the key benefits, and the challenges we have faced in our production systems.

In this talk, we present Viewflow, an open-source Airflow-based framework that allows data scientists to create materialized views in SQL, R, and Python without writing Airflow code. We will start by explaining what problem Viewflow solves: writing and maintaining complex Airflow code instead of focusing on data science. Then we will see how Viewflow solves that problem. We will continue by showing how to use Viewflow with several real-world examples. Finally, we will see what the upcoming features of Viewflow are!

Resources:

Announcement blog post: https://medium.com/datacamp-engineering/viewflow-fe07353fa068

GitHub repo: https://github.com/datacamp/viewflow