talk-data.com

Topic

Data Analytics

data_analysis statistics insights

760 tagged

Activity Trend: peak of 38 per quarter, 2020-Q1 to 2026-Q1

Activities

760 activities · Newest first

Essential PySpark for Scalable Data Analytics

Dive into the world of scalable data processing with 'Essential PySpark for Scalable Data Analytics'. This book is a comprehensive guide that helps beginners understand and use PySpark to process, analyze, and draw insights from large datasets effectively. With hands-on tutorials and clear explanations, you will gain the confidence to tackle big data analytics challenges.

What this Book will help me do
Understand and apply the distributed computing paradigm for big data.
Learn to perform scalable data ingestion, cleansing, and preparation using PySpark.
Create and utilize data lakes and the Lakehouse paradigm for efficient data storage and access.
Develop and deploy machine learning models with scalability in mind.
Master real-time analytics pipelines and create impactful data visualizations.

Author(s)
Nudurupati is an experienced data engineer and educator, specializing in distributed systems and big data technologies. With years of practical experience in the field, the author brings a clear and approachable teaching style to technical topics and has designed this book to be both practical and inspirational for aspiring data practitioners.

Who is it for?
This book is ideal for data professionals, including data scientists, engineers, and analysts, looking to scale their data analytics processes. It assumes familiarity with basic data science concepts and Python, as well as some experience with SQL-like data analysis. It is particularly suitable for individuals aiming to expand their knowledge of distributed computing and PySpark to handle big data challenges. Achieving scalable and efficient data solutions is at the core of this guide.

Modern Analytics Platforms

From a global pandemic to extreme weather, the events of 2020 and 2021 have caused organizations to make quick and constant adjustments to their strategy and operations. This transformation is likely to continue and have a major impact on analytics. Not only do respondents to Experian's annual Global Data Management survey confirm more demand for data insights, but most of them also believe a lack of agility hurt their organization's responses to fast-changing business needs.

With this O'Reilly report, you'll learn how organizations have begun to take new approaches to analytics for business reinvention and digital transformation. Chief analytics and data officers, along with data analytics, data science, and data visualization leaders, will explore converged analytics and find out how it differs from legacy and current analytics approaches. You'll see where your organization stands in its journey to convergence and what you need to do next.

This report helps you:
Examine how three organizations in different industries and with different objectives have benefited from modern analytics
Learn how analytics has evolved to support greater business agility at scale
Examine the alignment of people, processes, tools, and data in converged analytics
Learn the five stages of analytical competition and six dimensions for benchmarking maturity
Explore practices that you can adopt to improve your analytics capabilities and your agility

Beginning Apache Spark 3: With DataFrame, Spark SQL, Structured Streaming, and Spark Machine Learning Library

Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise on the powerful and efficient distributed data processing engine inside Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming; and the scalable machine learning algorithms and practical utilities for building machine learning applications.

Beginning Apache Spark 3 begins by explaining Spark concepts and architecture, the Spark unified stack, and the different ways of interacting with Apache Spark. Next, it offers an overview of Spark SQL before moving on to its advanced features. It covers tips and techniques for dealing with performance issues, followed by an overview of the structured streaming processing engine. It concludes with a demonstration of how to develop machine learning applications using Spark MLlib and how to manage the machine learning development lifecycle. This book is packed with practical examples and code snippets to help you master concepts and features immediately after they are covered in each section.

After reading this book, you will have the knowledge required to build your own big data pipelines, applications, and machine learning applications.

What You Will Learn
Master the Spark unified data analytics engine and its various components, which work in tandem to provide a scalable, fault-tolerant, and performant data processing engine
Leverage the user-friendly and flexible programming model to perform simple to complex data analytics using DataFrame and Spark SQL
Develop machine learning applications using Spark MLlib
Manage the machine learning development lifecycle using MLflow

Who This Book Is For
Data scientists, data engineers, and software developers.

We talked about:

Rishabh's background
Rishabh’s experience as a sales engineer
Prescriptive analytics vs predictive analytics
The problem with the term ‘data science’
Is machine learning a part of analytics?
Day-to-day of people that work with ML
Rule-based systems to machine learning
The role of analysts in rule-based systems and in data teams
Do data analysts know data better than data scientists?
Data analysts’ documentation and recommendations
Iterative work - data scientists/ML vs data analysts
Analyzing results of experiments
Overlaps between machine learning and analytics
Using tools to bridge the gap between ML and analytics
Do companies overinvest in ML and underinvest in analytics?
Do companies hire data scientists while forgetting to hire data analysts?
The difficulty of finding senior data analysts
Is data science sexier than data analytics?
Should ML and data analytics teams work together or independently?
Building data teams
Rishabh’s newsletter: MLOpsRoundup

Links:

https://mlopsroundup.substack.com/ https://twitter.com/rish_bhargava

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Power Query Cookbook

The "Power Query Cookbook" is your comprehensive guide to mastering data preparation and transformation using Power Query. With this book, you'll learn to connect to data sources, reshape data to fit business requirements, and use both no-code transformations and custom M code solutions to unlock the full potential of your data. Step-by-step examples will guide you through optimizing dataflows in Power BI.

What this Book will help me do
Master connecting to various data sources and performing intuitive transformations using Power Query.
Learn to reshape and enrich data to meet complex business requirements efficiently.
Explore advanced capabilities of Power Query, including M code and online dataflows.
Develop custom data transformations with a blend of GUI-based and M code techniques.
Optimize the performance of Power BI Dataflows using best practices and diagnostics tools.

Author(s)
Janicijevic is a seasoned expert in data analytics, specializing in Microsoft Power BI and Power Query. With years of experience in data engineering and a passion for teaching, the author brings a clear, actionable, and results-driven approach to demystifying complex technical concepts. Their work empowers professionals with the tools they need to excel in data-driven decision-making.

Who is it for?
This book is designed for data analysts, business intelligence developers, and data engineers aiming to enhance their skills in data preparation using Power Query. If you have a basic understanding of Power BI and want to delve into integrating and optimizing data from multiple sources, this book is for you. It's ideal for professionals seeking practical insights and techniques to improve data transformations. Novices with some exposure to BI tools will also find the material accessible and rewarding.

We talked to George Firican from Lights on Data about all things data governance and management. Check out their YouTube! 

As always, please leave a review and subscribe to the Data Career Podcast! 

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Join host Avery Smith on this episode of the Data Career Podcast for an exciting 'Ask Avery' session! We cover various topics, including the roles and differences between data analysts, data engineers, and data scientists, as well as transitioning careers, essential skills for data engineering, forecasting techniques, and more.

If you have questions about data visualization, Python, or breaking into data science, this episode has got you covered.

Tune in for valuable insights and professional advice to boost your data career!

Emily Vu changed her life with a Tweet she thought meant nothing. That tweet ended up being seen by millions, inspiring hundreds, opening dozens of doors, and eventually leading to an internship and a full-time job offer from tech giant Spotify.

In this episode, Emily and I discuss how she used the internet and a personal brand to overcome a non-tech background and land awesome tech jobs.

Follow Emily on Twitter: https://twitter.com/emvutweets

Follow Emily on TikTok: https://www.tiktok.com/@itsemvu?lang=en

Check out Emily's custom resume Kofi: https://ko-fi.com/emilyvu/commissions#buyCommissionModal

Want to break into data science? Check out online bootcamp Data Career Jumpstart - https://www.datacareerjumpstart.com - where I help you learn data science, and build a personal brand by focusing on projects and building an online portfolio. 

Ryan Wade joins us on AOF today to talk about how to use advanced analytics in your organization! Ryan has been in the analytics game for the last 20 years and is now a Senior Solution Consultant at Blue Granite, based in Indianapolis, Indiana. He recently authored the must-read book Advanced Analytics in Power BI with R and Python, and in today's chat, we get to hear all about why he wrote the book, who it is for, and how you can use it to accelerate your data journey!

I met Ryan while speaking at a few conferences and was always impressed with his knowledge and great sense of humor! A professional football player turned data scientist, Ryan has a passion for breaking down advanced analytics in a way anyone can understand. Whether you're already using advanced analytics or researching how to get started, Ryan's knowledge on the topic will help you. Tune in with a pencil and paper in hand!

In this episode, you'll learn:
[0:09:22] The rise of the R and Python programming languages in the data world.
[0:16:44] The necessary, well-thought-out preparatory steps for a project utilizing advanced analytics.
[0:19:39] Why attention-grabbing visuals are not the most important part of data storytelling!
[0:23:13] Creating a sufficient team for data analytics and the vital roles of the database administrator, active directory administrator, and more!
[0:39:07] Client conversations around shortcomings and hurdles in advanced analytics.

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/82

Enjoyed the Show? Please leave us a review on iTunes.

Foundations of Data Intensive Applications

PEEK “UNDER THE HOOD” OF BIG DATA ANALYTICS

The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate.

In Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance. The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You'll learn how to recognize problems in your applications resulting in performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within.

Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system. Ideal for data scientists, data architects, dev-ops engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood shows readers how to:
Identify the foundations of large-scale, distributed data processing systems
Make major software design decisions that optimize performance
Diagnose performance problems and distributed operation issues
Understand state-of-the-art research in big data
Explain and use the major big data frameworks and understand what underpins them
Use big data analytics in the real world to solve practical problems

Data Science for Marketing Analytics - Second Edition

In 'Data Science for Marketing Analytics', you'll embark on a journey that integrates the power of data analytics with strategic marketing. With a focus on practical application, this guide walks you through using Python to analyze datasets, implement machine learning models, and derive data-driven insights.

What this Book will help me do
Gain expertise in cleaning, exploring, and visualizing marketing data using Python.
Build machine learning models to predict customer behavior and sales outcomes.
Leverage unsupervised learning techniques for effective customer segmentation.
Compare and optimize predictive models using advanced evaluation methods.
Master Python libraries like pandas and Matplotlib for data manipulation and visualization.

Author(s)
Mirza Rahim Baig, Gururajan Govindan, and Vishwesh Ravi Shrimali combine their extensive expertise in data analytics and marketing to bring you this comprehensive guide. Drawing from years of applying analytics in real-world marketing scenarios, they provide a hands-on approach to learning data science tools and techniques.

Who is it for?
This book is perfect for marketing professionals and analysts eager to harness the capabilities of Python to enhance their data-driven strategies. It is also ideal for data scientists looking to apply their skills in marketing across various roles. While a basic understanding of data analysis and Python will help, all key concepts are introduced comprehensively for beginners.

Data Analytics Made Easy

By reading "Data Analytics Made Easy," you'll gain a solid understanding of data analysis and visualization without requiring coding skills. This book emphasizes practical knowledge and use cases, covering storytelling, automation, machine learning, and business dashboards with tools like KNIME and Power BI.

What this Book will help me do
Understand the fundamentals of data analytics and how to leverage data for business insights.
Create and automate data workflows using the no-code KNIME Analytics Platform.
Develop interactive dashboards and data visualizations with Microsoft Power BI.
Learn the basics of machine learning and how to apply models for business use.
Enhance presentations and influence decisions through effective data storytelling.

Author(s)
De Mauro is an experienced author and professional in the field of data analytics. Passionate about making complex topics approachable, the author specializes in explaining technical concepts in simpler terms, ensuring readers can easily grasp and apply them in their work.

Who is it for?
This book is perfect for professionals or beginners who want to work with and interpret data effectively. It is ideal for individuals in business roles or management positions looking to enhance their skills in data analytics and build a foundational understanding of machine learning and visualization.

Ways to learn more from Lillian: 

Data Science for Dummies Launch Party: Data Science For Dummies, 3rd Edition hits the streets in September 2021, but not without a proper launch party to celebrate. You’re invited! RSVP here: https://businessgrowth.ai/

The Data Entrepreneur’s Toolkit: A recommendation set for 32 free (or low-cost) tools & processes that'll actually grow your data business (even if you still haven’t put up that website yet!). https://www.data-mania.com/data-entrepreneur-toolkit/

The Data Superhero Quiz: A fun, free 45-second quiz that uncovers the ideal data career path for your personality type and skill set. https://data-mania.com/data-superhero-quiz

Weekly Free Trainings: We currently publish 2 free trainings per week on YouTube! https://www.youtube.com/channel/UCK4MGP0A6lBjnQWAmcWBcKQ

Want to break into data science? Check out my new course coming out on August 18th: Data Career Jumpstart - https://www.datacareerjumpstart.com

Knowledge Graphs

Applying knowledge in the right context is the most powerful lever businesses can use to become agile, creative, and resilient. Knowledge graphs add context, meaning, and utility to business data. They drive intelligence into data for unparalleled automation and visibility into processes, products, and customers. Businesses use knowledge graphs to anticipate downstream effects, make decisions based on all relevant information, and quickly respond to dynamic markets.

In this report for chief information and data officers, Jesus Barrasa, Amy E. Hodler, and Jim Webber from Neo4j show how to use knowledge graphs to gain insights, reveal a flexible and intuitive representation of complex data relationships, and make better predictions based on holistic information.

Explore knowledge graph mechanics and common organizing principles
Build and exploit a connected representation of your enterprise data environment
Use decisioning knowledge graphs to explore the advantages of adding relationships to data analytics and data science
Conduct virtual testing using software versions of real-world processes
Deploy knowledge graphs for more trusted data, higher accuracies, and better reasoning for contextual AI

Connect with Florin Badita: https://www.linkedin.com/in/baditaflorin/

Want to break into data science? Check out my new course coming out on August 18th: Data Career Jumpstart - https://www.datacareerjumpstart.com

Subscribe on YouTube: https://www.youtube.com/channel/UCuyfszBAd3gUt9vAbC1dfqA

Designing Big Data Platforms

DESIGNING BIG DATA PLATFORMS

Provides expert guidance and valuable insights on getting the most out of Big Data systems

An array of tools is currently available for managing and processing data: some are ready-to-go solutions that can be immediately deployed, while others require complex and time-intensive setups. With such a vast range of options, choosing the right tool to build a solution can be complicated, as can determining which tools work well with each other. Designing Big Data Platforms provides clear and authoritative guidance on the critical decisions necessary for successfully deploying, operating, and maintaining Big Data systems. This highly practical guide helps readers understand how to process large amounts of data with well-known Linux tools and database solutions, use effective techniques to collect and manage data from multiple sources, transform data into meaningful business insights, and much more.

Author Yusuf Aytas, a software engineer with a vast amount of big data experience, discusses the design of the ideal Big Data platform: one that meets the needs of data analysts, data engineers, data scientists, software engineers, and a spectrum of other stakeholders across an organization. Detailed yet accessible chapters cover key topics such as stream data processing, data analytics, data science, data discovery, and data security.

This real-world manual for Big Data technologies:
Provides up-to-date coverage of the tools currently used in Big Data processing and management
Offers step-by-step guidance on building a data pipeline, from basic scripting to distributed systems
Highlights and explains how data is processed at scale
Includes an introduction to the foundation of a modern data platform

Designing Big Data Platforms: How to Use, Deploy, and Maintain Big Data Systems is a must-have for all professionals working with Big Data, as well as researchers and students in computer science and related fields.

Connect with Dustin Schimek! https://www.linkedin.com/in/dustinschimek/

Want to break into data science? Check out my new course coming out later this summer: Data Career Jumpstart - https://www.datacareerjumpstart.com

Subscribe on YouTube: https://www.youtube.com/channel/UCuyfszBAd3gUt9vAbC1dfqA

Want to leave a question for the Ask Avery Show?

Written Mailbag: https://forms.gle/78zD544drpDAcTRV9

Audio Mailbag: https://anchor.fm/datacareerpodcast/message

In this episode, I interview Mark Freeman and talk about how he transitioned from public health to data science! We talk about what worked well in his journey, and what didn't, including a $20,000 investment gone sideways. Mark also gives some amazing LinkedIn job hacks! 

Connect with Mark on LinkedIn: https://www.linkedin.com/in/mafreeman2/ 

Check out openings at Humu (Mark's company): https://boards.greenhouse.io/humu

Want to break into data science? Check out my new course coming out later this summer: Data Career Jumpstart - https://www.datacareerjumpstart.com

Subscribe on YouTube: https://www.youtube.com/channel/UCuyfszBAd3gUt9vAbC1dfqA

In this episode, I talk with Andreas Kretz (https://www.linkedin.com/in/andreas-kretz/), who is an amazing resource for the data engineering community. He runs an incredibly affordable data engineering bootcamp called Learn Data Engineering (https://learndataengineering.com) and also has an extensive YouTube channel (https://www.youtube.com/channel/UCY8mzqqGwl5_bTpBY9qLMAA).

We talked about how Andreas got started with data engineering, why he likes it so much, and how others can get started. I also share my story of interviewing with Facebook for a data engineering position.

Want to break into data science? Check out my new course coming out later this summer: Data Career Jumpstart - https://www.datacareerjumpstart.com

Want to leave a question for the Ask Avery Show?

Written Mailbag: https://forms.gle/78zD544drpDAcTRV9

Audio Mailbag: https://anchor.fm/datacareerpodcast/message

Want to be on The Ask Avery Show? Sign up for a spot here:

https://calendly.com/datacareer/ask-avery?month=2021-05

Subscribe on YouTube: https://www.youtube.com/channel/UCuyfszBAd3gUt9vAbC1dfqA
