Python

144: Why Should You Build Projects as a Data Analyst (Thu Vu’s Story)

2025-01-21 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Thu Vu (Thu Vu (personal brand)) , Avery Smith

AI/ML Analytics Data Analytics Data Science LLM

How do you make data analytics fun and engaging? In this episode, I chat with YouTube sensation Thu Vu. We discuss Python's growing significance, trends in the data job market, plus get a sneak peek into her new initiative, Python for AI Projects. 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator ⌚ TIMESTAMPS 05:54 - Creating cool projects with Local LLMs 13:48 - Learning and Teaching Python for AI 24:09 - Trends in Data and Tech Job Market 🔗 CONNECT WITH THU VU 🎥 YouTube Channel: https://www.youtube.com/@Thuvu5 🤝 LinkedIn: https://www.linkedin.com/in/thu-hien-vu-3766b174/ 📸 Instagram: https://www.instagram.com/thuvu.analytics/ 🎵 TikTok: https://www.tiktok.com/@thuvu.datanalytics 💻 Website: https://thuhienvu.com/ Free Data Science & AI tips thu-vu.ck.page/49c5ee08f6 Master Python for AI projects python-course-earlybird.framer.website 🔗 CONNECT WITH AVERY 🎥 YouTube Channel: https://www.youtube.com/@averysmith 🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/ 📸 Instagram: https://instagram.com/datacareerjumpstart 🎵 TikTok: https://www.tiktok.com/@verydata 💻 Website: https://www.datacareerjumpstart.com/ Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Writing a custom scikit-learn estimator

2025-01-14 · January Members Talk evening

talk

by Tamara Atanasoska (:probably..)

scikit-learn

Scikit-learn is a popular machine learning library. It currently has over 200 estimators ready to use for a vast array of use cases. What if you are working on something special that still hasn't found its way into the library? Scikit-learn offers a way to write new compatible estimators, which can be seamlessly integrated with the rest of the library. We will look into what an estimator is, what API that scikit-learn estimators have, reasons why you would like to implement your own and an example of how to. We will end with real-world examples of how other OSS projects use this for their needs.

Contributing to OpenSource - how to get started in 5 minutes!

2025-01-14 · January Members Talk evening

talk

by Stefanie Senger (:probabl.)

open source scikit-learn

This talk will introduce scikit-learn users to the new API for metadata routing, a feature introduced in the recent releases and almost fully available since version 1.5 (released in May 2024).

CSVs Will Never Die And OneSchema Is Counting On It

2025-01-13 · Data Engineering Podcast Listen

podcast_episode

by Andrew Luo (OneSchema) , Tobias Macey

AI/ML CRM CSV Data Engineering Data Management Datafold SQL

Summary In this episode of the Data Engineering Podcast Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the challenges of working with CSVs, including inconsistent type representation, lack of schema information, and technical complexities, and explains how OneSchema addresses these issues using multiple CSV parsers and AI for data type inference and validation. Andrew highlights the business case for OneSchema, emphasizing efficiency gains for companies dealing with large volumes of CSV data, and shares plans to expand support for other data formats and integrate AI-driven transformation packs for specific industries.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Andrew Luo about how OneSchema addresses the headaches of dealing with CSV data for your businessInterview IntroductionHow did you get involved in the area of data management?Despite the years of evolution and improvement in data storage and interchange formats, CSVs are just as prevalent as ever. What are your opinions/theories on why they are so ubiquitous?What are some of the major sources of CSV data for teams that rely on them for business and analytical processes?The most obvious challenge with CSVs is their lack of type information, but they are notorious for having numerous other problems. What are some of the other major challenges involved with using CSVs for data interchange/ingestion?Can you describe what you are building at OneSchema and the story behind it?What are the core problems that you are solving, and for whom?Can you describe how you have architected your platform to be able to manage the variety, volume, and multi-tenancy of data that you process?How have the design and goals of the product changed since you first started working on it?What are some of the major performance issues that you have encountered while dealing with CSV data at scale?What are some of the most surprising things that you have learned about CSVs in the process of building OneSchema?What are the most interesting, innovative, or unexpected ways that you have seen OneSchema used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on OneSchema?When is OneSchema the wrong choice?What do you have planned for the future of OneSchema?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links OneSchemaEDI == Electronic Data InterchangeUTF-8 BOM (Byte Order Mark) CharactersSOAPCSV RFCIcebergSSIS == SQL Server Integration ServicesMS AccessDatafusionJSON SchemaSFTP == Secure File Transfer ProtocolThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

#74 Hello 2025! OpenAI’s O3, Deep Seek V3, Bolt.new and Doom Goes Artsy

2025-01-09 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

AI/ML DynamoDB LLM Pydantic

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style! In this episode, we explore: OpenAI’s O3: Features, O1 Comparison, Release Date & more.Advent of Code: How LLMs performed on the 2024 coding challenges.DeepSeek V3: A breakthrough AI model developed for a fraction of GPT-4’s cost, yet rivaling top benchmarks.Shadow Workspace: How Cursor compares to Copilot with features like integrated models, documentation, and search.Bolt.new: Why it’s poised to revolutionize web app development with prompt-driven innovation.O1 Preview’s Chess Hack: When smarter means “cheater” in a fascinating experiment against Stockfish.Pydantic AI: A new tool bringing structure and intelligence to Python’s AI workflows.RightTyper: A tool to infer and apply type hints for cleaner, more efficient Python code.Doom: The Gallery Experience: A whimsical take on art appreciation in a retro gaming environment.Suno V4: The next-gen music generator, featuring "Bart, the Data Dynamo."Ghostty Terminal: The terminal emulator developers are raving about.

Breaking Down Data Silos: AI and ML in Master Data Management

2025-01-03 · Data Engineering Podcast Listen

podcast_episode

by Dan Bruckner (Tamr) , Tobias Macey

AI/ML Collibra Data Engineering Data Management Datafold LLM Master Data Management

Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the necessity of human oversight to ensure accuracy and trust.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world; like in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of businessInterview IntroductionHow did you get involved in the area of data management?Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?How does that reconciliation relate to the practice of "master data management"What are the scaling challenges with the current set of practices for reconciling data?ML has been applied to data cleaning for a long time in the form of entity resolution, etc. How has the landscape evolved or matured in recent years?What (if any) transformative capabilities do LLMs introduce?What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?When is ML/AI the wrong choice for data cleaning/reconciliation?What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links TamrMaster Data ManagementCERNLHCMichael StonebrakerConway's LawExpert SystemsInformation RetrievalActive LearningThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Julia Quick Syntax Reference: A Pocket Guide for Data Science Programming

2025-01-03 · O'Reilly Data Science Books O'Reilly Amazon

book

by Antonello Lobianco

AI/ML API Data Science data data-science

Learn the Julia programming language as quickly as possible. This book is a must-have reference guide that presents the essential Julia syntax in a well-organized format, updated with the latest features of Julia’s APIs, libraries, and packages. This book provides an introduction that reveals basic Julia structures and syntax; discusses data types, control flow, functions, input/output, exceptions, metaprogramming, performance, and more. Additionally, you'll learn to interface Julia with other programming languages such as R for statistics or Python. At a more applied level, you will learn how to use Julia packages for data analysis, numerical optimization, symbolic computation, and machine learning, and how to present your results in dynamic documents. The Second Edition delves deeper into modules, environments, and parallelism in Julia. It covers random numbers, reproducibility in stochastic computations, and adds a section on probabilistic analysis. Finally, it provides forward-thinking introductions to AI and machine learning workflows using BetaML, including regression, classification, clustering, and more, with practical exercises and solutions for self-learners. What You Will Learn Work with Julia types and the different containers for rapid development Use vectorized, classical loop-based code, logical operators, and blocks Explore Julia functions: arguments, return values, polymorphism, parameters, anonymous functions, and broadcasts Build custom structures in Julia Use C/C++, Python or R libraries in Julia and embed Julia in other code. Optimize performance with GPU programming, profiling and more. Manage, prepare, analyse and visualise your data with DataFrames and Plots Implement complete ML workflows with BetaML, from data coding to model evaluation, and more. Who This Book Is For Experienced programmers who are new to Julia, as well as data scientists who want to improve their analysis or try out machine learning algorithms with Julia.

Episode 214: Advent of Code in BQN (vs Python)

2024-12-27 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA) , Ben Deane

C++ GitHub

In this episode, Conor and Ben chat about different approaches to solving Advent of Code problems in BQN, Python and more. Link to Episode 214 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)Socials ADSP: The Podcast: TwitterConor Hoekstra: Twitter | BlueSky | MastodonBen Deane: Twitter | BlueSkyShow Notes Date Generated: 2024-12-16 Date Released: 2024-12-27 Advent of Code 2024Conor's AoC Video PlaylistAPLBQNPython functools.cacheTo Mock a MockingbirdCppNorth 2024 Keynote: Advent of Code, Behind the Scenes - Eric WastlBQN ‿∘ (Computed Reshape)BQN •ParseFloatBQN •BQNBQN ⍟ (repeat)C++20 std::views::iotaHaskell iteratePython Counter collectionBQN /⁼ Indices Inverse Histogram IdiomBQN AoC 2024 LeaderboardIntro Song Info Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Building a Data Vision Board: A Guide to Strategic Planning

2024-12-23 · Data Engineering Podcast Listen

podcast_episode

by Lior Barak , Tobias Macey

AI/ML Data Engineering Data Management Datafold KPI

Summary In this episode of the Data Engineering Podcast Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision board" and explains how it can help organizations outline their strategic vision by considering three key forces: regulation, stakeholders, and organizational goals. Lior emphasizes the importance of balancing short-term pressures with long-term strategic goals, quantifying the cost of data issues to prioritize effectively, and maintaining the strategic vision as a living document through regular reviews. He encourages data teams to shift from being enablers to impact creators and provides practical advice on implementing a data vision board, setting clear KPIs, and embracing a product mindset to create tangible business impacts through strategic data management.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementIt’s 2024, why are we still doing data migrations by hand? Teams spend months—sometimes years—manually converting queries and validating data, burning resources and crushing morale. Datafold's AI-powered Migration Agent brings migrations into the modern era. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today to learn how Datafold can automate your migration and ensure source to target parity. Your host is Tobias Macey and today I'm interviewing Lior Barak about how to develop your three year strategic vision for dataInterview IntroductionHow did you get involved in the area of data management?Can you start by giving an outline of the types of problems that occur as a result of not developing a strategic plan for an organization's data systems?What is the format that you recommend for capturing that strategic vision?What are the types of decisions and details that you believe should be included in a vision statement?Why is a 3 year horizon beneficial? What does that scale of time encourage/discourage in the debate and decision-making process?Who are the personas that should be included in the process of developing this strategy document?Can you walk us through the steps and processes involved in developing the data vision board for an organization?What are the time-frames or milestones that should lead to revisiting and revising the strategic objectives?What are the most interesting, innovative, or unexpected ways that you have seen a data vision strategy used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on data strategy development?When is a data vision board the wrong choice?What are some additional resources or practices that you recommend teams invest in as a supplement to this strategic vision exercise?Contact Info LinkedInSubstackParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links Vision Board OverviewEpisode 397: Defining A Strategy For Your Data ProductsMinto Pyramid PrincipleKPI == Key Performance IndicatorOKR == Objectives and Key ResultsPhil Jackson: Eleven Rings (affiliate link)The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

9 Hidden Data Visualization Tricks to Transform Your Visuals using Plotly library in Python

2024-12-20 · Data & AI with Mukundan | Learn AI by Building Listen

podcast_episode

by Mukundan Sankar

AI/ML BI DataViz Matplotlib Plotly Seaborn

Key Takeaways: 1. Why Plotly is a Game-Changer Unlike Matplotlib or Seaborn, Plotly offers interactive and dynamic visualizations that are perfect for storytelling.Unlock powerful features that go beyond basic bar charts or scatter plots.2. 9 Hidden Plotly Tricks: Custom Pairwise Correlation Matrix: Add annotations and custom color scales for deeper insights.Dynamic Data Highlighting: Like Excel, conditional formatting but on steroids.Density Contours: Visualize class distribution and clustering with ease.Faceted Histograms: Compare subgroups in a single view.Threshold Lines: Emphasize decision boundaries effectively.Custom Annotations: Turn visuals into storytelling tools.3D Scatter Plots: Explore invisible relationships in 3D.Animated Visualizations: Reveal dynamic patterns over time.Interactive Tooltips: Make charts engaging and informative.3. Real-world Applications Business intelligence, scientific research, and education examples.Techniques aren’t just about aesthetics—they’re about actionable insights.4. Bonus Resources Complete code examples are in the links below: Medium Members: https://medium.com/towards-artificial-intelligence/9-hidden-plotly-tricks-every-data-scientist-needs-to-know-eb7f2181df56Non-Medium Members can read for Free here: https://mukundansankar.substack.com/p/9-hidden-plotly-tricks-every-dataDatasets from the UCI Machine Learning Repository for hands-on practice.https://archive.ics.uci.edu/datasetsTwitter: @sankarmukund475

Snowflake Recipes: A Problem-Solution Approach to Implementing Modern Data Pipelines

2024-12-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by John Eipe , Dillon Dayton

Agile/Scrum AI/ML AWS Azure BigQuery Cloud Computing Data Governance GCP Microsoft Cyber Security Snowflake SQL +3 more

Explore Snowflake’s core concepts and unique features that differentiates it from industry competitors, such as, Azure Synapse and Google BigQuery. This book provides recipes for architecting and developing modern data pipelines on the Snowflake data platform by employing progressive techniques, agile practices, and repeatable strategies. You’ll walk through step-by-step instructions on ready-to-use recipes covering a wide range of the latest development topics. Then build scalable development pipelines and solve specific scenarios common to all modern data platforms, such as, data masking, object tagging, data monetization, and security best practices. Throughout the book you’ll work with code samples for Amazon Web Services, Microsoft Azure, and Google Cloud Platform. There’s also a chapter devoted to solving machine learning problems with Snowflake. Authors Dillon Dayton and John Eipe are both Snowflake SnowPro Core certified, specializing in data and digital services, and understand the challenges of finding the right solution to complex problems. The recipes in this book are based on real world use cases and examples designed to help you provide quality, performant, and secured data to solve business initiatives. What You’ll Learn Handle structured and un- structured data in Snowflake. Apply best practices and different options for data transformation. Understand data application development. Implement data sharing, data governance and security. Who This book Is For Data engineers, scientists and analysts moving into Snowflake, looking to build data apps. This book expects basic knowledge in Cloud (AWS or Azure or GCP), SQL and Python

Essential Data Analytics, Data Science, and AI: A Practical Guide for a Data-Driven World

2024-12-18 · O'Reilly Data Science Books O'Reilly Amazon

book

by Maxine Attobrah

AI/ML Analytics Cloud Computing Data Analytics Data Science ETL/ELT Tableau data data-science

In today’s world, understanding data analytics, data science, and artificial intelligence is not just an advantage but a necessity. This book is your thorough guide to learning these innovative fields, designed to make the learning practical and engaging. The book starts by introducing data analytics, data science, and artificial intelligence. It illustrates real-world applications, and, it addresses the ethical considerations tied to AI. It also explores ways to gain data for practice and real-world scenarios, including the concept of synthetic data. Next, it uncovers Extract, Transform, Load (ETL) processes and explains how to implement them using Python. Further, it covers artificial intelligence and the pivotal role played by machine learning models. It explains feature engineering, the distinction between algorithms and models, and how to harness their power to make predictions. Moving forward, it discusses how to assess machine learning models after their creation, with insights into various evaluation techniques. It emphasizes the crucial aspects of model deployment, including the pros and cons of on-device versus cloud-based solutions. It concludes with real-world examples and encourages embracing AI while dispelling fears, and fostering an appreciation for the transformative potential of these technologies. Whether you’re a beginner or an experienced professional, this book offers valuable insights that will expand your horizons in the world of data and AI. What you will learn: What are Synthetic data and Telemetry data How to analyze data using programming languages like Python and Tableau. What is feature engineering What are the practical Implications of Artificial Intelligence Who this book is for: Data analysts, scientists, and engineers seeking to enhance their skills, explore advanced concepts, and stay up-to-date with ethics. Business leaders and decision-makers across industries are interested in understanding the transformative potential and ethical implications of data analytics and AI in their organizations.

Modern Business Analytics

2024-12-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Deanne Larson

Analytics Data Science GitHub business-intelligence data data-science

Deriving business value from analytics is a challenging process. Turning data into information requires a business analyst who is adept at multiple technologies including databases, programming tools, and commercial analytics tools. This practical guide shows programmers who understand analysis concepts how to build the skills necessary to achieve business value. Author Deanne Larson, data science practitioner and academic, helps you bridge the technical and business worlds to meet these requirements. You'll focus on developing these skills with R and Python using real-world examples. You'll also learn how to leverage methodologies for successful delivery. Learning methodology combined with open source tools is key to delivering successful business analytics and value. This book shows you how to: Apply business analytics methodologies to achieve successful results Cleanse and transform data using R and Python Use R and Python to complete exploratory data analysis Create predictive models to solve business problems in R and Python Use Python, R, and business analytics tools to handle large volumes of data Commit code to GitHub to collaborate with data engineers and data scientists Measure success in business analytics

How Orchestration Impacts Data Platform Architecture

2024-12-16 · Data Engineering Podcast Listen

podcast_episode

by Hugo Lu , Tobias Macey

AI/ML Collibra Data Engineering Data Management Datafold SQL

Summary The core task of data engineering is managing the flows of data through an organization. In order to ensure those flows are executing on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose impacts the ways that you architect the rest of your data platform. In this episode Hugo Lu shares his thoughts as the founder of an orchestration company on how to think about data orchestration and data platform design as we navigate the current era of data engineering.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementIt’s 2024, why are we still doing data migrations by hand? Teams spend months—sometimes years—manually converting queries and validating data, burning resources and crushing morale. Datafold's AI-powered Migration Agent brings migrations into the modern era. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today to learn how Datafold can automate your migration and ensure source to target parity. As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world, from big picture questions like AI governance and data sharing to more nuanced questions like, how do we balance offense and defense in data management? In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.Your host is Tobias Macey and today I'm interviewing Hugo Lu about the data platform and orchestration ecosystem and how to navigate the available optionsInterview IntroductionHow did you get involved in building data platforms?Can you describe what an orchestrator is in the context of data platforms?There are many other contexts in which orchestration is necessary. What are some examples of how orchestrators have adapted (or failed to adapt) to the times?What are the core features that are necessary for an orchestrator to have when dealing with data-oriented workflows?Beyond the bare necessities, what are some of the other features and design considerations that go into building a first-class dat platform or orchestration system?There have been several generations of orchestration engines over the past several years. How would you characterize the different coarse groupings of orchestration engines across those generational boundaries?How do the characteristics of a data orchestrator influence the overarching architecture of an organization's data platform/data operations?What about the reverse?How have the cycles of ML and AI workflow requirements impacted the design requirements for data orchestrators?What are the most interesting, innovative, or unexpected ways that you have seen data orchestrators used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on data orchestration?When is an orchestrator the wrong choice?What are your predictions and/or hopes for the future of data orchestration?Contact Info MediumLinkedInParting Question From your perspective, what is the biggest thing data teams are missing in the technology today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links OrchestraPrevious Episode: Overview Of The State Of Data OrchestrationCronArgoCDDAGKubernetesData MeshAirflowSSIS == SQL Server Integration ServicesPentahoKettleDataVoloNiFiPodcast EpisodeDagstergRPCCoalescePodcast EpisodedbtDataHubPalantirThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

How to Use Dynamic Topic Modeling to Boost Your Marketing and Strategy

2024-12-13 · Data & AI with Mukundan | Learn AI by Building Listen

podcast_episode

by Mukundan Sankar

AI/ML Marketing

Episode Summary: In this episode, Mukundan simplifies the concept of Dynamic Topic Modeling (DTM) for listeners and discusses its transformative impact on businesses. DTM is a machine learning method used to track the evolution of themes in text data over time. It helps companies to make smarter decisions by staying in tune with customer needs and market trends. Key Topics Covered: Introduction to Dynamic Topic ModelingWhat it is and why it matters for businesses.Real-world examples like customer reviews and social media trends.How Dynamic Topic Modeling WorksOver time, analyze text data (e.g., reviews, surveys, reports).Groups words into topics such as price, quality, or features.Applications of Dynamic Topic ModelingAdjusting marketing strategies to customer priorities.Enhancing product features based on evolving feedback.Predicting and responding to trends like sustainability in physical products.Tracking employee feedback to refine HR strategies and reduce churn.Step-by-Step Guide to Implementing DTMCollecting text data (e.g., reviews, surveys).Using tools like Python or pre-built software for analysis.Generating clear visuals and actionable insights.Benefits for BusinessesUnderstanding customer and employee feedback more effectively.Staying ahead of competitors.Saving time while making informed, data-driven decisions.Call to ActionEncourage listeners to explore DTM to gain a competitive edge.Mukundan invites questions and collaboration via email: mukundansankar.substack.com.Memorable Quotes: "Dynamic Topic Modeling helps businesses turn text data into actionable business strategies.""With DTM, you can stay ahead of competitors by understanding what customers truly care about over time.""It's not just about making decisions but smarter decisions driven by data."Real-Life Examples: Amazon Reviews: How DTM categorizes feedback into price, durability, and other topics.Marketing Adjustments: Shifting focus to features customers prioritize.Trend Analysis: Tracking the rise of sustainability in customer demands.Employee Insights: Using DTM to predict trends in employee satisfaction and churn.Resources Mentioned: Dynamic Topic Modeling Tools: Python and other software solutions for beginners and professionals.Email for Guidance: mukundansankar.substack.com

Building a Visual RAG application with Vespa in Python

2024-12-12 · AI Meetup (December): GenAI, LLMs and RAG

talk

RAG document search pdfs vespa

Thomas will go through the process of building an end-to-end Visual RAG application over PDFs in Vespa, using Python only. The accompanying code is open source, and participants will get valuable insights and tips for building their own Visual RAG applications, and how to make RAG applications scalable and fast. He will also touch on some recent promising directions in document search and retrieval.

Das A in IoT steht für AI (Eine Kurzgeschichte der künstlichen Intelligenz für IoT Profis)

2024-12-12 · [PartnerMeetup]Das A in IoT steht für AI || Potenzial in der IIoT-Visualisierung

talk

by DOMINIK DESCHNER (medialesson GmbH) , TIM STEINER (medialesson GmbH)

IoT ai c# jupyter notebooks ku00fcnstliche intelligenz

Die Anwendung von KI-Modellen wird zunehmend benutzerfreundlicher und für jedermann zugänglich. Dies erweist sich insbesondere in der aufstrebenden Welt des Internet of Things (IoT) als äußerst bedeutsam. Die Verschmelzung dieser beiden Technologien eröffnet faszinierende, neue Möglichkeiten, um bedeutende Erkenntnisse aus den eigenen Daten zu gewinnen. Doch wie es oft der Fall ist, gestaltet sich der Einstieg nicht immer ganz leicht. Aus diesem Grund erklären wir euch in diesem Talk nicht nur die notwendigen Fachbegriffe, sondern erarbeiten die praktisch relevanten Schlüsselkonzepte im Code - ganz ohne höhere Mathematik und geben euch so einen AI-Kickstart mit Python und C#. Also ran an die Jupyter Notebooks und starte mit uns eine Reise in die wundervolle Welt der künstlichen Intelligenz und finde heraus welche Actions und Insights in deinen Sensordaten schlummern!

Data Visualization in R and Python

2024-12-12 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Marco Cremonini

AI/ML Data Science DataViz Plotly Seaborn programming-languages software-development

Communicate the data that is powering our changing world with this essential text The advent of machine learning and neural networks in recent years, along with other technologies under the broader umbrella of ‘artificial intelligence,’ has produced an explosion in Data Science research and applications. Data Visualization, which combines the technical knowledge of how to work with data and the visual and communication skills required to present it, is an integral part of this subject. The expansion of Data Science is already leading to greater demand for new approaches to Data Visualization, a process that promises only to grow. Data Visualization in R and Python offers a thorough overview of the key dimensions of this subject. Beginning with the fundamentals of data visualization with Python and R, two key environments for data science, the book proceeds to lay out a range of tools for data visualization and their applications in web dashboards, data science environments, graphics, maps, and more. With an eye towards remarkable recent progress in open-source systems and tools, this book offers a cutting-edge introduction to this rapidly growing area of research and technological development. Data Visualization in R and Python readers will also find: Coverage suitable for anyone with a foundational knowledge of R and Python Detailed treatment of tools including the Ggplot2, Seaborn, and Altair libraries, Plotly/Dash, Shiny, and others Case studies accompanying each chapter, with full explanations for data operations and logic for each, based on Open Data from many different sources and of different formats Data Visualization in R and Python is ideal for any student or professional looking to understand the working principles of this key field.

DuckDB: Up and Running

2024-12-12 · O'Reilly Data Science Books O'Reilly Amazon

book

by Wei-Meng Lee

Analytics CSV Data Analytics DuckDB JSON Pandas Parquet Polars SQL data data-science data-science-tools

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool. Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL. Understand the purpose of DuckDB and its main functions Conduct data analytics tasks using DuckDB Integrate DuckDB with pandas, Polars, and JupySQL Use DuckDB to query your data Perform spatial analytics using DuckDB's spatial extension Work with a diverse range of data including Parquet, CSV, and JSON

atelier d’initiation

2024-12-09 · Journée portes ouvertes du Wagon Paris

workshop

HTML css

Au choix : codez votre première page web avec HTML & CSS ou mettez-vous dans la peau d’un(e) data analyst pour Airbnb avec Python

talk-data.com

Activity Trend

Top Events

Top Speakers

144: Why Should You Build Projects as a Data Analyst (Thu Vu’s Story)

Writing a custom scikit-learn estimator

Contributing to OpenSource - how to get started in 5 minutes!

CSVs Will Never Die And OneSchema Is Counting On It

#74 Hello 2025! OpenAI’s O3, Deep Seek V3, Bolt.new and Doom Goes Artsy

Breaking Down Data Silos: AI and ML in Master Data Management

Julia Quick Syntax Reference: A Pocket Guide for Data Science Programming

Episode 214: Advent of Code in BQN (vs Python)

Building a Data Vision Board: A Guide to Strategic Planning

9 Hidden Data Visualization Tricks to Transform Your Visuals using Plotly library in Python

Snowflake Recipes: A Problem-Solution Approach to Implementing Modern Data Pipelines

Essential Data Analytics, Data Science, and AI: A Practical Guide for a Data-Driven World

Modern Business Analytics

How Orchestration Impacts Data Platform Architecture

How to Use Dynamic Topic Modeling to Boost Your Marketing and Strategy

Building a Visual RAG application with Vespa in Python

Das A in IoT steht für AI (Eine Kurzgeschichte der künstlichen Intelligenz für IoT Profis)

Data Visualization in R and Python

DuckDB: Up and Running

atelier d’initiation