Python

Streamlit for Data Science - Second Edition

2023-09-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tyler Richards

AI/ML Cloud Computing Data Science DataViz LLM NumPy Pandas data data-science

Streamlit for Data Science is your complete guide to mastering the creation of powerful, interactive data-driven applications using Python and Streamlit. With this comprehensive resource, you'll learn everything from foundational Streamlit skills to advanced techniques like integrating machine learning models and deploying apps to cloud platforms, enabling you to significantly enhance your data science toolkit. What this Book will help me do Master building interactive applications using Streamlit, including techniques for user interfaces and integrations. Develop visually appealing and functional data visualizations using Python libraries in Streamlit. Learn to integrate Streamlit applications with machine learning frameworks and tools like Hugging Face and OpenAI. Understand and apply best practices to deploy Streamlit apps to cloud platforms such as Streamlit Community Cloud and Heroku. Improve practical Python skills through implementing end-to-end data applications and prototyping data workflows. Author(s) Tyler Richards, the author of Streamlit for Data Science, is a senior data scientist with in-depth practical experience in building data-driven applications. With a passion for Python and data visualization, Tyler leverages his knowledge to help data professionals craft effective and compelling tools. His teaching approach combines clarity, hands-on exercises, and practical relevance. Who is it for? This book is written for data scientists, engineers, and enthusiasts who use Python and want to create dynamic data-driven applications. With a focus on those who have some familiarity with Python and libraries like Pandas or NumPy, it assists readers in building on their knowledge by offering tailored guidance. Perfect for those looking to prototype data projects or enhance their programming toolkit.

FiftyOne Workshop: Basics and Hands-on Introduction

2023-09-27 · Sept 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Dan Gural (Voxel51)

computer vision fiftyone

A 90-minute hands-on workshop covering FiftyOne basics, workflows to explore and curate data, and how FiftyOne represents unstructured computer vision data, followed by a practical session to load datasets from the FiftyOne Dataset Zoo, navigate the FiftyOne App, programmatically inspect attributes of a dataset, add new samples and custom attributes, generate and evaluate model predictions, and save insightful views into the data.

FiftyOne Workshop: Basics and Hands-on Introduction

2023-09-27 · Sept 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Dan Gural (Voxel51)

computer vision dataset zoo fiftyone fiftyone app

Two-part workshop led by Dan Gural. Part 1 covers FiftyOne basics (terms, architecture, installation, general usage), an overview of useful workflows to explore, understand, and curate data, and how FiftyOne represents and semantically slices unstructured computer vision data. Part 2 is a hands-on introduction to FiftyOne: loading datasets from the FiftyOne Dataset Zoo, navigating the FiftyOne App, programmatically inspecting attributes, adding new samples and custom attributes, generating and evaluating model predictions, and saving insightful views into the data. Prerequisites: working knowledge of Python and basic computer vision. Attendees will gain access to tutorials, videos, and code examples used in the workshop.

FiftyOne Workshop: FiftyOne Basics and Hands-on Introduction

2023-09-27 · Sept 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Dan Gural (Voxel51)

fiftyone

A 90-minute hands-on workshop on leveraging the FiftyOne computer vision toolset. Part 1 covers FiftyOne Basics (terms, architecture, installation, and general usage) and an overview of useful workflows to explore, understand, and curate your data, plus how FiftyOne represents and semantically slices unstructured computer vision data. Part 2 provides a hands-on introduction to FiftyOne: loading datasets from the FiftyOne Dataset Zoo, navigating the FiftyOne App, programmatically inspecting attributes, adding new samples and custom attributes, generating and evaluating model predictions, and saving insightful views into the data. Prerequisites: working knowledge of Python and basic computer vision.

FiftyOne Workshop: Introduction and Hands-on Session

2023-09-27 · Sept 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Dan Gural (Voxel51)

computer vision fiftyone

A 90-minute hands-on workshop by Dan Gural (machine learning engineer, Voxel51) covering FiftyOne basics, useful workflows to explore and curate data, and how FiftyOne represents and semantically slices unstructured computer vision data, followed by a hands-on session to load datasets from the FiftyOne Dataset Zoo, navigate the FiftyOne App, inspect attributes programmatically, add samples and custom attributes, generate and evaluate model predictions, and save insightful views. Prerequisites: working knowledge of Python. Attendees get access to the tutorials, videos, and code examples used in the workshop.

Powering Vector Search With Real Time And Incremental Vector Indexes

2023-09-25 · Data Engineering Podcast Listen

podcast_episode

by Louis Brandy , Tobias Macey

AI/ML Analytics BI CI/CD Cloud Computing CSV Data Engineering Data Management Data Quality Datafold dbt Git +7 more

Summary

The rapid growth of machine learning, especially large language models, have led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free! If you’re a data person, you probably have to jump between different tools to run queries, build visualizations, write Python, and send around a lot of spreadsheets and CSV files. Hex brings everything together. Its powerful notebook UI lets you analyze data in SQL, Python, or no-code, in any combination, and work together with live multiplayer and version control. And now, Hex’s magical AI tools can generate queries and code, create visualizations, and even kickstart a whole analysis for you – all from natural language prompts. It’s like having an analytics co-pilot built right into where you’re already doing your work. Then, when you’re ready to share, you can use Hex’s drag-and-drop app builder to configure beautiful reports or dashboards that anyone can use. Join the hundreds of data teams like Notion, AllTrails, Loom, Mixpanel and Algolia using Hex every day to make their work more impactful. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial of the Hex Team plan! Your host is Tobias Macey and today I'm interviewing Louis Brandy about building vector indexes in real-time for analytics and AI applications

Interview

Introduction How did you get involved in the area of data management? Can you describe what vector search is and how it differs from other search technologies?

What are the technical challenges related to providing vector search? What are the applications for vector search that merit the added complexity?

Vector databases have been gaining a lot of attention recently with the proliferation of LLM applicati

Building Linked Data Products With JSON-LD

2023-09-17 · Data Engineering Podcast Listen

podcast_episode

by Brian Platz , Tobias Macey

AI/ML Analytics BI CI/CD Cloud Computing CSV Data Engineering Data Management Data Quality Datafold dbt Git +6 more

Summary

A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free! If you’re a data person, you probably have to jump between different tools to run queries, build visualizations, write Python, and send around a lot of spreadsheets and CSV files. Hex brings everything together. Its powerful notebook UI lets you analyze data in SQL, Python, or no-code, in any combination, and work together with live multiplayer and version control. And now, Hex’s magical AI tools can generate queries and code, create visualizations, and even kickstart a whole analysis for you – all from natural language prompts. It’s like having an analytics co-pilot built right into where you’re already doing your work. Then, when you’re ready to share, you can use Hex’s drag-and-drop app builder to configure beautiful reports or dashboards that anyone can use. Join the hundreds of data teams like Notion, AllTrails, Loom, Mixpanel and Algolia using Hex every day to make their work more impactful. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial of the Hex Team plan! Your host is Tobias Macey and today I'm interviewing Brian Platz about using JSON-LD for building linked-data products

Interview

Introduction How did you get involved in the area of data management? Can you describe what the term "linked data product" means and some examples of when you might build one?

What is the overlap between knowledge graphs and "linked data products"?

What is JSON-LD?

What are the domains in which it is typically used? How does it assist in developing linked data products?

what are the characterist

Learning Data Science

2023-09-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Sam Lau , Joseph Gonzalez , Deborah Nolan

Data Collection Data Science Pandas data data-science

As an aspiring data scientist, you appreciate why organizations rely on data for important decisions—whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data. Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas. Refine a question of interest to one that can be studied with data Pursue data collection that may involve text processing, web scraping, etc. Glean valuable insights about data through data cleaning, exploration, and visualization Learn how to use modeling to describe the data Generalize findings beyond the data

75: Want to be a Data Analyst? Learn These Skills w/ Luke Barousse

2023-09-13 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith , Luke Barousse

AI/ML Analytics BI Data Analytics Power BI SQL

What skills should you learn when studying to be a Data Analyst?

Join me with data legend Luke Barousse to discuss where you should focus your time.

Is it Python? Is it SQL? Is it Excel? Is it Power BI?

Listen to find out 👀

Connect with Luke Barousse:

🤝 Connect on Linkedin

▶️ Subscribe on Youtube

📊 Datanerd.tech

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(03:42) - Analyzing 1.2M data jobs (DataNerd.tech)

(06:21) - The most important data skills

(12:13) - More senior skills

(22:52) - Data job titles

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Episode 146: 🇸🇮 SRT23 - Algorithms, BQN's Superpowers & More!

2023-09-08 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA)

C++ GitHub

In this episode, Conor and Bryce record live from Italy while driving to Venice and chat about improvements to our parallel std::unique implementation, essential data structures, our favorite algorithms revisited and BQN’s superpowers. Link to Episode 146 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)Twitter ADSP: The PodcastConor HoekstraBryce Adelstein LelbachShow Notes Date Recorded: 2023-06-21 Date Released: 2023-09-08 C++11 std::uniquethrust::uniquethrust::inclusive_scanC++17 std::transform_reduceHaskell’s outerProductC++17 std::reduceC++17 std::inclusive_scanNVIDIA cucollections (cuco)HyperLogLogC++23 std::views::chunk_byCTCI: Cracking the coding interview by Gayle Laakmann McDowellBigOCheatSheet.comPython listPython setPython dictionary (hashmap)Python collectionsPython sortedcollectionsBQN ⁼ (undo)BQN / (indices)J :. (obverse)BQN ⌾ (under)CombinatoryLogic.comPsi Combinator:BQN ○ (atop)Haskell’s onHaskell groupByIntro Song Info Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library

2023-09-04 · Data Engineering Podcast Listen

podcast_episode

by Tobias Macey , Adrian Brudaru (dlthub)

Airbyte Analytics BI CI/CD Cloud Computing Data Engineering Data Management Data Quality Datafold dbt ETL/ELT Fivetran +7 more

Summary

Cloud data warehouses and the introduction of the ELT paradigm has led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free! This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold Your host is Tobias Macey and today I'm interviewing Adrian Brudaru about dlt, an open source python library for data loading

Interview

Introduction How did you get involved in the area of data management? Can you describe what dlt is and the story behind it?

What is the problem you want to solve with dlt? Who is the target audience?

The obvious comparison is with systems like Singer/Meltano/Airbyte in the open source space, or Fivetran/Matillion/etc. in the commercial space. What are the complexities or limitations of those tools that leave an opening for dlt? Can you describe how dlt is implemented? What are the benefits of building it in Python? How have the design and goals of the project changed since you first started working on it? How does that language choice influence the performance and scaling characteristics? What problems do users solve with dlt? What are the interfaces available for extending/customizing/integrating with dlt? Can you talk through the process of adding a new source/destination? What is the workflow for someone building a pipeline with dlt? How does the experience scale when supporting multiple connections? Given the limited scope of extract and load, and the composable design of dlt it seems like a purpose built companion to dbt (down to th

Python Data Analytics: With Pandas, NumPy, and Matplotlib

2023-09-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Fabio Nelli

AI/ML Analytics Data Analytics DataViz JavaScript Keras Matplotlib NumPy Pandas PyTorch Scikit-learn TensorFlow +3 more

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Python and its related libraries, and includes coverage of social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Third Edition is an invaluable reference with its examples of storing, accessing, and analyzing data. What You'll Learn Understand the core concepts of data analysis and the Python ecosystem Go in depth with pandas for reading, writing, and processing data Use tools and techniques for data visualization and image analysis Examine popular deep learning libraries Keras, Theano,TensorFlow, and PyTorch Who This Book Is For Experienced Python developers who need to learn about Pythonic tools for data analysis

Building Statistical Models in Python

2023-08-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by Huy Hoang Nguyen , Stuart J Miller , Paul N Adams

Data Science data data-science data-science-tasks statistics

Building Statistical Models in Python is your go-to guide for mastering statistical modeling techniques using Python. By reading this book, you will explore how to use Python libraries like stats models and others to tackle tasks such as regression, classification, and time series analysis. What this Book will help me do Develop a deep practical knowledge of statistical concepts and their implementation in Python. Create regression and classification models to solve real-world problems. Gain expertise analyzing time series data and generating valuable forecasts. Learn to perform hypothesis verification to interpret data correctly. Understand survival analysis and apply it in various industry scenarios. Author(s) Huy Hoang Nguyen, Paul N Adams, and Stuart J Miller bring their extensive expertise in data science and Python programming to the table. With years of professional experience in both industry and academia, they aim to make statistical modeling approachable and applicable. Combining technical depth with hands-on coding, their goal is to ensure readers not only understand the theory but also gain confidence in its application. Who is it for? This book is tailored for beginners and intermediate programmers seeking to learn statistical modeling without a prerequisite in mathematics. It's ideal for data analysts, data scientists, and Python enthusiasts who want to leverage statistical models to gain insights from data. With this book, you will journey from the basics to advanced applications, making it perfect for those who aim to master statistical analysis.

Hands-on with FiftyOne

2023-08-30 · Aug 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Leila Kaneda (Voxel51)

dataset zoo fiftyone fiftyone app

A hands-on introduction to FiftyOne: load datasets from the FiftyOne Dataset Zoo, navigate the FiftyOne App, programmatically inspect attributes of a dataset, add new samples and custom attributes to a dataset, generate and evaluate model predictions, and save insightful views into the data.

FiftyOne Basics

2023-08-30 · Aug 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Leila Kaneda (Voxel51)

fiftyone

An introduction to FiftyOne basics: terms, architecture, installation, and general usage, plus an overview of useful workflows to explore, understand, and curate your data, and how FiftyOne represents and semantically slices unstructured computer vision data.

FiftyOne Workshop: Basics and Hands-on Intro

2023-08-30 · Aug 2023 - Getting Started with FiftyOne Computer Vision Workshop

workshop

by Leila Kaneda (Voxel51)

computer vision fiftyone

A 90-minute hands-on workshop led by Leila Kaneda, Machine Learning Engineer at Voxel51. Part 1 covers FiftyOne Basics (terms, architecture, installation, and general usage) and an overview of useful workflows to explore, understand, and curate your data, plus how FiftyOne represents and semantically slices unstructured computer vision data. The second half is a hands-on introduction to FiftyOne: loading datasets from the FiftyOne Dataset Zoo, navigating the FiftyOne App, programmatically inspecting attributes of a dataset, adding new samples and custom attributes, generating and evaluating model predictions, and saving insightful views into the data. Prerequisites: working knowledge of Python.

Some detailed examples of Langchain Chains and Agents

2023-08-29 · Chains and Agents in Langchain

talk

agents langchain

There will be some Python, but Python is kind of a neutral language and I'm not overly concerned about coding details.

Mastering Tableau 2023 - Fourth Edition

2023-08-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Marleen Meier

AI/ML Analytics BI Data Governance Data Quality DataViz ETL/ELT Tableau data data-science data-science-tasks data-visualization

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau. What this Book will help me do Master the latest Tableau 2023 features and use cases to tackle analytics challenges. Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation. Integrate Tableau with programming languages such as Python and R to enhance analytics. Create engaging, visually impactful dashboards for effective data storytelling. Understand and apply data governance to ensure data quality and compliance. Author(s) Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing. Who is it for? This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

Building An Internal Database As A Service Platform At Cloudflare

2023-08-28 · Data Engineering Podcast Listen

podcast_episode

by Vignesh Ravichandran (Cloudflare) , Tobias Macey

Analytics BI CI/CD Cloud Computing Cloudflare Data Engineering Data Management Data Quality Datafold dbt Modern Data Stack SaaS +3 more

Summary

Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low latency and high uptime services at global scale. This is an interesting and insightful look at pragmatic engineering for reliability and scale.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free! Your host is Tobias Macey and today I'm interviewing Vignesh Ravichandran about building an internal database as a service platform at Cloudflare

Interview

Introduction How did you get involved in the area of data management? Can you start by describing the different database workloads that you have at Cloudflare?

What are the different methods that you have used for managing database instances?

What are the requirements and constraints that you had to account for in designing your current system? Why Postgres? optimizations for Postgres

simplification from not supporting multiple engines

limitations in postgres that make multi-tenancy challenging scale of operation (data volume, request rate What are the most interesting, innovative, or unexpected ways that you have seen your DBaaS used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on your internal database platform? When is an internal database as a service the wrong choice? What do you have planned for the future of Postgres hosting at Cloudflare?

Contact Info

LinkedIn Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Mac

Democratizing Causality - Aleksander Molak

2023-08-25 · DataTalks.Club Listen

podcast_episode

by Aleksander Molak

AI/ML GitHub HTML LLM MLOps NLP

We talked about:

Aleksander's background Aleksander as a Causal Ambassador Using causality to make decisions Counterfactuals and and Judea Pearl Meta-learners vs classical ML models Average treatment effect Reducing causal bias, the super efficient estimator, and model uplifting Metrics for evaluating a causal model vs a traditional ML model Is the added complexity of a causal model worth implementing? Utilizing LLMs in causal models (text as outcome) Text as treatment and style extraction The viability of A/B tests in causal models Graphical structures and nonparametric identification Aleksander's resource recommendations

Links:

The Book of Why: https://amzn.to/3OZpvBk Causal Inference and Discovery in Python: https://amzn.to/46Pperr Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

talk-data.com

Activity Trend

Top Events

Top Speakers

Streamlit for Data Science - Second Edition

FiftyOne Workshop: Basics and Hands-on Introduction

FiftyOne Workshop: Basics and Hands-on Introduction

FiftyOne Workshop: FiftyOne Basics and Hands-on Introduction

FiftyOne Workshop: Introduction and Hands-on Session

Powering Vector Search With Real Time And Incremental Vector Indexes

Building Linked Data Products With JSON-LD

Learning Data Science

75: Want to be a Data Analyst? Learn These Skills w/ Luke Barousse

Episode 146: 🇸🇮 SRT23 - Algorithms, BQN's Superpowers & More!

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library

Python Data Analytics: With Pandas, NumPy, and Matplotlib

Building Statistical Models in Python

Hands-on with FiftyOne

FiftyOne Basics

FiftyOne Workshop: Basics and Hands-on Intro

Some detailed examples of Langchain Chains and Agents

Mastering Tableau 2023 - Fourth Edition

Building An Internal Database As A Service Platform At Cloudflare

Democratizing Causality - Aleksander Molak