talk-data.com

Topic

Dataflow

Google Cloud Dataflow

data_processing stream_processing google_cloud


Activity Trend

Peak of 8 activities per quarter, 2020-Q1 to 2026-Q1

Activities

50 activities · Newest first

Keynote by Lisa Amini: What’s Next in AI for Data and Data Management?

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

The world of data is being reset by AI, and the infrastructure needs to evolve with it. I sit down with streaming legend Tyler Akidau to discuss how the principles of stream processing are forming the foundation for the next generation of "agentic AI" systems. Tyler, who was an AI cynic until recently, explains why he's now convinced that AI agents will fundamentally change how businesses operate and what problems we need to solve to deploy them safely. Key topics we explore:

From Human Analytics to Agentic Systems: How data architectures built for human analysis must be re-imagined for a world with thousands of AI agents operating at machine speed.

Auditing Everything: Why managing AI requires a new level of governance where we must record all data an agent touches, not just metadata, to diagnose its complex and opaque behavior.

The End of Windowing's Dominance: Tyler reflects on the influential Dataflow paper he co-authored and explains why he now sees a table-based abstraction as a more powerful and user-friendly model than focusing on windowing.

The D&D Alignment of AI: Tyler's brilliant analogy for why enterprises are struggling to adopt AI: we're trying to integrate "chaotic" agents into systems built for "lawful good" employees.

A Reset for the Industry: Why the rise of AI feels like the early 2010s of streaming, where the problems are unsolved and everyone is trying to figure out the answers.

Python notebooks are a workhorse of scientific computing. But traditional notebooks have problems — they suffer from a reproducibility crisis; they are difficult to use with interactive widgets; their file format does not play well with Git; and they aren't reusable like regular Python scripts or modules.

This talk presents marimo, an open-source reactive Python notebook that addresses these concerns by modeling notebooks as dataflow graphs and storing them as Python files. We discuss design decisions and their tradeoffs, and show how these decisions make marimo notebooks reproducible in execution and packaging, Git-friendly, executable as scripts, and shareable as apps.
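To make the dataflow-graph model concrete, here is a minimal sketch of what a marimo notebook file looks like on disk, based on marimo's public App/cell API (the cell contents are illustrative). Each cell is a function; its parameters and return values are the edges of the graph, so editing the cell that defines x automatically re-runs the cell that consumes it.

# A minimal marimo notebook, stored as a plain Python file.
import marimo

app = marimo.App()


@app.cell
def _():
    # Defining x makes this cell the upstream node for any cell that uses x.
    x = 1
    return (x,)


@app.cell
def _(x):
    # This cell depends on x, so marimo re-runs it whenever x changes.
    y = x + 1
    print(y)
    return (y,)


if __name__ == "__main__":
    app.run()

Because the file is ordinary Python, it diffs cleanly in Git and can be run as a script, which is exactly the reproducibility and reuse story the talk describes.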

session
by Suneetha Sarala (Palo Alto Networks), Mehran Nazir (Google Cloud), James Chang (Palo Alto Networks), Franklyn D'Souza (Shopify), Haroon Dogar (Google Cloud), Kshetrajna Radhaven (Shopify)

Leveraging real-time data in AI and machine learning (ML) can give you a competitive edge. This session explores how Shopify and Palo Alto Networks leverage real-time data and AI with BigQuery and Dataflow ML to transform customer experiences and drive innovation. Discover how these companies collect, process, and analyze real-time data to achieve significant business outcomes, and learn how to apply similar strategies in your organization.

Build robust ETL pipelines on Google Cloud! This hands-on lab teaches you to use Dataflow (Python) and BigQuery to ingest and transform public datasets. Learn design considerations and implementation details to create effective data pipelines for your needs.
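As a rough illustration of the pattern this lab walks through (not the lab's exact code), the sketch below reads CSV lines from Cloud Storage, transforms them, and writes rows to BigQuery using the Apache Beam Python SDK; the project, bucket, table, and schema are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder options; a real Dataflow job would also set credentials, etc.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

def parse_line(line: str) -> dict:
    # Illustrative schema: one name column and one integer value column.
    name, value = line.split(",")
    return {"name": name, "value": int(value)}

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
        | "Parse" >> beam.Map(parse_line)
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",
            schema="name:STRING,value:INTEGER",
        )
    )

Swapping the runner for DirectRunner lets you test the same pipeline locally before submitting it to Dataflow.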

If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account for both your work domain and personal email address. You will need to authenticate your account as well (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite. You can follow this link to sign up!

Kir Titievsky, Product Manager at Google Cloud with extensive experience in streaming and storage infrastructure, joined Yuliia and Dumky to talk about streaming. Drawing from his work with Apache Kafka, Cloud Pub/Sub, Dataflow, and Cloud Storage since 2015, Kir explains the fundamental differences between streaming and micro-batch processing. He challenges common misconceptions about streaming costs, explaining how streaming can be significantly less expensive than batch processing for many use cases. Kir shares insights on the "service bus architecture" revival, discussing how modern distributed messaging systems have solved historic bottlenecks while creating new opportunities for business and performance needs.

Kir's Medium: https://medium.com/@kir-gcp
Kir's LinkedIn: https://www.linkedin.com/in/kir-titievsky-%F0%9F%87%BA%F0%9F%87%A6-7775052/

Data Engineering with Google Cloud Platform - Second Edition

Data Engineering with Google Cloud Platform is your ultimate guide to building scalable data platforms using Google Cloud technologies. In this book, you will learn how to leverage products such as BigQuery, Cloud Composer, and Dataplex for efficient data engineering. Expand your expertise and gain practical knowledge to excel in managing data pipelines within the Google Cloud ecosystem.

What this book will help me do

Understand foundational data engineering concepts using Google Cloud Platform.
Learn to build and manage scalable data pipelines with tools such as Dataform and Dataflow.
Explore advanced topics like data governance and secure data handling in Google Cloud.
Boost readiness for Google Cloud data engineering certification with real-world exam guidance.
Master cost-effective strategies and CI/CD practices for data engineering on Google Cloud.

Author(s)

Adi Wijaya, the author of this book, is a Strategic Cloud Engineer at Google with extensive experience in data engineering and the Google Cloud ecosystem. With his hands-on expertise, he emphasizes practical solutions and in-depth knowledge sharing, guiding readers through the intricacies of Google Cloud for data engineering success.

Who is it for?

This book is ideal for data analysts, IT practitioners, software engineers, and data enthusiasts aiming to excel in data engineering. Whether you're a beginner tackling fundamental concepts or an experienced professional exploring Google Cloud's advanced capabilities, this book is designed for you. It bridges your current skills with modern data engineering practices on Google Cloud, making it a valuable resource at any stage of your career.

Take the next step in your AI/ML journey with streaming data. Learn to deploy and manage complete ML pipelines to run inference and predictions, classify images, run remote inference calls, build a custom model handler, and much more with the latest innovations in Dataflow ML. Learn how Spotify leveraged Dataflow for large-scale generation of ML podcast previews and how they plan to keep pushing the boundaries of what’s possible with data engineering and data science to build better experiences for their customers and creators.
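For context, Dataflow ML's inference features are built around Beam's RunInference transform. The sketch below shows the general shape using a scikit-learn model handler; the model path is a placeholder, and handlers for PyTorch, TensorFlow, remote endpoints, and the custom model handlers mentioned above follow the same pattern.

import apache_beam as beam
import numpy
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Placeholder path to a pickled scikit-learn model in Cloud Storage.
handler = SklearnModelHandlerNumpy(model_uri="gs://my-bucket/model.pickle")

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Examples" >> beam.Create([numpy.array([1.0, 2.0]), numpy.array([3.0, 4.0])])
        | "Infer" >> RunInference(handler)  # yields PredictionResult elements
        | "Print" >> beam.Map(print)
    )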


Businesses everywhere have the opportunity to drive transformational impact by leveraging streaming data to make decisions and build experiences that delight users. In this session you will learn how MercadoLibre processes tens of billions of messages across thousands of applications to drive business impact. You will also learn about exciting new product announcements ranging from native ingest capabilities in Cloud Pub/Sub to new efficiency features in Dataflow to support for OSS technologies.
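To make the streaming pattern concrete, here is a minimal Beam Python sketch of the Pub/Sub-to-BigQuery path such a pipeline follows; the subscription, table, and schema are placeholders, and a production pipeline would add error handling, windowing, and dead-lettering.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions(streaming=True)) as pipeline:
    (
        pipeline
        # Pub/Sub delivers raw bytes; the subscription path is a placeholder.
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/my-sub")
        | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.events",
            schema="user:STRING,action:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )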


Attention developers! Are you struggling with the complexities of integrating AI/ML into your apps? Join this practical session where we'll explore how MongoDB Atlas and Google Cloud offerings like Vertex AI, Gemini, Codey, BigQuery, and Dataflow provide a comprehensive toolkit for developers. By completing this session, you'll have the tools and confidence to embark on your own AI/ML journey! By attending this session, your contact information may be shared with the sponsor for relevant follow-up for this event only.


Natural language is an ideal interface for many real-time applications such as inventory tracking, patient journeys, field sales, and other on-the-go situations. However, these real-time applications also require up-to-date and accurate information, which necessitates a real-time RAG architecture. In this session, we will demonstrate how you can build an accurate and up-to-date real-time generative AI application using a combination of Dataflow and graph databases.
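One way to picture that architecture is a streaming pipeline that keeps the retrieval store current as source data changes. In the sketch below, only the Beam and Pub/Sub calls are real APIs; embed() and upsert_node() are hypothetical stand-ins for an embedding model and a graph-database client, which a concrete system would fill in with real services.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def embed(text: str) -> list[float]:
    # Hypothetical call to an embedding model (e.g. a Vertex AI endpoint).
    raise NotImplementedError

def upsert_node(doc: dict) -> None:
    # Hypothetical upsert into a graph database's vector index.
    raise NotImplementedError

def enrich(record: bytes) -> dict:
    doc = json.loads(record.decode("utf-8"))
    doc["embedding"] = embed(doc["text"])
    return doc

with beam.Pipeline(options=PipelineOptions(streaming=True)) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/updates")
        | "Embed" >> beam.Map(enrich)
        | "Upsert" >> beam.Map(upsert_node)
    )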


We will introduce some core Bigtable data concepts, write some data and explore it in the Cloud Console. Then we'll jump into using techniques to analyze the data in other tools, primarily BigQuery and Looker. We will set up the "Bigtable change streams to BigQuery" Dataflow pipeline, ingest data, query the change log in BigQuery and use Looker to create a visual dashboard. Throughout, we'll compare and contrast different ways to work with your big data in Bigtable to build a foundational understanding of best practices.
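For the "write some data" step, one option is the google-cloud-bigtable Python client; a minimal sketch, with placeholder project, instance, table, and column-family names:

from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# Row keys in Bigtable are byte strings; design them around your read patterns.
row = table.direct_row(b"sensor#42#2024-01-01T00:00:00")
row.set_cell("metrics", "temperature", "21.5")
row.commit()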


U.S. floods cause ~$3B in damage annually. The National Oceanic and Atmospheric Administration predicts changing water levels, giving scientists and managers time to act. However, the massive archive of forecasts is too complex for typical users. Learn how BYU and the University of Alabama, with SADA and Google, are using BigQuery, Cloud Run, Dataflow, and API Gateway to make these forecasts accessible for mobile apps, flood-warning systems, and more, addressing crucial concerns like rising river levels or the likelihood of flooding.


Connected vehicle telemetry contains data that can be used to gain insights into vehicle performance, driver behavior, and fleet operations using AI technology. We will discuss how Ford uses Bigtable to collect, store, and analyze connected vehicle telemetry data in conjunction with BigQuery, Pub/Sub, and Dataflow, a recipe applicable to many time series use cases. Hear some of the insights we have gained from this data, how we have used them to improve our fleet operations, and some new Bigtable features we're most excited about.
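A detail worth calling out for telemetry workloads like this: Bigtable read performance hinges on row-key design. A common time-series pattern, sketched below with illustrative names (not Ford's actual schema), prefixes the key with the entity ID and appends a reversed timestamp so the newest readings for a vehicle sort first.

import sys
import time

def telemetry_row_key(vehicle_id: str, event_time_ms: int) -> bytes:
    # Reversing the timestamp makes a prefix scan return newest rows first,
    # while the vehicle prefix keeps one vehicle's readings contiguous.
    reversed_ts = sys.maxsize - event_time_ms
    return f"{vehicle_id}#{reversed_ts}".encode("utf-8")

key = telemetry_row_key("vin-1FA6P8TH", int(time.time() * 1000))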


Summary

Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchak, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Andrey Korchak about how to manage data in a fintech environment.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by summarizing the data challenges that are particular to the fintech ecosystem?
What are the primary sources and types of data that fintech organizations are working with?

What are the business-level capabilities that are dependent on this data?

How do the regulatory and business requirements influence the technology landscape in fintech organizations?

What does a typical build vs. buy decision process look like?

Fraud prediction in banks, for example, is one of the most well-established applications of machine learning in industry. What are some of the other ways that ML plays a part in fintech?

How does that influence the architectural design/capabilities for data platforms in those organizations?

Data governance is a notoriously challenging problem. What are some of the strategies that fintech companies are able to apply to this problem given their regulatory burdens?
What are the most interesting, innovative, or unexpected approaches to data management that you have seen in the fintech sector?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data in fintech?
What do you have planned for the future of your data capabilities at Monite?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Monite
ISO 27001
Tesseract
GitOps
SWIFT Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Starburst

This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics.

Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance, allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free: dataengineeringpodcast.com/starburst

Rudderstack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Support Data Engineering Podcast

Damian Filonowicz: Lessons Learned from the GCP Data Migration

Join Damian Filonowicz as he shares 'Lessons Learned from the GCP Data Migration.' 🌐 Discover how PAYBACK tackled challenges in shifting data to the cloud, navigated privacy regulations, and uncovered insights about Google Cloud services like Cloud Dataflow, Cloud DLP, BigQuery, and more. Gain valuable suggestions for future endeavors in this enlightening presentation! 🚀🔍 #DataMigration #GCP #lessonslearned

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services

This book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform. Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models. The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects. Readers will learn how to set up a Google Colaboratory account and run Jupyter notebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and SQL.

What You Will Learn

Set up a GCP account and project
Explore BigQuery and its use cases, including machine learning
Understand Google Cloud AI Platform and its capabilities
Use Vertex AI for training and deploying machine learning models
Explore Google Cloud Dataproc and its use cases for big data processing
Create and share data visualizations and reports with Looker Data Studio
Explore Google Cloud Dataflow and its use cases for batch and stream data processing
Run data processing pipelines on Cloud Dataflow
Explore Google Cloud Storage and its use cases for data storage
Get an introduction to Google Cloud SQL and its use cases for relational databases
Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming

Who This Book Is For

Data scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects
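As a taste of the BigQuery material, here is a minimal query from Python with the google-cloud-bigquery client, which also works inside a Colaboratory notebook; the project ID is a placeholder, while the dataset is a real BigQuery public dataset.

from google.cloud import bigquery

client = bigquery.Client(project="my-project")
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
# Runs the query and iterates over the result rows.
for row in client.query(query).result():
    print(row.name, row.total)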

Summary

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: it’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Decodable is and the story behind it?

What are the notable changes to the Decodable platform since we last spoke? (October 2021)
What are the industry shifts that have influenced the product direction?

What are the problems that customers are trying to solve when they come to Decodable?
When you launched, your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?
What are the developer experience challenges that are particular to working with streaming data?

How have you worked to address that in the Decodable platform and interfaces?

As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
When is Decodable the wrong choice?
What do you have planned for the future of Decodable?

Contact Info

esammer on GitHub
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Decodable (Podcast Episode)
Understanding the Apache Flink Journey
Flink (Podcast Episode)
Debezium (Podcast Episode)
Kafka
Redpanda (Podcast Episode)
Kinesis
PostgreSQL (Podcast Episode)
Snowflake (Podcast Episode)
Databricks
Startree
Pinot (Podcast Episode)
Rockset (Podcast Episode)
Druid
InfluxDB
Samza
Storm
Pulsar (Podcast Episode)
ksqlDB (Podcast Episode)
dbt
GitHub Actions
Airbyte
Singer
Splunk
Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Neo4j: NODES Conference

NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks:

- Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript
- Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g. graph neural networks, responsible AI)
- Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)

Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!

Rudderstack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Datafold

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…

Pro Power BI Architecture: Development, Deployment, Sharing, and Security for Microsoft Power BI Solutions

This book provides detailed guidance around architecting and deploying Power BI reporting solutions, including help and best practices for sharing and security. You’ll find chapters on dataflows, shared datasets, composite model and DirectQuery connections to Power BI datasets, deployment pipelines, XMLA endpoints, and many other important features related to the overall Power BI architecture that are new since the first edition. You will gain an understanding of what functionality each of the Power BI components provides (such as Dataflow, Shared Dataset, Datamart, thin reports, and paginated reports), so that you can make an informed decision about what components to use in your solution. You will get to know the pros and cons of each component, and how they all work together within the larger Power BI architecture. Commonly encountered problems you will learn to handle include content unexpectedly changing while users are in the process of creating reports and building analyses, methods of sharing analyses that don’t cover all the requirements of your business or organization, and inconsistent security models. Detailed examples help you to understand and choose from among the different methods available for sharing and securing Power BI content so that only intended recipients can see it. The knowledge provided in this book will allow you to choose an architecture and deployment model that suits the needs of your organization. It will also help ensure that you do not spend your time maintaining your solution, but on using it for its intended purpose: gaining business value from mining and analyzing your organization’s data.

What You Will Learn

Architect Power BI solutions that are reliable and easy to maintain
Create development templates and structures in support of reusability
Set up and configure the Power BI gateway as a bridge between on-premises data sources and the Power BI cloud service
Select a suitable connection type—Live Connection, DirectQuery, Scheduled Refresh, or Composite Model—for your use case
Choose the right sharing method for how you are using Power BI in your organization
Create and manage environments for development, testing, and production
Secure your data using row-level and object-level security
Save money by choosing the right licensing plan

Who This Book Is For

Data analysts and developers who are building reporting solutions around Power BI, as well as architects and managers who are responsible for the big picture of how Power BI meshes with an organization’s other systems, including database and data warehouse systems.