talk-data.com
People (4 results)
See all 4 →Activities & events
| Title & Speakers | Event |
|---|---|
|
Panel Discussion | Building Effective Data Teams: Strategies for Success
2024-12-06 · 23:14
Paul Andrew
,
Olena Kutsenko
,
Martin Zuern
,
Gunnar Morling
– Software Engineer and open-source enthusiast
@ Decodable
🌟 Session Overview 🌟 Session Name: Building Effective Data Teams: Strategies for Success Speaker: Gunnar Morling, Martin Zuern, Olena Kutsenko, Paul Andrew Session Description: Panel Discussion will explore the key strategies for assembling and nurturing high-performing data teams. Expert panelists will discuss best practices for recruiting top talent, fostering collaboration, and creating a culture of innovation within data teams. The session will also address common challenges such as skill gaps, team dynamics, and aligning data initiatives with business goals. 🚀 About Big Data and RPA 2024 🚀 Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨ 📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀 🔗 Find Other Years' Videos: 2023 Big Data Conference Europe https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g 2022 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT 2021 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP 💡 Stay Connected & Updated 💡 Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop! 🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/ 👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/ 🐦 Twitter: @BigDataConfEU, @europe_rpa 🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/ 🎥 YouTube: http://www.youtube.com/@DATAMINERLT |
|
|
Gunnar Morling: Data Contracts In Practice With Debezium and Apache Flink
2024-12-06 · 22:25
Gunnar Morling
– Software Engineer and open-source enthusiast
@ Decodable
🌟 Session Overview 🌟 Session Name: Data Contracts In Practice With Debezium and Apache Flink Speaker: Gunnar Morling Session Description: Log-based change data capture (CDC) is an invaluable part of the data engineering toolbox: it enables a variety of use cases such as real-time analytics, full-text search, or cache invalidation by publishing data change events from your database. But when publishing change event streams across context or team boundaries, aren’t you tying external consumers to your application’s data model, thus limiting yourself in evolving the same? Enter data contracts—consciously designed abstractions between your internal data model and the outside world. Come and join us for this session to learn about: Challenges you may encounter when exposing table-level change event streams and how data contracts can mitigate them Implementation strategies for data contracts, such as the outbox pattern and stream processing Evolving your data model and the corresponding data contracts without breaking any existing consumers We’ll also touch on some advanced topics at the intersection of CDC and stream processing, such as hydrating partial change events, using the popular change stream processing duo of Debezium and Apache Flink. 🚀 About Big Data and RPA 2024 🚀 Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨ 📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀 🔗 Find Other Years' Videos: 2023 Big Data Conference Europe https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g 2022 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT 2021 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP 💡 Stay Connected & Updated 💡 Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop! 🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/ 👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/ 🐦 Twitter: @BigDataConfEU, @europe_rpa 🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/ 🎥 YouTube: http://www.youtube.com/@DATAMINERLT |
|
|
DuckDB in Action
2024-08-21
Dive into DuckDB and start processing gigabytes of data with ease—all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save you hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB—all through hands-on examples. Open up DuckDB in Action and learn how to: Read and process data from CSV, JSON and Parquet sources both locally and remote Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables Use DuckDB from Python, both with SQL and its "Relational"-API, interacting with databases but also data frames Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation—you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines. About the Technology DuckDB makes data analytics fast and fun! You don’t need to set up a Spark or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres. About the Book DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action. What's Inside Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Fast-paced SQL recap: From simple queries to advanced analytics About the Reader For data pros comfortable with Python and CLI tools. About the Authors Mark Needham is a blogger and video creator at @LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j. Quotes I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy! - Jordan Tigani, Founder, MotherDuck An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB. - Pramod Sadalage, Director, Thoughtworks Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals. - Qiusheng Wu, Associate Professor, University of Tennessee Excellent! The book all we ducklings have been waiting for! - Gunnar Morling, Decodable |
O'Reilly Data Science Books
|
|
Accelerating decision intelligence with low code ML platforms
2024-06-18 · 20:30
Irina Ioana Brudaru
– Principal Engineering and Science Lead, Data
@ EQOM Group
low-code ml platforms
decision intelligence
|
|
|
Journey to a true self-service platform for Data Analytics Teams.
2024-06-18 · 19:50
Christian Hiroz
– Senior Data Engineer
@ SumUp
self-service platform
Data Analytics
|
|
|
From Postgres to OpenSearch in No Time: Using Debezium for log-based change data capture (CDC) and Apache Flink for stream processing
2024-06-18 · 19:10
Gunnar Morling
– Software Engineer
@ Decodable
postgresql
opensearch
debezium
cdc
flink
|
|
|
From Postgres to OpenSearch in No Time
2023-11-30 · 19:00
Gunnar Morling
– Software Engineer
@ Decodable
Abstract: You've been tasked with implementing a data streaming pipeline for propagating data changes from your operational Postgres database to a search index in OpenSearch. Data views in OS should be denormalized for fast querying, and of course there should be no noticeable impact on the production database. In this session we'll discuss how to build this data pipeline using two popular open-source projects: Debezium for log-based change data capture (CDC) and Apache Flink for stream processing. Join us for this talk and learn about: * Setting up change data streams with Debezium * Efficiently building nested data structures from 1:n joins * Deployment options: Kafka Connect vs. Flink CDC |
|
|
On the Journey of Redefining Stream Processing: What We Learned from Building RisingWave?
2023-11-30 · 18:35
Yingjun Wu
– Speaker
@ RisingWave Labs
Abstract: RisingWave is an open-source streaming database designed from scratch for the cloud. It implemented a Snowflake-style storage-compute separation architecture to reduce performance cost, and provides users with a PostgreSQL-like experience for stream processing. Over the last three years, RisingWave has evolved from a one-person project to a rapidly-growing product deployed by nearly 100 enterprises and startups. But the journey of building RisingWave is full of challenges. In this talk, I'd like to share with you lessons we've gained from four dimensions: 1) the decoupled compute-storage architecture, 2) the balances between stream processing and OLAP, 3) the Rust ecosystem, and 4) the product positioning. I will dive deep into technical details and then share with you my views on the future of stream processing. |
|
|
Change Data Streaming Patterns With Debezium & Apache Flink | Decodable
2023-05-11 · 18:58
Gunnar Morling
– Software Engineer and open-source enthusiast
@ Decodable
ABOUT THE TALK: Microservices are one of the big trends in software engineering of the last few years. In this session we'll discuss and showcase how open-source change data capture (CDC) with Debezium can help developers with typical challenges they often face when working on microservices. Learn how to:
ABOUT THE SPEAKER: Gunnar Morling is a software engineer and open-source enthusiast by heart, currently working at Decodable on stream processing based on Apache Flink. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at a wide range of conferences like QCon, Java One, and Devoxx. He lives in Hamburg, Germany. ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies. FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/ |
Data Council 2023 |
|
Change Data Capture For All Of Your Databases With Debezium
2020-01-06 · 03:00
Summary Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a way to track changes as they happen. Debezium is an open source platform for reliable change data capture that you can use to build supplemental systems for everything from maintaining audit trails to real-time updates of your data warehouse. In this episode Gunnar Morling and Randall Hauch explain why it got started, how it works, and some of the myriad ways that you can use it. If you have ever struggled with implementing your own change data capture pipeline, or understanding when it would be useful then this episode is for you. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey and today I’m interviewing Randall Hauch and Gunnar Morling about Debezium, an open source distributed platform for change data capture Interview Introduction How did you get involved in the area of data management? Can you start by describing what Change Data Capture is and some of the ways that it can be used? What is Debezium and what problems does it solve? What was your motivation for creating it? What are some of the use cases that it enables? What are some of the other options on the market for handling change data capture? Can you describe the systems architecture of Debezium and how it has evolved since it was first created? How has the tight coupling with Kafka impacted the direction and capabilities of Debezium? What, if any, other substrates does Debezium support (e.g. Pulsar, Bookkeeper, Pravega)? What are the data sources that are supported by Debezium? Given that you have branched into non-relational stores, how have you approached organization of the code to allow for handling the specifics of those engines while retaining a common core set of functionality? What is involved in deploying, integrating, and maintaining an installation of Debezium? What are the scaling factors? What are some of the edge cases that users and operators should be aware of? Debezium handles the ingestion and distribution of database changesets. What are the downstream challenges or complications that application designers or systems architects have to deal with to make use of that information? What are some of the design tensions that exist in the Debezium community between acting as a simple pipe vs. adding functionality for interpreting/a |
Data Engineering Podcast |