talk-data.com
Activities & events
The State of Airflow 2026: London Airflow Meetup!
2026-01-28 · 17:30
Join fellow Airflow enthusiasts and leaders at Salisbury House for an evening of engaging talks, great food and drinks, and exclusive swag! We'll start you off with a deep dive into the Airflow 2026 survey results, and finish off with a community member presentation on the Apache TinkerPop provider.
PRESENTATIONS
Talk #1: The State of Apache Airflow® 2026
Apache Airflow® continues to thrive as the world’s leading open-source data orchestration platform, with 30M downloads per month and over 3k contributors. 2025 marked a major milestone with the release of Airflow 3, which introduced DAG versioning, enhanced security and task isolation, assets, and more. These changes have reshaped how data teams build, operate, and govern their pipelines. In this session, our speaker will share insights from the State of Airflow 2026 report, including:
Join us to hear directly from a leader in the community and discover how to get the most out of Airflow in the year ahead.
Talk #2: Building the Apache TinkerPop Provider for Airflow
Graph databases are powering everything from recommendation engines to fraud detection, but integrating graph operations into modern data pipelines has often required custom code and workarounds. Earlier this year, Ahmad built a new Apache TinkerPop provider for Airflow, making it easier than ever to orchestrate Gremlin queries, manage graph workloads, and connect Airflow to TinkerPop-enabled systems. In this session, you’ll learn:
Join us to explore how Airflow and TinkerPop can work together to streamline graph workflows and unlock new patterns in modern data pipelines.
AGENDA
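The new provider's operator and hook names aren't given in the description above, so as background here is a minimal sketch of the pre-provider status quo the talk improves on: running a Gremlin query from a plain Airflow task with the gremlinpython driver. The server URL and traversal source name are assumptions.

```python
# A sketch, not the new provider's API: running a Gremlin query from a
# plain Airflow task via the gremlinpython driver.
from datetime import datetime

from airflow.decorators import dag, task
from gremlin_python.driver import client


@dag(schedule=None, start_date=datetime(2026, 1, 1), catchup=False)
def tinkerpop_example():
    @task
    def count_vertices() -> int:
        # ws://localhost:8182/gremlin is the conventional TinkerPop Server
        # endpoint and "g" the traversal source; both are assumptions here.
        conn = client.Client("ws://localhost:8182/gremlin", "g")
        try:
            return conn.submit("g.V().count()").all().result()[0]
        finally:
            conn.close()

    count_vertices()


tinkerpop_example()
```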
[Notes] How to Build a Portfolio That Reflects Your Real Skills
2025-12-28 · 18:00
These are the notes from the previous "How to Build a Portfolio That Reflects Your Real Skills" event. Properties of an ideal portfolio repository:
📌 Backend & Frontend Portfolio Project Ideas
☕ Junior Java Backend Developer (Spring Boot)
1. Shop Manager Application: A monolithic Spring Boot app designed with microservice-style boundaries. Features
Engineering Focus
2. Parallel Data Processing Engine: Backend service for processing large datasets efficiently. Features
Demonstrates
3. Distributed Task Queue System: Simple async job processing system (see the sketch after this list). Features
Demonstrates
4. Rate Limiting & Load Control Service: Standalone service that protects APIs from abuse. Features
Demonstrates
5. Search & Indexing Backend: Document or record search service. Features
Demonstrates
6. Distributed Configuration & Feature Flag Service: Centralized config service for other apps. Features
Demonstrates
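For the task-queue idea in project 3, a minimal sketch of the core shape, written in Python for brevity; a Java version would pair a BlockingQueue with an ExecutorService, but the structure is the same.

```python
# Minimal task-queue sketch: producers enqueue jobs, worker threads drain
# the queue, a sentinel value shuts each worker down.
import queue
import threading

jobs: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop this worker
            break
        try:
            print(f"processing {job}")
        finally:
            jobs.task_done()

workers = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()

for i in range(10):
    jobs.put(f"job-{i}")

jobs.join()          # wait until every enqueued job is processed
for _ in workers:    # then release the workers
    jobs.put(None)
```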
🐹 Mid-Level Go Backend Developer (Non-Kubernetes)
1. High-Throughput Event Processing Pipeline: Multi-stage concurrent pipeline. Features
2. Distributed Job Scheduler & Worker System: Async job execution platform. Features
3. In-Memory Caching Service: Redis-like cache written from scratch (see the sketch after this list). Features
4. Rate Limiting & Traffic Shaping Gateway: Reverse-proxy-style rate limiter. Features
5. Log Aggregation & Query Engine: Incrementally built system. Step-by-step
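For the cache idea in project 3, a minimal LRU sketch, again in Python for brevity; a Go version would pair a map with container/list, while here OrderedDict plays both roles.

```python
# Minimal LRU cache sketch: fixed capacity, least-recently-used eviction.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a" so "b" becomes the eviction candidate
cache.put("c", 3)   # evicts "b"
assert cache.get("b") is None
```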
🐍 Mid-Level Python Backend Developer
1. Asynchronous Task Processing System: Async job execution platform. Features
2. Event-Driven Data Pipeline: Streaming data processing service. Features
3. Distributed Rate Limiting Service: API protection service (see the sketch after this list). Steps
4. Search & Indexing Backend: Search system for logs or documents. Features
5. Configuration & Feature Flag Service: Shared configuration backend. Steps
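For the rate-limiting idea in project 3, a minimal token-bucket sketch; a distributed version would keep the bucket state in Redis rather than process memory, but the accounting is identical.

```python
# Minimal token-bucket sketch: tokens refill continuously up to a burst
# capacity; a request is allowed only if enough tokens remain.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # 5 req/s, bursts of 10
print(bucket.allow())  # True while tokens remain
```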
🟦 Mid-Level TypeScript Backend Developer
1. Asynchronous Job Processing System: Queue-based task execution. Features
2. Real-Time Chat / Notification Service: WebSocket-based system. Features
3. Rate Limiting & API Gateway: API gateway with protections. Features
4. Search & Filtering Engine: Search backend for products, logs, or articles. Features
5. Feature Flag & Configuration Service: Centralized config management (see the sketch after this list). Features
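For the feature-flag idea in project 5, a minimal sketch of percentage rollouts decided by a stable hash of the user id, so a given user always gets the same answer; the flag names and percentages below are made up.

```python
# Minimal feature-flag sketch: deterministic percentage rollouts via a
# stable hash of flag name + user id. Flag data here is illustrative.
import hashlib

FLAGS = {"new-checkout": 25, "dark-mode": 100}  # % of users enabled

def is_enabled(flag: str, user_id: str) -> bool:
    rollout = FLAGS.get(flag, 0)
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # deterministic 0-99 bucket
    return bucket < rollout

print(is_enabled("new-checkout", "user-42"))
```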
🟨 Mid-Level Node.js Backend Developer
1. Async Task Queue System: Background job processor. Features
2. Real-Time Chat / Notification Service: Socket-based system. Features
3. Rate Limiting & API Gateway: Traffic control service. Features
4. Search & Indexing Backend: Indexing & querying service.
5. Feature Flag / Configuration Service: Shared backend for app configs.
⚛️ Mid-Level Frontend Developer (React / Next.js)
1. Dynamic Analytics Dashboard: Interactive data visualization app. Features
2. E-Commerce Store: Full shopping experience. Features
3. Real-Time Chat / Collaboration App: Live multi-user UI. Features
4. CMS / Blogging Platform: SEO-focused content app. Features
5. Personalized Analytics / Recommendation UI: Data-heavy frontend. Features
6. AI Chatbot App — “My House Plant Advisor”: LLM-powered assistant with production-quality UX. Core Features
Advanced Features
✅ Final Advice
You do NOT need to build everything. Instead, pick 1–2 strong projects per role and focus on depth:
📌 Portfolio Quality Signals (Very Important)
🎯 Why This Helps in Interviews
Working on serious projects gives you:
🎥 Demo & Documentation Best Practices
🤝 Open Source & Personal Projects (Interview Signal)
Always mention that you have contributed to Open Source or built personal projects.
Code security for software engineers
2025-11-26 · 16:45
Johannes Dahse
– VP of Code Security
@ Sonar
,
Gergely Orosz
– host
Brought to You By: • Statsig — The unified platform for flags, analytics, experiments, and more. Statsig are helping make the first-ever Pragmatic Summit a reality. Join me and 400 other top engineers and leaders on 11 February, in San Francisco for a special one-day event. Reserve your spot here. • Linear — The system for modern product development. Engineering teams today move much faster, thanks to AI. Because of this, coordination increasingly becomes a problem. This is where Linear helps fast-moving teams stay focused. Check out Linear. — As software engineers, what should we know about writing secure code? Johannes Dahse is the VP of Code Security at Sonar and a security expert with 20 years of industry experience. In today’s episode of The Pragmatic Engineer, he joins me to talk about what security teams actually do, what developers should own, and where real-world risk enters modern codebases. We cover dependency risk, software composition analysis, CVEs, dynamic testing, and how everyday development practices affect security outcomes. Johannes also explains where AI meaningfully helps, where it introduces new failure modes, and why understanding the code you write and ship remains the most reliable defense. If you build and ship software, this episode is a practical guide to thinking about code security under real-world engineering constraints. — Timestamps (00:00) Intro (02:31) What is penetration testing? (06:23) Who owns code security: devs or security teams? (14:42) What is code security? (17:10) Code security basics for devs (21:35) Advanced security challenges (24:36) SCA testing (25:26) The CVE Program (29:39) The State of Code Security report (32:02) Code quality vs security (35:20) Dev machines as a security vulnerability (37:29) Common security tools (42:50) Dynamic security tools (45:01) AI security reviews: what are the limits? (47:51) AI-generated code risks (49:21) More code: more vulnerabilities (51:44) AI’s impact on code security (58:32) Common misconceptions of the security industry (1:03:05) When is security “good enough?” (1:05:40) Johannes’s favorite programming language — The Pragmatic Engineer deepdives relevant for this episode: • What is Security Engineering? • Mishandled security vulnerability in Next.js • Okta Schooled on Its Security Practices — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected]. Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe |
Measuring the impact of AI on software engineering – with Laura Tacho
2025-07-23 · 17:53
Gergely Orosz
– host
,
Laura Tacho
– CTO
@ DX
Supported by Our Partners • Statsig — The unified platform for flags, analytics, experiments, and more. • Graphite — The AI developer productivity platform. — There’s no shortage of bold claims about AI and developer productivity, but how do you separate signal from noise? In this episode of The Pragmatic Engineer, I’m joined by Laura Tacho, CTO at DX, to cut through the hype and share how well (or not) AI tools are actually working inside engineering orgs. Laura shares insights from DX’s research across 180+ companies, including surprising findings about where developers save the most time, why devs don’t use AI at all, and what kinds of rollouts lead to meaningful impact. We also discuss: • The problem with oversimplified AI headlines and how to think more critically about them • An overview of the DX AI Measurement framework • Learnings from Booking.com’s AI tool rollout • Common reasons developers aren’t using AI tools • Why using AI tools sometimes decreases developer satisfaction • Surprising results from DX’s 180+ company study • How AI-generated documentation differs from human-written docs • Why measuring developer experience before rolling out AI is essential • Why Laura thinks roadmaps are on their way out • And much more! — Timestamps (00:00) Intro (01:23) Laura’s take on AI overhyped headlines (10:46) Common questions Laura gets about AI implementation (11:49) How to measure AI’s impact (15:12) Why acceptance rate and lines of code are not sufficient measures of productivity (18:03) The Booking.com case study (20:37) Why some employees are not using AI (24:20) What developers are actually saving time on (29:14) What happens with the time savings (31:10) The surprising results from the DORA report on AI in engineering (33:44) A hypothesis around AI and flow state and the importance of talking to developers (35:59) What’s working in AI architecture (42:22) Learnings from WorkHuman’s adoption of Copilot (47:00) Consumption-based pricing, and the difficulty of allocating resources to AI (52:01) What DX Core 4 measures (55:32) The best outcomes of implementing AI (58:56) Why highly regulated industries are having the best results with AI rollout (1:00:30) Indeed’s structured AI rollout (1:04:22) Why migrations might be a good use case for AI (and a tip for doing it!) (1:07:30) Advice for engineering leads looking to get better at AI tooling and implementation (1:08:49) Rapid fire round — The Pragmatic Engineer deepdives relevant for this episode: • AI Engineering in the real world • Measuring software engineering productivity • The AI Engineering stack • A new way to measure developer productivity – from the creators of DORA and SPACE — See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected]. Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe |
Late-stage transformations: Utilizing dbt Semantic Layer metrics
Erica Louie
– Sr. Analytics Manager
@ dbt Labs
,
Andrew Escay
– Lead Data Analyst
@ dbt Labs
"Look at these beautiful dbt models! Why are we still experiencing the same friction with stakeholders?" This talk from experts at dbt Labs argues that we solved the first stage of building "Transformations" (via the 2024 State of Analytics Engineering report) and now we're now in the second stage: "The Philosophy of Transformations". And all roads lead to "metrics". Speakers: Erica Louie Sr. Analytics Manager dbt Labs Andrew Escay Lead Data Analyst dbt Labs Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements |
Coalesce 2024: Boost your data literacy with 2 key concepts
2024-10-16 · 17:55
Rachel House
– Senior Developer Advocate
@ Great Expectations
The dbt Labs 2024 State of Analytics Engineering report highlights that stakeholder data literacy remains a problem in the modern data workplace. Data stakeholders and data professionals can both benefit from learning foundational data literacy concepts that foster their ability to reason about working with data in a business environment. In this talk, an expert from Great Expectations covers two key concepts that they've applied in their own career when framing data fundamentals: “the data supply chain” and “ML in a nutshell.” Speaker: Rachel House Senior Developer Advocate Great Expectations Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements |
Copenhagen dbt Meetup vol. 5
2024-10-10 · 15:30
Mark your calendars for the fifth dbt Meetup in Copenhagen! Thursday, the 10th of October 🧡 We have the speaker line-up ready for you, and we will have three exciting speakers!
Date: Thursday the 10th of October
🤝 Organizer: Intellishore
🏠 Venue Host: group.one - Kalvebod Brygge 24, 1560 København
Agenda & Speakers
17:30: Meet new people and catch up with old acquaintances over drinks and snacks 🥤
17:45: Event kick-off and Welcome - Hosted by Marie-Christin Henkelmann, Senior Consultant at Intellishore
💬 18:00-20:00 - Speaker sessions
💬 Deep Dives on CI and Orchestration @ group.one by Emil Nilsson, Director of Data at group.one
In this session, group.one will share their experiences with building a data team in an M&A-driven organisation, discussing key decisions and learnings along the way. Hear about specific aspects of their data stack, including CI processes and the orchestration of dbt jobs.
Schedule-less Data Architecture @ Ageras by August Hörlën, Senior Analytics Engineer at Ageras
Learn how Ageras transformed its data architecture by leveraging dynamic tables in Snowflake and dbt. This talk shows how they moved beyond traditional scheduling to create a responsive, schedule-less pipeline that ensures data is processed exactly when needed.
Late-stage transformations: Utilizing dbt Semantic Layer metrics by Erica Louie, Sr. Analytics Manager at dbt Labs & Andrew Escay, Lead Data Analyst at dbt Labs
"Look at these beautiful dbt models! Why are we still experiencing the same friction with stakeholders?" This talk from experts at dbt Labs argues that we solved the first stage of building "Transformations" (via the 2024 State of Analytics Engineering report), and that we're now in the second stage: "The Philosophy of Transformations". And all roads lead to "metrics".
20:00: 🍕 Food & socializing - Continue the conversation over food and drinks.
20:45: Over and out 🙌🏼
➡️ Join the dbt Slack community: https://www.getdbt.com/community/
🤝 For the best Meetup experience, make sure to join the #local-denmark channel in dbt Slack (https://slack.getdbt.com/)
----------------------------------
dbt is the standard in data transformation, used by over 40,000 organizations worldwide. Through the application of software engineering best practices like modularity, version control, testing, and documentation, dbt’s analytics engineering workflow helps teams work more efficiently to produce data the entire organization can trust. Learn more: https://www.getdbt.com/
To attend, please read the Health and Safety Policy and Terms of Participation: https://www.getdbt.com/legal/health-and-safety-policy
The State of Analytics Engineering Report
2024-04-10 · 21:30
The results are in for The State of Analytics Engineering Report — the pains and gains shared by data practitioners and leaders in dbt Labs' annual survey of this fast-changing space. Check out Erica, Adam and Amada as they discuss industry benchmarks, macro trends shaping the industry, and strategies for creating effective data organizations 🔥 If you're a first-timer, dbt Meetups are networking events open to all folks working with data!
🤝 Organizer: dbt Labs
🏠 Venue Host: Materialize (https://materialize.com/) - Thank you for hosting us in your office! 6th floor, 436 Lafayette St, New York, NY 10003
🍕 Catering: expect light food and beverages
📝 Agenda
5:30 - 6:15pm | Check-in, nametags and mingling
6:15 - 7:05pm | Presentation/Discussion of the State of the Analytics Engineering Report results with Amada Echeverria (Community @ dbt Labs), Adam Stone (Analytics Engineering Manager @ BDC, a Velir Company), and Erica Louie (Senior Manager of Analytics Engineering at dbt Labs)
7:05 - 7:20pm | Q & A
7:20 - 8:15pm | Networking, bites and drinks
Our venue has capacity limits, so please only RSVP if you intend to come; change your RSVP status on the Meetup to "Not Going" or message Amada Echeverria in Meetup if you have an issue.
To attend, please read the Health and Safety Policy and Terms of Participation: https://bit.ly/4azcreT
➡️ Join the dbt Slack community: https://www.getdbt.com/community/
🤝 For the best Meetup experience, make sure to join the #local-nyc channel in dbt Slack (https://slack.getdbt.com/).
----------------------------------
dbt allows teams to ship trusted data products, faster. dbt is a data transformation framework that lets analysts and engineers collaborate using their shared knowledge of SQL. Through the application of software engineering best practices like modularity, version control, testing, and documentation, dbt’s analytics engineering workflow helps teams work more efficiently to produce data the entire organization can trust. Learn more: https://www.getdbt.com/
Strategies For A Successful Data Platform Migration
2023-07-31 · 03:00
Summary All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack Modern data teams are using Hex to 10x their data impact. Hex combines a notebook style UI with an interactive report builder. This allows data teams to both dive deep to find insights and then share their work in an easy-to-read format to the whole org. In Hex you can use SQL, Python, R, and no-code visualization together to explore, transform, and model data. Hex also has AI built directly into the workflow to help you generate, edit, explain and document your code. The best data teams in the world such as the ones at Notion, AngelList, and Anthropic use Hex for ad hoc investigations, creating machine learning models, and building operational dashboards for the rest of their company. Hex makes it easy for data analysts and data scientists to collaborate together and produce work that has an impact. Make your data team unstoppable with Hex. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial for your team! Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy and Rob Goretsky about when and how to think about migrating your data stack Interview Introduction How did you get involved in the area of data management? A migration can be anything from a minor task to a major undertaking. Can you start by describing what constitutes a migration for the purposes of this conversation? Is it possible to completely avoid having to invest in a migration? What are the signals that point to the need for a migration? What are some of the sources of cost that need to be accounted for when considering a migration? (both in terms of doing one, and the costs of not doing one) What are some signals that a migration is not the right solution for a perceived problem? Once the decision has been made that a migration is necessary, what are the questions that the team should be asking to determine the technologies to move to and the sequencing of execution? What are the preceding tasks that should be completed before starting the migration to ensure there is no breakage downstream of the changing component(s)? What are some of the ways that a migration effort might fail? What are the major pitfalls that teams need to be aware of as they work through a data platform migration? What are the opportunities for automation during the migration process? What are the most interesting, innovative, or unexpected ways that you have seen teams approach a platform migration? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform migrations? 
What are some ways that the technologies and patterns that we use can be evolved to reduce the cost/impact/need for migrations? Contact Info Gleb LinkedIn @glebmm on Twitter Rob LinkedIn RobGoretsky on GitHub Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers Links Datafold Podcast Episode Informatica Airflow Snowflake Podcast Episode Redshift Eventbrite Teradata BigQuery Trino EMR == Elastic Map-Reduce Shadow IT Podcast Episode Mode Analytics Looker Sunk Cost Fallacy data-diff Podcast Episode SQLGlot Dagster dbt The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
Hex: Hex is a collaborative workspace for data science and analytics. A single place for teams to explore, transform, and visualize data into beautiful interactive reports. Use SQL, Python, R, no-code and AI to find and share insights across your organization. Empower everyone in an organization to make an impact with data. Sign up today at [dataengineeringpodcast.com/hex](https://www.dataengineeringpodcast.com/hex) and get 30 days free!
Rudderstack: Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
Support Data Engineering Podcast
Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed
2023-03-19
Yoav Cohen
– co-founder and CTO
@ Satori
,
Tobias Macey
– host
Summary As with all aspects of technology, security is a critical element of data applications, and the different controls can be at cross purposes with productivity. In this episode Yoav Cohen from Satori shares his experiences as a practitioner in the space of data security and how to align with the needs of engineers and business users. He also explains why data security is distinct from application security and some methods for reducing the challenge of working across different data systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today RudderStack makes it easy for data teams to build a customer data platform on their own warehouse. Use their state of the art pipelines to collect all of your data, build a complete view of your customer and sync it to every downstream tool. Sign up for free at dataengineeringpodcast.com/rudder Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it—it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender where you can do two things: watch us build a data estate in 15 minutes and start for free today. Your host is Tobias Macey and today I'm interviewing Yoav Cohen about the challenges that data teams face in securing their data platforms and how that impacts the productivity and adoption of data in the organization Interview Introduction How did you get involved in the area of data management? Data security is a very broad term. Can you start by enumerating some of the different concerns that are involved? How has the scope and complexity of implementing security controls on data systems changed in recent years? In your experience, what is a typical number of data locations that an organization is trying to manage access/permissions within? What are some of the main challenges that data/compliance teams face in establishing and maintaining security controls? How much of the problem is technical vs. procedural/organizational? As a vendor in the space, how do you think about the broad categories/boundary lines for the different elements of data security? (e.g. masking vs. RBAC, etc.) 
What are the different layers that are best suited to managing each of those categories? (e.g. masking and encryption in storage layer, RBAC in warehouse, etc.) What are some of the ways that data security and organizational productivity are at odds with each other? What are some of the shortcuts that you see teams and individuals taking to address the productivity hit from security controls? What are some of the methods that you have found to be most effective at mitigating or even improving productivity impacts through security controls? How does up-front design of the security layers improve the final outcome vs. trying to bolt on security after the platform is already in use? How can education about the motivations for different security practices improve compliance and user experience? What are the most interesting, innovative, or unexpected ways that you have seen data teams align data security and productivity? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data security technology? What are the areas of data security that still need improvements? Contact Info Yoav Cohen Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers Links Satori Podcast Episode Data Masking RBAC == Role Based Access Control ABAC == Attribute Based Access Control Gartner Data Security Platform Report The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
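The masking-vs-RBAC boundary raised in the interview questions can be made concrete with a small sketch (ours, not the episode's): RBAC decides whether a query runs at all, while masking shapes what a permitted reader sees. The role and column names below are invented for illustration.

```python
# Hedged illustration of the masking vs. RBAC distinction discussed above.
# RBAC gates access to the table; masking transforms sensitive columns for
# readers who are allowed in but not cleared for raw values.
ROLES_WITH_ACCESS = {"analyst", "admin"}  # RBAC: who may query at all
UNMASKED_ROLES = {"admin"}                # who may see raw PII

def read_row(role: str, row: dict) -> dict:
    if role not in ROLES_WITH_ACCESS:
        raise PermissionError(f"role {role!r} may not read this table")
    if role in UNMASKED_ROLES:
        return row
    masked = dict(row)
    masked["email"] = "***@***"           # masking: redact, don't deny
    return masked

row = {"user_id": 7, "email": "jane@example.com"}
print(read_row("analyst", row))  # masked email
print(read_row("admin", row))    # raw row
```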
Sponsored By:
TimeXtender: You can't optimize for everything all at once. That's why we take a holistic approach to data integration that optimises for agility instead of fragmentation. By unifying each layer of the data stack, TimeXtender empowers you to build data solutions 10x faster while reducing costs by 70%-80%. We do this for one simple reason: because time matters. Go to dataengineeringpodcast.com/timextender today to get started for free!
Support Data Engineering Podcast
Power Your Real-Time Analytics Without The Headache Using Fivetran's Change Data Capture Integrations
2022-09-26 · 01:20
Mark Van de Wiel
– guest
@ Fivetran
,
Tobias Macey
– host
Summary Data integration from source systems to their downstream destinations is the foundational step for any data product. The increasing expectation for information to be instantly accessible drives the need for reliable change data capture. The team at Fivetran have recently introduced that functionality to power real-time data products. In this episode Mark Van de Wiel explains how they integrated CDC functionality into their existing product and discusses the nuances of different approaches to change data capture from various sources. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! You wake up to a Slack message from your CEO, who’s upset because the company’s revenue dashboard is broken. You’re told to fix it before this morning’s board meeting, which is just minutes away. Enter Metaplane, the industry’s only self-serve data observability tool. In just a few clicks, you identify the issue’s root cause, conduct an impact analysis—and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.
Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Mark Van de Wiel about Fivetran’s implementation of change data capture.
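The episode contrasts approaches to change data capture. For context, here is a minimal sketch (not Fivetran's implementation) of the simpler query-based alternative: poll for rows whose timestamp passed a stored watermark. Log-based CDC instead tails the database's write-ahead log and also captures deletes, which this approach misses; the table and column names are illustrative.

```python
# Minimal query-based CDC sketch for contrast with the log-based CDC
# discussed in the episode: poll for rows updated since a watermark.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2022-09-26T01:00:00')")

watermark = "2022-09-01T00:00:00"  # last successfully synced timestamp

rows = conn.execute(
    "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
    (watermark,),
).fetchall()

for row_id, updated_at in rows:
    print(f"changed row {row_id} at {updated_at}")
    watermark = updated_at  # advance the watermark as changes are shipped
```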
Operational Analytics To Increase Efficiency For Multi-Location Businesses With OpsAnalitica
2022-09-19 · 01:00
Tommy Yionoulis
– Founder
@ OpsAnalitica
,
Tobias Macey
– host
Summary In order to improve efficiency in any business you must first know what is contributing to wasted effort or missed opportunities. When your business operates across multiple locations it becomes even more challenging and important to gain insights into how work is being done. In this episode Tommy Yionoulis shares his experiences working in the service and hospitality industries and how that led him to found OpsAnalitica, a platform for collecting and analyzing metrics on multi location businesses and their operational practices. He discusses the challenges of making data collection purposeful and efficient without distracting employees from their primary duties and how business owners can use the provided analytics to support their staff in their duties. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. You wake up to a Slack message from your CEO, who’s upset because the company’s revenue dashboard is broken. You’re told to fix it before this morning’s board meeting, which is just minutes away. Enter Metaplane, the industry’s only self-serve data observability tool. 
In just a few clicks, you identify the issue’s root cause, conduct an impact analysis—and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.
An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications
2022-08-22
Shruti Bhat
– SVP of Product
@ Rockset
,
Tobias Macey
– host
Summary Data has permeated every aspect of our lives and the products that we interact with. As a result, end users and customers have come to expect interactions and updates with services and analytics to be fast and up to date. In this episode Shruti Bhat gives her view on the state of the ecosystem for real-time data and the work that she and her team at Rockset are doing to make it easier for engineers to build those experiences. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! Data stacks are becoming more and more complex. This brings infinite possibilities for data pipelines to break and a host of other issues, severely deteriorating the quality of the data and causing teams to lose trust. Sifflet solves this problem by acting as an overseeing layer to the data stack – observing data and ensuring it’s reliable from ingestion all the way to consumption. Whether the data is in transit or at rest, Sifflet can detect data quality anomalies, assess business impact, identify the root cause, and alert data teams on their preferred channels. All thanks to 50+ quality checks, extensive column-level lineage, and 20+ connectors across the Data Stack. In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses. Listeners of the podcast will get $2000 to use as platform credits when signing up to use Sifflet. Sifflet also offers a 2-week free trial. Find out more at dataengineeringpodcast.com/sifflet today! The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability.
Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Shruti Bhat about the state of the ecosystem for real-time data applications.
Maintain Your Data Engineers' Sanity By Embracing Automation
2022-07-10 · 20:00
Chris Riccomini
– guest
@ WePay; LinkedIn
,
Tobias Macey
– host
Summary Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to implement, requiring data professionals to be experts in a wide range of disparate topics while designing and implementing complex topologies of information workflows. In order to make this a tractable problem it is essential that engineers embrace automation at every opportunity. In this episode Chris Riccomini shares his experiences building and scaling data operations at WePay and LinkedIn, as well as the lessons he has learned working with other teams as they automated their own systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Chris Riccomini about building awareness of data usage into CI/CD pipelines for application development Interview Introduction How did you get involved in the area of data management? What are the pieces of data platforms and processing that have been most difficult to scale in an organizational sense? What are the opportunities for automation to alleviate some of the toil that data and analytics engineers get caught up in? 
The application delivery ecosystem has been going through ongoing transformation in the form of CI/CD, infrastructure as code, etc. What are the parallels in the data ecosystem that are still nascent? What are the principles that still need to be translated for data practitioners? Which are subject to impedance mismatch and may never make sense to translate? As someone with a software engineering background and extensive e |
A View From The Round Table Of Gartner's Cool Vendors
2021-09-09 · 02:00
Akshay Deshpande
– guest
,
Dan Weitzner
– guest
,
Saket Saurabh
– CEO
@ Nexla
,
Maarten Masschelein
– guest
@ Soda Data
,
Tobias Macey
– host
Summary Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For businesses that are working in the data management and analytics space they recognized the efforts of Timbr.ai, Soda Data, Nexla, and Tada. In this episode the founders and leaders of each of these organizations share their perspective on the current state of the market, and the challenges facing businesses and data professionals today. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription Have you ever had to develop ad-hoc solutions for security, privacy, and compliance requirements? Are you spending too much of your engineering resources on creating database views, configuring database permissions, and manually granting and revoking access to sensitive data? Satori has built the first DataSecOps Platform that streamlines data access and security. Satori’s DataSecOps automates data access controls, permissions, and masking for all major data platforms such as Snowflake, Redshift and SQL Server and even delegates data access management to business users, helping you move your organization from default data access to need-to-know access. Go to dataengineeringpodcast.com/satori today and get a $5K credit for your next Satori subscription. Your host is Tobias Macey and today I’m interviewing Saket Saurabh, Maarten Masschelein, Akshay Deshpande, and Dan Weitzner about the challenges facing data practitioners today and the solutions that are being brought to market for addressing them, as well as the work they are doing that got them recognized as "cool vendors" by Gartner. Interview Introduction How did you get involved in the area of data management? Can you each describe what you view as the biggest challenge facing data professionals? Who are you building your solutions for and what are the most common data management problems you are all solving? What are different components of Data Management and why is it so complex? What, if anything, will simplify this process? 
The report covers a lot of new data management terminology – data governance, data observability, data fabric, data mesh, DataOps, MLOps, AIOps – what does this all mean and why is it important for data engineers? How has the data management space changed in recent times? Describe the current data management landscape and any key developments. From your perspective, what are the biggest challenges in the data management space today? What modern data management features are lacking in existing databases? Gartner imagines a future where data and analytics leaders need to be prepared to rely on data manage |
Unlocking The Power of Data Lineage In Your Platform with OpenLineage
2021-05-18 · 14:00
Julien Le Dem
– creator of Parquet
,
Tobias Macey
– host
Summary Data lineage is the common thread that ties together all of your data pipelines, workflows, and systems. In order to get a holistic understanding of your data quality, where errors are occurring, or how a report was constructed you need to track the lineage of the data from beginning to end. The complicating factor is that every framework, platform, and product has its own concepts of how to store, represent, and expose that information. In order to eliminate the wasted effort of building custom integrations every time you want to combine lineage information across systems Julien Le Dem introduced the OpenLineage specification. In this episode he explains his motivations for starting the effort, the far-reaching benefits that it can provide to the industry, and how you can start integrating it into your data platform today. This is an excellent conversation about how competing companies can still find mutual benefit in co-operating on open standards. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today. When it comes to serving data for AI and ML projects, do you feel like you have to rebuild the plane while you’re flying it across the ocean? Molecula is an enterprise feature store that operationalizes advanced analytics and AI in a format designed for massive machine-scale projects without having to manage endless one-off information requests. With Molecula, data engineers manage one single feature store that serves the entire organization with millisecond query performance whether in the cloud or at your data center. And since it is implemented as an overlay, Molecula doesn’t disrupt legacy systems. High-growth startups use Molecula’s feature store because of its unprecedented speed, cost savings, and simplified access to all enterprise data. From feature extraction to model training to production, the Molecula feature store provides continuously updated feature access, reuse, and sharing without the need to pre-process data. If you need to deliver unprecedented speed, cost savings, and simplified access to large scale, real-time data, visit dataengineeringpodcast.com/molecula and request a demo. 
Mention that you’re a Data Engineering Podcast listener, and they’ll send you a free t-shirt. Your host is Tobias Macey and today I’m interviewing Julien Le Dem about Open Lineage, a new standard for structuring metadata to enable interoperability across the ecosystem of data management tools. Interview Introduction How did you get involved in the area of data management? Can you start by giving an overview of what the Open Lineage project is and the story behind it? What is the current state of t |
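For readers new to the spec, a hedged sketch of the shape of an OpenLineage run event, built here as a plain dict rather than with any client library. The namespace, job, and dataset names are illustrative; collectors such as Marquez conventionally accept these events via POST /api/v1/lineage.

```python
# Hedged sketch of an OpenLineage run event as a plain dict. Field names
# follow the public spec; all values below are illustrative.
import json
import uuid
from datetime import datetime, timezone

event = {
    "eventType": "COMPLETE",  # e.g. START / COMPLETE / FAIL
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-pipeline",
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-team", "name": "daily_orders_load"},
    "inputs": [{"namespace": "postgres://prod", "name": "public.orders"}],
    "outputs": [{"namespace": "warehouse", "name": "analytics.orders"}],
}
print(json.dumps(event, indent=2))
```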