talk-data.com

Ariel Pohoryles – guest @ Rivery, Tobias Macey – host

Summary

In this episode of the Data Engineering Podcast, Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?

What are the key takeaways that were most significant to you?

The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?

The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?

There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?

The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?

How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?

A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?

Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?

The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?

What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?

What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.

Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.

If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Boomi
Data Management
Integration & Automation Demo
Agentstudio
Data Connector Agent Webinar
Survey Results
Data Governance
Shadow IT
Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML, Analytics, Cloud Computing, Data Engineering, Data Management, Data Quality, Datafold, dbt, ETL/ELT, GenAI, Marketing, Master Data Management, Prefect, Python, SQL, Data Streaming
Data Engineering Podcast
DevFest Berlin 2024 2024-11-23 · 08:00

DevFest Berlin is back! This year we return to Humboldt University of Berlin with more than 25 talks & workshops. You can expect a whole day of learning, socialising, and engaging with a vibrant Berlin Tech community!

🎫 Get your ticket here: pretix.eu/devfestberlin/2024/ 🖍 Call for Papers still open: pretalx.com/devfest-berlin-2024/cfp

Agenda

Day 1

9:00 AM: Registration & Coffee 🥐 ☕️

9:45 AM: 🎤 Welcoming

10:00 AM: 🎤 Katya Vinnichenko - Introduction to Google Principles of Responsible AI

This year's DevFest explores how AI can improve lives globally, from business to healthcare to education. At Google we acknowledge AI's potential, while also recognising the challenges it presents. Thus, we are committed to helping you build and use AI responsibly, ensuring fairness and ethical practices.

In my talk you will learn: the main principles of responsible AI at Google; the ethical implications of AI; best practices for developing AI systems and integrating AI into Google products and services; last but not least – how AI will change the role of the developer as we know it.

10:50 AM: 🎤 Oleksii Antypov - DMARC Demystified

Discover the essential framework behind DMARC and how it secures email communication across the internet. This session covers the historical evolution of email security, dives into the common challenges of implementing DMARC, and provides actionable best practices for protecting your domain. Ideal for developers, security professionals, and anyone interested in safe email practices.

In a world where phishing and email spoofing are constant threats, DMARC stands as a vital defense mechanism. “DMARC Demystified” takes you through a journey from the origins of email security to the modern challenges and solutions that DMARC offers. We'll explore how DMARC works with SPF and DKIM, why it’s essential for organizations of all sizes, and the practical steps to ensure smooth implementation.

Expect an interactive timeline tracing the milestones of email security, detailed breakdowns of real-world cases, and insights into optimizing DMARC. Walk away with a deeper understanding of email protection, armed with knowledge to strengthen your email systems and protect against threats.
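To make the mechanics a little more concrete, here is a minimal sketch (an illustration, not material from the talk) of how a DMARC policy, which is published as a DNS TXT record under `_dmarc.<domain>`, can be split into its tags. The `parse_dmarc_record` helper and the example record are hypothetical; the tag names `v`, `p`, and `rua` are the ones defined by the DMARC specification (RFC 7489).

```python
def parse_dmarc_record(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into a tag -> value mapping."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[name.strip().lower()] = value.strip()
    # Simplified check: only verifies that the version tag is present.
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record: missing v=DMARC1")
    return tags


# Hypothetical record, as it might be published at _dmarc.example.com
example = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
policy = parse_dmarc_record(example)
print(policy["p"])  # the policy receivers are asked to apply on failure
```

The `p` tag tells receivers what to do with mail that fails DMARC evaluation (`none`, `quarantine`, or `reject`), and `rua` names the address that should receive aggregate reports.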

11:40 AM: 🎤 Marcin Chudy - Demystifying App Architecture: The LeanCode Guide

At LeanCode we have developed over 40 Flutter apps, spanning from huge enterprise apps to nimble startup ventures. Some were developed by a single Flutter dev, some came to light through collaborative efforts across multiple teams. Each of them was different. Each of them presented unique challenges and taught us invaluable lessons.

In this talk, we invite you to explore different approaches to architecting Flutter apps. Central to our narrative will be the concept of architectural drivers - key factors or priorities that steer our decisions about how the app is structured and designed. We'll show how we leverage our experience when approaching new projects. Drawing from our successes and failures, we'll present our current Flutter stack which enables us to craft robust, scalable, and maintainable applications. While there is no silver bullet for Flutter architecture, we can still have some sensible defaults.

Why do we use BLoC for state management? Why not Riverpod? Why do we love hook

12:30 PM: 🎤 Danny Preussler - Ten things you heard about testing that might be wrong

Testing became an essential part of Android development. Many conference talks have been given and even more best practices have been written.

But what if, as time evolved, some of the things we thought were true, changed?

Let’s start questioning some of these in this talk: Are flaky tests fixable? Are mocks even harmful? Is DI about testing? Did we understand testing in isolation properly? Is the test pyramid still valid? And in times of AI, should we generate tests?

Come and join my session to learn more!

1:10 PM: Lunch 🍔🥤

2:40 PM: 🎤 Andrey Sitnik - Privacy-first architecture: alternatives to GDPR popup and local-first

Why and how modern developers can improve the privacy of the modern web.

The popularity of clouds, the rise of huge monopolies across the internet, and the growth of shady data brokers have recently made the world a much more dangerous place for ordinary people. Here is how we fix it.

In this talk, Andrey Sitnik, the creator of PostCSS and the privacy-first open-source RSS reader, will explain how we can stop this dangerous trend and make the web a private place again. Beginners will find simple steps that can be applied to any website. Advanced developers will get practical insights into the new local-first architecture. Privacy experts may find useful, unique privacy tricks from a global perspective that goes beyond just U.S. privacy risks.

3:30 PM: 🎤 Raphaël VO - Largest Contentful Paint - The unheard story

Largest Contentful Paint (LCP) is more than a speed metric — it's the unseen factor shaping user experiences and impacting SEO. While often overlooked, LCP reveals when a page’s core content is truly ready, affecting how users perceive load time and usability. This talk uncovers LCP’s role, why it matters more than we think, and simple strategies to boost LCP for better engagement and rankings. Discover the hidden story behind one of web performance’s most crucial, yet understated metrics.

Did you know the speed of a single webpage element could decide if users stay or leave? Largest Contentful Paint (LCP) is that hidden hero, quietly working to load the most important content quickly. This talk unveils LCP’s role in creating faster, more engaging web experiences and why it’s key to winning user loyalty. Dive into the “unheard story” of LCP and discover practical tips to make your site not only faster but unforgettable.

4:20 PM: 🎤 Ash Davies - Navigation in a Multiplatform World: Choosing the Right Framework for your App

Navigation in mobile, desktop, and web applications is a fundamental part of how we structure our architecture, both to obtain functional clarity and to abstract from platform-level implementation.

For a long time, there have been options available specific to each platform, and even options that are part of the platform framework itself. It can be difficult, though, to find the right option for platform-agnostic code that ensures consistency. Some go one step further, providing an opinionated guide on how to architect your application.

In this talk, I'll evaluate the options available, how they differ, and to what type of applications they are best suited. I'll also cover how to get started with them and the best-practice guidelines for getting the most out of them in your application.

5:10 PM: 🎤 Vadim Makeev - You don’t know MathML. Almost nobody does

Do you speak math? Me neither. Still, math formulas have always been around: from Wikipedia articles to JavaScript APIs and even CSS docs. It looks so alien that I never had a clue how to express it on the web. Apparently, there’s a markup language for that. HTML for content, SVG for vector graphics, and MathML for math! And it’s pretty cross-browser, too. Let’s dive into the basics and quirks of the language of the universe. Even if math is not your love language, you might learn something interesting about the web platform.

Day 2

9:00 AM: Registration & Coffee 🥐 ☕️

10:00 AM: 🎤 Alex Mir – Accessibility matters

The regulators are here, and businesses will now care about a11y. Let's make a11y compliance more than a formal check. I believe that it is our job as industry experts to understand why it is important and get our products ready for all groups of people.

10:50 AM: 🎤 Marco Gomiero - From Android to Multiplatform and beyond

With Kotlin Multiplatform getting increasingly established, many Android libraries became multiplatform.

But how do you make an existing Android library multiplatform?

In this talk, we will cover the common challenges faced while migrating Android libraries to Kotlin Multiplatform, like handling platform-specific dependencies, re-organizing the project structure without losing the contributor's history, testing on multiple platforms, and publishing the library.

11:20 AM: 🎤 Muhammad Salman Bediya - Crucial Performance Issue in Flutter Apps: Memory Leaks

Memory leaks can be hard to spot but have a big impact on the performance of Flutter apps, especially those running for long periods. In this talk, we’ll explore the most common reasons memory leaks happen in Flutter and Dart, focusing on how asynchronous programming and Streams can make them more challenging. You’ll learn practical tips to identify and fix these issues, helping your apps run smoother and more efficiently.

11:40 AM: 🎤 Andrii Raikov - Maximizing Scalability with Go and Redis: A Telemetry Processing Journey

At Delivery Hero, we process 10,000 requests per second using Go and Redis. Join us to learn how this powerful duo handles high-load telemetry data efficiently and cost-effectively, with scalability, resource optimization, and continuous innovation through customized data flows.

12:30 PM: 🎤 Tomek Porozynski - Can You Outsmart an AI? Adventures in Prompt Hacking

In this talk combined with hands-on elements, participants will engage in a series of live prompt hacking challenges, accessible directly through their mobile devices. The workshop begins with simple prompt injection techniques and progressively moves to more sophisticated manipulation strategies. After each successful hack, I'll analyze what made it work and transform these insights into practical defense mechanisms.

Attendees will learn: Common vulnerabilities in AI prompt design, Practical techniques for prompt injection attacks, Essential strategies for securing chatbot applications, Best practices for implementing defensive layers, Real-world examples of prompt security failures and successes

Perfect for developers working with AI models, security enthusiasts, or anyone interested in building safer AI applications. No specialized tools needed - just bring your phone and creativity! You'll leave with concrete techniques for both testing and securing your AI systems against prompt manipulation attacks.
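As a rough illustration of what one "defensive layer" might look like (a sketch under stated assumptions, not the workshop's material), the snippet below screens user input against a short list of phrases and wraps whatever passes in explicit delimiters before it is placed in a prompt. The pattern list, the `<user_input>` delimiter, and both function names are hypothetical, and keyword screening alone is easy to bypass, so it would only ever be one layer among several.

```python
import re

# Hypothetical phrases seen in naive injection attempts; a real deployment
# would need far more than a keyword list.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|the)? ?(previous|prior|above) instructions",
    r"reveal (your|the) (system )?prompt",
]


def looks_like_injection(user_text: str) -> bool:
    """Return True if the text matches any of the suspicious patterns."""
    return any(
        re.search(pattern, user_text, re.IGNORECASE)
        for pattern in SUSPICIOUS_PATTERNS
    )


def build_prompt(system_rules: str, user_text: str) -> str:
    """Reject flagged input, then wrap the rest in clear delimiters."""
    if looks_like_injection(user_text):
        raise ValueError("input rejected by the screening layer")
    # Strip the delimiter tags so user text cannot close the block early.
    cleaned = user_text.replace("<user_input>", "").replace("</user_input>", "")
    return f"{system_rules}\n\n<user_input>\n{cleaned}\n</user_input>"


print(build_prompt("Answer questions about Berlin only.", "Where is the TV tower?"))
```

Keeping the screening and the prompt assembly in separate functions makes it easier to add further layers, such as output checks, without touching the code that builds the prompt.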

1:10 PM: Lunch 🍔🥤

2:40 PM: 🎤 Cesar Martinez - Domain Driven Design Fundamentals for Frontend Developers

What we can learn from Domain-Driven Design, and how to start applying its teachings in your frontend codebase.

3:30 PM: 🎤 Vadym Pinchuk - Effortless optimization of Flutter apps: performance tips for developers

In this session, we’ll dive into effortless yet impactful ways to optimize your Flutter applications. Performance improvements don’t always require a full rewrite—sometimes, small adjustments can lead to big gains. We'll explore practical tips and tricks for enhancing app speed, responsiveness, and efficiency with minimal effort. From reducing widget rebuilds to handling large data efficiently and managing state effectively, this talk will provide developers with actionable insights to deliver a smoother user experience. Whether you’re a beginner or an experienced Flutter dev, you’ll walk away with easy-to-apply techniques to optimize your apps without breaking a sweat.

4:20 PM: 🎤 Ian Ballantyne - Generative AI on Mobile and Web with Google AI Edge

Generative AI is no longer limited to execution in the cloud. Small language models, such as Gemma 2B, are quickly becoming small and powerful enough for on-device AI, offering benefits like low latency, offline functionality, privacy, and cost-effectiveness. Google AI Edge, with MediaPipe and LiteRT (formerly TensorFlow Lite), enables the development and deployment of efficient on-device AI models. These frameworks handle the complexities of model execution and hardware acceleration, allowing developers to focus on creating innovative AI experiences.

Think generative AI is just about chatbots? Think again. This talk will go beyond basic conversations with language models and explore how on-device generative AI can be integrated into everyday apps ready to help with tasks, answer questions, and provide creative inspiration, all powered by the information located on-device. Imagine truly useful apps that are quick to respond and still work without an internet connection.

5:10 PM: 🎤 Bogdan Plieshka - Automated Testing Layers in a multidimensional Monorepo: Fast-tracking Quality for hundreds apps

In this talk, I’ll dive into the testing layers that make up our quality pipeline at Zattoo, including static analysis, unit, system, and end-to-end testing.

We’ll discuss the concept of quality gates, shift-left approach, and affected domain recognition, which helps us maintain reliability across a large, dynamic codebase, bringing total quality feedback for contributors to 3 minutes.

I’ll share practices for achieving scalable, fast testing in a high-complexity environment, offering insights for anyone working with large-scale applications or monorepos and looking to streamline QA processes.
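One common way to approach "affected domain recognition" in a monorepo is to map changed files to the packages that own them and then follow the dependency graph in reverse, so that only those packages and their dependents are tested. The sketch below illustrates that general idea only; it is not Zattoo's implementation, and the package names, directory layout, and `affected_packages` function are all hypothetical.

```python
from collections import defaultdict, deque


def affected_packages(changed_files, package_dirs, deps):
    """Return packages touched by a change plus everything that depends on them."""
    # Map each changed file to the package whose directory contains it.
    touched = {
        pkg
        for path in changed_files
        for pkg, directory in package_dirs.items()
        if path.startswith(directory.rstrip("/") + "/")
    }
    # Invert the dependency graph: dependency -> packages that use it.
    dependents = defaultdict(set)
    for pkg, requires in deps.items():
        for dep in requires:
            dependents[dep].add(pkg)
    # Breadth-first walk over dependents to collect the transitive closure.
    affected, queue = set(touched), deque(touched)
    while queue:
        current = queue.popleft()
        for user in dependents[current]:
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical monorepo layout and dependency declarations.
package_dirs = {
    "ui-kit": "libs/ui-kit",
    "player": "libs/player",
    "web-app": "apps/web",
    "tv-app": "apps/tv",
}
deps = {"web-app": ["ui-kit", "player"], "tv-app": ["player"], "player": [], "ui-kit": []}

print(sorted(affected_packages(["libs/ui-kit/button.ts"], package_dirs, deps)))
# prints ['ui-kit', 'web-app']
```

A change to `libs/ui-kit` leaves `tv-app` and `player` untouched here, which is what lets a pipeline skip their test suites and keep feedback fast.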

Day 3

9:00 AM: Registration & Coffee 🥐 ☕️

10:00 AM: 🎤 Inès Mir & Doruk Deniz Kutukculer - Fellowship of Product. How your team setup affects your experience

Did you know there are 2 types of team formation in tech? These formations can change your experience in the team drastically, and you'd better recognise them early to adjust your expectations of the job. Even more importantly, you need to show different qualities in job interviews to get a job in a particular team formation!

Deniz Doruk Kuetuekcueler, a head of engineering, and Inès Mir, a principal product designer, are trying to figure out how design and engineering can effectively work together in these setups.

10:50 AM: 🎤 Alireza Rahmaty - How we automate the App Release Monitoring at GetYourGuide

App release monitoring (ARM) represents a suite of innovative tools designed to monitor the health and stability of iOS and Android app releases. These tools provide real-time updates by sending notifications to Slack channels and logging the app's status throughout the release process. At GetYourGuide, we have developed an ARM to monitor the rollout of our Android and iOS apps from the moment they are submitted to the App Store & Google Play until they are fully released. We ship releases faster and with more confidence using ARM!

11:40 AM: 🎤 Aleksandr Gorbunov - Flutter for frontenders or There and Back Again

Every developer, regardless of specialization, may encounter the need to create a UI for a client application. The choice of technology may depend on the developer, or it may be pre-determined by the client, as happened in my case.

The peculiarity is that, coming from frontend development in JavaScript, I started building user interfaces in Flutter.

Today, there is a vast number of technologies that enable the development of cross-platform applications. These technologies are evolving rapidly, attracting large communities, and more frequently, companies are adopting them. For example, Flutter is a powerful framework that allows developers to create cross-platform applications.

Very likely, every developer will at some point need to use such development tools, and it’s great that frameworks like Flutter come with detailed documentation and extensive community support, making it relatively easy to start developing with them. At first, though, not everything may go smoothly, and the desire to revert to familiar methods may arise.

12:05 PM: 🎤 Muhammad Salman Bediya - Crucial Performance Issue in Flutter Apps: Memory Leaks

Memory leaks can be hard to spot but have a big impact on the performance of Flutter apps, especially those running for long periods. In this talk, we’ll explore the most common reasons memory leaks happen in Flutter and Dart, focusing on how asynchronous programming and Streams can make them more challenging. You’ll learn practical tips to identify and fix these issues, helping your apps run smoother and more efficiently.

12:30 PM: 🎤 Ole Bulbuk - Native GUIs For All

Traditionally, native GUIs are highly platform-dependent and often specific to one programming language. In this talk we will explore a way to create GUI applications that supports virtually all platforms and any programming language. It is very effective and easy to use, too.

1:10 PM: Lunch 🍔🥤

2:40 PM: 🎤 Nicole Terc - Tap it! Shake it! Fling it! Sheep it! - The Gesture Animations Dance!

Let's have fun with animations, gestures and sensors!

Using Compose Multiplatform, we'll go over how to create animations using gestures and sensor events for Android & iOS. We'll cover some basics like how to get the device motion and position information, how to track gestures on the screen, and how you can combine them with animations to have fun!

After this talk, you'll have a better understanding of how to use the sensor frameworks, how to make your own gesture effects, and how to create interesting animations in an easy way.

Keep it fun, keep it animated!

3:30 PM: 🎤 Andrii Khrystian - From waves to widgets: Sound processing in Flutter

In this talk, we'll explore how to work with sound in Flutter apps. We'll go over the basics of adding sound effects and processing audio to make your apps more interesting. You'll learn how to handle audio files and integrate them smoothly with your Flutter projects. This session is great for anyone looking to add audio features to their apps simply and effectively.

4:20 PM: 🎤 Randy Nel Gupta - From Practice: Migration of an Order Processing System to the Cloud

A case study on how an order processing system, processing 50,000 orders daily for an international retailer spread across multiple continents and jurisdictions, is migrated to the cloud. The legacy system is implemented in PL/SQL and must be migrated during ongoing operations.

The presentation will cover all aspects, from testing and monitoring to development and the application of Site Reliability Engineering.

Furthermore, less technical topics will be introduced, such as the systematic composition of teams to ensure the necessary technical as well as domain-specific expertise.

4:50 PM: 🎤 Wietse Venema - Running open large language models in production with serverless GPUs

Many developers are interested in running open large language models, such as Google's Gemma and Llama. Open models give you full control over the deployment options, the timing of model upgrades, the private data that goes into the model, and the ability to fine-tune on specific tasks such as data extraction. Hugging Face TGI is a popular open-source LLM inference server, and Hugging Face TRL is excellent for fine-tuning. You’ll learn how to build and deploy an application that uses an open model on Google Cloud Run with cost-effective GPUs that scale down to zero instances.

Day 4

9:00 AM: Registration & Coffee 🥐 ☕️

10:00 AM: 🎤 Daniel Stamer & Diana Nanova - Workshop: From Prototype to Production

In this hands-on technical workshop participants will work on a hilarious web service prototype and deploy it to the cloud, set up build and deployment pipelines, extend the code base to leverage GenAI functionality, use SRE practices to effectively operate the application and finally strengthen the security posture of the overall software delivery process to guard against supply chain attacks.

1:10 PM: Lunch 🍔🥤

2:40 PM: 🎤 John Nguyen - Building a Chrome Extension using Gemini and Langchain

In this workshop, you will learn the basics of creating a Google Chrome Extension (which will also work on any Chromium-based Browser). We will build a simple Page summarizer using Bun, Typescript, Gemini, and LangChain. We will learn the anatomy of the manifest.json for building a Chrome Extension, Bun's bundler, how to interact with Gemini, and why LangChain is a good idea here.

3:45 PM: 🎤 Guillaume Vernade - How to make the most of Gemini multimodal capabilities?

We all know that in Tech there are always dozens of ways of doing anything. But what if we could only use an LLM for a first investigation? Let me show you how I'm trying to solve the mystery of who killed my pond's fishes using the power of Gemini.

Day 5

9:00 AM: Registration & Coffee 🥐 ☕️

10:00 AM: 🎤 Mario Bodemann & Joost van Dijk - Workshop: Passkeys on Android: How to get rid of passwords

Passwords. Or two factors? What about multiple factors? Which email did you register with? Why is 'password123' not working on this site, when that password is shared everywhere else?

If you recognize some of those questions, I am happy to add another couple: What are passkeys? Or how about: How to use passkeys to replace passwords in an Android app?

In this workshop I will walk through the latter two questions: How to build an Android App that registers and signs users in, using passkeys. Expect a quick explanation of this fancy new technology, why it will replace passwords and how you can store them either on your mobile devices or on dedicated hardware. Following that, a fictitious application and service will be built to show you how to use those passkeys and which moving pieces you will need.

Expect to use your Android Studio with Kotlin and common best practices to build an Android app that talks to a publicly available backend.

11:05 AM: 🎤 Anton Borries - Workshop: Adding Homescreen Widgets to Flutter Apps

HomeScreen Widgets are a great way to provide more Information to your Users right on their HomeScreens, giving your App more ways to appear in Users' lives and help them achieve their goals.

In this Workshop we'll look at the steps needed to add HomeScreen Widgets to Flutter Apps using the home_widget package.

12:10 PM: 🎤 Elena Grahovac - Workshop: Mastering Multiple Engineering Leadership Roles for Maximum Impact

As an engineering manager or technical leader, navigating multiple roles that demand a diverse set of skills is a common yet challenging part of the job.

In this workshop, we will explore how to effectively balance these multiple roles and responsibilities in a complex engineering environment. Participants will be guided through the creation of their own leadership framework, tailored to adapt to the unique situations and styles of each individual. Beginning with identifying core values and responsibilities, the framework is elaborated into an actionable plan to succeed.

This workshop not only offers an opportunity for reflection on personal and professional development but also provides tools and insights to enhance management capabilities and team dynamics. Join us to cultivate a comprehensive approach to leadership that aligns with your unique role, responsibilities, and personal style.

1:10 PM: Lunch 🍔🥤

2:40 PM: 🎤 Gus Martins - Workshop: Gemma for Everyone: Your First Steps with Open Models and AI

Dive into the world of open models and AI with Gemma! This workshop will guide you through the basics of using Gemma, Google's powerful family of language models. Learn how to harness Gemma's capabilities for tasks like text generation, question answering, and more. We'll also explore how to fine-tune Gemma on your own data, allowing you to create custom AI solutions tailored to your needs. No prior experience with large language models is required!

3:45 PM: 🎤 Shahriyar Rzayev - Learn Flask the hard way: Introduce Architecture Patterns

Flask is a popular and flexible web framework for Python, but building scalable and maintainable Flask applications can be challenging without a solid understanding of architecture patterns. This workshop aims to provide participants with a detailed explanation of applying architecture patterns to Flask projects. By exploring various design principles and best practices, attendees will learn how to structure their Flask applications for improved scalability, modularity, and maintainability.

Focusing on the Repository, Unit of Work, and Use Cases patterns, attendees will gain experience in applying these patterns to enhance code organization, maintainability, and testability. All these layers are wired together using Dependency Injection, which is yet another powerful tool to use in your applications.

The application we are going to build is stored in: https://github.com/ShahriyarR/hexagonal-flask-blog-tutorial

We are going to completely rewrite the official Blog application described in Flask documentation by applying architecture patterns.

All abstraction layers are covered by unit and integration tests, which will give the attendees a detailed view of why it is important to structure the application using architecture patterns.
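As a rough illustration of how the patterns named above fit together, here is a minimal, framework-free sketch. It is not taken from the workshop repository: `Post`, `InMemoryPostRepository`, `InMemoryUnitOfWork`, and `CreatePost` are hypothetical names, and an in-memory dictionary stands in for a real database. The use case receives its unit of work through the constructor, which is one simple form of Dependency Injection.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Post:
    """Hypothetical domain entity for a blog post."""
    id: int
    title: str
    body: str


class InMemoryPostRepository:
    """Repository: hides storage details behind add/get."""

    def __init__(self, committed: Dict[int, Post]) -> None:
        self._committed = committed          # shared "database"
        self._pending: Dict[int, Post] = {}  # staged, not yet committed

    def add(self, post: Post) -> None:
        self._pending[post.id] = post

    def get(self, post_id: int) -> Optional[Post]:
        return self._pending.get(post_id) or self._committed.get(post_id)

    def flush(self) -> None:
        self._committed.update(self._pending)
        self._pending.clear()

    def discard(self) -> None:
        self._pending.clear()


class InMemoryUnitOfWork:
    """Unit of Work: groups repository changes into one atomic step."""

    def __init__(self, store: Dict[int, Post]) -> None:
        self.posts = InMemoryPostRepository(store)

    def __enter__(self) -> "InMemoryUnitOfWork":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Anything that was staged but never committed is dropped.
        self.rollback()

    def commit(self) -> None:
        self.posts.flush()

    def rollback(self) -> None:
        self.posts.discard()


class CreatePost:
    """Use case: business rule plus orchestration, no web framework code."""

    def __init__(self, uow: InMemoryUnitOfWork) -> None:
        self._uow = uow  # injected dependency

    def execute(self, post_id: int, title: str, body: str) -> Post:
        if not title.strip():
            raise ValueError("title is required")
        post = Post(post_id, title, body)
        with self._uow as uow:
            uow.posts.add(post)
            uow.commit()
        return post
```

In a Flask application, a view function would only parse the request, call something like `CreatePost(uow).execute(...)`, and render the result. Because the use case depends only on the unit of work it is handed, it can be unit-tested with the in-memory version and run against a database-backed one in integration tests.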


Speakers

Aleksandr Gorbunov - Smart Steel Technologies (Full Stack Developer)

A skilled developer specializing in JavaScript (JS) and TypeScript (TS), with strong expertise in frontend development. Proficient in the Vue ecosystem (Vue2, Vue3, Composition API, Nuxt 3), using Webpack and Vite for project bundling. Experienced in testing with Vitest, Cypress, and Jest. Adept in CSS preprocessors like SASS and Stylus. Additionally, has solid knowledge of Flutter and experie…

Andrey Sitnik - Evil Martians (Lead Engineer)

With more than 20 years in open source, Andrey Sitnik created a few popular CSS tools (PostCSS, Autoprefixer), local-first framework (Logux), and many small libraries with millions of downloads (like Nano ID).

Andrii Khrystian - Dynatrace (Senior Flutter Developer)

GDG Linz organiser. Senior Flutter Developer at Dynatrace. Public speaker and tech writer

Andrii Raikov - Delivery Hero SE (Principal Software Engineer)

Andrii is a Principal Software Engineer at Delivery Hero. He has a total of 15 years of experience with Ruby and has been very passionate about Go for the last 5 years.

Anton Borries - 1KOMMA5° (Software Engineer)

Anton is a Software Engineer working at 1KOMMA5°. He loves building great UI and UX using Flutter. Coming from an Android background, the gap between Flutter and native features has always tickled his interest. This has led him into improving the experience of developing HomeScreen Widgets for Flutter apps.

Ash Davies

Google Developer Expert for Android, enthusiastic speaker, lead engineer at ImmobilienScout24, Kotlin aficionado, spends more time travelling than working.

Daniel Stamer - Google (Cloud Customer Engineer)

Daniel is passionate about building modern cloud-native applications on Google's serverless technologies. He works with digital natives out of Germany’s startup capital Berlin and helps to modernize applications or build brand new ones in the cloud.

Danny Preussler - SoundCloud (Android Platform Lead)

Danny is a developer by heart, living in Berlin and leading the Android team at SoundCloud. He worked for companies like Groupon, Viacom, eBay and Alcatel and started his mobile career long before any Android with Java ME and Blackberry applications. Danny writes and talks about mobile development and testing regularly and is a Google Developer Expert for Android and Kotlin.

Elena Grahovac - FerretDB (Director of Engineering)

Elena has been in software engineering since 2007, focusing on backend systems and infrastructure. Having played the roles of both individual contributor and engineering manager, Elena is passionate about combining technical expertise with strong team collaboration. A dedicated advocate of DevOps practices, she aims to enhance workflows and bring teams together. Elena believes in helping peopl…

Gus Martins - Google (Developer Advocate)

Katya Vinnichenko - Google (Program Manager)

Katya is a Program Manager at Google Developer Relations team. Currently she is leading the Google Developer Groups across Europe, the Middle East and Africa.

Marcin Chudy - LeanCode (Senior Flutter Developer)

Marcin is a Senior Flutter Developer at LeanCode, currently playing tech lead role in a big project for the banking sector. Previously worked with backend, web frontend with React, finally settling on mobile and falling in love with Flutter at first sight. After work, he enjoys dancing salsa and bachata and attends metal concerts. Marcin is a Senior Flutter Developer at LeanCode and has …

Marco Gomiero - Airalo (Senior Android Developer | Kotlin GDE)

Marco is an Android engineer, currently working at Airalo. He is a Google Developer Expert for Kotlin, he loves Kotlin and he has experience with native Android and native iOS development, as well as cross-platform development with Flutter and Kotlin Multiplatform. In his spare time, he writes and maintains open-source code, he shares his dev experience by writing on his blog, speaking a…

Mario Bodeman - Yubico (Android Developer Advocate)

Speaker of talks, coder of code, doer of dones.

Muhammad Bediya

Muhammad Salman is a Senior Software Engineer specializing in mobile app development with a focus on building scalable, high-quality applications using Flutter, React Native, Xamarin, and Swift. With experience leading frontend teams on enterprise-level projects that have reached over 1.5 million users, he brings a strong commitment to creating impactful, user-centered solutions. A dedic…

Nicole Terc

Android GDE, board game lover, video game addict and origami enthusiast, Nicole taught herself to code and has been fooling around with the Android ecosystem for more than 10 years. She has participated in a diverse variety of projects for several clients around the world, including video streaming, news, social media and public transport applications. Regardless of what the current adventu…

Ole Bulbuk - Ardan Labs

Ole has been a backend engineer since the nineties. He has worked for many companies big and small and has seen many projects fail or succeed. He loves being part of the global Go community and working on projects that make the world a better place. In his spare time he co-organises the Berlin chapter of GDG Golang, develops open source software and enjoys time with his family.

Oleksii Antypov - DmarcDkim.com (Founder & CEO)

Experienced CTO specializing in early-stage startups. Formerly with Rocket Internet and PocketBook, now focused on accelerating global DMARC adoption. Originally from Ukraine, I relocated to Berlin in 2015 to deepen my expertise in building successful startups from the ground up.

Raphaël VO - Ekino (Senior Software Engineer)

I’m Raphael Vo, a passionate Senior Software Engineer with over 10 years of experience, specializing in Angular and frontend development. I love turning complex ideas into delightful user experiences and tackling challenges creatively and enthusiastically. When I'm not coding, you’ll find me diving into the latest tech trends or enjoying epic board game nights with friends. As an aspiring spea…

Vadim Makeev

Frontend developer in love with the Web, browsers, bicycles, and podcasting. He/him, MDN technical writer, Google Developer Expert.

Alex Mir - mobile.de (Frontend Engineer)

Frontend Engineer at car retail platform mobile.de (part of Adevinta / ex-Ebay)

Alireza Rahmaty - GetYourGuide (Android Developer)

I am Alireza, an Android developer with 6+ years of experience building apps. I have experience building server-driven UI apps, complex UI, localisation and testing, and CI/CD. I sometimes go hiking and play video games.

Cesar Martinez - Meyer Sound (Web Developer)

Web developer with around 10 years of experience and a passion for software architecture. Currently working at Meyer Sound.

Bogdan Plieshka - Zattoo (Principal Engineer)

Engineer with over a decade of Frontend development experience, passionate about automation, accessibility, and scaling complex systems. Working at Zattoo as a Principal Engineer, focusing on delivering frontend solutions across Web, React, and React Native for streaming media content. Organizer of the React Berlin Meetup, actively contributing to the development community.

Diana Nanova - Google (Customer Engineering Manager)

Diana is a Customer Engineering Manager at Google Cloud. Based in the German tech startup capital Berlin, Diana helps digital native customers and startups across various industries to leverage the capabilities of Google Cloud and loves championing for Google culture.

Doruk Deniz Kutukculer - Zalando (Head of Engineering)

IT professional and a leader with over 15 years of experience in the industry. Currently a Head of Engineering at Zalando.

Guillaume Vernade - Google (AI Dev Rel)

I've been a jack-of-all-trades in the Tech industry, starting as a prototyper building apps on Google Glasses and the first Android watches, then became a Product Owner and an Agile coach. I realized my childhood dream of becoming a video game producer then came back to my other passion: AI.

Ian Ballantyne - Google (AI DevRel)

Ian is a Developer Relations Engineer for AI at Google. Currently he works on generative AI, such as Gemini and Gemma. He is passionate about on-device AI, using technologies such as Google AI Edge to deploy artificial intelligence to web and mobile devices. He has been in Developer Relations at Google for 9 years specializing in helping partners and developers unlock the capability of Google …

Inès Mir - Zalando (Principal Product Designer)

A principal product designer at Zalando and a content creator.

John Nguyen - Eon (Backend Developer)

Fullstack developer with a knack for whipping up code recipes using my secret ingredients: a dash of JavaScript, a pinch of Python, and a whole lot of serverless magic John's journey in software development began as a PHP developer, but he later transitioned to front-end development and became passionate about all things related to Javascript. While working as a data DevOps engineer in a…

Joost van Dijk - Yubico (Developer Advocate)

Joost van Dijk is a developer advocate at Yubico. As the inventor of the YubiKey, Yubico makes secure login easy and available for everyone. Joost focuses on securing digital identities and accelerating the adoption of open authentication standards as part of Yubico’s developer program.

Randy Gupta

Randy is a Google Developer Expert for Cloud and also an organizer of GDG Düsseldorf. With more than 25 years of professional experience in software development, he is focused today on building microservices applications on top of Kubernetes.

Shahriyar Rzayev - Nord Security (Senior Software Engineer)

Senior Software Engineer @ Nord Security. Moving forward on Clean Code and Clean Architecture. Previous accomplishments include contributing to open source, providing technical direction, and sharing knowledge about Clean Code and Architectural patterns. An empathetic team player and mentor. Azerbaijan Python Group Leader. Former QA Engineer and Bug Hunter.

Tomek Porożyński - Atos

Vadym Pinchuk - Sky (Mobile Software Engineer)

Vadym, a seasoned software engineer, possesses a wealth of experience in Android application development. He has skillfully transitioned his expertise to cross-platform development, utilizing Flutter. Throughout his career, Vadym has collaborated with a diverse range of companies, from industry giants like Samsung, Volvo, Bosch, and Instagram to smaller start-ups. Leveraging his extensiv…

Wietse Venema - Google (Google Cloud Engineer)

Wietse Venema is an engineer at Google Cloud. He wrote the O’Reilly book on Cloud Run.

Hosts

Seemran Xec - Sawayo (Software Engineer)

A focused developer possessing professional experience of 6+ years in software development for product-based and service-based industries, with businesses acquiring valuable insight and implementing best practices. Collaborated with startups and other businesses as a freelancer/consultant to build, design, and manage the product. I'm passionate about what I do and a lifelong learner.

Louis Tsai - Zalando SE (GDG Organizer)

Alex Mir - mobile.de (Frontend Engineer)

Frontend Engineer at car retail platform mobile.de (part of Adevinta / ex-Ebay)

Jhoon Saravia - Greenmates (Mobile Engineer)

Software consultant and developer, experienced in Android, Flutter and Full-stack. Interested in working on DEI initiatives as a complement to my core work. Particularly interested in technology, gadgetry, the future, the combination of those three and the impact that driving Diversity, Equity and Inclusion has on all of them both in and out of the workplace. Amateur photographer a…

Matthias Geisler - Thermondo (Senior Software Engineer)

True believer in (Kotlin) Multiplatform and working with it for over 4 years now. Builds solutions for Android. Maintainer and developer of KMock. Co-Organizer of KUG Berlin, GDG Android Berlin, Rust Berlin and XTC Berlin.

Emy Jamalian - Atlas Metrics (Software QA Engineer)

Complete your event RSVP here: https://gdg.community.dev/events/details/google-gdg-berlin-presents-devfest-berlin-2024/.

DevFest Berlin 2024

Get your tickets now on eventbrite.com

What is XConf: XConf is a tech conference created by technologists, for technologists who care deeply about the craft of software and its ability to make the world a better place. This one-day, two-track event will host a diverse range of technology leaders discussing the latest tech topics and points of view.

At XConf you will:

  • Discover emerging technologies and practices, use cases, and real stories
  • Listen to talks by a diverse line-up with a 50/50 gender split
  • Learn how to implement new technologies where you work and have greater impact
  • Hear from people who broke through seemingly insurmountable resistance to change
  • Talk with other interesting and ambitious technologists
  • Let go of previously-held assumptions and gain new perspectives

This one-day, two-track event gives you indirect access to a diverse range of Thoughtworks senior technologists working on our clients’ most complex challenges. Seating is limited to 100 guests, so register today!

Exclusive keynote by Trisha Gee. Talk: ARE YOUR TESTS SLOWING YOU DOWN? Testing is a Good Thing, right? Especially automated testing. But "Good things come to those who wait" is not something that's going to appeal to the busy developer. You want results, and you want them now. You're in The Zone working on a problem, and the last thing you want is to break your flow wrestling with your testing framework or waiting for the tests to finish running.

More code means more tests. More coverage means more tests. More tests mean more time. Time that you want to spend being productive, creative, innovative. How can you balance the need for quality with the need for speed?

In this talk, Trisha will identify issues that slow down developers when writing, running and debugging tests, and look at tools that can help developers with each of these problems. There will be live coding, analysis of social media poll results, an overview of solutions in this space, "best practice" recommendations, and machine learning will be mentioned at some point.

Sneak peek into our agenda:

  • Evaluating LLM systems: What does it mean in practice? - Aili Asikainen and Oege Dijk
  • Container Security - Implementing zero trust & principle of least privilege in containers inside K8s - Amit Dube
  • Building Data Mesh on Databricks and Azure for a large manufacturing company - Dominika Makuch and Rieke Heinze

Check out the full agenda on eventbrite.com

Thoughts from past events: “Your event has really become a highlight in my conference calendar, thanks for that! See you next time!” - Mark, XConf attendee.

“It’s a really interesting conference...loads of different talks about what is modern in technology and what are the things to do. Hearing about those from great speakers is really good!” - Tim, XConf attendee.

Get your tickets now

XConf Europe 2024 | Barcelona (tickets via eventbrite.com)
Riccardo Amadio – Big Data Engineer and Dagster Evangelist

Big Data Engineer and Dagster Evangelist discusses Dagster as a modern data orchestrator.

Dagster
Vasilije Markovic – Founder @ Cognee.ai

Vasilije Markovic discusses enriching LLM context with multilayer graphs.

multilayer graphs llm context
Aleksandr "Sasha" Zolotukhin – Head of Business Intelligence @ JustWatch GmbH

Lightdash evaluation, integration, and user experience at JustWatch.

Lightdash
Will Thompson – guest @ Privacy Dynamics , Tobias Macey – host

Summary There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, departments, or businesses then it is of utmost importance that you eliminate or obfuscate personal information. In this episode Will Thompson explores the many ways that sensitive data can be leaked, re-identified, or otherwise be at risk, as well as the different strategies that can be employed to mitigate those attack vectors. He also explains how he and his team at Privacy Dynamics are working to make those strategies more accessible to organizations so that you can focus on all of the other tasks required of you.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Today’s episode is Sponsored by Prophecy.io – the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design & deploy data pipelines on Apache Spark & Apache Airflow. Now all the data users can use software engineering best practices – git, tests and continuous deployment with a simple to use visual designer. How does it work? – You visually design the pipelines, and Prophecy generates clean Spark code with tests on git; then you visually schedule these pipelines on Airflow. You can observe your pipelines with built in metadata search and column level lineage. Finally, if you have existing workflows in AbInitio, Informatica or other ETL formats that you want to move to the cloud, you can import them automatically into Prophecy making them run productively on Spark. Create your free account today at dataengineeringpodcast.com/prophecy. The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. 
Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses. Your host is Tobias Macey and today I’m interviewing Will Thompson about managing data privacy concerns for data sets used in analytics and machine learning.

Interview

Introduction

How did you get involved in the area of data management?

Data privacy is a multi-faceted problem domain. Can you start by enumerating the different categories of privacy concern that are involved in analytical use cases?

Can you describe what Privacy Dynamics is and the story behind it?

Which categor(y|ies) are you focused on addressing?

What are some of the best practices in the definition, protection, and enforcement of data privacy policies?

Is there a data security/privacy equivalent to the OWASP top 10?

What are some of the techniques that are available for anonymizing data while maintaining statistical utility/significance?

What are some of the engineering/systems capabilities that are required for data (platform) engineers to incorporate these practices in their platforms?

What are the tradeoffs of encryption vs. obfuscation when anonymizing data?

What are some of the types of PII that are non-obvious?

What are the risks associated with data re-identification, and what are some of the vectors that might be exploited to achieve that?

How can privacy risk mitigation be maintained as new data sources are introduced that might contribute to these re-identification vectors?

Can you describe how Privacy Dynamics is implemented?

What are the most challenging engineering problems that you are dealing with?

How do you approach validation of a data set’s privacy? What have you found to be useful heuristics for identifying private data?

What are the risks of false positives vs. false negatives?

Can you describe what is involved in integrating the Privacy Dynamics system into an existing data platform/warehouse?

What would be required to integrate with systems such as Presto, Clickhouse, Druid, etc.?

What are the most interesting, innovative, or unexpected ways that you have seen Privacy Dynamics used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Privacy Dynamics?

When is Privacy Dynamics the wrong choice?

What do you have planned for the future of Privacy Dynamics?

Contact Info

LinkedIn @willseth on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Privacy Dynamics Pandas

Podcast Episode – Pandas For Data Engineering

Homomorphic Encryption Differential Privacy Immuta

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

AI/ML Airflow Analytics API BigEye ClickHouse Cloud Computing Data Engineering Data Management Druid ETL/ELT Git Informatica Kubernetes Pandas Presto Python Cyber Security Spark
Tal Galfsky – guest @ Cherre , Tobias Macey – host

Summary Most of the time when you think about a data pipeline or ETL job what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode Tal Galfsky explains how he and the team at Cherre tackled the problem of messy address data by building a natural language processing and entity resolution system that is served as an API to the rest of their pipelines. He discusses the myriad ways that addresses are incomplete, poorly formed, and just plain wrong, why it was a big enough pain point to invest in building an industrial strength solution for it, and how it actually works under the hood. After listening to this you’ll look at your data pipelines in a new light and start to wonder how you can bring more advanced strategies into the cleaning and transformation process.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. 
Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today. Your host is Tobias Macey and today I’m interviewing Tal Galfsky about how Cherre is bringing order to the messy problem of physical addresses and entity resolution in their data pipelines.

Interview

Introduction

How did you get involved in the area of data management? Started as a physicist and evolved into data science.

Can you start by giving a brief recap of what Cherre is and the types of data that you deal with? Cherre is a company that connects data. We’re not a data vendor, in that we don’t sell data, primarily. We help companies connect and make sense of their data. The real estate market is historically closed, gut-led, and behind on tech.

What are the biggest challenges that you deal with in your role when working with real estate data? Lack of a standard domain model in real estate. Ontology: what is a property? Each data source thinks about properties in a very different way, therefore yielding similar, but completely different data. Quality: even if the datasets are talking about the same thing, there are different levels of accuracy and freshness. Hierarchy: when is one source better than another?

What are the teams and systems that rely on address information? Any company that needs to clean or organize (make sense of) its data needs to identify people, companies, and properties. Our clients use address resolution in multiple ways: via the UI or via an API. Our service is both external and internal, so what I build has to be good enough for the demanding needs of our data science team, robust enough for our engineers, and simple enough that non-expert clients can use it.

Can you give an example of the problems involved in entity resolution? Known entity example: the Empire State Building. To resolve addresses in a way that makes sense for the client you need to capture the real-world entities: lots, buildings, units.

  • Identify the type of the object (lot, building, unit)
  • Tag the object with all the relevant addresses
  • Relations to other objects (lot, building, unit)
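The three requirements above (a type, a set of addresses, and relations to other objects) can be pictured as a small data model. The following is only a sketch under assumptions: the episode notes do not describe Cherre's actual schema, so `EntityType`, `PropertyEntity`, `ancestors`, and all identifiers and addresses below are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class EntityType(Enum):
    LOT = "lot"
    BUILDING = "building"
    UNIT = "unit"


@dataclass
class PropertyEntity:
    """Hypothetical real-world entity that addresses resolve to."""
    entity_id: str
    entity_type: EntityType
    # One entity can be known under several addresses.
    addresses: Set[str] = field(default_factory=set)
    # Relation to the containing entity: unit -> building -> lot.
    parent_id: Optional[str] = None


def ancestors(entity_id: str, entities: Dict[str, PropertyEntity]) -> List[str]:
    """Walk the parent relation upward, nearest container first."""
    chain: List[str] = []
    current = entities[entity_id].parent_id
    while current is not None:
        chain.append(current)
        current = entities[current].parent_id
    return chain


def build_example() -> Dict[str, PropertyEntity]:
    """Made-up lot/building/unit hierarchy with placeholder addresses."""
    items = [
        PropertyEntity("lot-1", EntityType.LOT, {"1 example ave"}),
        PropertyEntity(
            "building-1",
            EntityType.BUILDING,
            {"1 example ave", "20 sample st"},  # two street entrances
            parent_id="lot-1",
        ),
        PropertyEntity(
            "unit-1", EntityType.UNIT, {"1 example ave apt 5"}, parent_id="building-1"
        ),
    ]
    return {item.entity_id: item for item in items}
```

With this shape, resolving any of a building's addresses leads to the same `entity_id`, and `ancestors("unit-1", build_example())` recovers the building and lot that contain the unit.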

What are some examples of the kinds of edge cases or messiness that you encounter in addresses? The first class is string problems. The second class is component problems. The third class is geocoding.

I understand that you have developed a service for normalizing addresses and performing entity resolution to provide canonical references for downstream analyses. Can you give an overview of what is involved? What is the need for the service? The main requirement here is connecting an address to a lot, building, or unit with latitude and longitude coordinates.

How were you satisfying this requirement previously? Before we built our model and dedicated service we had a basic pipeline-only prototype that handled NYC addresses.

What were the motivations for designing and implementing this as a service? The need to expand nationwide and to deal with client queries in real time.

What are some of the other data sources that you rely on to be able to perform this normalization and resolution? Lot data, building data, unit data, footprints and address points datasets.

What challenges do you face in managing these other sources of information? Accuracy, hierarchy, standardization, a unified solution, persistent IDs and primary keys.

Digging into the specifics of your solution, can you talk through the full lifecycle of a request to resolve an address and the various manipulations that are performed on it? String cleaning, parse and tokenize, standardize, match.

What are some of the other pieces of information in your system that you would like to see addressed in a similar fashion? Our named entity solution with connection to the knowledge graph and owner unmasking.

What are some of the most interesting, unexpected, or challenging lessons that you learned while building this address resolution system? Scaling the NYC geocode example. The NYC model was exploding a subset of the options for messing up an address. Flexibility. Dependencies. Client exposure.

Now that you have this system running in production, if you were to start over today what would you do differently? A lot, but at this point the module boundaries and client interface are defined in such a way that we are able to make changes or completely replace any given part of it without breaking anything client facing.

What are some of the other projects that you are excited to work on going forward? Named entity resolution and the knowledge graph.
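The request lifecycle mentioned above (string cleaning, parse and tokenize, standardize, match) could look roughly like the following toy pipeline. This is a sketch under stated assumptions, not Cherre's implementation: the abbreviation tables are tiny illustrative samples, the exact-match dictionary lookup stands in for whatever matching model the real service uses, and every function name and address is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Tiny illustrative lookup tables; a real service would need far more.
SUFFIXES = {
    "street": "st", "st": "st",
    "avenue": "ave", "ave": "ave", "av": "ave",
    "road": "rd", "rd": "rd",
}
DIRECTIONS = {
    "north": "n", "n": "n",
    "south": "s", "s": "s",
    "east": "e", "e": "e",
    "west": "w", "w": "w",
}


def clean(raw: str) -> str:
    """String cleaning: lowercase, drop punctuation, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def tokenize(cleaned: str) -> List[str]:
    """Parse and tokenize: split on whitespace (a deliberate simplification)."""
    return cleaned.split()


def standardize(tokens: List[str]) -> List[str]:
    """Standardize: map known suffixes and directions to one spelling."""
    return [SUFFIXES.get(tok, DIRECTIONS.get(tok, tok)) for tok in tokens]


def match(tokens: List[str], index: Dict[str, str]) -> Optional[str]:
    """Match: exact lookup of the canonical string in a reference index."""
    return index.get(" ".join(tokens))


def resolve(raw: str, index: Dict[str, str]) -> Optional[str]:
    """Run the four stages in order and return an entity id, if any."""
    return match(standardize(tokenize(clean(raw))), index)
```

For example, with a hypothetical index `{"12 w example st": "building-001"}`, both `resolve("12 West Example Street.", index)` and `resolve("  12 W. EXAMPLE ST ", index)` return `"building-001"`, while an unknown address returns `None`. Token-by-token substitution is also where ambiguity appears (for instance "st" as "street" versus "saint"), which hints at why a plain lookup is unlikely to be enough for real address data.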

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today? BigQuery is a huge asset, and in particular UDFs, but they don’t support API calls or Python scripts.

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Links

Cherre

Podcast Episode

Photonics Knowledge Graph Entity Resolution BigQuery NLP == Natural Language Processing dbt

Podcast Episode

Airflow

Podcast.init Episode

Datadog

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

AI/ML Airflow API BI BigQuery CI/CD Cloud Computing Data Engineering Data Management Data Quality Data Science Datadog Datafold dbt DWH ETL/ELT Kubernetes NLP Python Redshift Snowflake Data Streaming
Jonathan Katz – guest , Tobias Macey – host

Summary

One of the longest running and most popular open source database projects is PostgreSQL. Because of its extensibility and a community focus on stability it has stayed relevant as development environments and data requirements have changed and evolved over its lifetime. It is difficult to capture any single facet of this database in a single conversation, let alone the entire surface area, but in this episode Jonathan Katz does an admirable job of it. He explains how Postgres started and how it has grown over the years, highlights the fundamental features that make it such a popular choice for application developers, and describes the ongoing efforts to add the complex features needed by the demanding workloads of today’s data layer. To cap it off he reviews some of the exciting features that the community is working on building into future releases.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end to end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey and today I’m interviewing Jonathan Katz about a high level view of PostgreSQL and the unique capabilities that it offers.

Interview

Introduction

How did you get involved in the area of data management?

How did you get involved in the Postgres project?

For anyone who hasn’t used it, can you describe what PostgreSQL is?

Where did Postgres get started and how has it evolved over the intervening years?

What are some of the primary characteristics of Postgres that would lead someone to choose it for a given project?

What are some cases where Postgres is the wrong choice?

What are some of the common points of confusion for new users of PostgreSQL (particularly if they have prior database experience)?

The recent releases of Postgres have had some fairly substantial improvements and new features. How does the community manage to balance stability and reliability against the need to add new capabilities?

What are the aspects of Postgres that allow it to remain relevant in the current landscape of rapid evolution at the data layer?

Are there any plans to incorporate a distributed transaction layer into the core of the project along the lines of what has been done with Citus or CockroachDB?

What is in store for the future of Postgres?

Contact Info

@jkatz05 on Twitter

jkatz on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

PostgreSQL

Crunchy Data

Venuebook

Paperless Post

LAMP Stack

MySQL

PHP

SQL

ORDBMS

Edgar Codd

A Relational Model of Data for Large Shared Data Banks

Relational Algebra

Oracle DB

UC Berkeley

Dr. Michae

API Chef Data Engineering Data Management DataOps GitHub MySQL Oracle postgresql SQL