
Topic

MySQL

relational_database open_source sql


Activity Trend

27 peak/qtr (2020-Q1 to 2026-Q1)

Activities

268 activities · Newest first

100 SQL Server Mistakes and How to Avoid Them

All the mistakes you might make with SQL Server—and how to avoid them! 100 SQL Server Mistakes and How to Avoid Them prepares you for the pitfalls database professionals often encounter—from administration to development, availability, and security. You'll learn to sidestep common errors that slow down your T-SQL code and ensure your SQL Server is installed and configured to handle anything your organization throws at it. Inside 100 SQL Server Mistakes and How to Avoid Them you'll learn to avoid:

Development errors when writing T-SQL
Installation and administration mistakes
Optimization missteps
Common pitfalls relating to high availability and disaster recovery (HA/DR)
Security oversights that can endanger your data

100 SQL Server Mistakes and How to Avoid Them doesn't focus on the "happy path"—instead, it covers all the errors and problems you might face as a SQL Server developer or administrator. Each chapter is filled with real-world issues drawn from author Peter A. Carter's two-decade-long career in SQL Server. Peter's seasoned advice helps dispel myths, debunk misconceptions, and set you on the right road.

About the Technology
Perfecting a SQL Server system can be a complex balancing act. Why is T-SQL running so slowly? Are the right data available? Are we protected against data theft? What about that new server instance I need to administer? Even the most skilled SQL Server experts make mistakes that cost time and performance. This book can help you get it right the first time.

About the Book
100 SQL Server Mistakes and How to Avoid Them focuses exclusively on the errors that you might—and probably will—make as a SQL Server admin or developer. Real-world examples, code samples, and helpful diagrams make it easy to understand each issue and its solution. You'll learn how to write performant code, design efficient database schemas, implement error handling, work with complex data types, and much more, all in a friendly, common-sense problem/solution format.

What's Inside
T-SQL development
Installation, administration, and optimization
High availability and security

About the Reader
Readers need to understand basic SQL Server concepts and SQL queries. Perfect for junior database admins, full-stack developers, and "accidental" DBAs.

About the Author
Peter A. Carter is a SQL Server expert with experience developing, administering, and architecting data-tier applications and SQL Server platforms.

Quotes
A masterful job! Covers the nuances of SQL Server that a new admin, or even an experienced one, needs to understand. - Allen White, SQL Server MVP 2007-2022
Quick, actionable advice that greatly improved my SQL Server skills. - Ruben Vandeginste, PeopleWare
A practical path through pitfalls in administration and development with specific solutions, examples, and code snippets. - Josephine Bush, sqlkitty.com
Worth reading for the testing and debugging section alone! - Mike McQuillan, McQTech Ltd.
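
As a flavor of the development mistakes in this category, a classic T-SQL pitfall is wrapping an indexed column in a function, which prevents index seeks. A minimal sketch (table and column names are hypothetical, not taken from the book):

```sql
-- Non-SARGable: the function on OrderDate forces a scan of every row.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2024;

-- Rewritten as a range predicate, the optimizer can seek on an OrderDate index.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '2024-01-01'
  AND OrderDate < '2025-01-01';
```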

Database Design and Modeling with PostgreSQL and MySQL

Discover how to design and optimize modern databases efficiently using PostgreSQL and MySQL. This book guides you through database design for scalability and performance, covering data modeling, query optimization, and real-world application integration.

What this Book will help me do
Build efficient and scalable relational database schemas for real-world applications.
Master data modeling with normalization and denormalization techniques.
Understand query optimization strategies for better database performance.
Learn database strategies such as sharding, replication, and backup management.
Integrate relational databases with applications and explore future database trends.

Author(s)
Alkin Tezuysal and Ibrar Ahmed are seasoned database professionals with decades of experience. Alkin specializes in database scalability and performance, while Ibrar brings expertise in database systems and development. Together, they bring a hands-on approach, providing clear and insightful guidance for database professionals.

Who is it for?
This book is oriented towards software developers, database administrators, and IT professionals looking to enhance their knowledge in database design using PostgreSQL and MySQL. Beginners in database design will find its structured approach approachable. Advanced professionals will appreciate its depth on cutting-edge topics and practical optimizations.
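
To illustrate the kind of schema design the book teaches, here is a minimal normalized one-to-many model in MySQL syntax (PostgreSQL would use GENERATED ... AS IDENTITY instead of AUTO_INCREMENT); the tables are hypothetical examples, not drawn from the book:

```sql
-- Customers and orders in third normal form: order rows reference the
-- customer by key instead of repeating customer attributes.
CREATE TABLE customers (
  customer_id INT AUTO_INCREMENT PRIMARY KEY,
  email       VARCHAR(255) NOT NULL UNIQUE,
  full_name   VARCHAR(255) NOT NULL
);

CREATE TABLE orders (
  order_id    INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  ordered_at  DATETIME NOT NULL,
  total_cents INT NOT NULL,
  CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);

-- An index on the foreign key supports the common join and lookup path.
CREATE INDEX idx_orders_customer ON orders (customer_id);
```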

Using various operators to perform daily routines.

Integration with Technologies:
Redis: Acts as a caching mechanism to optimize data retrieval and processing speed, enhancing overall pipeline performance.
MySQL: Utilized for storing metadata and managing task state information within Airflow's backend database.
Tableau: Integrates with Airflow to generate interactive visualizations and dashboards, providing valuable insights into the processed data.
Amazon Redshift: Panasonic leverages Redshift for scalable data warehousing, seamlessly integrating it with Airflow for data loading and analytics.
Foundry: Integrated with Airflow to access and process data stored within Foundry's data platform, ensuring data consistency and reliability.
Plotly Dashboards: Employed for creating custom, interactive web-based dashboards to visualize and analyze data processed through Airflow pipelines.
GitLab CI/CD Pipelines: Utilized for version control and continuous integration/continuous deployment (CI/CD) of Airflow DAGs (Directed Acyclic Graphs), ensuring efficient development and deployment of workflows.
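
Because Airflow keeps task state in its MySQL backend, operational questions can often be answered with plain SQL against the metadata tables. A sketch (table and column names follow Airflow 2.x and may differ between versions):

```sql
-- Summarize DAG run outcomes straight from Airflow's MySQL metadata database.
SELECT dag_id, state, COUNT(*) AS run_count
FROM dag_run
GROUP BY dag_id, state
ORDER BY dag_id, state;
```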

Airflow version upgrades can be challenging. Maybe you upgrade and your DAGs fail to parse (that's an easy fix). Or maybe you upgrade and everything looks fine, but when your DAG runs, you can no longer connect to MySQL because the TLS version changed. In this talk I will provide concrete strategies that users can put into practice to make version upgrades safer and less painful. Topics may include:

What semver means and what it implies for the upgrade process
Using integration test DAGs, unit tests, and a test cluster to smoke out problems
Strategies around constraints files / pinning, and managing provider versions vs core versions
Using db clean prior to upgrade to reduce table size
Rollback strategies
What to do about warnings (e.g., deprecation warnings)

I'll also focus on keeping it simple. Sometimes things like "integration tests" and "CI" can be scary for people. Even without having set up anything automated, there are still things you can do to make management of upgrades a little less painful and risky.
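
On the db clean point, it helps to first see which metadata tables have grown largest. A minimal sketch against a MySQL backend, assuming the metadata database is named airflow:

```sql
-- Largest Airflow metadata tables, as candidates for `airflow db clean`.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'airflow'
ORDER BY size_mb DESC
LIMIT 10;
```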

Hands-On MySQL Administration

Geared to intermediate- to advanced-level DBAs and IT professionals looking to enhance their MySQL skills, this guide provides a comprehensive overview of how to manage and optimize MySQL databases. You'll learn how to create databases and implement backup and recovery, security configurations, high availability, scaling techniques, and performance tuning. Using practical techniques, tips, and real-world examples, authors Arunjith Aravindan and Jeyaram Ayyalusamy show you how to deploy and manage MySQL, Amazon RDS, Amazon Aurora, and Azure MySQL. By the end of the book, you'll have the knowledge and skills necessary to administer, manage, and optimize MySQL databases effectively.

Design and implement a scalable and reliable database infrastructure using MySQL 8 on premises and in the cloud
Install and configure software, manage user accounts, and optimize database performance
Use backup and recovery strategies, security measures, and high availability solutions
Apply best practices for database schema design, indexing strategies, and replication techniques
Implement advanced database features and techniques such as replication, clustering, load balancing, and high availability
Troubleshoot common issues and errors, using diagnostic tools and techniques to identify and resolve problems quickly and efficiently
Facilitate major MySQL upgrades, including MySQL 5.7 to MySQL 8
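
As a taste of the user-account management the book covers, a least-privilege application account in MySQL 8 might look like this (names, password, and host pattern are illustrative):

```sql
-- Create an application account restricted to the app's schema and subnet.
CREATE USER 'app_rw'@'10.0.%' IDENTIFIED BY 'use-a-strong-password';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app_rw'@'10.0.%';

-- Confirm the account has only the privileges it needs.
SHOW GRANTS FOR 'app_rw'@'10.0.%';
```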

Mastering MySQL Administration: High Availability, Security, Performance, and Efficiency

This book is your one-stop resource on MySQL database installation and server management for administrators. It covers installation, upgrades, monitoring, high availability, disaster recovery, security, performance, and troubleshooting. You will become fluent in MySQL 8.2, the latest version of the highly scalable and robust relational database system. With a hands-on approach, the book offers step-by-step guidance on installing, upgrading, and establishing robust high availability and disaster recovery capabilities for MySQL databases. It also covers high availability with InnoDB and NDB clusters, MySQL Router, and enterprise MySQL tools, along with robust security design and performance techniques. Throughout, the authors punctuate concepts with examples taken from their experience with large-scale implementations at companies such as Meta and American Airlines, anchoring this practical guide to MySQL 8.2 administration in the real world.

What You Will Learn
Understand MySQL architecture and best practices for administration of MySQL server
Configure high availability, replication, and disaster recovery with InnoDB and NDB engines
Back up and restore with MySQL utilities and tools, and configure the database for zero data loss
Troubleshoot with steps for real-world critical errors and detailed solutions

Who This Book Is For
Technical professionals, database administrators, developers, and engineers seeking to optimize MySQL databases for scale, security, and performance
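
As a sketch of the replication setup discussed here (hostnames and credentials are placeholders; the syntax shown is MySQL 8.0.23+):

```sql
-- Point a replica at its source using GTID auto-positioning.
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'primary.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'use-a-strong-password',
  SOURCE_AUTO_POSITION = 1;

START REPLICA;

-- Check Replica_IO_Running and Replica_SQL_Running in the output.
SHOW REPLICA STATUS;
```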

Learn SQL using MySQL in One Day and Learn It Well

"Learn SQL using MySQL in One Day and Learn It Well" is your hands-on guide to mastering SQL efficiently using MySQL. This book takes you from understanding basic database concepts to executing advanced queries and implementing essential features like triggers and routines. With a project-based approach, you will confidently manage databases and unlock the potential of data. What this Book will help me do Understand database concepts and relational data architecture. Design and define tables to organize and store data effectively. Perform advanced SQL queries to manipulate and analyze data efficiently. Implement database triggers, views, and routines for advanced management. Apply practical skills in SQL through a comprehensive hands-on project. Author(s) Jamie Chan is a professional instructor and technical writer with extensive experience in database management and software development. Known for a clear and engaging teaching style, Jamie has authored numerous books focusing on hands-on learning. Jamie approaches pedagogy with the goal of making technical subjects accessible and practical for all learners. Who is it for? This book is designed for beginners eager to learn SQL and MySQL from scratch. It is perfect for professionals or students who want relevant and actionable skills in database management. Whether you're looking to enhance career prospects or leverage database tools for personal projects, this book is your practical starting point. Basic computer literacy is all that's needed.

Ready to take your MySQL applications to the next level? Join us as we explore how leading enterprises across diverse industries are leveraging Cloud SQL for MySQL to handle their most demanding workloads, and take a deep dive into some exciting innovations with vector search in MySQL. In this session, we'll dig into advanced features of Cloud SQL Enterprise Plus, including improved performance and availability, and learn practical tips and tricks to tune your MySQL database for maximum performance.
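
One tuning tool likely to come up: MySQL 8.0.18+ ships EXPLAIN ANALYZE, which executes a query and reports actual row counts and timings. A sketch with a hypothetical table:

```sql
-- Runs the query and shows where time is actually spent in the plan.
EXPLAIN ANALYZE
SELECT customer_id, SUM(total_cents) AS lifetime_value
FROM orders
WHERE ordered_at >= '2024-01-01'
GROUP BY customer_id;
```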


MySQL is the most widely used open-source relational database. However, it was designed to scale up using larger machines. When you need to scale out, you're often faced with manual sharding. Sharding introduces operational challenges, hinders scalability, and often leads to increased infrastructure and maintenance costs. This session presents a real-world case study of a financial services company that successfully migrated its large, sharded MySQL databases to Spanner, future-proofing its business for availability and cost.


Get superior price and performance with Azure cloud-scale databases | BRK224H

Improve performance with the latest capabilities for Azure SQL Database, Azure Database for PostgreSQL, and SQL Server enabled by Azure Arc for hybrid and multicloud. You'll learn how customers enabled ongoing innovation by migrating to Azure Database for MySQL. This session covers tactical ways to get the most from your applications with databases that are easy to use, deliver unmatched price/performance, support open source, and enable transformative AI technologies.

To learn more, please check out these resources:
* https://aka.ms/Ignite23CollectionsBRK224H
* https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill
* https://aka.ms/ArcSQL
* https://aka.ms/azure-ignite2023-dataaiblog

Speakers: Chandra Gavaravarapu, Maximilian Conrad, Shireesh Thota, Simon Faber, Vlad Rabenok, Xiaoxuan Guo, Ed Donahue, Aditya Badramraju, Bob Ward, Denzil Ribeiro, Parikshit Savjani

Session Information: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

BRK224H | English (US) | Data

MSIgnite

MySQL Crash Course, 2nd Edition

MySQL is one of the most popular database management systems available, powering everything from Internet powerhouses to individual corporate databases to simple end-user applications, and everything in between. This book will teach you all you need to know to be immediately productive with the latest version of MySQL. Its 30 highly focused hands-on lessons make this MySQL crash course both easier and more effective than you'd have thought possible.

Learn How To
Retrieve and Sort Data
Filter Data Using Comparisons, Regular Expressions, Full Text Search, and Much More
Join Relational Data
Create and Alter Tables
Insert, Update, and Delete Data
Leverage the Power of Stored Procedures and Triggers
Use Views and Cursors
Manage Transactional Processing
Create User Accounts and Manage Security via Access Control
...
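
For instance, the full-text and regular-expression filtering lessons cover queries along these lines (table and index names are illustrative, not the book's own examples):

```sql
-- Full-text search requires a FULLTEXT index on the searched column.
CREATE FULLTEXT INDEX idx_notes_body ON notes (body);

SELECT note_id, body
FROM notes
WHERE MATCH(body) AGAINST('replication lag' IN NATURAL LANGUAGE MODE);

-- Regular-expression filtering with REGEXP.
SELECT note_id
FROM notes
WHERE body REGEXP 'version [0-9]+\\.[0-9]+';
```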

Learning Snowflake SQL and Scripting

To help you on the path to becoming a Snowflake pro, this concise yet comprehensive guide reviews fundamentals and best practices for Snowflake's SQL and Scripting languages. Developers and data professionals will learn how to generate, modify, and query data in the Snowflake relational database management system as well as how to apply analytic functions for reporting. Author Alan Beaulieu also shows you how to create scripts, stored functions, and stored procedures to return data sets using Snowflake Scripting. This book is ideal whether you're new to databases and need to run queries or reports against a Snowflake database, or transitioning from databases such as Oracle, SQL Server, or MySQL to cloud-based platforms.

With this book, you will:
Generate and modify Snowflake data using INSERT, UPDATE, DELETE
Query data in Snowflake using SELECT, including joining multiple tables, using subqueries, and grouping
Apply analytic functions for performing subtotals, grand totals, row comparisons, and other reporting functionality
Build scripts combining SQL statements with looping, if-then-else, and exception handling
Learn how to build stored procedures and functions
Use stored procedures to return data sets
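
As a flavor of the analytic-function material, subtotals and a grand total can be produced in a single Snowflake query by windowing over an aggregate (the orders table is hypothetical):

```sql
-- Per-region subtotals next to a grand total and a rank, in one pass.
SELECT region,
       SUM(amount)                              AS region_total,
       SUM(SUM(amount)) OVER ()                 AS grand_total,
       RANK() OVER (ORDER BY SUM(amount) DESC)  AS region_rank
FROM orders
GROUP BY region;
```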

Database-Driven Web Development: Learn to Operate at a Professional Level with PERL and MySQL

This book will teach you the essential knowledge required to be a successful and productive web developer with the ability to produce cutting-edge websites utilizing a database. This updated edition starts with the fundamentals of web development before delving into Perl and MySQL concepts such as script and database modelling, script-driven database interactions, content generation from a database, and information delivery from the server to the browser and vice versa. The only skills required to get the most from this book are basic knowledge of how the Internet works and a novice skill level with Perl and MySQL. The rest is intuitively presented code that most people can quickly and easily understand and employ. An extensive selection of practical, fully functional programming constructs in six different programming languages will give you the knowledge and tools required to create eye-catching, capable, and functionally impressive database-driven websites.

Author Thomas Valentine has taken the concepts presented in the first edition of this book to new heights, offering in-depth discussions of each area of functionality required to develop fully formed database-driven web applications. He has expanded on the examples presented in the first edition and has included some very interesting and useful programming techniques for your consideration. Upon completing this book, you'll have gained the benefit of the author's decades' worth of experience and will be able to apply your new knowledge and skills to your own projects.

What You Will Learn
Install, configure, and use a trio of software packages (Apache Web Server, MySQL Database Server, and Perl Scripting Server)
Create an effective web development workstation with databases in mind
Use the Perl scripting language and MySQL databases effectively
Maximize the Apache Web Server

Who This Book Is For
Those who already know web development basics and web developers who want to master database-driven web development. The skills required to understand the concepts put forth in this book are a working knowledge of Perl and basic MySQL.

Getting Started with SQL and Databases: Managing and Manipulating Data with SQL

Learn the basics of writing SQL scripts. Using Standard SQL as the starting point, this book teaches writing SQL in various popular dialects, including PostgreSQL, MySQL/MariaDB, Microsoft SQL Server, Oracle, and SQLite. The book starts with a general introduction to writing SQL and covers the basic concepts. Author Mark Simon then covers database principles and how database tables are designed. He teaches you how to filter data using the WHERE clause, and you will work with NULL, numbers, dates, and strings. You will also understand sorting results using the ORDER BY clause, sorting by calculated columns, and limiting the number of results. By the end of the book, you will know how to insert and update data, and summarize data with aggregate functions and groups. Three appendices cover differences between SQL dialects, working with tables, and a crash course in PDO.

What You Will Learn
Filter, sort, and calculate data
Summarize data with aggregate functions
Modify data with insert, update, and delete statements
Study design principles in developing a database

Who This Book Is For
Developers and analysts working with SQL, as well as web developers who want a stronger understanding of working with databases
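
A small sketch of the filtering, calculated-column sorting, and row limiting the book walks through (LIMIT is the PostgreSQL/MySQL/SQLite form; SQL Server uses TOP and Oracle uses FETCH FIRST; the table is hypothetical):

```sql
-- Top five order lines by a calculated column, excluding incomplete rows.
SELECT title,
       price * quantity AS line_total
FROM order_lines
WHERE price IS NOT NULL
  AND quantity > 0
ORDER BY line_total DESC
LIMIT 5;
```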

MySQL Crash Course

MySQL Crash Course is a fast-paced, no-nonsense introduction to relational database development. It's filled with practical examples and expert advice that will have you up and running quickly. You'll learn the basics of SQL, how to create a database, craft SQL queries to extract data, and work with events, procedures, and functions. You'll see how to add constraints to tables to enforce rules about permitted data and use indexes to accelerate data retrieval. You'll even explore how to call MySQL from PHP, Python, and Java. Three final projects will show you how to build a weather database from scratch, use triggers to prevent errors in an election database, and use views to protect sensitive data in a salary database.

You'll also learn how to:
• Query database tables for specific information, order the results, comment SQL code, and deal with null values
• Define table columns to hold strings, integers, and dates, and determine what data types to use
• Join multiple database tables as well as use temporary tables, common table expressions, derived tables, and subqueries
• Add, change, and remove data from tables, create views based on specific queries, write reusable stored routines, and automate and schedule events

The perfect quick-start resource for database developers, MySQL Crash Course will arm you with the tools you need to build and manage fast, powerful, and secure MySQL-based data storage systems.
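
In the spirit of the salary-database project, a view can expose a table minus its sensitive column, and access can be granted on the view alone (schema and names are illustrative, not the book's own code):

```sql
-- Reporting users see the directory view; salary never leaves the base table.
CREATE VIEW payroll.employee_directory AS
SELECT employee_id, employee_name, department
FROM payroll.employees;

GRANT SELECT ON payroll.employee_directory TO 'report_user'@'%';
```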

Summary

Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organization increase, the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this problem, but they are frequently ineffective. Rehgan Avon co-founded AlignAI to help address this challenge through a more purposeful platform designed to collect and distribute the knowledge of how and why data is used in a business. In this episode she shares the strategic and tactical elements of how to make more effective use of the technical and organizational resources that are available to you for getting work done with data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more.

Your host is Tobias Macey and today I'm interviewing Rehgan Avon about her work at AlignAI to help organizations standardize their technical and procedural approaches to working with data.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what AlignAI is and the story behind it?
What are the core problems that you are focused on addressing?

What are the tactical ways that you are working to solve those problems?

What are some of the common and avoidable ways that analytics/AI projects go wrong?

What are some of the ways that organizational scale and complexity impacts their ability to execute on data and AI projects?

What are the ways that incomplete/unevenly distributed knowledge manifests in project design and execution?
Can you describe the design and implementation of the AlignAI platform?

How have the goals and implementation of the product changed since you

Summary

With all of the messaging about treating data as a product, it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst, which means that he has to spend all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented, and the long-term improvements in your productivity that it provides.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Build Data Pipelines. Not DAGs. That's the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up-to-the-minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts, and window operations. Output data can be streamed into a data lake for query engines like Presto, Trino, or Spark SQL, a data warehouse like Snowflake or Redshift, or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart's content without worrying about their cloud bill. For data engineering podcast listeners, we're offering a 30-day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver today and see for yourself how to avoid DAG hell.

Your host is Tobias Macey and today I'm interviewing Vishal Singh about his experience

Summary

Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. In order to condense that acquired knowledge into a format that is useful to everyone Scott Hirleman turns the tables in this episode and asks Tobias about the tactical and strategic aspects of his experiences applying those lessons to the work of building a data platform from scratch.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more.

Your host is Tobias Macey and today I'm being interviewed by Scott Hirleman about my work on the podcasts and my experience building a data platform.

Interview

Introduction
How did you get involved in the area of data management?

Data platform building journey

Why are you building, who are the users/use cases
How to focus on doing what matters over cool tools
How to build a good UX
Anything surprising or did you discover anything you didn't expect at the start
How to build so it's modular and can be improved in the future

General build vs buy and vendor selection process

Obviously have a good BS detector - how can others build theirs
So many tools, where do you start - capability need, vendor suite offering, etc.
Anything surprising in doing much of this at once
How do you think about TCO in build versus buy
Any advice

Guest call out

Be brave, believe you are good enough to be on the show
Look at past episodes and don't pitch the same as what's been on recently
And vendors, be smart, work with your customers to come up with a good pitch for them as guests...

Tobias' advice and learnings from building out a data platform:

Advice: when considering a tool, start from what are you act

Summary

Encryption and security are critical elements in data analytics and machine learning applications. We have well-developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming, and the capabilities that could be unlocked by a robust solution, Rishabh Poddar helped to create Opaque Systems as an outgrowth of his PhD studies. In this episode he shares the work that he and his team have done to simplify integration of secure enclaves and trusted computing environments into analytical workflows, and how you can start using it without re-engineering your existing systems.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Build Data Pipelines. Not DAGs. That's the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up-to-the-minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts, and window operations. Output data can be streamed into a data lake for query engines like Presto, Trino, or Spark SQL, a data warehouse like Snowflake or Redshift, or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart's content without worrying about their cloud bill. For data engineering podcast listeners, we're offering a 30-day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver today and see for yourself how to avoid DAG hell.

Summary

One of the reasons that data work is so challenging is that no single person or team owns the entire process. This introduces friction in the process of collecting, processing, and using data. In order to reduce the potential for broken pipelines, some teams have started to adopt the idea of data contracts. In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in applying these constraints to your data workflows.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more.

Your host is Tobias Macey and today I'm interviewing Abe Gong about the technical and organizational implementation of data contracts.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what your conception of a data contract is?

What are some of the ways that you have seen them implemented?

How has your work on Great Expectations influenced your thinking on the strategic and tactical aspects of adopting/implementing data contracts in a given team/organization?

What does the negotiation process look like for identifying what needs to be included in a contract?

What are the interfaces/integration points where data contracts are most useful/necessary?
What are the discussions that need to happen when deciding when/whether a contract "violation" is a blocking action vs. issuing a notification?
At what level of detail/granularity are contracts most helpful?
At the technical level, what does the implementation/integration/deployment of a contract look like?
What are the most interesting, innovative, or unexpected ways that you have seen data contracts used?
What are the most interesting, unexpected, or chall