Advanced SQL

2026-08-25 · O'Reilly SQL Books O'Reilly Amazon

book

by Hélder Russa , Pedro Esmeriz , Rui Machado

AI/ML Analytics Flink Cloud Computing Data Lake GenAI Kafka SQL Data Streaming nosql databases

SQL is no longer just a querying language for relational databases—it's a foundational tool for building scalable, modern data solutions across real-time analytics, machine learning workflows, and even generative AI applications. Advanced SQL shows data professionals how to move beyond conventional SELECT statements and tap into the full power of SQL as a programming interface for today's most advanced data platforms. Written by seasoned data experts Rui Pedro Machado, Hélder Russa, and Pedro Esmeriz, this practical guide explores the role of SQL in streaming architectures (like Apache Kafka and Flink), data lake ecosystems, cloud data warehouses, and ML pipelines. Geared toward data engineers, analysts, scientists, and analytics engineers, the book combines hands-on guidance with architectural best practices to help you extend your SQL skills into emerging workloads and real-world production systems. Use SQL to design and deploy modern, end-to-end data architectures Integrate SQL with data lakes, stream processing, and cloud platforms Apply SQL in feature engineering and ML model deployment Master pipe syntax and other advanced features for scalable, efficient queries Leverage SQL to build GenAI-ready data applications and pipelines

Designing Data-Intensive Applications, 2nd Edition

2026-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Kleppmann , Chris Riccomini (WePay; LinkedIn)

Flink GDPR/CCPA NoSQL Spark data data-engineering

Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition, there's an overwhelming variety of tools and analytical systems, including relational databases, NoSQL datastores, plus data warehouses and data lakes. What are the right choices for your application? How do you make sense of all these buzzwords? In this second edition, authors Martin Kleppmann and Chris Riccomini build on the foundation laid in the acclaimed first edition, integrating new technologies and emerging trends. You'll be guided through the maze of decisions and trade-offs involved in building a modern data system, from choosing the right tools like Spark and Flink to understanding the intricacies of data laws like the GDPR. Peer under the hood of the systems you already use, and learn to use them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

AWS re:Invent 2025 - Advanced data modeling with Amazon DynamoDB (DAT414)

2025-12-05 · AWS re:Invent 2024 Watch

video

by Alex DeBrie (Amazon Web Services (AWS))

Agile/Scrum AWS Cloud Computing Data Modelling DynamoDB

Amazon DynamoDB is a popular choice for modern applications because it’s a serverless database that provides single-digit millisecond performance at any scale. Optimizing your usage of DynamoDB requires a different approach to data modeling than traditional relational databases. In this session, AWS Data Hero Alex DeBrie shows you advanced techniques to help you get the most out of DynamoDB. Learn how to “think in DynamoDB” by learning the DynamoDB foundations and principles for data modeling. Further, learn practical strategies and DynamoDB features to handle difficult use cases in your application.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - An insider’s look into architecture choices for Amazon DynamoDB (DAT436)

2025-12-04 · AWS re:Invent 2024 Watch

video

Agile/Scrum AWS Cloud Computing DynamoDB

To overcome the performance and scale limitations of relational databases, AWS built Amazon DynamoDB to deliver consistent single-digit millisecond performance at any scale for the most demanding applications on the planet. In this session, learn about the architecture choices for Amazon DynamoDB. Gain a better understanding of when to use DynamoDB and why DynamoDB is used by over one million AWS customers, and powers hundreds of applications that exceed half a million requests per second. Leave with a new perspective on how to design your own applications.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

Just Use Postgres!

2025-11-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denis Magda

AI/ML GenAI Java JSON Microsoft Oracle SQL data data-engineering postgresql relational-databases

You probably don’t need a collection of specialty databases. Just use Postgres instead! Written for application developers and database pros, Just Use Postgres! shows you how to get the most out of the powerful Postgres database. In Just Use Postgres! you’ll learn how to: Use Postgres as an RDBMS for transactional workloads Develop generative AI, geospatial, and time-series applications Take advantage of modern SQL including window functions and CTEs Perform full-text search and process JSON documents Use Postgres as a message queue Optimize performance with various index types including B-trees, GIN, GiST, HNSW, and more Over the decades, PostgreSQL, aka Postgres, has grown into the most powerful general-purpose database and has become the de facto standard for developers worldwide. Just Use Postgres! takes a modern look at Postgres, exploring the database’s most up-to-date features for AI, time-series, full-text search, geospatial, and other application workloads. About the Technology You know that PostgreSQL is a fast, reliable, SQL compliant RDBMS. You may not know that it’s also great for geospatial systems, time series, full-text search, JSON documents, AI vector embeddings, and many other specialty database functions. For almost any data task you can imagine, you can use Postgres. About the Book Just Use Postgres! covers recipes for using Postgres in dozens of applications normally reserved for single-purpose databases. Written for busy application developers, each chapter explores a different use case illuminating the breadth and depth of Postgres’s capabilities. Along the way, you’ll also meet an incredible ecosystem of Postgres extensions like pgvector, PostGIS, pgmq, and TimescaleDB. You’ll be amazed at everything you can accomplish with Postgres! What's Inside Generative AI, geospatial, and time-series applications Modern SQL including window functions and CTEs Full-text search and JSON B-trees, GIN, GiST, HNSW, and more About the Reader For application developers, software engineers, and architects who know the basics of SQL. About the Author Denis Magda is a recognized Postgres expert and software engineer who worked on Java at Sun Microsystems and Oracle before focusing on databases and large-scale distributed systems. Quotes I was pleasantly surprised to learn many new things from this book. - From the Afterword by Vlad Mihalcea An excellent guide covering everything from basics to cutting-edge features. - Dave Cramer, PostgreSQL JDBC Maintainer Pleasant, easy to read with tonnes of great code. - Mike McQuillan, McQTech Ltd Well-organized and easy to search. - Edward Pollack, Microsoft Data Platform MVP The missing guide to understanding and using Postgres. - Mehboob Alam, POSTGRESNX, Inc.

Low-Latency Workloads: A Guide to Transactional Data on Snowflake

2025-10-30 · Snowflake World Tour Amsterdam

session

ETL/ELT Snowflake postgresql

Come to this session to learn about Snowflake solutions for handling your transactional data. Hybrid Tables are tightly integrated into Snowflake providing a unified experience for transactional and analytical workloads, meaning no need for ETL. Postgres is an open source RDBMS loved by developers with very low latency reads and writes, and following the acquisition of Crunchy Data it will be coming to your Snowflake accounts. We'll describe the architectural differences between Postgres and Hybrid tables and help you navigate when to use which.

From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture

2025-09-18 · Data Engineering Podcast Listen

podcast_episode

by Mark Brooker (AWS) , Tobias Macey

AI/ML AWS Aurora AWS Lambda Data Engineering Data Management Datafold ELK ETL/ELT LLM Prefect Python +3 more

Summary In this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Mark explains why agents require serverless, elastic, and operationally simple databases, and how AWS solutions like Aurora and DSQL address these needs with features such as rapid provisioning, automated patching, geodistribution, and spiky usage. The conversation covers topics including tool calling, improved model capabilities, state in agents versus stateless LLM calls, and the role of Lambda and AgentCore for long-running, session-isolated agents. Mark also touches on the shift from local MCP tools to secure, remote endpoints, the rise of object storage as a durable backplane, and the need for better identity and authorization models. The episode highlights real-world patterns like agent-driven SQL fuzzing and plan analysis, while identifying gaps in simplifying data access, hardening ops for autonomous systems, and evolving serverless database ergonomics to keep pace with agentic development.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Your host is Tobias Macey and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for databasesInterview IntroductionHow did you get involved in the area of data management?Can you describe what the role of the database is in agentic workflows?There are numerous types of databases, with relational being the most prevalent. How does the type and purpose of an agent inform the type of database that should be used?Anecdotally I have heard about how agentic workloads have become the predominant "customers" of services like Neon and Fly.io. How would you characterize the different patterns of scale for agentic AI applications? (e.g. proliferation of agents, monolithic agents, multi-agent, etc.)What are some of the most significant impacts on workload and access patterns for data storage and retrieval that agents introduce?What are the categorical differences in that behavior as compared to programmatic/automated systems?You have spent a substantial amount of time on Lambda at AWS. Given that LLMs are effectively stateless, how does the added ephemerality of serverless functions impact design and performance considerations around having to "re-hydrate" context when interacting with agents?What are the most interesting, innovative, or unexpected ways that you have seen serverless and database systems used for agentic workloads?What are the most interesting, unexpected, or challenging lessons that you have learned while working on technologies that are supporting agentic applications?Contact Info BlogLinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links AWS Aurora DSQLAWS LambdaThree Tier ArchitectureVector DatabaseGraph DatabaseRelational DatabaseVector EmbeddingRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodeGraphRAGAI Engineering Podcast EpisodeLLM Tool CallingMCP == Model Context ProtocolA2A == Agent 2 Agent ProtocolAWS Bedrock AgentCoreStrandsLangChainKiroThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Exam Ref DP-300 Administering Microsoft Azure SQL Solutions

2025-07-23 · O'Reilly SQL Books O'Reilly Amazon

book

by Craig Zacker

Azure Cloud Computing Microsoft SQL dp-900: microsoft azure data fundamentals

Prepare for Microsoft Exam DP-300 and demonstrate your real-world foundational knowledge of Azure database administration using a variety of methods and tools to perform and automate day-to-day operations, including use of Transact-SQL (T-SQL) and other tools for administrative management purposes. Designed for database administrators, solution architects, data scientists, and other data professionals, this Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the Microsoft Certified: Azure Database Administrator Associate level. Focus on the expertise measured by these objectives: Plan and implement data platform resources Implement a secure environment Monitor, configure, and optimize database resources Configure and manage automation of tasks Plan and configure a high availability and disaster recovery (HA/DR) environment This Microsoft Exam Ref: Organizes its coverage by the Skills Measured list published for the exam Features strategic, what-if scenarios to challenge you Assumes you have subject matter expertise in building database solutions that are designed to support multiple workloads built with SQL Server on-premises and Azure SQL About the Exam Exam PD-300 focuses on core knowledge for implementing and managing the operational aspects of cloud-native and hybrid data platform solutions built on SQL Server and Azure SQL services, using a variety of methods and tools to perform and automate day-to-day operations, including applying knowledge of using Transact-SQL (T-SQL) and other tools for administrative management purposes. About Microsoft Certification Passing this exam fulfills your requirements for the Microsoft Certified: Azure Database Administrator Associate certification, demonstrating your ability to administer a SQL Server database infrastructure for cloud, on-premises, and hybrid relational databases using the Microsoft PaaS relational database offerings. See full details at: microsoft.com/learn .

FHIR-ing Up Healthcare Data : Restructing Healthcare data with FHIR

2025-04-11 · Google Cloud Next '25

session

by Ashwin Shetty (Google)

BigQuery CSV

This talk offers a solution to accelerate healthcare innovation by streamlining the conversion and integration of various data formats (HL7 v2, CSV, RDBMS, etc.) into the FHIR standard.

This solution reduces the need for manual mapping allowing for quick conversion of various healthcare data formats into FHIR and significantly reduces the workload of healthcare IT teams. FHIR data is then loaded into Google BigQuery providing a scalable and secure platform for data storage and analysis.

Database professionals meetup

2025-04-10 · Google Cloud Next '25

session

by Gleb Otochkin (Google Cloud)

AI/ML Cloud Computing

A meetup for DBAs, data engineers, and database architects to share insights, troubleshoot issues, and exchange proven strategies for cloud database management. Explore how relational databases can drive smarter AI by enriching prompts for generation. Learn from real-world experiences and expand your professional network!

Grokking Relational Database Design

2025-03-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michail Tsikerdekis , Qiang Hao

AI/ML Computer Science Data Collection GenAI Cyber Security SQL data data-engineering relational-databases

A friendly illustrated guide to designing and implementing your first database. Grokking Relational Database Design makes the principles of designing relational databases approachable and engaging. Everything in this book is reinforced by hands-on exercises and examples. In Grokking Relational Database Design, you’ll learn how to: Query and create databases using Structured Query Language (SQL) Design databases from scratch Implement and optimize database designs Take advantage of generative AI when designing databases A well-constructed database is easy to understand, query, manage, and scale when your app needs to grow. In Grokking Relational Database Design you’ll learn the basics of relational database design including how to name fields and tables, which data to store where, how to eliminate repetition, good practices for data collection and hygiene, and much more. You won’t need a computer science degree or in-depth knowledge of programming—the book’s practical examples and down-to-earth definitions are beginner-friendly. About the Technology Almost every business uses a relational database system. Whether you’re a software developer, an analyst creating reports and dashboards, or a business user just trying to pull the latest numbers, it pays to understand how a relational database operates. This friendly, easy-to-follow book guides you from square one through the basics of relational database design. About the Book Grokking Relational Database Design introduces the core skills you need to assemble and query tables using SQL. The clear explanations, intuitive illustrations, and hands-on projects make database theory come to life, even if you can’t tell a primary key from an inner join. As you go, you’ll design, implement, and optimize a database for an e-commerce application and explore how generative AI simplifies the mundane tasks of database designs. What's Inside Define entities and their relationships Minimize anomalies and redundancy Use SQL to implement your designs Security, scalability, and performance About the Reader For self-taught programmers, software engineers, data scientists, and business data users. No previous experience with relational databases assumed. About the Authors Dr. Qiang Hao and Dr. Michail Tsikerdekis are both professors of Computer Science at Western Washington University. Quotes If anyone is looking to improve their database design skills, they can’t go wrong with this book. - Ben Brumm, DatabaseStar Goes beyond SQL syntax and explores the core principles. An invaluable resource! - William Jamir Silva, Adjust Relational database design is best done right the first time. This book is a great help to achieve that! - Maxim Volgin, KLM Provides necessary notions to design and build databases that can stand the data challenges we face. - Orlando Méndez, Experian

Learn SQL in a Month of Lunches

2025-02-28 · O'Reilly SQL Books O'Reilly Amazon

book

by Jeff Iannucci

BI Databricks Microsoft MySQL Oracle Power BI SQL Tableau postgresql

Use SQL to get the data you need in no time at all! Learn to read and write basic queries, troubleshoot common problems, and control your own business data in just 24 short lessons–no programming experience required! SQL has been designed to be as close to English as possible—anyone can learn it! Learn SQL in a Month of Lunches helps you add this lucrative and highly sought-after skill to your resume in just 24 fun and friendly lessons. The book emphasizes practical uses for the language in the real-world, so you’ll just learn the most useful skills for business data analysis. Inside Learn SQL in a Month of Lunches you’ll discover how to: Set up your first database with MySQL Write your own SQL queries See only the data you need from large datasets Connect different sets of data Analyze data with functions and aggregations Master basic data manipulation techniques Save queries in stored procedures and views Create tables to store data efficiently Read and improve SQL written by others If you use Excel, Tableau, or PowerBI to crunch business data, you’ve probably seen a lot of SQL already. And guess what? It’s easy to master the most useful parts of SQL! In just a few quick lessons, Learn SQL in a Month of Lunches will get you writing your own queries, modifying existing SQL statements, and working with data like a pro. 25-year SQL veteran Jeff Iannucci makes SQL a snap through hands-on lab exercises, relevant code examples, and easy-to-understand language. About the Technology SQL, Structured Query Language, is the standard way to query, create, and manage relational databases like SQL Server, PostgreSQL, and Oracle. It’s also a superpower for data analysts who need to go beyond spreadsheets and BI dashboarding tools. SQL is easy to read and understand, and with this book (and a little practice) you’ll be pulling data, tweaking tables, and cranking out amazing reports and presentations in no time at all! About the Book Learn SQL in a Month of Lunches introduces SQL to data analysts and other aspiring data pros with no prior experience using relational databases. In it, you’ll complete 24 short lessons, each of which teaches an essential SQL skill for retrieving, filtering, and analyzing data. You’ll practice each new technique with a friendly hands-on lab designed to take about 15 minutes, as you learn to write queries that deliver the exact data you need. Along the way, you’ll build a valuable intuition for how databases operate in real business scenarios. What's Inside Get the data you need from any relational database Filter, sort, and group data Combine data from multiple tables Create, update, and delete data About the Reader For students, aspiring data analysts, software developers, and anyone else who wants to work with relational databases. About the Author Jeff Iannucci is a Senior Consultant with Straight Path Solutions. For over 20 years, he has worked extensively with SQL in sectors such as healthcare, finance, retail sales, and government. Quotes An essential guide. Jeff has carefully developed each chapter to ensure clarity and comprehensiveness, making complex concepts accessible and practical. - Buck Woody, Microsoft The fastest and the most effective way to learn SQL, regardless of your background or technical knowledge level. - Kevin Kline, author of SQL in a Nutshell Explains concepts straightforwardly to help the reader grow their skills over a month of sessions. - Steve Jones, SQL Server Central Great selection of bite-sized, digestible courses to complement your lunch arrangement. It leaves you smarter every day. - Simon Tschöke, Databricks

Best of 2024: 50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

2024-12-26 · DataFramed Listen

podcast_episode

by Don Chamberlin (IBM)

AI/ML Analytics BI Cloud Computing Data Analytics DataViz GenAI IBM LLM NoSQL Cyber Security SQL +1 more

As we look back at 2024, we're highlighting some of our favourite episodes of the year, and with 100 of them to choose from, it wasn't easy! The four guests we'll be recapping with are: Lea Pica - A celebrity in the data storytelling and visualisation space. Richie and Lea cover the full picture of data presentation, how to understand your audience, how to leverage hollywood storytelling and more. Out December 19.Alex Banks - Founder of Sunday Signal. Adel and Alex cover Alex’s journey into AI and what led him to create Sunday Signal, the potential of AI, prompt engineering at its most basic level, chain of thought prompting, the future of LLMs and more. Out December 23.Don Chamberlin - The renowned co-inventor of SQL. Richie and Don explore the early development of SQL, how it became standardized, the future of SQL through NoSQL and SQL++ and more. Out December 26.Tom Tunguz - general Partner at Theory Ventures, a $235m VC firm. Richie and Tom explore trends in generative AI, cloud+local hybrid workflows, data security, the future of business intelligence and data analytics, AI in the corporate sector and more. Out December 30. For our 200th episode, we bring you a special guest and taking a walk down memory lane—to the creation and development of one of the most popular programming languages in the world. Don Chamberlin is renowned as the co-inventor of SQL (Structured Query Language), the predominant database language globally, which he developed with Raymond Boyce in the mid-1970s. Chamberlin's professional career began at IBM Research in Yorktown Heights, New York, following a summer internship there during his academic years. His work on IBM's System R project led to the first SQL implementation and significantly advanced IBM’s relational database technology. His contributions were recognized when he was made an IBM Fellow in 2003 and later a Fellow of the Computer History Museum in 2009 for his pioneering work on SQL and database architectures. Chamberlin also contributed to the development of XQuery, an XML query language, as part of the W3C, which became a W3C Recommendation in January 2007. Additionally, he holds fellowships with ACM and IEEE and is a member of the National Academy of Engineering. In the episode, Richie and Don explore his early career at IBM and the development of his interest in databases alongside Ray Boyce, the database task group (DBTG), the transition to relational databases and the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++ and much more. Links Mentioned in the Show: The first-ever journal paper on SQL. SEQUEL: A Structured English Query LanguageDon’s Book: SQL++ for SQL Users: A TutorialSystem R: Relational approach to database managementSQL CoursesSQL Articles, Tutorials and Code-AlongsRelated Episode: Scaling Enterprise Analytics with...

SQL Essentials For Dummies

2024-12-24 · O'Reilly SQL Books O'Reilly Amazon

book

by Richard Blum , Allen G. Taylor

SQL

A right-to-the-point guide on all the key topics of SQL programming SQL Essentials For Dummies is your quick reference to all the core concepts of SQL—a valuable common standard language used in relational databases. This useful guide is straightforward—with no excess review, wordy explanations, or fluff—so you get what you need, fast. Great for a brush-up on the basics or as an everyday desk reference, this book is one you can rely on. Strengthen your understanding of the basics of SQL Review what you've already learned or pick up key skills Use SQL to create, manipulate, and control relational databases Jog your memory on the essentials as you work and get clear answers to your questions Perfect for supplementing classroom learning, reviewing for a certification, and staying knowledgeable on the job, SQL Essentials For Dummies is the convenient, direct, and digestible reference you've been looking for.

AWS re:Invent 2024 - An insider’s look into architecture choices for Amazon DynamoDB (DAT419)

2024-12-11 · AWS re:Invent 2024 Watch

video

Agile/Scrum AWS Cloud Computing DynamoDB

To overcome the performance and scale limitations of relational databases, AWS built Amazon DynamoDB to deliver consistent single-digit millisecond performance at any scale for the most demanding applications on the planet. In this session, learn about the architecture choices for Amazon DynamoDB. Gain a better understanding of when to use DynamoDB and why DynamoDB is used by over one million AWS customers to power hundreds of applications that exceed half a million requests per second. Leave with a new perspective on how to design your own applications.

Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2024

An Exploration Of The Impediments To Reusable Data Pipelines

2024-12-08 · Data Engineering Podcast Listen

podcast_episode

by Maxime Beauchemin (Preset) , Tobias Macey

Activity Schema AI/ML Data Engineering Data Management Data Modelling Datafold dbt GenAI LLM Python SQL SQLMesh

Summary In this episode of the Data Engineering Podcast the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explores the "write everything twice" problem, where similar pipelines are built without code reuse, and discusses the challenges of managing different SQL dialects and relational databases. Max also touches on the evolving role of data engineers, drawing parallels with front-end engineering, and suggests that generative AI could facilitate knowledge capture and distribution in data engineering. He encourages the community to share reference implementations and templates to foster collaboration and innovation, and expresses hopes for a future where code reuse becomes more prevalent.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Your host is Tobias Macey and today I'm joined again by Max Beauchemin to talk about the challenges of reusability in data pipelinesInterview IntroductionHow did you get involved in the area of data management?Can you start by sharing your current thesis on the opportunities and shortcomings of code and component reusability in the data context?What are some ways that you think about what constitutes a "component" in this context?The data ecosystem has arguably grown more varied and nuanced in recent years. At the same time, the number and maturity of tools has grown. What is your view on the current trend in productivity for data teams and practitioners?What do you see as the core impediments to building more reusable and general-purpose solutions in data engineering?How can we balance the actual needs of data consumers against their requests (whether well- or un-informed) to help increase our ability to better design our workflows for reuse?In data engineering there are two broad approaches; code-focused or SQL-focused pipelines. In principle one would think that code-focused environments would have better composability. What are you seeing as the realities in your personal experience and what you hear from other teams?When it comes to SQL dialects, dbt offers the option of Jinja macros, whereas SDF and SQLMesh offer automatic translation. There are also tools like PRQL and Malloy that aim to abstract away the underlying SQL. What are the tradeoffs across those options that help or hinder the portability of transformation logic?Which layers of the data stack/steps in the data journey do you see the greatest opportunity for improving the creation of more broadly usable abstractions/reusable elements?low/no code systems for code reuseimpact of LLMs on reusability/compositionimpact of background on industry practices (e.g. DBAs, sysadmins, analysts vs. SWE, etc.)polymorphic data models (e.g. activity schema)What are the most interesting, innovative, or unexpected ways that you have seen teams address composability and reusability of data components?What are the most interesting, unexpected, or challenging lessons that you have learned while working on data-oriented tools and utilities?What are your hopes and predictions for sharing of code and logic in the future of data engineering?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links Max's Blog PostAirflowSupersetTableauLookerPowerBICohort AnalysisNextJSAirbytePodcast EpisodeFivetranPodcast EpisodeSegmentdbtSQLMeshPodcast EpisodeSparkLAMP StackPHPRelational AlgebraKnowledge GraphPython MarshmallowData Warehouse Lifecycle Toolkit (affiliate link)Entity Centric Data Modeling Blog PostAmplitudeOSACon presentationol-data-platform Tobias' team's data platform codeThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AWS re:Invent 2024 - Analyze Amazon Aurora & RDS data in Amazon Redshift with zero-ETL (DAT331)

2024-12-07 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML Analytics AWS Aurora CloudFormation Amazon RDS Cloud Computing ETL/ELT Redshift

Discover the power of Amazon Aurora and Amazon RDS zero-ETL integrations with Amazon Redshift. Zero-ETL integrations help unify your data across applications and data sources for holistic insights. This session explores how Amazon Aurora and Amazon RDS zero-ETL integrations with Amazon Redshift remove the need to build and manage complex data pipelines, enabling analytics and machine learning using Amazon Redshift on petabytes of transactional data from your relational databases. In this session, learn about key zero-ETL integration functionalities like data filtering, AWS CloudFormation support, and more.

Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2024

AWS re:Invent 2024 - Advanced data modeling with Amazon DynamoDB (DAT404)

2024-12-05 · AWS re:Invent 2024 Watch

video

by Alex DeBrie (Amazon Web Services (AWS))

Agile/Scrum AWS Cloud Computing Data Modelling DynamoDB

Amazon DynamoDB is a popular choice for modern applications because it’s a serverless database that provides single-digit millisecond performance at any scale. Optimizing your usage of DynamoDB requires a different approach to data modeling than traditional relational databases. In this session, AWS Data Hero Alex DeBrie shows you advanced techniques to help you get the most out of DynamoDB. Learn how to “think in DynamoDB” by learning the DynamoDB foundations and principles for data modeling. Further, learn practical strategies and DynamoDB features to handle difficult use cases in your application.

Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2024

Beginning MongoDB Atlas with .NET: Flexible and Scalable Document Data Storage for .NET Developers

2024-09-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Luce Carter

API Cloud Computing Microsoft MongoDB data data-engineering nosql-databases

This book is a tutorial on MongoDB customized for developers working in Microsoft .NET 6, .NET 7, and beyond. It explains the differences between relational database systems and the document model supported by MongoDB, and shows how to build .NET applications that run against a MongoDB database, especially one in the cloud. Author Luce Carter kicks things off by teaching you how to determine when to use a document database versus a relational engine. After that, she walks you through building a Microsoft .NET project combining the MongoDB Atlas cloud database as a service solution with a .NET. application. In the process, you will learn how to create, read, update, and delete data in MongoDB from any .NET project. You will come away from this book with a solid understanding of MongoDB’s Developer Data Platform and how to use it from your .NET applications. You’ll be able to connect to MongoDB in the cloud and take advantage of the flexibility and scalability that MongoDB’s document storage model provides, and you’ll understand how to craft your applications to run using document storage and the MongoDB database engine. What You Will Learn Know when to use the MongoDB document model Build .NET applications that connect to MongoDB for data storage Create MongoDB clusters on the MongoDB Atlas cloud platform Store data in MongoDB Atlas Create, Read, Update, and Delete (CRUD) data from .NET Web API projects Test your CRUD endpoints using RESTful operations Validate schemas to help protect against breaking changes Who This Book Is For .NET developers who are looking for an alternative to relational databases, and those looking for a flexible and scalable document storage solution for use from .NET applications. Additionally, anyone wanting to learn MongoDB in the context of .NET and C# will benefit from this book.

Amazon DynamoDB - The Definitive Guide

2024-08-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mike Mackay , Aman Dhingra

AWS Cloud Computing DynamoDB NoSQL data data-engineering nosql-databases

Master Amazon DynamoDB, the serverless NoSQL database designed for lightning-fast performance and scalability, with this definitive guide. You'll delve into its features, learn advanced concepts, and acquire practical skills to harness DynamoDB for modern application development. What this Book will help me do Understand AWS DynamoDB fundamentals for real-world applications. Model and optimize NoSQL databases with advanced techniques. Integrate DynamoDB into scalable, high-performance architectures. Utilize DynamoDB indexing, caching, and analytical features effectively. Plan and execute RDBMS to NoSQL data migrations successfully. Author(s) None Dhingra, an AWS DynamoDB solutions expert, and None Mackay, a seasoned NoSQL architect, bring their combined expertise straight from Amazon Web Services to guide you step-by-step in mastering DynamoDB. Combining comprehensive technical knowledge with approachable explanations, they empower readers to implement practical and efficient data strategies. Who is it for? This book is ideal for software developers and architects seeking to deepen their knowledge about AWS solutions like DynamoDB, engineering managers aiming to incorporate scalable NoSQL solutions into their projects, and data professionals transitioning from RDBMS towards a serverless data approach. Individuals with basic knowledge in cloud computing or database systems and those ready to advance in DynamoDB will find this book particularly beneficial.

talk-data.com

RDBMS

Activity Trend

Top Events

Top Speakers

Advanced SQL

Designing Data-Intensive Applications, 2nd Edition

AWS re:Invent 2025 - Advanced data modeling with Amazon DynamoDB (DAT414)

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - An insider’s look into architecture choices for Amazon DynamoDB (DAT436)

AWSreInvent #AWSreInvent2025 #AWS

Just Use Postgres!

Low-Latency Workloads: A Guide to Transactional Data on Snowflake

From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture

Exam Ref DP-300 Administering Microsoft Azure SQL Solutions

FHIR-ing Up Healthcare Data : Restructing Healthcare data with FHIR

Database professionals meetup

Grokking Relational Database Design

Learn SQL in a Month of Lunches

Best of 2024: 50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

SQL Essentials For Dummies

AWS re:Invent 2024 - An insider’s look into architecture choices for Amazon DynamoDB (DAT419)

AWSreInvent #AWSreInvent2024

An Exploration Of The Impediments To Reusable Data Pipelines

AWS re:Invent 2024 - Analyze Amazon Aurora & RDS data in Amazon Redshift with zero-ETL (DAT331)

AWSreInvent #AWSreInvent2024

AWS re:Invent 2024 - Advanced data modeling with Amazon DynamoDB (DAT404)

AWSreInvent #AWSreInvent2024

Beginning MongoDB Atlas with .NET: Flexible and Scalable Document Data Storage for .NET Developers

Amazon DynamoDB - The Definitive Guide