talk-data.com

Topic

DynamoDB

database nosql aws cloud

43 tagged

Activity Trend

Peak: 12 activities/qtr · 2020-Q1 to 2026-Q1

Activities

43 activities · Newest first

AWS re:Invent 2024 - Data modeling core concepts for Amazon DynamoDB (DAT305)

Join this session to learn the core concepts of Amazon DynamoDB data modeling. Explore best practices for common access patterns used by DynamoDB customers for applications that need consistent, fast performance at any scale. Developers experienced with DynamoDB can learn the trade-offs to weigh when deciding between single-table and multi-table designs, indexing strategies, and more.
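Below is a minimal Python/boto3 sketch of the single-table pattern the session covers; the table name "AppTable", the generic "PK"/"SK" key schema, and the item shapes are illustrative assumptions, not material from the talk.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("AppTable")  # hypothetical table with PK/SK keys

# Several entity types share one table; related items share a partition key.
table.put_item(Item={"PK": "CUSTOMER#123", "SK": "PROFILE", "name": "Ada"})
table.put_item(Item={"PK": "CUSTOMER#123", "SK": "ORDER#2024-001", "total": 42})

# One Query fetches the customer profile and all of their orders together.
resp = table.query(KeyConditionExpression=Key("PK").eq("CUSTOMER#123"))
for item in resp["Items"]:
    print(item["SK"])
```

Because related items share a partition key, a single Query retrieves the whole item collection in one round trip, which is the core payoff of single-table design.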


AWS re:Invent 2024 - Advanced data modeling with Amazon DynamoDB (DAT404)

Amazon DynamoDB is a popular choice for modern applications because it’s a serverless database that provides single-digit-millisecond performance at any scale. Optimizing your usage of DynamoDB requires a different approach to data modeling than traditional relational databases. In this session, AWS Data Hero Alex DeBrie shows you advanced techniques to help you get the most out of DynamoDB. Learn how to “think in DynamoDB” by mastering its foundations and data-modeling principles, then pick up practical strategies and DynamoDB features for handling difficult use cases in your application.


AWS re:Invent 2024 - Dive deep into Amazon DynamoDB (DAT406)

This session provides a deep dive into Amazon DynamoDB and shares insights into how DynamoDB is architected and how it’s able to deliver the scalability and response times that customers have come to expect. Gain an in-depth understanding of how DynamoDB capabilities including TTL, global tables, and MemDS have been implemented.
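As a concrete illustration of the TTL capability mentioned above, here is a hedged boto3 sketch; the table name "SessionStore" and the attribute name "expires_at" are assumptions, not details from the session.

```python
import time
import boto3

client = boto3.client("dynamodb")

# Enable TTL on the table: items whose "expires_at" (epoch seconds) has
# passed are deleted by DynamoDB some time after expiry.
client.update_time_to_live(
    TableName="SessionStore",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write an item that expires roughly one hour from now.
client.put_item(
    TableName="SessionStore",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 3600)},
    },
)
```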


AWS re:Invent 2024 - How Vanguard rebuilt its mission-critical trading application on AWS (FSI322)

Learn how Vanguard rebuilt its retail investor trading platform on AWS to achieve higher resiliency, improve performance, and lower costs by scaling to meet demand. Vanguard adopted AWS Fargate for Amazon ECS to run this mission-critical platform, which uses cloud-native services like Amazon Aurora PostgreSQL-Compatible Edition and Amazon DynamoDB to store and replicate data across Regions, achieving 99.99% availability over the past two years. New features that Vanguard needed months or longer to develop on its legacy mainframe monolith now take only weeks or even days, enabling the company to continuously improve the customer experience and compete effectively.


In the world of observability, metrics and logs are the usual suspects for monitoring system health and diagnosing issues. But what happens when you don't know what to look for in advance? We tackle this challenge by incorporating business-critical events into our observability stack. Join me for this talk as I delve into how events can fill the gaps left by traditional metrics and logs. I'll share our journey in identifying which events are worth storing and how our technical setup evolved from periodic PostgreSQL pulls to real-time streaming with AWS Firehose. You'll see real-world examples through our Grafana dashboards and learn how this approach allows us to perform ad-hoc analyses spanning over two years without incurring huge costs.
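As a rough sketch of the streaming step described above, the following boto3 snippet ships one business event to a Firehose delivery stream; the stream name "business-events" and the event shape are hypothetical, and the stream's destination (e.g. S3) would be configured separately.

```python
import json
import boto3

firehose = boto3.client("firehose")

# A business-critical event, serialized as newline-delimited JSON.
event = {"type": "subscription_cancelled", "user_id": 42, "plan": "pro"}
firehose.put_record(
    DeliveryStreamName="business-events",  # hypothetical stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```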

Amazon DynamoDB - The Definitive Guide

Master Amazon DynamoDB, the serverless NoSQL database designed for lightning-fast performance and scalability, with this definitive guide. You'll delve into its features, learn advanced concepts, and acquire practical skills to harness DynamoDB for modern application development.

What this book will help me do: Understand AWS DynamoDB fundamentals for real-world applications. Model and optimize NoSQL databases with advanced techniques. Integrate DynamoDB into scalable, high-performance architectures. Use DynamoDB indexing, caching, and analytical features effectively. Plan and execute RDBMS-to-NoSQL data migrations successfully.

Author(s): Dhingra, an AWS DynamoDB solutions expert, and Mackay, a seasoned NoSQL architect, bring their combined expertise straight from Amazon Web Services to guide you step by step in mastering DynamoDB. Combining comprehensive technical knowledge with approachable explanations, they empower readers to implement practical and efficient data strategies.

Who is it for? This book is ideal for software developers and architects seeking to deepen their knowledge of AWS solutions like DynamoDB, engineering managers aiming to incorporate scalable NoSQL solutions into their projects, and data professionals transitioning from RDBMS toward a serverless data approach. Individuals with basic knowledge of cloud computing or database systems and those ready to advance in DynamoDB will find this book particularly beneficial.

Data Dynamo is a comic that helps people understand the technology behind Hammerspace. Steve Gordon, renowned illustrator, and Trip Hunter, Vice President of Corporate Marketing at Hammerspace, join us today to talk about how they came up with Data Dynamo and how this unique marketing approach helps everyone better understand Hammerspace's technology and become a Data Dynamo themselves. Download your free copy of Data Dynamo Volume 1 here: https://hammerspace.com/data-dynamo/



AWS re:Inforce 2024 - Capital One’s approach for secure and resilient applications (DAP302)

Join this session to learn about Capital One’s strategic AWS Secrets Manager implementation that has helped ensure unified security across environments. Discover the key principles that can guide consistent use, with real-world examples to showcase the benefits and challenges faced. Gain insights into achieving reliability and resilience in financial services applications on AWS, including methods for maintaining system functionality amidst failures and scaling operations safely. Find out how you can implement chaos engineering and site reliability engineering using multi-Region services such as Amazon Route 53, AWS Auto Scaling, and Amazon DynamoDB.
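For context, fetching credentials at runtime with Secrets Manager looks roughly like the boto3 sketch below; the secret name and region are illustrative assumptions, not Capital One's actual setup.

```python
import json
import boto3

# Applications fetch credentials at runtime instead of embedding them in config.
secrets = boto3.client("secretsmanager", region_name="us-east-1")
resp = secrets.get_secret_value(SecretId="prod/trading/db-credentials")  # hypothetical name
credentials = json.loads(resp["SecretString"])
```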


AWS re:Inforce 2024 - Building resilient event-driven architectures, feat. United Airlines (DAP301)

United Airlines plans to take delivery of 700 new planes by 2032. With this growing fleet come more destinations, passengers, employees, and baggage—and a big increase in data, the lifeblood of airline operations. United Airlines is using event-driven architecture (EDA) to build a system that scales with their operations and evolves with their hybrid cloud throughout this journey. In this session, learn how United Airlines built a hybrid operations management system by modernizing from mainframes to AWS. Using Amazon MSK, Amazon DynamoDB, AWS KMS, and an event mesh from AWS ISV Partner Solace, they designed a well-crafted EDA that addressed their requirements.


Simplify data integration with Zero-ETL features (L300) | AWS Events

Uncover the power of Zero-ETL integrations for your data in AWS. Learn firsthand how integrating Amazon Aurora with Amazon Redshift, and Amazon DynamoDB with Amazon OpenSearch Service, enables near-real-time analytics and machine learning. Tailored for developers, architects, and administrators, this session covers key insights, best practices, and techniques for seamless integration of fully managed services, reducing latency and unlocking the full potential of your data in AWS.


Processing Delta Lake Tables on AWS Using AWS Glue, Amazon Athena, and Amazon Redshift

Delta Lake is an open source project that helps implement modern data lake architectures commonly built on cloud object storage. With Delta Lake, you can achieve ACID transactions, time travel queries, CDC, and other common use cases in the cloud.
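As a quick illustration of the time travel capability, here is a hedged PySpark sketch; it assumes a Spark session configured with the Delta Lake extensions and an existing table at a hypothetical S3 path.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "s3://my-bucket/delta/events"  # hypothetical table location

# Read the current state of the table.
current = spark.read.format("delta").load(path)

# Time travel: read the table as of an earlier version or point in time.
as_of_version = spark.read.format("delta").option("versionAsOf", 5).load(path)
as_of_time = (
    spark.read.format("delta")
    .option("timestampAsOf", "2024-01-01 00:00:00")
    .load(path)
)
```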

There are many use cases for Delta tables on AWS. AWS has invested significantly in this technology, and Delta Lake is now available across multiple AWS services, such as AWS Glue Spark jobs, Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum. AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. With AWS Glue, you can easily ingest data from sources such as on-premises databases, Amazon RDS, DynamoDB, and MongoDB into Delta Lake on Amazon S3, even without coding expertise.

This session demonstrates how to get started with processing Delta Lake tables on Amazon S3 using AWS Glue and querying them from Amazon Athena and Amazon Redshift. The session also covers recent AWS service updates related to Delta Lake.

Talk by: Noritaka Sekiyama and Akira Ajisaka


Data Led is Dumb

More and more companies are raising the “data-led” banner as manifest operational excellence. But is it true? Who's leading whom? In her talk, long-time Coalesce dynamo Emilie Schario (Amplify Partners) challenges data teams to start treating data like a product, not a prophet—no better or worse than the people and processes it represents.

Check the slides here: https://docs.google.com/presentation/d/16Zeffh3ODiRkTj_mnugnMdozC4ex-Q4ejyJl_Y33mJc/edit?usp=sharing


Streaming Data into Delta Lake with Rust and Kafka

Scribd's data architecture was originally batch-oriented, but in the last couple years, we introduced streaming data ingestion to provide near-real-time ad hoc query capability, mitigate the need for more batch processing tasks, and set the foundation for building real-time data applications.

Kafka and Delta Lake are the two key components of our streaming ingestion pipeline. Various applications and services write messages to Kafka as events are happening. We were tasked with getting these messages into Delta Lake quickly and efficiently.

Our first solution was to deploy Spark Structured Streaming jobs. This got us off the ground quickly, but had some downsides.

Since Delta Lake and the Delta transaction protocol are open source, we kicked off a project to implement our own Rust ingestion daemon. We were confident we could deliver a Rust implementation since our ingestion jobs are append only. Rust offers high performance with a focus on code safety and modern syntax.

In this talk I will describe Scribd's unique approach to ingesting messages from Kafka topics into Delta Lake tables. I will describe the architecture, deployment model, and performance of our solution, which leverages the kafka-delta-ingest Rust daemon and the delta-rs crate hosted in auto-scaling ECS services. I will discuss foundational design aspects for achieving data integrity, such as distributed locking with DynamoDB to overcome S3's lack of "PutIfAbsent" semantics, and avoiding duplicates or data loss when multiple concurrent tasks handle the same stream. I'll highlight the reliability and performance characteristics we've observed so far. I'll also describe the Terraform deployment model we use to deliver our 70-and-growing production ingestion streams into AWS.
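To make the locking idea concrete, here is a simplified Python/boto3 sketch of a DynamoDB conditional put standing in for S3's missing "PutIfAbsent"; the table and attribute names are illustrative, not the actual kafka-delta-ingest schema.

```python
import boto3
from botocore.exceptions import ClientError

client = boto3.client("dynamodb")

def try_acquire_lock(table: str, lock_id: str, owner: str) -> bool:
    """Attempt to take a lock; exactly one concurrent writer can succeed."""
    try:
        client.put_item(
            TableName=table,
            Item={"lockKey": {"S": lock_id}, "owner": {"S": owner}},
            # The conditional write fails if the lock item already exists.
            ConditionExpression="attribute_not_exists(lockKey)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another writer holds the lock
        raise
```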


Self-Serve, Automated and Robust CDC pipeline using AWS DMS, DynamoDB Streams and Databricks Delta

Many companies are trying to solve the challenges of ingesting transactional data into a data lake and dealing with late-arriving updates and deletes.

To address this at Swiggy, we built a CDC (Change Data Capture) system: an incremental processing framework that powers all business-critical data pipelines at low latency and high efficiency.

It offers:
Freshness: It operates in near real time with configurable latency requirements.
Performance: Optimized read and write performance with tuned compaction parameters, partitions, and Delta table optimization.
Consistency: It supports reconciliation based on transaction types, applying inserts, updates, and deletes to existing data.

To implement this system, AWS DMS handled initial bootstrapping and CDC replication for MySQL sources, while AWS Lambda and DynamoDB Streams solved bootstrapping and CDC replication for DynamoDB sources.

After setting up the bootstrap and CDC replication processes, we used Databricks Delta merge to reconcile the data based on transaction type, as sketched below.
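The reconciliation step looks roughly like the following delta-spark sketch; the column names ("id", "op"), paths, and transaction-type values are assumptions for illustration, not Swiggy's actual schema.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
target = DeltaTable.forPath(spark, "s3://bucket/delta/orders")  # hypothetical path
cdc_batch = spark.read.parquet("s3://bucket/staging/orders_cdc")  # one CDC batch

# Apply inserts, updates, and deletes from the CDC batch onto the Delta table,
# dispatching on the transaction type carried in the "op" column.
(
    target.alias("t")
    .merge(cdc_batch.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s.op = 'delete'")
    .whenMatchedUpdateAll(condition="s.op = 'update'")
    .whenNotMatchedInsertAll(condition="s.op != 'delete'")
    .execute()
)
```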

To support the merge we implemented supporting features:
* Deduplicating multiple mutations of the same record using log offset and timestamp.
* Adding optimal partitioning of the dataset.
* Inferring schemas and applying proper schema evolution (backward-compatible schemas).
* Extending the Delta table snapshot generation technique to create consistent partitions for partitioned Delta tables.

Finally, to read the data we use Spark SQL with the Hive metastore and Snowflake. Delta tables read with Spark SQL have implicit Hive metastore support. We built our own Snowflake sync process to create external tables, internal tables, and materialized views on Snowflake.

Stats: 500M CDC logs/day · 600+ tables


Managing and Visualizing Your BIM Data

Managing and Visualizing Your BIM Data is an essential guide for AEC professionals who want to harness the power of data to enhance their projects. Designed with a hands-on approach, this book delves into using Autodesk Dynamo for data collection and Microsoft Power BI for creating insightful dashboards. By the end, readers will be adept at connecting BIM models to interactive visualizations.

What this book will help me do: Gain a deep understanding of data collection workflows in Autodesk Dynamo. Learn to connect Building Information Modeling (BIM) data to Power BI dashboards. Master the basics and advanced features of Dynamo for BIM data management. Create dynamic and visually appealing Power BI dashboards for AEC projects. Explore real-world use cases with expert-guided hands-on examples.

Author(s): Pellegrino, Bottiglieri, Crump, Pieper, and Touil are experienced professionals in the AEC and software development industries. With extensive backgrounds in Building Information Modeling (BIM) and data visualization, they bring practical insights combined with a passion for teaching. Their approach ensures readers not only learn the tools but also understand the reasoning behind best practices.

Who is it for? This book is ideal for BIM managers and coordinators, design technology managers, and other Architecture, Engineering, and Construction (AEC) professionals. Readers with a foundational knowledge of BIM will find it particularly beneficial for enhancing their data analysis and reporting capabilities. If you're aiming to elevate your skill set in managing BIM data and creating impactful visualizations, this guide is for you.

Summary

One of the biggest challenges for any business trying to grow and reach customers globally is how to scale its data storage. FaunaDB is a cloud-native database built by the engineers behind Twitter's infrastructure and designed to serve the needs of modern systems. Evan Weaver is the co-founder and CEO of Fauna, and in this episode he explains the unique capabilities of Fauna, compares its consensus and transaction algorithm to those used in other NewSQL systems, and describes the ways that it allows for new application design patterns. One of the unique aspects of Fauna worth drawing attention to is the first-class support for temporality, which simplifies querying of historical states of the data. It is definitely worth a good look for anyone building a platform that needs a simple-to-manage data layer that will scale with your business.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support.
Understanding how your customers are using your product is critical for businesses of any size. To make it easier for startups to focus on delivering useful features, Segment offers a flexible and reliable data infrastructure for your customer analytics and custom events. You only need to maintain one integration to instrument your code and get a future-proof way to send data to over 250 services with the flip of a switch. Not only does it free up your engineers’ time, it lets your business users decide what data they want where. Go to dataengineeringpodcast.com/segmentio today to sign up for their startup plan and get $25,000 in Segment credits and $1 million in free software from marketing and analytics companies like AWS, Google, and Intercom. On top of that you’ll get access to Analytics Academy for the educational resources you need to become an expert in data analytics for measuring product-market fit.
You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.
Your host is Tobias Macey and today I’m interviewing Evan Weaver about FaunaDB, a modern operational data platform built for your cloud.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what FaunaDB is and how it got started?
What are some of the main use cases that FaunaDB is targeting?
How does it compare to some of the other global-scale databases that have been built in recent years, such as CockroachDB?
Can you describe the architecture of FaunaDB and how it has evolved?
The consensus and replication protocol in Fauna is intriguing. Can you talk through how it works?
What are some of the edge cases that users should be aware of?
How are conflicts managed in Fauna?
What is the underlying storage layer?
How is the query layer designed to allow for different query patterns and model representations?
How does data modeling in Fauna compare to that of relational or document databases?
Can you describe the query format?
What are some of the common difficulties or points of confusion around interacting with data in Fauna?
What are some application design patterns that are enabled by using Fauna as the storage layer?
Given the ability to replicate globally, how do you mitigate latency when interacting with the database?
What are some of the most interesting or unexpected ways that you have seen Fauna used?
When is it the wrong choice?
What have been some of the most interesting/unexpected/challenging aspects of building the Fauna database and company?
What do you have in store for the future of Fauna?

Contact Info

@evan on Twitter
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Fauna
Ruby on Rails
CNET
GitHub
Twitter
NoSQL
Cassandra
InnoDB
Redis
Memcached
Timeseries
Spanner Paper
DynamoDB Paper
Percolator
ACID
Calvin Protocol
Daniel Abadi
LINQ
LSM Tree (Log-structured Merge-tree)
Scala
Change Data Capture
GraphQL
Podcast.init Interview About Graphene
Fauna Query Language (FQL)
CQL == Cassandra Query Language
Object-Relational Databases
LDAP == Lightweight Directory Access Protocol
Auth0
OLAP == Online Analytical Processing
Jepsen distributed systems safety research


Seven NoSQL Databases in a Week

Learn the fundamentals of seven essential NoSQL databases in just one week with this book. Covering MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase, you'll explore their functionalities and practical applications. Designed to give you a working understanding of NoSQL database types, this guide helps aspiring DBAs and developers comprehend and utilize modern data solutions.

What this book will help me do: Master the fundamentals of MongoDB, including high-performance, high-availability, and scaling features. Gain hands-on experience with Neo4j to perform database queries and integrate with Python and Java applications. Learn efficient querying with Redis for storage and retrieval tasks. Understand Cassandra's powerful solution for scalable and fault-tolerant systems. Get well-versed with HBase for creating tables and reading and writing data efficiently.

Author(s): Sudarshan Kadambi and Xun (Brian) Wu bring a wealth of experience in database technologies. They have worked extensively in the software development and database management fields. With their practical and concise teaching approach, the authors make complex topics accessible for readers.

Who is it for? This book is ideal for budding DBAs and developers looking to understand NoSQL databases. It is particularly useful for those transitioning from relational databases who want to learn about modern database technologies. Suitable for both beginners and those with some database knowledge, it aims to bridge skill gaps and expand the reader's technical expertise.

Summary

As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources, the requirement to distribute the computational resources for managing that information becomes more pronounced. To ensure that all of the distributed nodes in our systems agree with each other, we need to build mechanisms that properly handle replication of data and conflict resolution. In this episode Christopher Meiklejohn discusses the research he is doing with Conflict-Free Replicated Data Types (CRDTs) and how they fit in with existing methods for sharing and sharding data. He also shares resources for systems that leverage CRDTs, how you can incorporate them into your systems, and when they might not be the right solution. It is a fascinating and informative treatment of a topic that is becoming increasingly relevant in a data-driven world.
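To ground the idea, here is a toy grow-only counter (G-Counter), one of the simplest CRDTs: each replica increments only its own slot, and merging takes the per-node maximum, so replicas converge no matter the order in which they exchange state. This is an illustrative Python sketch, not code from the episode.

```python
class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # Each replica only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # which is what makes the merge conflict-free.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas diverge, then converge after merging in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```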

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
You can help support the show by checking out the Patreon page which is linked from the site.
To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers.
Your host is Tobias Macey and today I’m interviewing Christopher Meiklejohn about establishing consensus in distributed systems.

Interview

Introduction
How did you get involved in the area of data management?
You have dealt with CRDTs with your work in industry, as well as in your research. Can you start by explaining what a CRDT is, how you first began working with them, and some of their current manifestations?
Other than CRDTs, what are some of the methods for establishing consensus across nodes in a system, and how does increased scale affect their relative effectiveness?
One of the projects that you have been involved in which relies on CRDTs is LASP. Can you describe what LASP is and what your role in the project has been?
Can you provide examples of some production systems or available tools that are leveraging CRDTs?
If someone wants to take advantage of CRDTs in their applications or data processing, what are the available off-the-shelf options, and what would be involved in implementing custom data types?
What areas of research are you most excited about right now?
Given that you are currently working on your PhD, do you have any thoughts on the projects or industries that you would like to be involved in once your degree is completed?

Contact Info

Website
cmeiklejohn on GitHub
Google Scholar Citations

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Basho Riak
Syncfree
LASP
CRDT
Mesosphere
CAP Theorem
Cassandra
DynamoDB
Bayou System (Xerox PARC)
Multivalue Register
Paxos
Raft
Byzantine Fault Tolerance
Two-Phase Commit
Spanner
ReactiveX
Tensorflow
Erlang
Docker
Kubernetes
Erleans
Orleans
Atom Editor
Automerge
Martin Kleppmann
Akka
Delta CRDTs
Antidote DB
Kops
Eventual Consistency
Causal Consistency
ACID Transactions
Joe Hellerstein


DynamoDB Cookbook

This comprehensive guide introduces you to Amazon's DynamoDB, a NoSQL database designed for high scalability and performance. Using this book, you will learn how to build robust web and mobile applications on DynamoDB and integrate it seamlessly with other AWS services for a complete cloud solution.

What this book will help me do: Understand the key design concepts of DynamoDB and leverage its performance and scalability in your projects. Learn best practices for operating and managing DynamoDB tables, including optimizing throughput and designing efficient indexes. Master techniques for securing data in DynamoDB, including encryption and access management approaches. Explore integration strategies with other AWS services such as S3, EMR, and Lambda to develop complex, real-world applications. Learn cost-effective tips for managing DynamoDB usage to avoid unnecessary expenses while maximizing resources.

Author(s): Deshpande, an expert in AWS and NoSQL databases, brings years of practical experience and engineering best practices to this book. With a strong focus on clear and actionable insights, Deshpande is dedicated to enabling developers to unlock the full potential of DynamoDB and related services for scalable application development.

Who is it for? This book is best suited for developers and architects familiar with AWS who aim to deepen their understanding of DynamoDB. It is ideal for individuals looking to harness NoSQL databases for robust and scalable application solutions. The topics covered range from foundational knowledge to advanced integrations, making the book approachable yet comprehensive for both learners and seasoned practitioners.

DynamoDB Applied Design Patterns

In "DynamoDB Applied Design Patterns", you'll dive deep into the effective design patterns that optimize the performance of applications using DynamoDB. Through practical examples and best practices, this guide empowers developers to create scalable, efficient, and robust DynamoDB implementations. What this Book will help me do Master how to design effective data models using DynamoDB's native features such as tables, attributes, and indexes. Learn to utilize DynamoDB features like global and local secondary indexes to optimize performance. Gain in-depth knowledge on managing and querying DynamoDB using AWS services and tools. Integrate DynamoDB seamlessly with AWS services such as Redshift, S3, and MapReduce. Leverage advanced DynamoDB API features to retrieve data efficiently for diverse application use cases. Author(s) Uchit Hamendra Vyas is a highly skilled professional specializing in AWS and cloud computing. With years of experience as a developer and architect, he brings practical insights into designing efficient database solutions. His approachable teaching style makes complex topics clear and accessible. Who is it for? This book is designed for developers working with or interested in using DynamoDB in their projects. It assumes a moderate familiarity with database design and AWS concepts. Readers aiming to enhance their DynamoDB skills and optimize performance will greatly benefit. If you're looking to take your NoSQL database knowledge to the next level, this book is for you.