Capella Fundamentals

2024-06-19 · Couchbase Capella Test Drives - Virtual Workshop

workshop

by Experienced Trainer (Couchbase)

Cloud Computing capella sql++

Capella Fundamentals: setting up a cloud account; importing relational data and modeling it in NoSQL; creating buckets, documents, queries, and indexes; writing queries with SQL++ (SQL for JSON).

#200 50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

2024-04-22 · DataFramed Listen

podcast_episode

by Don Chamberlin (IBM), Richie (DataCamp)

AI/ML Analytics IBM RDBMS SQL XML

Over the past 199 episodes of DataFramed, we’ve heard from people at the forefront of data and AI, and over the past year we’ve constantly looked ahead to the future AI might bring. But all of the technologies and ways of working we’ve witnessed have been built on foundations that were laid decades ago. For our 200th episode, we’re bringing you a special guest and taking a walk down memory lane, to the creation and development of one of the most popular programming languages in the world.

Don Chamberlin is renowned as the co-inventor of SQL (Structured Query Language), the predominant database language globally, which he developed with Raymond Boyce in the mid-1970s. Chamberlin's professional career began at IBM Research in Yorktown Heights, New York, following a summer internship there during his academic years. His work on IBM's System R project led to the first SQL implementation and significantly advanced IBM’s relational database technology. His contributions were recognized when he was made an IBM Fellow in 2003 and later a Fellow of the Computer History Museum in 2009 for his pioneering work on SQL and database architectures. Chamberlin also contributed to the development of XQuery, an XML query language, as part of the W3C, which became a W3C Recommendation in January 2007. Additionally, he holds fellowships with ACM and IEEE and is a member of the National Academy of Engineering.

In the episode, Richie and Don explore his early career at IBM and the development of his interest in databases alongside Ray Boyce, the database task group (DBTG), the transition to relational databases and the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++ and much more.

Links Mentioned in the Show:

The first-ever journal paper on SQL. SEQUEL: A Structured English Query Language
Don’s Book: SQL++ for SQL Users: A Tutorial
System R: Relational approach to database management
SQL Courses
SQL Articles, Tutorials and Code-Alongs
Related Episode: Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx
Rewatch sessions from RADAR: The Analytics Edition

New to DataCamp?

Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Designing A Non-Relational Database Engine

2024-04-14 · Data Engineering Podcast Listen

podcast_episode

by Oren Eini (RavenDB), Tobias Macey

AI/ML Analytics Cloud Computing Dagster Data Engineering Data Lake Data Lakehouse Data Management Data Quality Datafold dbt Delta +5 more

Summary

Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

This episode is brought to you by Datafold, a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Leverage Datafold's fast cross-database data diffing and monitoring to test your replication pipelines automatically and continuously. Validate consistency between source and target at any scale, and receive alerts about any discrepancies. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey and today I'm interviewing Oren Eini about the work of designing and building a NoSQL database engine.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what constitutes a NoSQL database?

How have the requirements and applications of NoSQL engines changed since they first became popular ~15 years ago?

What are the factors that convince teams to use a NoSQL vs. SQL database?

NoSQL is a generalized term that encompasses a number of different data models. How does the underlying representation (e.g. document, K/V, graph) change that calculus?

How has the evolution in data formats (e.g. N-dimensional vectors, point clouds, etc.) changed the landscape for NoSQL engines?

When designing and building a database, what is the initial set of questions that needs to be answered?

How many "core capabilities" can you reasonably design around before they conflict with each other?

How have you approached the evolution of RavenDB as you add new capabilities and mature the project?

What are some of the early decisions that had to be unwound to enable new capabilities?

If you were to start from scratch today, what database would you build?

What are the most interesting, innovative, or unexpected ways that you have seen RavenDB/NoSQL databases used?

What are the most interesting, unexpected, or challenging lessons t

The Complete Developer

2024-03-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Krause

API Docker GitHub JavaScript MongoDB React TypeScript data data-engineering nosql-databases

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack, from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins.

You’ll also learn how to:

Work with modern JavaScript syntax, TypeScript, and the Next.js framework
Simplify UI development with the React library
Extend your application with REST and GraphQL APIs
Manage your data with the MongoDB NoSQL database
Use OAuth to simplify user management, authentication, and authorization
Automate testing with Jest, test-driven development, stubs, mocks, and fakes

Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters.

Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript

Big Data Computing

2024-02-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bishwajeet Kumar Pandey, Tanvir Habib Sardar

Big Data Hadoop Hive Data Streaming apache-hive data data-engineering

This book primarily aims to provide an in-depth understanding of recent advances in big data computing technologies, methodologies, and applications along with introductory details of big data computing models such as Apache Hadoop, MapReduce, Hive, Pig, Mahout in-memory storage systems, NoSQL databases, and big data streaming services.

Mastering MongoDB 7.0 - Fourth Edition

2024-01-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Malak Abu Hammad, Elie Hannouch, Leandro Domingues, Marko Aleksendrić, Arek Borucki, Rachelle Palmer, Rajesh Nair

Data Management MongoDB Cyber Security data data-engineering nosql-databases

Mastering MongoDB 7.0 is your in-depth resource for learning MongoDB 7.0, the powerful NoSQL database designed for developers. Gain expertise in database architecture, data management, and modern features like MongoDB Atlas. By reading this book, you'll acquire the essential skills needed for building efficient, scalable, and secure applications.

What this Book will help me do

Develop expert-level skills in crafting advanced queries and managing complex data tasks in MongoDB.
Learn to design efficient schemas and optimize indexing to maximize database performance.
Integrate applications seamlessly with MongoDB Atlas, mastering its monitoring and backup tools.
Implement robust security with RBAC, auditing strategies, and comprehensive encryption.
Explore the latest MongoDB 7.0 features, including Atlas Vector Search, for modern applications.

Author(s)

Marko Aleksendrić, Arek Borucki, and co-authors are recognized MongoDB experts with years of hands-on experience. They bring together their expertise to deliver a practical guide filled with real-world insights that help developers advance their MongoDB skills. Their collaborative writing ensures comprehensive coverage of MongoDB 7.0 tools and techniques.

Who is it for?

This book is written for software developers, database administrators, and engineers who have intermediate knowledge of MongoDB and want to extend their expertise. Whether you are developing scalable applications, managing data systems, or ensuring database security, this book offers advanced guidance for achieving your professional goals with MongoDB.

Build supergraphs, not APIs

2023-12-15 · GraphQL Berlin Meetup #27

talk

by Tom Harding (Hasura)

SQL github api graphql hasura

Data is power, but building APIs is tedious. Engineers create vital value by modelling domains and data, but waste time on repetitive plumbing tasks like CRUD, data pipelines, and cross data source joins. What if you could skip all that? With supergraph, you can. Query the supergraph with GraphQL, and get consistent features like joins, filtering, and aggregations across all data sources. Set permissions where they belong: at the model level, where they can be applied to absolutely any query. Live demo of how to build a supergraph that connects the GitHub API with a users database will be presented.

Ed Anuff, CPO, DataStax, as we discuss the future of databases with AI and GenAI

2023-11-22 · Making Data Simple Listen

podcast_episode

by Ed Anuff (DataStax), Al Martin (IBM)

AI/ML GenAI IBM RAG

Back to talking Data with Ed Anuff, CPO, DataStax. With experience at Google, Apigee, Six Apart, Vignette, Epicentric, and Wired, Ed talks the future of databases with AI and GenAI.

05:04 The Crazy life of Ed Anuff
08:12 DataStax defined
10:06 Vector Database
11:58 GenAI and RAG Pattern
18:03 DataStax Differentiation
21:39 NoSQL vs SQL
24:27 Common AI Use Cases
25:47 The Secret to ChatGPT
31:10 DataStax 2min Pitch
31:42 The Future
35:47 Bring AI to the Data

LinkedIn: linkedin.com/in/edanuff
Website: https://www.datastax.com/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Learn Live: Build an AI-enabled chat w/ Azure OpenAI & Azure Cosmos DB | BRK404LL

2023-11-16 · Microsoft Ignite 2023 Watch

video

by Bethany Jepchumba (Microsoft), Julia Muiruri (Microsoft), Akah Mandela Munab

AI/ML API Azure C#/.NET Cosmos LLM Microsoft

Connect an existing ASP.NET Core Blazor web application to Azure Cosmos DB for NoSQL and Azure OpenAI using their .NET SDKs. Your code manages and queries items in an API for NoSQL container. Your code also sends prompts to Azure OpenAI and parses the responses. This LIVE session is presented by two experts, and our moderators will answer your questions directly in the chat.

Speakers:
* B Jb
* Julia Muiruri
* Tim Fish
* Akah Mandela Munab
* DMITRII SOLOVEV
* Jay Gordon
* Julian Sharp
* Konstantin Berezovsky
* DE Producer 9
* Olivia Guzzardo

Session Information: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

BRK404LL | English (US) | Data

A Fresh Look at Data Modeling Part 1: The What and Why of Data Modeling - Audio Blog

2023-11-14 · Secrets of Data Analytics Leaders Listen

podcast_episode

Big Data Data Modelling

Many organizations abandoned data modeling as they embraced big data and NoSQL. Now they find that data modeling continues to be important, perhaps more important today than ever before. With a fresh look you’ll see that today’s data modeling is different from past practices – much more than physical design for relational data. Published at: https://www.eckerson.com/articles/a-fresh-look-at-data-modeling-part-1-the-what-and-why-of-data-modeling

Provision Redis Cloud with Pulumi across AWS, Azure, and Google Cloud

2023-07-13 · Live Workshop: Introduction to Redis and Pulumi

workshop

by Josh Kodroff (Pulumi), Noam Stern (Redis)

AWS Azure Pulumi google cloud redis cloud

Learn how to provision Redis Cloud alongside other cloud resources on AWS, Azure, and Google Cloud using your favorite programming languages and the Redis Cloud provider for Pulumi. Topics include getting started with NoSQL using Redis Cloud, defining and deploying Redis Cloud resources with code, and integrating Pulumi with Redis.

Beginning Database Design Solutions, 2nd Edition

2023-04-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rod Stephens

Cloud Computing Data Management data data-engineering relational-databases

A concise introduction to database design concepts, methods, and techniques in and out of the cloud.

In the newly revised second edition of Beginning Database Design Solutions: Understanding and Implementing Database Design Concepts for the Cloud and Beyond, award-winning programming instructor and mathematician Rod Stephens delivers an easy-to-understand guide to designing and implementing databases both in and out of the cloud. Without assuming any prior database design knowledge, the author walks you through the steps you’ll need to take to understand, analyze, design, and build databases.

In the book, you’ll find clear coverage of foundational database concepts along with hands-on examples that help you practice important techniques so you can apply them to your own database designs, as well as:

Downloadable source code that illustrates the concepts discussed in the book
Best practices for reliable, platform-agnostic database design
Strategies for digital transformation driven by universally accessible database design

An essential resource for database administrators, data management specialists, and database developers seeking expertise in relational, NoSQL, and hybrid database design both in and out of the cloud, Beginning Database Design Solutions is a hands-on guide ideal for students and practicing professionals alike.

Introducing RavenDB: The Database for Modern Data Persistence

2022-11-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dejan Miličić

RDBMS SQL data data-engineering

Simplify your first steps with the RavenDB NoSQL Document Database. This book takes a task-oriented approach by showing common problems, potential solutions, brief explanations of how those solutions work, and the mechanisms used. Based on real-world examples, the recipes in this book will show you how to solve common problems with Raven Query Language and will highlight reasons why RavenDB is a great choice for fast prototyping solutions that can sustain increasing amounts of data as your application grows.

Introducing RavenDB includes code and query examples that address real-life challenges you’ll encounter when using RavenDB, helping you learn the basics of the Raven Query Language more quickly and efficiently. In many cases, you’ll be able to copy and paste the examples into your own code, making only minor modifications to suit your application. RavenDB supports many advanced features, such as full-text search, graph queries, and timeseries; recipes in the latter portion of the book will help you understand those advanced features and how they might be applied to your own code and applications. After reading this book, you will be able to employ RavenDB’s powerful features in your own projects.

What You Will Learn

Set up and start working with RavenDB
Model your objects for persistence in a NoSQL document database
Write basic and advanced queries in the Raven Query Language
Index your data using map/reduce techniques
Implement techniques leading to highly performant systems
Efficiently aggregate data and query on those aggregations

Who This Book Is For

Developers accustomed to relational databases who are about to enter a world of NoSQL databases. The book is also for experienced programmers who have used other non-relational databases and want to learn RavenDB. It will also prove useful for developers who want to move away from using Object-Relational Modeling frameworks and start working with a persistence solution that can store object graphs directly.

What Is Distributed SQL?

2022-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Charles Custer, Paul Modderman, Jim Walker (Cockroach Labs)

Cloud Computing ELK RDBMS Cyber Security SQL data data-engineering nosql-databases

Globally available resources have become the status quo. They're accessible, distributed, and resilient. Our traditional SQL database options haven't kept up. Centralized SQL databases, even those with read replicas in the cloud, put all the transactional load on a central system. The further away that a transaction happens from the user, the more the user experience suffers. If the transactional data powering the application is greatly slowed down, fast-loading web pages mean nothing.

In this report, Paul Modderman, Jim Walker, and Charles Custer explain how distributed SQL fits all applications and eliminates complex challenges like sharding from traditional RDBMS systems. You'll learn how distributed SQL databases can reach global scale without introducing the consistency trade-offs found in NoSQL solutions. These databases come to life through cloud computing, while legacy databases simply can't rise to meet the elastic and ubiquitous new paradigm.

You'll learn:

Key concepts driving this new technology, including the CAP theorem, the Raft consensus algorithm, multiversion concurrency control, and Google Spanner
How distributed SQL databases meet enterprise requirements, including management, security, integration, and Everything as a Service (XaaS)
The impact that distributed SQL has already made in the telecom, retail, and gaming industries
Why serverless computing is an ideal fit for distributed SQL
How distributed SQL can help you expand your company's strategic plan

Data Modeling for Azure Data Services

2021-07-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Peter ter Braake

Azure ADF BI Cloud Computing Cosmos Data Lake Data Management Data Modelling Data Vault ETL/ELT dimensional modeling Microsoft +6 more

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs.

What this Book will help me do

Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases.
Learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool.
Master how to design and model a Data Lake using Azure Storage efficiently.
Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB.
Develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows.

Author(s)

Peter ter Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, the author crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, the author ensures that learners gain not just knowledge but applied competence.

Who is it for?

This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.

MongoDB Fundamentals

2020-12-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Liviu Nedov, Sam Anderson, Amit Phaltankar, Juned Ahsan, Michael Harrison

Cloud Computing MongoDB Cyber Security data data-engineering nosql-databases

This book, "MongoDB Fundamentals", is the ideal hands-on guide to learning MongoDB. By starting from the basics of NoSQL databases and progressing to cloud integration using MongoDB Atlas, you will gain practical experience managing, querying, and visualizing data effectively for real-world applications.

What this Book will help me do

Set up and manage a MongoDB database with both local and cloud environments.
Master querying and modifying data using the aggregation framework for complex operations.
Implement effective database architecture with replication and sharding techniques.
Ensure data security and resilience through user management and efficient backup/restore methods.
Visualize data insights through dynamic reports and charts using MongoDB Charts.

Author(s)

Amit Phaltankar, Juned Ahsan, Michael Harrison, and Liviu Nedov are seasoned professionals in the field of database management systems, each bringing extensive experience working with MongoDB and cloud technologies. They excel at translating technical concepts into accessible, actionable insights, and have a passion for enabling IT professionals to create high-performance database solutions.

Who is it for?

"MongoDB Fundamentals" is tailored for developers, database administrators, system administrators, and cloud architects who are new to MongoDB but are looking to integrate it into their data processing workflows. It's perfect for those who aim to enhance their skills in handling data within cloud computing environments and have some basic programming or database experience.

Learn MongoDB 4.x

2020-09-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Doug Bierer

MongoDB Python Cyber Security data data-engineering nosql-databases

Explore the capabilities of MongoDB 4.x with this comprehensive guide designed for developers and administrators working with NoSQL databases. Dive into topics such as database design, advanced query handling, and security configuration, and gain hands-on experience through practical examples and insights.

What this Book will help me do

Learn to configure and install MongoDB 4.x for development and administration.
Understand the principles of NoSQL schema design for optimal performance.
Perform complex queries and operations to manage your MongoDB databases.
Secure your MongoDB setup with role-based access control and encryption techniques.
Monitor and optimize database performance for production environments.

Author(s)

Doug Bierer, the author of 'Learn MongoDB 4.x,' is a seasoned database expert with extensive experience in NoSQL technologies. With a focus on practicality and clear explanations, Bierer brings deep insights into MongoDB's development and administration.

Who is it for?

This book is ideal for early-career developers, system administrators, and database enthusiasts eager to break into NoSQL technologies. If you are familiar with Python and basic database concepts, this book will guide you through mastering MongoDB. It's perfect for those building dynamic backend systems.

Introducing Microsoft SQL Server 2019

2020-04-27 · O'Reilly SQL Books O'Reilly Amazon

book

by James Rowland-Jones, Mitchell Pearson, Arun Sirpal, Dave Noderer, Dustin Ryan, Kellyn Gorman, Buck Woody, Allan Hirt

Analytics Azure BI Big Data Cloud Computing Data Management Docker Hadoop HDFS Kubernetes Microsoft Power BI +4 more

Introducing Microsoft SQL Server 2019 is the must-have guide for database professionals eager to leverage the latest advancements in SQL Server 2019. This book covers the features and capabilities that make SQL Server 2019 a powerful tool for managing and analyzing data both on-premises and in the cloud.

What this Book will help me do

Understand the new features introduced in SQL Server 2019 and their practical applications.
Confidently manage and analyze relational, NoSQL, and big data within SQL Server 2019.
Implement containerization for SQL Server using Docker and Kubernetes.
Migrate and integrate your databases effectively to use Power BI Report Server.
Query data from Hadoop Distributed File System with Azure Data Studio.

Author(s)

The authors of 'Introducing Microsoft SQL Server 2019' are subject matter experts including Kellyn Gorman, Allan Hirt, and others. With years of professional experience in database management and SQL Server, they bring a wealth of practical insight and knowledge to the book. Their experience spans roles as administrators, architects, and educators in the field.

Who is it for?

This book is aimed at database professionals such as DBAs, architects, and big data engineers who are currently using earlier versions of SQL Server or other database platforms. It is particularly well-suited for professionals aiming to understand and implement SQL Server 2019's new features. Readers should have basic familiarity with SQL Server and RDBMS concepts. If you're looking to explore SQL Server 2019 to improve data management and analytics in your organization, this book is for you.

Building A New Foundation For CouchDB

2020-03-17 · Data Engineering Podcast Listen

podcast_episode

by Adam Kocoloski, Tobias Macey

AI/ML Analytics Big Data ClickHouse Cloud Computing Data Engineering Data Lake Data Management DevOps DWH GitHub IBM +3 more

Summary

CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interface it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt which is being addressed with a refactored architecture based on FoundationDB. In this episode Adam Kocoloski shares the history of the project, how it works under the hood, and how the new design will improve the project for our new era of computation. This was an interesting conversation about the challenges of maintaining a large and mission critical project and the work being done to evolve it.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Are you spending too much time maintaining your data pipeline? Snowplow empowers your business with a real-time event data pipeline running in your own cloud account without the hassle of maintenance. Snowplow takes care of everything from installing your pipeline in a couple of hours to upgrading and autoscaling so you can focus on your exciting data projects. Your team will get the most complete, accurate and ready-to-use behavioral web and mobile data, delivered into your data warehouse, data lake and real-time streams. Go to dataengineeringpodcast.com/snowplow today to find out why more than 600,000 websites run Snowplow. Set up a demo and mention you’re a listener for a special offer!

Setting up and managing a data warehouse for your business analytics is a huge task. Integrating real-time data makes it even more challenging, but the insights you obtain can make or break your business growth. You deserve a data warehouse engine that outperforms the demands of your customers and simplifies your operations at a fraction of the time and cost that you might expect. You deserve ClickHouse, the open-source analytical database that deploys and scales wherever and whenever you want it to and turns data into actionable insights. And Altinity, the leading software and service provider for ClickHouse, is on a mission to help data engineers and DevOps managers tame their operational analytics. Go to dataengineeringpodcast.com/altinity for a free consultation to find out how they can help you today.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Adam Kocoloski about CouchDB and the work being done to migrate the storage layer to FoundationDB.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what CouchDB is?
How did you get involved in the CouchDB project and what is your current role in the community?
What are the use cases that it is well suited for?
Can you share some of the history of CouchDB and its role in the NoSQL movement?
How is CouchDB currently architected and how has it evolved since it was first introduced?
What have been the benefits and challenges of Erlang as the runtime for CouchDB?
How is the current storage engine implemented and what are its shortcomings?
What problems are you trying to solve by replatforming on a new storage layer?
What were the selection criteria for the new storage engine and how did you structure the decision making process?
What was the motivation for choosing FoundationDB as opposed to other options such as RocksDB, LevelDB, etc.?
How is the adoption of FoundationDB going to impact the overall architecture and implementation of CouchDB?
How will the use of FoundationDB impact the way that the current capabilities are implemented, such as data replication?
What will the migration path be for people running an existing installation?
What are some of the biggest challenges that you are facing in rearchitecting the codebase?
What new capabilities will the FoundationDB storage layer enable?
What are some of the most interesting/unexpected/innovative ways that you have seen CouchDB used?
What new capabilities or use cases do you anticipate once this migration is complete?
What are some of the most interesting/unexpected/challenging lessons that you have learned while working with the CouchDB project and community?
What is in store for the future of CouchDB?

Contact Info

LinkedIn
@kocolosk on Twitter
kocolosk on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Apache CouchDB
FoundationDB (Podcast Episode)
IBM Cloudant
Experimental Particle Physics
FPGA == Field Programmable Gate Array
Apache Software Foundation
CRDT == Conflict-free Replicated Data Type (Podcast Episode)
Erlang
Riak
RabbitMQ
Heisenbug
Kubernetes
Property Based Testing

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


MongoDB: The Definitive Guide, 3rd Edition

2019-12-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eoin Brazil , Shannon Bradshaw , Kristina Chodorow

MongoDB Cyber Security data data-engineering nosql-databases

Manage your data with a system designed to support modern application development. Updated for MongoDB 4.2, the third edition of this authoritative and accessible guide shows you the advantages of using document-oriented databases. You’ll learn how this secure, high-performance system enables flexible data models, high availability, and horizontal scalability. Authors Shannon Bradshaw, Eoin Brazil, and Kristina Chodorow provide guidance for database developers, advanced configuration for system administrators, and use cases for a variety of projects. NoSQL newcomers and experienced MongoDB users will find updates on querying, indexing, aggregation, transactions, replica sets, ops management, sharding and data administration, durability, monitoring, and security.

In six parts, this book shows you how to:
Work with MongoDB, perform write operations, find documents, and create complex queries
Index collections, aggregate data, and use transactions for your application
Configure a local replica set and learn how replication interacts with your application
Set up cluster components and choose a shard key for a variety of applications
Explore aspects of application administration and configure authentication and authorization
Use stats when monitoring, back up and restore deployments, and use system settings when deploying MongoDB
