MongoDB

Getting Started with MongoDB Atlas on Google Cloud

2024-04-10 · Google Cloud Next '24

session

by Stanimira Vlaeva (MongoDB) , Shane McAllister (MongoDB)

AI/ML API BigQuery Cloud Computing GCP NLP React

Explore Mongodb Atlas — MongoDB’s developer data platform, and learn how to integrate it with various Google Cloud services. During this lab lounge, you will create a fully managed database deployment, set up serverless Triggers that react to database events, and build Atlas Functions to communicate with Google Cloud APIs.

Additionally, you will explore Google Cloud’s NLP APIs, perform sentiment analysis on incoming data, learn how to replicate operational datasets from MongoDB Atlas to BigQuery and build an ML model for classification.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Make generative AI work: Best practices from data leaders

2024-04-09 · Google Cloud Next '24

session

by Erin McGowan (Dataiku) , Andrew Davidson (MongoDB) , Bruno Aziza (Google Cloud) , Will LaForest (Confluent) , Chris D’Agostino (FIS Global)

AI/ML Cloud Computing Databricks Dataiku GCP GenAI

The data landscape is evolving rapidly, with generative AI poised to revolutionize insight generation and data culture. Join experts from Databricks, MongoDB, Confluent, and Dataiku for an exclusive executive discussion on harnessing gen AI's transformative potential. We'll explore how to break down multicloud data silos, empowering informed decision-making and unlocking your data's full value with gen AI. Discover strategies for integrating gen AI, addressing challenges, and building a future-proof, innovation-driven data culture.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

How Rent the Runway transforms garment management automation with MongoDB

2024-04-09 · Google Cloud Next '24

session

by Mike Liberant (Rent The Runway) , Prashant Juttukonda (MongoDB)

AI/ML Cloud Computing GCP Kubernetes

In this session, you will learn how Rent the Runway (RTR) relies on MongoDB Atlas on Google Cloud to mix their automation hardware with their software, needing a robust, flexible, and intuitive data platform. We’ll dive into some reference architecture, highlighting some key integrations, such as Google Kubernetes Engine. We will then discuss RTR’s AI strategy, discussing how they’re approaching AI tools for their products. Lastly, we’ll discuss RTR and MongoDB’s mission of sustainability. Q&A to follow.

By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

#194 [Radar Recap] Scaling Data ROI: Driving Analytics Adoption Within Your Organization with Laura Gent Felker, Omar Khawaja and Tiffany Perkins-Munn

2024-04-04 · DataFramed Listen

podcast_episode

by Laura Gent Felker (Salesforce) , Omar Khawaja (Givaudan) , Dr. Tiffany Perkins-Munn, MD (J.P. Morgan Chase (JPMC))

Analytics GTM

You've just invested in licenses for your favorite analytics tool, but now what? In this session, Laura Gent Felker, GTM Analytics Lead at MongoDB, Tiffany Perkins-Munn, Managing Director & Head of Data & Analytics at JPMC and Omar Khawaja, CDAO & Global Head Data & Analytics at Givaudan will explore best practices when it comes to scaling analytics adoption within the wider organization. They will discuss how to approach change management when it comes to driving analytics adoption, the role of data leaders in driving a culture change around analytics tooling, and a lot more.

Predictive Auto-Scaling at MongoDB

2024-03-26 · Data Council Austin 2024 - Day 1 Watch

talk

by Matthieu Humeau , Jesse Jiryu Davis

The Complete Developer

2024-03-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Krause

API Docker GitHub JavaScript NoSQL React TypeScript data data-engineering nosql-databases

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack—from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins. You’ll also learn how to: Work with modern JavaScript syntax, TypeScript, and the Next.js framework Simplify UI development with the React library Extend your application with REST and GraphQL APIs Manage your data with the MongoDB NoSQL database Use OAuth to simplify user management, authentication, and authorization Automate testing with Jest, test-driven development, stubs, mocks, and fakes Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters. Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript

RAG using Semantic Kernel with Azure OpenAI and Azure Cosmos DB for MongoDB vCore

2024-03-14 · Python Data Science Day

talk

Azure Cosmos LLM RAG

Practical MongoDB Aggregations

2024-03-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Paul Done

Big Data IoT Cyber Security data data-engineering nosql-databases

Dive into the capabilities of the MongoDB aggregation framework with this official guide, "Practical MongoDB Aggregations". You'll learn how to design and optimize efficient aggregation pipelines for MongoDB 7.0, empowering you to handle complex data analysis and processing tasks directly within the database. What this Book will help me do Gain expertise in crafting advanced MongoDB aggregation pipelines for custom data workflows. Learn to perform time series analysis for financial datasets and IoT applications. Discover optimization techniques for working with sharded clusters and large datasets. Master array manipulation and other specific operations essential for MongoDB data models. Build pipelines that ensure data security and distribution while maintaining performance. Author(s) Paul Done, a recognized expert in MongoDB, brings his extensive experience in database technologies to this book. With years of practice in helping companies leverage MongoDB for big data solutions, Paul shares his deep knowledge in an accessible and logical manner. His approach to writing is hands-on, focusing on practical insights and clear explanations. Who is it for? This book is tailored for intermediate-level developers, database architects, data analysts, engineers, and scientists who use MongoDB. If you are familiar with MongoDB and looking to expand your understanding specifically around its aggregation capabilities, this guide is for you. Whether you're analyzing time series data or need to optimize pipelines for performance, you'll find actionable tips and examples here to suit your needs.

Mastering MongoDB 7.0 - Fourth Edition

2024-02-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Malak Abu Hammad , Elie Hannouch , Leandro Domingues , Marko Aleksendrić , Arek Borucki , Rachelle Palmer , Rajesh Nair

Data Management Cyber Security data data-engineering nosql-databases

Discover the many capabilities of MongoDB 7.0 with this comprehensive guide designed to take your database skills to new heights. By exploring advanced features like aggregation pipelines, role-based security, and MongoDB Atlas, you will gain in-depth expertise in modern data management. This book empowers you to create secure, high-performance database applications. What this Book will help me do Understand and implement advanced MongoDB queries for detailed data analysis. Apply optimized indexing techniques to maximize query performance. Leverage MongoDB Atlas for robust monitoring, efficient backups, and advanced integrations. Develop secure applications with role-based access control, auditing, and encryption. Create scalable and innovative solutions using the latest features in MongoDB 7.0. Author(s) Marko Aleksendrić, Arek Borucki, and their co-authors are accomplished experts in database engineering and MongoDB development. They bring collective experience in teaching and practical application of MongoDB solutions across various industries. Their goal is to simplify complex topics, making them approachable and actionable for developers worldwide. Who is it for? This book is written for developers, software engineers, and database administrators with experience in MongoDB who want to deepen their expertise. An understanding of basic database operations and queries is recommended. If you are looking to master advanced concepts and create secure, optimized, and scalable applications, this is the book for you.

Mastering MongoDB 7.0 - Fourth Edition

2024-01-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Malak Abu Hammad , Elie Hannouch , Leandro Domingues , Marko Aleksendrić , Arek Borucki , Rachelle Palmer , Rajesh Nair

Data Management NoSQL Cyber Security data data-engineering nosql-databases

Mastering MongoDB 7.0 is your in-depth resource for learning MongoDB 7.0, the powerful NoSQL database designed for developers. Gain expertise in database architecture, data management, and modern features like MongoDB Atlas. By reading this book, you'll acquire the essential skills needed for building efficient, scalable, and secure applications. What this Book will help me do Develop expert-level skills in crafting advanced queries and managing complex data tasks in MongoDB. Learn to design efficient schemas and optimize indexing to maximize database performance. Integrate applications seamlessly with MongoDB Atlas, mastering its monitoring and backup tools. Implement robust security with RBAC, auditing strategies, and comprehensive encryption. Explore the latest MongoDB 7.0 features, including Atlas Vector Search, for modern applications. Author(s) Marko Aleksendrić, Arek Borucki, and co-authors are recognized MongoDB experts with years of hands-on experience. They bring together their expertise to deliver a practical guide filled with real-world insights that help developers advance their MongoDB skills. Their collaborative writing ensures comprehensive coverage of MongoDB 7.0 tools and techniques. Who is it for? This book is written for software developers, database administrators, and engineers who have intermediate knowledge of MongoDB and want to extend their expertise. Whether you are developing scalable applications, managing data systems, or ensuring database security, this book offers advanced guidance for achieving your professional goals with MongoDB.

Peter Farkas: Moving MongoDB Workloads to Postgres with FerretDB

2023-12-06 · DATA MINER Big Data Europe Conference 2020 Watch

video

by Peter Farkas

Big Data postgresql

"Peter Farkas unveils a game-changing solution in 'Moving MongoDB Workloads to Postgres with FerretDB.' 🔄 Learn how to seamlessly transition MongoDB workloads to Postgres without application-level changes, and ensure a smooth user experience with familiar tools and frameworks. 📦🐘 #MongoDB #Postgres #FerretDB"

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Practical MongoDB Aggregations

2023-09-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Paul Done

data data-engineering nosql-databases

Practical MongoDB Aggregations serves as the definitive guide to mastering aggregation pipelines within MongoDB 7.0. Officially endorsed by MongoDB, Inc., this book provides streamlined strategies and practical examples to help you achieve complex data manipulation and analytical tasks, ultimately enhancing your database operation proficiency. What this Book will help me do Understand the architecture of the MongoDB aggregation framework to build scalable pipelines. Design and implement optimized aggregation pipelines for high performance. Learn practical techniques for processing large datasets efficiently using sharding. Apply data processing directly within MongoDB to minimize external workflows. Master handling arrays and securing data through well-designed pipelines. Author(s) Paul Done is an experienced software engineer with in-depth expertise in MongoDB and database systems. With years of professional experience managing and optimizing databases, Paul draws from real-world scenarios to devise effective strategies for learning MongoDB's advanced features. His approachable and instructional writing style empowers developers, engineers, and analysts to reach their full potential. Who is it for? This book is perfect for developers, database architects, and data engineers who have a foundational understanding of MongoDB and are looking to deepen their practical skills in using aggregation pipelines. Professionals who want to perform efficient data processing and gain insights into MongoDB's advanced features will find this guide invaluable. If you wish to streamline analytical tasks, optimize performance, and work efficiently with MongoDB's latest functionalities, this book is tailored for you.

Making the Shift to Application-Driven Intelligence

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Ashwin Gangadhar , Mat Keep

AI/ML Analytics Data Lakehouse Databricks DWH

In the digital economy, application-driven intelligence delivered against live, real-time data will become a core capability of successful enterprises. It has the potential to improve the experience that you provide to your customers and deepen their engagement. But to make application-driven intelligence a reality, you can no longer rely only on copying live application data out of operational systems into analytics stores. Rather, it takes the unique real-time application-serving layer of a MongoDB database combined with the scale and real-time capabilities of a Databricks Lakehouse to automate and operationalize complex and AI-enhanced applications at scale.

In this session, we will show how it can be seamless for developers and data scientists to automate decisioning and actions on fresh application data and we'll deliver a practical demonstration on how operational data can be integrated in real time to run complex machine learning pipelines.

Talk by: Mat Keep and Ashwin Gangadhar

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Processing Delta Lake Tables on AWS Using AWS Glue, Amazon Athena, and Amazon Redshift

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Noritaka Sekiyama (Amazon Web Services (AWS)) , Akira Ajisaka

Athena AWS Amazon EMR AWS Glue Amazon RDS Cloud Computing Data Lake Data Lakehouse Databricks Delta DWH DynamoDB +3 more

Delta Lake is an open source project that helps implement modern data lake architectures commonly built on cloud storages. With Delta Lake, you can achieve ACID transactions, time travel queries, CDC, and other common use cases on the cloud.

There are a lot of use cases of Delta tables on AWS. AWS has invested a lot in this technology, and now Delta Lake is available with multiple AWS services, such as AWS Glue Spark jobs, Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum. AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. With AWS Glue, you can easily ingest data from multiple data sources such as on-prem databases, Amazon RDS, DynamoDB, MongoDB into Delta Lake on Amazon S3 even without expertise in coding.

This session will demonstrate how to get started with processing Delta Lake tables on Amazon S3 using AWS Glue, and querying from Amazon Athena, and Amazon Redshift. The session also covers recent AWS service updates related to Delta Lake.

Talk by: Noritaka Sekiyama and Akira Ajisaka

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Striim | Powering a Delightful Travel Experience with a Real-Time Operational Data Hub

Creating the Right Developer Community for Your Company | AWS

2023-05-11 · Data Council 2023 Watch

video

by Wesley Faulkner

AI/ML Analytics AWS Data Engineering IBM

ABOUT THE TALK: Wesley Faulkner explores the various types of communities and discusses how to determine the most suitable one for your company at various stages of growth. Whether you are looking to double down on your current community or expand to new platforms, Wesley provides the guidance you'll need to make informed decisions about building a strong and effective community.

ABOUT THE SPEAKER: Wesley Faulkner is a first-generation American, public speaker, and podcaster. He is a founding member of the government transparency group Open Austin and a staunch supporter of racial justice, workplace equity, and neurodiversity. His professional experience spans technology from AMD, Atlassian, Dell, IBM, and MongoDB.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

2022-12-29 · Data Engineering Podcast Listen

podcast_episode

by Rehgan Avon (AlignAI) , Tobias Macey

AI/ML Airflow Analytics BI Data Engineering Data Management dbt Kubernetes Monte Carlo MySQL PagerDuty postgresql

Summary

Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organization increases the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this problem, but they are frequently ineffective. Rehgan Avon co-founded AlignAI to help address this challenge through a more purposeful platform designed to collect and distribute the knowledge of how and why data is used in a business. In this episode she shares the strategic and tactical elements of how to make more effective use of the technical and organizational resources that are available to you for getting work done with data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show! Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos. Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more. Your host is Tobias Macey and today I'm interviewing Rehgan Avon about her work at AlignAI to help organizations standardize their technical and procedural approaches to working with data

Interview

Introduction How did you get involved in the area of data management? Can you describe what AlignAI is and the story behind it? What are the core problems that you are focused on addressing?

What are the tactical ways that you are working to solve those problems?

What are some of the common and avoidable ways that analytics/AI projects go wrong?

What are some of the ways that organizational scale and complexity impacts their ability to execute on data and AI projects?

What are the ways that incomplete/unevenly distributed knowledge manifests in project design and execution? Can you describe the design and implementation of the AlignAI platform?

How have the goals and implementation of the product changed since you

Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

2022-12-29 · Data Engineering Podcast Listen

podcast_episode

by Vishal Singh (Starburst) , Tobias Macey

Airflow Analytics CDP CI/CD Cloud Computing Data Engineering Data Lake Data Management Data Quality Datafold dbt DWH +10 more

Summary

With all of the messaging about treating data as a product it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst which means that he has to spend all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Build Data Pipelines. Not DAGs. That’s the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up to the minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts and window operations. Output data can be streamed into a data lake for query engines like Presto, Trino or Spark SQL, a data warehouse like Snowflake or Redshift., or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart’s content without worrying about their cloud bill. For data engineering podcast listeners, we’re offering a 30 day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver today and see for yourself how to avoid DAG hell. Your host is Tobias Macey and today I'm interviewing Vishal Singh about his experience

An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

2022-12-26 · Data Engineering Podcast Listen

podcast_episode

by Scott Hirleman , Tobias Macey

Airflow BI Data Engineering Data Lakehouse Data Management dbt Kubernetes Monte Carlo MySQL PagerDuty postgresql

Summary

Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. In order to condense that acquired knowledge into a format that is useful to everyone Scott Hirleman turns the tables in this episode and asks Tobias about the tactical and strategic aspects of his experiences applying those lessons to the work of building a data platform from scratch.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show! Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos. Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more. Your host is Tobias Macey and today I'm being interviewed by Scott Hirleman about my work on the podcasts and my experience building a data platform

Interview

Introduction How did you get involved in the area of data management?

Data platform building journey

Why are you building, who are the users/use cases How to focus on doing what matters over cool tools How to build a good UX Anything surprising or did you discover anything you didn't expect at the start How to build so it's modular and can be improved in the future

General build vs buy and vendor selection process

Obviously have a good BS detector - how can others build theirs So many tools, where do you start - capability need, vendor suite offering, etc. Anything surprising in doing much of this at once How do you think about TCO in build versus buy Any advice

Guest call out

Be brave, believe you are good enough to be on the show Look at past episodes and don't pitch the same as what's been on recently And vendors, be smart, work with your customers to come up with a good pitch for them as guests...

Tobias' advice and learnings from building out a data platform:

Advice: when considering a tool, start from what are you act

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems

2022-12-26 · Data Engineering Podcast Listen

podcast_episode

by Rishabh Poddar (Opaque Systems) , Tobias Macey

AI/ML Airflow Analytics CDP CI/CD Cloud Computing Data Analytics Data Engineering Data Lake Data Management Data Quality Datafold +13 more

Summary

Encryption and security are critical elements in data analytics and machine learning applications. We have well developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming and the capabilities that could be unlocked by a robust solution Rishabh Poddar helped to create Opaque Systems as an outgrowth of his PhD studies. In this episode he shares the work that he and his team have done to simplify integration of secure enclaves and trusted computing environments into analytical workflows and how you can start using it without re-engineering your existing systems.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Build Data Pipelines. Not DAGs. That’s the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up to the minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts and window operations. Output data can be streamed into a data lake for query engines like Presto, Trino or Spark SQL, a data warehouse like Snowflake or Redshift., or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart’s content without worrying about their cloud bill. For data engineering podcast listeners, we’re offering a 30 day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver today an

talk-data.com

Activity Trend

Top Events

Top Speakers

Getting Started with MongoDB Atlas on Google Cloud

Make generative AI work: Best practices from data leaders

How Rent the Runway transforms garment management automation with MongoDB

#194 [Radar Recap] Scaling Data ROI: Driving Analytics Adoption Within Your Organization with Laura Gent Felker, Omar Khawaja and Tiffany Perkins-Munn

Predictive Auto-Scaling at MongoDB

The Complete Developer

RAG using Semantic Kernel with Azure OpenAI and Azure Cosmos DB for MongoDB vCore

Practical MongoDB Aggregations

Mastering MongoDB 7.0 - Fourth Edition

Mastering MongoDB 7.0 - Fourth Edition

Peter Farkas: Moving MongoDB Workloads to Postgres with FerretDB

Practical MongoDB Aggregations

Making the Shift to Application-Driven Intelligence

Processing Delta Lake Tables on AWS Using AWS Glue, Amazon Athena, and Amazon Redshift

Sponsored by: Striim | Powering a Delightful Travel Experience with a Real-Time Operational Data Hub

Creating the Right Developer Community for Your Company | AWS

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems