talk-data.com

Topic

SQL

Structured Query Language (SQL)

database_language data_manipulation data_definition programming_language

1751 activities tagged

Activity Trend

Peak of 107 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1751 activities · Newest first

Python and SQL Bible

The 'Python and SQL Bible' is a comprehensive guide to mastering both Python programming and SQL querying. Starting from the very basics, the book takes readers through advanced techniques, including data manipulation, database management, and integration of Python with SQL, all while offering hands-on examples and real-world exercises.

What this Book will help me do

Gain a strong foundation in Python programming, including control flow, functions, and object-oriented programming.
Learn how to write advanced SQL queries for data extraction, manipulation, and reporting.
Understand how to integrate Python with SQL to form a seamless data manipulation workflow.
Develop data analysis skills using Python and tools such as SQLAlchemy for advanced insights.
Master database administration techniques to efficiently manage and query datasets.

Author(s)

Cuantum Technologies LLC is a renowned tech education provider with a focus on equipping learners with in-demand programming and data management skills. Their training methods blend theory with practice, ensuring students gain hands-on experience applicable in professional environments. Their team of experts crafts content to cater to both beginners and professionals seeking to advance their skill set.

Who is it for?

This book is ideal for beginners who are new to programming and experienced professionals who wish to master Python and SQL for data manipulation and analysis. It is perfect for aspiring data scientists, software developers, and IT professionals looking to unlock new career opportunities. By detailing concepts and providing practical exercises, it accommodates various skill levels and prepares readers for industry demands.
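To make the Python-with-SQL workflow concrete, here is a minimal sketch in the spirit of the book's SQLAlchemy material; it is not taken from the book, and the sales table and column names are illustrative only.

```python
# Minimal Python + SQL integration sketch using SQLAlchemy Core with an
# in-memory SQLite database. Table and columns are illustrative.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:  # begin() commits on success, rolls back on error
    conn.execute(text("CREATE TABLE sales (region TEXT, amount REAL)"))
    conn.execute(
        text("INSERT INTO sales VALUES (:region, :amount)"),
        [{"region": "east", "amount": 120.0}, {"region": "west", "amount": 95.5}],
    )
    rows = conn.execute(
        text("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
    ).fetchall()

for region, total in rows:
    print(region, total)
```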

Data Engineering with Databricks Cookbook

In "Data Engineering with Databricks Cookbook," you'll learn how to efficiently build and manage data pipelines using Apache Spark, Delta Lake, and Databricks. This recipe-based guide offers techniques to transform, optimize, and orchestrate your data workflows. What this Book will help me do Master Apache Spark for data ingestion, transformation, and analysis. Learn to optimize data processing and improve query performance with Delta Lake. Manage streaming data processing with Spark Structured Streaming capabilities. Implement DataOps and DevOps workflows tailored for Databricks. Enforce data governance policies using Unity Catalog for scalable solutions. Author(s) Pulkit Chadha, the author of this book, is a Senior Solutions Architect at Databricks. With extensive experience in data engineering and big data applications, he brings practical insights into implementing modern data solutions. His educational writings focus on empowering data professionals with actionable knowledge. Who is it for? This book is ideal for data engineers, data scientists, and analysts who want to deepen their knowledge in managing and transforming large datasets. Readers should have an intermediate understanding of SQL, Python programming, and basic data architecture concepts. It is especially well-suited for professionals working with Databricks or similar cloud-based data platforms.

The Ultimate Guide to Snowpark

The Ultimate Guide to Snowpark serves as a comprehensive resource to help you master the Snowflake Snowpark framework using Python. You'll learn how to manage data engineering, data science, and data applications in Snowpark, coupled with practical implementations and examples. By following this guide, you'll gain the skills needed to efficiently process and analyze data in the Snowflake Data Cloud.

What this Book will help me do

Master Snowpark with Python for data engineering, data science, and data application workloads.
Develop and deploy robust data pipelines using Snowpark in Python.
Design, implement, and produce machine learning models using Snowpark.
Learn to monetize and operationalize Snowflake-native applications.
Effectively adopt Snowpark in production for scalable, efficient data solutions.

Author(s)

Shankar Narayanan SGS and Vivekanandan SS are experienced professionals in data engineering and Snowflake technologies. Shankar has extensive experience in utilizing Snowflake Snowpark to manage and enhance data solutions. Vivekanandan brings expertise in the intersection of Python programming and cloud-based data processing. Together, their combined knowledge and approachable writing style make this book an invaluable resource to readers.

Who is it for?

This book is designed for data engineers, data scientists, developers, and seasoned data practitioners. Ideal candidates are those looking to expand their skills in implementing Snowpark solutions using Python. A prior understanding of SQL, Python programming, and familiarity with Snowflake is beneficial for readers to fully leverage the techniques presented.
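For orientation, here is a hedged sketch of what a Snowpark session and a pushed-down query look like in Python; the connection parameters and the ORDERS table are placeholders, not examples from the book.

```python
# Minimal Snowpark sketch: open a session and run a filter/aggregate
# that executes inside Snowflake rather than pulling rows locally.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": "<account>",      # placeholder credentials; supply your own
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

orders = session.table("ORDERS")  # hypothetical table
top_regions = orders.filter(col("AMOUNT") > 100).group_by("REGION").count()
top_regions.show()

session.close()
```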

Summary

Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments. In this episode he shares some of the valuable lessons that he learned about how to make those projects successful.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. It is trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. This episode is supported by Code Comments, an original podcast from Red Hat. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight. In Code Comments, host Jamie Parker, Red Hatter and experienced engineer, shares the journey of technologists from across the industry and their hard-won lessons in implementing new technologies. I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics. There are 3 seasons of great episodes and new ones landing everywhere you listen to podcasts. Search for "Code Comments" in your podcast player or go to dataengineeringpodcast.com/codecomments today to subscribe. My thanks to the team at Code Comments for their support. Your host is Tobias Macey and today I'm interviewing Sriram Panyam about his experiences conducting large scale data migrations and the useful strategies that he learned in the process.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by sharing some of your experiences with data migration projects?

As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?

How would you categorize the different types and motivations of migrations?

How does the motivation for a migration influence the ways that you plan for and execute that work?

Can you talk us through one or two specific projects that you have taken part in?

Part 1: The Triggers

Section 1: Technical Limitations triggering Data Migration

Scaling bottlenecks: Performance issues with databases, storage, or network infrastructure
Legacy compatibility: Difficulties integrating with modern tools and cloud platforms
System upgrades: The need to migrate data during major software changes (e.g., SQL Server version upgrade)

Section 2: Types of Migrations for Infrastructure Focus

Storage migration: Moving data between systems (HDD to SSD, SAN to NAS, etc.)
Data center migration: Physical relocation or consolidation of data centers
Virtualization migration: Moving from physical servers to virtual machines (or vice versa)

Section 3: Technical Decisions Driving Data Migrations

End-of-life support: Forced migration when older software or hardware is sunsetted
Security and compliance: Adopting new platforms with better security postures
Cost optimization: Potential savings of cloud vs. on-premise data centers

Part 2: Challenges (and Anxieties)

Section 1: Technical Challenges

Data transformation challenges: Schema changes, complex data mappings (one common tactic is sketched below)
Network bandwidth and latency: Transferring large datasets efficiently
Performance testing
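Not from the episode itself, but as one concrete illustration of the batching and schema-mapping tactics these challenges call for, here is a minimal Python sketch; the tables, columns, and the full_name split are hypothetical, and both databases are assumed to already exist.

```python
# Copy rows in keyed batches with a per-row schema mapping, so a large
# table can be migrated without one giant transfer or transaction.
import sqlite3

src = sqlite3.connect("legacy.db")  # placeholder source database
dst = sqlite3.connect("new.db")     # placeholder target database

def map_row(row):
    # Example schema change: split a legacy full_name into two columns.
    rowid, full_name, amount = row
    first, _, last = full_name.partition(" ")
    return (rowid, first, last, amount)

last_id, batch_size = 0, 1000
while True:
    rows = src.execute(
        "SELECT id, full_name, amount FROM customers "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not rows:
        break
    dst.executemany(
        "INSERT INTO customers (id, first_name, last_name, amount) "
        "VALUES (?, ?, ?, ?)",
        [map_row(r) for r in rows],
    )
    dst.commit()           # commit per batch, keeping transactions small
    last_id = rows[-1][0]  # resume point survives a restart
```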

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.

Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode:
Slack's Data Practices: Discussing Slack's use of customer data to build models, the risks of global data leakage, and the impact of GDPR and AI regulations.
ChatGPT's Data Analysis Improvements: Discussing new features in ChatGPT that let you interrogate your data like a pro.
The Loneliness of Data Scientists: Why being a lone data wolf is tough, and how collaboration is the key to success.
Rustworkx for Graph Computation: Evaluating Rustworkx as a robust tool for graphs compared to Networkx.
Dolt - Git for Data: Comparing Dolt and DVC as tools for data version control. Check it out.
Veo by Google DeepMind: An overview of Google's Veo technology and its potential applications.
Ilya Sutskever's Departure from OpenAI: What does Ilya Sutskever's exit mean for OpenAI with Jakub Pachocki stepping in?
Hot Takes - No Data Engineering Roadmap? Debating the necessity of a data engineering roadmap and the prominence of SQL skills.

Concepts of Database Management System by Pearson

Concepts of Database Management System is designed to meet the syllabi requirements of undergraduate students of computer applications and computer science. It describes the concepts in an easy-to-understand language with a sufficient number of examples. The overview of emerging trends in databases is thoroughly explained. A brief introduction to PL/SQL, MS-Access, and Oracle is included to help students get a flavor of different types of database management systems.

Database Management Systems by Pearson

Express Learning is a series of books designed as quick reference guides to important undergraduate computer courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Features –

• Designed as a student-friendly self-learning guide. The book is written in a clear, concise, and lucid manner.
• Easy-to-understand question-and-answer format.
• Includes previously asked as well as new questions organized in chapters.
• All types of questions, including MCQs, short and long questions, are covered.
• Solutions to numerical questions asked at examinations are provided.
• All ideas and concepts are presented with clear examples.
• Text is well structured and well supported with suitable diagrams.
• Inter-chapter dependencies are kept to a minimum.

Book Contents –

1: Database System
2: Conceptual Modelling
3: Relational Model
4: Relational Algebra and Calculus
5: Structured Query Language
6: Relational Database Design
7: Data Storage and Indexing
8: Query Processing and Optimization
9: Introduction to Transaction Processing
10: Concurrency Control Techniques
11: Database Recovery System
12: Database Security
13: Database System Architecture
14: Data Warehousing, OLAP, and Data Mining
15: Information Retrieval
16: Miscellaneous Questions

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we're joined by special guest Maryam Ilyas as we delve into a variety of topics that shape our digital world:
Women's Healthcare Insights: Exploring the Oura ring's commitment during Women's Health Awareness Month and its role in addressing the underrepresentation of female health conditions in research.
A Deep Dive into the EU AI Act: Examining the AI Act's implications, including its classification of AI systems (prohibited, high-risk, limited-risk, and minimal-risk), ethical concerns, regulatory challenges, and the act's impact on AI usage, particularly regarding mass surveillance at the Paris Olympics.
The Evolution of Music and AI: Reviewing the AI-generated music video for "The Hardest Part" by Washed Out, directed by Paul Trillo, showcasing AI's growing role in the arts.
Hot Takes on Data Tools: Is combining SQL, PySpark (and Python) in Databricks the most powerful tool in the data space? Let's dissect the possibilities and limitations.
Don't forget to check us out on YouTube too, where you can find a lot more content beyond the podcast!

Summary

Artificial intelligence has dominated the headlines for several months due to the successes of large language models. This has prompted numerous debates about the possibility of, and timeline for, artificial general intelligence (AGI). Peter Voss has dedicated decades of his life to the pursuit of truly intelligent software through the approach of cognitive AI. In this episode he explains his approach to building AI in a more human-like fashion and the emphasis on learning rather than statistical prediction.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Peter Voss about what is involved in making your AI applications more "human"

Interview

Introduction
How did you get involved in machine learning?
Can you start by unpacking the idea of "human-like" AI? How does that contrast with the conception of "AGI"?
The applications and limitations of GPT/LLM models have been dominating the popular conversation around AI. How do you see that impacting the overall ecosystem of ML/AI applications and investment?
The fundamental/foundational challenge of every AI use case is sourcing appropriate data. What are the strategies that you have found useful to acquire, evaluate, and prepare data at an appropriate scale to build high quality models?
What are the opportunities and limitations of causal modeling techniques for generalized AI models?
As AI systems gain more sophistication there is a challenge with establishing and maintaining trust. What are the risks involved in deploying more human-level AI systems and monitoring their reliability?
What are the practical/architectural methods necessary to build more cognitive AI systems?
How would you characterize the ecosystem of tools/frameworks available for creating, evolving, and maintaining these applications?
What are the most interesting, innovative, or unexpected ways that you have seen cognitive AI applied?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on designing/developing cognitive AI systems?
When is cognitive AI the wrong choice?
What do you have planned for the future of cognitive AI applications at Aigo?

Contact Info

LinkedIn
Website

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Aigo.ai
Artificial General Intelligence
Cognitive AI
Knowledge Graph
Causal Modeling
Bayesian Statistics
Thinking Fast & Slow by Daniel Kahneman (affiliate link)
Agent-Based Modeling
Reinforcement Learning
DARPA 3 Waves of AI presentation
Why Don't We Have AGI Yet? whitepaper
Concepts Is All You Need whitepaper
Helen Keller
Stephen Hawking

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

In this episode, Avery conducts mock data analyst interview sessions with two participants, Richard and Joey, employing a newly developed tool called Interview Simulator.

The interview scenarios are designed to replicate real-life interviews. They aim to prepare aspiring data professionals for upcoming job interviews by showcasing examples of good practices and areas for improvement.

🧙‍♂️ Ace the Interview with Confidence

📩 Get my weekly email with helpful data career tips

📊 Come to my next free "How to Land Your First Data Job" training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(01:40) - Tell Me About Yourself
(05:31) - Explain SQL Window Function (see the sketch below)
(09:55) - How Many Meeting Rooms
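For the window-function question in particular, here is a short runnable illustration (not from the episode) of the kind of answer interviewers look for, using Python's bundled SQLite (version 3.25+ is required for window functions); the salaries table is invented.

```python
# A window function ranks rows within a partition without collapsing
# them the way GROUP BY would: every row keeps its own output row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salaries (dept TEXT, name TEXT, salary INTEGER);
    INSERT INTO salaries VALUES
        ('eng', 'Ana', 120), ('eng', 'Bo', 110), ('ops', 'Cy', 90);
""")

query = """
    SELECT dept, name, salary,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
    FROM salaries
"""
for row in conn.execute(query):
    print(row)  # e.g. ('eng', 'Ana', 120, 1)
```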

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we're running a special End-of-Year Sale, where you'll get:
✅ A discount on your enrollment
🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa

SQL All-in-One For Dummies, 4th Edition

The most thorough SQL reference, now updated for SQL:2023

SQL All-in-One For Dummies has everything you need to get started with the SQL programming language, and then to level up your skills with advanced applications. This relational database coding language is one of the most used languages in professional software development. And, as it becomes ever more important to take control of data, there's no end in sight to the need for SQL know-how. You can take your career to the next level with this guide to creating databases, accessing and editing data, protecting data from corruption, and integrating SQL with other languages in a programming environment. Become a SQL guru and turn the page on the next chapter of your coding career.

Get 7 mini-books in one, covering basic SQL, database development, and advanced SQL concepts
Read clear explanations of SQL code and learn to write complex queries
Discover how to apply SQL in real-world situations to gain control over large datasets
Enjoy a thorough reference to common tasks and issues in SQL development

This Dummies All-in-One guide is for all SQL users, from beginners to more experienced programmers. Find the info and the examples you need to reach the next stage in your SQL journey.
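As one example of the "protecting data from corruption" theme, here is a small sketch (not from the book) using Python's sqlite3 module, where a CHECK constraint plus a transaction keeps a failed transfer from half-applying:

```python
# Wrap related statements in one transaction so a failure rolls both back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  id INTEGER PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    with conn:  # sqlite3 commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE id = 2")
        conn.execute("UPDATE accounts SET balance = balance + 80 WHERE id = 1")
except sqlite3.IntegrityError:
    print("transfer rejected; neither row changed")

# Balances are untouched: [(1, 100), (2, 50)]
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```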

Summary

Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools that developers already use. In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Tsavo Knott about Pieces, a personal AI toolkit to improve the efficiency of developers

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Pieces is and the story behind it?
The past few months have seen an endless series of personalized AI tools launched. What are the features and focus of Pieces that might encourage someone to use it over the alternatives?
Model selections
Architecture of the Pieces application
Local vs. hybrid vs. online models
Model update/delivery process
Data preparation/serving for models in the context of the Pieces app
Application of AI to developer workflows
Types of workflows that people are building with Pieces
What are the most interesting, innovative, or unexpected ways that you have seen Pieces used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pieces?
When is Pieces the wrong choice?
What do you have planned for the future of Pieces?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Pieces
NPU == Neural Processing Unit
Tensor Chip
LoRA == Low Rank Adaptation
Generative Adversarial Networks
Mistral
Emacs
Vim
NeoVim
Dart
Flutter

Learn SQL using MySQL in One Day and Learn It Well

"Learn SQL using MySQL in One Day and Learn It Well" is your hands-on guide to mastering SQL efficiently using MySQL. This book takes you from understanding basic database concepts to executing advanced queries and implementing essential features like triggers and routines. With a project-based approach, you will confidently manage databases and unlock the potential of data. What this Book will help me do Understand database concepts and relational data architecture. Design and define tables to organize and store data effectively. Perform advanced SQL queries to manipulate and analyze data efficiently. Implement database triggers, views, and routines for advanced management. Apply practical skills in SQL through a comprehensive hands-on project. Author(s) Jamie Chan is a professional instructor and technical writer with extensive experience in database management and software development. Known for a clear and engaging teaching style, Jamie has authored numerous books focusing on hands-on learning. Jamie approaches pedagogy with the goal of making technical subjects accessible and practical for all learners. Who is it for? This book is designed for beginners eager to learn SQL and MySQL from scratch. It is perfect for professionals or students who want relevant and actionable skills in database management. Whether you're looking to enhance career prospects or leverage database tools for personal projects, this book is your practical starting point. Basic computer literacy is all that's needed.

Databases are ubiquitous, and you don't need to be a data practitioner to know that all data everywhere is stored in a database... or is it? While the majority of data around the world lives in a database, the data that helps run the heart of our operating systems (the core functions of our computers) is not stored in the same place as everything else. This is because database storage sits 'above' the operating system, requiring the OS to run before the databases can be used. But what if the OS was built 'on top' of a database? What difference could this fundamental change make to how we use computers?

Mike Stonebraker is a distinguished computer scientist known for his foundational work in database systems; he is also currently CTO and co-founder at DBOS. His extensive career includes significant contributions through academic prototypes and commercial startups, leading to the creation of several pivotal relational database companies such as Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica, and VoltDB. Stonebraker's role as chief technical officer at Informix and his influential research earned him the prestigious 2014 Turing Award. Stonebraker's professional journey spans two major phases: initially at the University of California, Berkeley, focusing on relational database management systems like Ingres and Postgres, and later, from 2001, at the Massachusetts Institute of Technology (MIT), where he pioneered advanced data management techniques including C-Store, H-Store, SciDB, and DBOS. He remains a professor emeritus at UC Berkeley and continues to influence as an adjunct professor at MIT's Computer Science and Artificial Intelligence Laboratory. Stonebraker is also recognized for his editorial work on the book "Readings in Database Systems."

In the episode, Richie and Mike explore the success of PostgreSQL, the evolution of SQL databases, the shift towards cloud computing and what that means in practice when migrating to the cloud, the impact of disaggregated storage, software and serverless trends, the role of databases in facilitating new data and AI trends, DBOS and its advantages for security, and much more.

Links Mentioned in the Show:
DBOS
Paper: What Goes Around Comes Around
[Course] Understanding Cloud Computing
Related Episode: Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx
Rewatch sessions from RADAR: The Analytics Edition

New to DataCamp?
Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Over the past 199 episodes of DataFramed, we've heard from people at the forefront of data and AI, and over the past year we've constantly looked ahead to the future AI might bring. But all of the technologies and ways of working we've witnessed have been built on foundations that were laid decades ago. For our 200th episode, we're bringing you a special guest and taking a walk down memory lane, back to the creation and development of one of the most popular programming languages in the world.

Don Chamberlin is renowned as the co-inventor of SQL (Structured Query Language), the predominant database language globally, which he developed with Raymond Boyce in the mid-1970s. Chamberlin's professional career began at IBM Research in Yorktown Heights, New York, following a summer internship there during his academic years. His work on IBM's System R project led to the first SQL implementation and significantly advanced IBM's relational database technology. His contributions were recognized when he was made an IBM Fellow in 2003 and later a Fellow of the Computer History Museum in 2009 for his pioneering work on SQL and database architectures. Chamberlin also contributed to the development of XQuery, an XML query language, as part of the W3C, which became a W3C Recommendation in January 2007. Additionally, he holds fellowships with ACM and IEEE and is a member of the National Academy of Engineering.

In the episode, Richie and Don explore his early career at IBM and the development of his interest in databases alongside Ray Boyce, the database task group (DBTG), the transition to relational databases and the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++, and much more.

Links Mentioned in the Show:
The first-ever journal paper on SQL: SEQUEL: A Structured English Query Language
Don's Book: SQL++ for SQL Users: A Tutorial
System R: Relational approach to database management
SQL Courses
SQL Articles, Tutorials and Code-Alongs
Related Episode: Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx
Rewatch sessions from RADAR: The Analytics Edition

New to DataCamp?
Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Summary

Generative AI has rapidly transformed everything in the technology sector. When Andrew Lee started work on Shortwave he was focused on making email more productive. When AI started gaining adoption he realized that he had even more potential for a transformative experience. In this episode he shares the technical challenges that he and his team have overcome in integrating AI into their product, as well as the benefits and features that it provides to their customers.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Andrew Lee about his work on Shortwave, an AI-powered email client.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Shortwave is and the story behind it?

What is the core problem that you are addressing with Shortwave?

Email has been a central part of communication and business productivity for decades now. What are the overall themes that continue to be problematic? What are the strengths that email maintains as a protocol and ecosystem? From a product perspective, what are the data challenges that are posed by email? Can you describe how you have architected the Shortwave platform?

How have the design and goals of the product changed since you started it? What are the ways that the advent and evolution of language models have influenced your product roadmap?

How do you manage the personalization of the AI functionality in your system for each user/team? For users and teams who are using Shortwave, how does it change their workflow and communication patterns? Can you describe how I would use Shortwave for managing the workflow of evaluating, planning, and promoting my podcast episodes? What are the most interesting, innovative, or unexpected ways that you have seen Shortwave used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Shortwave? When is Shortwave the wrong choice? What do you have planned for the future of Shortwave?

Contact Info

LinkedIn
Blog

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.

Summary

Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Leverage Datafold's fast cross-database data diffing and Monitoring to test your replication pipelines automatically and continuously. Validate consistency between source and target at any scale, and receive alerts about any discrepancies. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Oren Eini about the work of designing and building a NoSQL database engine.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what constitutes a NoSQL database?

How have the requirements and applications of NoSQL engines changed since they first became popular ~15 years ago?

What are the factors that convince teams to use a NoSQL vs. SQL database?

NoSQL is a generalized term that encompasses a number of different data models. How does the underlying representation (e.g. document, K/V, graph) change that calculus?

How has the evolution of data formats (e.g. N-dimensional vectors, point clouds, etc.) changed the landscape for NoSQL engines?

When designing and building a database, what is the initial set of questions that need to be answered?

How many "core capabilities" can you reasonably design around before they conflict with each other?

How have you approached the evolution of RavenDB as you add new capabilities and mature the project?

What are some of the early decisions that had to be unwound to enable new capabilities?

If you were to start from scratch today, what database would you build? What are the most interesting, innovative, or unexpected ways that you have seen RavenDB/NoSQL databases used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on RavenDB?

Learn about real-time AI-powered insights with BigQuery continuous queries, and how this new feature is poised to revolutionize data engineering by empowering event-driven and AI-driven data pipelines with Vertex AI, Pub/Sub, and Bigtable – all through the familiar language of SQL. Learn about how UPS was able to use big data on millions of shipped packages to reduce package theft, their work on more efficient claims processing, and why they are looking to BigQuery to accelerate time to insights and smarter business outcomes.
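For readers who want to try the "familiar language of SQL" angle from Python, here is a hedged sketch using the official BigQuery client (pip install google-cloud-bigquery); the project, dataset, and table are placeholders, and the continuous-query and Pub/Sub wiring described in the session requires additional configuration not shown here.

```python
# Run a plain SQL query against BigQuery from Python and print the rows.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

sql = """
    SELECT status, COUNT(*) AS n
    FROM `my_project.logistics.packages`   -- placeholder table
    GROUP BY status
    ORDER BY n DESC
"""
for row in client.query(sql).result():
    print(row["status"], row["n"])
```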


Join us to learn how to activate the full potential of your data with AI in BigQuery. Take an in-depth look at how BigQuery's core integration with generative AI models like Gemini, coupled with its petabyte-scale analytics capabilities, enables new possibilities for gaining insights from your data. Learn how to derive insights from your untapped and unstructured data such as images, documents, and audio files, and explore BigQuery vector search and multi-modal embeddings, all powered by Google's industry-leading AI capabilities in BigQuery using simple SQL queries. You will also learn how Unilever is creating a data strategy that allows data teams to scale efficiently and rapidly experiment with AI models and gen AI use cases.
