SQL

Extreme Self-Service: Turning Data Consumers into Data Constructors | Whatnot

2023-05-11 · Data Council 2023 Watch

video

by Alice Leach (Whatnot)

AI/ML Analytics Data Engineering Modern Data Stack

ABOUT THE TALK: Small data teams face supply and demand problems. Triaging and prioritizing data work can be overwhelming. But what if data consumers could create their own products with minimal training?

Learn how to empower data consumers without disrupting others. Discover lessons from an 'extreme' self-service analytics approach: best practices, fostering a data community, promoting SQL literacy, and establishing solid guard rails.

ABOUT THE SPEAKER: Alice Leach is a Data Engineer at Whatnot Inc., a live stream platform and marketplace that enables collectors and enthusiasts to connect, buy, and sell verified products. She transitioned from academia to data in 2021, working first as a data scientist then data engineer. Her current work at Whatnot focuses on designing and building robust, self-service data workflows using a modern data stack.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil

Resilient Oracle PL/SQL

2023-05-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Stephen B. Morris

Cloud Computing Oracle data data-engineering pl-sql pl/sql

As legacy and other critical systems continue to migrate online, the need for continuous operation is imperative. Code has to handle data issues as well as hard external problems today, including outages of networks, storage systems, power, and ancillary systems. This practical guide provides system administrators, DevSecOps engineers, and cloud architects with a concise yet comprehensive overview of how to use PL/SQL to develop resilient database solutions. Integration specialist Stephen B. Morris helps you understand the language, build a PL/SQL toolkit, and collect a suite of reusable components and patterns. You'll dive into the benefits of synthesizing the toolkit with a requirements-driven, feature-oriented approach and learn how to produce resilient solutions by synthesizing the PL/SQL toolkit in conjunction with a scale of resilience. Build solid PL/SQL solutions while avoiding common PL/SQL antipatterns Learn why embedding complex business logic in SQL is often a brittle proposition Learn how to recognize and improve weak PL/SQL code Verify PL/SQL code by running data-driven, in-database tests Understand the safe operation, maintenance, and modification of complex PL/SQL systems Learn the benefits of thinking about features rather than just use cases Define good requirements for PL/SQL and hybrid solutions involving PL/SQL and high level languages

Job Ready SQL

2023-05-09 · O'Reilly SQL Books O'Reilly Amazon

book

by Haythem Balti , Kimberly A. Weiss

AI/ML BI Cloud Computing Data Engineering Data Science

Learn the most important SQL skills and apply them in your job—quickly and efficiently! SQL (Structured Query Language) is the modern language that almost every relational database system supports for adding data, retrieving data, and modifying data in a database. Although basic visual tools are available to help end-users input common commands, data scientists, business intelligence analysts, Cloud engineers, Machine Learning programmers, and other professionals routinely need to query a database using SQL. Job Ready SQL provides you with the foundational skills necessary to work with data of any kind. Offering a straightforward ‘learn-by-doing’ approach, this concise and highly practical guide teaches you all the basics of SQL so you can apply your knowledge in real-world environments immediately. Throughout the book, each lesson includes clear explanations of key concepts and hands-on exercises that mirror real-world SQL tasks. Teaches the basics of SQL database creation and management using easy-to-understand language Helps readers develop an understanding of fundamental concepts and more advanced applications such as data engineering and data science Discusses the key types of SQL commands, including Data Definition Language (DDL) commands and Data Manipulation Language (DML) commands Includes useful reference information on querying SQL-based databases Job Ready SQL is a must-have resource for students and working professionals looking to quickly get up to speed with SQL and take their relational database skills to the next level.
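As a rough illustration of the DDL/DML distinction mentioned in the blurb (this is not an excerpt from the book), the sketch below uses Python's built-in `sqlite3` module. The `employees` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3


def ddl_dml_demo():
    """Run one DDL statement and a few DML statements on an in-memory database."""
    conn = sqlite3.connect(":memory:")

    # DDL: defines structure. The table and columns are hypothetical.
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)"
    )

    # DML: adds, modifies, and retrieves rows.
    conn.executemany(
        "INSERT INTO employees (name, salary) VALUES (?, ?)",
        [("Ada", 100), ("Grace", 120)],
    )
    conn.execute("UPDATE employees SET salary = salary + 10 WHERE name = ?", ("Ada",))
    rows = conn.execute("SELECT name, salary FROM employees ORDER BY id").fetchall()

    conn.close()
    return rows
```

Calling `ddl_dml_demo()` returns the two rows after the update, which makes it easy to see that the `CREATE TABLE` statement shaped the table while the remaining statements only touched its data.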

SQL Server 2022 Administration Inside Out

2023-05-08 · O'Reilly SQL Books O'Reilly Amazon

book

by William Assaf , Elizabeth Noble , Deepthi Goguri , Meagan Longoria , Melody Zacharias , Randolph West , Joseph D'Antoni , Louis Davidson

Azure Cloud Computing Kubernetes Linux Microsoft PowerShell SQL Server microsoft sql server

Conquer SQL Server 2022 and Azure SQL administration from the inside out! Dive into SQL Server 2022 administration and grow your Microsoft SQL Server data platform skillset. This well-organized reference packs in timesaving solutions, tips, and workarounds, all you need to plan, implement, deploy, provision, manage, and secure SQL Server 2022 in any environment: on-premises, cloud, or hybrid, including detailed, dedicated chapters on Azure SQL Database and Azure SQL Managed Instance. Nine experts thoroughly tour DBA capabilities available in the SQL Server 2022 Database Engine, SQL Server Data Tools, SQL Server Management Studio, PowerShell, and much more. You'll find extensive new coverage of Azure SQL Database and Azure SQL Managed Instance, both as a cloud platform of SQL Server and in their new integrations with SQL Server 2022, information available in no other book. Discover how experts tackle today's essential tasks and challenge yourself to new levels of mastery. Identify low-hanging fruit and practical, easy wins for improving SQL Server administration Get started with modern SQL Server tools, including SQL Server Management Studio, and Azure Data Studio Upgrade your SQL Server administration skillset to new features of SQL Server 2022, Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Linux Design and implement modern on-premises database infrastructure, including Kubernetes Leverage data virtualization of third-party or non-relational data sources Monitor SQL instances for corruption, index activity, fragmentation, and extended events Automate maintenance plans, database mail, jobs, alerts, proxies, and event forwarding Protect data through encryption, privacy, and auditing Provision, manage, scale and secure, and bidirectionally synchronize Microsoft's powerful Azure SQL Managed Instance Understand and enable new Intelligent Query Processing features to increase query concurrency Prepare a best-practice runbook for disaster recovery Use SQL Server 2022 features to span infrastructure across hybrid environments ...

Transitioning to Microsoft Power Platform: An Excel User Guide to Building Integrated Cloud Applications in Power BI, Power Apps, and Power Automate

2023-05-02 · O'Reilly Data Science Books O'Reilly Amazon

book

by David Ding

Analytics BI Cloud Computing CRM Dashboard DataViz ETL/ELT Microsoft Power BI business-intelligence data data-science +2 more

Welcome to this step-by-step guide for Excel users, data analysts, and finance specialists. It is designed to take you through practical report and development scenarios, including both the approach and the technical challenges. This book will equip you with an understanding of the overall Power Platform use case for addressing common business challenges. While Power BI continues to be an excellent tool of choice in the BI space, Power Platform is the real game changer. Using an integrated architecture, a small team of citizen developers can build solutions for all kinds of business problems. For small businesses, Power Platform can be used to build bespoke CRM, Finance, and Warehouse management tools. For large businesses, it can be used to build an integration point for existing systems to simplify reporting, operation, and approval processes. The author has drawn on his 15 years of hands-on analytics experience to help you pivot from the traditional Excel-based reporting environment. By using different business scenarios, this book provides you with clear reasons why a skill is important before you start to dive into the scenarios. You will use a fast prototyping approach to continue to build exciting reporting, automation, and application solutions and improve them while you acquire new skill sets. The book helps you get started quickly with Power BI. It covers data visualization, collaboration, and governance practices. You will learn about the most practical SQL challenges. And you will learn how to build applications in PowerApps and Power Automate. The book ends with an integrated solution framework that can be adapted to solve a wide range of complex business problems.
What You Will Learn Develop reporting solutions and business applications Understand the Power Platform licensing and development environment Apply Data ETL and modeling in Power BI Use Data Storytelling and dashboard design to better visualize data Carry out data operations with SQL and SharePoint lists Develop useful applications using Power Apps Develop automated workflows using Power Automate Integrate solutions with Power BI, Power Apps, and Power Automate to build enterprise solutions Who This Book Is For Next-generation data specialists, including Excel-based users who want to learn Power BI and build internal apps; finance specialists who want to take a different approach to traditional accounting reports; and anyone who wants to enhance their skill set for the future job market.

T-SQL Fundamentals, 4th Edition

2023-04-25 · O'Reilly SQL Books O'Reilly Amazon

book

by Itzik Ben-Gan

The book starts with the background to T-SQL querying and programming, including logical query processing, then covers querying constructs (single-table queries, joins, subqueries, table expressions, set operators, data analysis), data modifications, temporal tables, transactions and concurrency, SQL Graph (completely new to this edition), and programmatic T-SQL constructs. The book includes extensive exercises and solutions with explanations, allowing readers to practice what they've learned. It is widely considered the authoritative guide to T-SQL fundamentals. It focuses on understanding why things work the way they do, and not just how to make them work. When people understand the "why", the code they write tends to be more correct and more meaningful. This edition of the book includes coverage of the newest T-SQL additions up to and including SQL Server 2022. ...

The Spark of Big Data: An Introduction to Apache Spark

2023-04-19 · PyConDE & PyData Berlin 2023

talk

by Pasha Finkelshteyn (Bellsoft)

API Big Data PySpark Python Spark

Get ready to level up your big data processing skills! Join us for an introductory talk on Apache Spark, the distributed computing system used by tech giants like Netflix and Amazon. We'll cover PySpark DataFrames and how to use them. Whether you're a Python developer new to big data or looking to explore new technologies, this talk is for you. You'll gain foundational knowledge about Apache Spark and its capabilities, and learn how to leverage DataFrames and SQL APIs to efficiently process large amounts of data. Don't miss out on this opportunity to up your big data game!

Use Spark from anywhere: A Spark client in Python powered by Spark Connect

2023-04-18 · PyConDE & PyData Berlin 2023

talk

by Martin Grund (Databricks)

API Python Spark

Over the past decade, developers, researchers, and the community have successfully built tens of thousands of data applications using Spark. Since then, use cases and requirements of data applications have evolved: today, every application, from web services that run in application servers, to interactive environments such as notebooks and IDEs, to phones and edge devices such as smart home devices, wants to leverage the power of data.

However, Spark's driver architecture is monolithic, running client applications on top of a scheduler, optimizer and analyzer. This architecture makes it hard to address these new requirements: there is no built-in capability to remotely connect to a Spark cluster from languages other than SQL.

Spark Connect introduces a decoupled client-server architecture for Apache Spark that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere. It can be embedded in modern data applications, in IDEs, Notebooks and programming languages.
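To make the client-server split concrete, here is a minimal sketch of a thin Python client, assuming PySpark 3.4 or later and a Spark Connect server that is already running. The helper names `spark_connect_url` and `run_remote_query` are invented for this example, and the port 15002 is assumed to be the server's listening port.

```python
def spark_connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect connection string of the form sc://host:port."""
    return f"sc://{host}:{port}"


def run_remote_query(url: str):
    """Connect to a Spark Connect server and run a small DataFrame query."""
    # Imported lazily so the URL helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    # remote() creates a client-side session; no local driver JVM is started.
    spark = SparkSession.builder.remote(url).getOrCreate()

    # The client only describes the query; the server analyzes and runs it
    # when collect() asks for results.
    df = spark.range(10).filter("id % 2 = 0")
    return [row.id for row in df.collect()]
```

With a server listening locally, `run_remote_query(spark_connect_url("localhost"))` would send the plan to that server and bring back only the result rows.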

This talk highlights how simple it is to connect to Spark using Spark Connect from any data applications or IDEs. We will do a deep dive into the architecture of Spark Connect and give an outlook of how the community can participate in the extension of Spark Connect for new programming languages and frameworks - to bring the power of Spark everywhere.

Julia, Pedram Navid + Taylor Murphy Recap Data Council

2023-04-07 · The Analytics Engineering Podcast Listen

podcast_episode

by Pedram Navid (West Marin Data) , Julia Schottenstein (dbt labs) , Taylor Murphy (Meltano)

AI/ML Modern Data Stack Meltano Data Streaming

Julia just got back from Data Council in Austin, a conference organized by Pete Soderling, where lots of startups share what they're building, data practitioners go to learn in hands-on workshops, and of course investors go to spot the next big trend. In this episode, Taylor Murphy (Head of Product & Data at Meltano) + Pedram Navid (Founder, West Marin Data) join Julia to recap the conference and have a bit of fun. They talked streaming, how the MDS is growing up, new SQL variants, and, of course, AI. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com.

Modern analytics on the lakehouse using Databricks SQL [Sponsored by Databricks]

2023-04-05 · Modern Data Stack Conference 2023

talk

by Roberto Salcido (Databricks) , Ameek Singh

Analytics Data Lakehouse Databricks

Practical Business Analytics Using R and Python: Solve Business Problems Using a Data-driven Approach

2023-04-03 · O'Reilly Data Science Books O'Reilly Amazon

book

by Umesh R. Hodeghatta , Umesha Nayak

AI/ML Analytics Big Data Data Analytics NLP NumPy Pandas Python data data-science data-science-tools r

This book illustrates how data can be useful in solving business problems. It explores various analytics techniques for using data to discover hidden patterns and relationships, predict future outcomes, optimize efficiency and improve the performance of organizations. You’ll learn how to analyze data by applying concepts of statistics, probability theory, and linear algebra. In this new edition, both R and Python are used to demonstrate these analyses. Practical Business Analytics Using R and Python also features new chapters covering databases, SQL, neural networks, text analytics, and natural language processing. Part one begins with an introduction to analytics and the foundations required to perform data analytics, and explains different analytics terms and concepts such as databases and SQL, basic statistics, probability theory, and data exploration. Part two introduces predictive models using statistical machine learning and discusses concepts like regression, classification, and neural networks. Part three covers two of the most popular unsupervised learning techniques, clustering and association mining, as well as text mining and natural language processing (NLP). The book concludes with an overview of big data analytics, R and Python essentials for analytics including libraries such as pandas and NumPy. Upon completing this book, you will understand how to improve business outcomes by leveraging R and Python for data analytics.
What You Will Learn Master the mathematical foundations required for business analytics Understand various analytics models and data mining techniques such as regression, supervised machine learning algorithms for modeling, unsupervised modeling techniques, and how to choose the correct algorithm for analysis in any given task Use R and Python to develop descriptive models, predictive models, and optimize models Interpret and recommend actions based on analytical model outcomes Who This Book Is For Software professionals and developers, managers, and executives who want to understand and learn the fundamentals of analytics using R and Python.

SQL Query Design Patterns and Best Practices

2023-03-31 · O'Reilly SQL Books O'Reilly Amazon

book

by Ram Babu Singh , Dennis Neer , Chi Zhang , Steven Hughes , Leslie Andrews , Shabbir Mala

JSON

This book, "SQL Query Design Patterns and Best Practices," provides a thorough and practical exploration of SQL query techniques that focus on efficiency, clarity, and maintainability. By learning and applying the patterns and methodologies outlined in this guide, readers will enhance their ability to solve complex data problems, optimize query performance, and work effectively with modern data platforms. What this Book will help me do Learn how to write efficient SQL queries by optimizing result sets, enabling better handling of large datasets. Master advanced SQL techniques, including the use of common table expressions and window functions for solving real-world challenges. Gain insights into query optimization with tools like query execution plans and indexes to boost database performance. Understand the interaction of modern data types such as JSON and their integration in SQL workflows. Organize and document SQL queries effectively using tools such as Jupyter notebooks for collaborative and reproducible work. Author(s) The authors of this book are seasoned SQL professionals with rich experience in database management and optimization. Steven Hughes, Ram Babu Singh, Shabbir Mala, and others bring their deep knowledge in SQL query practices and real-world problem-solving to this guide. With a commitment to helping readers harness the power of SQL, the authors have crafted content that is both technically robust and approachable for learners. Who is it for? This book is designed for SQL practitioners ranging from beginners to intermediate levels, including SQL developers, data analysts, and database enthusiasts. If you aim to refine your SQL skills, optimize query designs, and tackle complex data challenges, this book will serve as your reliable companion. Novice readers will quickly build a strong foundation, while experienced professionals will find valuable techniques to enhance their workflow.
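For readers unfamiliar with the two techniques the blurb names, the following sketch combines a common table expression with a window function to pick the top sale per region. It is not taken from the book: it uses Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later), and the `sales` table and its rows are invented.

```python
import sqlite3


def top_sale_per_region():
    """Return the highest-amount sale in each region using a CTE and ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", "ana", 300), ("east", "bo", 500), ("west", "cy", 200), ("west", "di", 150)],
    )

    query = """
        WITH ranked AS (              -- CTE: a named, reusable subquery
            SELECT region, rep, amount,
                   ROW_NUMBER() OVER (  -- window function: rank rows within each region
                       PARTITION BY region ORDER BY amount DESC
                   ) AS rn
            FROM sales
        )
        SELECT region, rep, amount FROM ranked WHERE rn = 1 ORDER BY region
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

The CTE keeps the ranking logic separate from the final filter, which is usually easier to read and maintain than nesting the same subquery inline.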

Ami Gal, CEO & Co-founder at SQream. We dive deep into Big SQL analytics powered by GPUs, plus the future of compute.

2023-03-29 · Making Data Simple Listen

podcast_episode

by Ami Gal (SQream) , Al Martin (IBM)

Analytics Hadoop IBM

Ami Gal, CEO & Co-founder at SQream. We dive deep into Big SQL analytics powered by GPUs, plus the future of compute.

02:20 Meet Ami Gal
04:52 What's in a name? sqream.com
08:10 Problem being solved
13:53 The secret sauce: data flow
16:52 Software or HW for scale
20:47 Secret sauce take 2
25:02 Hadoop, future of
27:52 Hybrid cloud
31:31 Go-to-market
35:09 The next 5 years of compute
39:18 Ok, next 20 years
44:17 For fun

LinkedIn: linkedin.com/in/galami Website: sqream.com

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Azure SQL Hyperscale Revealed: High-performance Scalable Solutions for Critical Data Workloads

2023-03-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Daniel Scott-Raynsford , Zoran Barać

Azure Cloud Computing Microsoft azure-sql-database data data-engineering relational-databases

Take a deep dive into the Azure SQL Database Hyperscale Service Tier and discover a new form of cloud architecture from Microsoft that supports massive databases. The new horizontally scalable architecture, formerly code-named Socrates, allows you to decouple compute nodes from storage layers. This radically different approach dramatically increases the scalability of the service. This book shows you how to leverage Hyperscale to provide next-level scalability, high throughput, and fast performance from large databases in your environment. The book begins by showing how Hyperscale helps you eliminate many of the problems of traditional high-availability and disaster recovery architecture. You’ll learn how Hyperscale overcomes storage capacity limitations and issues with scale-up times and costs. With Hyperscale, your costs do not increase linearly with database size and you can manage more data than ever at a lower cost. The book teaches you how to deploy, configure, and monitor an Azure SQL Hyperscale database in a production environment. The book also covers migrating your current workloads from traditional architecture to Azure SQL Hyperscale. What You Will Learn Understand the advantages of Hyperscale over traditional architecture Deploy a Hyperscale database on the Azure cloud (interactively and with code) Configure the advanced features of the Hyperscale database tier Monitor and scale database performance to suit your needs Back up and restore your Azure SQL Hyperscale databases Implement disaster recovery and failover capability Compare performance of Hyperscale vs traditional architecture Migrate existing databases to the Hyperscale service tier Who This Book Is For SQL architects, data engineers, and DBAs who want the most efficient and cost-effective cloud technologies to run their critical data workloads, and those seeking rapid scalability and high performance and throughput while utilizing large databases

50: FREE 5-Step SQL Course & Project

2023-03-15 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith

AI/ML Analytics CSV Data Analytics

📤 In this episode, Avery’s going to walk you through how you can teach yourself SQL for FREE with this awesome 5-step course.

🌟 Join the data project club!

“25OFF” to get 25% off (first 50 members).

📊 Come to my next free “How to Land Your First Data Job” training 🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(0:24) - What is SQL?

(1:08) - Step 1: Download Datasets

(2:03) - What are CSV files?

(2:44) - Step 2: Setup SQL environment with the dataset

(3:37) - Step 3: Learn SQL for free with W3Schools

(4:50) - Step 4: Come up w/ probing questions for your data

(6:09) - Step 5: Write up your findings

(7:00) - Project Write-up Platform
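Steps 1, 2, and 4 above (get a CSV dataset, load it into a SQL environment, ask a question of it) can be sketched roughly as follows. The episode points to Kaggle and bit.io; this sketch instead assumes Python's built-in `csv` and `sqlite3` modules as a local stand-in, and the `movies` data is made up.

```python
import csv
import io
import sqlite3

# Stand-in for a downloaded CSV file; the columns and values are invented.
CSV_TEXT = """title,year,rating
Alpha,2019,7.5
Beta,2021,8.1
Gamma,2021,6.9
"""


def load_csv_and_query(csv_text):
    """Load CSV rows into an in-memory table, then summarize ratings by year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV values arrive as strings, so convert them explicitly.
    rows = [(r["title"], int(r["year"]), float(r["rating"])) for r in reader]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", rows)

    # A sample probing question: how many titles per year, and the best rating?
    result = conn.execute(
        "SELECT year, COUNT(*), MAX(rating) FROM movies GROUP BY year ORDER BY year"
    ).fetchall()
    conn.close()
    return result
```

For a real dataset, `io.StringIO(csv_text)` would be replaced by an opened file, and the findings from queries like this one would feed the write-up in step 5.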

Mentioned Links:

Kaggle: https://www.kaggle.com/datasets

bit.io: https://bit.io/

W3Schools: https://www.w3schools.com/sql/

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok


Expert Performance Indexing in Azure SQL and SQL Server 2022: Toward Faster Results and Lower Maintenance Both on Premises and in the Cloud

2023-02-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Edward Pollack , Jason Strate

Azure Cloud Computing XML data data-engineering microsoft-sql-server relational-databases

Take a deep dive into perhaps the single most important facet of query performance—indexes—and how to best use them. Newly updated for SQL Server 2022 and Azure SQL, this fourth edition includes new guidance and features related to columnstore indexes, improved and consolidated content on Query Store, deeper content around Intelligent Query Processing, and other updates to help you optimize query execution and make performance improvements to even the most challenging workloads. The book begins with explanations of the types of indexes and how they are stored in a database. Moving further into the book, you will learn how statistics are critical for optimal index usage and how the Index Advisor can assist in reviewing and optimizing index health. This book helps you build a clear understanding of how indexes work, how to implement and use them, and the many options available to tame even the largest and most complex workloads. What You Will Learn Properly index row store, columnstore, and memory-optimized tables Make use of Intelligent Query Processing for faster query results Review statistics to understand indexing choices made by the optimizer Apply indexing strategies such as covering indexes, included columns, and index intersections Recognize and remove unnecessary indexes Design effective indexes for full-text, spatial, and XML data types Who This Book Is For Azure SQL and SQL Server administrators and developers who are ready to improve the performance of their database environment by thoughtfully building indexes to speed up queries that matter the most and make a difference to the business
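The idea of a covering index, one of the strategies listed above, can be shown in a few lines. The book targets SQL Server and Azure SQL; this sketch assumes SQLite (via Python's built-in `sqlite3`) purely as a convenient stand-in, so the plan text differs from what SQL Server tooling would show. The `orders` table and index name are invented.

```python
import sqlite3


def query_plan_with_covering_index():
    """Return SQLite's query plan text for a query answerable from the index alone."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER, note TEXT)"
    )
    # The index holds both the filter column and the selected column,
    # so the engine never has to visit the table rows themselves.
    conn.execute("CREATE INDEX idx_orders_customer_total ON orders (customer_id, total)")

    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = ?", (42,)
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return " ".join(row[-1] for row in plan)
```

If the query also selected `note`, the index would no longer cover it and the plan would fall back to an ordinary index search plus table lookups.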

Modern Oracle Database Programming: Level Up Your Skill Set to Oracle's Latest and Most Powerful Features in SQL, PL/SQL, and JSON

2023-02-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Patrick Barel , Alex Nuijten

JSON Oracle data data-engineering pl-sql pl/sql

Level up your skill set to the latest that Oracle Database can offer. This book introduces features that are not well known that can transform your development efforts. You’ll discover built-in functionality that can save you massive amounts of time that otherwise would be spent reinventing the wheel. You’ll find that what used to take a lot of programming some years ago can be done with less code in a more reliable way today. Anyone using Oracle Database without the knowledge in this book is leaving valuable functionality–that their company has paid for–on the table, and this book opens the door to that functionality so that you can deliver reliable and performant solutions faster and more easily than ever. Part I looks at features in SQL and PL/SQL that are underused and not well known. You’ll learn about new join types, pattern matching across rows, Top N pagination (useful in reporting!), qualified expressions, and enhancements to iterators that reduce code complexity and make your logic easier to understand. Part II covers how and when to invoke PL/SQL from SQL while maintaining performance. You'll learn about SQL macro functions for creating reusable SQL fragments, polymorphic table functions with return types determined by incoming argument types, and constructing and parsing JSON documents for data interchange with other systems. Part III introduces a vast array of built-in functionality that Oracle provides that is just waiting to be used. Edition-based redefinition enables zero-downtime application and schema upgrades. Data redaction enables easier compliance with privacy laws and similar regulations by protecting sensitive data from those who have no need to see it. Virtual private databases provide the appearance of giving each user their own database, again helping to secure sensitive data. These features are just a taste of what the book provides. 
Soon you’ll be improving your skills and wondering why you ever worked so hard to solve problems that Oracle Database already solves for you. What You Will Learn Write more powerful code by incorporating underused features in SQL and PL/SQL Optimize your integration between SQL and PL/SQL for best performance Take advantage of enhanced set operators, lateral joins, row-based pattern matching, and other advanced features in SQL Make your code easier to understand through your use of newer PL/SQL features, such as qualified expressions and iterator enhancements Integrate with web services and external data sources directly from the database Create and parse JSON documents for easy data exchange and flexible schema design Who This Book Is For Any developer who is writing SQL or PL/SQL, PL/SQL experts who want to level up their knowledge and skills to the latest features that Oracle Database provides, and developers who don’t want to write their own solutions only to find out later that they’ve wasted their time by building something that Oracle Database provides out of the box

Let The Whole Team Participate In Data With The Quilt Versioned Data Hub

2023-02-11 · Data Engineering Podcast Listen

podcast_episode

by Aneesh Karve (Quilt Data) , Tobias Macey

AI/ML Avro CloudFormation Cloud Computing Data Engineering Data Management Delta Docker Iceberg ORC Parquet Python +3 more

Summary

Data is a team sport, but it's often difficult for everyone on the team to participate. For a long time the mantra of data tools has been "by developers, for developers", which automatically excludes a large portion of the business members who play a crucial role in the success of any data project. Quilt Data was created as an answer to make it easier for everyone to contribute to the data being used by an organization and collaborate on its application. In this episode Aneesh Karve shares the journey that Quilt has taken to provide an approachable interface for working with versioned data in S3 that empowers everyone to collaborate.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Truly leveraging and benefiting from streaming data is hard - the data stack is costly, difficult to use and still has limitations. Materialize breaks down those barriers with a true cloud-native streaming database - not simply a database that connects to streaming systems. With a PostgreSQL-compatible interface, you can now work with real-time data using ANSI SQL including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL. Go to dataengineeringpodcast.com/materialize today and sign up for early access to get started. If you like what you see and want to help make it better, they're hiring across all functions! Your host is Tobias Macey and today I'm interviewing Aneesh Karve about how Quilt Data helps you bring order to your chaotic data in S3 with transactional versioning and data discovery built in.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Quilt is and the story behind it?

How have the goals and features of the Quilt platform changed since I spoke with Kevin in June of 2018?

What are the main problems that users are trying to solve when they find Quilt?

What are some of the alternative approaches/products that they are coming from?

How does Quilt compare with options such as LakeFS, Unstruk, Pachyderm, etc.?

Can you describe how Quilt is implemented?

What are the types of tools and systems that Quilt gets integrated with?

How do you manage the tension between supporting the lowest common denominator, while providing options for more advanced capabilities?

What is a typical workflow for a team that is using Quilt to manage their data?

What are the most interesting, innovative, or unexpected ways that you have seen Quilt used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Quilt?

When is Quilt the wrong choice?

What do you have planned for the future of Quilt?

Contact Info

LinkedIn

@akarve on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Quilt Data

Podcast Episode

UW Madison

Docker Swarm

Kaggle

open.quiltdata.com

FinOS Perspective

LakeFS

Podcast Episode

Pachyderm

Podcast Episode

Unstruk

Podcast Episode

Parquet

Avro

ORC

CloudFormation

Troposphere

CDK == Cloud Development Kit

Shadow IT

Podcast Episode

Delta Lake

Podcast Episode

Apache Iceberg

Podcast Episode

Datasette

Frictionless

DVC

Podcast.init Episode

The in

45: 3-Step Guide To Building Your First Data Science Project

2023-02-09 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith

AI/ML Analytics Data Analytics Data Science Python Tableau

You just learned SQL, Python, or Tableau, but you don’t know how to build your first data science project? In this episode, Avery shares a 3-step guide to building it.

🌟 Join the data project club!

Use code “25OFF” to get 25% off (first 50 members).

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(1:28) - Art is theft, and so is the data science project

(4:02) - Find ideas on Towards Data Science Medium

(5:32) - Read a few articles to get inspiration

(6:05) - Avery’s strategy is doing 30 projects in 30 days

(9:08) - How academia finds inspiration to write

(11:01) - Take Avery’s project, replicate and do it

Mentioned Links:

Building 30 Data Science Projects in 30 days: https://youtu.be/kKmA9ihIg20

30 Data Science Projects Resources: https://www.datacareerjumpstart.com/30projectsresourcesignup

I Used Data Science to UNCOVER McDonald’s Healthiest Meal: https://youtu.be/3bbFc1225-4

Connect with Avery:

📺 Subscribe on YouTube: https://www.youtube.com/c/AverySmithDataCareerJumpstart/videos

🎙 Listen to My Podcast: https://podcasts.apple.com/us/podcast/data-career-podcast/id1547386535

👔 Connect with me on LinkedIn: https://www.linkedin.com/in/averyjsmith/

📸 Instagram: https://www.instagram.com/datacareerjumpstart/

🎵 TikTok: [https://www.tiktok.com/@verydata?]

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Reflecting On The Past 6 Years Of Data Engineering

2023-02-06 · Data Engineering Podcast Listen

podcast_episode

by Tobias Macey

AI/ML Airflow Alation Analytics API AWS Lambda BI Big Data Cloud Computing Dagster Data Engineering Data Management +12 more

Summary

This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I'm reflecting on the major trends in data engineering over the past 6 years.

Interview

Introduction

6 years of running the Data Engineering Podcast

Around the first time that data engineering was discussed as a role

Followed on from hype about "data science"

Hadoop era

Streaming

Lambda and Kappa architectures

Not really referenced anymore

"Big Data" era of capture everything has shifted to focusing on data that presents value

Regulatory environment increases risk, better tools introduce more capability to understand what data is useful

Data catalogs

Amundsen and Alation

Orchestration engine

Oozie, etc. -> Airflow and Luigi -> Dagster, Prefect, Lyft, etc.

Orchestration is now a part of most vertical tools

Cloud data warehouses

Data lakes

DataOps and MLOps

Data quality to data observability

Metadata for everything

Data catalog -> data discovery -> active metadata

Business intelligence

Read only reports to metric/semantic layers

Embedded analytics and data APIs

Rise of ELT

dbt

Corresponding introduction of reverse ETL

What are the most interesting, unexpected, or challenging lessons that you have learned while working on running the podcast?

What do you have planned for the future of the podcast?

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Materialize:

Looking for the simplest way to get the freshest data possible to your teams? Because let's face it: if real-time were easy, everyone would be using it. Look no further than Materialize, the streaming database you already know how to use.

Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support. Delivered as a single platform with the separation of storage and compute, strict-serializability, active replication, horizontal scalability and workload isolation — Materialize is now the fastest way to build products with streaming data, drastically reducing the time, expertise, cost and maintenance traditionally associated with implementation of real-time features.

Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses.

Go to materialize.com

Support Data Engineering Podcast
