talk-data.com

Topic: SQL (Structured Query Language)

Tags: database_language, data_manipulation, data_definition, programming_language

1751 activities tagged

Activity Trend: peak of 107 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1751 activities · Newest first

Exam Ref DP-300 Administering Microsoft Azure SQL Solutions

Prepare for Microsoft Exam DP-300 and demonstrate your real-world foundational knowledge of Azure database administration, using a variety of methods and tools to perform and automate day-to-day operations, including Transact-SQL (T-SQL) and other tools for administrative management. Designed for database administrators, solution architects, data scientists, and other data professionals, this Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the Microsoft Certified: Azure Database Administrator Associate level. Focus on the expertise measured by these objectives: plan and implement data platform resources; implement a secure environment; monitor, configure, and optimize database resources; configure and manage automation of tasks; and plan and configure a high availability and disaster recovery (HA/DR) environment. This Microsoft Exam Ref organizes its coverage by the Skills Measured list published for the exam, features strategic what-if scenarios to challenge you, and assumes you have subject matter expertise in building database solutions designed to support multiple workloads with SQL Server on-premises and Azure SQL. About the Exam: Exam DP-300 focuses on core knowledge for implementing and managing the operational aspects of cloud-native and hybrid data platform solutions built on SQL Server and Azure SQL services, using a variety of methods and tools to perform and automate day-to-day operations, including Transact-SQL (T-SQL) and other tools for administrative management. About Microsoft Certification: Passing this exam fulfills your requirements for the Microsoft Certified: Azure Database Administrator Associate certification, demonstrating your ability to administer a SQL Server database infrastructure for cloud, on-premises, and hybrid relational databases using the Microsoft PaaS relational database offerings. See full details at microsoft.com/learn.

Product managers for BI platforms have it easy. They "just" need to have the dev team build a tool that gives all types of users access to all of the data they should be allowed to see in a way that is quick, simple, and clear while preventing them from pulling data that can be misinterpreted. Of course, there are a lot of different types of users—from the C-level executive who wants ready access to high-level metrics, to the analyst or data scientist who wants to drop into a SQL flow state, to everyone in between. And sometimes the tool needs to provide structured dashboards, while at other times it needs to be a mechanism for ad hoc analysis. Maybe the product manager's job is actually…impossible? Former Looker CAO and current Omni CEO Colin Zima joined this episode for a lively discussion on the subject! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Common sense suggests that a manager should know something about the thing they are managing, no? I was asked recently to take over the team that owns the databases at Wasabi, a fast-growing cloud storage provider, and found myself in the midst of pivotal decisions that will determine whether the company can evolve its technology to enable order-of-magnitude scaling and new business opportunities. The lessons about how to manage the team, how to manage up, and how to make these technical decisions should be helpful to many others facing challenges with their database technology. Spoiler: I didn't have to write a line of SQL.

Narrative SQL: Crafting Data Analysis Queries That Tell Stories

This book addresses an important gap in data analytics education: the interplay between complex query-making and storytelling. While many resources cover the fundamentals of SQL queries and the technical skills required to manipulate data, few explore moving beyond the numbers and figures to tell stories that drive strategic business decisions. By weaving together both SQL and narrative mechanics, author Hamed Tabrizchi has assembled a powerful tool for data analysts, aspiring database professionals, and business intelligence specialists. A strong foundation is laid in the first part of the book, which examines the technical skills necessary to access and manipulate data. You'll explore foundational SQL commands, advanced querying techniques, data manipulation, data integrity, and optimization of queries for performance. The second half moves from the "how" of SQL to the "why," examining the meaning-making practices we can apply to data and the stories data can tell. You'll learn how SQL queries can be interpreted, how to prepare data for visualization, and, most importantly, how to convey findings in a way that engages and informs the audience. In each chapter, practical exercises reinforce the techniques learned and help you apply them in real-world situations. In addition to strengthening technical skills, these exercises encourage readers to take a critical view of the data they are studying, considering the larger story it represents. Upon completing this book, you will not only be proficient in SQL but will also possess the key skill of converting data into narratives that can influence strategic direction and operational decisions in the modern workplace.
What You Will Learn:
Advanced SQL techniques: master data manipulation and retrieval skills using advanced SQL queries
Data analysis proficiency: develop analytical skills to uncover key insights and understand significant data patterns
Storytelling with data: learn to translate data analytics into compelling narratives for effective stakeholder communication
Complex querying skills: understand advanced SQL concepts such as common table expressions (CTEs), subqueries, and window functions
Query optimization: optimize query execution time, resource usage, and scalability by mastering indexes and views
Practical application: gain hands-on experience with practical examples of advanced SQL techniques in real-world data analysis scenarios
Effective data presentation: discover strategies for visually presenting data stories to enhance engagement and understanding among diverse audiences
Who This Book Is For: Data analysts and business analysts, SQL developers, data-driven managers and executives, and academics and students looking to enhance advanced querying and narrative-building skills to better interpret and convey data.
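
To make the "complex querying" concrete, here is a minimal sketch of a CTE feeding a window function, two of the techniques the book covers, run via DuckDB in Python. The sales table and its columns are invented for illustration:

```python
import duckdb

# Invented sales data for illustration
duckdb.sql(
    "CREATE TABLE sales AS "
    "SELECT * FROM (VALUES ('north', 10), ('north', 20), ('south', 15)) "
    "t(region, amount)"
)

# A CTE feeding a window function: each sale's share of its region's total
print(duckdb.sql("""
    WITH regional AS (
        SELECT region, amount,
               sum(amount) OVER (PARTITION BY region) AS region_total
        FROM sales
    )
    SELECT region, amount, round(amount / region_total, 2) AS share
    FROM regional
    ORDER BY region, amount
"""))
```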

This is a free preview of a paid episode. To hear more, visit dataengineeringcentral.substack.com

Hello! A new episode of the Data Engineering Central Podcast is dropping today, and we will be covering a few hot topics! * Apache Iceberg catalogs * the new Boring Catalog * new full Iceberg support from Databricks/Unity Catalog * Databricks SQL Scripting * DuckDB coming to a Lakehouse near you * Lakebase from Databricks. Going to be a great show, come along for the ride! Thanks …

MongoDB 8.0 in Action, Third Edition

Deliver flexible, scalable, and high-performance data storage that's perfect for AI and other modern applications with MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. In MongoDB 8.0 in Action, Third Edition you'll find comprehensive coverage of MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. Learn to utilize MongoDB's flexible schema design for data modeling, scale applications effectively using advanced sharding features, integrate full-text and vector-based semantic search, and more. This totally revised new edition delivers engaging hands-on tutorials and examples that put MongoDB into action! In MongoDB 8.0 in Action, Third Edition you'll:
Master new features in MongoDB 8.0
Create your first free Atlas cluster using the Atlas CLI
Design scalable NoSQL databases with effective data modeling techniques
Master Vector Search for building GenAI-driven applications
Utilize advanced search capabilities in MongoDB Atlas, including full-text search
Build event-driven applications with Atlas Stream Processing
Deploy and manage MongoDB Atlas clusters both locally and in the cloud using the Atlas CLI
Leverage the Atlas SQL interface for familiar SQL querying
Use MongoDB Atlas Online Archive for efficient data management
Establish robust security practices, including encryption
Master backup and restore strategies
Optimize database performance and identify slow queries
MongoDB 8.0 in Action, Third Edition offers a clear, easy-to-understand introduction to everything in MongoDB 8.0 and MongoDB Atlas—including new advanced features such as embedded config servers in sharded clusters or moving an unsharded collection to a different shard. The book also covers Atlas Stream Processing, full-text search, and vector search capabilities for generative AI applications. Each chapter is packed with tips, tricks, and practical examples you can quickly apply to your projects, whether you're brand new to MongoDB or looking to get up to speed with the latest version.
About the Technology: MongoDB is the database of choice for storing structured, semi-structured, and unstructured data like business documents and other text and image files. MongoDB 8.0 introduces a range of exciting new features—from sharding improvements that simplify the management of distributed data, to performance enhancements that stay resilient under heavy workloads. Plus, MongoDB Atlas brings vector search and full-text search features that support AI-powered applications.
About the Book: In MongoDB 8.0 in Action, Third Edition, you'll learn how to take advantage of all the new features of MongoDB 8.0, including the powerful MongoDB Atlas multi-cloud data platform. You'll start with the basics of setting up and managing a document database. Then, you'll learn how to use MongoDB for AI-driven applications, implement advanced stream processing, and optimize performance with improved indexing and query handling. Hands-on projects like creating a RAG-based chatbot and building an aggregation pipeline mean you'll really put MongoDB into action!
What's Inside:
The new features in MongoDB 8.0
Getting familiar with MongoDB's Atlas cloud platform
Utilizing sharding enhancements
Using vector-based search technologies
Full-text search capabilities for efficient text indexing and querying
About the Reader: For developers and DBAs of all levels. No prior experience with MongoDB required.
About the Author: Arek Borucki is a MongoDB Champion and certified MongoDB and MongoDB Atlas administrator with expertise in distributed systems, NoSQL databases, and Kubernetes.
Quotes:
"An excellent resource with real-world examples and best practices to design, optimize, and scale modern applications." - Advait Patel, Broadcom
"Essential MongoDB resource. Covers new features such as full-text search, vector search, AI, and RAG applications." - Juan Roy, Credit Suisse
"Reflects the author's practical experience and clear teaching style. It's packed with real-world examples and up-to-date insights." - Rajesh Nair, MongoDB Champion and community leader
"This book will definitely make you a MongoDB star!" - Vinicios Wentz, JPMorgan Chase & Co.
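
As a taste of the vector search capability the book covers, here is a minimal PyMongo sketch of an Atlas Vector Search query. The connection string, database, collection, and index names are placeholders, and the $vectorSearch stage requires an Atlas Vector Search index to already exist on the embedding field:

```python
from pymongo import MongoClient

# Placeholder connection string and namespace for illustration only
client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
products = client["shop"]["products"]

pipeline = [
    {
        "$vectorSearch": {
            "index": "embedding_index",      # assumed pre-built vector index
            "path": "embedding",             # field holding document vectors
            "queryVector": [0.12, -0.07, 0.33],  # embedding of the user query
            "numCandidates": 100,            # candidates scanned before ranking
            "limit": 5,                      # top matches returned
        }
    },
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
]
for doc in products.aggregate(pipeline):
    print(doc)
```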

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis can provide the same benefits of SQL execution with a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges existing open-source libraries like Kedro and Pandera and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.
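
For a flavor of what that dataframe API looks like, here is a minimal Ibis sketch; the table and column names are invented, and the same expression compiles to SQL for whichever backend you connect:

```python
import ibis

# A small in-memory table stands in for a real backend table
games = ibis.memtable({"player": ["a", "a", "b"], "moves": [40, 31, 55]})

# Dataframe-style expression; Ibis compiles it to SQL for the chosen backend
expr = (
    games.group_by("player")
    .aggregate(avg_moves=games.moves.mean())
    .order_by(ibis.desc("avg_moves"))
)

# Inspect the generated SQL, then execute on the default DuckDB backend
print(ibis.to_sql(expr))
print(expr.execute())
```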

This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this 4-hour session, attendees will have created a powerful and extensible system they can adapt to their own data sources.
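
As one illustration of the safety-validation step, a read-only gate might look like the sketch below. This is an invented example of the general idea, not the tutorial's actual implementation:

```python
import re

# Keywords that indicate a data- or schema-modifying statement
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.I
)

def is_safe_select(query: str) -> bool:
    """Accept only a single read-only statement."""
    statements = [s for s in query.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # reject stacked statements
    stmt = statements[0].strip()
    if FORBIDDEN.search(stmt):
        return False  # reject modifying keywords anywhere in the statement
    return stmt.lower().startswith(("select", "with"))

assert is_safe_select("SELECT * FROM games LIMIT 10")
assert not is_safe_select("DROP TABLE games")
```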

Pandas and scikit-learn have become staples in the machine learning toolkit for processing and modeling tabular data in Python. However, when data size scales up, these tools become slow or run out of memory. Ibis provides a unified, Pythonic, dataframe-like interface to 20+ execution backends, including dataframe libraries, databases, and analytics engines. Ibis enables users to leverage these powerful tools without rewriting their data engineering code (or learning SQL). IbisML extends the benefits of using Ibis to the ML workflow by letting users preprocess their data at scale on any Ibis-supported backend.

In this tutorial, you'll build an end-to-end machine learning project to predict the live win probability after each move during chess games.

Structured Query Language (or SQL for short) is a programming language for managing data in a database system and an essential part of any data engineer's toolkit. In this tutorial, you will learn how to use SQL to create databases and tables, insert data into them, and extract, filter, and join data or perform calculations using queries. We will use DuckDB, a new open-source, embedded, in-process database system that combines cutting-edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies) and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data to fly and share it via the cloud.
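
For instance, here is a minimal sketch of the DuckDB-plus-Pandas workflow the tutorial describes; DuckDB can query a DataFrame in scope directly by its variable name (the data is invented):

```python
import duckdb
import pandas as pd

# A toy DataFrame; DuckDB's replacement scan finds it by variable name
trips = pd.DataFrame({"city": ["nyc", "nyc", "sf"], "minutes": [12, 30, 8]})

# Run SQL over the DataFrame in-process and get a DataFrame back
result = duckdb.sql(
    "SELECT city, avg(minutes) AS avg_minutes FROM trips GROUP BY city"
).df()
print(result)
```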

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or meet data sovereignty requirements. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimum boilerplate. In Apache Airflow, we've already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider, an abstraction initially supporting Amazon SQS and expanding to Google Pub/Sub and Apache Kafka soon after. We expect additional implementations such as Amazon Kinesis and Managed Kafka over time. This talk will dive into why these abstractions matter, how they reduce friction for developers while giving enterprises true multi-cloud optionality, and what's next for Airflow's evolving provider ecosystem.
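
As a small illustration of the Common-SQL abstraction, the same operator can target any supported database by swapping the connection ID; the DAG name, connection IDs, and query below are assumptions for illustration:

```python
import pendulum
from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id="portable_sql_example",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
):
    # The same operator runs against Snowflake, Postgres, SQLite, SAP HANA,
    # and more; only conn_id changes per target database.
    daily_rollup = SQLExecuteQueryOperator(
        task_id="daily_rollup",
        conn_id="snowflake_default",  # swap for "postgres_default", etc.
        sql="SELECT count(*) FROM orders WHERE order_date = '{{ ds }}'",
    )
```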

This session explores how to bring unit testing to SQL pipelines using Airflow. I’ll walk through the development of a SQL testing library that allows isolated testing of SQL logic by injecting mock data into base tables. To support this, we built a type system for AWS Glue tables using Pydantic, enabling schema validation and mock data generation. Over time, this type system also powered production data quality checks via a custom Airflow operator. Learn how this approach improves reliability, accelerates development, and scales testing across data workflows.
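
The library described here is internal, but the core idea might be sketched like this: a Pydantic model doubles as the table schema and as a factory for schema-valid mock rows. All names below are hypothetical:

```python
from pydantic import BaseModel

# Hypothetical table schema; in the approach described, models like this
# mirror AWS Glue table definitions.
class Order(BaseModel):
    order_id: int
    customer_id: int
    amount: float

def mock_orders(n: int = 3) -> list[dict]:
    """Deterministic mock rows guaranteed to satisfy the schema."""
    return [
        Order(order_id=i, customer_id=i % 2, amount=9.99 * i).model_dump()
        for i in range(1, n + 1)
    ]

# The test harness would inject mock_orders() into the base table, run the
# SQL under test in isolation, and assert on the output rows.
print(mock_orders())
```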

Before Airflow, our BigQuery pipelines at Create Music Group operated like musicians without a conductor—each playing on its own schedule, regardless of whether upstream data was ready. As our data platform grew, this chaos led to spiralling costs, performance bottlenecks, and became utterly unsustainable. This talk tells the story of how Create Music Group brought harmony to its data workflows by adopting Apache Airflow and the Medallion architecture, ultimately slashing our data processing costs by 50%. We’ll show how moving to event-driven scheduling with datasets helped eliminate stale data issues, dramatically improved performance, and unlocked faster iteration across teams. Discover how we replaced repetitive SQL with standardized dimension/fact tables, empowering analysts in a safer sandbox.
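
For readers unfamiliar with dataset-driven scheduling, a minimal sketch of the pattern in Airflow 2.x looks like this (the dataset URI and DAG name are invented):

```python
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

# Invented dataset URI representing an upstream BigQuery table
raw_events = Dataset("bigquery://analytics/raw_events")

@dag(
    schedule=[raw_events],  # run only when upstream data actually lands
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
)
def build_fact_tables():
    @task
    def refresh_facts():
        ...  # rebuild the standardized dimension/fact tables

    refresh_facts()

build_fact_tables()
```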

As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment. In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance—all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines. Whether you’re a dbt user seeking better performance, a data engineer managing streaming pipelines, or an ML practitioner scaling AI workloads, this session will provide actionable insights on using Airflow and Databricks together to build efficient, cost-effective, and future-proof data platforms.
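
As one concrete example of this orchestration, Airflow's Databricks provider can refresh a materialized view from a DAG; the sketch below uses DatabricksSqlOperator with invented DAG, warehouse, and view names:

```python
import pendulum
from airflow import DAG
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

with DAG(
    dag_id="databricks_mv_refresh",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@daily",
):
    # Refresh a materialized view on a Databricks SQL warehouse
    refresh_mv = DatabricksSqlOperator(
        task_id="refresh_daily_sales_mv",
        databricks_conn_id="databricks_default",
        sql_endpoint_name="analytics_warehouse",  # invented warehouse name
        sql="REFRESH MATERIALIZED VIEW daily_sales",  # invented view name
    )
```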

In the rapidly evolving field of data engineering and data science, efficiency and ease of use are crucial. Our innovative solution offers a user-friendly interface to manage and schedule custom PySpark, PySQL, Python, and SQL code, streamlining the process from development to production. Using Airflow as the backend, this tool eliminates the complexities of infrastructure management, version control, CI/CD processes, and workflow orchestration. The intuitive UI allows users to upload code, configure job parameters, and set schedules effortlessly, without the need for additional scripting or coding. Additionally, users have the flexibility to bring their own custom artifact repository and run their code. In summary, our solution significantly enhances the orchestration and scheduling of custom code, breaking down traditional barriers and empowering organizations to maximize their data's potential and drive innovation efficiently. Whether you are an individual data scientist or part of a large data engineering team, this tool provides the resources needed to streamline your workflow and achieve your goals faster than ever before.

Fundamentals of Microsoft Fabric

In the rapidly evolving world of data and analytics, professionals face the challenge of navigating complex platforms in order to build more efficient solutions. Microsoft Fabric, hailed as Microsoft's "biggest data product in history after SQL Server," offers powerful capabilities but comes with a steep learning curve. The myriad of choices within Fabric can be overwhelming, with multiple ways to tackle tasks, not all of which are equally efficient. This book serves as a definitive roadmap to understanding Microsoft Fabric—and leveraging it to suit your needs. Authors Nikola Ilic and Ben Weissman demystify the core concepts and components necessary to build, manage, and administer robust data solutions within this game-changing product.
Discover the core Microsoft Fabric components and understand key concepts and techniques for building a robust data platform
Learn to apply Microsoft Fabric effectively in your day-to-day job
Understand the concept of a lake-centric architecture
Gain the skills to implement a scalable and efficient end-to-end analytics solution
Manage and administer a Fabric tenant