JSON

PostgreSQL: Up and Running, 4th Edition

2027-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Leo S. Hsu , Regina Obe

SQL data data-engineering postgresql relational-databases

Thinking of migrating to PostgreSQL? This concise introduction helps you understand and use this open source database system. Not only will you learn about the new enterprise class features in versions 16 to 18, but you'll also discover all that PostgreSQL has to offer—much more than a relational database system. As an open source product, it has hundreds of plug-ins, expanding the capability of PostgreSQL beyond all other database systems. With examples throughout, this book shows you how to perform tasks that are difficult or impossible in other databases. The revised fourth edition covers the latest features of Postgres, such as ISO-SQL constructs rarely found in other databases, foreign data wrapper (FDW) enhancements, JSON constructs, multirange data types, query parallelization, and replication. If you're an experienced PostgreSQL user, you'll pick up gems you may have missed before. Learn basic administration tasks such as role management, database creation, backup, and restore Use psql command-line utility and the pgAdmin graphical administration tool Explore PostgreSQL tables, constraints, and indexes Learn powerful SQL constructs not generally found in other databases Use several different languages to write database functions and stored procedures Tune your queries to run as fast as your hardware will allow Query external and variegated data sources with foreign data wrappers Learn how to use built-in replication to replicate data

Data Contracts in Practice

2026-02-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ryan Collingwood

Data Contracts Data Governance Data Quality Python SQL YAML data data-engineering

In 'Data Contracts in Practice', Ryan Collingwood provides a detailed guide to managing and formalizing data responsibilities within organizations. Through practical examples and real-world use cases, you'll learn how to systematically address data quality, governance, and integration challenges using data contracts. What this Book will help me do Learn to identify and formalize expectations in data interactions, improving clarity among teams. Master implementation techniques to ensure data consistency and quality across critical business processes. Understand how to effectively document and deploy data contracts to bolster data governance. Explore solutions for proactively addressing and managing data changes and requirements. Gain real-world skills through practical examples using technologies like Python, SQL, JSON, and YAML. Author(s) Ryan Collingwood is a seasoned expert with over 20 years of experience in product management, data analysis, and software development. His holistic techno-social approach, designed to address both technical and organizational challenges, brings a unique perspective to improving data processes. Ryan's writing is informed by his extensive hands-on experience and commitment to enabling robust data ecosystems. Who is it for? This book is ideal for data engineers, software developers, and business analysts working to enhance organizational data integration. Professionals with a familiarity of system design, JSON, and YAML will find it particularly beneficial. Enterprise architects and leadership roles looking to understand data contract implementation and their business impacts will also greatly benefit. Basic understanding of Python and SQL is recommended to maximize learning.

Oracle 23AI & ADBS in Action: Exploring New Features with Hands-On Case Studies

2026-01-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Asim Chowdhury

AI/ML Blockchain JavaScript Oracle Cyber Security SQL XML data data-engineering oracle-database-solutions

Unlock the power of Oracle Database 23AI and Autonomous Database Serverless (ADB-S) with this comprehensive guide to the latest innovations in performance, security, automation, and AI-driven optimization. As enterprises embrace intelligent and autonomous data platforms, understanding these capabilities is essential for data architects, developers, and DBAs. Explore cutting-edge features such as vector data types and AI-powered vector search, revolutionizing data retrieval in modern AI applications. Learn how schema privileges and the DB_DEVELOPER_ROLE simplify access control in multi-tenant environments. Dive into advanced auditing, SQL Firewall, and data integrity constraints to strengthen security and compliance. Discover AI-driven advancements like machine learning-based query execution, customer retention prediction, and AI-powered query tuning. Additional chapters cover innovations in JSON, XML, JSON-Relational Duality Views, new indexing techniques, SQL property graphs, materialized views, partitioning, lock-free transactions, JavaScript stored procedures, blockchain tables, and automated bigfile tablespace shrinking. What sets this book apart is its practical focus—each chapter includes real-world case studies and executable scripts, enabling professionals to implement these features effectively in enterprise environments. Whether you're optimizing performance or aligning IT with business goals, this guide is your key to building scalable, secure, and AI-powered solutions with Oracle 23AI and ADB-S. What You Will Learn Explore Oracle 23AI's latest features through real-world use cases Implement AI/ML-driven optimizations for smarter, autonomous database performance Gain hands-on experience with executable scripts and practical coding examples Strengthen security and compliance using advanced auditing, SQL Firewall, and blockchain tables Master high-performance techniques for query tuning, in-memory processing, and scalability Revolutionize data access with AI-powered vector search in modern AI workloads Simplify user access in multi-tenant environments using schema privileges and DB_DEVELOPER_ROLE Model and query complex data using JSON-Relational Duality Views and SQL property graphs Who this Book is For Database architects, data engineers, Oracle developers, and IT professionals seeking to leverage Oracle 23AI’s latest features for real-world applications

Processing large JSON files without running out of memory

2025-12-10 · PyData Boston 2025 Watch

talk

by Itamar Turner-Trauring

Python

If you need to process a large JSON file in Python, it’s very easy to run out of memory while loading the data, leading to a super-slow run time or out-of-memory crashes. In this talk you'll learn:

How to measure memory usage.
Why loading JSON takes a lot of memory.
Four different ways to reduce memory usage when loading large JSON files.

SQL for Data Analytics - Fourth Edition

2025-11-21 · O'Reilly SQL Books O'Reilly Amazon

book

by Upom Malik , Benjamin Johnston , Haibin Li , Matt Goldwasser , Jun Shan

Analytics Data Analytics Python SQL

Dive into the world of data analytics with 'SQL for Data Analytics'. This book takes you beyond simple query writing to teach you how to use SQL to analyze, interpret, and derive actionable insights from real-world data. By the end, you'll build technical skills that allow you to solve complex problems and demonstrate results using data. What this Book will help me do Understand how to create, manage, and utilize structured databases for analytics. Use advanced SQL techniques such as window functions and subqueries effectively. Analyze various types of data like geospatial, JSON, and time-series data in SQL. Apply statistical principles within the context of SQL for enhanced insights. Automate data workflows and presentations using SQL and Python integration. Author(s) The authors Jun Shan, Haibin Li, Matt Goldwasser, Upom Malik, and Benjamin Johnston bring together a wealth of knowledge in data analytics, database management, and applied statistics. Together, they aim to empower readers through clear explanations, practical examples, and a focus on real-world applicability. Who is it for? This book is aimed at data professionals and learners such as aspiring data analysts, backend developers, and anyone involved in data-driven decision-making processes. The ideal reader has a basic understanding of SQL and mathematics and is eager to extend their skills to tackle real-world data challenges effectively.

Move fast, save more with MongoDB-compatible workloads on DocumentDB

2025-11-19 · Microsoft Ignite 2025 Watch

breakout

by Patty Chow (Microsoft) , Khelan Modi (Microsoft) , Gurvinder Singh (The Kraft Heinz Company)

Azure Linux MongoDB

DocumentDB, the open-source MongoDB-compatible document database now part of the Linux Foundation, helps you innovate faster and save more. Customers like Kraft Heinz move fast with a JSON-native model, reduce ops with turnkey scaling and updates, and secure workloads with enterprise-grade protection and an E2E Azure SLA. Delivered as a fully managed service with support for hybrid and multicloud, Azure DocumentDB keeps you moving faster while crushing costs at enterprise scale.

Just Use Postgres!

2025-11-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denis Magda

AI/ML GenAI Java Microsoft Oracle RDBMS SQL data data-engineering postgresql relational-databases

You probably don’t need a collection of specialty databases. Just use Postgres instead! Written for application developers and database pros, Just Use Postgres! shows you how to get the most out of the powerful Postgres database. In Just Use Postgres! you’ll learn how to: Use Postgres as an RDBMS for transactional workloads Develop generative AI, geospatial, and time-series applications Take advantage of modern SQL including window functions and CTEs Perform full-text search and process JSON documents Use Postgres as a message queue Optimize performance with various index types including B-trees, GIN, GiST, HNSW, and more Over the decades, PostgreSQL, aka Postgres, has grown into the most powerful general-purpose database and has become the de facto standard for developers worldwide. Just Use Postgres! takes a modern look at Postgres, exploring the database’s most up-to-date features for AI, time-series, full-text search, geospatial, and other application workloads. About the Technology You know that PostgreSQL is a fast, reliable, SQL compliant RDBMS. You may not know that it’s also great for geospatial systems, time series, full-text search, JSON documents, AI vector embeddings, and many other specialty database functions. For almost any data task you can imagine, you can use Postgres. About the Book Just Use Postgres! covers recipes for using Postgres in dozens of applications normally reserved for single-purpose databases. Written for busy application developers, each chapter explores a different use case illuminating the breadth and depth of Postgres’s capabilities. Along the way, you’ll also meet an incredible ecosystem of Postgres extensions like pgvector, PostGIS, pgmq, and TimescaleDB. You’ll be amazed at everything you can accomplish with Postgres! What's Inside Generative AI, geospatial, and time-series applications Modern SQL including window functions and CTEs Full-text search and JSON B-trees, GIN, GiST, HNSW, and more About the Reader For application developers, software engineers, and architects who know the basics of SQL. About the Author Denis Magda is a recognized Postgres expert and software engineer who worked on Java at Sun Microsystems and Oracle before focusing on databases and large-scale distributed systems. Quotes I was pleasantly surprised to learn many new things from this book. - From the Afterword by Vlad Mihalcea An excellent guide covering everything from basics to cutting-edge features. - Dave Cramer, PostgreSQL JDBC Maintainer Pleasant, easy to read with tonnes of great code. - Mike McQuillan, McQTech Ltd Well-organized and easy to search. - Edward Pollack, Microsoft Data Platform MVP The missing guide to understanding and using Postgres. - Mehboob Alam, POSTGRESNX, Inc.

SQL Server 2025: The AI-ready enterprise database

2025-11-18 · Microsoft Ignite 2025 Watch

breakout

by Bob Ward (Azure Data) , Sirjad Parakkat (Ivanti) , Abhinav Tiwari (Ivanti)

AI/ML Analytics API Fabric SQL

SQL Server 2025 redefines what's possible for the enterprise data platform. With developer-first features and seamless integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already own. Build modern apps with native JSON and REST APIs and harness AI with built-in vector search. Increase application availability with optimized locking and use Fabric mirroring for near real-time analytics. Join us to see why this is the most advanced SQL Server.

Wrangling Internet-scale Image Datasets

2025-11-07 · PyData Seattle 2025 Watch

talk

by Nicholas Merchant , Carlos Garcia Jurado Suarez

Data Quality Parquet

Building and curating datasets at internet scale is both powerful and messy. At Irreverent Labs, we recently released Re-LAION-Caption19M, a 19-million–image dataset with improved captions, alongside a companion arXiv paper. Behind the scenes, the project involved wrangling terabytes of raw data and designing pipelines that could produce a research-quality dataset while remaining resilient, efficient, and reproducible. In this talk, we’ll share some of the practical lessons we learned while engineering data at this scale. Topics include: strategies for ensuring data quality through a mix of automated metrics and human inspection; why building file manifests pays off when dealing with millions of files; effective use of Parquet, WDS and JSONL for metadata and intermediate results; pipeline patterns that favor parallel processing and fault tolerance; and how logging and dashboards can turn long-running jobs from opaque into observable. Whether you’re working with images, text, or any other massive dataset, these patterns and pitfalls may help you design pipelines that are more robust, maintainable, and researcher-friendly.

From Parsing Nightmares to "Production": Any Unstructured Input → JSON → MotherDuck in Seconds

2025-11-04 · Small Data SF 2025

workshop

by Upal Saha (bem)

API CSV HTML Motherduck

Every sprint consumed by fixing parsers is a sprint spent not shipping product- brittle parsing kills velocity. This workshop is about retiring that cycle so you can move from messy, unstructured inputs to production-ready data in seconds. bem ingests and transforms any unstructured input at any volume — PDFs, emails, Excel, Word, CSV, text, JSON, images (PNG, JPEG, HEIC, HEIF, WebP), HTML, and audio (WAV, MP3, M4A) — into clean JSON instantly via API. With primitives like Transform, Join, Split, Route, and Analyze, you define the exact workflow your product needs. Built-in Evals measure + enforce accuracy automatically so quality doesn’t drop as you scale. Flow outputs straight into MotherDuck so you can go from chaos to query without manual cleanup — and your team can focus on shipping, not scraping.

Building Modern Applications

2025-10-30 · NoSQL Power Workshop: Build Modern Apps with KV, SQL++, Graph & Time Series

talk

graph key-value sql++ time series

What are the benefits of NoSQL compared to SQL databases, and multi-model applications with Couchbase database platform: JSON, Key-Value, SQL++, Graph, and Time Series.

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

2025-10-27 · Data Engineering Podcast Listen

podcast_episode

by Matt Topper (UberEther) , Tobias Macey

AI/ML Cloud Computing Data Engineering Data Governance Data Management Data Quality Datafold dbt ETL/ELT Prefect Python Cyber Security +2 more

Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vector stores, and streaming systems. Matt shares practical solutions, including propagating user identity via JWTs, externalizing policy with engines like OPA/Rego and Cedar, and using database proxies for native row/column security. He also explores catalog-driven governance, lineage-based label propagation, and OpenTDF for binding policies to data objects. The conversation covers machine-to-machine access, short-lived credentials, workload identity, and constraining access by interface choke points, as well as lessons from Zanzibar-style policy models and the human side of enforcement. Matt emphasizes the need for trust composition - unifying provenance, policy, and identity context - to answer questions about data access, usage, and intent across the entire data path.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Matt Topper about the challenges of managing identity and access controls in the context of data systemsInterview IntroductionHow did you get involved in the area of data management?The data ecosystem is a uniquely challenging space for creating and enforcing technical controls for identity and access control. What are the key considerations for designing a strategy for addressing those challenges?For data acess the off-the-shelf options are typically on either extreme of too coarse or too granular in their capabilities. What do you see as the major factors that contribute to that situation?Data governance policies are often used as the primary means of identifying what data can be accesssed by whom, but translating that into enforceable constraints is often left as a secondary exercise. How can we as an industry make that a more manageable and sustainable practice?How can the audit trails that are generated by data systems be used to inform the technical controls for identity and access?How can the foundational technologies of our data platforms be improved to make identity and authz a more composable primitive?How does the introduction of streaming/real-time data ingest and delivery complicate the challenges of security controls?What are the most interesting, innovative, or unexpected ways that you have seen data teams address ICAM?What are the most interesting, unexpected, or challenging lessons that you have learned while working on ICAM?What are the aspects of ICAM in data systems that you are paying close attention to?What are your predictions for the industry adoption or enforcement of those controls?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links UberEtherJWT == JSON Web TokenOPA == Open Policy AgentRegoPingIdentityOktaMicrosoft EntraSAML == Security Assertion Markup LanguageOAuthOIDC == OpenID ConnectIDP == Identity ProviderKubernetesIstioAmazon CEDAR policy languageAWS IAMPII == Personally Identifiable InformationCISO == Chief Information Security OfficerOpenTDFOpenFGAGoogle ZanzibarRisk Management FrameworkModel Context ProtocolGoogle Data ProjectTPM == Trusted Platform ModulePKI == Public Key InfrastructurePassskeysDuckLakePodcast EpisodeAccumuloJDBCOpenBaoHashicorp VaultLDAPThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

SQL Server 2025 Unveiled: The AI-Ready Enterprise Database with Microsoft Fabric Integration

2025-10-15 · O'Reilly SQL Books O'Reilly Amazon

book

by Bob Ward (Azure Data)

AI/ML Analytics API Azure Cloud Computing Microsoft Fabric Cyber Security SQL Data Streaming microsoft sql server

Unveil the data platform of the future with SQL Server 2025—guided by one of its key architects . With built-in AI for application development and advanced analytics powered by Microsoft Fabric, SQL Server 2025 empowers you to innovate—securely and confidently. This book shows you how. Author Bob Ward, Principal Architect for the Microsoft Azure Data team, shares exclusive insights drawn from over three decades at Microsoft. Having worked on every version of SQL Server since OS/2 1.1, Ward brings unmatched expertise and practical guidance to help you navigate this transformative release. Ward covers everything from setup and upgrades to advanced features in performance, high availability, and security. He also highlights what makes this the most developer-friendly release in a decade: support for JSON, RegEx, REST APIs, and event streaming. Most critically, Ward explores SQL Server 2025’s advanced, scalable AI integrations, showing you how to build AI-powered applications deeply integrated with the SQL engine—and elevate your analytics to the next level. But innovation doesn’t come at the cost of safety: this release is built on a foundation of enterprise-grade security, helping you adopt AI safely and responsibly. You control which models to use, how they interact with your data, and where they run—from ground to cloud, or integrated with Microsoft Fabric. With built-in features like Row-Level Security (RLS), Transparent Data Encryption (TDE), Dynamic Data Masking, and SQL Server Auditing, your data remains protected at every layer. The AI age is here. Make sure your SQL Server databases are ready—and built for secure, scalable innovation . What You Will Learn [if !supportLists] · [endif]Grasp the fundamentals of AI to leverage AI with your data, using the industry-proven security and scale of SQL Server [if !supportLists] · [endif]Utilize AI models of your choice, services, and frameworks to build new AI applications [if !supportLists] · [endif]Explore new developer features such as JSON, Regular Expressions, REST API, and Change Event Streaming [if !supportLists] · [endif]Discover SQL Server 2025's powerful new engine capabilities to increase application concurrency [if !supportLists] · [endif]Examine new high availability features to enhance uptime and diagnose complex HADR configurations [if !supportLists] · Use new query processing capabilities to extend the performance of your application [if !supportLists] · [endif]Connect SQL Server to Azure with Arc for advanced management and security capabilities [if !supportLists] · [endif]Secure and govern your data using Microsoft Entra [if !supportLists] · [endif]Achieve near-real-time analytics with the unified data platform Microsoft Fabric [if !supportLists] · [endif]Integrate AI capabilities with SQL Server for enterprise AI [if !supportLists] · [endif]Leverage new tools such as SQL Server Management Studio and Copilot experiences to assist your SQL Server journey Who This Book Is For The SQL Server community, including DBAs, architects, and developers eager to stay ahead with the latest advancements in SQL Server 2025, and those interested in the intersection of AI and data, particularly how artificial intelligence (AI) can be seamlessly integrated with SQL Server to unlock deeper insights and smarter solutions

JSON Relationnel Duality : vos modèles document et relationnel unifié

2025-10-02 · Exhibitors' Events - Auto created

Face To Face

by Oracle , Khalid Dar

JSON Relationnel Duality : vos modèles document et relationnel unifié

2025-10-01 · PyData Paris 2025

Face To Face

Model Context Protocol: Principles and Practice

2025-09-26 · PyData Amsterdam 2025 Watch

talk

by Fabio Lipreri , Gabriele Orlandi

GenAI Git GitHub LLM Cyber Security SQL Data Streaming postgresql

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
Why per‑tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm ‑rf /).
Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level‑up your GenAI applications—without the N×M integration nightmare.

Spark 4.0 and Delta 4.0 For Streaming Data

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Bryce Bartmann (Shell)

AI/ML Delta Python Spark Data Streaming

Real-time data is one of the most important datasets for any Data and AI Platform across any industry. Spark 4.0 and Delta 4.0 include new features that make ingestion and querying of real-time data better than ever before. Features such as: Python custom data sources for simple ingestion of streaming and batch time series data sources using Spark Variant types for managing variable data types and json payloads that are common in the real time domain Delta liquid clustering for simple data clustering without the overhead or complexity of partitioning In this presentation you will learn how data teams can leverage these latest features to build industry-leading, real-time data products using Spark and Delta and includes real world examples and metrics of the improvements they make in performance and processing of data in the real time space.

Advanced JSON Schema handing and Event Demuxing

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Dattatraya Walake (Databricks) , Murali Talluri (Databricks)

JSON Schema

This session explores advanced JSON Schema handing(inference and evolving), and event DemuxingTopics include: How from_json is currently used today and its challenges. How to use Variant for rapidly changing schema. How from_json in Lakeflow Declarative Pipelines with primed schema helps simplify schema handling. Demultiplexing patterns for scalable stream processing. Simply event Demuxing with Lakeflow Declarative Pipelines.

How Google Apps Script became the ultimate tool for AI finetuning—here's my journey

2025-04-09 · Google Cloud Next '25

session

by Dmitry Kostyuk (Wurkspaces.dev)

AI/ML Cloud Computing Cloud Storage

Unlock the Power of Fine-Tuning with Apps Script! Learn how to optimize pre-trained models for specific tasks using Google Apps Script. This session covers exporting data from Sheets to Cloud Storage as JSONL, building an Apps Script prompt explainer backend, and creating service accounts for secure access to Vertex AI and Cloud Storage. We'll also show how to collect, transform, and split data for training, launch the fine-tuning process, and test results in Vertex AI and a Google Chat bot. Master fine-tuning for practical AI applications.

DuckDB: Up and Running

2024-12-12 · O'Reilly Data Science Books O'Reilly Amazon

book

by Wei-Meng Lee

Analytics CSV Data Analytics DuckDB Pandas Parquet Polars Python SQL data data-science data-science-tools

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool. Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL. Understand the purpose of DuckDB and its main functions Conduct data analytics tasks using DuckDB Integrate DuckDB with pandas, Polars, and JupySQL Use DuckDB to query your data Perform spatial analytics using DuckDB's spatial extension Work with a diverse range of data including Parquet, CSV, and JSON

talk-data.com

Activity Trend

Top Events

Top Speakers

PostgreSQL: Up and Running, 4th Edition

Data Contracts in Practice

Oracle 23AI & ADBS in Action: Exploring New Features with Hands-On Case Studies

Processing large JSON files without running out of memory

SQL for Data Analytics - Fourth Edition

Move fast, save more with MongoDB-compatible workloads on DocumentDB

Just Use Postgres!

SQL Server 2025: The AI-ready enterprise database

Wrangling Internet-scale Image Datasets

From Parsing Nightmares to "Production": Any Unstructured Input → JSON → MotherDuck in Seconds

Building Modern Applications

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

SQL Server 2025 Unveiled: The AI-Ready Enterprise Database with Microsoft Fabric Integration

JSON Relationnel Duality : vos modèles document et relationnel unifié

JSON Relationnel Duality : vos modèles document et relationnel unifié

Model Context Protocol: Principles and Practice

Spark 4.0 and Delta 4.0 For Streaming Data

Advanced JSON Schema handing and Event Demuxing

How Google Apps Script became the ultimate tool for AI finetuning—here's my journey

DuckDB: Up and Running