talk-data.com

Topic

RDBMS

Relational Database Management System (RDBMS)

databases sql data_storage

199 tagged

Activity Trend

Peak of 5 activities per quarter, 2020-Q1 to 2026-Q1

Activities

199 activities · Newest first

The Language of SQL, 3rd Edition

Get Started Fast with SQL! The only book you need to gain a quick working knowledge of SQL and relational databases. Many SQL texts attempt to serve as an encyclopedic reference on SQL syntax, an approach that is often counterproductive because that information is readily available in online references published by the major database vendors. For SQL beginners, it's more important for a book to focus on general concepts and to offer clear explanations and examples of what various SQL statements can accomplish. This is that book. Several features make The Language of SQL unique among introductory SQL books. First, you will not be required to download software or sit with a computer as you read the text. The intent of this book is to provide examples of SQL usage that can be understood simply by reading. Second, topics are organized in an intuitive and logical sequence. SQL keywords are introduced one at a time, allowing you to grow your understanding as you encounter new terms and concepts. Finally, this book covers the syntax of the latest releases of three widely used databases: Microsoft SQL Server 2019, MySQL 8.0, and Oracle 18c. Special Database Differences sidebars clearly show you any differences in syntax among these three databases, and instructions are included on how to obtain and install free versions of the databases.
- Use SQL to retrieve data from relational databases
- Apply functions and calculations to data
- Group and summarize data in a variety of useful ways
- Use complex logic to retrieve only the data you need
- Design relational databases so that data retrieval is easy and intuitive
- Update data and create new tables
- Use spreadsheets to transform your data into meaningful displays
- Retrieve data from multiple tables via joins, subqueries, views, and set logic
- Create, modify, and execute stored procedures
- Install Microsoft SQL Server, MySQL, or Oracle
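
As a taste of the kind of query the book teaches, here is a minimal, self-contained sketch using Python's built-in sqlite3 module; the tables and data are invented for illustration (the book itself targets SQL Server, MySQL, and Oracle, but these statements are portable):

```python
import sqlite3

# Invented example schema: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join plus grouping: total order amount per customer.
for row in conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name
"""):
    print(row)  # e.g. ('Ada', 65.0), ('Grace', 15.0)

# Subquery: customers with at least one order over 20.
for row in conn.execute("""
    SELECT name FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders WHERE amount > 20)
"""):
    print(row)  # ('Ada',)
```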

PostGIS in Action, Third Edition

In PostGIS in Action, Third Edition you will learn:
- An introduction to spatial databases
- Geometry, geography, raster, and topology spatial types, functions, and queries
- Applying PostGIS to real-world problems
- Extending PostGIS to web and desktop applications
- Querying data from external sources using PostgreSQL Foreign Data Wrappers
- Optimizing queries for maximum speed
- Simplifying geometries for greater efficiency

PostGIS in Action, Third Edition teaches readers of all levels to write spatial queries for PostgreSQL. You'll start by exploring vector-, raster-, and topology-based GIS before quickly progressing to analyzing, viewing, and mapping data. This fully updated third edition covers key changes in PostGIS 3.1 and PostgreSQL 13, including parallelization support, partitioned tables, and new JSON functions that help in creating web mapping applications.

About the Technology: PostGIS is a spatial database extender for PostgreSQL. It offers the features and firepower you need to take on nearly any geodata task. PostGIS lets you create location-aware queries with a few lines of SQL code, then build the backend for mapping, raster analysis, or routing applications with minimal effort.

About the Book: PostGIS in Action, Third Edition shows you how to solve real-world geodata problems. You'll go beyond basic mapping and explore custom functions for your applications. Inside this fully updated edition, you'll find coverage of new PostGIS features such as PostGIS window functions, parallelization of queries, and outputting data for applications using JSON and Vector Tile functions.

What's Inside:
- Fully revised for PostGIS version 3.1 and PostgreSQL 13
- Optimize queries for maximum speed
- Simplify geometries for greater efficiency
- Extend PostGIS to web and desktop applications

About the Reader: For readers familiar with relational databases and basic SQL. No prior geodata or GIS experience required.

About the Authors: Regina Obe and Leo Hsu are database consultants and authors. Regina is a member of the PostGIS core development team and the Project Steering Committee.

Quotes:
"The best introduction I've seen for engineers who want to get ramped up quickly and build advanced GIS applications." - Ikechukwu Okonkwo, Orum.io
"A wealth of information that showcases how powerful PostGIS is." - Luis Moux-Dominguez, EMO
"An extraordinary book for the world of GIS. Truly learned a lot!" - DeUndre' Rushon, DigiDiscover LLC
"Gives you insight into how best to provide map services for a wide audience." - Marcus Brown, Enel Green Power
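
For a flavor of what a PostGIS spatial query looks like in practice, here is a minimal sketch from Python, assuming (hypothetically) a PostGIS-enabled database with a parks(name, geom) table; ST_DWithin, ST_SetSRID, and ST_MakePoint are standard PostGIS functions:

```python
import psycopg2  # assumes a PostGIS-enabled PostgreSQL database is reachable

# Hypothetical connection and table: parks(name TEXT, geom GEOMETRY).
conn = psycopg2.connect("dbname=gisdb user=gis password=secret host=localhost")
with conn, conn.cursor() as cur:
    # Parks within 1 km of a lon/lat point: casting to geography makes
    # ST_DWithin measure the distance in meters rather than degrees.
    cur.execute("""
        SELECT name
        FROM parks
        WHERE ST_DWithin(
            geom::geography,
            ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
            1000)
    """, (-71.06, 42.36))
    for (name,) in cur.fetchall():
        print(name)
```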

Learning MySQL, 2nd Edition

Get a comprehensive overview of how to set up and design an effective database with MySQL. This thoroughly updated edition covers MySQL's latest version, including its most important aspects. Whether you're deploying an environment, troubleshooting an issue, or engaging in disaster recovery, this practical guide provides the insights and tools necessary to take full advantage of this powerful RDBMS. Authors Vinicius Grippa and Sergey Kuzmichev from Percona show developers and DBAs methods for minimizing costs and maximizing availability and performance. You'll learn how to perform basic and advanced querying, monitoring and troubleshooting, database management and security, backup and recovery, and tuning for improved efficiency. This edition includes new chapters on high availability, load balancing, and using MySQL in the cloud.
- Get started with MySQL and learn how to use it in production
- Deploy MySQL databases on bare metal, on virtual machines, and in the cloud
- Design database infrastructures
- Code highly efficient queries
- Monitor and troubleshoot MySQL databases
- Execute efficient backup and restore operations
- Optimize database costs in the cloud
- Understand database concepts, especially those pertaining to MySQL

Data Modeling for Azure Data Services

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs.

What this Book will help me do:
- Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases.
- Learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool.
- Master how to design and model a Data Lake using Azure Storage efficiently.
- Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB.
- Develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows.

Author(s): Peter ter Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, he crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, he ensures that learners gain not just knowledge but applied competence.

Who is it for? This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.
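
One technique the book covers, dimensional modeling, organizes data as a central fact table keyed to descriptive dimension tables. Below is a minimal, engine-agnostic sketch with invented table names, shown with sqlite3 purely for portability (the book itself targets Azure SQL DB and Synapse):

```python
import sqlite3

# A tiny star schema: one fact table of sales keyed to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );
""")

# A typical analytic query joins the fact to its dimensions:
#   SELECT d.year, p.category, SUM(f.revenue)
#   FROM fact_sales f
#   JOIN dim_date d    ON d.date_key = f.date_key
#   JOIN dim_product p ON p.product_key = f.product_key
#   GROUP BY d.year, p.category;
```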

R2DBC Revealed: Reactive Relational Database Connectivity for Java and JVM Programmers

Understand the newest trend in database programming for developers working in Java, Kotlin, Clojure, and other JVM-based languages. This book introduces Reactive Relational Database Connectivity (R2DBC), a modern way of connecting to and querying relational databases from Java and other JVM languages. The book begins by helping you understand not only what reactive programming is, but why it is necessary. Then building on those fundamentals, the book takes you into the world of databases and the newly released Reactive Relational Database Connectivity (R2DBC) specification. Examples in the book are worked using the freely available MariaDB database along with MariaDB's vendor implementation of the R2DBC service-provider interface (SPI). Following along with the examples and the provided example code helps prepare you to work with any of the growing number of R2DBC implementations for popular enterprise databases such as Oracle Database and SQL Server. You'll be well prepared for what is becoming the future of database access from Java and other languages built on the JVM.

What You Will Learn:
- Understand why R2DBC was created and how it utilizes the Reactive Streams API
- Understand the components of the R2DBC service-provider interface
- Create and manage reactive database connections and connection pools using an R2DBC client
- Programmatically execute queries on a relational database using an R2DBC client
- Effectively utilize transactions using an R2DBC client
- Build relational database-driven applications that are event-driven and non-blocking

Who This Book Is For: Software developers building solutions using JVM languages and the JVM ecosystem, and developers who need an introduction to the R2DBC specification and reactive programming with relational databases and want to understand what Reactive Relational Database Connectivity is and why it came about. This book includes practical examples of using the R2DBC specification with Java and MariaDB that will provide developers with the knowledge they need to create their own solutions.
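
The book's examples are in Java; as a rough cross-language analogy to R2DBC's non-blocking style (not R2DBC itself), this Python sketch uses asyncio with the third-party aiosqlite driver so the query never blocks the event loop:

```python
import asyncio
import aiosqlite  # third-party async driver, used here only as an analogy

async def main():
    async with aiosqlite.connect(":memory:") as db:
        await db.execute("CREATE TABLE t (x INTEGER)")
        await db.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
        async with db.execute("SELECT x FROM t") as cursor:
            async for row in cursor:  # rows arrive without blocking the event loop
                print(row)

asyncio.run(main())
```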

Beginning T-SQL: A Step-by-Step Approach

Get a performance-oriented introduction to the T-SQL language underlying the Microsoft SQL Server and Azure SQL database engines. This fourth edition is updated to include SQL Notebooks as well as up-to-date syntax and features for T-SQL on-premises and in the Azure cloud. Exercises and examples now include the WideWorldImporters database, the newest sample database from Microsoft for SQL Server. Also new in this edition is coverage of JSON from T-SQL, news about performance enhancements called Intelligent Query Processing, and an appendix on running SQL Server in a container on macOS or Linux. Beginning T-SQL starts you on the path to mastering T-SQL with an emphasis on best practices. Using the sound coding techniques taught in this book will lead to excellent performance in the queries that you write in your daily work. Important techniques such as windowing functions are covered to help you write fast-executing queries that solve real business problems. The book begins with an introduction to databases, normalization, and to setting up your learning environment. You will learn about the tools you need to use such as SQL Server Management Studio, Azure Data Studio, and SQL Notebooks. Each subsequent chapter teaches an aspect of T-SQL, building on the skills learned in previous chapters. Exercises in most chapters provide an opportunity for the hands-on practice that leads to true learning and distinguishes the competent professional. A stand-out feature in this book is that most chapters end with a Thinking About Performance section. These sections cover aspects of query performance relative to the content just presented, including the new Intelligent Query Processing features that make queries faster without changing code. They will help you avoid beginner mistakes by knowing about and thinking about performance from day 1.

What You Will Learn:
- Install a sandboxed SQL Server instance for learning
- Understand how relational databases are designed
- Create objects such as tables and stored procedures
- Query a SQL Server table
- Filter and order the results of a query
- Query and work with specialized data types such as XML and JSON
- Apply modern features such as window functions
- Choose correct techniques so that your queries perform well

Who This Book Is For: Anyone who wants to learn T-SQL from the beginning or improve their T-SQL skills; those who need T-SQL as an additional skill; and those who write queries such as application developers, database administrators, business intelligence developers, and data scientists. The book is also helpful for anyone who must retrieve data from a SQL Server database.
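
To illustrate the windowing functions the book emphasizes, here is a minimal sketch with invented data. It uses Python's sqlite3 (SQLite 3.25 or newer) so it runs without a SQL Server instance, but the same OVER clause works in T-SQL:

```python
import sqlite3  # window functions require SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 100), ('East', 200), ('West', 50), ('West', 300);
""")

# Running total per region: the window keeps every row, unlike GROUP BY.
for row in conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running_total
    FROM sales
"""):
    print(row)
```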

Database Design for Mere Mortals: 25th Anniversary Edition, 4th Edition

The #1 Easy, Commonsense Guide to Database Design, Now Updated. Foreword by Michelle Poolet, Mount Vernon Data Systems LLC. Database Design for Mere Mortals has earned worldwide respect as the simplest way to learn relational database design. Now, this hands-on, software-independent tutorial is even clearer and easier to use. Step by step, this new 25th Anniversary Edition shows you how to design modern databases that are soundly structured, reliable, and flexible, even in the latest online applications. Michael Hernandez guides you through everything from planning to defining tables, fields, keys, table relationships, business rules, and views. You will learn practical ways to improve data integrity, how to avoid common mistakes, and when to break the rules. Updated review questions and figures help you learn these techniques more easily and effectively.
- Understand database types, models, and design terminology
- Perform interviews to efficiently capture requirements, even if everyone works remotely
- Set clear design objectives and transform them into effective designs
- Analyze a current database so you can identify ways to improve it
- Establish table structures and relationships, assign primary keys, set field specifications, and set up views
- Ensure the correct level of data integrity for each database
- Identify and establish business rules
- Preview and prepare for the future of relational databases

Whatever relational database systems you use, Hernandez will help you design databases that are robust and trustworthy. Never designed a database before? Settling for inadequate generic designs? Running existing databases that need improvement? Start here.
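
As a small worked example of the table-structure decisions the book teaches, this sketch (invented schema, shown with sqlite3) contrasts a flat design that repeats customer details with a normalized design that stores each fact once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat design: customer details repeat on every order, so an email
    -- change must be made in many rows and the copies can drift apart.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_email TEXT,
        amount REAL
    );

    -- Normalized design: each fact is stored once; orders reference it.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
""")
```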

Graph Databases in Action

Relationships in data often look far more like a web than an orderly set of rows and columns. Graph databases shine when it comes to revealing valuable insights within complex, interconnected data such as demographics, financial records, or computer networks. In Graph Databases in Action, experts Dave Bechberger and Josh Perryman illuminate the design and implementation of graph databases in real-world applications. You'll learn how to choose the right database solutions for your tasks, and how to use your new knowledge to build agile, flexible, and high-performing graph-powered applications!

About the Technology: Isolated data is a thing of the past! Now, data is connected, and graph databases, like Amazon Neptune, Microsoft Cosmos DB, and Neo4j, are the essential tools of this new reality. Graph databases represent relationships naturally, speeding the discovery of insights and driving business value.

About the Book: Graph Databases in Action introduces you to graph database concepts by comparing them with relational database constructs. You'll learn just enough theory to get started, then progress to hands-on development. Discover use cases involving social networking, recommendation engines, and personalization.

What's Inside:
- Graph databases vs. relational databases
- Systematic graph data modeling
- Querying and navigating a graph
- Graph patterns
- Pitfalls and antipatterns

About the Reader: For software developers. No experience with graph databases required.

About the Authors: Dave Bechberger and Josh Perryman have decades of experience building complex data-driven systems and have worked with graph databases since 2014.

Quotes:
"A comprehensive overview of graph databases and how to build them using Apache tools." - Richard Vaughan, Purple Monkey Collective
"A well-written and thorough introduction to the topic of graph databases." - Luis Moux, EMO
"A great guide in your journey towards graph databases and exploiting the new possibilities for data processing." - Mladen Knežić, CROZ
"A great introduction to graph databases and how you should approach designing systems that leverage graph databases." - Ron Sher, Intuit
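
The book builds on Apache TinkerPop's Gremlin traversal language. As a minimal sketch, here is a friends-of-friends traversal from Python using gremlinpython, assuming a Gremlin Server at a hypothetical endpoint already loaded with person vertices and knows edges:

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Hypothetical Gremlin Server endpoint (TinkerPop, Neptune, Cosmos DB, ...).
conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Friends-of-friends: start at Dave, walk 'knows' edges two hops out.
names = (g.V().has("person", "name", "Dave")
          .out("knows")
          .out("knows")
          .dedup()
          .values("name")
          .toList())
print(names)
conn.close()
```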

Learn PostgreSQL

Dive into the world of PostgreSQL, one of the most powerful and versatile open-source relational databases! This book guides you through all the essentials of PostgreSQL versions 12 and 13, from installation to high-performance database deployments. You'll learn how to design schemas, perform database operations efficiently, and implement advanced functionalities.

What this Book will help me do:
- Install, configure, and monitor a PostgreSQL server for optimal performance.
- Implement SQL and PL/pgSQL scripts to build complex database solutions.
- Analyze and optimize database schemas and indexes for efficiency.
- Secure a PostgreSQL database and manage roles and permissions effectively.
- Set up high-availability configurations through replication techniques.

Author(s): Luca Ferrari and Enrico Pirozzi are seasoned database professionals with extensive experience in PostgreSQL. They bring practical expertise and a real-world perspective to the subject, ensuring you get hands-on knowledge and apply it effectively. Their approachable writing style simplifies even the most complex database concepts.

Who is it for? This book is perfect for database professionals, developers, or tech enthusiasts looking to gain mastery over PostgreSQL. Whether you are new to PostgreSQL or have a fundamental understanding of databases, you'll find this book highly insightful in achieving your database management goals.

The Data Wrangling Workshop - Second Edition

The Data Wrangling Workshop is your beginner's guide to the essential techniques and practices of data manipulation using Python. Throughout the book, you will progressively build your skills, learning key concepts such as extracting, cleaning, and transforming data into actionable insights. By the end, you'll be confident in handling various data wrangling tasks efficiently.

What this Book will help me do:
- Understand and apply the fundamentals of data wrangling using Python.
- Combine and aggregate data from diverse sources like web data, SQL databases, and spreadsheets.
- Use descriptive statistics and plotting to examine dataset properties.
- Handle missing or incorrect data effectively to maintain data quality.
- Gain hands-on experience with Python's powerful data science libraries like Pandas, NumPy, and Matplotlib.

Author(s): Brian Lipp, Shubhadeep Roychowdhury, and Dr. Tirthajyoti Sarkar are experienced educators and professionals in the fields of data science and engineering. Their collective expertise spans years of teaching and working with data technologies. They aim to make data wrangling accessible and comprehensible, focusing on practical examples to equip learners with real-world skills.

Who is it for? The Data Wrangling Workshop is ideal for developers, data analysts, and business analysts aiming to become data scientists or analytics experts. If you're just getting started with Python, you will find this book guiding you step-by-step. A basic understanding of Python programming, as well as relational databases and SQL, is recommended for smooth learning.
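
A minimal sketch of the wrangling loop the book teaches: combine two invented data sources with pandas, then clean and summarize the result:

```python
import numpy as np
import pandas as pd

# Two invented "sources" to combine, one with a missing value.
people = pd.DataFrame({"id": [1, 2, 3], "name": ["Ada", "Grace", "Edsger"]})
scores = pd.DataFrame({"id": [1, 2, 4], "score": [90.0, np.nan, 75.0]})

merged = people.merge(scores, on="id", how="left")                # SQL-style left join
merged["score"] = merged["score"].fillna(merged["score"].mean())  # impute missing values
print(merged.describe())  # quick descriptive statistics
print(merged)
```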

Learn SQL Database Programming

Learn SQL Database Programming is your comprehensive guide to mastering SQL and its applications in database management. With step-by-step instructions, you'll gain confidence in querying and manipulating data, covering both fundamental and advanced SQL techniques. By working through this book, you'll acquire in-demand skills for organizing, analyzing, and presenting data effectively.

What this Book will help me do:
- Install and configure MySQL tools to create and manage databases efficiently.
- Utilize SQL commands to query and retrieve data from simple or complex datasets.
- Manipulate data securely using commands like INSERT, UPDATE, and DELETE.
- Master advanced SQL techniques including joins, subqueries, and flow controls.
- Apply best practices in SQL queries to design databases with optimal performance.

Author(s): Josephine Bush is an experienced database developer and technical educator with a strong background in SQL programming. She has years of practical experience working with relational databases, and her teaching is grounded in real-world applications. She excels at explaining complex concepts clearly and emphasizing hands-on learning.

Who is it for? This book is ideal for business analysts, aspiring SQL developers, database administrators, and students entering the field of SQL programming. It caters to beginners with no prior SQL experience, providing a structured and practical approach to learning. If you're eager to organize data or administer databases effectively, this book is for you.

Introducing Microsoft SQL Server 2019

Introducing Microsoft SQL Server 2019 is the must-have guide for database professionals eager to leverage the latest advancements in SQL Server 2019. This book covers the features and capabilities that make SQL Server 2019 a powerful tool for managing and analyzing data both on-premises and in the cloud.

What this Book will help me do:
- Understand the new features introduced in SQL Server 2019 and their practical applications.
- Confidently manage and analyze relational, NoSQL, and big data within SQL Server 2019.
- Implement containerization for SQL Server using Docker and Kubernetes.
- Migrate and integrate your databases effectively to use Power BI Report Server.
- Query data from Hadoop Distributed File System with Azure Data Studio.

Author(s): The authors of Introducing Microsoft SQL Server 2019 are subject matter experts including Kellyn Gorman, Allan Hirt, and others. With years of professional experience in database management and SQL Server, they bring a wealth of practical insight and knowledge to the book. Their experience spans roles as administrators, architects, and educators in the field.

Who is it for? This book is aimed at database professionals such as DBAs, architects, and big data engineers who are currently using earlier versions of SQL Server or other database platforms. It is particularly well-suited for professionals aiming to understand and implement SQL Server 2019's new features. Readers should have basic familiarity with SQL Server and RDBMS concepts. If you're looking to explore SQL Server 2019 to improve data management and analytics in your organization, this book is for you.

The SQL Workshop

The SQL Workshop is your go-to guide for delving into the essential techniques and best practices of working with SQL. You'll start with the basics of querying and database management, progressing to advanced concepts like joins, normalization, and database security.

What this Book will help me do:
- Construct and maintain relational databases that meet real-world requirements.
- Perform CRUD operations efficiently using SQL queries.
- Design effective and optimized database schemas through normalization.
- Secure and safeguard data with access controls and privilege management.
- Leverage SQL for data analysis and reporting through advanced query techniques.

Author(s): Frank Solomon, Prashanth Jayaram, and Awni Al Saqqa bring together decades of practical and academic experience in SQL and database management. Their informative and hands-on approach helps readers bridge the gap between theoretical concepts and practical applications.

Who is it for? Written for newcomers and intermediate learners, this book is ideal for aspiring software developers, data scientists, and database managers looking to advance their SQL skills. Beginners with no database experience will find this book's gradual learning curve approachable.

Summary Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column-oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that simplify the separation of compute and storage. In this episode Kent Graziano, chief technical evangelist for SnowflakeDB, explains how it is differentiated from other managed platforms and traditional data warehouse engines, the features that allow you to scale your usage dynamically, and how it allows for a shift in your workflow from ETL to ELT. If you are evaluating your options for building or migrating a data platform, then this is definitely worth a listen.
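
The ELT shift discussed in the episode means loading raw data first and transforming it inside the warehouse. Here is a minimal sketch with the Snowflake Python connector; the account, credentials, stage, and table names are hypothetical:

```python
import snowflake.connector  # hypothetical account, stage, and table names below

conn = snowflake.connector.connect(
    account="myaccount", user="me", password="secret",
    warehouse="LOAD_WH", database="RAW", schema="PUBLIC",
)
cur = conn.cursor()

# Load: land staged JSON files as-is into a single VARIANT column.
cur.execute("CREATE TABLE IF NOT EXISTS raw_events (v VARIANT)")
cur.execute("COPY INTO raw_events FROM @my_stage/events/ FILE_FORMAT = (TYPE = 'JSON')")

# Transform: reshape inside the warehouse, where compute scales on demand.
cur.execute("""
    CREATE OR REPLACE TABLE events_clean AS
    SELECT v:user_id::STRING   AS user_id,
           v:ts::TIMESTAMP_NTZ AS event_time
    FROM raw_events
""")
```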

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you've got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they've got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show!
You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media and the Python Software Foundation. Upcoming events include the Software Architecture Conference in NYC and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
Your host is Tobias Macey and today I'm interviewing Kent Graziano about SnowflakeDB, the cloud-native data warehouse

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what SnowflakeDB is for anyone who isn't familiar with it?

How does it compare to the other available platforms for data warehousing? How does it differ from traditional data warehouses?

How does the performance and flexibility affect the data modeling requirements?

Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces?
Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?

What are some of the current limitations that you are struggling with?

For someone getting started with Snowflake what is involved with loading data into the platform?

What is their workflow for allocating and scaling compute capacity and running analyses?

One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen?
What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about?
When is SnowflakeDB the wrong choice?
What are some of the plans for the future of SnowflakeDB?

Contact Info

LinkedIn Website @KentGraziano on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

SnowflakeDB

Free Trial Stack Overflow

Data Warehouse Oracle DB MPP == Massively Parallel Processing Shared Nothing Architecture Multi-Cluster Shared Data Architecture Google BigQuery AWS Redshift AWS Redshift Spectrum Presto

Podcast Episode

SnowflakeDB Semi-Structured Data Types Hive ACID == Atomicity, Consistency, Isolation, Durability 3rd Normal Form Data Vault Modeling Dimensional Modeling JSON AVRO Parquet SnowflakeDB Virtual Warehouses CRM == Customer Relationship Management Master Data Management

Podcast Episode

FoundationDB

Podcast Episode

Apache Spark

Podcast Episode

SSIS == SQL Server Integration Services Talend Informatica Fivetran

Podcast Episode

Matillion Apache Kafka Snowpipe Snowflake Data Exchange OLTP == Online Transaction Processing GeoJSON Snowflake Documentation SnowAlert Splunk Data Catalog

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

SQL Server Big Data Clusters: Early First Edition Based on Release Candidate 1

Get a head-start on learning one of SQL Server 2019's latest and most impactful features, Big Data Clusters, which combines large volumes of non-relational data for analysis along with data stored relationally inside a SQL Server database. This book provides a first look at Big Data Clusters based upon SQL Server 2019 Release Candidate 1. Start now and get a jump on your competition in learning this important new feature. Big Data Clusters is a feature set covering data virtualization, distributed computing, and relational databases, and provides a complete AI platform across the entire cluster environment. This book shows you how to deploy, manage, and use Big Data Clusters. For example, you will learn how to combine data stored on the HDFS file system together with data stored inside the SQL Server instances that make up the Big Data Cluster. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019 using Release Candidate 1. You will learn about the architectural foundations that are made up from Kubernetes, Spark, HDFS, and SQL Server on Linux. You then are shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL, taking advantage of skills you have honed for years, and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis.

What You Will Learn:
- Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments
- Analyze large volumes of data directly from SQL Server and/or Apache Spark
- Manage data stored in HDFS from SQL Server as if it were relational data
- Implement advanced analytics solutions through machine learning and AI
- Expose different data sources as a single logical source using data virtualization

Who This Book Is For: Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environment

Oracle Database Application Security: With Oracle Internet Directory, Oracle Access Manager, and Oracle Identity Manager

Focus on the security aspects of designing, building, and maintaining a secure Oracle Database application. Starting with data encryption, you will learn to work with transparent data, back-up, and networks. You will then go through the key principles of audits, where you will get to know more about identity preservation, policies, and fine-grained audits. Moving on to virtual private databases, you'll set up and configure a VPD to work in concert with other security features in Oracle, followed by tips on managing configuration drift, profiles, and default users. Shifting focus to coding, you will take a look at secure coding standards, multi-schema database models, code-based access control, and SQL injection. Finally, you'll cover single sign-on (SSO), and will be introduced to Oracle Internet Directory (OID), Oracle Access Manager (OAM), and Oracle Identity Manager (OIM) by installing and configuring them to meet your needs. Oracle databases hold the majority of the world's relational data and are attractive targets for attackers seeking high-value targets for data theft. Compromise of a single Oracle Database can result in tens of millions of breached records costing millions in breach-mitigation activity. This book gets you ready to avoid that nightmare scenario.

What You Will Learn:
- Work with Oracle Internet Directory using the command line and the console
- Integrate Oracle Access Manager with different applications
- Work with the Oracle Identity Manager console and connectors, while creating your own custom one
- Troubleshoot issues with OID, OAM, and OIM
- Dive deep into file system and network security concepts

Who This Book Is For: Oracle DBAs and developers. Readers will need a basic understanding of Oracle RDBMS and Oracle Application Server to take complete advantage of this book.
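
On the book's secure-coding theme, the standard defense against SQL injection is binding user input as parameters rather than concatenating it into SQL text. A minimal sketch with Python's sqlite3; Oracle drivers such as python-oracledb apply the same pattern with :name bind variables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# UNSAFE: concatenation lets the payload rewrite the query itself.
# conn.execute("SELECT role FROM users WHERE name = '" + user_input + "'")

# SAFE: the driver sends the value as data, never as SQL text.
rows = conn.execute("SELECT role FROM users WHERE name = ?", (user_input,))
print(rows.fetchall())  # [] because the payload matches no user name
```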

Data Warehousing with Greenplum, 2nd Edition

Data professionals are confronting the most disruptive change since relational databases appeared in the 1980s. SQL is still a major tool for data analytics, but conventional relational database management systems can't handle the increasing size and complexity of today's datasets. This updated edition teaches you best practices for Greenplum Database, the open source massively parallel processing (MPP) database that accommodates large sets of nonrelational and relational data. Marshall Presser, field CTO at Pivotal, introduces Greenplum's approach to data analytics and data-driven decisions, beginning with its shared-nothing architecture. IT managers, developers, data analysts, system architects, and data scientists will all gain from exploring data organization and storage, data loading, running queries, and learning to perform analytics in the database. Discover how MPP and Greenplum will help you go beyond the traditional data warehouse. This ebook covers:
- Greenplum features, use case examples, and techniques for optimizing use
- Four Greenplum deployment options to help you balance security, cost, and time to usability
- Why each networked node in Greenplum's architecture includes an independent operating system, memory, and storage
- Additional tools for monitoring, managing, securing, and optimizing query responses in the Pivotal Greenplum commercial database
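
In Greenplum's shared-nothing architecture, every table declares a distribution key that determines which segment holds each row, and distributing joined tables on the same key avoids data motion across the interconnect. A small sketch using psycopg2 (Greenplum speaks the PostgreSQL wire protocol); the connection settings and table names are invented:

```python
import psycopg2  # Greenplum speaks the PostgreSQL wire protocol

# Hypothetical connection to a Greenplum coordinator host.
conn = psycopg2.connect("dbname=analytics user=gpadmin host=gp-coordinator")
with conn, conn.cursor() as cur:
    # DISTRIBUTED BY hashes rows across segments; distributing both tables
    # on customer_id co-locates their join so rows need not cross the network.
    cur.execute("""
        CREATE TABLE customers (customer_id BIGINT, name TEXT)
        DISTRIBUTED BY (customer_id)
    """)
    cur.execute("""
        CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, amount NUMERIC)
        DISTRIBUTED BY (customer_id)
    """)
```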

Fifty Years of Data Management and Beyond

Every decade since the 1960s, researchers at companies like IBM, Amazon, and many others have introduced major new frameworks and techniques to handle rising data management problems. This concise ebook explains how these new systems helped data science evolve quickly, from hierarchical and relational databases to big data and cloud computing to streaming and graph data. Computer scientist Paco Nathan shows members of your data science team how major companies created each of these data management systems not just to deal with new data types but also to take full advantage of the opportunities the data presented. Their efforts over the years have propelled an entire industry. This report covers the historical progression of data management topics including:
- Hierarchical databases: 1960s mainframe batch systems are still used in finance, healthcare, manufacturing, energy, and other industries.
- Relational databases: these enabled faster transactions, mathematical optimization, and budgeting guarantees for many businesses.
- Big data: this includes relatively cheap horizontal scale-out systems for collecting huge amounts of customer data.
- Cloud computing: large companies began managing reliable, scalable, cost-effective data centers; Amazon turned the concept into a business.
- Cluster schedulers: managing horizontal clusters was difficult before schedulers such as Apache Mesos appeared.
- Streaming data: data continuously generated by different sources requires responses in "real time", generally milliseconds.

SQL All-In-One For Dummies, 3rd Edition

The latest on SQL databases. SQL All-In-One For Dummies, 3rd Edition, is a one-stop shop for everything you need to know about SQL and SQL-based relational databases. Everyone from database administrators to application programmers and the people who manage them will find clear, concise explanations of the SQL language and its many powerful applications. With the ballooning amount of data out there, more and more businesses, large and small, are moving from spreadsheets to SQL databases like Access, Microsoft SQL Server, Oracle databases, MySQL, and PostgreSQL. This compendium of information covers designing, developing, and maintaining these databases.
- Cope with any issue that arises in SQL database creation and management
- Get current on the newest SQL updates and capabilities
- Reference information on querying SQL-based databases in the SQL language
- Understand relational databases and their importance to today's organizations

SQL All-In-One For Dummies is a timely update to the popular reference for readers who want detailed information about SQL databases and queries.

Summary One of the biggest challenges for any business trying to grow and reach customers globally is how to scale their data storage. FaunaDB is a cloud-native database built by the engineers behind Twitter's infrastructure and designed to serve the needs of modern systems. Evan Weaver is the co-founder and CEO of Fauna, and in this episode he explains the unique capabilities of Fauna, compares the consensus and transaction algorithm to that used in other NewSQL systems, and describes the ways that it allows for new application design patterns. One of the unique aspects of Fauna that is worth drawing attention to is the first-class support for temporality that simplifies querying of historical states of the data. It is definitely worth a good look for anyone building a platform that needs a simple-to-manage data layer that will scale with your business.
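
As a taste of the temporality feature highlighted in the episode, this sketch uses Fauna's FQL Python driver to read a document as it existed at a past timestamp; the secret, collection, and document ID are hypothetical:

```python
from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_FAUNA_SECRET")  # hypothetical secret

# Read a document as it existed at a past moment: wrapping any read
# expression in `at` queries the database's history at that timestamp.
snapshot = client.query(
    q.at(
        q.time("2019-06-01T00:00:00Z"),
        q.get(q.ref(q.collection("accounts"), "1234567890")),
    )
)
print(snapshot)
```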

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you've got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they've got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show!
Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support.
Understanding how your customers are using your product is critical for businesses of any size. To make it easier for startups to focus on delivering useful features Segment offers a flexible and reliable data infrastructure for your customer analytics and custom events. You only need to maintain one integration to instrument your code and get a future-proof way to send data to over 250 services with the flip of a switch. Not only does it free up your engineers' time, it lets your business users decide what data they want where. Go to dataengineeringpodcast.com/segmentio today to sign up for their startup plan and get $25,000 in Segment credits and $1 million in free software from marketing and analytics companies like AWS, Google, and Intercom. On top of that you'll get access to Analytics Academy for the educational resources you need to become an expert in data analytics for measuring product-market fit.
You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Your host is Tobias Macey and today I'm interviewing Evan Weaver about FaunaDB, a modern operational data platform built for your cloud

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what FaunaDB is and how it got started?
What are some of the main use cases that FaunaDB is targeting?

How does it compare to some of the other global scale databases that have been built in recent years such as CockroachDB?

Can you describe the architecture of FaunaDB and how it has evolved?
The consensus and replication protocol in Fauna is intriguing. Can you talk through how it works?

What are some of the edge cases that users should be aware of? How are conflicts managed in Fauna?

What is the underlying storage layer?

How is the query layer designed to allow for different query patterns and model representations?

How does data modeling in Fauna compare to that of relational or document databases?

Can you describe the query format? What are some of the common difficulties or points of confusion around interacting with data in Fauna?

What are some application design patterns that are enabled by using Fauna as the storage layer?
Given the ability to replicate globally, how do you mitigate latency when interacting with the database?
What are some of the most interesting or unexpected ways that you have seen Fauna used?
When is it the wrong choice?
What have been some of the most interesting/unexpected/challenging aspects of building the Fauna database and company?
What do you have in store for the future of Fauna?

Contact Info

@evan on Twitter LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Fauna Ruby on Rails CNET GitHub Twitter NoSQL Cassandra InnoDB Redis Memcached Timeseries Spanner Paper DynamoDB Paper Percolator ACID Calvin Protocol Daniel Abadi LINQ LSM Tree (Log-structured Merge-tree) Scala Change Data Capture GraphQL

Podcast.init Interview About Graphene

Fauna Query Language (FQL) CQL == Cassandra Query Language Object-Relational Databases LDAP == Lightweight Directory Access Protocol Auth0 OLAP == Online Analytical Processing Jepsen distributed systems safety research

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast