CockroachDB In Depth with Peter Mattis - Episode 35

2018-06-11 · Data Engineering Podcast Listen

podcast_episode

by Peter Mattis (Cockroach Labs) , Tobias Macey

API Cloud Computing Data Engineering Data Management Datadog Docker GDPR/CCPA GitHub Go Kubernetes NoSQL RDBMS +4 more

Summary

With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. With the first wave of cloud-era databases, the ability to replicate information geographically came at the expense of transactions and familiar query languages. To address these shortcomings, the engineers at Cockroach Labs have built CockroachDB, a globally distributed SQL database with full ACID semantics. In this episode Peter Mattis, the co-founder and VP of Engineering at Cockroach Labs, describes the architecture that underlies the database, the challenges they have faced along the way, and the ways that you can use it in your own environments today.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Peter Mattis about CockroachDB, the SQL database for global cloud services.

Interview

Introduction How did you get involved in the area of data management? What was the motivation for creating CockroachDB and building a business around it? Can you describe the architecture of CockroachDB and how it supports distributed ACID transactions?

What are some of the tradeoffs that are necessary to allow for georeplicated data with distributed transactions? What are some of the problems that you have had to work around in the Raft protocol to provide reliable operation of the clustering mechanism?

Go is an unconventional language for building a database. What are the pros and cons of that choice? What are some of the common points of confusion that users of CockroachDB have when operating or interacting with it?

What are the edge cases and failure modes that users should be aware of?

I know that your SQL syntax is PostgreSQL compatible, so is it possible to use existing ORMs unmodified with CockroachDB?

What are some examples of extensions that are specific to CockroachDB?

What are some of the most interesting uses of CockroachDB that you have seen? When is CockroachDB the wrong choice? What do you have planned for the future of CockroachDB?

Contact Info

Peter

LinkedIn petermattis on GitHub @petermattis on Twitter

Cockroach Labs

@CockroachDB on Twitter Website cockroachdb on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

CockroachDB Cockroach Labs SQL Google Bigtable Spanner NoSQL RDBMS (Relational Database Management System) “Big Iron” (colloquial term for mainframe computers) Raft Consensus Algorithm Consensus MVCC (Multiversion Concurrency Control) Isolation Etcd GDPR Golang C++ Garbage Collection Metaprogramming Rust Static Linking Docker Kubernetes CAP Theorem PostgreSQL ORM (Object Relational Mapping) Information Schema PG Catalog Interleaved Tables Vertica Spark Change Data Capture

The intro and outro music is from The Hug by The Freak Fandan

BizTalk

2018-06-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Suren Machiraju , Suraj Gaurav

Azure Microsoft data data-engineering enterprise-service-bus microsoft-biztalk-server streaming-messaging

Why do businesses continue to use Microsoft’s BizTalk Server as the backbone to integrate line-of-business applications with their trading partners, and how do recent changes make it even more effective? With the advent of Azure, we have a unique opportunity to enhance BizTalk functionality, including reducing the cost of operations and maintenance. This book offers three solutions for the reader on ways to leverage BizTalk to get more from existing deployments or find ways to modernize the deployment via Azure. Microsoft partners are playing a significant role in enhancing the capabilities of BizTalk, and this book includes sections that provide an in-depth review of BizTalk 360 © and the WPC HIPAA DB Toolkit ©. Over the recent past, Web 3.0 has also introduced many new concepts and open source technologies, and this book covers ways to leverage these to enhance your BizTalk deployment.

The authors start with a survey of the existing BizTalk Server (its history, patterns, and state of affairs) and go on to provide an in-depth elaboration of three messaging patterns that customers use for BizTalk; the advantages of updating to SQL Server 2016; a review of partner solutions that enhance BizTalk; and BizTalk with Web 3.0 for custom solutions. The book concludes with a comparison of the three viable BizTalk Azure application solutions that will enable you to make the best choice for your business.

Mastering The Faster Web with PHP, MySQL, and JavaScript

2018-06-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andrew Caya

JavaScript MySQL data data-engineering relational-databases

Explore cutting-edge web optimization techniques in 'Mastering The Faster Web with PHP, MySQL, and JavaScript'. This comprehensive guide equips developers with the tools and knowledge to create lightning-fast web applications using modern technologies, including PHP 7, asynchronous programming, advanced SQL, and efficient JavaScript.

What this Book will help me do

• Efficiently use profiling and benchmarking tools to identify performance bottlenecks.
• Optimize PHP 7 applications through efficient data structures and logical improvements.
• Enhance database performance by identifying and solving inefficient SQL queries.
• Incorporate modern asynchronous programming and functional programming techniques into your workflow.
• Integrate seamless UI designs that prioritize application responsiveness and user experience.

Author(s)

Andrew Caya is a seasoned web developer with extensive experience in PHP, MySQL, and JavaScript. Throughout their career, they have delved deep into profiling, optimization techniques, and modern web technologies to deliver high-performance web solutions. This book reflects their commitment to providing actionable insights and practical advice to fellow developers.

Who is it for?

Ideal readers of this book are PHP developers with foundational knowledge in programming and web technologies who aspire to build and optimize modern web applications. Experience in JavaScript is not required, as the book covers essential aspects needed for performance enhancements. If you're aiming to hone your skills in creating faster web solutions, this book suits your goals perfectly.

Microsoft SQL Server 2017 on Linux

2018-06-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Benjamin Nevarez

Docker Linux Microsoft Cyber Security SQL Server data data-engineering microsoft-sql-server relational-databases

Essential Microsoft® SQL Server® 2017 installation, configuration, and management techniques for Linux

Foreword by Kalen Delaney, Microsoft SQL Server MVP

This comprehensive guide shows, step-by-step, how to set up, configure, and administer SQL Server 2017 on Linux for high performance and high availability. Written by a SQL Server expert and respected author, Microsoft SQL Server 2017 on Linux teaches valuable Linux skills to Windows-based SQL Server professionals. You will get clear coverage of both Linux and SQL Server and complete explanations of the latest features, tools, and techniques. The book offers clear instruction on adaptive query processing, automatic tuning, disaster recovery, security, and much more.

• Understand how SQL Server 2017 on Linux works
• Install and configure SQL Server on Linux
• Run SQL Server on Docker containers
• Learn Linux Administration
• Troubleshoot and tune query performance in SQL Server
• Learn what is new in SQL Server 2017
• Work with adaptive query processing and automatic tuning techniques
• Implement high availability and disaster recovery for SQL Server on Linux
• Learn the security features available in SQL Server

Hands-On Data Warehousing with Azure Data Factory

2018-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Cote , Giuseppe Ciaburro , Michelle Gutzait

AI/ML Analytics Azure ADF BI Cloud Computing Data Engineering Data Lake Databricks DWH ETL/ELT Power BI +6 more

Dive into the world of ETL (Extract, Transform, Load) with 'Hands-On Data Warehousing with Azure Data Factory'. This book guides readers through the essential techniques for working with Azure Data Factory and SQL Server Integration Services to design, implement, and optimize ETL solutions for both on-premises and cloud data environments.

What this Book will help me do

• Understand and utilize Azure Data Factory and SQL Server Integration Services to build ETL solutions.
• Design scalable and high-performance ETL architectures tailored to modern data problems.
• Integrate various Azure services, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, into your workflows.
• Troubleshoot and optimize ETL pipelines and address common challenges in data processing.
• Create insightful Power BI dashboards to visualize and interact with data from your ETL workflows.

Author(s)

Christian Cote, Michelle Gutzait, and Giuseppe Ciaburro bring a wealth of experience in data engineering and cloud technologies to this practical guide. Combining expertise in the Azure ecosystem and hands-on data warehousing, they deliver actionable insights for working professionals.

Who is it for?

This book is crafted for software professionals working in data engineering, especially those specializing in ETL processes. Readers with a foundational knowledge of SQL Server and cloud infrastructures will benefit most. If you aspire to implement state-of-the-art ETL pipelines or enhance existing workflows with ADF and SSIS, this book is an ideal resource.

PrestoDB and Starburst Data with Kamil Bajda-Pawlikowski - Episode 32

2018-05-21 · Data Engineering Podcast Listen

podcast_episode

by Kamil Bajda-Pawlikowski (Starburst Data) , Tobias Macey

Analytics API Cassandra Data Engineering Data Management DWH Hadoop Hive Kafka Presto Redis Teradata +1 more

Summary

Most businesses end up with data in a myriad of places with varying levels of structure. This makes it difficult to gain insights from across departments, projects, or people. Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse. Kamil Bajda-Pawlikowski co-founded Starburst Data to provide support and tooling for Presto, as well as contributing advanced features back to the project. In this episode he describes how Presto is architected, how you can use it for your analytics, and the work that he is doing at Starburst Data.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Kamil Bajda-Pawlikowski about Presto and his experiences with supporting it at Starburst Data.

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what Presto is?

What are some of the common use cases and deployment patterns for Presto?

How does Presto compare to Drill or Impala? What is it about Presto that led you to building a business around it? What are some of the most challenging aspects of running and scaling Presto? For someone who is using the Presto SQL interface, what are some of the considerations that they should keep in mind to avoid writing poorly performing queries?

How does Presto represent data for translating between its SQL dialect and the API of the data stores that it interfaces with?

What are some cases in which Presto is not the right solution? What types of support have you found to be the most commonly requested? What are some of the types of tooling or improvements that you have made to Presto in your distribution?

What are some of the notable changes that your team has contributed upstream to Presto?

Contact Info

Website E-mail Twitter – @starburstdata Twitter – @prestodb

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Starburst Data Presto Hadapt Hadoop Hive Teradata PrestoCare Cost Based Optimizer ANSI SQL Spill To Disk Tempto Benchto Geospatial Functions Cassandra Accumulo Kafka Redis PostgreSQL

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Running a data science startup, one decision at a time #Futureofdata podcast

2018-05-16 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Justin Borgman (Starburst Data)

Analytics BI Big Data Cloud Computing Computer Science Data Science DWH Hadoop Presto Teradata

In this podcast, Justin Borgman talks about his journey of starting a data science startup, making an exit, and jumping into another one. The session is filled with insights for leaders looking for entrepreneurial wisdom to start a data-driven journey.

Timeline:
0:28 Justin's journey.
3:22 Taking the plunge to start a new company.
5:49 Perception vs. reality of starting a data warehouse company.
8:15 Bringing in something new to the IT legacy.
13:20 Getting your first few customers.
16:16 Right moment for a data warehouse company to look for a new venture.
18:20 Right person to have as a co-founder.
20:29 Advantages of going seed vs. series A.
22:13 When is a company ready for seeding or series A?
24:40 Who's a good adviser?
26:35 Exiting Teradata.
28:54 Teradata to starting a new company.
31:24 Excitement of starting something from scratch.
32:24 What is Starburst?
37:15 Presto, a great engine for cloud platforms.
40:30 How can a company get started with Presto?
41:50 Health of enterprise data.
44:15 Where does Presto not fit in?
45:19 Future of enterprise data.
46:36 Drawing parallels between proprietary space and open source space.
49:02 Does aligning with open source give a company a better chance in seeding?
51:44 Justin's ingredients for success.
54:05 Justin's favorite reads.
55:01 Key takeaways.

Paul's Recommended Read: The Outsiders Paperback – S. E. Hinton amzn.to/2Ai84Gl

Podcast Link: https://futureofdata.org/running-a-data-science-startup-one-decision-at-a-time-futureofdata-podcast/

Justin's BIO: Justin has spent the better part of a decade in senior executive roles building new businesses in the data warehousing and analytics space. Before co-founding Starburst, Justin was Vice President and General Manager at Teradata (NYSE: TDC), where he was responsible for the company’s portfolio of Hadoop products. Prior to joining Teradata, Justin was co-founder and CEO of Hadapt, the pioneering "SQL-on-Hadoop" company that transformed Hadoop from file system to analytic database accessible to anyone with a BI tool. Teradata acquired Hadapt in 2014.

Justin earned a BS in Computer Science from the University of Massachusetts at Amherst and an MBA from the Yale School of Management.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.


Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Practical SQL

2018-05-01 · O'Reilly SQL Books O'Reilly Amazon

book

by Anthony DeBarros

GIS Microsoft MySQL RDBMS SQL Server postgresql

"Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases. The book focuses on using SQL to find the story your data tells, with the popular open-source database PostgreSQL and the pgAdmin interface as its primary tools. You’ll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from the U.S. Census and other federal and state government agencies. With exercises and real-world examples in each chapter, this book will teach even those who have never programmed before all the tools necessary to build powerful databases and access information quickly and efficiently. You’ll learn how to: • Create databases and related tables using your own data• Define the right data types for your information• Aggregate, sort, and filter data to find patterns• Use basic math and advanced statistical functions• Identify errors in data and clean them up• Import and export data using delimited text files• Write queries for geographic information systems (GIS)• Create advanced queries and automate tasks Learning SQL doesn’t have to be dry and complicated. Practical SQL delivers clear examples with an easy-to-follow approach to teach you the tools you need to build and manage your own databases. This book uses PostgreSQL, but the SQL syntax is applicable to many database applications, including Microsoft SQL Server and MySQL."

Metabase Self Service Business Intelligence with Sameer Al-Sakran - Episode 29

2018-04-30 · Data Engineering Podcast Listen

podcast_episode

by Sameer Al-Sakran (Metabase) , Tobias Macey

API BI Data Engineering Data Management Datadog GitHub Hadoop Metabase Python React Redash Scala +1 more

Summary

Business Intelligence software is often cumbersome and requires specialized knowledge of the tools and data to be able to ask and answer questions about the state of the organization. Metabase is a tool built with the goal of making the act of discovering information and asking questions of an organization's data easy and self-service for non-technical users. In this episode the CEO of Metabase, Sameer Al-Sakran, discusses how and why the project got started, the ways that it can be used to build and share useful reports, some of the useful features planned for future releases, and how to get it set up to start using it in your environment.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Sameer Al-Sakran about Metabase, a free and open source tool for self service business intelligence.

Interview

Introduction How did you get involved in the area of data management? The current goal for most companies is to be “data driven”. How would you define that concept?

How does Metabase assist in that endeavor?

What is the ratio of users that take advantage of the GUI query builder as opposed to writing raw SQL?

What level of complexity is possible with the query builder?

What have you found to be the typical use cases for Metabase in the context of an organization? How do you manage scaling for large or complex queries? What was the motivation for using Clojure as the language for implementing Metabase? What is involved in adding support for a new data source? What are the differentiating features of Metabase that would lead someone to choose it for their organization? What have been the most challenging aspects of building and growing Metabase, both from a technical and business perspective? What do you have planned for the future of Metabase?

Contact Info

Sameer

salsakran on GitHub @sameer_alsakran on Twitter LinkedIn

Metabase

Website @metabase on Twitter metabase on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Expa Metabase Blackjet Hadoop Imeem Maslow’s Hierarchy of Data Needs 2 Sided Marketplace Honeycomb Interview Excel Tableau Go-JEK Clojure React Python Scala JVM Redash How To Lie With Data Stripe Braintree Payments

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

PostgreSQL 10 High Performance - Third Edition

2018-04-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Enrico Pirozzi

data data-engineering postgresql relational-databases

PostgreSQL 10 High Performance provides you with all the tools to maximize the efficiency and reliability of your PostgreSQL 10 database. Written for database admins and architects, this book offers deep insights into optimizing queries, configuring hardware, and managing complex setups. By integrating these best practices, you'll ensure scalability and stability in your systems.

What this Book will help me do

• Optimize PostgreSQL 10 queries for improved performance and efficiency.
• Implement database monitoring systems to identify and resolve issues proactively.
• Scale your database by implementing partitioning, replication, and caching strategies.
• Understand PostgreSQL hardware compatibility and configuration for maximum throughput.
• Learn how to design high-performance solutions tailored for large and demanding applications.

Author(s)

Enrico Pirozzi is a seasoned database professional with extensive experience in PostgreSQL management and optimization. Having worked on large-scale database infrastructures, Enrico shares his hands-on knowledge and practical advice for achieving high performance with PostgreSQL. His approachable style makes complex topics accessible to every reader.

Who is it for?

This book is intended for database administrators and system architects who are working with or planning to adopt PostgreSQL 10. Readers should have a foundational knowledge of SQL and some prior exposure to PostgreSQL. If you're aiming to design efficient, scalable database solutions while ensuring high availability, this book is for you.

Octopai: Metadata Management for Better Business Intelligence with Amnon Drori - Episode 28

2018-04-23 · Data Engineering Podcast Listen

podcast_episode

by Amnon Drori (Octopai) , Tobias Macey

Airflow API BI CRM Data Engineering Data Governance Data Management Datadog ERP ETL/ELT GDPR/CCPA Informatica +4 more

Summary

The information about how data is acquired and processed is often as important as the data itself. For this reason metadata management systems are built to track the journey of your business data to aid in analysis, presentation, and compliance. These systems are frequently cumbersome and difficult to maintain, so Octopai was founded to alleviate that burden. In this episode Amnon Drori, CEO and co-founder of Octopai, discusses the business problems he witnessed that led him to starting the company, how their systems are able to provide valuable tools and insights, and the direction that their product will be taking in the future.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Amnon Drori about OctopAI and the benefits of metadata management.

Interview

Introduction How did you get involved in the area of data management? What is OctopAI and what was your motivation for founding it? What are some of the types of information that you classify and collect as metadata? Can you talk through the architecture of your platform? What are some of the challenges that are typically faced by metadata management systems? What is involved in deploying your metadata collection agents? Once the metadata has been collected what are some of the ways in which it can be used? What mechanisms do you use to ensure that customer data is segregated?

How do you identify and handle sensitive information during the collection step?

What are some of the most challenging aspects of your technical and business platforms that you have faced? What are some of the plans that you have for OctopAI going forward?

Contact Info

Amnon

LinkedIn @octopai_amnon on Twitter

OctopAI

@OctopaiBI on Twitter Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

OctopAI Metadata Metadata Management Data Integrity CRM (Customer Relationship Management) ERP (Enterprise Resource Planning) Business Intelligence ETL (Extract, Transform, Load) Informatica SAP Data Governance SSIS (SQL Server Integration Services) Vertica Airflow Luigi Oozie GDPR (General Data Protection Regulation) Root Cause Analysis

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

A Deep Dive into NoSQL Databases: The Use Cases and Applications

2018-04-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pethuru Raj , Ganesh Chandra Deka

Analytics Big Data Data Analytics Hadoop NoSQL Cyber Security data data-engineering nosql-databases

A Deep Dive into NoSQL Databases: The Use Cases and Applications, Volume 109, the latest release in the Advances in Computers series first published in 1960, presents detailed coverage of innovations in computer hardware, software, theory, design and applications. In addition, it provides contributors with a medium in which they can explore their subjects in greater depth and breadth. This update includes sections on NoSQL and NewSQL databases for big data analytics and distributed computing, NewSQL databases and scalable in-memory analytics, NoSQL web crawler application, NoSQL Security, a Comparative Study of different In-Memory (No/New)SQL Databases, NoSQL Hands On-4 NoSQLs, the Hadoop Ecosystem, and more.

• Provides a very comprehensive, yet compact, book on the popular domain of NoSQL databases for IT professionals, practitioners and professors
• Articulates and accentuates big data analytics and how it gets simplified and streamlined by NoSQL database systems
• Sets a stimulating foundation with all the relevant details for NoSQL database researchers, developers and administrators

Oracle SQL Revealed: Executing Business Logic in the Database Engine

2018-04-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Reprintsev

BI Oracle data data-engineering

Write queries using little-known, but powerful, SQL features implemented in Oracle's database engine. You will be able to take advantage of Oracle’s power in implementing business logic, thereby maximizing return from your company’s investment in Oracle Database products. Important features and aspects of SQL covered in this book include the model clause, row pattern matching, analytic and aggregate functions, and recursive subquery factoring, just to name a few. The focus is on implementing business logic in pure SQL, with a comparison of different approaches that can be used to write SELECT statements to return results that drive good decision making and competitive action in the marketplace. This book covers features that are often not well known, and sometimes not implemented in competing products. Chapters on query transformation and logical execution order provide a grasp of the big picture in which the individual SQL features described in the other chapters are executed. Also included are a discussion on when to use the procedural capabilities from PL/SQL, and a series of examples showing different mixes of SQL features being applied in common types of queries that you are likely to encounter.

What You Will Learn

• Gain competitive advantage from Oracle SQL
• Know when to step up to PL/SQL versus staying in SQL
• Become familiar with query transformations and join mechanics
• Apply the model clause and analytic functions to business intelligence queries
• Make use of features that are specific to Oracle Database, such as row pattern matching
• Understand the pros and cons of different SQL approaches to solving common query tasks
• Traverse hierarchies using CONNECT BY and recursive subquery factoring

Who This Book Is For

Database programmers with some Oracle Database experience. The book is also for SQL developers who are moving to the Oracle Database platform or want to learn unique features of its query engine. Both audiences will learn to apply the full power of Oracle’s own SQL dialect to commonly encountered types of business questions and query challenges.

Beginning DAX with Power BI: The SQL Pro’s Guide to Better Business Intelligence

2018-03-31 · O'Reilly Business Intelligence Books O'Reilly Amazon

book

by Philip Seamark

BI Data Modelling DAX Power BI analytics-platforms data data-analysis-expressions-dax data-science data analysis expressions (dax) powerpivot

Attention all SQL Pros, DAX is not just for writing Excel-based formulas! Get hands-on learning and expert advice on how to use the vast capabilities of the DAX language to solve common data modeling challenges. Beginning DAX with Power BI teaches key concepts such as mapping techniques from SQL to DAX, filtering, grouping, joining, pivoting, and using temporary tables, all aimed at the SQL professional.

Join author Philip Seamark as he guides you on a journey through typical business data transformation scenarios and challenges, and teaches you, step-by-step, how to resolve challenges using DAX. Tips, tricks, and shortcuts are included and explained, along with examples of the SQL equivalent, in order to accelerate learning. Examples in the book range from beginner to advanced, with plenty of detailed explanation when walking through each scenario.

What You’ll Learn
• Turbocharge your Power BI model by adding advanced DAX programming techniques
• Know when to use calculated measures versus calculated columns
• Generate new tables on the fly from existing data
• Optimize, monitor, and tune Power BI to improve performance of your models
• Discover new ideas, tricks, and time-saving techniques for better models

Who This Book Is For
Business intelligence developers, business analysts, or any SQL user who wants to use Power BI as a reporting tool. A solid understanding of SQL is recommended, as examples throughout the book include the DAX equivalents to SQL problem/solution scenarios.

Mastering the SAS DS2 Procedure

2018-03-23 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mark Jordan

API Cloud Computing SAS analytics-platforms data data-science

Enhance your SAS data-wrangling skills with high-precision and parallel data manipulation using the DS2 programming language. Now in its second edition, this book addresses the DS2 programming language from SAS, which combines the precise procedural power and control of the Base SAS DATA step language with the simplicity and flexibility of SQL. DS2 provides simple, safe syntax for performing complex data transformations in parallel and enables manipulation of native database data types at full precision. It also covers PROC FEDSQL, a modernized SQL language that blends perfectly with DS2.

You will learn to harness the power of parallel processing to speed up CPU-intensive computing processes in Base SAS and how to achieve even more speed by processing DS2 programs on massively parallel database systems. Techniques for leveraging internet APIs to acquire data, avoiding large data movements when working with data from disparate sources, and leveraging DS2's new data types for full-precision numeric calculations are presented, with examples of why these techniques are essential for the modern data wrangler.

Here's what's new in this edition:
• how to significantly improve performance by using the new SAS Viya architecture with its SAS Cloud Analytic Services (CAS)
• how to declare private variables and methods in a package
• the new PROC DSTODS2
• the PCRXFIND and PCRXREPLACE packages

While working through the code samples provided with this book, you will build a library of custom, reusable, and easily shareable DS2 program modules, execute parallelized DATA step programs to speed up a CPU-intensive process, and conduct advanced data transformations using hash objects and matrix math operations. This book is part of the SAS Press Series.

SQL Server 2017 Developer's Guide

2018-03-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Miloš Radivojević , William Durkin , Dejan Sarka

AI/ML Analytics BI Data Analytics JSON Linux Python data data-engineering

"SQL Server 2017 Developer's Guide" provides a comprehensive approach to learning and utilizing the new features introduced in SQL Server 2017. From advanced Transact-SQL to integrating R and Python into your database projects, this book equips you with the knowledge to design and develop efficient database applications tailored to modern requirements.

What this Book will help me do
• Master new features in SQL Server 2017 to enhance database application development.
• Implement In-Memory OLTP and columnstore indexes for optimal performance.
• Utilize JSON support in SQL Server to integrate modern data formats.
• Leverage R and Python integration to apply advanced data analytics and machine learning.
• Learn Linux and container deployment options to expand SQL Server usage scenarios.

Author(s)
The authors of "SQL Server 2017 Developer's Guide" are industry veterans with extensive experience in database design, business intelligence, and advanced analytics. They bring a practical, hands-on writing style that helps developers apply theoretical concepts effectively. Their commitment to teaching is evident in the clear and detailed guidance provided throughout the book.

Who is it for?
This book is ideal for database developers and solution architects aiming to build robust database applications with SQL Server 2017. It's a valuable resource for business intelligence developers or analysts seeking to harness SQL Server 2017's advanced features. Some familiarity with SQL Server and T-SQL is recommended to fully leverage the insights provided by this book.

Gaining Data Agility with Multi-Model Databases

2018-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Joel Ruisi

Big Data data data-engineering relational-databases

Most organizations realize that their future depends on the ability to quickly adapt to constant changes brought on by variable and complex environments. It's become increasingly clear that the core source behind these innovative solutions is data. Polyglot persistence refers to systems that provide many different types of data storage technologies to deal with this vast variability of data. Applications that need to access data from more than one store have to navigate an array of databases in a complex, and ultimately unsustainable, maze.

One solution to this problem is readily available. In this ebook, consultant Joel Ruisi explains how a multi-model database enables you to take advantage of many different types of data models (and multiple schemas) in a single backend. With a multi-model database, companies can easily centralize, manage, and search all the data the IT system collects. The result is data agility: the ability to adapt to changing environments and serve users what they need when they need it.

Through several detailed use cases, this ebook explains how multi-model databases enable you to:
• Store and manage multiple heterogeneous data sources
• Consolidate your data by bringing everything in "as is"
• Invisibly extend model features from one model to another
• Take a hybrid approach to analytical and operational data
• Enhance user search experience, including big data search
• Conduct queries across data models
• Offer SQL without relational constraints

SQL Server 2017 Machine Learning Services with R

2018-02-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Julie Koesmarno (Microsoft) , Tomaž Kaštrun

AI/ML Analytics Data Science data data-science data-science-tools r

Learn how to leverage SQL Server 2017 Machine Learning Services and the R programming language to create robust, efficient data analysis and machine learning solutions. This book provides actionable insights and practical examples to help you implement and manage database-oriented analytics and predictive modeling.

What this Book will help me do
• Understand and use SQL Server 2017 Machine Learning Services integrated with R.
• Gain experience in installing, configuring, and maintaining R services in SQL Server.
• Create and operationalize predictive models using RevoScaleR and other R packages.
• Improve database solutions by incorporating advanced analytics techniques.
• Monitor and manage R-based services effectively for reliable production solutions.

Author(s)
Tomaž Kaštrun and Julie Koesmarno bring a wealth of expertise as practitioners and educators in data science and SQL Server technologies. They share their experience innovatively, making intricate subjects approachable. Their unified teaching method ensures readers can directly benefit from practical examples and real-world applications.

Who is it for?
This book is tailored for database administrators, data analysts, and data scientists eager to integrate R with SQL Server. It caters to professionals with varying levels of R experience who are looking to enhance their proficiency in database-oriented analytics. Readers will benefit most if they are motivated to design effective, data-driven solutions in SQL Server environments.

Spark: The Definitive Guide

2018-02-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matei Zaharia (Databricks) , Bill Chambers

AI/ML API Big Data Spark Data Streaming apache-spark data data-engineering

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.

• Get a gentle overview of big data and Spark
• Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
• Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
• Understand how Spark runs on a cluster
• Debug, monitor, and tune Spark clusters and applications
• Learn the power of Structured Streaming, Spark’s stream-processing engine
• Learn how you can apply MLlib to a variety of problems, including classification or recommendation

SQL Server 2017 Administration Inside Out, First Edition

2018-02-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by William Assaf , Sven Aelterman , Randolph West

Azure Cloud Computing PowerShell Cyber Security data data-engineering microsoft-sql-server relational-databases

Conquer SQL Server 2017 administration—from the inside out Dive into SQL Server 2017 administration—and really put your SQL Server DBA expertise to work. This supremely organized reference packs hundreds of timesaving solutions, tips, and workarounds—all you need to plan, implement, manage, and secure SQL Server 2017 in any production environment: on-premises, cloud, or hybrid. Four SQL Server experts offer a complete tour of DBA capabilities available in SQL Server 2017 Database Engine, SQL Server Data Tools, SQL Server Management Studio, and via PowerShell. Discover how experts tackle today’s essential tasks—and challenge yourself to new levels of mastery. • Install, customize, and use SQL Server 2017’s key administration and development tools • Manage memory, storage, clustering, virtualization, and other components • Architect and implement database infrastructure, including IaaS, Azure SQL, and hybrid cloud configurations • Provision SQL Server and Azure SQL databases • Secure SQL Server via encryption, row-level security, and data masking • Safeguard Azure SQL databases using platform threat protection, firewalling, and auditing • Establish SQL Server IaaS network security groups and user-defined routes • Administer SQL Server user security and permissions • Efficiently design tables using keys, data types, columns, partitioning, and views • Utilize BLOBs and external, temporal, and memory-optimized tables • Master powerful optimization techniques involving concurrency, indexing, parallelism, and execution plans • Plan, deploy, and perform disaster recovery in traditional, cloud, and hybrid environments For Experienced SQL Server Administrators and Other Database Professionals • Your role: Intermediate-to-advanced level SQL Server database administrator, architect, developer, or performance tuning expert • Prerequisites: Basic understanding of database administration procedures

