talk-data.com

Topic: SQL (Structured Query Language)

Tags: database_language, data_manipulation, data_definition, programming_language

Activity Trend: 107 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1751 activities · Newest first

Summary Modern applications frequently require access to real-time data, but building and maintaining the systems that make that possible is a complex and time-consuming endeavor. Eventador is a managed platform designed to let you focus on using the data that you collect, without worrying about how to make it reliable. In this episode Eventador Founder and CEO Kenny Gorman describes how the platform is architected, the challenges inherent to managing reliable streams of data, the simplicity offered by a SQL interface, and the interesting projects that his customers have built on top of it. This was an interesting inside look at building a business on top of open source stream processing frameworks and how to reduce the burden on end users.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Your host is Tobias Macey and today I’m interviewing Kenny Gorman about the Eventador streaming SQL platform.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what the Eventador platform is and the story behind it?

How has your experience at ObjectRocket influenced your approach to streaming SQL?

How do the capabilities and developer experience of Eventador compare to other streaming SQL engines such as ksqlDB, Pulsar SQL, or Materialize?

What are the main use cases that you are seeing people use for streaming SQL?

How does it fit into an application architecture?

What are some of the design changes in the different layers that are necessary to take advantage of the real time capabilities?

Can you describe how the Eventador platform is architected?

How has the system design evolved since you first began working on it?

How has the overall landscape of streaming systems changed since you first began working on Eventador?

If you were to start over today what would you do differently?

What are some of the most interesting and challenging operational aspects of running your platform?

What are some of the ways that you have modified or augmented the SQL dialect that you support?

What is the tipping point for when SQL is insufficient for a given task and a user might want to leverage Flink?

What is the workflow for developing and deploying different SQL jobs?

How do you handle versioning of the queries and integration with the software development lifecycle?

What are some data modeling considerations that users should be aware of?

What are some of the sharp edges or design pitfalls that users should be aware of?

What are some of the most interesting, innovative, or unexpected ways that you have seen your customers use your platform?

What are some of the most interesting, unexpected, or challenging lessons that you have learned in the process of building and scaling Eventador?

What do you have planned for the future of the platform?

Contact Info

LinkedIn

Blog

@kennygorman on Twitter

kgorman on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.

DAX Cookbook

"DAX Cookbook: Over 120 recipes to enhance your business with analytics, reporting, and business intelligence" is the ultimate guidebook for mastering DAX (Data Analysis Expressions) in business intelligence, Power BI, and SQL Server Analysis Services. With hands-on examples and extensive recipes, it enables professionals to solve real-world data challenges effectively. What this Book will help me do Understand how to create tailored calculations for dates, time, and duration to enhance data insights. Develop key performance indicators (KPIs) and advanced business metrics for strategic decision-making. Master text and numerical data transformations to construct dynamic dashboards and reports. Optimize data models and DAX queries for improved performance and analytics accuracy. Learn to handle and debug calculations, and implement complex statistical and mathematical measures. Author(s) Greg Deckler is a seasoned business intelligence professional with extensive experience in using DAX and Power BI to provide actionable insights. As a recognized expert in the field, Greg brings practical knowledge of developing scalable BI solutions. His teaching approach is rooted in clarity and real-world application, making complex topics accessible to learners of all levels. Who is it for? This book is perfect for business professionals, BI developers, and data analysts with basic knowledge of the DAX language and associated tools. If you are looking to enhance your DAX skills and solve tough analytical challenges, this book is tailored for you. It's highly relevant for those aiming to optimize business intelligence workflows and improve data-driven decisions.

MySQL 8 Query Performance Tuning: A Systematic Method for Improving Execution Speeds

Identify, analyze, and improve poorly performing queries that damage user experience and lead to lost revenue for your business. This book will help you make query tuning an integral part of your daily routine through a multi-step process that includes monitoring of execution times, identifying candidate queries for optimization, analyzing their current performance, and improving them to deliver results faster and with less overhead. Author Jesper Krogh systematically discusses each of these steps along with the data sources and the tools used to perform them. MySQL 8 Query Performance Tuning aims to help you improve query performance using a wide range of strategies. You will know how to analyze queries using both the traditional EXPLAIN command as well as the new EXPLAIN ANALYZE tool. You also will see how to use the Visual Explain feature to provide a visually-oriented view of an execution plan. Coverage of indexes includes indexing strategies and index statistics, and you will learn how histograms can be used to provide input on skewed data distributions that the optimizer can use to improve query performance. You will learn about locks, and how to investigate locking issues. And you will come away with an understanding of how the MySQL optimizer works, including the new hash join algorithm, and how to change the optimizer’s behavior when needed to deliver faster execution times. You will gain the tools and skills needed to delight application users and to squeeze the most value from corporate computing resources.

What You Will Learn:

Monitor query performance to identify poor performers
Choose queries to optimize that will provide the greatest gain
Analyze queries using tools such as EXPLAIN ANALYZE and Visual Explain
Improve slow queries through a wide range of strategies
Properly deploy indexes and histograms to aid in creating fast execution plans
Understand and analyze locks to resolve contention and increase throughput

Who This Book Is For:

Database administrators and SQL developers who are familiar with MySQL and need to participate in query tuning. While some experience with MySQL is required, no prior knowledge of query performance tuning is needed.
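For a flavor of the workflow the book describes, here is a minimal sketch of analyzing a query and then feeding the optimizer a histogram in MySQL 8; the orders table and its columns are hypothetical, not from the book:

-- Time each plan step and compare estimates with actual row counts (MySQL 8.0.18+)
EXPLAIN ANALYZE
SELECT customer_id, SUM(total) AS lifetime_value
FROM orders
WHERE status = 'shipped'
GROUP BY customer_id;

-- Build a histogram on a skewed column so the optimizer can estimate selectivity better
ANALYZE TABLE orders UPDATE HISTOGRAM ON status WITH 64 BUCKETS;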

Learning SQL, 3rd Edition

As data floods into your company, you need to put it to work right away—and SQL is the best tool for the job. With the latest edition of this introductory guide, author Alan Beaulieu helps developers get up to speed with SQL fundamentals for writing database applications, performing administrative tasks, and generating reports. You’ll find new chapters on SQL and big data, analytic functions, and working with very large databases. Each chapter presents a self-contained lesson on a key SQL concept or technique using numerous illustrations and annotated examples. Exercises let you practice the skills you learn. Knowledge of SQL is a must for interacting with data. With Learning SQL, you’ll quickly discover how to put the power and flexibility of this language to work.

Move quickly through SQL basics and several advanced features
Use SQL data statements to generate, manipulate, and retrieve data
Create database objects, such as tables, indexes, and constraints with SQL schema statements
Learn how datasets interact with queries; understand the importance of subqueries
Convert and manipulate data with SQL’s built-in functions and use conditional logic in data statements
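As a small sketch of the fundamentals listed above, combining a subquery with conditional logic in one standard SQL query; the account table and columns are hypothetical:

-- Label each account relative to the average balance
SELECT account_id,
       balance,
       CASE
         WHEN balance >= (SELECT AVG(balance) FROM account)
           THEN 'above average'
         ELSE 'below average'
       END AS balance_band
FROM account;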

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract: This week on the podcast, our guest is Priya Doty, VP of Product Marketing at IBM. Priya specializes in the IBM Z and LinuxONE brands and shares her expertise in this episode.

Connect with Priya: LinkedIn, Twitter, Medium, IBM Blogs

Show Notes:
02:51 - Learn more on using SQL for Data Analysis here.
05:53 - Discover what B2B means here.
16:03 - You can find out more on pervasive encryption technology here.

Connect with the Team: Producer Liam Seston - LinkedIn. Producer Lana Cosic - LinkedIn. Producer Meighann Helene - LinkedIn. Producer Mark Simmonds - LinkedIn. Host Al Martin - LinkedIn and Twitter.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

SQL Server 2019 Administration Inside Out

Conquer SQL Server 2019 administration–from the inside out. Dive into SQL Server 2019 administration–and really put your SQL Server DBA expertise to work. This supremely organized reference packs hundreds of timesaving solutions, tips, and workarounds–all you need to plan, implement, manage, and secure SQL Server 2019 in any production environment: on-premises, cloud, or hybrid. Six experts thoroughly tour DBA capabilities available in SQL Server 2019 Database Engine, SQL Server Data Tools, SQL Server Management Studio, PowerShell, and Azure Portal. You’ll find extensive new coverage of Azure SQL, big data clusters, PolyBase, data protection, automation, and more. Discover how experts tackle today’s essential tasks–and challenge yourself to new levels of mastery.

Explore SQL Server 2019’s toolset, including the improved SQL Server Management Studio, Azure Data Studio, and Configuration Manager
Design, implement, manage, and govern on-premises, hybrid, or Azure database infrastructures
Install and configure SQL Server on Windows and Linux
Master modern maintenance and monitoring with extended events, Resource Governor, and the SQL Assessment API
Automate tasks with maintenance plans, PowerShell, Policy-Based Management, and more
Plan and manage data recovery, including hybrid backup/restore, Azure SQL Database recovery, and geo-replication
Use availability groups for high availability and disaster recovery
Protect data with Transparent Data Encryption, Always Encrypted, new Certificate Management capabilities, and other advances
Optimize databases with SQL Server 2019’s advanced performance and indexing features
Provision and operate Azure SQL Database and its managed instances
Move SQL Server workloads to Azure: planning, testing, migration, and post-migration
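To make one of those data protection features concrete, a minimal T-SQL sketch of enabling Transparent Data Encryption; the database name SalesDb and certificate name MyTdeCert are hypothetical, and this assumes a database master key and server certificate already exist in the master database:

USE SalesDb;  -- hypothetical database
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE MyTdeCert;  -- hypothetical certificate
ALTER DATABASE SalesDb SET ENCRYPTION ON;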

Summary Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems that were engineered in isolation. The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers. In this episode Michael Drogalis, product manager for ksqlDB at Confluent, explains how the system is implemented, how you can use it for building your own stream processing applications, and how it fits into the lifecycle of your data infrastructure. If you have been struggling with building services on low-level streaming interfaces then give this episode a listen and try it out for yourself.
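As a rough sketch of the experience the episode describes, the ksqlDB statements below declare a stream over an existing Kafka topic and derive a continuously maintained table from it; the topic, stream, and column names here are hypothetical:

-- Declare a stream over a Kafka topic
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

-- A continuously updated aggregate, queryable like a table
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id
  EMIT CHANGES;

Because the derived table is backed by Kafka, the aggregate stays current as new events arrive, which is the "database interface on data stored in Kafka" idea the interview digs into.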

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Are you spending too much time maintaining your data pipeline? Snowplow empowers your business with a real-time event data pipeline running in your own cloud account without the hassle of maintenance. Snowplow takes care of everything from installing your pipeline in a couple of hours to upgrading and autoscaling so you can focus on your exciting data projects. Your team will get the most complete, accurate and ready-to-use behavioral web and mobile data, delivered into your data warehouse, data lake and real-time streams. Go to dataengineeringpodcast.com/snowplow today to find out why more than 600,000 websites run Snowplow. Set up a demo and mention you’re a listener for a special offer!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Michael Drogalis about ksqlDB, the open source streaming database layer for Kafka.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what ksqlDB is?

What are some of the use cases that it is designed for?

How do the capabilities and design of ksqlDB compare to other solutions for querying streaming data with SQL such as Pulsar SQL, PipelineDB, or Materialize?

What was the motivation for building a unified project for providing a database interface on the data stored in Kafka?

How is ksqlDB architected?

If you were to rebuild the entire platform and its components from scratch today, what would you do differently?

What is the workflow for an analyst or engineer to design and build an application on top of ksqlDB?

What dialect of SQL is supported?

What ki

Practical Oracle SQL: Mastering the Full Power of Oracle Database

Write powerful queries using as much of the feature-rich Oracle SQL language as possible, progressing beyond the simple queries of basic SQL as standardized in SQL-92. Both standard SQL and Oracle’s own extensions to the language have progressed far over the decades in terms of how much you can work with your data in a single, albeit sometimes complex, SQL statement. If you already know the basics of SQL, this book provides many examples of how to write even more advanced SQL to huge benefit in your applications, such as:

Pivoting rows to columns and columns to rows
Recursion in SQL with MODEL and WITH clauses
Answering Top-N questions
Forecasting with linear regressions
Row pattern matching to group or distribute rows
Using MATCH_RECOGNIZE as a row processing engine

The process of starting from simpler statements in SQL, and gradually working those statements stepwise into more complex statements that deliver powerful results, is covered in each example. By trying out the recipes and examples for yourself, you will put together the building blocks into powerful SQL statements that will make your application run circles around your competitors.

What You Will Learn:

Take full advantage of advanced and modern features in Oracle SQL
Recognize when modern SQL constructs can help create better applications
Improve SQL query building skills through stepwise refinement
Apply set-based thinking to process more data in fewer queries
Make cross-row calculations with analytic functions
Search for patterns across multiple rows using row pattern matching
Break complex calculations into smaller steps with subquery factoring

Who This Book Is For:

Oracle Database developers who already know some SQL, but rarely use features of the language beyond the SQL-92 standard. And it is for developers who would like to apply the more modern features of Oracle SQL, but don’t know where to start. The book also is for those who want to write increasingly complex queries in a stepwise and understandable manner. Experienced developers will use the book to develop more efficient queries using the advanced features of the Oracle SQL language.
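As one hedged example of the row pattern matching mentioned above, this Oracle SQL sketch finds a fall-then-rise price pattern per symbol; the ticker table and its columns are hypothetical:

-- Detect a simple V-shape (decline followed by recovery) in price history
SELECT *
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY trade_date
  MEASURES FIRST(down.trade_date) AS decline_start,
           LAST(up.trade_date)    AS recovery_end
  ONE ROW PER MATCH
  PATTERN (strt down+ up+)
  DEFINE
    down AS down.price < PREV(down.price),
    up   AS up.price   > PREV(up.price)
);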

Implementing a VersaStack Solution by Cisco and IBM with IBM FlashSystem 5030, Cisco UCS Mini, Hyper-V, and SQL Server

VersaStack, an IBM® and Cisco integrated infrastructure solution, combines computing, networking, and storage into a single integrated system. It combines the Cisco Unified Computing System (Cisco UCS) Integrated Infrastructure with IBM Spectrum Virtualize™, which includes IBM FlashSystem® storage offerings, for quick deployment and rapid time to value for the implementation of modern infrastructures. This IBM Redbooks® publication covers the preferred practices for implementing a VersaStack Solution with IBM FlashSystem 5030, Cisco UCS Mini, Hyper-V 2016, and Microsoft SQL Server. Cisco UCS Mini is optimized for branch and remote offices, point-of-sale locations, and smaller IT environments. It is the ideal solution for customers who need fewer servers but still want the comprehensive management capabilities provided by Cisco UCS Manager. The IBM FlashSystem 5030 delivers efficient, entry-level configurations that are designed to meet the needs of small and midsize businesses. Designed to provide organizations with the ability to consolidate and share data at an affordable price, the IBM FlashSystem 5030 offers advanced software capabilities such as clustering, IBM Easy Tier®, replication and snapshots that are found in more expensive systems. This book is intended for pre-sales and post-sales technical support professionals and storage administrators who are tasked with deploying a VersaStack solution with Hyper-V 2016 and Microsoft SQL Server.

Pro T-SQL 2019: Toward Speed, Scalability, and Standardization for SQL Server Developers

Design and write simple and efficient T-SQL code in SQL Server 2019 and beyond. Writing T-SQL that pulls back correct results can be challenging. This book provides the help you need in writing T-SQL that performs fast and is easy to maintain. You also will learn how to implement version control, testing, and deployment strategies. Hands-on examples show modern T-SQL practices and provide straightforward explanations. Attention is given to selecting the right data types and objects when designing T-SQL solutions. Author Elizabeth Noble teaches you how to improve your T-SQL performance through good design practices that benefit programmers and ultimately the users of the applications. You will know the common pitfalls of writing T-SQL and how to avoid those pitfalls going forward.

What You Will Learn:

Choose correct data types and database objects when designing T-SQL
Write T-SQL that searches data efficiently and uses hardware effectively
Implement source control and testing methods to streamline the deployment process
Design T-SQL that can be enhanced or modified with less effort
Plan for long-term data management and storage

Who This Book Is For:

Database developers who want to improve the efficiency of their applications, and developers who want to solve complex query and data problems more easily by writing T-SQL that performs well, brings back correct results, and is easy for other developers to understand and maintain.
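One small sketch of the kind of design point such books stress: the two T-SQL queries below return the same rows, but only the second lets the optimizer seek an index on order_date. The dbo.Orders table and columns are hypothetical:

-- Non-SARGable: wrapping the column in a function blocks an index seek
SELECT order_id FROM dbo.Orders WHERE YEAR(order_date) = 2019;

-- SARGable rewrite: a plain range predicate on the column
SELECT order_id FROM dbo.Orders
WHERE order_date >= '20190101' AND order_date < '20200101';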

Summary The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. This requires a new class of data storage which can accommodate that demand without having to rearchitect your system at each level of growth. YugabyteDB is an open source database designed to support planet scale workloads with high data density and full ACID compliance. In this episode Karthik Ranganathan explains how Yugabyte is architected, their motivations for being fully open source, and how they simplify the process of scaling your application from greenfield to global.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Karthik Ranganathan about YugabyteDB, the open source, high-performance distributed SQL database for global, internet-scale apps.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what YugabyteDB is and its origin story?

A growing trend in database engines (e.g. FaunaDB, CockroachDB) has been an out of the box focus on global distribution. Why is that important and how does it work in Yugabyte? What are the caveats?

What are the most notable features of YugabyteDB that would lead someone to choose it over any of the myriad other options? What are the use cases that it is uniquely suited to?

What are some of the systems or architecture patterns that can be replaced with Yugabyte?

How does the design of Yugabyte or the different ways it is being used influence the way that users should think about modeling their data?

Yugabyte is an impressive piece of engineering. Can you talk through the major design elements and how it is implemented?

Easy scaling and failover is a feature that many database engines would like to be able to claim. What are the difficult elements that prevent them from implementing that capability as a standard practice? What do you have to sacrifice in order to support the level of scale and fault tolerance that you provide?

Speaking of scaling, there are many ways to define that term, from vertical scaling of storage or compute, to horizontal scaling of compute, to scaling of reads and writes. What are the primary scaling factors that you focus on in Yugabyte?

How do you approach testing and validation of the code given the complexity of the system that you are building?

In terms of the query API you have support for a Postgres compatible SQL dialect as well as a Cassandra based syntax. What are the benefits of targeting compatibility with those platforms? What are the challenges and benefits of maintaining compatibility with those other platforms?

Can you describe how the storage layer is implemented and the division between the different query formats?

What are the operational characteristics of YugabyteDB? What are the complexities or edge cases that users should be aware of when planning a deployment?

One of the challenges of working with large volumes of data is creating and maintaining backups. How does Yugabyte handle that problem?

Most open source infrastructure projects that are backed by a business withhold various "enterprise" features such as backups and change data capture as a means of driving revenue. Can you talk through your motivation for releasing those capabilities as open source?

What is the business model that you are using for YugabyteDB and how does it differ from the tribal knowledge of how open source companies generally work?

What are some of the most interesting, innovative, or unexpected ways that you have seen Yugabyte used?

When is Yugabyte the wrong choice?

What do you have planned for the future of the technical and business aspects of Yugabyte?

Contact Info

@karthikr on Twitter

LinkedIn

rkarthik007 on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

YugabyteDB, GitHub, Nutanix, Facebook Engineering, Apache Cassandra, Apache HBase, Delphi, FaunaDB (Podcast Episode), CockroachDB (Podcast Episode), HA == High Availability, Oracle, Microsoft SQL Server, PostgreSQL (Podcast Episode), MongoDB, Amazon Aurora, PGCrypto, PostGIS, pl/pgsql, Foreign Data Wrappers, PipelineDB (Podcast Episode), Citus (Podcast Episode), Jepsen Testing, Yugabyte Jepsen Test Results, OLTP == Online Transaction Processing, OLAP == Online Analytical Processing, DocDB, Google Spanner, Google BigTable, Spot Instances, Kubernetes, Cloudformation, Terraform, Prometheus, Debezium (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Refactoring Legacy T-SQL for Improved Performance: Modern Practices for SQL Server Applications

Breathe new life into older applications by refactoring T-SQL queries and code using modern techniques. This book shows you how to significantly improve the performance of older applications by finding common anti-patterns in T-SQL code, then rewriting those anti-patterns using new functionality that is supported in current versions of SQL Server, including SQL Server 2019. The focus moves through the different types of database objects and the code used to create them, discussing the limitations and anti-patterns commonly found for each object type in your database. Legacy code isn’t just found in queries and external applications. It’s also found in the definitions of underlying database objects such as views and tables. This book helps you quickly find problematic code throughout the database and points out where and how modern solutions can replace older code, thereby making your legacy applications run faster and extending their lifetimes. Author Lisa Bohm explains the logic behind each anti-pattern, helping you understand why each pattern is a problem and showing how it can be avoided. Good coding habits are discussed, including guidance on topics such as readability and maintainability.

What You Will Learn:

Find specific areas in code to target for performance gains
Identify pain points quickly and understand why they are problematic
Rewrite legacy T-SQL to reduce or eliminate hidden performance issues
Write modern code with an awareness of readability and maintainability
Recognize and correlate T-SQL anti-patterns with techniques for better solutions
Make a positive impact on application user experience in your organization

Who This Book Is For:

Database administrators or developers who maintain older code, those frustrated with complaints about slow code when there is so much of it to fix, and those who want a head start in making a positive impact on application user experience in their organization.
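A small sketch of the kind of anti-pattern rewrite such refactoring involves, replacing a per-row correlated subquery with a single-pass window function; the dbo.Orders table is hypothetical:

-- Legacy pattern: the subquery re-runs for every outer row
SELECT o.order_id, o.customer_id,
       (SELECT MAX(o2.order_date)
        FROM dbo.Orders o2
        WHERE o2.customer_id = o.customer_id) AS last_order_date
FROM dbo.Orders o;

-- Modern rewrite: computed in one pass with a window function
SELECT order_id, customer_id,
       MAX(order_date) OVER (PARTITION BY customer_id) AS last_order_date
FROM dbo.Orders;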

Sams Teach Yourself SQL in 10 Minutes a Day, 5th Edition

Sams Teach Yourself SQL in 10 Minutes offers straightforward, practical answers when you need fast results. By working through the book’s 22 lessons of 10 minutes or less, you’ll learn what you need to know to take advantage of the SQL language. Lessons cover IBM DB2, Microsoft SQL Server and SQL Server Express, MariaDB, MySQL, Oracle and Oracle express, PostgreSQL, and SQLite.

Full-color code examples help you understand how SQL statements are structured
Tips point out shortcuts and solutions
Cautions help you avoid common pitfalls
Notes explain additional concepts and provide additional information

10 minutes is all you need to learn how to…

Use the major SQL statements
Construct complex SQL statements using multiple clauses and operators
Retrieve, sort, and format database contents
Pinpoint the data you need using a variety of filtering techniques
Use aggregate functions to summarize data
Join two or more related tables
Insert, update, and delete data
Create and alter database tables
Work with views, stored procedures, and more

Microsoft SQL Server 2019: A Beginner's Guide, Seventh Edition, 7th Edition

Publisher's Note: Products purchased from Third Party sellers are not guaranteed by the publisher for quality, authenticity, or access to any online entitlements included with the product.

Get Up to Speed on Microsoft® SQL Server® 2019 Quickly and Easily

Start working with Microsoft SQL Server 2019 in no time with help from this thoroughly revised, practical resource. Filled with real-world examples and hands-on exercises, Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition starts by explaining fundamental relational database system concepts. From there, you’ll learn how to write Transact-SQL statements, execute simple and complex database queries, handle system administration and security, and use powerful analysis and reporting tools. New topics such as SQL and JSON support, graph databases, and support for machine learning with R and Python are also covered in this step-by-step tutorial.

• Install, configure, and customize Microsoft SQL Server 2019
• Create and modify database objects with Transact-SQL statements
• Write stored procedures and user-defined functions
• Handle backup and recovery, and automate administrative tasks
• Tune your database system for optimal availability and reliability
• Secure your system using authentication, encryption, and authorization
• Work with SQL Server Analysis Services, Reporting Services, and other BI tools
• Gain knowledge of relational storage, presentation, and retrieval of data stored in the JSON format
• Manage graphs using SQL Server Graph Databases
• Learn about machine learning support for R and Python
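To give a feel for the graph database support mentioned above, a minimal sketch using the graph tables introduced in SQL Server 2017; all table, column, and data names here are hypothetical:

-- Node and edge tables
CREATE TABLE Person (id INT PRIMARY KEY, name NVARCHAR(100)) AS NODE;
CREATE TABLE Follows AS EDGE;

-- Who does Alice follow?
SELECT p2.name
FROM Person AS p1, Follows, Person AS p2
WHERE MATCH(p1-(Follows)->p2)
  AND p1.name = N'Alice';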

Summary DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to support their customers, they need to capture, process, and analyze massive amounts of timeseries data with a high degree of uptime and reliability. Vadim Semenov works on their data engineering team and joins the podcast in this episode to discuss the challenges that he works through, the systems that DataDog has built to power their business, and how their teams are organized to allow for rapid growth and massive scale. Getting an inside look at the companies behind the services we use is always useful, and this conversation was no exception.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Vadim Semenov about how data engineers work at DataDog.

Interview

Introduction

How did you get involved in the area of data management?

For anyone who isn’t familiar with DataDog, can you start by describing the types and volumes of data that you’re dealing with?

What are the main components of your platform for managing that information?

How are the data teams at DataDog organized and what are your primary responsibilities in the organization?

What are some of the complexities and challenges that you face in your work as a result of the volume of data that you are processing?

What are some of the strategies which have proven to be most useful in overcoming those challenges?

Who are the main consumers of your work and how do you build in feedback cycles to ensure that their needs are being met?

Given that the majority of the data being ingested by DataDog is timeseries, what are your lifecycle and retention policies for that information?

Most of the data that you are working with is customer generated from your deployed agents and API integrations. How do you manage cleanliness and schema enforcement for the events as they are being delivered?

What are some of the upcoming projects that you have planned for the upcoming months and years?

What are some of the technologies, patterns, or practices that you are hoping to adopt?

Contact Info

LinkedIn

@databuryat on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

DataDog, Hadoop, Hive, Yarn, Chef, SRE == Site Reliability Engineer, Application Performance Management (APM), Apache Kafka, RocksDB, Cassandra, Apache Parquet data serialization format, SLA == Service Level Agreement, WatchDog, Apache Spark (Podcast Episode), Apache Pig, Databricks, JVM == Java Virtual Machine, Kubernetes, SSIS (SQL Server Integration Services), Pentaho, JasperSoft, Apache Airflow (Podcast.init Episode), Apache NiFi (Podcast Episode), Luigi, Dagster (Podcast Episode), Prefect

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

The SQL Workshop

The SQL Workshop is your go-to guide for delving into the essential techniques and best practices of working with SQL. You'll start with the basics of querying and database management, progressing to advanced concepts like joins, normalization, and database security.

What this Book will help me do:

Construct and maintain relational databases that meet real-world requirements.
Perform CRUD operations efficiently using SQL queries.
Design effective and optimized database schemas through normalization.
Secure and safeguard data with access controls and privilege management.
Leverage SQL for data analysis and reporting through advanced query techniques.

Author(s): Frank Solomon, Prashanth Jayaram, and Awni Al Saqqa bring together decades of practical and academic experience in SQL and database management. Their informative and hands-on approach helps readers bridge the gap between theoretical concepts and practical applications.

Who is it for? Written for newcomers and intermediate learners, this book is ideal for aspiring software developers, data scientists, and database managers looking to advance their SQL skills. Beginners with no database experience will find this book's gradual learning curve approachable.
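As a hedged illustration of the privilege management topic above, standard SQL role-based grants on a hypothetical customers table:

-- Read-only access for a reporting role
CREATE ROLE reporting_reader;
GRANT SELECT ON customers TO reporting_reader;

-- And how to withdraw it again
REVOKE SELECT ON customers FROM reporting_reader;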

Summary Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of records. Data warehouses are optimized for batched writes and complex analytical queries. Between those use cases there are varying levels of support for fast reads on quickly changing data. To address that need more completely the team at Materialize has created an engine that allows for building queryable views of your data as it is continually updated from the stream of changes being generated by your applications. In this episode Frank McSherry, chief scientist of Materialize, explains why it was created, what use cases it enables, and how it works to provide fast queries on continually updated data.
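A rough sketch of the model discussed in the episode, using the SQL-over-Kafka syntax Materialize exposed around this time; the broker, topic, schema registry URL, and column names are hypothetical, and the exact DDL has evolved in later releases:

-- Ingest a stream of changes from Kafka
CREATE SOURCE orders
  FROM KAFKA BROKER 'kafka:9092' TOPIC 'orders'
  FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081';

-- An incrementally maintained view over that stream
CREATE MATERIALIZED VIEW order_totals AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id;

-- Reads are ordinary SQL and reflect the latest ingested changes
SELECT * FROM order_totals WHERE total > 1000;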

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Frank McSherry about Materialize, an engine for maintaining materialized views on incrementally updated data from change data captures.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what Materialize is and the problems that you are aiming to solve with it?

What was your motivation for creating it?

What use cases does Materialize enable?

What are some of the existing tools or systems that you have seen employed to address those needs which can be replaced by Materialize?

How does it fit into the broader ecosystem of data tools and platforms?

What are some of the use cases that Materialize is uniquely able to support?

How is Materialize architected and how has the design evolved since you first began working on it?

Materialize is based on your timely-dataflow project, which itself is based on the work you did on Naiad. What was your reasoning for using Rust as the implementation target and what benefits has it provided?

What are some of the components or primitives that were missing in the Rust ecosystem as compared to what is available in Java or C/C++, which have been the dominant languages for distributed data systems?

In the list of features, you highlight full support for ANSI SQL 92. What were some of the edge cases that you faced in complying with that standard given the distributed execution context for Materialize?

A majority of SQL oriented platforms define custom extensions or built-in functions that are specific to their problem domain. What are some of the existing or

PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond

Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries while using the T-SQL skills you already know and have mastered. PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoop, other SQL Server instances, Oracle, Cosmos DB, Apache Spark, and more. You will learn how PolyBase can help you reduce storage and other costs by avoiding the need for ETL processes that duplicate data in order to make it accessible from one source. PolyBase makes SQL Server into that one source, and T-SQL is your golden ticket. The book also covers PolyBase scale-out clusters, allowing you to distribute PolyBase queries among several SQL Server instances, thus improving performance. With great flexibility comes great complexity, and this book shows you where to look when queries fail, complete with coverage of internals, troubleshooting techniques, and where to find more information on obscure cross-platform errors. Data virtualization is a key target for Microsoft with SQL Server 2019. This book will help you keep your skills current, remain relevant, and build new business and career opportunities around Microsoft’s product direction.

What You Will Learn:

Install and configure PolyBase as a stand-alone service, or unlock its capabilities with a scale-out cluster
Understand how PolyBase interacts with outside data sources while presenting their data as regular SQL Server tables
Write queries combining data from SQL Server, Apache Hadoop, Oracle, Cosmos DB, Apache Spark, and more
Troubleshoot PolyBase queries using SQL Server Dynamic Management Views
Tune PolyBase queries using statistics and execution plans
Solve common business problems, including "cold storage" of infrequently accessed data and simplifying ETL jobs

Who This Book Is For:

SQL Server developers working in multi-platform environments who want one easy way of communicating with, and collecting data from, all of these sources.
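For a sense of the T-SQL surface involved, a hedged sketch of exposing HDFS data as a SQL Server external table; every name and location below is hypothetical, and it assumes PolyBase is installed with Hadoop connectivity configured:

CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE dbo.WebLogs (
    log_date DATE,
    url      NVARCHAR(400),
    hits     INT
)
WITH (LOCATION = '/logs/web/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = CsvFormat);

-- Queried like any local table, joins and aggregates included
SELECT TOP 10 url, SUM(hits) AS total_hits
FROM dbo.WebLogs
GROUP BY url
ORDER BY total_hits DESC;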

Summary Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools. At WeWork they needed a system that would provide visibility into their Airflow pipelines and the outputs produced. In this episode Julien Le Dem and Willy Lulciuc explain how they built Marquez to serve that need, how it is architected, and how it compares to other options that you might be considering. Even if you already have a metadata repository this is worth a listen to learn more about the value that visibility of your data can bring to your organization.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You work hard to make sure that your data is clean, reliable, and reproducible throughout the ingestion pipeline, but what happens when it gets to the data warehouse? Dataform picks up where your ETL jobs leave off, turning raw data into reliable analytics. Their web based transformation tool with built in collaboration features lets your analysts own the full lifecycle of data in your warehouse. Featuring built in version control integration, real-time error checking for their SQL code, data quality tests, scheduling, and a data catalog with annotation capabilities it’s everything you need to keep your data warehouse in order. Sign up for a free trial today at dataengineeringpodcast.com/dataform and email [email protected] with the subject "Data Engineering Podcast" to get a hands-on demo from one of their data experts.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference, the Strata Data conference, and PyCon US. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Willy Lulciuc and Julien Le Dem about Marquez, an open source platform to collect, aggregate, and visualize a data ecosystem’s metadata.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what Marquez is?

What was missing in existing metadata management platforms that necessitated the creation of Marquez?

How do the capabilities of Marquez compare with tools and services that bill themselves as data catalogs?

How does it compare to the Amundsen platform that Lyft recently released?

What are some of the tools or platforms that are currently integrated with Marquez and what additional integrations would you like to see?

What are some of the capabilities that are unique to Marquez and how are you using them at WeWork?

What are the primary resource types that you support in Marquez?

What are some of the lowest common denominator attributes that are necessary and useful to track in a metadata repository?

Can you explain how Marquez is architected and how the design has evolved since you first began working on it?

Many metadata management systems are simply a service layer on top of a separate data storage engine. What are the benefits of using PostgreSQL as the system of record for Marquez?

What are some of the complexities that arise from relying on a relational engine as opposed to a document store or graph database?

How is the metadata itself stored and managed in Marquez?

How much up-front data modeling is necessary and what types of schema representations are supported?

Can you talk through the overall workflow of someone using Marquez in their environment?

What is involved in registering and updating datasets?

How do you define and track the health of a given dataset?

What are some of the interesting questions that can be answered from the information stored in Marquez?

What were your assumptions going into this project and how have they been challenged or updated as you began using it for production use cases?

For someone who is interested in using Marquez what is involved in deploying and maintaining an installation of it?

What have you found to be the most challenging or unanticipated aspects of building and maintaining a metadata repository and data discovery platform?

When is Marquez the wrong choice for a metadata repository?

What do you have planned for the future of Marquez?

Contact Info

Julien Le Dem

@J_ on Twitter

Email

julienledem on GitHub

Willy

LinkedIn

@wslulciuc on Twitter

wslulciuc on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Marquez, DataEngConf Presentation, WeWork, Canary, Yahoo, Dremio, Hadoop, Pig, Parquet (Podcast Episode), Airflow, Apache Atlas, Amundsen (Podcast Episode), Uber DataBook, LinkedIn DataHub, Iceberg Table Format (Podcast Episode), Delta Lake (Podcast Episode), Great Expectations data pipeline unit testing framework (Podcast.init Episode), Redshift, SnowflakeDB (Podcast Episode), Apache Kafka Schema Registry (Podcast Episode), Open Tracing, Jaeger, Zipkin, DropWizard Java framework, Marquez UI, Cayley Graph Database, Kubernetes, Marquez Helm Chart, Marquez Docker Container, Dagster (Podcast Episode), Luigi, DBT (Podcast Episode), Thrift, Protocol Buffers

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Hands On Google Cloud SQL and Cloud Spanner: Deployment, Administration and Use Cases with Python

Discover the methodologies and best practices for getting started with Google Cloud Platform relational services – Cloud SQL and Cloud Spanner. The book begins with the basics of working with the Google Cloud Platform along with an introduction to the database technologies available for developers from Google Cloud. You'll then take an in-depth, hands-on journey into Google Cloud SQL and Cloud Spanner, including choosing the right platform for your application needs, planning, provisioning, designing, and developing your application. Sample applications are given that use Python to connect to Cloud SQL and Cloud Spanner, along with helpful features provided by the engines. You'll also implement practical best practices in the last chapter. Hands On Google Cloud SQL and Cloud Spanner is a great starting point to apply GCP data offerings in your technology stack, and the code used allows you to try out the examples and extend them in interesting ways.

What You'll Learn:

Get started with Big Data technologies on the Google Cloud Platform
Review Cloud SQL and Cloud Spanner from basics to administration
Apply best practices and use Google’s Cloud SQL and Cloud Spanner offerings
Work with code in Python notebooks and scripts

Who This Book Is For:

Application architects, database architects, software developers, data engineers, cloud architects.