talk-data.com

Topic: API (Application Programming Interface)
Tags: integration, software_development, data_exchange
856 tagged activities

Activity Trend: peak of 65 activities per quarter, 2020-Q1 to 2026-Q1 (chart omitted)

Activities

856 activities · Newest first

R2DBC Revealed: Reactive Relational Database Connectivity for Java and JVM Programmers

Understand the newest trend in database programming for developers working in Java, Kotlin, Clojure, and other JVM-based languages. This book introduces Reactive Relational Database Connectivity (R2DBC), a modern way of connecting to and querying relational databases from Java and other JVM languages. The book begins by helping you understand not only what reactive programming is, but why it is necessary. Then, building on those fundamentals, the book takes you into the world of databases and the newly released R2DBC specification. Examples in the book are worked using the freely available MariaDB database along with MariaDB’s vendor implementation of the R2DBC service-provider interface (SPI). Following along with the examples and the provided example code helps prepare you to work with any of the growing number of R2DBC implementations for popular enterprise databases such as Oracle Database and SQL Server. You’ll be well prepared for what is becoming the future of database access from Java and other languages built on the JVM.

What You Will Learn
Understand why R2DBC was created and how it utilizes the Reactive Streams API
Understand the components of the R2DBC service-provider interface
Create and manage reactive database connections and connection pools using an R2DBC client
Programmatically execute queries on a relational database using an R2DBC client
Effectively utilize transactions using an R2DBC client
Build relational database-driven applications that are event-driven and non-blocking

Who This Book Is For
Software developers building solutions using JVM languages and the JVM ecosystem, and developers who need an introduction to the R2DBC specification and reactive programming with relational databases. This book includes practical examples of using the R2DBC specification with Java and MariaDB that will give developers the knowledge they need to create their own solutions.
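
R2DBC is a JVM specification, so its API surface is Java. Purely as a cross-language illustration of the non-blocking, pooled access pattern the book teaches, here is a rough Python analogue using the asyncpg PostgreSQL driver; the DSN, pool sizes, and table are placeholders, and this is an analogy, not the R2DBC API.

```python
# Analogy only: the non-blocking, pooled database access pattern that
# R2DBC brings to the JVM, shown with Python's asyncpg driver (PostgreSQL).
import asyncio
import asyncpg

async def main():
    # Like an R2DBC connection pool, this hands out connections without
    # blocking a thread while waiting on the database.
    pool = await asyncpg.create_pool(
        dsn="postgresql://user:secret@localhost:5432/demo",  # placeholder DSN
        min_size=1,
        max_size=5,
    )
    async with pool.acquire() as conn:
        # The coroutine suspends during I/O instead of tying up a thread.
        rows = await conn.fetch("SELECT id, name FROM customers LIMIT 10")
        for row in rows:
            print(row["id"], row["name"])
    await pool.close()

asyncio.run(main())
```

The property being illustrated is the one R2DBC standardizes for the JVM: connection acquisition and query execution are asynchronous end to end, so no thread sits idle on database I/O.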

Effortless App Development with Oracle Visual Builder

In "Effortless App Development with Oracle Visual Builder," you will explore how to quickly design, develop, and deploy robust web and mobile applications using Oracle Visual Builder's intuitive drag-and-drop features. This book equips you with the know-how to simplify application development tasks, making it perfect for professionals looking to boost productivity. What this Book will help me do Master the core architecture and features of Oracle Visual Builder to develop real-world applications effectively. Learn to create, manage, and leverage business objects and connect to various SaaS APIs within your applications. Build scalable and secure web and mobile applications using practical examples and clear implementation guidelines. Discover best practices for application lifecycle management, debugging, and troubleshooting VB applications. Extend Oracle and non-Oracle SaaS applications through hands-on knowledge tailored to real-world scenarios. Author(s) None Jain is an experienced developer and technical writer specializing in Oracle Visual Builder and cloud-based application development. With years of hands-on experience building and deploying cloud applications, they bring expertise and a practical approach to education. Their engaging writing style focuses on enabling readers to learn and apply new skills confidently. Who is it for? This book is perfectly suited for developers, UI designers, and IT professionals who want to master Oracle Visual Builder for developing web and mobile applications. If you already have experience with technologies like JavaScript, UI frameworks, and REST APIs, and seek to create intuitive applications using a simplified interface, this book is for you. Whether you're in the early stages of learning VB or looking to refine your skills, this book serves as a valuable guide.

MATLAB Recipes: A Problem-Solution Approach

Learn from state-of-the-art examples in robotics, motors, detection filters, chemical processes, aircraft, and spacecraft. With this book you will review contemporary MATLAB coding, including the latest MATLAB language features, and use MATLAB as a software development environment including code organization, GUI development, and algorithm design and testing. Features now covered include the new graph and digraph classes for charts and networks; interactive documents that combine text, code, and output; a new development environment for building apps; locally defined functions in scripts; automatic expansion of dimensions; tall arrays for big data; the new string type; new functions to encode/decode JSON; handling non-English languages; the new class architecture; the Mocking framework; an engine API for Java; the cloud-based MATLAB desktop; the memoize function; and heatmap charts.

MATLAB Recipes: A Problem-Solution Approach, Second Edition provides practical, hands-on code snippets and guidance for using MATLAB to build a body of code you can turn to time and again for solving technical problems in your work. Develop algorithms, test them, visualize the results, and pass the code along to others to create a functional code base for your firm.

What You Will Learn
Get up to date with the latest MATLAB, up to and including MATLAB 2020b
Code in MATLAB
Write applications in MATLAB
Build your own toolbox of MATLAB code to increase your efficiency and effectiveness

Who This Book Is For
Engineers, data scientists, and students wanting a book rich in examples using MATLAB.

Summary The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a platform that synchronizes information about your customers out to third party systems for use by marketing and sales teams. In this episode Tejas Manohar explains the benefits of sourcing customer data from one location for all of your organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and the workflow for enabling self-service access to your customer data by your marketing teams. This is an interesting conversation about the importance of the data warehouse and how it can be used beyond just internal analytics.
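
The sync pattern under discussion is easy to picture in code. Below is a minimal, hypothetical sketch of a warehouse-to-CRM sync; the CRM endpoint, table, and field names are invented for illustration and are not Hightouch's implementation.

```python
# A minimal sketch of the warehouse-to-CRM sync pattern discussed here.
# The CRM endpoint and schema are hypothetical.
import sqlite3  # stand-in for a real warehouse connection
import requests

CRM_URL = "https://api.example-crm.com/v1/contacts"  # hypothetical API

def fetch_customers(conn):
    # Pull the curated customer model straight from the warehouse.
    cur = conn.execute("SELECT email, plan, lifetime_value FROM dim_customers")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

def sync_to_crm(customers):
    for c in customers:
        # Upsert keyed on email; every destination API differs in shape,
        # auth, and rate limits -- the integration burden described above.
        resp = requests.post(CRM_URL, json=c, timeout=10)
        resp.raise_for_status()

if __name__ == "__main__":
    with sqlite3.connect("warehouse.db") as conn:
        sync_to_crm(fetch_customers(conn))
```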

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

This episode of the Data Engineering Podcast is sponsored by Datadog, a unified monitoring and analytics platform built for developers, IT operations teams, and businesses in the cloud age. Datadog provides customizable dashboards, log management, and machine-learning-based alerts in one fully integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context. Monitor all your databases, cloud services, containers, and serverless functions in one place with Datadog’s 400+ vendor-backed integrations. If an outage occurs, Datadog provides seamless navigation between your logs, infrastructure metrics, and application traces in just a few clicks to minimize downtime. Try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent. Go to dataengineeringpodcast.com/datadog today to see how you can enhance visibility into your stack with Datadog.

Your host is Tobias Macey and today I’m interviewing Tejas Manohar about Hightouch, a data platform that helps you sync your customer data from your data warehouse to your CRM, marketing, and support tools.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what you are building at Hightouch and your motivation for creating it?
What are the main points of friction for teams who are trying to make use of customer data?
Where is Hightouch positioned in the ecosystem of customer data tools such as Segment, Mixpanel

Summary As data professionals we have a number of tools available for storing, processing, and analyzing data. We also have tools for collaborating on software and analysis, but collaborating on data is still an underserved capability. Gavin Mendel-Gleason encountered this problem first hand while working on the Seshat databank, leading him to create TerminusDB and TerminusHub. In this episode he explains how the TerminusDB system is architected to provide a versioned graph storage engine that allows for branching and merging of data sets, and how that opens up new possibilities for individuals and teams to work together on building new data repositories. This is a fascinating conversation about the technical challenges involved, the opportunities that such a system provides, and the complexities inherent to building a successful business on open source.
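
To make the branching-and-merging idea concrete, here is a toy, in-memory sketch of branch and merge semantics over a key-value data set. It is purely illustrative and is not the TerminusDB API, which versions a full graph with commit history.

```python
# Conceptual sketch only: branch-and-merge semantics applied to a data set,
# in the spirit of what TerminusDB/TerminusHub provide. Not the real API.
class VersionedStore:
    def __init__(self):
        self.branches = {"main": {}}

    def put(self, branch, key, value):
        self.branches[branch][key] = value

    def branch(self, source, target):
        # A new branch starts as a snapshot of the source branch.
        self.branches[target] = dict(self.branches[source])

    def merge(self, source, target):
        # Naive merge: copy non-conflicting keys, report conflicts for a
        # human (or a policy) to resolve -- the hard part of merging data.
        conflicts = []
        for key, value in self.branches[source].items():
            current = self.branches[target].get(key)
            if current is not None and current != value:
                conflicts.append(key)
            else:
                self.branches[target][key] = value
        return conflicts

store = VersionedStore()
store.put("main", "country:IE", {"population": 5_000_000})
store.branch("main", "census-update")
store.put("census-update", "country:IE", {"population": 5_100_000})
print(store.merge("census-update", "main"))  # ['country:IE'] -- a conflict
```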

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took, compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show.

You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat!

Your host is Tobias Macey and today I’m interviewing Gavin Mendel-Gleason about TerminusDB, an open source model driven graph database for knowledge graph representation.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what TerminusDB is and what motivated you to build it?
What are the use cases that TerminusDB and TerminusHub are designed for?
There are a number of different reasons and methods for versioning data, such as th

Summary As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self service manner. As a result the feature store is becoming a required piece of the data platform. To fill that need Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelangelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for a well-engineered feature store.
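
For readers unfamiliar with the concept, the core contract of a feature store can be sketched in a few lines. The toy class below is hypothetical (not Tecton's SDK): a feature is defined once, materialized by a batch job, and then served from a low-latency online store at inference time.

```python
# Toy illustration of the feature-store contract: define a feature once,
# then serve the same value for both training and inference. All names
# here are hypothetical; Tecton's actual SDK is not shown in this summary.
import datetime

class FeatureStore:
    def __init__(self):
        self._definitions = {}   # feature name -> computation
        self._online = {}        # (feature, entity) -> (value, timestamp)

    def register(self, name, fn):
        self._definitions[name] = fn

    def materialize(self, name, entity_id, raw_events):
        # Batch job: compute the feature and push it to the online store.
        value = self._definitions[name](raw_events)
        self._online[(name, entity_id)] = (value, datetime.datetime.utcnow())

    def get_online(self, name, entity_id):
        # Low-latency read path used at model-inference time.
        return self._online[(name, entity_id)][0]

store = FeatureStore()
store.register("orders_last_7d", lambda events: len(events))
store.materialize("orders_last_7d", entity_id="user-42", raw_events=[{}, {}, {}])
print(store.get_online("orders_last_7d", "user-42"))  # 3
```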

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took, compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show.

You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat!

Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what you are building at Tecton and your motivation for starting the business?
For anyone who isn’t familiar with the concept, what is an example of a feature?
How do you define what a feature store is?
What role does a feature store play in the overall lifecycle of a machine learning p

Modernizing Applications with IBM CICS

IBM® CICS® is a mixed language application server that runs on IBM Z®. Over the 50 years since CICS was introduced in 1969, enterprises have used the qualities of service (QoSs) that CICS provides to create high-throughput, secure transactional applications that have powered their business. As the IT landscape has evolved, so has CICS, allowing these applications to integrate with new platforms and still provide value to the rest of the business. Because of this capability, many businesses still rely on CICS to power their core applications.

This IBM Redpaper publication focuses on modernizing these CICS applications, allowing them to integrate with cloud-native applications. This modernization can be achieved by constructing application programming interfaces (APIs) that allow new cloud-native applications to connect to your existing assets, by rewriting parts of your application in newer languages and hosting them back on CICS, or by using CICS capabilities to extend your applications to provide new capabilities and functions. The paper takes a traditional example application and shows you how it works. Then, the paper extends the example, rewrites portions of its functions, and enables its APIs. It also explains how CICS applications can use continuous integration (CI) and continuous delivery (CD) to deliver, test, and deploy code into CICS easily and with quality.

Summary Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure. In this episode Steve Touw and Stephen Bailey share what they have built at Immuta, how it is implemented, and how it streamlines the workflow for everyone involved in working with sensitive data. If you are starting down the path of implementing a data governance strategy then this episode will provide a great overview of what is involved.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying code. Teams using feature flags release new software with less risk, and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code, and 9 other platforms. By adopting ConfigCat you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules. You can roll out new features to a subset of your users for beta testing or canary deployments. With their simple API, clear documentation, and pricing that is independent of your team size you can get your first feature flags added in minutes without breaking the bank. Go to dataengineeringpodcast.com/configcat today to get 35% off any paid plan with code DATAENGINEERING or try out their free forever plan.

You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data inf

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

Discover the capabilities of PySpark and its application in the realm of data science. This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade.

Applied Data Science Using PySpark is divided into six sections. In section 1, you start with the basics of PySpark, focusing on data manipulation. We make you comfortable with the language and then build upon it to introduce you to the mathematical functions available off the shelf. In section 2, you will dive into the art of variable selection, where we demonstrate various selection techniques available in PySpark. In section 3, we take you on a journey through machine learning algorithms, implementations, and fine-tuning techniques. We will also talk about different validation metrics and how to use them for picking the best models. Sections 4 and 5 go through machine learning pipelines and various methods available to operationalize the model and serve it through Docker/an API. In the final section, you will cover reusable objects for easy experimentation and learn some tricks that can help you optimize your programs and machine learning pipelines.

By the end of this book, you will have seen the flexibility and advantages of PySpark in data science applications. This book is recommended to those who want to unleash the power of parallel computing by simultaneously working with big datasets.

What You Will Learn
Build an end-to-end predictive model
Implement multiple variable selection techniques
Operationalize models
Master multiple algorithms and implementations

Who This Book Is For
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.
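
As a flavor of the model-building cycle the book walks through, here is a minimal PySpark pipeline sketch; the input file, feature columns, and label are placeholders.

```python
# A minimal PySpark model-building sketch: load data, assemble features,
# fit a model, evaluate. Column names and the input path are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("applied-ds").getOrCreate()

df = spark.read.csv("customers.csv", header=True, inferSchema=True)
train, test = df.randomSplit([0.8, 0.2], seed=42)

assembler = VectorAssembler(
    inputCols=["age", "tenure", "monthly_spend"],  # placeholder features
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")
model = Pipeline(stages=[assembler, lr]).fit(train)

# AUC on the held-out split; assumes 'churned' is a 0/1 numeric column.
auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```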

Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At YipitData they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode Andrew Gross, Bobby Muldoon, and Anup Segu describe the self service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their output. They share the journey that they went through to build a scalable and maintainable system for web scraping, how to make it reliable and resilient to errors, and the lessons that they learned in the process. This was a great conversation about real world experiences in building a successful data-oriented business.
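
One recurring theme here is making scraping "reliable and resilient to errors." A common building block for that, sketched below with a placeholder URL, is bounded retries with exponential backoff, so a flaky page fails loudly instead of silently producing partial data.

```python
# Sketch of a resilient fetch: bounded retries with exponential backoff
# and explicit failure. The URL is a placeholder.
import time
import requests

def fetch_with_retries(url, max_attempts=4, backoff_seconds=1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            if attempt == max_attempts:
                raise  # surface the failure; never emit partial data
            # Wait 1s, 2s, 4s, ... before the next attempt.
            time.sleep(backoff_seconds * 2 ** (attempt - 1))

html = fetch_with_retries("https://example.com/listings")
```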

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.

Your host is Tobias Macey and today I’m interviewing Andrew Gross, Bobby Muldoon, and Anup Segu about how they are building data pipelines at YipitData.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what YipitData does?
What kinds of data sources and data assets are you working with?
What is the composition of your data teams and how are they structured?
Given the use of your data products in the financial sector how do you handle monitoring and alerting around data qualit

Custom Fiori Applications in SAP HANA: Design, Develop, and Deploy Fiori Applications for the Enterprise

Get started building custom Fiori applications for your enterprise. This book teaches you how to design, build, and deploy enterprise-ready, custom Fiori applications in SAP HANA. Tips and tricks collected from projects using Fiori applications (built consuming OData models and REST APIs) and integrating third-party JS libraries are presented. Also included are examples using Fiori templates from different tools such as the SAP Web IDE and the new Visual Studio Code extensions.

This book explains the 5 design principles that all Fiori applications are built upon: Role-based, Responsive, Coherent, Simple, and Delightful. The book expands on consuming OData services and REST APIs internal and external to SAP HANA. The Fiori application exercise demonstrates the use of the MVC pattern, JavaScript modularization, reuse of SAP UI5 controls, debugging, and the tools required for a complete scenario. The book closes with an exercise showcasing a finished single page application with multiple views and layouts, navigation between the views, and deployment of the application to AWS.

This book is simple enough for entry-level developers getting started in web frameworks, but it also highlights integration points for the data models being consumed by the application and shows how the application communicates with back-end services, resulting in a complete front-end custom Fiori application.

What You Will Learn
Know the 5 Fiori design principles
Understand how to consume OData and REST API models
Apply the MVC pattern using XML views and the SAP UI5 controls along with controller behavior in JavaScript
Debug and deploy the application

Who This Book Is For
Web developers and application leads who have some experience in JavaScript frameworks and web development and understand web protocol communication.

Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form of business intelligence dashboards, machine learning models, or APIs on top of a cleaned and curated data set. Despite the rapid progression of impressive tools and products built to fulfill this mission, it is still an uphill battle to tie everything together into a cohesive and reliable platform. At Isima they decided to reimagine the entire ecosystem from the ground up and built a single unified platform to allow end-to-end self service workflows from data ingestion through to analysis. In this episode Darshan Rawal, CEO and co-founder of Isima, explains how the biOS platform is architected to enable ease of use, the challenges that were involved in building an entirely new system from scratch, and how it can integrate with the rest of your data platform to allow for incremental adoption. This was an interesting and contrarian take on the current state of the data management industry and is worth a listen to gain some additional perspective.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Follow go.datafold.com/dataengineeringpodcast to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help y

Practical Azure SQL Database for Modern Developers: Building Applications in the Microsoft Cloud

Here is the expert-level, insider guidance you need on using Azure SQL Database as your back-end data store. This book highlights best practices in everything ranging from full-stack projects to mobile applications to critical, back-end APIs. The book provides instruction on accessing your data from any language and platform. And you learn how to push processing-intensive work into the database engine to be near the data and avoid undue networking traffic. Azure SQL is explained from a developer's point of view, helping you master its feature set and create applications that perform well and delight users.

Core to the book is showing you how Azure SQL Database provides relational and post-relational support so that any workload can be managed with easy accessibility from any platform and any language. You will learn about features ranging from lock-free tables to columnstore indexes, and about support for data formats ranging from JSON and key-values to the nodes and edges in the graph database paradigm. Reading this book prepares you to deal with almost all data management challenges, allowing you to create lean and specialized solutions having the elasticity and scalability that are needed in the modern world.

What You Will Learn
Master Azure SQL Database in your development projects from design to the CI/CD pipeline
Access your data from any programming language and platform
Combine key-value, JSON, and relational data in the same database
Push data-intensive compute work into the database for improved efficiency
Delight your customers by detecting and improving poorly performing queries
Enhance performance through features such as columnstore indexes and lock-free tables
Build confidence in your mastery of Azure SQL Database's feature set

Who This Book Is For
Developers of applications and APIs that benefit from cloud database support, developers who wish to master their tools (including Azure SQL Database), and those who want their applications to be known for speedy performance and the elegance of their code.
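
As one concrete taste of the relational-plus-JSON support described above, T-SQL's JSON_VALUE function lets a JSON document stored in a column participate in an ordinary query. Below is a hedged sketch from Python via pyodbc; the server, credentials, table, and JSON paths are placeholders.

```python
# Sketch: querying relational columns and JSON documents together in
# Azure SQL via T-SQL's JSON_VALUE. Connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=demo;"
    "UID=appuser;PWD=secret"  # placeholder credentials
)
cursor = conn.cursor()

# Assumes 'profile' is an NVARCHAR column holding a JSON document;
# JSON_VALUE extracts scalars so JSON and relational data join in one query.
cursor.execute(
    """
    SELECT c.customer_id,
           JSON_VALUE(c.profile, '$.preferences.theme') AS theme
    FROM dbo.customers AS c
    WHERE JSON_VALUE(c.profile, '$.tier') = ?
    """,
    "gold",
)
for customer_id, theme in cursor.fetchall():
    print(customer_id, theme)
```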

Summary Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine. In this episode he explains how they engineered a drop-in replacement for Kafka, replicating the numerous APIs, that can scale more easily and deliver consistently low latencies with a much lower hardware footprint. He also shares some of the areas of innovation that they have found to help foster the next wave of streaming applications while working within the constraints of the existing Kafka interfaces. This was a fascinating conversation with an energetic and enthusiastic engineer and founder about the challenges and opportunities in the realm of streaming data.
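
Because the engine implements the Kafka wire protocol, the "drop-in replacement" claim means an off-the-shelf Kafka client should work unchanged against it. A small sketch using the kafka-python client, with the broker address and topic as placeholders:

```python
# A standard Kafka client pointed at a Kafka-API-compatible broker
# (e.g. Redpanda on the default port 9092). Address/topic are placeholders.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"user-42", value=b'{"action": "click"}')
producer.flush()  # block until the message is actually sent

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating once the topic is drained
)
for message in consumer:
    print(message.key, message.value)
```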

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

If you’re looking for a way to optimize your data engineering pipeline – with instant query performance – look no further than Qubz. Qubz is next-generation OLAP technology built for the scale of Big Data from UST Global, a renowned digital services provider. Qubz lets users and enterprises analyze data on the cloud and on-premise, with blazing speed, while eliminating the complex engineering required to operationalize analytics at scale. With an emphasis on visual data engineering and connectors for all major BI tools and data sources, Qubz allows users to query OLAP cubes with sub-second response times on hundreds of billions of rows. To learn more, and sign up for a free demo, visit dataengineeringpodcast.com/qubz.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to s

Privacy Optimization Meets Pandemic Tracking

Can smartphone apps help track the spread of the novel coronavirus, privately and securely? In this report, Rob Pegoraro weighs the issue of whether mobile apps can help trace and then slow the spread of COVID-19 or will end up as just another episode of botched government procurement and application of technology. Apple and Google have recently devised a system to track COVID-19 infections anonymously using Bluetooth with iOS and Android smartphones. This development points a spotlight on a needed debate about balancing privacy and collecting useful data. Do privacy-optimizing techniques, such as federated learning and differential privacy, offer useful alternatives to building centralized databases that may later invite abuse? This report takes a close look at this subject and then provides recommendations for software developers, public health authorities, and elected officials who want to build on the Apple-Google API.

Understand the scope of the problem, including how contact tracing can help slow and stop outbreaks
Take a closer look at Apple and Google’s proposed remedy
Learn how other countries including Singapore, India, France, and Australia have traced the spread of COVID-19
Examine the risk factors for adopting and using a decentralized system like the Apple-Google app
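
The Apple-Google design is easier to reason about with a sketch of its central trick: broadcasting short-lived identifiers derived from a secret key that never leaves the phone. The Python below is a deliberate simplification (the real protocol uses HKDF/AES-based key derivation and fixed rotation windows), meant only to show why observed identifiers cannot be linked to a person without the key.

```python
# Simplified sketch of rotating proximity identifiers. NOT the actual
# Apple-Google exposure-notification protocol; illustration only.
import hmac
import hashlib
import os

daily_key = os.urandom(16)  # secret; stays on the phone

def rolling_identifier(daily_key, interval_number):
    # A fresh opaque identifier per ~10-minute interval; without the
    # daily key, two broadcasts cannot be linked to the same device.
    msg = interval_number.to_bytes(4, "big")
    return hmac.new(daily_key, msg, hashlib.sha256).digest()[:16]

broadcasts = [rolling_identifier(daily_key, i) for i in range(144)]  # one day

# On a positive test, the user uploads daily_key; other phones re-derive
# the identifiers and check them against locally logged sightings.
seen = set(broadcasts[10:12])  # identifiers another phone happened to log
matches = [i for i in range(144) if rolling_identifier(daily_key, i) in seen]
print(matches)  # [10, 11]
```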

Microservices in SAP HANA XSA: A Guide to REST APIs Using Node.js

Build enterprise-grade microservices in the SAP HANA Advanced Model (XSA). This book explains building scalable APIs in XSA and the benefits of building microservices with SAP HANA XSA. This book covers the Cloud Foundry (CF) architecture and how SAP HANA XSA follows the model. It begins with the details of the different architectural layers of applications hosted in XSA (specifically, microservices). Everything you need to know is presented, including analyzing requests, modularization, database ingestion, building JSON responses, and scaling your microservices.

You will learn to use development tools such as the SAP Web IDE, POSTMAN, and the SAP HANA Cockpit for XSA, including debugging examples on SAP HANA XSA with code snippets showing how microservices can be developed, debugged, scaled, and deployed on SAP HANA XSA. Microservices coverage is divided into security and authentication, request handling, modularization of Node.js, interaction with the SAP HANA database containers, and response formatting. An end-to-end scenario is presented of a Node.js REST API that uses HTTP methods, concluding with deploying an SAP HANA XSA project to a production environment.

This book is simple enough to help you implement a Node.js module in order to understand the development of microservices, and complex enough for architects to design their next business-ready solution integrating UAA security, application modularization, and an end-to-end REST API on SAP HANA XSA.

What You Will Learn
Know the definition and architecture of Cloud Foundry and its application on SAP HANA XSA
Understand REST principles and different HTTP methods
Explore microservices (Node.js) development
Interact with the SAP HANA database from Node (executing SQL statements and stored procedures)

Who This Book Is For
Architects designing business-ready solutions that integrate UAA security, application modularization, and an end-to-end REST API on SAP HANA XSA.

Learning Spark, 2nd Edition

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.

Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to:

Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow
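
To illustrate the Structured Streaming material mentioned above, here is a minimal PySpark streaming job; the input directory and schema are placeholders.

```python
# Minimal Structured Streaming sketch: read JSON events from a directory
# (placeholder path) and maintain a running count per action.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("learning-spark-demo").getOrCreate()

events = (
    spark.readStream
    .schema("user STRING, action STRING, ts TIMESTAMP")  # streams need an explicit schema
    .json("/tmp/events/")  # placeholder input directory
)

counts = events.groupBy(col("action")).count()

query = (
    counts.writeStream
    .outputMode("complete")  # emit the full updated counts each trigger
    .format("console")
    .start()
)
query.awaitTermination()  # blocks; stop with query.stop() or Ctrl-C
```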

At PayPal we decided to move away from two of our enterprise schedulers, Control-M and UC4, to Airflow. As we started the journey, the first and most important step we wanted to take was to build all the mandatory APIs on top of Airflow so that we could integrate with our self-service tools. In this talk we share the challenges that we ran into while building APIs on top of Airflow and how we overcame them.
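
For comparison, Airflow 2.x now ships a stable REST API that covers much of what such integration layers need. Below is a hedged sketch of triggering a DAG run from an external tool; the host, credentials, and dag_id are placeholders, and this is not PayPal's internal API from the talk.

```python
# Sketch: driving Airflow from an external self-service tool via the
# stable REST API (Airflow 2.x). Assumes the basic-auth API backend is
# enabled; host, credentials, and dag_id are placeholders.
import requests

AIRFLOW = "http://localhost:8080/api/v1"

resp = requests.post(
    f"{AIRFLOW}/dags/nightly_ingest/dagRuns",
    auth=("admin", "admin"),  # placeholder credentials
    json={"conf": {"run_date": "2021-06-01"}},  # parameters passed to the run
    timeout=10,
)
resp.raise_for_status()
run = resp.json()
print(run["dag_run_id"], run["state"])
```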

Spark in Action, Second Edition

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.

About the Technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.

About the Book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.

What's Inside
Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL

About the Reader
This book does not assume previous experience with Spark, Scala, or Hadoop.

About the Author
Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.

Quotes
"This book reveals the tools and secrets you need to drive innovation in your company or community." - Rob Thomas, IBM
"An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing." - Anupam Sengupta, GuardHat Inc.
"This book will help spark a love affair with distributed processing." - Conor Redmond, InComm Product Control
"Currently the best book on the subject!" - Markus Breuer, Materna IPS

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract
Currently, COVID-19 is disrupting the world. In an effort to better provide updated information on the status of the pandemic, IBM and The Weather Channel have created a COVID-19 dashboard. Bill Higgins, IBM Distinguished Engineer, and Daniel Benoit, Program Director of Information Governance, have come on the podcast this week to discuss this new initiative. Together, with host Al Martin, they discuss the purpose of this project, their current findings, and how they personally have been impacted. Check out the dashboard here.

Connect with Bill on LinkedIn. Connect with Daniel on LinkedIn.

Show Notes
10:16 - Get up to speed on Natural Language Processing here.
14:47 - Not sure what a data lake is? Find out here.
21:49 - Learn more on why extensibility is important to your APIs here.

Connect with the Team
Producer Liam Seston - LinkedIn.
Producer Lana Cosic - LinkedIn.
Producer Meighann Helene - LinkedIn.
Producer Kate Brown - LinkedIn.
Producer Allison Proctor - LinkedIn.
Producer Mar

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.