talk-data.com

Topic: data-engineering · 3395 tagged

Activity Trend (2020-Q1 to 2026-Q1)

Activities

3395 activities · Newest first

Hacking MySQL: Breaking, Optimizing, and Securing MySQL for Your Use Case

Your MySQL instances are probably broken. Many developers face slow-running queries, issues related to database architecture, replication, or database security, and that's only the beginning. This book will deliver answers to your most pressing MySQL database questions related to performance, availability, or security by uncovering what causes databases to break in the first place. At its core, this book provides the knowledge you need to break your database instances deliberately so you can better optimize them for performance and secure them against data breaches. In other words, you'll discover the sorts of actions, minor and major, that degrade databases so you can fix and ultimately preempt them. MySQL sometimes acts according to its own rules, and this book will help you keep it working on your terms. At the same time, you will learn to optimize your backup and recovery procedures, determine when and which data to index to achieve maximum performance, and choose the best MySQL configurations, among other essential skills. Most MySQL books focus exclusively on optimization, but this book argues that it is just as important to pay attention to the ways databases break. Indeed, after reading this book, you will be able to safely break your database instances to expose and overcome the nuanced issues that affect performance, availability, and security.

What You Will Learn
- Know the basics of MySQL and the InnoDB and MyISAM storage engines
- Spot the ways you are harming your database's performance, availability, and security without even realizing it
- Fix minor bugs and issues that have a surprisingly serious impact
- Optimize schema, data types, queries, indexes, and partitions to head off issues
- Understand key MySQL security strategies

Who This Book Is For
Database administrators, web developers, systems administrators, and security professionals with an intermediate knowledge of database management systems and of building applications on MySQL
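
The indexing advice this blurb mentions is easy to experiment with. Below is a minimal sketch, not taken from the book, that uses the mysql-connector-python driver to compare a query plan before and after adding an index; the connection details and the orders table are illustrative assumptions.

```python
# Sketch: compare a MySQL query plan before and after adding an index.
# Assumes a local MySQL server and a hypothetical `orders` table;
# credentials and schema are invented for illustration.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="shop"
)
cur = conn.cursor()

def explain(query: str) -> None:
    # EXPLAIN shows whether MySQL scans the whole table or uses an index.
    cur.execute("EXPLAIN " + query)
    for row in cur.fetchall():
        print(row)

query = "SELECT id, total FROM orders WHERE customer_id = 42"
explain(query)  # without an index, likely a full table scan (type=ALL)

cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
explain(query)  # should now show an index lookup (type=ref)

cur.close()
conn.close()
```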

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

This comprehensive guide, featuring hand-picked examples of daily use cases, will walk you through the end-to-end predictive model-building cycle using the latest techniques and industry tricks. In Chapters 1, 2, and 3, we will begin by setting up the environment and covering the basics of PySpark, focusing on data manipulation. Chapter 4 delves into the art of variable selection, demonstrating various techniques available in PySpark. In Chapters 5, 6, and 7, we explore machine learning algorithms, their implementations, and fine-tuning techniques. Chapters 8 and 9 will guide you through machine learning pipelines and various methods to operationalize and serve models using Docker/API. Chapter 10 will demonstrate how to unlock the power of predictive models to create a meaningful impact on your business. Chapter 11 introduces some of the most widely used and powerful modeling frameworks to unlock real value from data. In this new edition, you will learn predictive modeling frameworks that can quantify customer lifetime values and estimate the return on your predictive modeling investments. This edition also includes methods to measure engagement and identify actionable populations for effective churn treatments. Additionally, a dedicated chapter on experimentation design has been added, covering steps to efficiently design, conduct, test, and measure the results of your models. All code examples have been updated to reflect the latest stable version of Spark.

You will:
- Gain an overview of end-to-end predictive model building
- Understand multiple variable selection techniques and their implementations
- Learn how to operationalize models
- Perform data science experiments and learn useful tips
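
As a taste of the workflow the book walks through, here is a minimal, hypothetical PySpark pipeline: load a dataset, assemble features, fit a classifier, and evaluate it. The file name and column names are illustrative assumptions, not examples from the book.

```python
# Sketch of an end-to-end predictive flow in PySpark: load, featurize,
# train, evaluate. Columns such as "churned" are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("predictive-model").getOrCreate()

df = spark.read.csv("customers.csv", header=True, inferSchema=True)
train, test = df.randomSplit([0.8, 0.2], seed=42)

assembler = VectorAssembler(
    inputCols=["age", "tenure", "monthly_spend"], outputCol="features"
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")
model = Pipeline(stages=[assembler, lr]).fit(train)

auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(
    model.transform(test)
)
print(f"Test AUC: {auc:.3f}")
```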

Data Engineering with AWS Cookbook

Data Engineering with AWS Cookbook serves as a comprehensive practical guide for building scalable and efficient data engineering solutions using AWS. With this book, you will master implementing data lakes, orchestrating data pipelines, and creating serving layers using AWS's robust services, such as Glue, EMR, Redshift, and Athena. With hands-on exercises and practical recipes, you will enhance your AWS-based data engineering projects.

What this Book will help me do
- Gain the skills to design centralized data lake solutions and manage them securely at scale.
- Develop expertise in crafting data pipelines with AWS's ETL technologies like Glue and EMR.
- Learn to implement and automate governance, orchestration, and monitoring for data platforms.
- Build high-performance data serving layers using AWS analytics tools like Redshift and QuickSight.
- Effectively plan and execute data migrations to AWS from on-premises infrastructure.

Author(s)
Trâm Ngọc Phạm, Gonzalo Herreros González, Viquar Khan, and Huda Nofal bring together years of collective experience in data engineering and AWS cloud solutions. Each author's deep knowledge and passion for cloud technology have shaped this book into a valuable resource, geared towards practical learning and real-world application. Their approach ensures readers are not just learning but building tangible, impactful solutions.

Who is it for?
This book is geared towards data engineers and big data professionals engaged in or transitioning to cloud-based environments, specifically on AWS. Ideal readers are those looking to optimize workflows and master AWS tools to create scalable, efficient solutions. The content assumes a basic familiarity with AWS concepts like IAM roles and a command-line interface, ensuring all examples are accessible yet meaningful for those seeking advancement in AWS data engineering.
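
Many pipeline recipes of this kind revolve around Glue jobs. As a flavor of what driving one from code looks like, here is a small sketch using the standard boto3 start/poll pattern; the job name and region are hypothetical, and this is not a recipe from the book.

```python
# Sketch: kick off an existing AWS Glue ETL job and poll its status.
# Job name and region are illustrative assumptions.
import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")

run = glue.start_job_run(JobName="nightly-orders-etl")
run_id = run["JobRunId"]

while True:
    state = glue.get_job_run(JobName="nightly-orders-etl", RunId=run_id)
    status = state["JobRun"]["JobRunState"]
    print("Job state:", status)
    if status in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)  # poll every 30 seconds until the run finishes
```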

Managing Data as a Product

Discover how to transform your data architecture with the insights and techniques presented in Managing Data as a Product by Andrea Gioia. In this comprehensive guide, you'll explore how to design, implement, and maintain data-product-centered systems to meet modern demands, achieving scalable and sustainable data management tailored to your organization's needs.

What this Book will help me do
- Understand the principles of data-product-centered architectures and their advantages.
- Learn to design, develop, and operate data products in production settings.
- Explore strategies to manage the lifecycle of data products efficiently.
- Gain insights into team topologies and data ownership for distributed systems.
- Discover data modeling techniques for AI-ready architectures.

Author(s)
Andrea Gioia is a renowned data architect and the creator of the Open Data Mesh Initiative. With over 20 years of experience, Andrea has successfully led complex data projects and is passionate about sharing his expertise. His writing is practical and driven by real-world challenges, aiming to equip engineers with actionable knowledge.

Who is it for?
This book is ideal for data engineers, software architects, and engineering leaders involved in shaping innovative data architectures. If you have foundational knowledge of data engineering and are eager to advance your expertise by adopting data-product principles, this book will suit your needs. It is for professionals aiming to modernize and optimize their approach to organizational data management.

Evolve from Infrastructure to Innovation with SAP on AWS: Strategize Beyond Infrastructure for Extending your SAP applications, Data Management, IoT & AI/ML integration and IT Operations using AWS Services

The world of SAP is undergoing a major transformation, with many customers either planning or actively modernizing their SAP landscapes as part of the S/4HANA digital transformation. Given the extensive SAP transformation efforts adopted by nearly all SAP customers in recent years and the profound impact these digital changes have had on their business models and IT organizations, the authors decided to write this book. As customers embark on their SAP on AWS journey, they face three main challenges: deciding on the overall strategy, selecting the right business use cases, and implementing them effectively. This book aims to address these challenges by guiding readers through the process of identifying and executing the appropriate use cases. It highlights how customers can harness AWS services beyond merely hosting their SAP systems on AWS, demonstrating the potential of these services to drive innovation. The book covers the entire journey, from defining strategy and identifying business use cases to their implementation, providing practical tips, strategies, and insights. It serves as an essential guide for customers planning to migrate, or those who have already migrated, their SAP workloads to AWS, helping them explore beyond just the infrastructure aspects of their journey.

You Will:
- Discover how to go beyond just hosting SAP systems on AWS, using the full range of AWS services to innovate and extend your SAP applications.
- Learn how to identify the right business use cases and implement them effectively, with practical examples and real-world scenarios.
- Develop the mindset and skills needed to architect modern, cloud-native, event-driven architectures, balancing trade-offs between simplicity, efficiency, and cost.

This book is for:
Business leaders, IT professionals, and SAP specialists who are looking to modernize their SAP landscapes by leveraging AWS services

Learn FileMaker Pro 2024: The Comprehensive Guide to Building Custom Databases

FileMaker Pro is a development platform from Claris International Inc., a subsidiary of Apple Inc. The software makes it easy for everyone to create powerful, multi-user, cross-platform, relational database applications. This book navigates the reader through the software in a clear and logical manner, with each chapter building on the previous one. After an initial review of the user environment and application basics, the book delves into a deep exploration of the integrated development environment, which seamlessly combines the full stack of schema, business logic, and interface layers into a unified visual programming experience. Everything beginners need to get started is covered, along with advanced material that seasoned professionals will appreciate. Written by a professional developer with decades of real-world experience, "Learn FileMaker Pro 2024" is a comprehensive learning and reference guide. Join millions of users and developers worldwide in achieving a new level of workflow efficiency with FileMaker.

For This New Edition
This third edition includes clearer lessons and more examples, making it easier than ever to start planning, building, and deploying a custom database solution. It covers dozens of new and modified features introduced in versions 19.1 to 19.6, as well as the more recent 2023 (v20) and 2024 (v21) releases. Whatever your level of experience, this book has something new for you!

What You'll Learn
- Plan and create custom tables, fields, and relationships
- Write calculations using built-in and custom functions
- Build layouts with dynamic objects, themes, and custom menus
- Automate tasks with scripts and link them to objects and interface events
- Keep database files secure and healthy
- Integrate with external systems using ODBC, cURL, and the FM API
- Deploy solutions to share with desktop, iOS, and web clients
- Learn about summary reports, dynamic object references, and transactions
- Delve into artificial intelligence with CoreML, OpenAI, and Semantic Finds

Who This Book Is For
Hobbyist developers, professional consultants, IT staff

Data-driven Models in Inverse Problems

Advances in learning-based methods are revolutionizing several fields in applied mathematics, including inverse problems, resulting in a major paradigm shift towards data-driven approaches. This volume, which is inspired by this cutting-edge area of research, brings together contributors from the inverse problem community and shows how to successfully combine model- and data-driven approaches to gain insight into practical and theoretical issues.
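
For readers new to the area, the combination the blurb describes is often written as a variational problem in which a physics-based forward operator supplies the data-fidelity term while a regularizer learned from data replaces a hand-crafted prior. This is a generic textbook formulation, not a formula quoted from the volume:

```latex
% Model-driven fidelity term plus data-driven (learned) regularizer:
\hat{x} \in \arg\min_{x}\; \tfrac{1}{2}\,\| A x - y \|_2^2 \;+\; \lambda\, R_\theta(x)
```

Here A is the known forward model, y the measured data, and R_theta a regularizer whose parameters theta are trained on example reconstructions; lambda balances fidelity against the learned prior.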

Apache Spark for Machine Learning

Dive into the power of Apache Spark as a tool for handling and processing the big data required for machine learning. With this book, you will explore how to configure, execute, and deploy machine learning algorithms using Spark's scalable architecture and learn best practices for implementing real-world big data solutions.

What this Book will help me do
- Understand the integration of Apache Spark with large-scale infrastructures for machine learning applications.
- Employ data processing techniques for preprocessing and feature engineering efficiently with Spark.
- Master the implementation of advanced supervised and unsupervised learning algorithms using Spark.
- Learn to deploy machine learning models within Spark ecosystems for optimized performance.
- Discover methods for analyzing big data trends and tuning machine learning models for improved accuracy.

Author(s)
The author, Deepak Gowda, is an experienced data scientist with over ten years of expertise in machine learning and big data. His career spans industries such as supply chain, cybersecurity, and more, where he has used Apache Spark extensively. Deepak's teaching style is marked by clarity and practicality, making complex concepts approachable.

Who is it for?
Apache Spark for Machine Learning is tailored for data engineers, machine learning practitioners, and computer science students looking to advance their ability to process, analyze, and model large datasets. If you're already familiar with basic machine learning and want to scale your solutions using Spark, this book is ideal for your studies and professional growth.
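
To complement the supervised pipeline sketched under the PySpark title above, here is a minimal unsupervised example in Spark MLlib; the Parquet path and feature columns are illustrative assumptions, not examples from the book.

```python
# Sketch: clustering at scale with Spark MLlib (unsupervised learning).
# File path and column names are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator

spark = SparkSession.builder.appName("spark-ml-clustering").getOrCreate()

df = spark.read.parquet("events.parquet")
features = VectorAssembler(
    inputCols=["clicks", "session_minutes", "purchases"], outputCol="features"
).transform(df)

model = KMeans(k=5, seed=1).fit(features)
clustered = model.transform(features)

# Silhouette score: how well-separated the resulting clusters are.
silhouette = ClusteringEvaluator().evaluate(clustered)
print(f"Silhouette score: {silhouette:.3f}")
```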

Apache Airflow Best Practices

"Apache Airflow Best Practices" is your go-to guide for mastering data workflow orchestration using Apache Airflow. This book introduces you to core concepts and features of Airflow and helps you efficiently design, deploy, and manage workflows. With detailed examples and hands-on tutorials, you'll learn how to tackle real-world challenges in data engineering. What this Book will help me do Understand and utilize the features and updates introduced in Apache Airflow 2.x. Design and implement robust, scalable, and efficient data pipelines and workflows. Learn best practices for deploying Apache Airflow in cloud environments such as AWS and GCP. Extend Airflow's functionality with custom plugins and advanced configuration. Monitor, maintain, and scale your Airflow deployment effectively for high availability. Author(s) Dylan Intorf, Dylan Storey, and Kendrick van Doorn are seasoned professionals in data engineering, data strategy, and software development. Between them, they bring decades of experience working in diverse industries like finance, tech, and life sciences. They bring their expertise into this practical guide to help practitioners understand and master Apache Airflow. Who is it for? This book is tailored for data professionals such as data engineers, scientists, and system administrators, offering valuable insights for new learners and experienced users. If you're starting with workflow orchestration, seeking to optimize your current Airflow implementation, or scaling efforts, this book aligns with your goals. Readers should have a basic knowledge of Python programming and data engineering principles.

Building Modern Data Applications Using Databricks Lakehouse

This book, "Building Modern Data Applications Using Databricks Lakehouse," provides a comprehensive guide for data professionals to master the Databricks platform. You'll learn to effectively build, deploy, and monitor robust data pipelines with Databricks' Delta Live Tables, empowering you to manage and optimize cloud-based data operations effortlessly. What this Book will help me do Understand the foundations and concepts of Delta Live Tables and its role in data pipeline development. Learn workflows to process and transform real-time and batch data efficiently using the Databricks lakehouse architecture. Master the implementation of Unity Catalog for governance and secure data access in modern data applications. Deploy and automate data pipeline changes using CI/CD, leveraging tools like Terraform and Databricks Asset Bundles. Gain advanced insights in monitoring data quality and performance, optimizing cloud costs, and managing DataOps tasks effectively. Author(s) Will Girten, the author, is a seasoned Solutions Architect at Databricks with over a decade of experience in data and AI systems. With a deep expertise in modern data architectures, Will is adept at simplifying complex topics and translating them into actionable knowledge. His books emphasize real-time application and offer clear, hands-on examples, making learning engaging and impactful. Who is it for? This book is geared towards data engineers, analysts, and DataOps professionals seeking efficient strategies to implement and maintain robust data pipelines. If you have a basic understanding of Python and Apache Spark and wish to delve deeper into the Databricks platform for streamlining workflows, this book is tailored for you.

Delta Lake: The Definitive Guide

Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale.

This book helps you:
- Understand key data reliability challenges and how Delta Lake solves them
- Explain the critical role of Delta transaction logs as a single source of truth
- Learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino
- Architect data lakehouses with the medallion architecture
- Optimize Delta Lake performance with features like deletion vectors and liquid clustering
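
You don't need a Spark cluster to poke at a Delta table. The sketch below uses the deltalake (delta-rs) Python package, one of the ecosystem projects around the format, to write two table versions and read them back through the transaction log; the path and schema are illustrative assumptions.

```python
# Sketch: a Delta table from plain Python via the deltalake (delta-rs)
# package. Each write is an ACID commit recorded in the transaction log.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame({"id": [1, 2], "status": ["new", "shipped"]})
write_deltalake("/tmp/orders_delta", df)                  # creates version 0

df2 = pd.DataFrame({"id": [3], "status": ["new"]})
write_deltalake("/tmp/orders_delta", df2, mode="append")  # commits version 1

table = DeltaTable("/tmp/orders_delta")
print(table.version())    # -> 1
print(table.to_pandas())  # all rows, resolved via the transaction log
```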

Handling and Mapping Geographic Information

With the increasing proliferation of data and the systematization of geographic information referencing, maps are now a major concern – not only for specialists, but also for urban planning and development organizations and the general public. However, while producing a map may seem straightforward, the actual process of transforming data into a useful map with a specific purpose is characterized by a series of precise operations that require knowledge in a variety of fields: statistics, geography, cartography and so on. Handling and Mapping Geographic Information presents a wide range of operations based on a variety of examples. Each chapter adopts a different approach, explaining the methodological choices made in relation to the theme and the pursued objective. This approach, encompassing the entire map production process, will enable all readers, whether students, researchers, teachers or planners, to understand the multiple roles that maps can play in the analysis of geographical data.

Aerospike: Up and Running

If you're a developer looking to build a distributed, resilient, scalable, high-performance application, you may be evaluating distributed SQL and NoSQL solutions. Perhaps you're considering the Aerospike database. This practical book shows developers, architects, and engineers how to get the highly scalable and extremely low-latency Aerospike database up and running. You will learn how to power your globally distributed applications and take advantage of Aerospike's hybrid memory architecture with the real-time performance of in-memory plus dependable persistence. After reading this book, you'll be able to build applications that can process up to tens of millions of transactions per second for millions of concurrent users on any scale of data.

This practical guide provides:
- Step-by-step instructions on installing and connecting to Aerospike
- A clear explanation of the programming models available
- All the advice you need to develop your Aerospike application
- Coverage of issues such as administration, connectors, consistency, and security
- Code examples and tutorials to get you up and running quickly
- And more
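
As a first taste of the client programming model, here is a minimal put/get round trip with the official Aerospike Python client; the host, namespace, set, and record contents are illustrative assumptions, not the book's tutorial code.

```python
# Sketch: basic key-value put/get with the Aerospike Python client.
# Host, namespace, and record layout are invented for illustration.
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "users", "user42")  # (namespace, set, user key)
client.put(key, {"name": "Ada", "visits": 1})

# get() returns (key, metadata, bins); bins hold the record's fields.
_, meta, record = client.get(key)
print(record)  # -> {'name': 'Ada', 'visits': 1}

client.close()
```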

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

This book is your comprehensive guide to building robust Generative AI solutions using the Databricks Data Intelligence Platform. Databricks is the fastest-growing data platform, offering unified analytics and AI capabilities within a single governance framework and enabling organizations to streamline their data processing workflows, from ingestion to visualization. Additionally, Databricks provides features to train a high-quality large language model (LLM), whether you are looking for Retrieval-Augmented Generation (RAG) or fine-tuning. Databricks offers a scalable and efficient solution for processing large volumes of both structured and unstructured data, facilitating advanced analytics, machine learning, and real-time processing. In today's GenAI world, Databricks plays a crucial role in empowering organizations to extract value from their data effectively, driving innovation and gaining a competitive edge in the digital age. This book will not only help you master the Data Intelligence Platform but also help power your enterprise to the next level with a bespoke LLM unique to your organization. Beginning with foundational principles, the book starts with a platform overview and explores features and best practices for ingestion, transformation, and storage with Delta Lake. Advanced topics include leveraging Databricks SQL for querying and visualizing large datasets, ensuring data governance and security with Unity Catalog, and deploying machine learning and LLMs using Databricks MLflow for GenAI. Through practical examples, insights, and best practices, this book equips solution architects and data engineers with the knowledge to design and implement scalable data solutions, making it an indispensable resource for modern enterprises. Whether you are new to Databricks and trying to learn a new platform, a seasoned practitioner building data pipelines, data science models, or GenAI applications, or an executive who wants to communicate the value of Databricks to customers, this book is for you. With its extensive feature and best-practice deep dives, it also serves as an excellent reference guide if you are preparing for Databricks certification exams.

What You Will Learn
- Foundational principles of Lakehouse architecture
- Key features including Unity Catalog, Databricks SQL (DBSQL), and Delta Live Tables
- The Databricks Data Intelligence Platform and its key functionalities
- Building and deploying GenAI applications, from data ingestion to model serving
- Databricks pricing, platform security, DBRX, and many more topics

Who This Book Is For
Solution architects, data engineers, data scientists, Databricks practitioners, and anyone who wants to deploy GenAI solutions with the Data Intelligence Platform. This is also a handbook for senior execs who need to communicate the value of Databricks to customers. People who are new to the Databricks platform and want comprehensive insights will find the book accessible.
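
Model work on the platform revolves around MLflow for tracking and deployment. As a generic illustration, not a Databricks-specific example from the book, logging a run looks like this; the experiment name, parameters, and metric are placeholders.

```python
# Sketch: recording an experiment run with MLflow, the tracking layer
# used for ML and GenAI work on Databricks. Values are placeholders.
import mlflow

mlflow.set_experiment("genai-demo")

with mlflow.start_run():
    mlflow.log_param("model", "example-llm")   # hypothetical model name
    mlflow.log_param("temperature", 0.2)
    mlflow.log_metric("eval_score", 0.87)      # hypothetical eval result
```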

Data Engineering Best Practices

Unlock the secrets to building scalable and efficient data architectures with 'Data Engineering Best Practices.' This book provides in-depth guidance on designing, implementing, and optimizing cloud-based data pipelines. You will gain valuable insights into best practices, agile workflows, and future-proof designs.

What this Book will help me do
- Effectively plan and architect scalable data solutions leveraging cloud-first strategies.
- Master agile processes tailored to data engineering for improved project outcomes.
- Implement secure, efficient, and reliable data pipelines optimized for analytics and AI.
- Apply real-world design patterns and avoid common pitfalls in data flow and processing.
- Create future-ready data engineering solutions following industry-proven frameworks.

Author(s)
Richard J. Schiller and David Larochelle are seasoned data engineering experts with decades of experience crafting efficient and secure cloud-based infrastructures. Their collaborative writing distills years of real-world expertise into practical advice aimed at helping engineers succeed in a rapidly evolving field.

Who is it for?
This book is ideal for data engineers, ETL specialists, and big data professionals seeking to enhance their knowledge in cloud-based solutions. Some familiarity with data engineering, ETL pipelines, and big data technologies is helpful. It suits those keen on mastering advanced practices, improving agility, and developing efficient data pipelines. Perfect for anyone looking to future-proof their skills in data engineering.

Azure SQL Revealed: The Next-Generation Cloud Database with AI and Microsoft Fabric

Access detailed content and examples on Azure SQL, a set of cloud services that allows SQL Server to be deployed in the cloud. This book teaches the fundamentals of deployment, configuration, security, performance, and availability of Azure SQL from the perspective of these same tasks and capabilities in SQL Server. This distinct approach makes this book an ideal learning platform for readers familiar with SQL Server on-premises who want to migrate their skills toward providing cloud solutions to an enterprise market that is increasingly cloud-focused. If you know SQL Server, you will love this book. You will be able to take your existing knowledge of SQL Server and translate that knowledge into the world of cloud services from the Microsoft Azure platform, and in particular into Azure SQL. This book provides information never seen before about the history and architecture of Azure SQL. Author Bob Ward is a leading expert with access to and support from the Microsoft engineering team that built Azure SQL and related database cloud services. He presents powerful, behind-the-scenes insights into the workings of one of the most popular database cloud services in the industry. This book also brings you the latest innovations in Azure SQL, including Azure Arc, Hyperscale, generative AI applications, Microsoft Copilots, and integration with Microsoft Fabric.

What You Will Learn
- Know the history of Azure SQL
- Deploy, configure, and connect to Azure SQL
- Choose the correct way to deploy SQL Server in Azure
- Migrate existing SQL Server instances to Azure SQL
- Monitor and tune Azure SQL's performance to meet your needs
- Ensure your data and application are highly available
- Secure your data from attack and theft
- Learn the latest innovations for Azure SQL, including Hyperscale
- Harness the power of AI for generative data-driven applications and Microsoft Copilots for assistance
- Integrate Azure SQL with Microsoft Fabric, the unified data platform

Who This Book Is For
This book is designed to teach SQL Server in the Azure cloud to the SQL Server professional. Anyone who operates, manages, or develops applications for SQL Server will benefit from this book. Readers will be able to translate their current knowledge of SQL Server, especially of SQL Server 2019 and 2022, directly to Azure. This book is ideal for database professionals looking to remain relevant as their customer base moves into the cloud.
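
Since the book targets readers bringing existing SQL Server skills to the cloud, a first connection from Python might look like the following sketch with pyodbc; the server, database, and credentials are placeholders, and the connection string assumes ODBC Driver 18 for SQL Server is installed.

```python
# Sketch: connecting to an Azure SQL database with pyodbc.
# Server, database, and credentials are placeholder assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=appuser;PWD=secret;"
    "Encrypt=yes;TrustServerCertificate=no;"
)
cur = conn.cursor()
cur.execute("SELECT @@VERSION")
print(cur.fetchone()[0])  # e.g. 'Microsoft SQL Azure ...'
conn.close()
```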

Financial Data Engineering

Today, investment in financial technology and digital transformation is reshaping the financial landscape and generating many opportunities. Too often, however, engineers and professionals in financial institutions lack a practical and comprehensive understanding of the concepts, problems, techniques, and technologies necessary to build a modern, reliable, and scalable financial data infrastructure. This is where financial data engineering is needed. A data engineer developing a data infrastructure for a financial product possesses not only technical data engineering skills but also a solid understanding of financial domain-specific challenges, methodologies, data ecosystems, providers, formats, technological constraints, identifiers, entities, standards, regulatory requirements, and governance. This book offers a comprehensive, practical, domain-driven approach to financial data engineering, featuring real-world use cases, industry practices, and hands-on projects.

You'll learn:
- The data engineering landscape in the financial sector
- Specific problems encountered in financial data engineering
- The structure, players, and particularities of the financial data domain
- Approaches to designing financial data identification and entity systems
- Financial data governance frameworks, concepts, and best practices
- The financial data engineering lifecycle from ingestion to production
- The varieties and main characteristics of financial data workflows
- How to build financial data pipelines using open source tools and APIs

Tamer Khraisha, PhD, is a senior data engineer and scientific author with more than a decade of experience in the financial sector.
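
One concrete instance of the identifier problems such a book covers: ISINs carry a check digit computed with a Luhn-style algorithm over the letter-expanded code. The sketch below is a standard implementation of that public algorithm, not code from the book.

```python
# Sketch: validating an ISIN check digit. Standard algorithm: expand
# letters to digits (A=10 .. Z=35), then run a Luhn check on the result.
def isin_is_valid(isin: str) -> bool:
    if len(isin) != 12 or not isin[:2].isalpha():
        return False
    # int(c, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(c, 36)) for c in isin.upper())
    total = 0
    # Luhn: from the right, double every second digit, summing digit-wise.
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 1:
            n *= 2
        total += n // 10 + n % 10
    return total % 10 == 0

print(isin_is_valid("US0378331005"))  # Apple Inc. -> True
print(isin_is_valid("US0378331006"))  # wrong check digit -> False
```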

Data Security Blueprints

Once you decide to implement a data security strategy, it can be difficult to know where to start. With so many potential threats and challenges to resolve, teams often try to fix everything at once. But this boil-the-ocean approach is difficult to manage efficiently and ultimately leads to frustration, confusion, and halted progress. There's a better way to go. In this report, data science and AI leader Federico Castanedo shows you what to look for in a data security platform that will deliver the speed, scale, and agility you need to be successful in today's fast-paced, distributed data ecosystems. Unlike other resources that focus solely on data security concepts, this guide provides a road map for putting those concepts into practice.

This report reveals:
- The most common data security use cases and their potential challenges
- What to look for in a data security solution that's built for speed and scale
- Why increasingly decentralized data architectures require centralized, dynamic data security mechanisms
- How to implement the steps required to put common use cases into production
- Methods for assessing risks, and the controls necessary to mitigate those risks
- How to facilitate cross-functional collaboration to put data security into practice in a scalable, efficient way

You'll examine the most common data security use cases that global enterprises across every industry aim to achieve, including the specific steps needed for implementation as well as the potential obstacles these use cases present. Federico Castanedo is a data science and AI leader with extensive experience in academia, industry, and startups. Having held leadership positions at DataRobot and Vodafone, he has a successful track record of leading high-performing data science teams and developing data science and AI products with business impact.

Advanced interactive interfaces with Access: Building Interactive Interfaces with VBA

Explore and learn advanced techniques for working with the graphical, interactive interfaces that can be built in Access. This book starts with best practices and tips for writing code in VBA, and covers how to implement them in a real-world scenario. You will learn how to create and use VBA classes. The book then introduces binary encoding and the "bit vector" technique, followed by the implementation of a drag-and-drop engine. You also will learn how to design a timeline and make it scrollable.

What You Will Learn
- Write readable, easy-to-maintain code
- Add a drag-and-drop engine to an Access application
- Apply variations to the drag-and-drop technique to create different graphical effects
- Embed a scrollable timeline in an Access application, on which objects can be dynamically placed

Who This Book Is For
VBA developers
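
The "bit vector" technique mentioned above is simply a set of boolean flags packed into one integer. The book implements it in VBA; rendered in Python for brevity, with invented flag names, the idea looks like this.

```python
# Sketch: packing boolean flags into one integer (a "bit vector").
# Flag names are invented for illustration; the book's version is in VBA.
DRAGGABLE  = 1 << 0   # 0b001
RESIZABLE  = 1 << 1   # 0b010
SCROLLABLE = 1 << 2   # 0b100

flags = DRAGGABLE | SCROLLABLE   # set two flags in a single value

print(bool(flags & DRAGGABLE))   # True:  bit 0 is set
print(bool(flags & RESIZABLE))   # False: bit 1 is clear

flags &= ~DRAGGABLE              # clear the draggable bit
print(bool(flags & DRAGGABLE))   # False
```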

In-Memory Analytics with Apache Arrow - Second Edition

Dive into efficient data handling with 'In-Memory Analytics with Apache Arrow.' This book explores Apache Arrow, a powerful open-source project that revolutionizes how tabular and hierarchical data are processed. You'll learn to streamline data pipelines, accelerate analysis, and utilize high-performance tools for data exchange.

What this Book will help me do
- Understand and utilize the Apache Arrow in-memory data format for your data analysis needs.
- Implement efficient and high-speed data pipelines using Arrow subprojects like Flight SQL and Acero.
- Enhance integration and performance in analysis workflows by using tools like Parquet and Snowflake with Arrow.
- Master chaining and reusing computations across languages and environments with Arrow's cross-language support.
- Apply in real-world scenarios by integrating Apache Arrow with analytics systems like Dremio and DuckDB.

Author(s)
Matthew Topol, the author of this book, brings 15 years of technical expertise in the realm of data processing and analysis. Having worked across various environments and languages, Matthew offers insights into optimizing workflows using Apache Arrow. His approachable writing style ensures that complex topics are comprehensible.

Who is it for?
This book is tailored for developers, data engineers, and data scientists eager to enhance their analytic toolset. Whether you're a beginner or have experience in data analysis, you'll find the concepts actionable and transformative. If you are curious about improving the performance and capabilities of your analytic pipelines or tools, this book is for you.
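
As a minimal illustration of the in-memory format in action, here is a sketch with pyarrow that builds a columnar table and round-trips it through Parquet; the column names and file path are illustrative assumptions.

```python
# Sketch: build an Arrow table in memory and round-trip it through Parquet.
# Columns and path are invented for illustration.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "symbol": ["AAPL", "MSFT", "GOOG"],
    "price": [189.5, 411.2, 142.7],
})

pq.write_table(table, "/tmp/prices.parquet")   # columnar format on disk
loaded = pq.read_table("/tmp/prices.parquet")  # back into Arrow memory

print(loaded.schema)
print(loaded.column("price").to_pylist())      # -> [189.5, 411.2, 142.7]
```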