talk-data.com talk-data.com

A

Speaker

Andy Oram

4

talks

author
Filtering by: O'Reilly Data Engineering Books ×

Filter by Event / Source

Talks & appearances

Showing 4 of 6 activities

Search activities →
The Evolving Role of the Data Engineer

Companies working to become data driven often view data scientists as heroes, but that overlooks the vital role that data engineers play in the process. While data scientists focus on finding new insights from datasets, data engineers deal with preparation—obtaining, cleaning, and creating enhanced versions of the data an organization needs. In this report, Andy Oram examines how the role of data engineer has quickly evolved. DBAs, software engineers, developers, and students will explore the responsibilities of modern data engineers and the skills and tools necessary to do the job. You’ll learn how to deal with software engineering concepts such as rapid and continuous development, automation and orchestration, modularity, and traceability. Decision makers considering a move to the cloud will also benefit from the in-depth discussion this report provides. This report covers: Major tasks of data engineers today The different levels of structure in data and ways to maximize its value Capabilities of third-party cloud options Tools for ingestion, transfer, and enrichment Using containers and VMs to run the tools Software engineering development Automation and orchestration of data engineering

Streaming Data

Managers and staff responsible for planning, hiring, and allocating resources need to understand how streaming data can fundamentally change their organizations. Companies everywhere are disrupting business, government, and society by using data and analytics to shape their business. Even if you don’t have deep knowledge of programming or digital technology, this high-level introduction brings data streaming into focus. You won’t find math or programming details here, or recommendations for particular tools in this rapidly evolving space. But you will explore the decision-making technologies and practices that organizations need to process streaming data and respond to fast-changing events. By describing the principles and activities behind this new phenomenon, author Andy Oram shows you how streaming data provides hidden gems of information that can transform the way your business works. Learn where streaming data comes from and how companies put it to work Follow a simple data processing project from ingesting and analyzing data to presenting results Explore how (and why) big data processing tools have evolved from MapReduce to Kubernetes Understand why streaming data is particularly useful for machine learning projects Learn how containers, microservices, and cloud computing led to continuous integration and DevOps

Data Lake Maturity Model

Data is changing everything. Many industries today are being fundamentally transformed through the accumulation and analysis of large quantities of data, stored in diversified but flexible repositories known as data lakes. Whether your company has just begun to think about big data or has already initiated a strategy for handling it, this practical ebook shows you how to plan a successful data lake migration. You’ll learn the value of data lakes, their structure, and the problems they attempt to solve. Using Zaloni’s data lake maturity model, you’ll then explore your organization’s readiness for putting a data lake into action. Do you have the tools and data architectures to support big data analysis? Are your people and processes prepared? The data lake maturity model will help you rate your organization’s readiness. This report includes: The structure and purpose of a data lake Descriptive, predictive, and prescriptive analytics Data lake curation, self-service, and the use of data lake zones How to rate your organization using the data lake maturity model A complete checklist to help you determine your strategic path forward

Managing the Data Lake

Organizations across many industries have recently created fast-growing repositories to deal with an influx of new data from many sources and often in multiple formats. To manage these data lakes, companies have begun to leave the familiar confines of relational databases and data warehouses for Hadoop and various big data solutions. But adopting new technology alone won’t solve the problem. Based on interviews with several experts in data management, author Andy Oram provides an in-depth look at common issues you’re likely to encounter as you consider how to manage business data. You’ll explore five key topic areas, including: Acquisition and ingestion: how to solve these problems with a degree of automation. Metadata: how to keep track of when data came in and how it was formatted, and how to make it available at later stages of processing. Data preparation and cleaning: what you need to know before you prepare and clean your data, and what needs to be cleaned up and how. Organizing workflows: what you should do to combine your tasks—ingestion, cataloging, and data preparation—into an end-to-end workflow. Access control: how to address security and access controls at all stages of data handling. Andy Oram, an editor at O’Reilly Media since 1992, currently specializes in programming. His work for O'Reilly includes the first books on Linux ever published commercially in the United States.