talk-data.com

Topic

data-engineering

3377

tagged

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books

Professional Hadoop

The professional's one-stop guide to this open-source, Java-based big data framework

Professional Hadoop is the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings. Written by an expert team of certified Hadoop developers, committers, and Summit speakers, this book details every key aspect of Hadoop technology to enable optimal processing of large data sets. Designed expressly for the professional developer, this book skips over the basics of database development to get you acquainted with the framework's processes and capabilities right away. The discussion covers each key Hadoop component individually, culminating in a sample application that brings all of the pieces together to illustrate the cooperation and interplay that make Hadoop a major big data solution. Coverage includes everything from storage and security to computing and user experience, with expert guidance on integrating other software and more. Hadoop is quickly reaching significant market usage, and more and more developers are being called upon to develop big data solutions using the Hadoop framework. This book covers the process from beginning to end, providing a crash course for professionals needing to learn and apply Hadoop quickly.

- Configure storage, user experience (UE), and in-memory computing
- Integrate Hadoop with other programs including Kafka and Storm
- Master the fundamentals of Apache Bigtop and Ignite
- Build robust data security with expert tips and advice

Hadoop's popularity is largely due to its accessibility. Open-source and written in Java, the framework offers almost no barrier to entry for experienced database developers already familiar with the skills and requirements real-world programming entails. Professional Hadoop gives you the practical information and framework-specific skills you need quickly.

IBM TS7700 Release 3.3

IBM® TS7700 is a family of mainframe virtual tape solutions that optimize data protection and business continuance for IBM z Systems™ data. Through the use of virtualization and disk cache, the TS7700 family operates at disk speeds while maintaining compatibility with existing tape operations. Its fully integrated tiered storage hierarchy takes advantage of both disk and tape technologies to deliver performance for active data and best economics for inactive and archive data. This IBM Redbooks® publication describes the TS7700 R3.3 architecture, planning, migration, implementation, and operations. The latest TS7700 family of z Systems tape virtualization is offered as two models: IBM TS7720 features encryption-capable high-capacity cache that uses 3 TB SAS disk drives with RAID 6, which can scale to large capacities with the highest level of data protection. IBM TS7740 features encryption-capable 600 GB SAS drives with RAID 6 protection. Both models write data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1150 and earlier IBM 3592 model tape drives that are installed in IBM TS3500 tape libraries. Physical tape support is optional on TS7720. TS7700 R3.3 also supports external key management for disk-based encryption by using IBM Security Key Lifecycle Manager. This book is intended for system architects who want to integrate their storage systems for smoother operation.

Mastering Hibernate

Mastering Hibernate is your comprehensive guide to understanding and mastering Hibernate, a powerful Object-Relational Mapping tool for Java and .NET applications. Through this book, you will dive deep into the mechanics of Hibernate, exploring its core concepts and architecture. Whether you're working with SQL or NoSQL data stores, this book ensures you can unlock Hibernate's full potential.

What this Book will help me do
- Grasp the internal workings of Hibernate, including its session management and entity lifecycle.
- Optimize mapping between Java classes and relational database structures for better performance.
- Effectively manage relationships and collections within your data models using Hibernate features.
- Utilize Hibernate's caching systems to improve application performance and scalability.
- Handle multi-tenant database configurations with confidence using Hibernate's architectural capabilities.

Author(s)
Rad is an experienced software developer and educator specializing in Java-based applications and enterprise architecture. With years of hands-on practice using Hibernate in real-world scenarios, Rad has curated this book to serve as a clear and practical guide. Their writing reflects deep technical expertise combined with an approachable and illustrative teaching style, ensuring learning is both effective and engaging.

Who is it for?
This book is ideal for software developers and engineers who are familiar with Java or other similar object-oriented programming languages. Whether you're a professional looking to deepen your understanding of Hibernate's internals or a developer aiming to create more efficient ORM solutions, this book has something for you. Readers should have a basic understanding of Java and relational databases, but no prior Hibernate expertise is required. By the end, you'll be equipped to confidently apply Hibernate to sophisticated data challenges.

Making Sense of Stream Processing

How can event streams help make your application more scalable, reliable, and maintainable? In this report, O’Reilly author Martin Kleppmann shows you how stream processing can make your data storage and processing systems more flexible and less complex. Structuring data as a stream of events isn’t new, but with the advent of open source projects such as Apache Kafka and Apache Samza, stream processing is finally coming of age. Using several case studies, Kleppmann explains how these projects can help you reorient your database architecture around streams and materialized views. The benefits of this approach include better data quality, faster queries through precomputed caches, and real-time user interfaces. Learn how to open up your data for richer analysis and make your applications more scalable and robust in the face of failures.

- Understand stream processing fundamentals and their similarities to event sourcing, CQRS, and complex event processing
- Learn how logs can make search indexes and caches easier to maintain
- Explore the integration of databases with event streams, using the new Bottled Water open source tool
- Turn your database architecture inside out by orienting it around streams and materialized views
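As a rough illustration of the "streams and materialized views" idea in the blurb above, the sketch below folds an append-only list of events into a queryable view. It is not taken from the report: the event shapes, the cart example, and the names `EVENTS`, `apply_event`, and `materialize` are assumptions made for illustration, and a real system would read the events from a log such as Kafka rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical append-only event log; the event shapes are illustrative assumptions.
EVENTS = [
    {"type": "item_added", "cart": "c1", "item": "book", "qty": 2},
    {"type": "item_added", "cart": "c1", "item": "pen", "qty": 1},
    {"type": "item_removed", "cart": "c1", "item": "book", "qty": 1},
    {"type": "item_added", "cart": "c2", "item": "lamp", "qty": 1},
]


def apply_event(view, event):
    """Apply one event to the view, updating the item quantity for its cart."""
    cart = view[event["cart"]]
    delta = event["qty"] if event["type"] == "item_added" else -event["qty"]
    new_qty = cart.get(event["item"], 0) + delta
    if new_qty > 0:
        cart[event["item"]] = new_qty
    else:
        # Drop items whose quantity has fallen to zero or below.
        cart.pop(event["item"], None)
    return view


def materialize(events):
    """Rebuild the whole view by replaying every event in order."""
    view = defaultdict(dict)
    for event in events:
        apply_event(view, event)
    return dict(view)


# The view is derived data: it can be discarded and rebuilt from the log at any time.
CART_VIEW = materialize(EVENTS)
```

Because the view is computed only from the log, replaying the same events always yields the same result, which is what makes caches and indexes built this way straightforward to rebuild.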

Big Data in Practice

The best-selling author of Big Data is back, this time with a unique and in-depth insight into how specific companies use big data. Big data is on the tip of everyone's tongue. Everyone understands its power and importance, but many fail to grasp the actionable steps and resources required to utilise it effectively. This book fills the knowledge gap by showing how major companies are using big data every day, from an up-close, on-the-ground perspective. From technology, media and retail, to sports teams, government agencies and financial institutions, learn the actual strategies and processes being used to learn about customers, improve manufacturing, spur innovation, improve safety and so much more. Organised for easy dip-in navigation, each chapter follows the same structure to give you the information you need quickly. For each company profiled, learn what data was used, what problem it solved and the processes put in place to make it practical, as well as the technical details, challenges and lessons learned from each unique scenario.

- Learn how predictive analytics helps Amazon, Target, John Deere and Apple understand their customers
- Discover how big data is behind the success of Walmart, LinkedIn, Microsoft and more
- Learn how big data is changing medicine, law enforcement, hospitality, fashion, science and banking
- Develop your own big data strategy by accessing additional reading materials at the end of each chapter

Apache Hive Cookbook

Apache Hive Cookbook is a comprehensive resource for mastering Apache Hive, a tool that bridges the gap between SQL and Big Data processing. Through guided recipes, you'll acquire essential skills in Hive query development, optimization, and integration with modern big data frameworks.

What this Book will help me do
- Design efficient Hive query structures for big data analytics.
- Optimize data storage and query execution using partitions and buckets.
- Integrate Hive seamlessly with frameworks like Spark and Hadoop.
- Understand and utilize the HiveQL syntax to perform advanced analytical processing.
- Implement practical solutions to secure, maintain, and scale Hive environments.

Author(s)
Hanish Bansal, Saurabh Chauhan, and Shrey Mehrotra bring their extensive expertise in big data technologies and Hive to this cookbook. With years of practical experience and deep technical knowledge, they offer a collection of solutions and best practices that reflect real-world use cases. Their commitment to clarity and depth makes this book an invaluable resource for exploring Hive to its fullest potential.

Who is it for?
This book is perfect for data professionals, engineers, and developers looking to enhance their capabilities in big data analytics using Hive. It caters to those with a foundational understanding of big data frameworks and some familiarity with SQL. Whether you're planning to optimize data handling or integrate Hive with other data tools, this guide helps you achieve your goals. Step into the world of efficient data analytics with Apache Hive through structured learning paths.

Dynamic SQL: Applications, Performance, and Security

This book is an introduction and deep-dive into the many uses of dynamic SQL in Microsoft SQL Server. Dynamic SQL is key to large-scale searching based upon user-entered criteria. It's also useful in generating value-lists, in dynamic pivoting of data for business intelligence reporting, and for customizing database objects and querying their structure. Executing dynamic SQL is at the heart of applications such as business intelligence dashboards that need to be fluid and respond instantly to changing user needs as those users explore their data and view the results. Yet dynamic SQL is feared by many due to concerns over SQL injection attacks. Reading Dynamic SQL: Applications, Performance, and Security is your opportunity to learn and master an often misunderstood feature, including security and SQL injection. All aspects of security relevant to dynamic SQL are discussed in this book. You will learn many ways to save time and develop code more efficiently, and you will practice directly with security scenarios that threaten companies around the world every day. Dynamic SQL: Applications, Performance, and Security helps you bring the productivity and user-satisfaction of flexible and responsive applications to your organization safely and securely. Your organization's increased ability to respond to rapidly changing business scenarios will build competitive advantage in an increasingly crowded and competitive global marketplace.

- Discusses many applications of dynamic SQL, both simple and complex.
- Explains each example with demos that can be run at home and on your laptop.
- Helps you to identify when dynamic SQL can offer superior performance.
- Pays attention to security and best practices to ensure safety of your data.

What You Will Learn
- Build flexible applications that respond fast to changing business needs.
- Take advantage of unconventional but productive uses of dynamic SQL.
- Protect your data from attack through best-practices in your implementations.
- Know about SQL Injection and be confident in your defenses against it.
- Run at high performance by optimizing dynamic SQL in your applications.
- Troubleshoot and debug dynamic SQL to ensure correct results.

Who This Book is For
Dynamic SQL: Applications, Performance, and Security is for developers and database administrators looking to hone and build their T-SQL coding skills. The book is ideal for advanced users wanting to plumb the depths of application flexibility and troubleshoot performance issues involving dynamic SQL. The book is also ideal for beginners wanting to learn what dynamic SQL is about and how it can help them deliver competitive advantage to their organizations.
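To make the "searching based upon user-entered criteria" and SQL injection points above concrete, here is a minimal sketch of a dynamically built search query. It is not from the book: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `customers` table, the `ALLOWED_COLUMNS` whitelist, and the `build_search` and `search` helpers are hypothetical. The idea it shows is that column names come only from a fixed whitelist while user-supplied values are always bound as parameters; in T-SQL the comparable approach is passing parameters to `sp_executesql` instead of concatenating values into the statement.

```python
import sqlite3

# Hypothetical whitelist: only these column names may appear in generated SQL.
ALLOWED_COLUMNS = {"name", "city"}


def build_search(criteria):
    """Build a SELECT whose WHERE clause depends on which criteria were supplied.

    Column names are checked against the whitelist; values are never
    concatenated into the SQL text, only passed as bound parameters.
    """
    clauses, params = [], []
    for column, value in criteria.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported search column: {column!r}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id, name, city FROM customers"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


def search(conn, criteria):
    """Run the generated query and return all matching rows."""
    sql, params = build_search(criteria)
    return conn.execute(sql, params).fetchall()


# Small in-memory database with made-up rows for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York"), ("Linus", "Helsinki")],
)
```

With this shape, an input such as `x' OR '1'='1` is compared as a literal string rather than being interpreted as SQL, so it matches no rows.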

Big Data

Big Data: Storage, Sharing, and Security examines Big Data management from an R&D perspective. It covers the 3S designs (storage, sharing, and security) through detailed descriptions of Big Data concepts and implementations. Presenting the contributions of recognized Big Data experts from around the world, the book contains more than 450 pages of technical details on the most important implementation aspects regarding Big Data.

IBM Power Systems HMC Implementation and Usage Guide

The IBM® Hardware Management Console (HMC) provides systems administrators with a tool for planning, deploying, and managing IBM Power Systems™ servers. This IBM Redbooks® publication is an extension of IBM Power Systems HMC Implementation and Usage Guide, SG24-7491 and also merges updated information from IBM Power Systems Hardware Management Console: Version 8 Release 8.1.0 Enhancements, SG24-8232. It explains the new features of IBM Power Systems Hardware Management Console Version V8.8.1.0 through V8.8.4.0. The major functions that the HMC provides are Power Systems server hardware management and virtualization (partition) management.

Further information about virtualization management is in the following publications:
- IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
- IBM PowerVM Virtualization Introduction and Configuration, SG24-7940
- IBM PowerVM Enhancements What is New in 2013, SG24-8198
- IBM Power Systems SR-IOV: Technical Overview and Introduction, REDP-5065

The following features of HMC V8.8.1.0 through HMC V8.8.4.0 are described in this book:
- HMC V8.8.1.0 enhancements
- HMC V8.8.4.0 enhancements
- System and Partition Templates
- HMC and IBM PowerVM® Simplification Enhancement
- Manage Partition Enhancement
- Performance and Capacity Monitoring
- HMC V8.8.4.0 upgrade changes

External Procedures, Triggers, and User-Defined Functions on IBM DB2 for i

Procedures, triggers, and user-defined functions (UDFs) are the key database software features for developing robust and distributed applications. IBM Universal Database™ for i (IBM DB2® for i) supported these features for many years, and they were enhanced in V5R1, V5R2, and V5R3 of IBM® OS/400® and V5R4 of IBM i5/OS™. This IBM Redbooks® publication includes several of the announced features for procedures, triggers, and UDFs in V5R1, V5R2, V5R3, and V5R4. This book includes suggestions, guidelines, and practical examples to help you effectively develop IBM DB2 for i procedures, triggers, and UDFs.

The following topics are covered in this book:
- External stored procedures and triggers
- Java procedures (both Java Database Connectivity (JDBC) and Structured Query Language for Java (SQLJ))
- External triggers
- External UDFs

This publication also offers examples that were developed in several programming languages, including RPG, COBOL, C, Java, and Visual Basic, by using native and SQL data access interfaces. This book is part of the original IBM Redbooks publication, Stored Procedures, Triggers, and User-Defined Functions on DB2 Universal Database for iSeries, SG24-6503-02, that covered external procedures, triggers, and functions, and also SQL procedures, triggers, and functions. All of the information that relates to external routines was left in this publication. All of the information that relates to SQL routines was rewritten and updated. This information is in the new IBM Redbooks publication, SQL Procedures, Triggers, and Functions on IBM DB2 for i, SG24-8326. This book is intended for anyone who wants to develop IBM DB2 for i procedures, triggers, and UDFs. Before you read this book, you need to know about relational database technology and the application development environment on the IBM i server.

SQL Procedures, Triggers, and Functions on IBM DB2 for i

Structured Query Language (SQL) procedures, triggers, and functions, which are also known as user-defined functions (UDFs), are the key database features for developing robust and distributed applications. IBM® DB2® for i supported these features for many years, and they are enhanced in IBM i versions 6.1, 7.1, and 7.2. DB2 for i refers to the IBM DB2 family member and relational database management system that is integrated within the IBM Power operating system that is known as IBM i. This IBM Redbooks® publication includes several of the announced features for SQL procedures, triggers, and functions in IBM i versions 6.1, 7.1, and 7.2. This book includes suggestions, guidelines, and practical examples to develop DB2 for i SQL procedures, triggers, and functions effectively.

This book covers the following topics:
- Introduction to the SQL/Persistent Stored Modules (PSM) language, which is used in SQL procedures, triggers, and functions
- SQL procedures
- SQL triggers
- SQL functions

This book is for IBM i database engineers and data-centric developers who strive to provide flexible, extensible, agile, and scalable database solutions that meet business requirements in a timely manner. Before you read this book, you need to know about relational database technology and the application development environment on the IBM Power Systems™ with the IBM i operating system.

Practical Maintenance Plans in SQL Server: Automation for the DBA

This book is a complete guide to setting up and maintaining maintenance plans for SQL Server Database Administrators. Maintenance plans too often consist of a backup task and that's it, but there is so much more that can and must be done to ensure the integrity of your most important company resource: the data you are tasked to manage and safeguard. This book walks even the newest of users through creating a powerful, automated maintenance plan. Automate your job using SQL Server Agent to leverage the power of Maintenance Plans to deliver real, proactive solutions to common issues. Schedule common tasks such as backups and index rebuilds to run automatically, and get early-warning notifications of impending problems relating to resource usage and query performance. By the time your boss knows to call you about a problem, you'll have already called him to describe your solution. The large majority of books never really cover the topic of inheriting a database server with multiple live databases; the common thread is that the databases will be created and maintained by the reader forever and ever. In the real world, that scenario rarely happens. Practical Maintenance Plans in SQL Server covers that scenario and provides you with the knowledge and tools needed to get comfortable writing your own maintenance plans for any SQL Server database, whether created by you or inherited.

- Shows the different tasks that can be run in a maintenance plan.
- Explains how and why those tasks can be implemented.
- Provides a roadmap to creating your own custom maintenance plan.

What You Will Learn
- Implement a completely automated backup maintenance plan
- Be alerted to performance problems and outages ahead of your boss
- Learn the different types of database maintenance tasks
- Plan the workflow of tasks within a maintenance plan
- Automate your work by implementing custom maintenance plans

Who This Book Is For
Practical Maintenance Plans in SQL Server is for any level of database administrator, but specifically it's for those administrators with a real need to set up a powerful maintenance plan quickly. New and seasoned administrators will appreciate the book for its robust learning pattern of visual aids in combination with explanations and scenarios. Practical Maintenance Plans in SQL Server is the perfect "new hire" gift for new database administrators in any organization.

Data Structure and Software Engineering

This title includes a number of Open Access chapters. Data structure and software engineering is an integral part of computer science. This volume presents new approaches and methods to knowledge sharing, brain mapping, data integration, and data storage. The author describes how to manage an organization’s business process and domain data and presents new software and hardware testing methods. The book introduces a game development framework used as a learning aid in software engineering at the university level. It also features a review of social software engineering metrics and methods for processing business information. It explains how to use Pegasys to create and manage sequence analysis workflows.

Measurement Data Modeling and Parameter Estimation

This book discusses the theories, methods, and application techniques of mathematical modeling and parameter estimation for measurement data. It seeks to build a bridge between mathematical theory and engineering practice in the measurement data processing field so theoretical researchers and technical engineers can communicate. It is organized with abundant materials, such as illustrations, tables, examples, and exercises. The authors create examples to apply mathematical theory innovatively to measurement and control engineering. Not only does this reference provide theoretical knowledge, it also provides information drawn from first-hand experience.

Remanufacturing Modeling and Analysis

Providing a solid foundation of knowledge in modeling remanufacturing systems, this book addresses the design, planning, and processing issues faced by decision-makers in the field. With easy-to-use mathematical or simulation modeling to demonstrate solutions for each remanufacturing issue, it helps practitioners understand how a particular issue can be effectively modeled and how to choose the appropriate solution methodology. The book also discusses how increasingly stringent environmental regulations and decreasing natural resources influence manufacturers toward more environmentally conscious manufacturing and product recovery initiatives.

Oracle Database Problem Solving and Troubleshooting Handbook

An Expert Guide for Solving Complex Oracle Database Problems

Oracle Database Problem Solving and Troubleshooting Handbook delivers comprehensive, practical, and up-to-date advice for running the Oracle Database reliably and efficiently in complex production environments. Seven leading Oracle experts have brought together an unmatched collection of proven solutions, hands-on examples, and step-by-step tips for Oracle Database 12c, 11g, and other recent versions of Oracle Database. Every solution is crafted to help experienced Oracle DBAs and DMAs understand and fix serious problems as rapidly as possible. The authors cover LOB segments, UNDO tablespaces, high GC buffer wait events, poor query response times, latch contention, indexing, XA distributed transactions, RMAN backup/recovery, and much more. They also offer in-depth coverage of a wide range of topics, including DDL optimization, VLDB tuning, database forensics, adaptive cursor sharing, data pumps, data migration, SSDs, indexes, and how to go about fixing Oracle RAC problems.

Learn how to
- Choose the quickest path to solve high-impact problems
- Use modern best practices to make your day more efficient and predictable
- Construct your “Call 9-1-1 plan” for future database emergencies
- Proactively perform maintenance to improve your environment’s stability
- Save time with industry-standard tools and scripts

Register your product at informit.com/register for convenient access to downloads, updates, and corrections as they become available.

Architecting Data Lakes

Many organizations use Hadoop-driven data lakes as an adjunct staging area for their enterprise data warehouses (EDW). But for those companies ready to take the plunge, a data lake is far more useful as a one-stop-shop for extracting insights from their vast collection of data. With this eBook, you’ll learn best practices for building, maintaining, and deriving value from a Hadoop data lake in production environments. Authors Alice LaPlante and Ben Sharma explain how a data lake will enable your organization to manage an increasing volume of datasets (from blog postings and product reviews to streaming data) and to discover important relationships between them. Whether you want to control administrative costs in healthcare or reduce risk in financial services, this eBook addresses the architectural considerations and required capabilities you need to build your own data lake.

With this report, you’ll learn:
- The key attributes of a data lake, including its ability to store information in native formats for later processing
- Why implementing data management and governance in your data lake is crucial
- How to address various challenges for building and managing a data lake
- Self-service options that enable different users to access the data lake without help from IT
- Emerging trends that will shape the future of data lakes

Mapping Workflows and Managing Knowledge

This book is Volume II of simple but powerful tools for performance improvement. It is written for managers, analysts, and consultants who realize the value that system dynamic modeling can bring to companies and organizations, and would like to have that capability without a degree in math or computer science. It features the iThink modeling program, which requires no extensive knowledge of math; instead, iThink uses a small set of symbols and rules to allow any keen observer of a system to create models graphically—the user literally draws a graphic of the system within the program and works from that. In Chapter 1, the author describes his own experiences with modeling, the growth and development of modeling software, and makes the case for its value. Chapter 2 is an overview of iThink symbols and rules, sufficient to enable the reader to interpret and understand iThink models; while the program has many advanced features, a great many models are based on the fundamentals in this chapter. Chapter 3 provides guidelines for converting workflow-mapping models into iThink dynamic models, and discusses approaches to building models from scratch. This approach to modeling is consistent with the author’s approach to workflow mapping and analysis, which uses a small symbol set and related discipline to map workflows in any company or organization, without the need for expensive software or extended training. That process is described in this volume of the series, and these maps are often the foundation for modeling the system as a dynamic entity.

Relational Database Design and Implementation, 4th Edition

Relational Database Design and Implementation: Clearly Explained, Fourth Edition, provides the conceptual and practical information necessary to develop a database design and management scheme that ensures data accuracy and user satisfaction while optimizing performance. Database systems underlie the large majority of business information systems. Most of those in use today are based on the relational data model, a way of representing data and data relationships using only two-dimensional tables. This book covers relational database theory as well as providing a solid introduction to SQL, the international standard for the relational database data manipulation language. The book begins by reviewing basic concepts of databases and database design, then turns to creating, populating, and retrieving data using SQL. Topics such as the relational data model, normalization, data entities, and Codd's Rules (and why they are important) are covered clearly and concisely. In addition, the book looks at the impact of big data on relational databases and the option of using NoSQL databases for that purpose.

- Features updated and expanded coverage of SQL and new material on big data, cloud computing, and object-relational databases
- Presents design approaches that ensure data accuracy and consistency and help boost performance
- Includes three case studies, each illustrating a different database design challenge
- Reviews the basic concepts of databases and database design, then turns to creating, populating, and retrieving data using SQL

The Hadoop Performance Myth

The wish lists of many data-driven organizations seem reasonable enough. They’d like to capitalize on real-time data analysis, move beyond batch processing for time-critical insights, allow multiple users to share cluster resources, and provide predictable service levels. However, fundamental performance limitations of complex distributed systems such as Hadoop prevent much of this from happening. In this report, Courtney Webster examines the root cause of these performance problems and explains why best practices for mitigating them—cluster tuning, provisioning, and even cluster isolation for mission critical jobs—don’t provide viable, scalable, or long-term solutions. Organizations have been pushing Hadoop and other distributed systems to their performance breaking points as they seek to use clusters as shared resources across multiple business units and individual users. Once they hit this performance wall, companies will find it difficult to deliver on the big data promise at scale. Read this report to find out what the implications are for your organization.