talk-data.com

Topic: data (3406 tagged)

Activity Trend: 3 peak/qtr (2020-Q1 to 2026-Q1)

Filtering by: O'Reilly Data Engineering Books

Getting Data Right

Over the last 20 years, companies have invested roughly $3-4 trillion in enterprise software. These investments have been primarily focused on the development and deployment of single systems, applications, functions, and geographies targeted at the automation and optimization of key business processes. Companies are now investing heavily in big data analytics ($44 billion alone in 2014) in an effort to begin analyzing all of the data being generated from their process automation systems. But companies are quickly realizing that one of their key bottlenecks is Data Variety: the siloed nature of the data that is a natural result of internal and external source proliferation. The problem of big data variety has crept up from the bottom, and the cost of variety is only appreciated when companies attempt to ask simple questions across many business silos (divisions, geographies, functions, etc.). Current top-down, deterministic data unification approaches (such as ETL, ELT, and MDM) were simply not designed to scale to the variety of hundreds or thousands or even tens of thousands of data silos. Download this free eBook to learn about the fundamental challenges that Data Variety poses to enterprises looking to maximize the value of their existing investments, and how new approaches promise to help organizations embrace and leverage the fundamental diversity of data. Readers will also find best practices for designing bottom-up and probabilistic methods for finding and managing data; principles for doing data science at scale in the big data era; preparing and unifying data in ways that complement existing systems; optimizing data warehousing; and how to use “data ops” to automate large-scale integration.

Implementing an IBM High-Performance Computing Solution on IBM POWER8

This IBM® Redbooks® publication documents and addresses topics to provide step-by-step programming concepts to tune the applications to use IBM POWER8® hardware architecture with the technical computing software stack. This publication explores, tests, and documents how to implement an IBM high-performance computing (HPC) solution on POWER8 by using IBM technical innovations to help solve challenging scientific, technical, and business problems. This book demonstrates and documents that the combination of IBM HPC hardware and software solutions delivers significant value to technical computing clients in need of cost-effective, highly scalable, and robust solutions. This book targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights among clients' data so that they can act to optimize business results, product development, and scientific discoveries.

Managing the Data Lake

Organizations across many industries have recently created fast-growing repositories to deal with an influx of new data from many sources and often in multiple formats. To manage these data lakes, companies have begun to leave the familiar confines of relational databases and data warehouses for Hadoop and various big data solutions. But adopting new technology alone won’t solve the problem. Based on interviews with several experts in data management, author Andy Oram provides an in-depth look at common issues you’re likely to encounter as you consider how to manage business data. You’ll explore five key topic areas, including: Acquisition and ingestion: how to solve these problems with a degree of automation. Metadata: how to keep track of when data came in and how it was formatted, and how to make it available at later stages of processing. Data preparation and cleaning: what you need to know before you prepare and clean your data, and what needs to be cleaned up and how. Organizing workflows: what you should do to combine your tasks—ingestion, cataloging, and data preparation—into an end-to-end workflow. Access control: how to address security and access controls at all stages of data handling. Andy Oram, an editor at O’Reilly Media since 1992, currently specializes in programming. His work for O'Reilly includes the first books on Linux ever published commercially in the United States.

Mapping Big Data

To discover the shape and structure of the big data market, the San Francisco-based startup Relato took a unique approach to market research and created the first fully data-driven market report. Company CEO Russell Jurney and his team collected and analyzed raw data from a variety of sources to reveal a boatload of business insights about the big data space. This exceptional report is now available for free download. Using data analytic techniques such as social network analysis (SNA), Relato exposed the vast and complex partnership network that exists among tens of thousands of unique big data vendors. The dataset Relato collected is centered around Cloudera, Hortonworks, and MapR, the major platform vendors of Hadoop, the primary force behind this market. From this snowball sample, a 2-hop network, the Relato team was able to answer several questions, including: Who are the major players in the big data market? Which is the leading Hadoop vendor? What sectors are included in this market and how do they relate? Which among the thousands of partnerships are most important? Who’s doing business with whom? Metrics used in this report are also visible in Relato’s interactive web application, via a link in the report, which walks you through the insights step-by-step.

Sharing Big Data Safely

Many big data-driven companies today are moving to protect certain types of data against intrusion, leaks, or unauthorized eyes. But how do you lock down data while granting access to people who need to see it? In this practical book, authors Ted Dunning and Ellen Friedman offer two novel and practical solutions that you can implement right away.

Apache Spark Graph Processing

Dive into the world of large-scale graph data processing with Apache Spark's GraphX API. This book introduces you to the core concepts of graph analytics and teaches you how to leverage Spark for handling and analyzing massive graphs. From building to analyzing, you'll acquire a comprehensive skillset to work with graph data efficiently.

What this Book will help me do: Learn to utilize Apache Spark GraphX API to process and analyze graph data. Master transforming raw datasets into sophisticated graph structures. Explore visualization and analysis techniques for understanding graphs. Understand and build custom graph operations tailored to your needs. Implement advanced graph algorithms like clustering and iterative processing.

Author(s): Rindra Ramamonjison is a seasoned data engineer with vast experience in big data technologies and graph processing. With a passion for explaining complex concepts in simple terms, Rindra builds on his professional expertise to guide readers in mastering cutting-edge Spark tools.

Who is it for? This book is tailored for data scientists and software developers looking to delve into graph data processing at scale. Ideal for those with basic knowledge of Scala and Apache Spark, it equips readers with the tools and techniques to derive insights from complex network datasets. Whether you're diving deeper into big data or exploring graph-specific analytics, this book is your guide.

Oracle PL/SQL Language Pocket Reference, 5th Edition

Be more productive with the Oracle PL/SQL language. The fifth edition of this popular pocket reference puts the syntax of specific PL/SQL language elements right at your fingertips, including features added in Oracle Database 12c. Whether you’re a developer or database administrator, when you need answers quickly, the Oracle PL/SQL Language Pocket Reference will save you hours of frustration with concise summaries of: fundamental language elements, such as block structure, datatypes, and declarations; statements for program control, cursor management, and exception handling; records, procedures, functions, triggers, and packages; execution of PL/SQL functions in SQL; and compilation options, object-oriented features, collections, and Java integration. This handy pocket reference is a perfect companion to Steven Feuerstein and Bill Pribyl’s bestselling Oracle PL/SQL Programming.

Redis Essentials

Redis Essentials is your go-to guide for understanding and mastering Redis, the leading in-memory data structure store. In this book, you will explore the powerful features offered by Redis, such as real-time data processing, highly scalable architectures, and practical implementations for web applications. You'll complete the journey equipped to handle and optimize Redis for your development projects.

What this Book will help me do: Design analytics applications with advanced data structures like Bitmaps and HyperLogLogs. Scale your application infrastructure using Redis Sentinel, Twemproxy, and Redis Cluster. Develop custom Redis commands and extend its functionality with the Lua scripting language. Implement robust security measures for Redis, including SSL encryption and firewall rules. Master the usage of Redis client libraries in PHP, Python, Node.js, and Ruby for seamless development.

Author(s): Maxwell Dayvson da Silva is an experienced software engineer and author with expertise in designing high-performance systems. With a strong focus on practical knowledge and hands-on solutions, Maxwell brings over a decade of experience using Redis to this book. His approachable teaching style ensures learners grasp complex topics easily while emphasizing their practical application to real-world challenges.

Who is it for? Redis Essentials is aimed at developers looking to enhance their system's performance and scalability using Redis. Whether you're moderately familiar with key-value stores or new to Redis, this book will provide the explanations and hands-on examples you need. Recommended for developers with experience in data architectures, the book bridges the gap between understanding Redis features and their real-world application. Start here to bring high-performance in-memory data solutions to your projects.

Pro Oracle Fusion Applications: Installation and Administration

Pro Oracle Fusion Applications is your one-stop source for help with installing Oracle's Fusion Applications suite in your on-premise environment. It also aids in the monitoring and ongoing administration of your Fusion environment. Author Tushar Thakker is widely known for his writings and expertise on Oracle Fusion Applications, and now he brings his accumulated wisdom to you in the form of this convenient handbook. Provisioning an Oracle Fusion Applications infrastructure is a daunting task. You'll have to plan a suitable topology and install the required database, an enterprise-wide identity management solution, and the applications themselves—all while working with a wide variety of people who may not always be accustomed to working together. Pro Oracle Fusion Applications provides a path to success that you won't want to be without. Beyond installation, Pro Oracle Fusion Applications provides excellent guidance on managing, monitoring, diagnostics, and troubleshooting your environment. The book also covers patching, a mundane but essential task that must be done regularly to keep your installation protected and running smoothly. The comprehensive and wide-ranging coverage makes Pro Oracle Fusion Applications an important book for anyone with responsibility for installation and ongoing management of an Oracle Fusion Applications installation.

Pro SQL Server Wait Statistics

Pro SQL Server Wait Statistics is a practical guide for analyzing and troubleshooting SQL Server performance using wait statistics. Whether you are new to wait statistics or already familiar with them, this book will help you gain a deeper understanding of how wait statistics are generated and what they can mean for your SQL Server’s performance. Besides the most common wait types, Pro SQL Server Wait Statistics goes further into the more complex and performance-threatening wait types. The wait types are categorized by their area of impact, including CPU, IO, Lock, and many more categories. Filled with clear examples, Pro SQL Server Wait Statistics helps you gain practical knowledge of why and how specific wait times increase or decrease, and how they impact your SQL Server’s performance.

Oracle SOA Suite 12c Handbook

Master Oracle SOA Suite 12c. Design, implement, manage, and maintain a highly flexible service-oriented computing infrastructure across your enterprise using the detailed information in this Oracle Press guide. Written by an Oracle ACE director, Oracle SOA Suite 12c Handbook uses a start-to-finish case study to illustrate each concept and technique. Learn expert techniques for designing and implementing components, assembling composite applications, integrating Java, handling complex business logic, and maximizing code reuse. Runtime administration, governance, and security are covered in this practical resource. Get started with the Oracle SOA Suite 12c development and runtime environment. Deploy and manage SOA composite applications. Expose SOAP/XML and REST/JSON through Oracle Service Bus. Establish interactions through adapters for Database, JMS, File/FTP, UMS, LDAP, and Coherence. Embed custom logic using Java and the Spring component. Perform fast data analysis in real time with Oracle Event Processor. Implement Event-Driven Architecture based on the Event Delivery Network (EDN). Use Oracle Business Rules to encapsulate logic and automate decisions. Model complex processes using BPEL, BPMN, and human task components. Establish KPIs and evaluate performance using Oracle Business Activity Monitoring. Control traffic, audit system activity, and encrypt sensitive data.

The Architecture of Privacy

Technology’s influence on privacy not only concerns consumers, political leaders, and advocacy groups, but also the software architects who design new products. In this practical guide, experts in data analytics, software engineering, security, and privacy policy describe how software teams can make privacy-protective features a core part of product functionality, rather than add them late in the development process. Ideal for software engineers new to privacy, this book helps you examine privacy-protective information management architectures and their foundational components—building blocks that you can combine in many ways. Policymakers, academics, students, and advocates unfamiliar with the technical terrain will learn how these tools can help drive policies to maximize privacy protection.

Learning RSLogix 5000 Programming

Dive into "Learning RSLogix 5000 Programming" and gain comprehensive insights into the RSLogix 5000 and Studio 5000 environments for Rockwell Automation controllers. By the end of this book, you'll master the essentials of programming ControlLogix, CompactLogix, SoftLogix, and designing advanced function block diagrams and sequential routines.

What this Book will help me do: Learn to program Rockwell Automation controllers using RSLogix 5000 and Studio 5000. Understand the features and functionalities of ControlLogix and CompactLogix platforms. Explore advanced programming techniques such as ladder logic, function block diagrams, and structured text. Familiarize yourself with the latest changes introduced in Studio 5000 and Logix Designer. Gain confidence in troubleshooting, industrial network communication, and automation system application design.

Author(s): Austin Scott, a seasoned automation expert, has vast experience working with industrial control systems and Rockwell Automation technologies. His teaching methods focus on practical application and easy comprehension, making technical concepts accessible to beginners and professionals alike.

Who is it for? Ideal for PLC programmers, electricians, and automation specialists aiming to enhance their skills with RSLogix 5000. Beginners with basic PLC knowledge will find the step-by-step approach convenient for mastering advanced tools. Aspiring professionals can use this resource to build foundational and advanced programming expertise.

Learning YARN

"Learning YARN" is your comprehensive guide to mastering YARN, the resource management layer in the Hadoop ecosystem. Through the book, you'll leverage YARN's capabilities for big data processing, learning to deploy, manage, and scale Hadoop-YARN clusters.

What this Book will help me do: Understand the main features and benefits of the YARN framework. Gain experience managing Hadoop clusters of varying sizes. Learn to integrate YARN with domain-specific big data tools like Spark. Become skilled at administration and configuration of YARN. Develop and run your own YARN-based applications for distributed computing.

Author(s): Akhil Arora and Shrey Mehrotra bring with them years of experience working in big data frameworks and technologies. With expertise in YARN specifically, they aim to bridge the gap for developers and administrators to learn and implement scalable big data solutions. Their extensive knowledge in cluster management and distributed data processing shines through in how this book is structured and detailed.

Who is it for? This book is ideal for software developers, big data engineers, and system administrators interested in advancing their knowledge in resource management in Hadoop systems. If you have basic familiarity with Hadoop and need a deeper understanding or feature knowledge of YARN for professional growth, this book is tailored for you. It is also suitable for learners seeking to integrate big data platforms like Spark into YARN clusters.

OCA/OCP Oracle Database 12c All-in-One Exam Guide (Exams 1Z0-061, 1Z0-062, & 1Z0-063), 2nd Edition

This Oracle Press certification exam guide prepares you for the new Oracle Database 12c certification track, including the core requirements for OCA and OCP certification. OCA/OCP Oracle Database 12c All-in-One Exam Guide (Exams 1Z0-061, 1Z0-062, & 1Z0-063) covers all of the exam objectives on the Installation and Administration, SQL Fundamentals, and Advanced Administration exams in detail. Each chapter includes examples, practice questions, Inside the Exam sections highlighting key exam topics, a chapter summary, and a two-minute drill to reinforce essential knowledge. 300+ practice exam questions match the format, topics, and difficulty of the real exam. Electronic content includes interactive practice exam software with hundreds of questions that include detailed answers and explanations, and a score report performance assessment tool. Ideal as both exam guide and on-the-job reference. The most comprehensive single preparation tool for the Oracle Database 12c OCA and OCP certification exams.

Performance Optimization and Tuning Techniques for IBM Power Systems Processors Including IBM POWER8

This IBM® Redbooks® publication focuses on gathering the correct technical information, and laying out simple guidance for optimizing code performance on IBM POWER8® processor-based systems that run the IBM AIX®, IBM i, or Linux operating systems. There is straightforward performance optimization that can be performed with a minimum of effort and without extensive previous experience or in-depth knowledge. The POWER8 processor contains many new and important performance features, such as support for eight hardware threads in each core and support for transactional memory. The POWER8 processor is a strict superset of the IBM POWER7+™ processor, and so all of the performance features of the POWER7+ processor, such as multiple page sizes, also appear in the POWER8 processor. Much of the technical information and guidance for optimizing performance on POWER8 processors that is presented in this guide also applies to POWER7+ and earlier processors, except where the guide explicitly indicates that a feature is new in the POWER8 processor. This guide strives to focus on optimizations that tend to be positive across a broad set of IBM POWER® processor chips and systems. Specific guidance is given for the POWER8 processor; however, the general guidance is applicable to the IBM POWER7+, IBM POWER7®, IBM POWER6®, IBM POWER5, and even to earlier processors. This guide is directed at personnel who are responsible for performing migration and implementation activities on POWER8 processor-based systems. This includes system administrators, system architects, network administrators, information architects, and database administrators (DBAs).

Expert Oracle Exadata, Second Edition

Expert Oracle Exadata, 2nd Edition opens up the internals of Oracle's Exadata platform so that you can fully benefit from the most performant and scalable database hardware appliance capable of running Oracle Database. This edition is fully updated to cover Exadata 5-2 and Oracle Database 12c. If you're new to Exadata, you'll soon learn that it embodies a change in how you think about and manage relational databases. A key part of that change lies in the concept of offloading SQL processing to the storage layer. In addition, there is Oracle's engineering effort in creating a powerful platform for both consolidation and transaction processing. The resulting value proposition in the form of Exadata has truly been a game-changer. Expert Oracle Exadata, 2nd Edition provides a look at the internals and at how the combination of hardware and software that comprises Exadata actually works. Authors include Martin Bach, Andy Colvin, and Frits Hoogland, with contributions from Karl Arao, and the book builds on the foundation laid by Kerry Osborne, Randy Johnson, and Tanel Poder in the first edition. They share their real-world experience gained through a great many Exadata implementations, possibly more than any other group of experts today. Their goal is always to help you advance your career through success with Exadata in your own environment. This book is intended for readers who want to understand what makes the platform tick and for whom how it does what it does is as important as what it does. By being exposed to the features that are unique to Exadata, you will gain an understanding of the mechanics that will allow you to fully benefit from the advantages that the platform provides. This book changes how you think about managing SQL performance and processing. It provides a roadmap to successful Exadata implementation. And it removes the "black box" mystique, showing how Exadata actually works. You'll be better able to manage your Exadata engineered systems in support of your business.

Structured Search for Big Data

The WWW era made billions of people dramatically dependent on the progress of data technologies, out of which Internet search and Big Data are arguably the most notable. The Structured Search paradigm connects them via a fundamental concept of key-objects evolving out of keywords as the units of search. The key-object data model and KeySQL revamp the data independence principle, making it applicable for Big Data, and complement NoSQL with full-blown structured querying functionality. The ultimate goal is extracting Big Information from the Big Data. As a Big Data Consultant, Mikhail Gilula combines academic background with 20 years of industry experience in the database and data warehousing technologies, working as a Sr. Data Architect for Teradata, Alcatel-Lucent, and PayPal, among others. He has authored three books, including The Set Model for Database and Information Systems, and holds four US Patents in Structured Search and Data Integration. Conceptualizes structured search as a technology for querying multiple data sources in an independent and scalable manner. Explains how NoSQL and KeySQL complement each other and serve different needs with respect to big data. Shows the place of structured search in the internet evolution and describes its implementations, including the real-time structured internet search.

Modernize Your IBM DB2 for IBM z/OS Maintenance with Utility Autonomics

IBM® DB2® for IBM z/OS® helps lower the cost of managing data by automating administration, increasing storage efficiency, improving performance, and simplifying the deployment of virtual appliances. By automating tasks such as memory allocation, storage management, and business policy maintenance, DB2 is able to perform many management tasks itself, freeing up Database Administrators to focus on new projects. This IBM Redbooks® publication introduces autonomics for DB2 for z/OS. IBM provides several different components that, when combined, can create an autonomic database environment. All these respective components cover certain aspects of autonomics, which can collaborate into one coherent solution. In our evolution of autonomics and the need to move to smarter systems there has been a bigger drive to the concept of "Active" versus "Passive" autonomics. With the inclusion of the IBM Management Console for IMS™ and DB2 for z/OS and the Autonomics Director, it is now easier than ever to make that transition by leveraging the strength of the DB2 Utilities Solution Pack for z/OS all in one standardized and centralized interface. This publication guides you through the business reasons for adopting autonomic solutions, and provides step-by-step guidance to implement these capabilities in your DB2 for z/OS configuration. This publication is of interest primarily to DB2 Database Administrators and DB2 Systems Programmers, and for anyone looking to understand the benefits of DB2 autonomic solutions.

You: For Sale

Everything we do online, and increasingly in the real world, is tracked, logged, analyzed, and often packaged and sold on to the highest bidder. Every time you visit a website, use a credit card, drive on the freeway, or go past a CCTV camera, you are logged and tracked. Every day billions of people choose to share their details on social media, which are then sold to advertisers. The Edward Snowden revelations that governments, including those of the US and UK, have been snooping on their citizens have rocked the world. But nobody seems to realize that this has already been happening for years, with firms such as Google capturing everything you type into a browser and selling it to the highest bidder. Apps take information about where you go and your contact book details, harvest them, and sell them on, and people just click the EULA without caring. No one is revealing the dirty secret that is the tech firms harvesting customers’ personal data and selling it for vast profits, and people are totally unaware of the dangers. You: For Sale is for anyone who is concerned about what corporate and government invasion of privacy means now and down the road. The book sets the scene by spelling out exactly what most users of the Internet and smartphones are exposing themselves to via commonly used sites and apps such as Facebook and Google, and then tells you what you can do to protect yourself. The book also covers legal and government issues as well as future trends. With interviews of leading security experts, black market data traders, law enforcement and privacy groups, You: For Sale will help you view your personal data in a new light, and understand both its value and its danger.
Provides a clear picture of how companies and governments harvest and use personal data every time someone logs on. Describes exactly what these firms do with the data once they have it, and what you can do to stop it. Learn about the dangers of unwittingly releasing private data to tech firms, including interviews with top security experts, black market data traders, law enforcement and privacy groups. Understand the legal information and future trends that make this one of the most important issues today.