talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

395

Collection of O'Reilly books on Data Engineering.

Filtering by: Analytics

Sessions & talks

Showing 376–395 of 395 · Newest first

HBase in Action

HBase in Action has all the knowledge you need to design, build, and run applications using HBase. First, it introduces you to the fundamentals of distributed systems and large scale data handling. Then, you'll explore real-world applications and code samples with just enough theory to understand the practical techniques. You'll see how to build applications with HBase and take advantage of the MapReduce processing framework. And along the way you'll learn patterns and best practices.

About the Technology
HBase is a NoSQL storage system designed for fast, random access to large volumes of data. It runs on commodity hardware and scales smoothly from modest datasets to billions of rows and millions of columns.

About the Book
HBase in Action is an experience-driven guide that shows you how to design, build, and run applications using HBase. First, it introduces you to the fundamentals of handling big data. Then, you'll explore HBase with the help of real applications and code samples and with just enough theory to back up the practical techniques. You'll take advantage of the MapReduce processing framework and benefit from seeing HBase best practices in action.

What's Inside
- When and how to use HBase
- Practical examples
- Design patterns for scalable data systems
- Deployment, integration, and design

About the Reader
Written for developers and architects familiar with data storage and processing. No prior knowledge of HBase, Hadoop, or MapReduce is required.

About the Authors
Nick Dimiduk is a Data Architect with experience in social media analytics, digital marketing, and GIS. Amandeep Khurana is a Solutions Architect focused on building HBase-driven solutions.

Quotes
Timely, practical ... explains in plain language how to use HBase. - From the Foreword by Michael Stack, Chair of the Apache HBase Project Management Committee
A difficult topic lucidly explained. - John Griffin, coauthor of "Hibernate Search in Action"
Amusing tongue-in-cheek style that doesn’t detract from the substance. - Charles Pyle, APS Healthcare
Learn how to think the HBase way. - Gianluca Righetto, Menttis

Getting Started with Storm

Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you’ll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives. Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing.

- Learn how to program Storm components: spouts for data input and bolts for data transformation
- Discover how data is exchanged between spouts and bolts in a Storm topology
- Make spouts fault-tolerant with several commonly used design strategies
- Explore bolts: their life cycle, strategies for design, and ways to implement them
- Scale your solution by defining each component’s level of parallelism
- Study a real-time web analytics system built with Node.js, a Redis server, and a Storm topology
- Write spouts and bolts with non-JVM languages such as Python, Ruby, and JavaScript

Complete Analytics with IBM DB2 Query Management Facility: Accelerating Well-Informed Decisions Across the Enterprise

There is enormous pressure today for businesses across all industries to cut costs, enhance business performance, and deliver greater value with fewer resources. To take business analytics to the next level and drive tangible improvements to the bottom line, it is important to manage not only the volume of data, but the speed with which actionable findings can be drawn from a wide variety of disparate sources. The findings must be easily communicated to those responsible for making both strategic and tactical decisions. At the same time, strained IT budgets require that the solution be self-service for everyone from DBAs to business users, and easily deployed to thin, browser-based clients.

Business analytics hosted in the Query Management Facility™ (QMF™) on DB2® and System z® allow you to tackle these challenges in a practical way, using new features and functions that are easily deployed across the enterprise and easily consumed by business users who do not have prior IT experience. This IBM® Redbooks® publication provides step-by-step instructions on using these new features:

- Access to data that resides in any JDBC-compliant data source
- OLAP access through XMLA
- 150+ new analytical functions
- Graphical query interfaces and graphical reports
- Graphical, interactive dashboards
- Ability to integrate QMF functions with third-party applications
- Support for the IBM DB2 Analytics Accelerator
- A new QMF Classic perspective in QMF for Workstation
- Ability to start QMF for TSO as a DB2 for z/OS stored procedure
- New metadata capabilities, including ER diagrams and capability to federate data into a single virtual source

Optimizing DB2 Queries with IBM DB2 Analytics Accelerator for z/OS

The IBM® DB2® Analytics Accelerator Version 2.1 for IBM z/OS® (also called DB2 Analytics Accelerator or Query Accelerator in this book and in DB2 for z/OS documentation) is a marriage of the IBM System z® Quality of Service and Netezza® technology to accelerate complex queries in a DB2 for z/OS highly secure and available environment. Superior performance and scalability with rapid appliance deployment provide an ideal solution for complex analysis. This IBM Redbooks® publication provides technical decision-makers with a broad understanding of the IBM DB2 Analytics Accelerator architecture and its exploitation by documenting the steps for the installation of this solution in an existing DB2 10 for z/OS environment. In this book we define a business analytics scenario, evaluate the potential benefits of the DB2 Analytics Accelerator appliance, describe the installation and integration steps with the DB2 environment, evaluate performance, and show the advantages to existing business intelligence processes. Please note that the additional material referenced in the text is not available from IBM.

IBM Real Time Compression Appliance Application Integration Guide

Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® is introducing the IBM Real-time Compression Appliances for NAS, an innovative new storage offering that delivers essential storage efficiency technologies, combined with exceptional ease of use and performance. In an era when the amount of information, particularly in unstructured files, is exploding, but budgets for storing that information are stagnant, IBM Real-time Compression technology offers a powerful tool for better information management, protection, and access. IBM Real-time Compression can help slow the growth of storage acquisition, reducing storage costs while simplifying both operations and management. It also enables organizations to keep more data available for use rather than storing it offsite or on harder-to-access tape, so they can support improved analytics and decision making. IBM Real-time Compression Appliances provide online storage optimization through real-time data compression, delivering dramatic cost reduction without performance degradation. This IBM Redbooks® publication is an easy-to-follow guide that describes how to design solutions successfully using IBM Real-time Compression Appliances (IBM RTCAs). It explains best practices for RTCA solution design, application integration, and practical RTCA use cases. This is a companion book to Introduction to IBM Real-time Compression Appliances, SG24-7953.

Microsoft® SQL Server® 2012 Analysis Services: The BISM Tabular Model

Build agile and responsive Business Intelligence solutions

Analyze tabular data using the BI Semantic Model (BISM) in Microsoft SQL Server 2012 Analysis Services, and discover a simpler method for creating corporate-level BI solutions. Led by three BI experts, you’ll learn how to build, deploy, and query a BISM tabular model with step-by-step guides, examples, and best practices. This hands-on book shows you how the tabular model’s in-memory database enables you to perform rapid analytics, whether you’re a professional BI developer new to Analysis Services or familiar with its multidimensional model.

Discover how to:
- Determine when a tabular or multidimensional model is right for your project
- Build a tabular model using SQL Server Data Tools in Microsoft Visual Studio 2010
- Integrate data from multiple sources into a single, coherent view of company information
- Use the Data Analysis eXpressions (DAX) language to create calculated columns, measures, and queries
- Choose a data modeling technique that meets your organization’s performance and usability requirements
- Optimize your data model for better performance with xVelocity storage engine
- Manage complex data relationships, such as multicolumn, banding, and many-to-many
- Implement security by establishing administrative and data user roles

Principles of Data Integration

Principles of Data Integration is the first comprehensive textbook of data integration, covering theoretical principles and implementation issues as well as current challenges raised by the semantic web and cloud computing. The book offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand. Readers will also learn how to build their own algorithms and implement their own data integration application. Written by three of the most respected experts in the field, this book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application, using concrete examples throughout to explain the concepts. This text is an ideal resource for database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, and data analysts; students in data analytics and knowledge discovery; and other data professionals working at the R&D and implementation levels.

- Offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand
- Enables you to build your own algorithms and implement your own data integration applications

Introduction to IBM Real-time Compression Appliances

Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® is introducing the IBM Real-time Compression Appliances for NAS, an innovative new storage offering that delivers essential storage efficiency technologies, combined with exceptional ease of use and performance. In an era when the amount of information, particularly in unstructured files, is exploding, but budgets for storing that information are stagnant, IBM Real-time Compression technology offers a powerful tool for better information management, protection, and access. IBM Real-time Compression can help slow the growth of storage acquisition, reducing storage costs while simplifying both operations and management. It also enables organizations to keep more data available for use rather than storing it offsite or on harder-to-access tape, so they can support improved analytics and decision making. IBM Real-time Compression Appliances provide on-line storage optimization through real-time data compression, delivering dramatic cost reduction without performance degradation. This IBM Redbooks® publication is an easy-to-follow guide that describes how to design solutions successfully using IBM Real-time Compression Appliances (IBM RTCAs). It provides practical installation examples, ease of use, remote management, high availability, and administration techniques. Furthermore, it explains best practices for RTCA solution design, application integration, and practical RTCA use cases.

MongoDB in Action

NEWER EDITION AVAILABLE
MongoDB in Action, Second Edition is now available. An eBook of this older edition is included at no additional cost when you buy the revised edition! A limited number of pBook copies of this edition are still available. Please contact Manning Support to inquire about purchasing previous edition copies.

MongoDB in Action is a comprehensive guide to MongoDB for application developers. The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.

About the Technology
Big data can mean big headaches. MongoDB is a document-oriented database designed to be flexible, scalable, and very fast, even with big data loads. It's built for high availability, supports rich, dynamic schemas, and lets you easily distribute data across multiple servers.

About the Book
MongoDB in Action introduces you to MongoDB and the document-oriented database model. This perfectly paced book provides both the big picture you'll need as a developer and enough low-level detail to satisfy a system engineer. Numerous examples will help you develop confidence in the crucial area of data modeling. You'll also love the deep explanations of each feature, including replication, auto-sharding, and deployment.

What's Inside
- Indexes, queries, and standard DB operations
- Map-reduce for custom aggregations and reporting
- Schema design patterns
- Deploying for scale and high availability

About the Reader
Written for developers. No MongoDB or NoSQL experience required.

About the Author
Kyle Banker is a software engineer at 10gen where he maintains the official MongoDB drivers for Ruby and C.

Quotes
Awesome! MongoDB in a nutshell. - Hardy Ferentschik, Red Hat
Excellent. Many practical examples. - Curtis Miller, Flatterline
Not only the how, but also the why. - Philip Hallstrom, PJKH, LLC
Has a developer-centric flavor--an excellent reference. - Rick Wagner, Red Hat
A must-read. - Daniel Bretoi, Advanced Energy

IBM InfoSphere Streams: Assembling Continuous Insight in the Information Revolution

In this IBM® Redbooks® publication, we discuss and describe the positioning, functions, capabilities, and advanced programming techniques for IBM InfoSphere™ Streams (V2), a new paradigm and key component of the IBM Big Data platform. Data has traditionally been stored in files or databases, and then analyzed by queries and applications. With stream computing, analysis is performed moment by moment as the data is in motion. In fact, the data might never be stored (perhaps only the analytic results). The ability to analyze data in motion is called real-time analytic processing (RTAP). IBM InfoSphere Streams takes a fundamentally different approach to Big Data analytics and differentiates itself with its distributed runtime platform, programming model, and tools for developing and debugging analytic applications that have a high volume and variety of data types. Using in-memory techniques and analyzing record by record enables high velocity. Volume, variety and velocity are the key attributes of Big Data. The data streams that are consumable by IBM InfoSphere Streams can originate from sensors, cameras, news feeds, stock tickers, and a variety of other sources, including traditional databases. It provides an execution platform and services for applications that ingest, filter, analyze, and correlate potentially massive volumes of continuous data streams. This book is intended for professionals who require an understanding of how to process high volumes of streaming data or need information about how to implement systems to satisfy those requirements. See: http://www.redbooks.ibm.com/abstracts/sg247865.html for the IBM InfoSphere Streams (V1) release.

Introduction to IBM Real-time Compression Appliances

Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® is introducing the IBM Real-time Compression Appliances for NAS, an innovative new storage offering that delivers essential storage efficiency technologies, combined with exceptional ease of use and performance. In an era when the amount of information, particularly in unstructured files, is exploding, but budgets for storing that information are stagnant, IBM Real-time Compression technology offers a powerful tool for better information management, protection, and access. IBM Real-time Compression can help slow the growth of storage acquisition, reducing storage costs while simplifying both operations and management. It also enables organizations to keep more data available for use rather than storing it offsite or on harder-to-access tape, so they can support improved analytics and decision making. IBM Real-time Compression Appliances provide on-line storage optimization through real-time data compression, delivering dramatic cost reduction without performance degradation. This IBM Redbooks® publication is an easy-to-follow guide that describes how to design solutions successfully using IBM Real-time Compression Appliances. It provides practical installation examples, ease of use, remote management, high availability, and administration techniques.

IBM Content Analytics Version 2.2: Discovering Actionable Insight from Your Content

With IBM® Content Analytics Version 2.2, you can unlock the value of unstructured content and gain new business insight. IBM Content Analytics Version 2.2 provides a robust interface for exploratory analytics of unstructured content. It empowers a new class of analytical applications that use this content. Through content analysis, IBM Content Analytics provides enterprises with tools to better identify new revenue opportunities, improve customer satisfaction, and provide early problem detection. To help you achieve the most from your unstructured content, this IBM Redbooks® publication provides in-depth information about Content Analytics. This book examines the power and capabilities of Content Analytics, explores how it works, and explains how to design, prepare, install, configure, and use it to discover actionable business insights. This book explains how to use the automatic text classification capability, from the IBM Classification Module, with Content Analytics. It explains how to use the LanguageWare® Resource Workbench to create custom annotators. It also explains how to work with the IBM Content Assessment offering to decommission obsolete and unnecessary content in a timely manner while preserving and using content that has business value. The target audience of this book is decision makers, business users, and IT architects and specialists who want to understand and use their enterprise content to improve and enhance their business operations. It is also intended as a technical guide for use with the online information center to configure and perform content analysis with Content Analytics.

Pro SharePoint 2010 Search

Pro SharePoint 2010 Search gives you expert advice on planning, deploying and customizing searches in SharePoint 2010. Drawing on the authors' extensive experience of working with real-world SharePoint deployments, this book teaches everything you'll need to know to create well-designed SharePoint solutions that always keep the end-user's experience in mind. Increase your search efficiency with SharePoint 2010's search functionality: extend the search user interface using third-party tools, and utilize analytics to improve relevancy. This practical hands-on book is a must-have resource for anyone looking to unlock the full potential of their SharePoint server's search capabilities. Pro SharePoint 2010 Search empowers you to customize a SharePoint 2010 search deployment and maximize the platform's potential for your organization.

What you'll learn
- Design and implement effective search crawls and indexing
- Create intuitive user interfaces, and improve search findability
- Understand how to configure core SharePoint components
- Customize SharePoint's existing search functionality

Who this book is for
This book is aimed at intermediate to advanced SharePoint administrators who want to incorporate well-designed search functionality into their sites.

Getting Started with the IBM Smart Analytics System 9600

The IBM® Smart Analytics System 9600 is a single, end-to-end business analytics solution to accelerate data warehousing and business intelligence initiatives. It provides integrated hardware, software, and services that enable enterprise customers to quickly and cost-effectively deploy business-changing analytics across their organizations. As a workload-optimized system for business analytics, it leverages the strengths of the System z® platform to drive:

- Significant savings in hardware, software, operating, and people costs to deliver a complete range of data warehouse and BI capabilities
- Faster time to value with a reduction in the time and speed associated with deploying Business Intelligence
- Industry-leading scalability, reliability, availability, and security
- Simplified and faster access to the data on System z

Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture

Making Data Integration Work: How to Systematically Reduce Cost, Improve Quality, and Enhance Effectiveness

Today’s enterprises are investing massive resources in data integration. Many possess thousands of point-to-point data integration applications that are costly, undocumented, and difficult to maintain. Data integration now accounts for a major part of the expense and risk of typical data warehousing and business intelligence projects--and, as businesses increasingly rely on analytics, the need for a blueprint for data integration is increasing now more than ever. This book presents the solution: a clear, consistent approach to defining, designing, and building data integration components to reduce cost, simplify management, enhance quality, and improve effectiveness. Leading IBM data management expert Tony Giordano brings together best practices for architecture, design, and methodology, and shows how to do the disciplined work of getting data integration right.

Mr. Giordano begins with an overview of the “patterns” of data integration, showing how to build blueprints that smoothly handle both operational and analytic data integration. Next, he walks through the entire project lifecycle, explaining each phase, activity, task, and deliverable through a complete case study. Finally, he shows how to integrate data integration with other information management disciplines, from data governance to metadata. The book’s appendices bring together key principles, detailed models, and a complete data integration glossary.

Coverage includes
- Implementing repeatable, efficient, and well-documented processes for integrating data
- Lowering costs and improving quality by eliminating unnecessary or duplicative data integrations
- Managing the high levels of complexity associated with integrating business and technical data
- Using intuitive graphical design techniques for more effective process and data integration modeling
- Building end-to-end data integration applications that bring together many complex data sources

Joe Celko's Analytics and OLAP in SQL

Joe Celko's Analytics and OLAP in SQL is the first book that teaches what SQL programmers need in order to successfully make the transition from On-Line Transaction Processing (OLTP) systems into the world of On-Line Analytical Processing (OLAP). This book is not an in-depth look at particular subjects, but an overview of many subjects that will give working RDBMS programmers a map of the terra incognita they will face, if they want to grow. It contains expert advice from a noted SQL authority and award-winning columnist, who has given ten years of service to the ANSI SQL standards committee and many more years of dependable help to readers of online forums. It offers real-world insights and lots of practical examples. It covers the OLAP extensions in SQL-99; ETL tools, OLAP features supported in DBMSs, other query tools, simple reports, and statistical software. This book is ideal for experienced SQL programmers who have worked with OLTP systems and who need to learn techniques, and even some tricks, that they can use in an OLAP situation.

- Expert advice from a noted SQL authority and award-winning columnist, who has given ten years of service to the ANSI SQL standards committee and many more years of dependable help to readers of online forums
- First book that teaches what SQL programmers need in order to successfully make the transition from transactional systems (OLTP) into the world of data warehouse data and OLAP
- Offers real-world insights and lots of practical examples
- Covers the OLAP extensions in SQL-99; ETL tools, OLAP features supported in DBMSs, other query tools, simple reports, and statistical software

Oracle Essbase 9 Implementation Guide

This book is a practical guide to implementing high-performance multidimensional OLAP solutions using Oracle Essbase 9. It takes you step by step through software installation to the creation of fully-functional Essbase database cubes, helping you develop a solid understanding of multidimensional database concepts and the tools to create analytical OLAP applications effectively.

What this Book will help me do
- Understand the fundamentals of multidimensional database technology and how it applies to analytical OLAP solutions.
- Construct efficient database outlines including dimensions and members.
- Import data from various sources into Essbase cubes and process it for analysis.
- Develop effective calculation scripts to automate complex data processing tasks.
- Leverage tools like Excel Add-In to produce dynamic, interactive reports for comprehensive data insights.

Author(s)
The authors of this guide are seasoned professionals with extensive practical experience in Oracle Essbase and OLAP technology. They bring years of expertise in database design, analytics, and performance optimization, ensuring this book provides both foundational knowledge and advanced techniques. Their teaching approach focuses on clarity and step-by-step guidance to make complex topics accessible to practitioners at all levels.

Who is it for?
This book is ideal for IT professionals with a general understanding of information systems who want to learn OLAP and Oracle Essbase. Beginners will find the step-by-step approach accessible, while experienced OLAP users will benefit from advanced tips and practical examples. If you're looking to enhance your analytics skills or explore Essbase for career growth, this resource is for you.

Google Hacks, 3rd Edition

Everyone knows that Google lets you search billions of web pages. But few people realize that Google also gives you hundreds of cool ways to organize and play with information. Since we released the last edition of this bestselling book, Google has added many new features and services to its expanding universe: Google Earth, Google Talk, Google Maps, Google Blog Search, Video Search, Music Search, Google Base, Google Reader, and Google Desktop among them. We've found ways to get these new services to do even more.

The expanded third edition of Google Hacks is a brand-new and infinitely more useful book for this powerful search engine. You'll not only find dozens of hacks for the new Google services, but plenty of updated tips, tricks and scripts for hacking the old ones. Now you can make a Google Earth movie, visualize your web site traffic with Google Analytics, post pictures to your blog with Picasa, or access Gmail in your favorite email client. Industrial strength and real-world tested, this new collection enables you to mine a ton of information within Google's reach. And have a lot of fun while doing it:

- Search Google over IM with a Google Talk bot
- Build a customized Google Map and add it to your own web site
- Cover your searching tracks and take back your browsing privacy
- Turn any Google query into an RSS feed that you can monitor in Google Reader or the newsreader of your choice
- Keep tabs on blogs in new, useful ways
- Turn Gmail into an external hard drive for Windows, Mac, or Linux
- Beef up your web pages with search, ads, news feeds, and more
- Program Google with the Google API and language of your choice

For those of you concerned about Google as an emerging Big Brother, this new edition also offers advice and concrete tips for protecting your privacy. Get into the world of Google and bend it to your will!

SQL Server 2005 Distilled

Need to get your arms around Microsoft SQL Server 2005 fast, without getting buried in the details? Need to make fundamental decisions about deploying, using, or administering Microsoft’s latest enterprise database? Need to understand what’s new in SQL Server 2005, and how it fits with your existing IT and business infrastructure? SQL Server 2005 Distilled delivers the answers you need–quickly, clearly, and objectively. Former SQL Server team member Eric L. Brown offers realistic insight into every significant aspect of SQL Server 2005: its new features, architecture, administrative tools, security model, data management capabilities, development environment, and much more. Brown draws on his extensive experience consulting with enterprise users, outlining realistic usage scenarios that leverage SQL Server 2005’s strengths and minimize its limitations.

Coverage includes
- Architectural overview: how SQL Server 2005’s features work together and what it means to you
- Security management, policies, and permissions: gaining tighter control over your data
- SQL Server Management Studio: Microsoft’s new, unified tool suite for authoring, management, and operations
- Availability enhancements: online restoration, improved replication, shorter maintenance/recovery windows, and more
- Scalability improvements, including a practical explanation of SQL Server 2005’s complex table partitioning feature
- Data access enhancements, from ADO.NET 2.0 to XML
- SQL Server 2005’s built-in .NET CLR: how to use it, when to use it, and when to stay with T-SQL
- Business Intelligence Development Studio: leveraging major improvements in reporting and analytics
- Visual Studio integration: improving efficiency throughout the coding and debugging process
- Simple code examples demonstrating SQL Server 2005’s most significant new features

Siebel 7.8 with IBM DB2 UDB V8.2 Handbook

This IBM Redbooks publication delivers details about DB2 UDB V8.2 on Siebel 7.8. It outlines the partnership between Siebel Systems and IBM and the benefits of using DB2 UDB to support the Siebel Enterprise. The most commonly used components of the Siebel Enterprise and the DB2 UDB architecture are described. We provide the planning considerations for running DB2 UDB in a Siebel environment, followed by step-by-step installation and configuration details. We then describe methods to populate and maintain data in Siebel tables, including data archival techniques and information on ensuring data integrity and data quality. The database administration, monitoring, and tuning tools provided by DB2 UDB and operating systems are discussed, along with their usage. The book also provides an in-depth discussion of high availability and disaster recovery options and the setup procedure for a Siebel/DB2 UDB environment. Finally, the book provides information about the components of Siebel Analytics and where these components fit in the overall scheme with Siebel Enterprise.