talk-data.com

Topic

data-engineering

3377

tagged

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books

ArcPy and ArcGIS: Geospatial Analysis with Python

"ArcPy and ArcGIS: Geospatial Analysis with Python" introduces you to streamlining geospatial analysis using the ArcPy library in Python. You'll learn to automate repetitive GIS tasks, enhance your workflow in ArcGIS, and handle geospatial data programmatically to achieve efficient and accurate results in your projects. What this Book will help me do Master the use of the ArcPy library to automate and optimize GIS workflows. Learn techniques to efficiently handle geospatial data updates and analysis in Python. Understand how to use Python scripting to dynamically create and manage maps and analyses. Gain the skills to enhance repetitive GIS tasks into custom Python tools to increase productivity. Explore advanced geospatial analysis topics using Python's ArcPy module for complex problem-solving. Author(s) Silas Toms is a seasoned GIS professional with extensive experience in Python programming for geospatial applications. With years of hands-on work in automating GIS processes and teaching others, Silas excels at making technical concepts relatable and useful for real-world applications. His practical writing style ensures readers can effectively apply what they learn. Who is it for? This book is ideal for GIS students and professionals who wish to enhance their efficiency by automating tasks in ArcGIS using Python. It also suits Python developers keen on exploring geospatial data analysis and management workflows. Suitable for those with basic GIS knowledge, the book bridges the gap to advanced GIS automation techniques. It's perfect if you aim to streamline repetitive tasks and integrate programming into your geospatial projects.

PostgreSQL Server Programming - Second Edition

Delve into the concepts and practices of PostgreSQL server-side programming with this practical and insightful guide. Learn how to extend PostgreSQL functionality through user-defined functions, various procedural languages, and effective debugging techniques. Gain a deeper understanding of PostgreSQL 9.4's features to optimize your database's capabilities. What this Book will help me do Master PostgreSQL's PL/pgSQL and other procedural languages for server-side programming. Craft powerful user-defined functions to provide database functionality specific to your needs. Explore debugging techniques and tools, including PL/pgSQL debugging extensions and NOTIFY. Scale and optimize databases effectively using tools like PL/Proxy. Leverage new features in PostgreSQL 9.4, such as event triggers, to enhance database performance. Author(s) The book is authored by experienced PostgreSQL professionals Dar, Krosing, and Jim Mlodgenski. Together, they bring years of expertise in database design, architecture, and development. Their combined backgrounds ensure a comprehensive and practical learning experience for readers. They aim to share practical insights and structured knowledge for database enthusiasts. Who is it for? This book is ideal for database professionals with a moderate to advanced understanding of PostgreSQL. Readers should have experience with SQL, query optimization concepts, and basic programming in languages like Python, Perl, or C. If you are aiming to deepen your knowledge of PostgreSQL's capabilities and get hands-on with advanced features such as server programming and database scale optimization, this book is suitable for you.

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

"Apache Flume: Distributed Log Collection for Hadoop - Second Edition" is your hands-on guide to learning how to use Apache Flume to reliably collect and move logs and data streams into your Hadoop ecosystem. Through practical examples and real-world scenarios, this book will help you master the setup, configuration, and optimization of Flume for various data ingestion use cases. What this Book will help me do Understand the key concepts and architecture behind Apache Flume to build reliable and scalable data ingestion systems. Set up Flume agents to collect and transfer data into the Hadoop File System (HDFS) or other storage solutions effectively. Learn stream data processing techniques, such as filtering, transforming, and enriching data during transit to improve data usability. Integrate Flume with other tools like Elasticsearch and Solr to enhance analytics and search capabilities. Implement monitoring and troubleshooting workflows to maintain healthy and optimized Flume data pipelines. Author(s) Steven Hoffman, a seasoned software developer and data engineer, brings years of practical experience working with big data technologies to this book. He has a strong background in distributed systems and big data solutions, having implemented enterprise-scale analytics projects. Through clear and approachable writing, he aims to empower readers to successfully deploy reliable data pipelines using Apache Flume. Who is it for? This book is written for Hadoop developers, data engineers, and IT professionals who seek to build robust pipelines for streaming data into Hadoop environments. It is ideal for readers who have a basic understanding of Hadoop and HDFS but are new to Apache Flume. If you are looking to enhance your analytics capabilities by efficiently ingesting, routing, and processing streaming data, this book is for you. Beginners as well as experienced engineers looking to dive deeper into Flume will find it insightful.

Couchbase Essentials

Couchbase Essentials is your gateway to mastering Couchbase, a powerful NoSQL database designed for building flexible and scalable applications. Through this book, you will understand Couchbase's key features, explore its indexing and querying capabilities, and learn to design schemas for its schemaless document model. What this Book will help me do Understand how to install and configure a single-node Couchbase environment. Master putting data into and retrieving data from Couchbase using its API. Develop skills in creating secondary and advanced indexes using Couchbase MapReduce views. Learn to design an efficient schema for Couchbase's schemaless document database. Create and query a functional application utilizing Couchbase and its N1QL query language. Author(s) John C Zablocki is an experienced software developer and technology enthusiast with a deep understanding of NoSQL databases like Couchbase. With years of practical experience, John has been instrumental in implementing Couchbase in scalable applications, and he shares actionable insights in this well-rounded book. Who is it for? This book is tailored for application developers aiming to enhance their applications with NoSQL capabilities. It is highly suitable for developers with backgrounds in relational databases, as well as those new to NoSQL systems. If you are interested in building modern, scalable applications, this comprehensive guide to Couchbase is for you.

Hadoop MapReduce v2 Cookbook - Second Edition

Explore insights from vast datasets with "Hadoop MapReduce v2 Cookbook - Second Edition." This book serves as a practical guide for developers and system administrators who aim to master big data processing using Hadoop v2. By engaging with its step-by-step recipes, you will learn to harness the Hadoop MapReduce ecosystem for scalable and efficient data solutions. What this Book will help me do Master the configuration and management of Hadoop YARN, MapReduce v2, and HDFS clusters. Integrate big data tools such as Hive, HBase, Pig, Mahout, and Nutch with Hadoop v2. Develop analytics solutions for large-scale datasets using MapReduce-based applications. Address specific challenges like data classification, recommendations, and text analytics leveraging Hadoop MapReduce. Deploy and manage big data clusters effectively, including options for cloud environments. Author(s) The authors behind "Hadoop MapReduce v2 Cookbook - Second Edition" combine their deep expertise in big data technology and years of experience working directly with Hadoop. They have helped numerous organizations implement scalable data processing solutions and are passionate about teaching others. Their approach ensures readers gain both foundational knowledge and practical skills. Who is it for? This book is perfect for developers and system administrators who want to learn Hadoop MapReduce v2, including configuring and managing big data clusters. Beginners with basic Java knowledge can follow along to advance their skills in big data processing. Ideal for those transitioning to Hadoop v2 or requiring practical recipes for immediate application. Great for professionals aiming to deepen their expertise in scalable data technologies.

Learning Apache Cassandra

Learning Apache Cassandra is your comprehensive guide to mastering one of the most popular distributed databases for building scalable, fault-tolerant data layers. Through step-by-step examples and clear explanations, this book will help you understand Cassandra's architecture and how to use its features to design efficient applications. What this Book will help me do Successfully install and set up Apache Cassandra in your environment. Develop highly scalable data models for various application scenarios. Implement efficient query designs using Cassandra's specialized APIs. Maintain data consistency and handle concurrent updates in distributed systems. Apply best practices for securing Cassandra deployments and managing distributed data. Author(s) Brown is an experienced software developer with a focus on database systems and distributed architectures. With years of hands-on experience working with SQL and NoSQL databases, they bring practical insights and clear instructions to their readers. Their writing aims to demystify complex topics and provide practical learning paths. Who is it for? This book is intended for software developers and database administrators looking to expand their knowledge of distributed databases. If you are familiar with SQL databases like MySQL or PostgreSQL and want to transition to Cassandra, this guide will help you. No prior experience with distributed databases is assumed. By following this book, you'll quickly become proficient in using Cassandra for your distributed application needs.

IBM Business Process Manager V8.5 Performance Tuning and Best Practices

This IBM® Redbooks® publication provides performance tuning tips and best practices for IBM Business Process Manager (IBM BPM) V8.5.5 (all editions) and IBM Business Monitor V8.5.5. These products represent an integrated development and runtime environment based on a key set of service-oriented architecture (SOA) and business process management (BPM) technologies. Such technologies include Service Component Architecture (SCA), Service Data Object (SDO), Business Process Execution Language (BPEL) for web services, and Business Process Model and Notation (BPMN). Both IBM Business Process Manager and Business Monitor build on the core capabilities of the IBM WebSphere® Application Server infrastructure. As a result, Business Process Manager solutions benefit from tuning, configuration, and best practices information for WebSphere Application Server and the corresponding platform Java virtual machines (JVMs). This book targets a wide variety of groups, both within IBM (development, services, technical sales, and others) and customers. For customers who are either considering or are in the early stages of implementing a solution incorporating Business Process Manager and Business Monitor, this document proves a useful reference. The book is useful both in terms of best practices during application development and deployment and as a reference for setup, tuning, and configuration information. This book talks about many issues that can influence performance of each product and can serve as a guide for making rational first choices in terms of configuration and performance settings. Similarly, customers who already implemented a solution with these products can use the information presented here to gain insight into how their overall integrated solution performance can be improved.

NoSQL For Dummies

Get up to speed on the nuances of NoSQL databases and what they mean for your organization. This easy-to-read guide to NoSQL databases provides the type of no-nonsense overview and analysis that you need to learn, including what NoSQL is and which database is right for you. Featuring specific evaluation criteria for NoSQL databases, along with a look into the pros and cons of the most popular options, NoSQL For Dummies provides the fastest and easiest way to dive into the details of this incredible technology. You'll gain an understanding of how to use NoSQL databases for mission-critical enterprise architectures and projects, and real-world examples reinforce the primary points to create an action-oriented resource for IT pros. If you're planning a big data project or platform, you probably already know you need to select a NoSQL database to complete your architecture. But with options flooding the market and updates and add-ons coming at a rapid pace, determining what you require now, and in the future, can be a tall task. This is where NoSQL For Dummies comes in!

Learn the basic tenets of NoSQL databases and why they have come to the forefront as data has outpaced the capabilities of relational databases
Discover major players among NoSQL databases, including Cassandra, MongoDB, MarkLogic, Neo4J, and others
Get an in-depth look at the benefits and disadvantages of the wide variety of NoSQL database options
Explore the needs of your organization as they relate to the capabilities of specific NoSQL databases

Big data and Hadoop get all the attention, but when it comes down to it, NoSQL databases are the engines that power many big data analytics initiatives. With NoSQL For Dummies, you'll go beyond relational databases to ramp up your enterprise's data architecture in no time.

YARN Essentials

"YARN Essentials" offers a practical introduction to Apache Hadoop YARN. With this book, you will acquire the skills to install, configure, and manage YARN clusters effectively. It provides hands-on guidance for deploying and managing applications and emerging frameworks, making this resource vital for mastering this key Hadoop technology. What this Book will help me do Learn how to install and configure Apache YARN from scratch. Understand YARN's architecture and its integration with the Hadoop ecosystem. Gain the ability to fine-tune a YARN cluster for optimal performance and scalability. Develop skills to create and run applications on a shared YARN cluster environment. Become proficient in managing, troubleshooting, and expanding YARN capabilities. Author(s) Fasale and Nirmal Kumar are experienced professionals specializing in Hadoop and distributed systems. With years of hands-on experience in YARN and managing large-scale data processing frameworks, they bring their comprehensive expertise into this guide. Their focus on clarity and applicable knowledge ensures readers gain practical skills alongside theoretical understanding. Who is it for? This book is ideal for Hadoop administrators or developers with background knowledge of Hadoop 1.x, seeking to specialize in managing YARN clusters effectively. It assumes familiarity with basic Hadoop concepts while providing thorough explanations for YARN-specific features and topics. If you're looking to deploy scalable applications using YARN, this is the book for you.

Foundations of Linear and Generalized Linear Models

A valuable overview of the most important ideas and results in statistical modeling. Written by a highly experienced author, Foundations of Linear and Generalized Linear Models is a clear and comprehensive guide to the key concepts and results of linear statistical models. The book presents a broad, in-depth overview of the most commonly used statistical models by discussing the theory underlying the models, R software applications, and examples with crafted models to elucidate key ideas and promote practical model building. The book begins by illustrating the fundamentals of linear models, such as how the model-fitting projects the data onto a model vector subspace and how orthogonal decompositions of the data yield information about the effects of explanatory variables. Subsequently, the book covers the most popular generalized linear models, which include binomial and multinomial logistic regression for categorical data, and Poisson and negative binomial loglinear models for count data. Focusing on the theoretical underpinnings of these models, Foundations of Linear and Generalized Linear Models also features:

An introduction to quasi-likelihood methods that require weaker distributional assumptions, such as generalized estimating equation methods
An overview of linear mixed models and generalized linear mixed models with random effects for clustered correlated data, Bayesian modeling, and extensions to handle problematic cases such as high dimensional problems
Numerous examples that use R software for all text data analyses
More than 400 exercises for readers to practice and extend the theory, methods, and data analysis
A supplementary website with datasets for the examples and exercises

An invaluable textbook for upper-undergraduate and graduate-level students in statistics and biostatistics courses, Foundations of Linear and Generalized Linear Models is also an excellent reference for practicing statisticians and biostatisticians, as well as anyone who is interested in learning about the most important statistical models for analyzing data.
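
The description's point that model fitting projects the data onto a model subspace can be illustrated with a small sketch. This is not taken from the book (which uses R); it is a plain-Python illustration with made-up data values and hypothetical helper names (`fit_line`, `dot`). It fits a straight line by least squares and checks that the residual is orthogonal to the model's columns (the constant column and `x`), so the data splits into a fitted part and a residual part whose squared lengths add up.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]      # projection of y onto span{1, x}
residual = [yi - fi for yi, fi in zip(y, fitted)]  # component orthogonal to that span
ones = [1.0] * len(x)

# Both inner products are zero up to floating-point rounding.
print(dot(residual, ones), dot(residual, x))
# Orthogonal decomposition: |y|^2 equals |fitted|^2 + |residual|^2.
print(dot(y, y), dot(fitted, fitted) + dot(residual, residual))
```

Because the residual is orthogonal to every vector in the model subspace, the total sum of squares splits cleanly into a part explained by the model and a part left over, which is the kind of decomposition the description refers to.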

Oracle RMAN Database Duplication

RMAN is Oracle’s flagship backup and recovery tool, but did you know it’s also an effective database duplication tool? Oracle RMAN Database Duplication is a deep dive into RMAN’s duplication feature set, showing how RMAN can make it so much easier for you as a database administrator to satisfy the many requests from developers and testers for database copies and refreshes for use in their work. You’ll learn to make and refresh duplicate databases with a single command, and of course you can automate and schedule that command so that developers and testers are supplied with regular, known good databases without any manual intervention on your part. Fast and easy provisioning of databases for developers and testers is a driving force in the move to cloud computing and virtualization. RMAN’s robust database duplication feature set plays right into this growing need for ease of provisioning, enabling easy duplication of known-good databases on demand, across operating systems such as between Linux and Solaris, and even across storage environments such as when duplicating from a RAC/ASM environment to a single-node instance using regular file system storage. Oracle RMAN Database Duplication is your thorough guide to providing amazing business value to your organization by way of fast and easy provisioning of database duplicates in service of development and testing projects.

IBM Tape Library Guide for Open Systems

This IBM® Redbooks® publication presents a general introduction to Linear Tape-Open (LTO) technology and the implementation of corresponding IBM products. The high-performance, high-capacity, and cost-effective IBM TS1150 tape drive is included. The book highlights the IBM TS4500 tape library, which is the next-generation storage solution that is designed to help midsize and large enterprises respond to storage challenges. The IBM TS1150 tape drive gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention for less expense than disk solutions. TS1150 offers high-performance, flexible data storage with support for data encryption. This fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. This eleventh edition includes information about the latest enhancements to the IBM Ultrium family of tape drives and tape libraries. In particular, it includes details of the latest IBM LTO Ultrium 6 tape drive technology and its implementation in IBM tape libraries. It contains technical information about each IBM tape product for open systems and includes generalized sections about Small Computer System Interface (SCSI) and Fibre Channel connections and multipath architecture configurations. This edition also includes details about Tape System Library Manager (TSLM), which consolidates and simplifies large TS3500 tape library environments, including the IBM Shuttle Complex. This book also covers tools and techniques for library management. It is intended for anyone who wants to understand more about IBM tape products and their implementation. It is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists. If you do not have a background in computer tape storage products, you might need to read other sources of information. 
In the interest of being concise, topics that are generally understood are not covered in detail.

Learning Spark

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Data: Emerging Trends and Technologies

What are the emerging trends and technologies that will transform the data landscape in coming months? In this report from Strata + Hadoop World co-chair Alistair Croll, you'll learn how the ubiquity of cheap sensors, fast networks, and distributed computing has given rise to several developments that will soon have a profound effect on individuals and society as a whole. Machine learning, for example, has quickly moved from lab tool to hosted, pay-as-you-go services in the cloud. Those services, in turn, are leading to predictive apps that will provide individuals with the right functionality and content at the right time by continuously learning about them and predicting what they'll need. Computational power can produce cognitive augmentation. Report topics include:

The swing between centralized and distributed computing
Machine learning as a service
Personal digital assistants and cognitive augmentation
Graph databases and analytics
Regulating complex algorithms
The pace of real-time data and automation
Solving dire problems with big data
Implications of having sensors everywhere

This report contains many more examples of how big data is starting to reshape business and change behavior, and it's just a small sample of the in-depth information Strata + Hadoop World provides. Pick up this report and make plans to attend one of several Strata + Hadoop World conferences in the San Francisco Bay Area, London, and New York.

Extending IBM Business Process Manager to the Mobile Enterprise with IBM Worklight

In today's business in motion environments, workers expect to be connected to their critical business processes while on-the-go. It is imperative to deliver more meaningful user engagements by extending business processes to the mobile working environments. This IBM® Redbooks® publication provides an overview of the market forces that push organizations to reinvent their process with Mobile in mind. It describes IBM Mobile Smarter Process and explains how the capabilities provided by the offering help organizations to mobile-enable their processes. This book outlines an approach that organizations can use to identify where within the organization mobile technologies can offer the greatest benefits. It provides a high-level overview of the IBM Business Process Manager and IBM Worklight® features that can be leveraged to mobile-enable processes and accelerate the adoption of mobile technologies, improving time-to-value. Key IBM Worklight and IBM Business Process Manager capabilities are showcased in the examples included in this book. The examples show how to integrate with IBM Bluemix™ as the platform to implement various supporting processes. This IBM Redbooks publication discusses architectural patterns for exposing business processes to mobile environments. It includes an overview of the IBM MobileFirst reference architecture and deployment considerations. Through use cases and usage scenarios, this book explains how to build and deliver a business process using IBM Business Process Manager and how to develop a mobile app that enables remote users to interact with the business process while on-the-go, using the IBM Worklight Platform. 
The target audience for this book consists of solution architects, developers, and technical consultants who will learn the following information:

What is IBM Mobile Smarter Process
Patterns and benefits of a mobile-enabled Smarter Process
IBM BPM features to mobile-enable processes
IBM Worklight features to mobile-enable processes
Mobile architecture and deployment topology
IBM BPM interaction patterns
Enterprise mobile security with IBM Security Access Manager and IBM Worklight
Implementing mobile apps to mobile-enabled business processes

Learning Hadoop 2

Delve into the world of big data with 'Learning Hadoop 2', a comprehensive guide to leveraging the capabilities of Hadoop 2 for data processing and analysis. In this book, you will explore the tools and frameworks that integrate with Hadoop, discovering the best ways to design and deploy effective workflows for managing and analyzing large datasets. What this Book will help me do Understand the fundamentals of the MapReduce framework and its applications. Utilize advanced tools such as Samza and Spark for real-time and iterative data processing. Manage large datasets with data mining techniques tailored for Hadoop environments. Deploy Hadoop applications across various infrastructures, including local clusters and cloud services. Create and orchestrate sophisticated data workflows and pipelines with Apache Pig and Oozie. Author(s) Gabriele Modena is an experienced developer and trained data specialist with a keen focus on distributed data processing frameworks. Having worked extensively with big data platforms, Gabriele brings practical insights and a hands-on perspective to technical subjects. His writing is concise and engaging, aiming to render complex concepts accessible. Who is it for? This book is ideal for system and application developers eager to learn practical implementations of the Hadoop framework. Readers should be familiar with the Unix/Linux command-line interface and Java programming. Prior experience with Hadoop will be advantageous, but not necessary.

Dataflow Processing

Since its first volume in 1960, Advances in Computers has presented detailed coverage of innovations in computer hardware, software, theory, design, and applications. It has also provided contributors with a medium in which they can explore their subjects in greater depth and breadth than journal articles usually allow. As a result, many articles have become standard references that continue to be of significant, lasting value in this rapidly expanding field.

In-depth surveys and tutorials on new computer technology
Well-known authors and researchers in the field
Extensive bibliographies with most chapters
Many of the volumes are devoted to single themes or subfields of computer science

Implementing the IBM Storwize V5000

Organizations of all sizes are faced with the challenge of managing massive volumes of increasingly valuable data. But storing this data can be costly, and extracting value from the data is becoming more difficult. IT organizations have limited resources but must stay responsive to dynamic environments and act quickly to consolidate, simplify, and optimize their IT infrastructures. The IBM® Storwize® V5000 system provides a smarter solution that is affordable, easy to use, and self-optimizing, which enables organizations to overcome these storage challenges. Storwize V5000 delivers efficient, entry-level configurations that are specifically designed to meet the needs of small and midsize businesses. Designed to provide organizations with the ability to consolidate and share data at an affordable price, Storwize V5000 offers advanced software capabilities that are usually found in more expensive systems. This IBM Redbooks® publication is intended for pre-sales and post-sales technical support professionals and storage administrators. The concepts in this book also relate to the IBM Storwize V3700. This book was written at a software level of Version 7 Release 4.

Big Data Analytics

With this book, managers and decision makers are given the tools to make more informed decisions about big data purchasing initiatives. Big Data Analytics: A Practical Guide for Managers not only supplies descriptions of common tools, but also surveys the various products and vendors that supply the big data market. Comparing and contrasting the different types of analysis commonly conducted with big data, this accessible reference presents clear-cut explanations of the general workings of big data tools. Instead of spending time on HOW to install specific packages, it focuses on the reasons WHY readers would install a given package. The book provides authoritative guidance on a range of tools, including open source and proprietary systems. It details the strengths and weaknesses of incorporating big data analysis into decision-making and explains how to leverage the strengths while mitigating the weaknesses. Describes the benefits of distributed computing in simple terms Includes substantial vendor/tool material, especially for open source decisions Covers prominent software packages, including Hadoop and Oracle Endeca Examines GIS and machine learning applications Considers privacy and surveillance issues The book further explores basic statistical concepts that, when misapplied, can be the source of errors. Time and again, big data is treated as an oracle that discovers results nobody would have imagined. While big data can serve this valuable function, all too often these results are incorrect, yet are still reported unquestioningly. The probability of having erroneous results increases as a larger number of variables are compared unless preventative measures are taken. The approach taken by the authors is to explain these concepts so managers can ask better questions of their analysts and vendors as to the appropriateness of the methods used to arrive at a conclusion. 
Because the world of science and medicine has been grappling with similar issues in the publication of studies, the authors draw on their efforts and apply them to big data.
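
The claim above that erroneous results become more likely as more variables are compared can be made concrete with a short calculation. The sketch below is not from the book. It assumes the comparisons are independent tests, each run at a significance level of 0.05, and it uses the Bonferroni correction as one example of a preventative measure; the function names are hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / num_tests


for m in (1, 10, 100):
    uncorrected = familywise_error_rate(m)
    corrected = familywise_error_rate(m, bonferroni_threshold(m))
    print(f"{m:>3} comparisons: uncorrected={uncorrected:.3f}, corrected={corrected:.3f}")
```

Under these assumptions, a single comparison carries a 5% chance of a spurious finding, while a hundred uncorrected comparisons make at least one spurious finding almost certain. Tightening the per-test threshold brings the overall rate back under 5%, at the cost of making each individual test harder to pass.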

Extend Microsoft Access Applications to the Cloud

Learn how to create an Access web app, and move your database into the cloud. This practical book shows you how to design an Access web app for Microsoft Office 365, and convert existing Access desktop databases to a web app as well. You’ll quickly learn your way around the web app design environment, including how to capitalize on its strengths and avoid the pitfalls. You don’t need any special web skills to get started. Discover how to:

Make your desktop database compatible with web app table structures
Create tables, views, and queries
Customize the table selector and work with popup views to provide a navigation interface
Implement business rules using the Macro Programming Tools
Develop using Office 365 and SharePoint 2013
Use SQL Azure to investigate how your web app is structured
Design, test, and troubleshoot Data Macros
Understand how security links between a web app and Office 365
Deploy a public facing web app on your Office 365 public website