talk-data.com talk-data.com

Topic

data-engineering

3395

tagged

Activity Trend

1 peak/qtr
2020-Q1 2026-Q1

Activities

3395 activities · Newest first

Mobile Security and Privacy

Mobile Security and Privacy: Advances, Challenges and Future Research Directions provides the first truly holistic view of leading edge mobile security research from Dr. Man Ho Au and Dr. Raymond Choo—leading researchers in mobile security. Mobile devices and apps have become part of everyday life in both developed and developing countries. As with most evolving technologies, mobile devices and mobile apps can be used for criminal exploitation. Along with the increased use of mobile devices and apps to access and store sensitive, personally identifiable information (PII) has come an increasing need for the community to have a better understanding of the associated security and privacy risks. Drawing upon the expertise of world-renowned researchers and experts, this volume comprehensively discusses a range of mobile security and privacy topics from research, applied, and international perspectives, while aligning technical security implementations with the most recent developments in government, legal, and international environments. The book does not focus on vendor-specific solutions, instead providing a complete presentation of forward-looking research in all areas of mobile security. The book will enable practitioners to learn about upcoming trends, scientists to share new directions in research, and government and industry decision-makers to prepare for major strategic decisions regarding implementation of mobile technology security and privacy. In addition to the state-of-the-art research advances, this book also discusses prospective future research topics and open challenges. Presents the most current and leading edge research on mobile security and privacy, featuring a panel of top experts in the field Provides a strategic and international overview of the security issues surrounding mobile technologies Covers key technical topics and provides readers with a complete understanding of the most current research findings along with future research directions and challenges Enables practitioners to learn about upcoming trends, scientists to share new directions in research, and government and industry decision-makers to prepare for major strategic decisions regarding the implementation of mobile technology security and privacy initiatives

SAP Flexible Real Estate Management

Learn SAP's real estate management integrated solution to effectively manage the real estate portfolio at your organization. You will configure SAP REFX for business scenarios covering solutions from master data to financial posting and reporting. You will address all phases of the real estate life cycle, including real estate acquisition or disposal, portfolio management, and property and technical management. To succeed in today's global and highly competitive economy, asset optimization in real estate management has become a strategic task. Organizations need to ensure insight into their property portfolio to make informed decisions, improve portfolio performance, and reduce compliance costs. Sophisticated solutions are needed to manage changing consumer demands and the global workforce as well as information management, compliance adherence, and leasing and property management. SAP Flexible Real Estate Management by Daithankar is a full-featured book that integrates REFX with Controlling (CO), Plant and Maintenance (PM), CRM, SAP AA (ssset accounting), and SAP PS (project systems). You will refer to real-world, practical examples to illustrate configuration concepts and processes, and learn in an interactive, hands-on way through the use of screenshots, menu paths, and transaction codes throughout the book. What You Will Learn: Understand the SAP REFX Solutions landscape and industry best practices for SAP REFX implementation Configure SAP REFX Integrate REFX with other modules Understand how processes are supported by SAP REFX Who This Book Is For: CIOs/CEOs of organizations with real estate portfolios, SAP REFX purchasing decision makers, SAP REFX pre-sales teams, SAP REFX implementation/AMS consultants

Data Hiding Techniques in Windows OS

"This unique book delves down into the capabilities of hiding and obscuring data object within the Windows Operating System. However, one of the most noticeable and credible features of this publication is, it takes the reader from the very basics and background of data hiding techniques, and run’s on the reading-road to arrive at some of the more complex methodologies employed for concealing data object from the human eye and/or the investigation. As a practitioner in the Digital Age, I can see this book siting on the shelves of Cyber Security Professionals, and those working in the world of Digital Forensics – it is a recommended read, and is in my opinion a very valuable asset to those who are interested in the landscape of unknown unknowns. This is a book which may well help to discover more about that which is not in immediate view of the onlooker, and open up the mind to expand its imagination beyond its accepted limitations of known knowns." - John Walker, CSIRT/SOC/Cyber Threat Intelligence Specialist Featured in Digital Forensics Magazine, February 2017 In the digital world, the need to protect online communications increase as the technology behind it evolves. There are many techniques currently available to encrypt and secure our communication channels. Data hiding techniques can take data confidentiality to a new level as we can hide our secret messages in ordinary, honest-looking data files. Steganography is the science of hiding data. It has several categorizations, and each type has its own techniques in hiding. Steganography has played a vital role in secret communication during wars since the dawn of history. In recent days, few computer users successfully manage to exploit their Windows® machine to conceal their private data. Businesses also have deep concerns about misusing data hiding techniques. Many employers are amazed at how easily their valuable information can get out of their company walls. In many legal cases a disgruntled employee would successfully steal company private data despite all security measures implemented using simple digital hiding techniques. Human right activists who live in countries controlled by oppressive regimes need ways to smuggle their online communications without attracting surveillance monitoring systems, continuously scan in/out internet traffic for interesting keywords and other artifacts. The same applies to journalists and whistleblowers all over the world. Computer forensic investigators, law enforcements officers, intelligence services and IT security professionals need a guide to tell them where criminals can conceal their data in Windows® OS & multimedia files and how they can discover concealed data quickly and retrieve it in a forensic way. Data Hiding Techniques in Windows OS is a response to all these concerns. Data hiding topics are usually approached in most books using an academic method, with long math equations about how each hiding technique algorithm works behind the scene, and are usually targeted at people who work in the academic arenas. This book teaches professionals and end users alike how they can hide their data and discover the hidden ones using a variety of ways under the most commonly used operating system on earth, Windows®.

IBM DS8880 Architecture and Implementation (Release 8.1)

This edition applies to the DS8880 with DS8000 Release 8.1 Modification1 code. This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8880 family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8880 systems. The IBM DS8000® family is a high-performance, high-capacity, highly secure, and resilient series of disk storage systems. The DS8880 family is the latest and most advanced of the DS8000 offerings to date. The high availability, multiplatform support, including IBM z Systems™, and simplified management tools help provide a cost-effective path to an on-demand world. The IBM DS8880 family includes the new high-performance all-flash model (DS8888 Model 982, with its associated DS8888 Expansion Unit Model 98F), and two high-performance hybrid models (DS8884 Model 980 with its associated DS8884 Expansion Unit Model 98B, and DS8886 Model 981 with its associated DS8886 Expansion Unit Model 98E). Two powerful IBM POWER8® processor-based servers manage the cache to streamline disk I/Os, maximizing performance and throughput. These capabilities are further enhanced with the availability of high-performance flash enclosures (HPFEs). A major change with the introduction of the DS8880 is the reduction of the footprint to a 19-inch rack. Like its predecessors, the DS8880 supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. All disk drives in the DS8880 storage system include the Full Disk Encryption (FDE) feature. The DS8880 can automatically optimize the use of each storage tier, particularly flash drives and flash cards, through the IBM Easy Tier® feature. The DS8880 now includes the Copy Services Manager code and allows for easier integration in a Lightweight Directory Access Protocol (LDAP) infrastructure.

Beginning SQL Server Reporting Services

Learn SQL Server Reporting Services and become current with the 2016 edition. Develop interactive, dynamic reports that combine graphs, charts, and tabular data into attractive dashboards and reports to delight business analysts and other users of corporate data. Deliver mobile reports to anywhere and any device. Build vital knowledge of Reporting Services at a time when Microsoft's dominance in business intelligence is on the rise. turns novices into skilled report developers. The book begins by explaining how to set up the development environment. It then walks you through creating your first reports using the built-in wizard. After showing what is possible, the book breaks down and explains the skills needed to create reports from scratch. And not just reports! But also dashboards with charts, graphs, and maps. Each chapter builds on knowledge gained in the previous chapters with step-by-step tutorials. Beginning SQL Server Reporting Services boosts your skills and provides you additional career options. Don't be without those options. Grab and read this book today. Beginning SQL Server Reporting Services Build reports with and without the built-in wizard. Build interactive features such as drill-through reports. Build dashboards full of charts, graphs, and maps. Build mobile reports. What You Will Learn Set up your development environment. Organized projects and share components among reports. Create report using a wizard. Create reports from scratch, including grouping levels, parameters, and drill through features. Build interactive dashboard with graphs, charts, and maps. Deploy and manage reports for use by others in the business. Who This Book Is For Database professionals of all experience levels who have some experience in databases and want to make the leap into business intelligence reporting. The book is an excellent choice for those needing to add Reporting Services to their current list of skills, or who are looking for a skill set that is in demand for in order to break into IT.

Hadoop: Data Processing and Modelling

Unlock the power of your data with Hadoop 2.X ecosystem and its data warehousing techniques across large data sets About This Book Conquer the mountain of data using Hadoop 2.X tools The authors succeed in creating a context for Hadoop and its ecosystem Hands-on examples and recipes giving the bigger picture and helping you to master Hadoop 2.X data processing platforms Overcome the challenging data processing problems using this exhaustive course with Hadoop 2.X Who This Book Is For This course is for Java developers, who know scripting, wanting a career shift to Hadoop - Big Data segment of the IT industry. So if you are a novice in Hadoop or an expert, this book will make you reach the most advanced level in Hadoop 2.X. What You Will Learn Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer Installing and maintaining Hadoop 2.X cluster and its ecosystem Advanced Data Analysis using the Hive, Pig, and Map Reduce programs Machine learning principles with libraries such as Mahout and Batch and Stream data processing using Apache Spark Understand the changes involved in the process in the move from Hadoop 1.0 to Hadoop 2.0 Dive into YARN and Storm and use YARN to integrate Storm with Hadoop Deploy Hadoop on Amazon Elastic MapReduce and Discover HDFS replacements and learn about HDFS Federation In Detail As Marc Andreessen has said "Data is eating the world," which can be witnessed today being the age of Big Data, businesses are producing data in huge volumes every day and this rise in tide of data need to be organized and analyzed in a more secured way. With proper and effective use of Hadoop, you can build new-improved models, and based on that you will be able to make the right decisions. The first module, Hadoop beginners Guide will walk you through on understanding Hadoop with very detailed instructions and how to go about using it. Commands are explained using sections called "What just happened" for more clarity and understanding. The second module, Hadoop Real World Solutions Cookbook, 2nd edition, is an essential tutorial to effectively implement a big data warehouse in your business, where you get detailed practices on the latest technologies such as YARN and Spark. Big data has become a key basis of competition and the new waves of productivity growth. Hence, once you get familiar with the basics and implement the end-to-end big data use cases, you will start exploring the third module, Mastering Hadoop. So, now the question is if you need to broaden your Hadoop skill set to the next level after you nail the basics and the advance concepts, then this course is indispensable. When you finish this course, you will be able to tackle the real-world scenarios and become a big data expert using the tools and the knowledge based on the various step-by-step tutorials and recipes. Style and approach This course has covered everything right from the basic concepts of Hadoop till you master the advance mechanisms to become a big data expert. The goal here is to help you learn the basic essentials using the step-by-step tutorials and from there moving toward the recipes with various real-world solutions for you. It covers all the important aspects of Hadoop from system designing and configuring Hadoop, machine learning principles with various libraries with chapters illustrated with code fragments and schematic diagrams. This is a compendious course to explore Hadoop from the basics to the most advanced techniques available in Hadoop 2.X.

Cassandra 3.x High Availability - Second Edition

Cassandra 3.x High Availability is an in-depth guide to mastering the high availability features of Apache Cassandra. This book takes you through its architecture, implementing solutions to achieve zero downtime, and configuring clusters for fault tolerance and scalability. With practical examples and tips, it is a go-to resource for designing robust Cassandra-powered systems. What this Book will help me do Understand the architecture of Apache Cassandra and its high availability mechanisms. Master replication and tunable consistency levels for optimal data distribution. Learn to scale out your Cassandra deployments with multiple data centers. Acquire skills in creating efficient and scalable data models for fault-tolerant systems. Prevent system failures by avoiding anti-patterns and managing graceful failover scenarios. Author(s) None Strickland has extensive experience working as a developer and architect with distributed database systems. Specializing in Apache Cassandra, Strickland focuses on designing systems with high availability, scalability, and fault tolerance. Their practical teaching style ensures readers gain actionable knowledge to build robust database solutions. Who is it for? This book is ideal for developers and DevOps engineers familiar with Cassandra basics who wish to deepen their expertise. If your goal is to build highly available and fault-tolerant systems, this book will guide you step by step. It suits professionals managing data-intensive applications and looking to optimize their database strategy using Cassandra.

Practical Hive: A Guide to Hadoop's Data Warehouse System

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data. What You Will Learn Install and configure Hive for new and existing datasets Perform DDL operations Execute efficient DML operations Use tables, partitions, buckets, and user-defined functions Discover performance tuning tips and Hive best practices Who This Book Is For Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.

Big Data War

This book mainly focuses on why data analytics fails in business. It provides an objective analysis and root causes of the phenomenon, instead of abstract criticism of utility of data analytics. The author, then, explains in detail on how companies can survive and win the global big data competition, based on actual cases of companies. Having established the execution and performance-oriented big data methodology based on over 10 years of experience in the field as an authority in big data strategy, the author identifies core principles of data analytics using case analysis of failures and successes of actual companies. Moreover, he endeavors to share with readers the principles regarding how innovative global companies became successful through utilization of big data. This book is a quintessential big data analytics, in which the author’s knowhow from direct and indirect experiences is condensed. How do we survive at this big data war in which Facebook in SNS, Amazon in e-commerce, Google in search, expand their platforms to other areas based on their respective distinct markets? The answer can be found in this book. 

IBM Spectrum Archive Enterprise Edition V1.2.1: Installation and Configuration Guide

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive (formerly IBM Linear Tape File System™ (LTFS)) Enterprise Edition (EE) V1.2.1.0 for the IBM TS3310, IBM TS3500, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of the LTFS for the policy management of tape as a storage tier in a IBM Spectrum Scale™ (formerly IBM General Parallel File System (IBM GPFS™)) based environment and helps encourage the use of tape as a critical tier in the storage environment. This is the second edition of IBM Spectrum Archive V1.2 (SG24-8333-00) although it is based on the prior editions of IBM Linear Tape File System Enterprise Edition V1.1.1.2: Installation and Configuration Guide, SG24-8143. IBM Spectrum Archive EE can run any application that is designed for disk files on a physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 7, 6, and 5 tape drives in IBM TS3310, TS3500, and TS4500 tape libraries. Also, IBM TS1140 and IBM TS1150 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in Tier 2 and Tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

IBM Data Engine for Hadoop and Spark

This IBM® Redbooks® publication provides topics to help the technical community take advantage of the resilience, scalability, and performance of the IBM Power Systems™ platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution for analytics solutions to access, manage, and analyze data sets to improve business outcomes. This book documents topics to demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help solve customer's data analytic workload requirements. This book describes how to plan, prepare, install, integrate, manage, and show how to use the IBM Data Engine for Hadoop and Spark solution to run analytic workloads on IBM POWER8. In addition, this publication delivers documentation to complement available IBM analytics solutions to help your data analytic needs. This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) that are responsible for delivering analytics solutions and support on IBM Power Systems.

Pro Oracle GoldenGate for the DBA

Take a simple approach to learning the Oracle GoldenGate product. This approach provides the in-depth perspective of GoldenGate from an implementer's viewpoint; however, also addresses why the management viewpoint is important as well. Your journey through this book includes and architecture discussion of GoldenGate and the benefits of purchasing GoldenGate from a management perspective. Then the book quickly moves into advanced implementation components associated with GoldenGate. You'll find many use-cases and instructions throughout the book to help with everything from easy to complex GoldenGate implementations. An Oracle GoldenGate implementation generally consists of a group project, involving both business and technical resources. provides the viewpoint from the DBA's vantage point. This approach provides the components of who, what, why, when, and how in defining the implementation and support of a GoldenGate project. The success of most technical projects require the support of multiple resource groups, and Pro Oracle GoldenGate for the DBA supplies the insight for the DBA member to understand the implementation and support process. Pro Oracle GoldenGate for the DBA Takes you through justification, installation, and support. Provides the DBA perspective toward a successful a result. Covers from basic toward increasingly advanced implementations What You Will Learn Understand the core architecture of data replication using Oracle GoldenGate Implement a one-way setup of a classic capture and an integrated capture and replication Design, architect and implement a multi-master replication model Replicate unsupported data types using tokens Manage and troubleshoot multiple GoldenGate implementations New features of GoldenGate supported in Oracle 12c Who this Book is For Pro Oracle GoldenGate for the DBA is aimed squarely at Oracle database administrators who find themselves involved in GoldenGate integration projects. The book provides the DBA view into such projects, helping database administrators toward successful implementations and solid business results.

Real World SQL and PL/SQL: Advice from the Experts

Master the Underutilized Advanced Features of SQL and PL/SQL This hands-on guide from Oracle Press shows how to fully exploit lesser known but extremely useful SQL and PL/SQL features―and how to effectively use both languages together. Written by a team of Oracle ACE Directors, Real-World SQL and PL/SQL: Advice from the Experts features best practices, detailed examples, and insider tips that clearly demonstrate how to write, troubleshoot, and implement code for a wide variety of practical applications. The book thoroughly explains underutilized SQL and PL/SQL functions and lays out essential development strategies. Data modeling, advanced analytics, database security, secure coding, and administration are covered in complete detail. Learn how to: • Apply advanced SQL and PL/SQL tools and techniques • Understand SQL and PL/SQL functionality and determine when to use which language • Develop accurate data models and implement business logic • Run PL/SQL in SQL and integrate complex datasets • Handle PL/SQL instrumenting and profiling • Use Oracle Advanced Analytics and Oracle R Enterprise • Build and execute predictive queries • Secure your data using encryption, hashing, redaction, and masking • Defend against SQL injection and other code-based attacks • Work with Oracle Virtual Private Database Code examples in the book are available for download at www.MHProfessional.com. TAG: For a complete list of Oracle Press titles, visit www.OraclePressBooks.com

IBM TS4500 R3 Tape Library Guide

The IBM® TS4500 tape library is a next-generation tape solution that offers higher storage density and integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth require, with the cost-effectiveness and the manageability to grow with business data needs, while you preserve existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot because the TS4500 can store up to 5.5 petabytes (PBs) of data in a single 10-square foot library frame, which is up to 3.4 times more capacity than the IBM TS3500 tape library. The TS4500 offers these benefits: High availability dual active accessors with integrated service bays to reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space. Flexibility to grow: The TS4500 library can grow from both the right side and the left side of the first L frame because models can be placed in any active position. Increased capacity: The TS4500 can grow from a single L frame up to an additional 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the existing TS3500 library can be redeployed in a TS4500. Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations. Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library. Support for the IBM TS1150 tape drive: The TS1150 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1150 offers high-performance, flexible data storage with support for data encryption. Also, this fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. Support of the IBM Linear Tape-Open (LTO) Ultrium 7 tape drive: The LTO Ultrium 7 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 6, while they still protect your investment in the previous technology. Integrated TS7700 back-end Fibre Channel (FC) switches are available. Up to four library-managed encryption (LME) key paths per logical library are available. This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), and command-line interface (CLI). You learn how to accomplish several specific tasks: Improve storage density with increased expansion frame capacity up to 2.4 times and support 33% more tape drives per frame. Manage storage by using the ALMS feature. Improve business continuity and disaster recovery with dual active accessor, automatic control path failover, and data path failover. Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media. Support IBM LTO Ultrium 7, 6, and 5, IBM TS1150, and TS1140 tape drives. Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow. Reduce the storage footprint and simplify cabling with 10 U of rack space on top of the library. This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Sams Teach Yourself Apache Spark™ in 24 Hours

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data. Learn how to • Discover what Apache Spark does and how it fits into the Big Data landscape • Deploy and run Spark locally or in the cloud • Interact with Spark from the shell • Make the most of the Spark Cluster Architecture • Develop Spark applications with Scala and functional Python • Program with the Spark API, including transformations and actions • Apply practical data engineering/analysis approaches designed for Spark • Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output • Optimize Spark solution performance • Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra) • Leverage cutting-edge functional programming techniques • Extend Spark with streaming, R, and Sparkling Water • Start building Spark-based machine learning and graph-processing applications • Explore advanced messaging technologies, including Kafka • Preview and prepare for Spark’s next generation of innovations Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.

Architecting for Access

Fragmented, disparate backend data systems have become the norm in today’s enterprise, where you’ll find a mix of relational databases, Hadoop stores, and NoSQL engines, with access and analytics tools bolted on every which way. This mishmash of options presents a real challenge when it comes to choosing frontend analytics and visualization tools. How did we get here? In this O’Reilly report, IT veteran Rich Morrow takes you through the rapid changes to both backend storage and frontend analytics over the past decade, and provides a pragmatic list of requirements for an analytics stack that will centralize access to all of these data systems. You’ll examine current analytics platforms, including Looker—a new breed of analytics and visualization tools built specifically to handle our fragmented data space. Understand why and how data became so fractured so quickly Explore the tangled web of data and backend tools in today’s enterprises Learn the tool requirements for accessing and analyzing the full spectrum of data Examine the relative strengths of popular analytics and visualization tools, including Looker, Tableau, and MicroStrategy Inspect Looker’s unique focus on both the frontend and backend

In Search of Database Nirvana

The database pendulum is in full swing. Ten years ago, web-scale companies began moving away from proprietary relational databases to handle big data use cases with NoSQL and Hadoop. Now, for a variety of reasons, the pendulum is swinging back toward SQL-based solutions. What many companies really want is a system that can handle all of their operational, OLTP, BI, and analytic workloads. Could such an all-in-one database exist? This O’Reilly report examines this quest for database nirvana, or what Gartner recently dubbed Hybrid Transaction/Analytical Processing (HTAP). Author Rohit Jain takes an in-depth look at the possibilities and the challenges for companies that long for a single query engine to rule them all. With this report, you’ll explore: The challenges of having one query engine support operational, BI, and analytical workloads Efforts to produce a query engine that supports multiple storage engines Attempts to support multiple data models with the same query engine Why an HTAP database engine needs to provide enterprise-caliber capabilities, including high availability, security, and manageability How to assess various options for meeting workload requirements with one database engine, or a combination of query and storage engines

Interactive Spark using PySpark

Apache Spark is an in-memory framework that allows data scientists to explore and interact with big data much more quickly than with Hadoop. Python users can work with Spark using an interactive shell called PySpark. Why is it important? PySpark makes the large-scale data processing capabilities of Apache Spark accessible to data scientists who are more familiar with Python than Scala or Java. This also allows for reuse of a wide variety of Python libraries for machine learning, data visualization, numerical analysis, etc. What you'll learn—and how you can apply it Compare the different components provided by Spark, and what use cases they fit. Learn how to use RDDs (resilient distributed datasets) with PySpark. Write Spark applications in Python and submit them to the cluster as Spark jobs. Get an introduction to the Spark computing framework. Apply this approach to a worked example to determine the most frequent airline delays in a specific month and year. This lesson is for you because… You're a data scientist, familiar with Python coding, who needs to get up and running with PySpark You're a Python developer who needs to leverage the distributed computing resources available on a Hadoop cluster, without learning Java or Scala first Prerequisites Familiarity with writing Python applications Some familiarity with bash command-line operations Basic understanding of how to use simple functional programming constructs in Python, such as closures, lambdas, maps, etc. Materials or downloads needed in advance Apache Spark This lesson is taken from by Jenny Kim and Benjamin Bengfort. Data Analytics with Hadoop

IBM Tape Library Guide for Open Systems

This IBM® Redbooks® publication presents a general introduction to the latest IBM tape and tape library technologies. Featured tape technologies include the IBM LTO Ultrium and Enterprise 3592 tape drives, and their implementation in IBM tape libraries. This 13th edition includes information about the latest enhancements to the IBM TS4500 enterprise tape library. In particular, it includes details about the latest TS4500 High Availability feature and its elastic capacity option. This book also provides details about the new TS7650G IBM ProtecTIER® gateway model DD6, contains technical information about each IBM tape product for open systems, and includes generalized sections about Small Computer System Interface (SCSI) and Fibre Channel connections and multipath architecture configurations. This book also covers tools and techniques for library management. It is intended for anyone who wants to understand more about IBM tape products and their implementation. It is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists. If you do not have a background in computer tape storage products, you might need to read other sources of information. In the interest of being concise, topics that are generally understood are not covered in detail.