talk-data.com

Topic

data-engineering

3377

tagged

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books

Beginning Hibernate: For Hibernate 5

Get started with the Hibernate 5 persistence layer and gain a clear introduction to the current standard for object-relational persistence in Java. This updated edition includes the new Hibernate 5.0 framework as well as coverage of NoSQL, MongoDB, and other related technologies, ranging from applications to big data. Beginning Hibernate is ideal if you're experienced in Java with databases (the traditional, or connected, approach), but new to open-source, lightweight Hibernate. The book keeps its focus on Hibernate without wasting time on nonessential third-party tools, so you'll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger with Dave Minter and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. They present their material in a lively, example-based manner, not a dry, theoretical, hard-to-read fashion.

What You'll Learn

• Build enterprise Java-based transaction-type applications that access complex data with Hibernate
• Work with Hibernate 5 using a present-day build process
• Use Java 8 features with Hibernate
• Integrate into the persistence life cycle
• Map using Java's annotations
• Search and query with the new version of Hibernate
• Integrate with MongoDB using NoSQL
• Keep track of versioned data with Hibernate Envers

Who This Book Is For

Experienced Java developers interested in learning how to use and apply object-relational persistence in Java and who are new to the Hibernate persistence framework.

Stepping Away from the Silos

For over twenty years, digitisation has been a core element of the modern information landscape. The digital lifecycle is now well defined, and standards and good practice have been developed for most of its key stages. There remains, however, a widespread lack of coordination of digitisation initiatives, both within and across different sectors, and there are disparate approaches to selection criteria. The result is ‘silos’ of digitised content. Stepping away from the Silos examines the strategic context in the UK since the 1990s and its effect on collaboration and coordination of exemplar digitisation initiatives in higher education and related sectors. It identifies the principal criteria for content selection that are common to the international literature in this field. The outputs of the exemplar projects are examined in relation to these criteria. A range of common practices and patterns in content selection appears to have developed over time, forming a de facto strategy from which several areas of critical mass have emerged. The book discusses the potential to improve strategic collaboration and coordinated selection by building on such a platform, and considers planning options in the context of work on national digitisation strategies in the UK and internationally.

• Summarises the rise of publicly funded digitisation in the UK from the 1990s to date and identifies the need to improve coordination and content selection criteria
• Reviews the role of digitisation in government and organisational strategies from the 1990s to the present day
• Examines the strategic position of collaboration within and across different organisations
• Identifies common selection criteria and outlines the coverage of exemplar projects
• Discusses the apparent emergence of a de facto selection strategy and the potential for national strategic planning of digitised content based on existing outputs and improved collaboration

Programming Pig, 2nd Edition

For many organizations, Hadoop is the first step for dealing with massive amounts of data. The next step? Processing and analyzing datasets with the Apache Pig scripting platform. With Pig, you can batch-process data without having to create a full-fledged application, making it easy to experiment with new datasets. Updated with use cases and programming examples, this second edition is the ideal learning tool for new and experienced users alike. You’ll find comprehensive coverage of key features such as the Pig Latin scripting language and the Grunt shell. When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

• Delve into Pig’s data model, including scalar and complex data types
• Write Pig Latin scripts to sort, group, join, project, and filter your data
• Use Grunt to work with the Hadoop Distributed File System (HDFS)
• Build complex data processing pipelines with Pig’s macros and modularity features
• Embed Pig Latin in Python for iterative processing and other advanced tasks
• Use Pig with Apache Tez to build high-performance batch and interactive data processing applications
• Create your own load and store functions to handle data formats and storage mechanisms

Oracle R Enterprise: Harnessing the Power of R in Oracle Database

Master the Big Data Capabilities of Oracle R Enterprise

Effectively manage your enterprise’s big data and keep complex processes running smoothly using the hands-on information contained in this Oracle Press guide. Oracle R Enterprise: Harnessing the Power of R in Oracle Database shows, step-by-step, how to create and execute large-scale predictive analytics and maintain superior performance. Discover how to explore and prepare your data, accurately model business processes, generate sophisticated graphics, and write and deploy powerful scripts. You will also find out how to effectively incorporate Oracle R Enterprise features in APEX applications, OBIEE dashboards, and Apache Hadoop systems.

Learn to:

• Install, configure, and administer Oracle R Enterprise
• Establish connections and move data to the database
• Create Oracle R Enterprise packages and functions
• Use the R language to work with data in Oracle Database
• Build models using ODM, ORE, and other algorithms
• Develop and deploy R scripts and use the R script repository
• Execute embedded R scripts and employ ORE SQL API functions
• Map and manipulate data using Oracle R Advanced Analytics for Hadoop
• Use ORE in Oracle Data Miner, OBIEE, and other applications

EU General Data Protection Regulation (GDPR): An Implementation and Compliance Guide

An in-depth guide to the changes your organization needs to make to comply with the EU GDPR.

The EU General Data Protection Regulation (GDPR) will supersede the 1995 EU Data Protection Directive (DPD) and all EU member states’ national laws based on it – including the UK Data Protection Act 1998 – in May 2018.

All organizations – wherever they are in the world – that process the personally identifiable information (PII) of EU residents must comply with the Regulation. Failure to do so could result in fines of up to €20 million or 4% of annual global turnover.

US organizations that process EU residents’ personal data can comply with the GDPR via the EU-US Privacy Shield, which replaced the EU-US Safe Harbor framework in 2016. The Privacy Shield is based on the DPD, and will likely be updated once the GDPR is applied in May 2018.

This book provides a detailed commentary on the GDPR, explains the changes you need to make to your data protection and information security regimes, and tells you exactly what you need to do to avoid severe financial penalties.

Product overview

EU GDPR – An Implementation and Compliance Guide is a clear and comprehensive guide to this new data protection law, explaining the Regulation, and setting out the obligations of data processors and controllers in terms you can understand.

Topics covered include:

• The role of the data protection officer (DPO) – including whether you need one and what they should do.
• Risk management and data protection impact assessments (DPIAs), including how, when and why to conduct a DPIA.
• Data subjects’ rights, including consent and the withdrawal of consent; subject access requests and how to handle them; and data controllers’ and processors’ obligations.
• International data transfers to “third countries” – including guidance on adequacy decisions and appropriate safeguards; the EU-US Privacy Shield; international organizations; limited transfers; and Cloud providers.
• How to adjust your data protection processes to transition to GDPR compliance, and the best way of demonstrating that compliance.
• A full index of the Regulation to help you find the articles and stipulations relevant to your organization.

The GDPR will have a significant impact on organizational data protection regimes around the world. EU GDPR – An Implementation and Compliance Guide shows you exactly what you need to do to comply with the new law.

About the authors

IT Governance is a leading global provider of IT governance, risk management, and compliance expertise, and we pride ourselves on our ability to deliver a broad range of integrated, high-quality solutions that meet the real-world needs of our international client base.

Our privacy team – led by Alan Calder, Richard Campo, and Adrian Ross – has substantial experience in privacy, data protection, compliance, and information security. This experience, and our understanding of the background and drivers for the GDPR, are combined in this manual to provide the world’s first guide to implementing the new data protection regulation.

Spark in Action

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.

About the Technology

Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades.

About the Book

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code.

What's Inside

• Updated for Spark 2.0
• Real-life case studies
• Spark DevOps with Docker
• Examples in Scala, and online in Java and Python

About the Reader

Written for experienced programmers with some background in big data or machine learning.

About the Authors

Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community.

Quotes

Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide.
- Jonathan Sharley, Pandora Media

Must-have! Speed up your learning of Spark as a distributed computing framework.
- Robert Ormandi, Yahoo!

An easy-to-follow, step-by-step guide.
- Gaurav Bhardwaj, 3Pillar Global

An ambitiously comprehensive overview of Spark and its diverse ecosystem.
- Jonathan Miller, Optensity

IBM i 7.2 Technical Overview with Technology Refresh Updates

This IBM® Redbooks® publication provides a technical overview of the features, functions, and enhancements that are available in IBM i 7.2, including all the available Technology Refresh (TR) levels, from TR1 to TR3. This publication provides a summary and brief explanation of the many capabilities and functions in the operating system. It also describes many of the licensed programs and application development tools that are associated with IBM i. The information that is provided in this book is useful for clients, IBM Business Partners, and IBM service professionals that are involved with planning, supporting, upgrading, and implementing IBM i 7.2 solutions.

Introduction to Apache Flink

There’s growing interest in learning how to analyze streaming data in large-scale systems such as web traffic, financial transactions, machine logs, industrial sensors, and many others. But analyzing data streams at scale has been difficult to do well—until now. This practical book delivers a deep introduction to Apache Flink, a highly innovative open source stream processor with a surprising range of capabilities.

Learning IBM Bluemix

Learning IBM Bluemix provides a comprehensive introduction to developing and deploying applications with the IBM Bluemix cloud platform. By following detailed examples and guided exercises, you'll understand the full life cycle of cloud-based application development, from initial setup to scaling and security.

What this Book will help me do

• Understand the capabilities of IBM Bluemix as a Platform as a Service to build applications efficiently.
• Learn to develop and deploy applications using Cloud Foundry command line and Bluemix console.
• Explore microservices architecture and build scalable applications using Bluemix tools.
• Integrate on-premises systems with cloud-hosted applications on Bluemix.
• Develop mobile client applications with the support of Bluemix's Mobile services.

Author(s)

Sreelatha Sankaranarayanan is an experienced developer and cloud technology author, with extensive expertise in IBM Bluemix. Her passion for simplifying complex concepts is reflected in her engaging writing style, ensuring learners can master new skills effectively. She brings years of real-world experience in cloud computing and software development to her instructional materials.

Who is it for?

This book is tailored for developers aiming to transition to cloud-based application development using IBM Bluemix, with a focus on practical application. Readers should have foundational skills in Java and Node.js to fully benefit. Ideal for professionals looking to expand their capabilities with cloud infrastructure, or for those wanting to leverage microservices and cloud solutions in their applications.

Fast Data Processing with Spark 2 - Third Edition

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well-equipped to use Spark in real-world data processing tasks.

What this Book will help me do

• Install and configure Apache Spark for optimal performance.
• Interact with distributed datasets using the resilient distributed dataset (RDD) API.
• Leverage the flexibility of DataFrame API for efficient big data analytics.
• Apply machine learning models using Spark MLlib to solve complex problems.
• Perform graph analysis using GraphX to uncover structural insights in data.

Author(s)

Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Matei Zaharia, the co-author, is also widely recognized in the field of distributed systems and cloud computing, contributing to Apache Spark development.

Who is it for?

This book caters to software developers and data engineers with a foundational understanding of Scala or Java programming. A beginner to medium-level understanding of big data processing concepts is recommended for readers. If you are aspiring to solve big data problems using scalable distributed computing frameworks, this book is perfect for you. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.

Oracle Application Express Administration: For DBAs and Developers

Succeed in managing Oracle Application Express (APEX) environments. This book focuses on creating the right combination of scalability, high-availability, backup and recovery, integrity, and resource control. The book covers everything from simple to enterprise-class deployments, with emphasis on enterprise-level requirements and coverage of cloud and hybrid-cloud scenarios.

Many books cover how to develop applications in Oracle APEX. It's a tool with a fast-growing user-base as developers come to know how quick and easy it is to create new applications that run in a browser. However, just getting an application off the ground is only a small part of a bigger picture. Applications must be supported. They must be available when users need them. They must be robust against disaster and secure against malicious attack. These are the issues addressed in Oracle Application Express Administration. These are the issues that, when tackled successfully, lead to long-term success in using Oracle APEX as a rapid application-development toolset.

Readers of this book learn how to install the Oracle APEX engine in support of small-scale projects such as at the departmental level, and in support of enterprise-level projects accessed by thousands of users across dozens of time zones. Readers learn to take advantage of Oracle Database's underlying feature set with regard to application scalability and performance, integrity, security, high-availability, and robustness against failure and data loss. Oracle Application Express Administration also describes different cloud solutions, integration with Oracle E-Business Suite, and helps in taking advantage of multitenancy in Oracle Database 12c and beyond.

• Covers important enterprise considerations such as scalability, robustness, high-availability
• Describes cloud-based application deployment scenarios
• Focuses on creating the right deployment environment for long-term success

What You Will Learn

• Install, upgrade, and configure robust APEX environments
• Back up and recover APEX applications and their data
• Monitor and tune the APEX engine and its applications
• Benefit from new administration features in APEX 5.0
• Run under multi-tenant architecture in Oracle Database 12c
• Manage the use of scarce resources with Resource Manager
• Secure your data with advanced security features
• Build high-availability into your APEX deployments
• Integrate APEX with Oracle E-Business Suite

Who This Book Is For

Architects, administrators, and developers who want to better understand how APEX works in a corporate environment. Readers will use this book to design deployment architectures around Oracle Database strengths like multi-tenancy, resource management, and high availability. The book is also useful to administrators responsible for installation and upgrade, backup and recovery, and the ongoing monitoring of the APEX engine and the applications built upon it.

Deliver Modern UI for IBM BPM with the Coach Framework and Other Approaches

IBM® Coach Framework is a key component of the IBM Business Process Manager (BPM) platform that enables custom user interfaces to be easily embedded within business process solutions. Developer tools enable process authors to rapidly create a compelling user interface (UI) that can be delivered to desktop and mobile devices. IBM Process Portal, used by business operations to access, execute, and manage tasks, is entirely coach-based and can easily be configured and styled. A corporate look and feel can be defined using a graphical theme editor and applied consistently across all process applications. The process federation capability enables business users to access and execute all their tasks using a single UI without being aware of the implementation or origin. Using Coach Framework, you can embed coach-based UI in other web applications, develop BPM UI using alternative UI technology, and create mobile applications for off-line working. This IBM Redbooks® publication explains how to fully benefit from the power of the Coach Framework. It focuses on the capabilities that Coach Framework delivers with IBM BPM version 8.5.7. The content of this document, though, is also pertinent to future versions of the application.

In-Place Analytics with Live Enterprise Data with IBM DB2 Query Management Facility

IBM® DB2® Query Management Facility™ for z/OS® provides a zero-footprint, mobile-enabled, highly secure business analytics solution. IBM QMF™ V11.2.1 offers many significant new features and functions in keeping with the ongoing effort to broaden its usage and value to a wider set of users and business areas. In this IBM Redbooks® publication, we explore several of the new features and options that are available within this new release. This publication introduces TSO enhancements for QMF Analytics for TSO and QMF Enhanced Editor. A chapter describes how the QMF Data Service component connects to multiple mainframe data sources to accomplish the consolidation and delivery of data. This publication describes how self-service business intelligence can be achieved by using QMF Vision to enable self-service dashboards and data exploration. A chapter is dedicated to JavaScript support, demonstrating how application developers can use JavaScript to extend the capabilities of QMF. Additionally, this book describes methods to take advantage of caching for reduced CPU consumption, wider access to information, and faster performance. This publication is of interest to anyone who wants to better understand how QMF can enable in-place analytics with live enterprise data.

Securing SQL Server: DBAs Defending the Database

Protect your data from attack by using SQL Server technologies to implement a defense-in-depth strategy, performing threat analysis, and encrypting sensitive data as a last line of defense against compromise. The multi-layered approach in this book helps ensure that a single breach doesn't lead to loss or compromise of your data that is confidential and important to the business.

Database professionals in today's world deal increasingly often with repeated data attacks against high-profile organizations and sensitive data. It is more important than ever to keep your company's data secure. Securing SQL Server demonstrates how administrators and developers can both play their part in the protection of a SQL Server environment.

This book provides a comprehensive technical guide to the security model, and to encryption within SQL Server, including coverage of the latest security technologies such as Always Encrypted, Dynamic Data Masking, and Row Level Security. Most importantly, the book gives practical advice and engaging examples on how to defend your data -- and ultimately your job! -- against attack and compromise.

• Covers the latest security technologies, including Always Encrypted, Dynamic Data Masking, and Row Level Security
• Promotes security best-practice and strategies for defense-in-depth of business-critical database assets
• Gives advice on performing threat analysis and reducing the attack surface that your database presents to the outside world

What You Will Learn

• Perform threat analysis
• Implement access level control and data encryption
• Avoid non-reputability by implementing comprehensive auditing
• Use security metadata to ensure your security policies are enforced
• Apply the latest SQL Server technologies to increase data security
• Mitigate the risk of credentials being stolen

Who This Book Is For

SQL Server database administrators who need to understand and counteract the threat of attacks against their company's data. The book is also of interest to database administrators of other platforms, as several of the attack techniques are easily generalized beyond SQL Server and to other database brands.

Securing Your Cloud: IBM z/VM Security for IBM z Systems and LinuxONE

As workloads are being offloaded to IBM® z Systems™ based cloud environments, it is important to ensure that these workloads and environments are secure. This IBM Redbooks® publication describes the necessary steps to secure your environment for all of the components that are involved in a z Systems cloud infrastructure that uses IBM z/VM® and Linux on z Systems. The audience for this book is IT architects and those planning to use z Systems for their cloud environments.

A Practical Guide to ICF Catalogs

This IBM® Redbooks® publication gives a broad understanding of integrated catalog facility (ICF) catalog environments. It includes suggestions for design, planning, and deployment tasks to help you create and maintain a balanced and efficient catalog environment. Four scenarios are provided to illustrate sample implementations of typical activities that are associated with an organization’s requirements. Chapter 5, “Record-level sharing support for ICF catalogs” describes Record Level Sharing (RLS) for Catalogs and shows the results of our tests in a controlled laboratory environment. This version of the book is set at the IBM z/OS V2R2 level. This publication is for readers who want to gain an understanding of ICF catalogs and the considerations and practices that surround an ICF catalog environment deployment.

VersaStack Solution by Cisco and IBM with Oracle RAC, IBM FlashSystem V9000, and IBM Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure that is easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM FlashSystem® V9000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. 
This IBM Redbooks® publication is aimed at experienced storage administrators who are tasked with deploying a VersaStack solution with Oracle Real Application Clusters (RAC) and IBM Spectrum™ Protect.

Fast Data Architectures for Streaming Applications

Why have stream-oriented data systems become so popular, when batch-oriented systems have served big data needs for many years? In this report, author Dean Wampler examines the rise of streaming systems for handling time-sensitive problems, such as detecting fraudulent financial activity as it happens. You’ll explore the characteristics of fast data architectures, along with several open source tools for implementing them. Batch-mode processing isn’t going away, but exclusive use of these systems is now a competitive disadvantage. You’ll learn that, while fast data architectures are much harder to build, they represent the state of the art for dealing with mountains of data that require immediate attention.

• Learn step-by-step how a basic fast data architecture works
• Understand why event logs are the core abstraction for streaming architectures, while message queues are the core integration tool
• Use methods for analyzing infinite data sets, where you don’t have all the data and never will
• Take a tour of open source streaming engines, and discover which ones work best for different use cases
• Get recommendations for making real-world streaming systems responsive, resilient, elastic, and message driven
• Explore an example streaming application for the IoT: telemetry ingestion and anomaly detection for home automation systems

Microsoft SQL Server 2016: A Beginner's Guide, Sixth Edition, 6th Edition

Up-to-date Microsoft SQL Server 2016 skills made easy!

Get up and running on Microsoft SQL Server 2016 in no time with help from this thoroughly revised, practical resource. The book offers thorough coverage of SQL management and development and features full details on the newest business intelligence, reporting, and security features. Filled with new real-world examples and hands-on exercises, Microsoft SQL Server 2016: A Beginner's Guide, Sixth Edition, starts by explaining fundamental relational database system concepts. From there, you will learn how to write Transact-SQL statements, execute simple and complex database queries, handle system administration and security, and use the powerful analysis and BI tools. XML, spatial data, and full-text search are also covered in this step-by-step tutorial.

• Revised from the ground up to cover the latest version of SQL Server
• Ideal both as a self-study guide and a classroom textbook
• Written by a prominent professor and best-selling author

Implementing the IBM Storwize V5000 Gen2 (including the Storwize V5010, V5020, and V5030)

Organizations of all sizes face the challenge of managing massive volumes of increasingly valuable data. But storing this data can be costly, and extracting value from the data is becoming more difficult. IT organizations have limited resources but must stay responsive to dynamic environments and act quickly to consolidate, simplify, and optimize their IT infrastructures. The IBM® Storwize® V5000 Gen2 system provides a smarter solution that is affordable, easy to use, and self-optimizing, which enables organizations to overcome these storage challenges. The Storwize V5000 Gen2 delivers efficient, entry-level configurations that are designed to meet the needs of small and midsize businesses. Designed to provide organizations with the ability to consolidate and share data at an affordable price, the Storwize V5000 Gen2 offers advanced software capabilities that are found in more expensive systems. This IBM Redbooks® publication is intended for pre-sales and post-sales technical support professionals and storage administrators. It applies to the Storwize V5030, V5020, and V5010. The concepts in this book also relate to the IBM Storwize V3700 where applicable.