talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

395

Collection of O'Reilly books on Data Engineering.

Filtering by: Analytics

Sessions & talks

Showing 201–225 of 395 · Newest first

IBM Power System AC922 Introduction and Technical Overview

This IBM® Redpaper™ publication is a comprehensive guide to the IBM Power System AC922 server (8335-GTG and 8335-GTW models). The Power AC922 server is the next generation of IBM Power processor-based systems, designed for deep learning and artificial intelligence (AI), high-performance analytics, and high-performance computing (HPC). This paper introduces the major innovative Power AC922 server features and their relevant functions:

- Powerful IBM POWER9™ processors: 16 cores at 2.6 GHz with 3.09 GHz turbo performance, or 20 cores at 2.0 GHz with 2.87 GHz turbo, for the 8335-GTG; 18 cores at 2.98 GHz with 3.26 GHz turbo performance, or 22 cores at 2.78 GHz with 3.07 GHz turbo, for the 8335-GTW
- IBM Coherent Accelerator Processor Interface (CAPI) 2.0, IBM OpenCAPI™, and second-generation NVIDIA NVLink technology for exceptional processor-to-accelerator intercommunication
- Up to six dedicated NVIDIA Tesla V100 GPUs

This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products and is intended for the following audiences:

- Clients
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners
- Independent software vendors (ISVs)

This paper expands the set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power AC922 server. It does not replace the current marketing materials and configuration tools; it is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

SQL Server 2017 Developer's Guide

"SQL Server 2017 Developer's Guide" provides a comprehensive approach to learning and utilizing the new features introduced in SQL Server 2017. From advanced Transact-SQL to integrating R and Python into your database projects, this book equips you with the knowledge to design and develop efficient database applications tailored to modern requirements.

What this Book will help me do
- Master new features in SQL Server 2017 to enhance database application development.
- Implement In-Memory OLTP and columnstore indexes for optimal performance.
- Utilize JSON support in SQL Server to integrate modern data formats.
- Leverage R and Python integration to apply advanced data analytics and machine learning.
- Learn Linux and container deployment options to expand SQL Server usage scenarios.

Author(s)
The authors of "SQL Server 2017 Developer's Guide" are industry veterans with extensive experience in database design, business intelligence, and advanced analytics. They bring a practical, hands-on writing style that helps developers apply theoretical concepts effectively. Their commitment to teaching is evident in the clear and detailed guidance provided throughout the book.

Who is it for?
This book is ideal for database developers and solution architects aiming to build robust database applications with SQL Server 2017. It's also a valuable resource for business intelligence developers or analysts seeking to harness SQL Server 2017's advanced features. Some familiarity with SQL Server and T-SQL is recommended to fully leverage the insights provided by this book.

Teradata Cookbook

Are you ready to master Teradata, one of the leading relational database management systems for data warehousing? In the "Teradata Cookbook," you will find over 85 recipes covering vital tasks like querying, performance tuning, and administrative operations. With clear and practical instructions, this book will equip you with the skills necessary to optimize data storage and analytics in your organization.

What this Book will help me do
- Master Teradata's advanced features for efficient data warehousing applications.
- Understand and employ Teradata SQL for effective data manipulation and analytics.
- Explore practical solutions for Teradata administration tasks, including user and security management.
- Learn performance tuning techniques to enhance the efficiency of your queries and processes.
- Acquire detailed knowledge about Teradata's architecture and its unique capabilities.

Author(s)
The authors of "Teradata Cookbook" are experienced professionals in database management and data warehousing. With a deep understanding of Teradata's architecture and use in real-world applications, they bring a wealth of knowledge to each of the book's recipes. Their focus is to provide practical, actionable insights to help you tackle challenges you may face.

Who is it for?
This book is ideal for database administrators, data analysts, and professionals working with data warehousing who want to leverage the power of Teradata. Whether you are new to this database management system or looking to enhance your expertise, this cookbook provides practical solutions and in-depth insights, making it an essential resource.

IBM Power Systems Bits: Understanding IBM Patterns for Cognitive Systems

This IBM® Redpaper™ publication addresses IBM Patterns for Cognitive Systems topics for anyone developing, implementing, and using cognitive solutions on IBM Power Systems™ servers. It also provides documentation to transfer this knowledge to sales and technical teams. This publication describes IBM Patterns for Cognitive Systems. Think of a pattern as a use case for a specific scenario, such as event-based real-time marketing for real-time analytics, anti-money laundering, or addressing data oceans by reducing the cost of Hadoop. These examples are just a few of the cognitive patterns that are now available. Patterns identify and address challenges for cognitive infrastructures. These entry points then help you understand where you are on the cognitive journey and enable IBM to demonstrate the set of solution capabilities for each lifecycle stage. This book targets technical readers, including IT specialists, systems architects, data scientists, developers, and anyone looking for a guide about how to unleash the cognitive capabilities of IBM Power Systems by using patterns.

IBM z14 Technical Guide

This IBM® Redbooks® publication describes the new member of the IBM Z family, IBM z14®. IBM z14 is the trusted enterprise platform for pervasive encryption, integrating data, transactions, and insights into the data. A data-centric infrastructure must always be available with 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 servers to deliver a record level of capacity over the prior IBM Z platforms. In its maximum configuration, z14 is powered by up to 170 client-characterizable microprocessors (cores) running at 5.2 GHz. This configuration can deliver more than 146,000 million instructions per second (MIPS) and supports up to 32 TB of client memory. The IBM z14 Model M05 is estimated to provide up to 35% more total system capacity than the IBM z13® Model NE1. This Redbooks publication provides information about IBM z14 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand IBM Z server functions and plan for their usage. It is not intended as an introduction to mainframes; readers are expected to be generally familiar with existing IBM Z technology and terminology.

Learning Elastic Stack 6.0

Learn how to harness the power of the Elastic Stack 6.0 to manage, analyze, and visualize data effectively. This book introduces you to Elasticsearch, Logstash, Kibana, and other components, helping you build scalable, real-time data processing solutions from scratch. By reading this guide, you'll gain practical insights into the platform's components, including tips for production deployment.

What this Book will help me do
- Understand and utilize the core components of Elastic Stack 6.0, including Elasticsearch, Logstash, and Kibana.
- Set up scalable data pipelines for ingesting and processing vast amounts of data.
- Craft real-time data visualizations and analytics using Kibana.
- Secure and monitor Elastic Stack deployments with X-Pack and other related tools.
- Deploy Elastic Stack applications effectively in cloud or on-premises production environments.

Author(s)
Pranav Shukla and Sharath Kumar are experienced professionals with deep knowledge in distributed data systems and the Elastic Stack ecosystem. They are passionate about data analytics and visualization and bring their hands-on experience in building real-world Elastic Stack applications into this book. Their practical approach and explanatory style make complex concepts accessible to readers at all levels.

Who is it for?
This book is perfect for data professionals who want to analyze large datasets or create effective real-time visualizations. It is suited for those new to Elastic Stack or looking to understand its capabilities. Basic JSON knowledge is recommended, but no prior expertise with Elastic Stack is required to benefit from this practical guide.

Learning Google BigQuery

If you're ready to tap the potential of data analytics in the cloud, "Learning Google BigQuery" will take you from understanding foundational concepts to mastering advanced techniques of this powerful platform. Through hands-on examples, you'll learn how to query and analyze massive datasets efficiently, develop custom applications, and integrate your results seamlessly with other tools.

What this Book will help me do
- Understand the fundamentals of Google Cloud Platform and how BigQuery operates within it.
- Migrate enterprise-scale data seamlessly into BigQuery for further analytics.
- Master SQL techniques for querying large-scale datasets in BigQuery.
- Enable real-time data analytics and visualization with tools like Tableau and Python.
- Learn to create dynamic datasets, manage partitioned tables, and use BigQuery APIs effectively.

Author(s)
The authors, Berlyant, Haridass, and Brown, are specialists with years of experience in data science, big data platforms, and cloud technologies. They bring their expertise in data analytics and teaching to make advanced concepts accessible. Their hands-on approach and real-world examples ensure readers can directly apply the skills they acquire to practical scenarios.

Who is it for?
This book is tailored for developers, analysts, and data scientists eager to leverage cloud-based tools for handling and analyzing large-scale datasets. If you seek to gain hands-on proficiency in working with BigQuery or want to enhance your organization's data capabilities, this book is a fit. No prior BigQuery knowledge is needed, just a willingness to learn.

IBM DS8880 Architecture and Implementation (Release 8.3)

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8880 family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8880 systems. The IBM DS8000® family is a high-performance, high-capacity, highly secure, and resilient series of disk storage systems. The DS8880 family is the latest and most advanced of the DS8000 offerings to date. The high availability, multiplatform support (including IBM Z), and simplified management tools help provide a cost-effective path to on-demand and cloud-based infrastructures. The IBM DS8880 family now offers business-critical, all-flash, and hybrid data systems that span a wide range of price points:

- DS8884 -- Business Class
- DS8886 -- Enterprise Class
- DS8888 -- Analytics Class

The DS8884 and DS8886 are available as hybrid models or can be configured as all-flash. Each model represents the most recent in this series of high-performance, high-capacity, flexible, and resilient storage systems. These systems are intended to address the needs of the most demanding clients. Two powerful IBM POWER8® processor-based servers manage the cache to streamline disk I/O, maximizing performance and throughput. These capabilities are further enhanced with the availability of the second generation of high-performance flash enclosures (HPFEs Gen-2) and newer flash drives. Like its predecessors, the DS8880 supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. All disk drives in the DS8880 storage system include the Full Disk Encryption (FDE) feature. The DS8880 can automatically optimize the use of each storage tier, particularly flash drives, by using the IBM Easy Tier® feature.

The Sentient Enterprise

Mohan and Oliver have been very fortunate to have intimate views into the data challenges that face the largest organizations and institutions across every possible industry, and what they have been hearing about for some time is how the business needs to use data and analytics to its advantage. They continually hear the same issues, such as:

- "We're spending valuable meeting time wondering why everyone's data doesn't match up."
- "We can't leverage our economies of scale while remaining agile with data."
- "We need self-serve apps that let the enterprise experiment with data and accelerate the development process."
- "We need to get on a more predictive curve to ensure long-term success."

To really address the data concerns of today's enterprise, they wanted to find a way to help enterprises achieve the success they seek: not as a prescriptive process, but as a methodology to become agile and leverage data and analytics to drive a competitive advantage. You know, it's amazing what can happen when two people with very different perspectives get together to solve a big problem. This evolutionary guide resulted from the a-ha moment between these two influencers at the top of their fields: one an academic researcher and consultant, and the other a longtime analytics practitioner and chief product officer at Teradata. Together, they created a powerful framework every type of business can use to connect analytic power, business practices, and human dynamics in ways that can transform what is currently possible.

Data Warehousing in the Age of Artificial Intelligence

Nearly 7,000 new mobile applications appear every day, and a constant stream of data gives them life. Many organizations rely on a predictive analytics model to turn data into useful business information and ensure the predictions remain accurate as data changes. This can be a complex, time-consuming process. This book shows how to automate and accelerate that process using machine learning (ML) on a modern data warehouse that runs on any cloud. Product specialists from MemSQL explain how today’s modern data warehouses provide the foundations to implement ML algorithms that run efficiently. Through several real-time use cases, you’ll learn how to quickly identify the right metrics to make actionable business decisions. This book explores foundational ML and artificial intelligence concepts to help you understand:

- How data warehouses accelerate deployment and simplify manageability
- How companies choose between cloud and on-premises deployments for building data processing applications
- Ways to build analytics and visualizations for business intelligence on historical data
- The technologies and architecture for building and deploying real-time data pipelines

This book demonstrates specific models and examples for building supervised and unsupervised real-time ML applications, and gives practical advice on how to choose between building an ML pipeline and buying an existing solution. If you need to use data accurately and efficiently, a real-time data warehouse is a critical business tool.

Introduction to GPUs for Data Analytics

Moore’s law has finally run out of steam for CPUs. The number of x86 cores that can be placed cost-effectively on a single chip has reached a practical limit, making higher densities prohibitively expensive for most applications. Fortunately, for big data analytics, machine learning, and database applications, a more capable and cost-effective alternative for scaling compute performance is already available: the graphics processing unit, or GPU. In this report, executives at Kinetica and Sierra Communications explain how incorporating GPUs is ideal for keeping pace with the relentless growth in streaming, complex, and large data confronting organizations today. Technology professionals, business analysts, and data scientists will learn how their organizations can begin implementing GPU-accelerated solutions either on premises or in the cloud. This report explores:

- How GPUs supplement CPUs to enable continued price/performance gains
- The many database and data analytics applications that can benefit from GPU acceleration
- Why GPU databases with user-defined functions (UDFs) can simplify and unify the machine learning/deep learning pipeline
- How GPU-accelerated databases can process streaming data from the Internet of Things and other sources in real time
- The performance advantage of GPU databases in demanding geospatial analytics applications
- How cognitive computing, the most compute-intensive application currently imaginable, is now within reach using GPUs

Practical Real-time Data Processing and Analytics

This book provides a comprehensive guide to real-time data processing and analytics using modern frameworks like Apache Spark, Flink, Storm, and Kafka. Through practical examples and in-depth explanations, you will learn how to implement efficient, scalable, real-time processing pipelines.

What this Book will help me do
- Understand real-time data processing essentials and the technology stack.
- Learn to integrate components like Apache Spark and Kafka.
- Master the concepts of stream processing with detailed case studies.
- Gain expertise in developing monitoring and alerting solutions for real-time systems.
- Prepare to implement production-grade real-time data solutions.

Author(s)
Shilpi Saxena and Saurabh Gupta are experienced professionals in distributed systems and data engineering, focusing on practical applications of real-time computing. They bring their extensive industry experience to this book, helping readers understand the complexities of real-time data solutions in an approachable and hands-on manner.

Who is it for?
This book is ideal for software engineers and data engineers with a background in Java who seek to develop real-time data solutions. It suits readers already familiar with real-time data processing concepts and deepens their knowledge of frameworks like Spark, Flink, Storm, and Kafka. The target audience includes learners building production data solutions and those designing distributed analytics engines.

Apache Spark 2.x Machine Learning Cookbook

This book is your gateway to mastering machine learning with Apache Spark 2.x. Through detailed hands-on recipes, you'll delve into building scalable ML models, optimizing big data processes, and enhancing project efficiency. Gain practical knowledge and explore real-world applications of recommendations, clustering, analytics, and more with Spark's powerful capabilities.

What this Book will help me do
- Understand how to integrate Scala and Spark for effective machine learning development.
- Learn to create scalable recommendation engines using Spark.
- Master the development of clustering systems to organize unlabelled data at scale.
- Explore Spark libraries to implement efficient text analytics and search engines.
- Optimize large-scale data operations, tackling high-dimensional issues with Spark.

Author(s)
The team of authors brings expertise in machine learning, data science, and Spark technologies. Their combined industry experience and academic knowledge ensure the book is grounded in practical applications while offering theoretical insights. With clear explanations and a step-by-step approach, they aim to simplify complex concepts for developers and data scientists.

Who is it for?
This book is crafted for Scala developers familiar with machine learning concepts but seeking practical applications with Spark. If you have been implementing models but want to scale them and leverage Spark's robust ecosystem, this guide will serve you well. It is ideal for professionals seeking to deepen their skills in Spark and data science.

Data Warehousing with Greenplum

Relational databases haven’t gone away, but they are evolving to integrate messy, disjointed unstructured data into a cleansed repository for analytics. With the execution of massively parallel processing (MPP), the latest generation of analytic data warehouses is helping organizations move beyond business intelligence to processing a variety of advanced analytic workloads. These MPP databases expose their power with the familiarity of SQL. This report introduces the Greenplum Database, recently released as an open source project by Pivotal Software. Lead author Marshall Presser of Pivotal Data Engineering takes you through the Greenplum approach to data analytics and data-driven decisions, beginning with Greenplum’s shared-nothing architecture. You’ll explore data organization and storage, data loading, running queries, and performing analytics in the database. You’ll learn:

- How each networked node in Greenplum’s architecture features an independent operating system, memory, and storage
- Four deployment options to help you balance security, cost, and time to usability
- Ways to organize data, including distribution, storage, partitioning, and loading
- How to use Apache MADlib for in-database analytics, and GPText to process and analyze free-form text
- Tools for monitoring, managing, securing, and optimizing query responses available in the Pivotal Greenplum commercial database

Apache Spark 2.x for Java Developers

Delve into mastering big data processing with "Apache Spark 2.x for Java Developers." This book provides a practical guide to implementing Apache Spark using the Java APIs, offering a unique opportunity for Java developers to leverage Spark's powerful framework without transitioning to Scala.

What this Book will help me do
- Learn how to process data from formats like XML, JSON, and CSV using Spark Core.
- Implement real-time analytics using Spark Streaming and third-party tools like Kafka.
- Understand data querying with Spark SQL and master SQL schema processing.
- Apply machine learning techniques with Spark MLlib to real-world scenarios.
- Explore graph processing and analytics using Spark GraphX.

Author(s)
Kumar and Gulati, experienced professionals in Java development and big data, bring their wealth of practical experience and passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers, making big data approachable.

Who is it for?
This book is perfect for Java developers who are eager to expand their skill set into big data processing with Apache Spark. Whether you are a seasoned Spark user or first diving into big data concepts, this book meets you at your level. With practical examples and straightforward explanations, you can unlock the potential of Spark in real-world scenarios.

IBM z14 Technical Introduction

This IBM® Redbooks® publication introduces the latest IBM Z platform, the IBM z14®. It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to the digital era and the trust economy. These capabilities include:

- Securing data with pervasive encryption
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Providing resilience that is key to zero downtime
- Accelerating digital transformation with agile service delivery
- Revolutionizing business processes
- Blending open source and Z technologies

This book explains how this system uses both new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and security. With the z14 as the base, applications can run in a trusted, reliable, and secure environment that both improves operations and lessens business risk.

Mastering Apache Spark 2.x - Second Edition

Mastering Apache Spark 2.x is the essential guide to harnessing the power of big data processing. Dive into real-time data analytics, machine learning, and cluster computing using Apache Spark's advanced features and modules like Spark SQL and MLlib.

What this Book will help me do
- Gain proficiency in Spark's batch and real-time data processing with SparkSQL.
- Master techniques for machine learning and deep learning using SparkML and SystemML.
- Understand the principles of Spark's graph processing with GraphX and GraphFrames.
- Learn to deploy Apache Spark efficiently on platforms like Kubernetes and IBM Cloud.
- Optimize Spark cluster performance by configuring parameters effectively.

Author(s)
Romeo Kienzler is a seasoned professional in big data and machine learning technologies. With years of experience in cloud-based distributed systems, Romeo brings practical insights into leveraging Apache Spark. He combines his deep technical expertise with a clear and engaging writing style.

Who is it for?
This book is tailored for intermediate Apache Spark users eager to deepen their knowledge of Spark 2.x's advanced features. It is ideal for data engineers and big data professionals seeking to enhance their analytics pipelines with Spark. A basic understanding of Spark and Scala is necessary. If you're aiming to optimize Spark for real-world applications, this book is crafted for you.

Moving Hadoop to the Cloud

Until recently, Hadoop deployments existed on hardware owned and run by organizations. Now, of course, you can acquire the computing resources and network connectivity to run Hadoop clusters in the cloud. But there’s a lot more to deploying Hadoop to the public cloud than simply renting machines. This hands-on guide shows developers and systems administrators familiar with Hadoop how to install, use, and manage cloud-born clusters efficiently. You’ll learn how to architect clusters that work with cloud-provider features, not just to avoid pitfalls but also to take full advantage of these services. You’ll also compare the Amazon, Google, and Microsoft clouds, and learn how to set up clusters in each of them.

- Learn how Hadoop clusters run in the cloud, the problems they can help you solve, and their potential drawbacks
- Examine the common concepts of cloud providers, including compute capabilities, networking and security, and storage
- Build a functional Hadoop cluster on cloud infrastructure, and learn what the major providers require
- Explore use cases for high availability, relational data with Hive, and complex analytics with Spark
- Get patterns and practices for running cloud clusters, from designing for price and security to dealing with maintenance

Learning SAP Analytics Cloud

Discover the power of SAP Analytics Cloud in solving business intelligence challenges through concise and clear instruction. This book is the essential guide for beginners, providing a comprehensive understanding of the platform's features and capabilities. By the end, you'll master creating reports, models, and dashboards, making data-driven decisions with confidence.

What this Book will help me do
- Learn how to navigate and utilize the SAP Analytics Cloud interface effectively.
- Create data models using various sources like Excel or text files for comprehensive insights.
- Design and compile visually engaging stories, reports, and dashboards effortlessly.
- Master collaborative and presentation tools inside SAP Digital Boardroom.
- Understand how to plan, predict, and analyze seamlessly within a single platform.

Author(s)
Ahmed is an experienced SAP consultant and analytics professional, bringing years of practical experience in BI tools and enterprise analytics. As an expert in SAP Analytics Cloud, Ahmed has guided numerous teams in deploying effective analytics solutions. Their writing aims to demystify complex tools for learners.

Who is it for?
This book is ideal for IT professionals, business analysts, and newcomers eager to understand SAP Analytics Cloud. Beginner-level BI developers and managers seeking guided steps for mastering this platform will find it invaluable. If you aim to enhance your career in cloud-based analytics, this book is tailored for you.

Streaming Data

Streaming Data introduces the concepts and requirements of streaming and real-time data systems. The book is an idea-rich tutorial that teaches you to think about how to efficiently interact with fast-flowing data.

About the Technology
As humans, we're constantly filtering and deciphering the information streaming toward us. In the same way, streaming data applications can accomplish amazing tasks like reading live location data to recommend nearby services, tracking faults with machinery in real time, and sending digital receipts before your customers leave the shop. Recent advances in streaming data technology and techniques make it possible for any developer to build these applications if they have the right mindset. This book will let you join them.

About the Book
Streaming Data is an idea-rich tutorial that teaches you to think about efficiently interacting with fast-flowing data. Through relevant examples and illustrated use cases, you'll explore designs for applications that read, analyze, share, and store streaming data. Along the way, you'll discover the roles of key technologies like Spark, Storm, Kafka, Flink, RabbitMQ, and more. This book offers the perfect balance between big-picture thinking and implementation details.

What's Inside
- The right way to collect real-time data
- Architecting a streaming pipeline
- Analyzing the data
- Which technologies to use and when

About the Reader
Written for developers familiar with relational database concepts. No experience with streaming or real-time applications required.

About the Author
Andrew Psaltis is a software engineer focused on massively scalable real-time analytics.

Quotes
"The definitive book if you want to master the architecture of an enterprise-grade streaming application." - Sergio Fernandez Gonzalez, Accenture
"A thorough explanation and examination of the different systems, strategies, and tools for streaming data implementations." - Kosmas Chatzimichalis, Mach 7x
"A well-structured way to learn about streaming data and how to put it into practice in modern real-time systems." - Giuliano Araujo Bertoti, FATEC
"This book is all you need to understand what streaming is all about!" - Carlos Curotto, Globant

Learning Elasticsearch

This comprehensive guide to Elasticsearch will teach you how to build robust and scalable search and analytics applications using Elasticsearch 5.x. You will learn the fundamentals of Elasticsearch, including its APIs and tools, and how to apply them to real-world problems. By the end of the book, you will have a solid grasp of Elasticsearch and be ready to implement your own solutions.

What this Book will help me do
- Master the setup and configuration of Elasticsearch and Kibana.
- Learn to efficiently query and analyze both structured and unstructured data.
- Understand how to use Elasticsearch aggregations to perform advanced analytics.
- Gain knowledge of advanced search features, including geospatial queries and autocomplete.
- Explore the Elastic Stack and learn deployment best practices and cloud hosting options.

Author(s)
Andhavarapu is an expert in database technology and distributed systems, with years of experience in Elasticsearch. Their passion for search technologies is reflected in their clear and practical teaching style. They've written this guide to help developers of all levels get up to speed with Elasticsearch quickly and comprehensively.

Who is it for?
This book is perfect for software developers looking to implement effective search and analytics solutions. It's ideal for those who are new to Elasticsearch as well as for professionals familiar with other search tools like Lucene or Solr. The book assumes basic programming knowledge but no prior experience with Elasticsearch.

SQL Server 2017 Integration Services Cookbook

SQL Server 2017 Integration Services Cookbook is your key to mastering effective data integration and transformation solutions using SSIS 2017. Through clear, concise recipes, this book teaches the advanced ETL techniques needed to create efficient data workflows on both traditional and modern data platforms.
What this book will help you do
• Integrate diverse data sources into comprehensive data models.
• Develop optimized ETL workflows that improve operational efficiency.
• Leverage the new features introduced in SQL Server 2017 for enhanced data processing.
• Implement scalable data warehouse solutions suitable for modern analytics workloads.
• Customize and extend Integration Services to handle specific data transformation needs.
Author(s)
The authors are seasoned professionals in data integration and ETL technologies. They bring years of real-world experience using SQL Server Integration Services in a variety of enterprise scenarios. Their combined expertise provides practical insights and guidance that make complex concepts accessible to learners and practitioners alike.
Who is it for?
This book is ideal for data engineers and ETL developers who already understand the basics of SQL Server and want to master advanced data integration techniques. It is also suitable for database administrators and data analysts who want to build efficient ETL processes. Use this guide to learn not just the how, but also the why, behind successful data transformations.

Advanced Analytics with Spark, 2nd Edition

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.
You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (including classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find the book's patterns useful for working on your own data applications.
With this book, you will:
• Familiarize yourself with the Spark programming model
• Become comfortable within the Spark ecosystem
• Learn general approaches in data science
• Examine complete implementations that analyze large public data sets
• Discover which machine learning tools make sense for particular problems
• Acquire code that can be adapted to many uses

Apache Spark 2.x Cookbook

Discover how to harness the power of Apache Spark 2.x for your Big Data processing projects. This book offers over 70 cloud-ready recipes covering distributed data analytics, Structured Streaming, machine learning, and much more.
What this book will help you do
• Install and configure Apache Spark with various cluster managers and platforms.
• Set up development environments tailored for Spark applications.
• Work with data using RDDs and the schema-aware DataFrame and Dataset APIs.
• Perform real-time streaming analytics with sources such as Apache Kafka.
• Use MLlib for supervised learning, unsupervised learning, and recommendation systems.
Author(s)
Yadav is a seasoned data engineer with a deep understanding of Big Data tools and technologies, particularly Apache Spark. With years of experience in distributed computing and data analysis, Yadav brings practical insights and techniques to the book.
Who is it for?
This book is ideal for data engineers, data scientists, and Big Data professionals who want to sharpen their Apache Spark 2.x skills. If you work with distributed processing and need to solve complex data challenges, this book addresses practical problems you are likely to face. A basic understanding of Scala is recommended to get the most out of it.

Tabular Modeling in Microsoft SQL Server Analysis Services, Second Edition

Build agile and responsive business intelligence solutions.
Create a semantic model and analyze data using the tabular model in SQL Server 2016 Analysis Services to build corporate-level business intelligence (BI) solutions. Guided by two BI experts, you will learn how to build, deploy, and query a tabular model by following detailed examples and best practices. This hands-on book shows you how to use the tabular model's in-memory database to perform rapid analytics, whether you are new to Analysis Services or already familiar with its multidimensional model.
Discover how to:
• Determine when a tabular or multidimensional model is right for your project
• Build a tabular model using SQL Server Data Tools in Microsoft Visual Studio 2015
• Integrate data from multiple sources into a single, coherent view of company information
• Choose a data-modeling technique that meets your organization's performance and usability requirements
• Implement security by establishing administrative and data user roles
• Define and implement partitioning strategies to reduce processing time
• Use Tabular Model Scripting Language (TMSL) to execute and automate administrative tasks
• Optimize your data model to reduce the memory footprint for VertiPaq
• Choose between in-memory (VertiPaq) and pass-through (DirectQuery) engines for tabular models
• Select the proper hardware and virtualization configurations
• Deploy and manipulate tabular models from C# and PowerShell using the AMO and TOM libraries
Get code samples, including complete apps, at: https://aka.ms/tabular/downloads
About This Book
• For BI professionals who are new to SQL Server 2016 Analysis Services or already familiar with previous versions of the product, and who want the best reference for creating and maintaining tabular models.
• Assumes basic familiarity with database design and business analytics concepts.