Analytics

Learning Elasticsearch

2017-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Abhishek Andhavarapu

API Cloud Computing ELK Kibana data data-engineering elasticsearch search

This comprehensive guide to Elasticsearch will teach you how to build robust and scalable search and analytics applications using Elasticsearch 5.x. You will learn the fundamentals of Elasticsearch, including its APIs and tools, and how to apply them to real-world problems. By the end of the book, you will have a solid grasp of Elasticsearch and be ready to implement your own solutions.

What this Book will help me do
• Master the setup and configuration of Elasticsearch and Kibana.
• Learn to efficiently query and analyze both structured and unstructured data.
• Understand how to use Elasticsearch aggregations to perform advanced analytics.
• Gain knowledge of advanced search features including geospatial queries and autocomplete.
• Explore the Elastic Stack and learn deployment best practices and cloud hosting options.

Author(s)
Abhishek Andhavarapu is an expert in database technology and distributed systems, with years of experience in Elasticsearch. Their passion for search technologies is reflected in their clear and practical teaching style. They've written this guide to help developers of all levels get up to speed with Elasticsearch quickly and comprehensively.

Who is it for?
This book is perfect for software developers looking to implement effective search and analytics solutions. It's ideal for those who are new to Elasticsearch as well as for professionals familiar with other search tools like Lucene or Solr. The book assumes basic programming knowledge but no prior experience with Elasticsearch.

SQL Server 2017 Integration Services Cookbook

2017-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Cote , Dejan Sarka , Matija Lah

DWH ETL/ELT SQL SSIS data data-engineering microsoft-sql-server relational-databases

SQL Server 2017 Integration Services Cookbook is your key to mastering effective data integration and transformation solutions using SSIS 2017. Through clear, concise recipes, this book teaches the advanced ETL techniques necessary for creating efficient data workflows, leveraging both traditional and modern data platforms.

What this Book will help me do
• Master the integration of diverse data sources into comprehensive data models.
• Develop optimized ETL workflows that improve operational efficiency.
• Leverage the new features introduced in SQL Server 2017 for enhanced data processing.
• Implement scalable data warehouse solutions suitable for modern analytics workloads.
• Customize and extend integration services to handle specific data transformation needs.

Author(s)
The authors are seasoned professionals in data integration and ETL technologies. They bring years of real-world experience using SQL Server Integration Services in various enterprise scenarios. Their combined expertise ensures practical insights and guidance, making complex concepts accessible to learners and practitioners alike.

Who is it for?
This book is ideal for data engineers and ETL developers who already understand the basics of SQL Server and want to master advanced data integration techniques. It is also suitable for database administrators and data analysts aiming to enhance their skill set with efficient ETL processes. Arm yourself with this guide to learn not just the how, but also the why, behind successful data transformations.

Advanced Analytics with Spark, 2nd Edition

2017-06-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sandy Ryza (Databricks) , Sean Owen (Databricks) , Josh Wills , Uri Laserson

AI/ML Data Science Java Python Scala Cyber Security Spark apache-spark data data-engineering

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming.

You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (including classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications.

With this book, you will:
• Familiarize yourself with the Spark programming model
• Become comfortable within the Spark ecosystem
• Learn general approaches in data science
• Examine complete implementations that analyze large public data sets
• Discover which machine learning tools make sense for particular problems
• Acquire code that can be adapted to many uses

Apache Spark 2.x Cookbook

2017-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rishi Yadav (Roost.ai)

AI/ML Big Data Cloud Computing Data Analytics Kafka Scala Spark Data Streaming apache-spark data data-engineering

Discover how to harness the power of Apache Spark 2.x for your Big Data processing projects. In this book, you will explore over 70 cloud-ready recipes that will guide you to perform distributed data analytics, structured streaming, machine learning, and much more.

What this Book will help me do
• Effectively install and configure Apache Spark with various cluster managers and platforms.
• Set up and utilize development environments tailored for Spark applications.
• Operate on schema-aware data using RDDs, DataFrames, and Datasets.
• Perform real-time streaming analytics with sources such as Apache Kafka.
• Leverage MLlib for supervised learning, unsupervised learning, and recommendation systems.

Author(s)
Rishi Yadav is a seasoned data engineer with a deep understanding of Big Data tools and technologies, particularly Apache Spark. With years of experience in the field of distributed computing and data analysis, Yadav brings practical insights and techniques to enrich the learning experience of readers.

Who is it for?
This book is ideal for data engineers, data scientists, and Big Data professionals who are keen to enhance their Apache Spark 2.x skills. If you're working with distributed processing and want to solve complex data challenges, this book addresses practical problems. Note that a basic understanding of Scala is recommended to get the most out of this resource.

Tabular Modeling in Microsoft SQL Server Analysis Services, Second Edition

2017-04-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alberto Ferrari , Marco Russo

Agile/Scrum BI Data Modelling Microsoft PowerShell Cyber Security SQL SQL Server data data-engineering microsoft-sql-server relational-databases

Build agile and responsive business intelligence solutions

Create a semantic model and analyze data using the tabular model in SQL Server 2016 Analysis Services to create corporate-level business intelligence (BI) solutions. Led by two BI experts, you will learn how to build, deploy, and query a tabular model by following detailed examples and best practices. This hands-on book shows you how to use the tabular model’s in-memory database to perform rapid analytics, whether you are new to Analysis Services or already familiar with its multidimensional model.

Discover how to:
• Determine when a tabular or multidimensional model is right for your project
• Build a tabular model using SQL Server Data Tools in Microsoft Visual Studio 2015
• Integrate data from multiple sources into a single, coherent view of company information
• Choose a data-modeling technique that meets your organization’s performance and usability requirements
• Implement security by establishing administrative and data user roles
• Define and implement partitioning strategies to reduce processing time
• Use Tabular Model Scripting Language (TMSL) to execute and automate administrative tasks
• Optimize your data model to reduce the memory footprint for VertiPaq
• Choose between in-memory (VertiPaq) and pass-through (DirectQuery) engines for tabular models
• Select the proper hardware and virtualization configurations
• Deploy and manipulate tabular models from C# and PowerShell using AMO and TOM libraries

Get code samples, including complete apps, at: https://aka.ms/tabular/downloads

About This Book
• For BI professionals who are new to SQL Server 2016 Analysis Services or already familiar with previous versions of the product, and who want the best reference for creating and maintaining tabular models.
• Assumes basic familiarity with database design and business analytics concepts.

Mastering Spark for Data Science

2017-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matthew Hallett , David George , Antoine Amend (Databricks) , Andrew Morgan

AI/ML API Big Data Data Science Spark SQL Data Streaming apache-spark data data-engineering

Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products

About This Book
• Develop and apply advanced analytical techniques with Spark
• Learn how to tell a compelling story with data science using Spark’s ecosystem
• Explore data at scale and work with cutting edge data science methods

Who This Book Is For
This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting edge techniques. This book assumes working knowledge of data science, common machine learning methods, and popular data science tools, and assumes you have previously run proof of concept studies and built prototypes.

What You Will Learn
• Learn the design patterns that integrate Spark into industrialized data science pipelines
• See how commercial data scientists design scalable code and reusable code for data science services
• Explore cutting edge data science methods so that you can study trends and causality
• Discover advanced programming techniques using RDD and the DataFrame and Dataset APIs
• Find out how Spark can be used as a universal ingestion engine tool and as a web scraper
• Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
• Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams
• Study advanced Spark concepts, solution design patterns, and integration architectures
• Demonstrate powerful data science pipelines

In Detail
Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to operate at this level you need to build data science solutions of substance, solutions that solve real problems.

Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book deep dives into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly.

Style and approach
This is an advanced guide for those with beginner-level familiarity with the Spark architecture and working with Data Science applications. Mastering Spark for Data Science is a practical tutorial that uses core Spark APIs and takes a deep dive into advanced libraries including: Spark SQL, visual streaming, and MLlib. This book expands on titles like: Machine Learning with Spark and Learning Spark. It is the next learning curve for those comfortable with Spark and looking to improve their skills.

Learning Apache Spark 2

2017-03-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Muhammad Asif Abbasi

AI/ML Big Data Data Analytics Scala Spark SQL Data Streaming apache-spark data data-engineering

Dive into the world of Big Data with "Learning Apache Spark 2". This book introduces you to the powerful Apache Spark framework, tailored for real-time data analytics and machine learning. Through practical examples and real-world use-cases, you'll gain hands-on experience in leveraging Spark's capabilities for your data processing needs.

What this Book will help me do
• Master the fundamentals of Apache Spark 2 and its new features.
• Effectively use Spark SQL, MLlib, RDDs, GraphX, and Spark Streaming to tackle real-world challenges.
• Gain skills in data processing, transformation, and analysis with Spark.
• Deploy and operate your Spark applications in clustered environments.
• Develop your own recommendation engines and predictive analytics models with Spark.

Author(s)
Muhammad Asif Abbasi brings a wealth of expertise in Big Data technologies with a keen focus on simplifying complex concepts for learners. With substantial experience working in data processing frameworks, their approach to teaching creates an engaging and practical learning experience. With "Learning Apache Spark 2", Abbasi empowers readers to confidently tackle challenges in Big Data processing and analytics.

Who is it for?
This book is ideal for aspiring Big Data professionals seeking an accessible introduction to Apache Spark. Beginners in Spark will find step-by-step guidance, while those familiar with earlier versions will appreciate the insights into Spark 2's new features. Familiarity with Big Data concepts and Scala programming is recommended for optimal understanding.

SQL Server 2016 Developer's Guide

2017-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Miloš Radivojević , William Durkin , Dejan Sarka

JSON Cyber Security SQL data data-engineering microsoft-sql-server relational-databases

SQL Server 2016 Developer's Guide provides an in-depth overview of the new features and enhancements introduced in SQL Server 2016 that can significantly improve your development process. This book covers robust techniques for building high-performance, secure database applications while leveraging cutting-edge functionalities such as Stretch Database, temporal tables, and enhanced In-Memory OLTP capabilities.

What this Book will help me do
• Master the new development features introduced in SQL Server 2016 and understand their applications.
• Use In-Memory OLTP enhancements to significantly boost application performance.
• Efficiently manage and analyze data using temporal tables and JSON integration.
• Explore SQL Server security enhancements to ensure data safety and access control.
• Gain insights into integrating R with SQL Server 2016 for advanced analytics.

Author(s)
Miloš Radivojević, Dejan Sarka, and William Durkin are experienced database developers and architects with a strong focus on SQL Server technologies. They bring years of practical experience and a clear, insightful approach to teaching complex concepts. Their expertise shines in this comprehensive guide, providing readers with both foundational knowledge and advanced techniques.

Who is it for?
This guide is perfect for database developers and solution architects looking to harness the full potential of SQL Server 2016's new features. It's intended for professionals with prior experience in SQL Server or similar platforms who aim to develop efficient, high-performance applications. You'll benefit from this book if you are keen to master SQL Server 2016 and elevate your development skills.

Mastering Elastic Stack

2017-02-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravi Kumar Gupta , Yuvraj Gupta

Data Analytics ELK Kibana Logstash Cyber Security data data-engineering elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Mastering Elastic Stack is your complete guide to advancing your data analytics expertise using the ELK Stack. With detailed coverage of Elasticsearch, Logstash, Kibana, Beats, and X-Pack, this book equips you with the skills to process and analyze any type of data efficiently. Through practical examples and real-world scenarios, you'll gain the ability to build end-to-end pipelines and create insightful dashboards.

What this Book will help me do
• Build and manage log pipelines using Logstash, Beats, and Elasticsearch for real-time analytics.
• Develop advanced Kibana dashboards to visualize and interpret complex datasets.
• Efficiently utilize X-Pack features for alerting, monitoring, and security in the Elastic Stack.
• Master plugin customization and deployment for a tailored Elastic Stack environment.
• Apply Elastic Stack solutions to real-world cases for centralized logging and actionable insights.

Author(s)
The authors, Ravi Kumar Gupta and Yuvraj Gupta, are experienced technologists who have spent years working at the forefront of data processing and analytics. They are well-versed in Elasticsearch, Logstash, Kibana, and the Elastic ecosystem, having worked extensively in enterprise environments where these tools have transformed operations. Their passion for teaching and thorough understanding of the tools culminate in this comprehensive resource.

Who is it for?
The ideal reader is a developer already familiar with Elasticsearch, Logstash, and Kibana who wants to deepen their understanding of the stack. If you're involved in creating scalable data pipelines, analyzing complex datasets, or looking to implement centralized logging solutions in your work, this book is an excellent resource. It bridges the gap from intermediate to expert knowledge, allowing you to use the Elastic Stack effectively in various scenarios. Whether you are moving beyond the beginner level or enhancing an existing skill set, this book meets your needs.

Mastering Elasticsearch 5.x - Third Edition

2017-02-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bharvi Dixit

Big Data ELK data data-engineering elasticsearch search

This comprehensive guide dives deep into the functionalities of Elasticsearch 5, the widely used search and analytics engine. Leveraging the power of Apache Lucene, this book will help you understand advanced concepts like querying, indexing, and cluster management to build efficient and scalable search solutions.

What this Book will help me do
• Master advanced features of Elasticsearch such as text scoring, sharding, and aggregation.
• Understand how to handle big data efficiently using Elasticsearch's architecture.
• Learn practical implementation techniques for Elasticsearch features through hands-on examples.
• Develop custom plugins for Elasticsearch to tailor its functionalities to specific needs.
• Scale and optimize Elasticsearch clusters for high performance in production environments.

Author(s)
Bharvi Dixit is an experienced software engineer and a recognized expert in implementing Elasticsearch solutions. With a strong background in distributed systems and database management, Bharvi's writing is informed by real-world experience and a focus on practical applications.

Who is it for?
This book is ideal for developers and data engineers with existing experience in Elasticsearch who wish to deepen their knowledge. It serves as a valuable resource for professionals tasked with creating scalable search applications. A working understanding of Elasticsearch basics and query DSL is recommended to fully benefit from this guide.

Geospatial Data and Analysis

2017-02-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jon Bruner , Bill Day , Aurelia Moser

Big Data GIS Hadoop IoT data data-engineering geographic-information-system-gis geographic information system (gis) location-data

Geospatial data, or data with location information, is generated in huge volumes every day by billions of mobile phones, IoT sensors, drones, nanosatellites, and many other sources in an unending stream. This practical ebook introduces you to the landscape of tools and methods for making sense of all that data, and shows you how to apply geospatial analytics to a variety of issues, large and small.

Authors Aurelia Moser, Jon Bruner, and Bill Day provide a complete picture of the geospatial analysis options available, including low-scale commercial desktop GIS tools, medium-scale options such as PostGIS and Lucene-based searching, and true big data solutions built on technologies such as Hadoop. You’ll learn when it makes sense to move from one type of solution to the next, taking increased costs and complexity into account.

• Explore the structure of basic webmaps, and the challenges and constraints involved when working with geo data
• Dive into low- to medium-scale mapping tools for use in backend and frontend web development
• Focus on tools for robust medium-scale geospatial projects that don’t quite justify a big data solution
• Learn about innovative platforms and software packages for solving issues of processing and storage of large-scale data
• Examine geodata analysis use cases, including disaster relief, urban planning, and agriculture and environmental monitoring

Elasticsearch 5.x Cookbook - Third Edition

2017-02-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alberto Paro

Big Data ELK Java JSON data data-engineering elasticsearch search

Elasticsearch 5.x Cookbook is a comprehensive guide that teaches you how to leverage the full power of Elasticsearch for high-performance search and analytics. Through step-by-step recipes, you'll explore deployment, query building, plugin integration, and advanced analytics, ensuring you can manage and scale Elasticsearch like a pro.

What this Book will help me do
• Understand and deploy complex Elasticsearch cluster topologies for optimal performance.
• Create tailored mappings to gain finer control over data indexing and retrieval.
• Design and execute advanced queries and analytics using Elasticsearch capabilities.
• Integrate Elasticsearch with popular programming languages and big data platforms.
• Monitor and improve Elasticsearch cluster health using the best practices and tools.

Author(s)
Alberto Paro is a seasoned software engineer and data scientist with extensive experience in distributed systems and search technologies. Having worked on numerous search-related projects, he brings practical, real-world insights to his writing. Alberto is passionate about teaching and simplifying complex concepts, making this book both approachable and expertly detailed.

Who is it for?
This book is ideal for developers or data engineers seeking to utilize Elasticsearch for advanced search and analytics tasks. If you have some prior knowledge of JSON and programming concepts, particularly Java, you will benefit most from this material. Whether you're looking to integrate Elasticsearch into your systems or to optimize its usage, this book caters to your needs.

Tabular Modeling with SQL Server 2016 Analysis Services Cookbook

2017-01-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Derek Wilson

BI Data Analytics DAX SQL SSAS data data-engineering microsoft-sql-server relational-databases

With "Tabular Modeling with SQL Server 2016 Analysis Services Cookbook," you'll discover how to harness the full potential of the latest Tabular models in SQL Server Analysis Services (SSAS). This practical guide equips data professionals with the tools, techniques, and knowledge to optimize data analytics and deliver fast, reliable, and impactful business insights.

What this Book will help me do
• Understand the fundamentals of Tabular modeling and its advantages over traditional methods.
• Use SQL Server 2016 SSAS features to build and deploy Tabular models tailored to business needs.
• Master DAX for creating powerful calculated fields and optimized measures.
• Administer and secure your models effectively, ensuring robust BI solutions.
• Optimize performance and explore advanced features in Tabular solutions for maximum efficiency.

Author(s)
Derek Wilson is an experienced SQL BI professional with a strong background in database modeling and analytics. With years of hands-on experience in developing BI solutions, Wilson takes a practical and straightforward teaching approach. Their guidance in this book makes the complex topics of Tabular modeling and SSAS accessible to both seasoned professionals and newcomers to the field.

Who is it for?
This book is tailored for SQL BI professionals, database architects, and data analysts aiming to leverage Tabular models in SQL Server Analysis Services. It caters to those familiar with database management and basic BI concepts who are eager to improve their analysis solutions. It's a valuable resource if you aim to gain expertise in using tabular modeling for business intelligence.

IBM DS8880 Architecture and Implementation (Release 8.2.1)

2017-01-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bjoern Wesselbaum , Kerstin Blum , Peter Kimmel , Andre Coelho , Sherry Brunson , Bert Dufrasne , Jeffery Cook

IBM data data-engineering

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8880 family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8880 systems. The IBM DS8000® family is a high-performance, high-capacity, highly secure, and resilient series of disk storage systems. The DS8880 family is the latest and most advanced of the DS8000 offerings to date. The high availability, multiplatform support, including IBM z Systems®, and simplified management tools help provide a cost-effective path to an on-demand world.

The IBM DS8880 family now offers business-critical, all-flash, and hybrid data systems that span a wide range of price points:
• DS8884 -- Business Class
• DS8886 -- Enterprise Class
• DS8888 -- Analytics Class

The DS8884 and DS8886 are available as either hybrid models, or can be configured as all-flash. Each model represents the most recent in this series of high-performance, high-capacity, flexible, and resilient storage systems. These systems are intended to address the needs of the most demanding clients. Two powerful IBM POWER8® processor-based servers manage the cache to streamline disk I/O, maximizing performance and throughput. These capabilities are further enhanced with the availability of the second generation of high-performance flash enclosures (HPFEs Gen-2).

Like its predecessors, the DS8880 supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. All disk drives in the DS8880 storage system include the Full Disk Encryption (FDE) feature. The DS8880 can automatically optimize the use of each storage tier, particularly flash drives and flash cards, through the IBM Easy Tier® feature. The DS8880 also includes the Copy Services Manager code and allows for easier integration in a Lightweight Directory Access Protocol (LDAP) infrastructure.

Introducing and Implementing IBM FlashSystem V9000

2016-12-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christophe Fagiano , Jon Herd , Detlef Helmbrecht , Carsten Larsen , Renato Santos , Jeffrey Irving , James Thompson , Jana Jamsek

Cloud Computing Data Management IBM SAS Cyber Security data data-engineering

The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with highly virtualized environments, cloud computing, mobile and social systems of engagement, and in-depth, real-time analytics. Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate as they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets. The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today’s data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.7 and introduces the recently announced V7.8. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. 
It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the IBM FlashSystem storage into business environments. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

Apache Spark for Data Science Cookbook

2016-12-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Padma Priya Chitturi

AI/ML Big Data Data Analytics Data Science NLP NumPy Pandas SciPy Spark apache-spark data data-engineering

In "Apache Spark for Data Science Cookbook," you'll delve into solving real-world analytical challenges using the robust Apache Spark framework. This book features hands-on recipes that cover data analysis, distributed machine learning, and real-time data processing. You'll gain practical skills to process, visualize, and extract insights from large datasets efficiently.

What this Book will help me do
• Master using Apache Spark for processing and analyzing large-scale datasets effectively.
• Harness Spark's MLlib for implementing machine learning algorithms like classification and clustering.
• Utilize libraries such as NumPy, SciPy, and Pandas in conjunction with Spark for numerical computations.
• Apply techniques like Natural Language Processing and text mining using Spark-integrated tools.
• Perform end-to-end data science workflows, including data exploration, modeling, and visualization.

Author(s)
Nagamallikarjuna Inelu and Padma Priya Chitturi bring their extensive experience working with data science and distributed computing frameworks like Apache Spark. Nagamallikarjuna specializes in applying machine learning algorithms to big data problems, while Padma Priya has contributed to various big data system implementations. Together, they focus on providing practitioners with practical and efficient solutions.

Who is it for?
This book is primarily intended for novice and intermediate data scientists and analysts who are curious about using Apache Spark to tackle data science problems. Readers are expected to have some familiarity with basic data science tasks. If you want to learn practical applications of Spark in data analysis and enhance your big data analytics skills, this resource is for you.

Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale

2016-12-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Casey Stella , Douglas Eadline , Ofer Mendelevitch

AI/ML Big Data Data Quality Data Science Hadoop HDFS Hive NLP Spark data data-engineering

The Complete Guide to Data Science with Hadoop: For Technical Professionals, Businesspeople, and Students

Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.

The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization.

Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP). This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.

Learn

• What data science is, how it has evolved, and how to plan a data science career
• How data volume, variety, and velocity shape data science use cases
• Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
• Data importation with Hive and Spark
• Data quality, preprocessing, preparation, and modeling
• Visualization: surfacing insights from huge data sets
• Machine learning: classification, regression, clustering, and anomaly detection
• Algorithms and Hadoop tools for predictive modeling
• Cluster analysis and similarity functions
• Large-scale anomaly detection
• NLP: applying data science to human language

Implementing IBM FlashSystem 900

2016-11-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Karen Orlando , Jon Herd , Detlef Helmbrecht , Carsten Larsen , Ingo Dimmer , Matt Levan

Cloud Computing IBM data data-engineering

Today’s global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900, powered by IBM FlashCore™ technology, they can make faster decisions based on real-time insights and unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments.

This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900. It provides in-depth knowledge of the product architecture, software and hardware, implementation, and hints and tips. Also illustrated are use cases that show real-world solutions for tiering, flash-only, and preferred-read, and also examples of the benefits gained by integrating the FlashSystem storage into business environments. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and for anyone who wants to understand how to implement this new and exciting technology.

This book describes the following offerings of the IBM Spectrum™ Storage family:

• IBM Spectrum Storage™
• IBM Spectrum Control™
• IBM Spectrum Virtualize™
• IBM Spectrum Scale™
• IBM Spectrum Accelerate™

The Big Data Transformation

2016-11-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ashish Thusoo

Big Data Data Analytics DWH Hadoop Marketing Vertica data data-engineering

Business executives today are well aware of the power of data, especially for gaining actionable insight into products and services. But how do you jump into the big data analytics game without spending millions on data warehouse solutions you don’t need? This 40-page report focuses on massively parallel processing (MPP) analytical databases that enable you to run queries and dashboards on a variety of business metrics at extreme speed and exabyte scale. Because they leverage the full computational power of a cluster, MPP analytical databases can analyze massive volumes of data, both structured and semi-structured, at unprecedented speeds.

This report presents five real-world case studies from Etsy, Cerner Corporation, Criteo and other global enterprises to focus on one big data analytics platform in particular, HPE Vertica. You’ll discover:

• How one prominent data storage company convinced both business and tech stakeholders to adopt an MPP analytical database
• Why performance marketing technology company Criteo used a Center of Excellence (CoE) model to ensure the success of its big data analytics endeavors
• How YPSM uses Vertica to speed up its Hadoop-based data processing environment
• Why Cerner adopted an analytical database to scale its highly successful health information technology platform
• How Etsy drives success with the company’s big data initiative by avoiding common technical and organizational mistakes

Oracle R Enterprise: Harnessing the Power of R in Oracle Database

2016-11-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Brendan Tierney

API Big Data Hadoop Oracle R SQL data data-engineering oracle-database-solutions

Master the Big Data Capabilities of Oracle R Enterprise

Effectively manage your enterprise’s big data and keep complex processes running smoothly using the hands-on information contained in this Oracle Press guide. Oracle R Enterprise: Harnessing the Power of R in Oracle Database shows, step-by-step, how to create and execute large-scale predictive analytics and maintain superior performance. Discover how to explore and prepare your data, accurately model business processes, generate sophisticated graphics, and write and deploy powerful scripts. You will also find out how to effectively incorporate Oracle R Enterprise features in APEX applications, OBIEE dashboards, and Apache Hadoop systems.

Learn to:

• Install, configure, and administer Oracle R Enterprise
• Establish connections and move data to the database
• Create Oracle R Enterprise packages and functions
• Use the R language to work with data in Oracle Database
• Build models using ODM, ORE, and other algorithms
• Develop and deploy R scripts and use the R script repository
• Execute embedded R scripts and employ ORE SQL API functions
• Map and manipulate data using Oracle R Advanced Analytics for Hadoop
• Use ORE in Oracle Data Miner, OBIEE, and other applications

