Cassandra

Trino: The Definitive Guide, 2nd Edition

2022-10-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Manfred Moser, Matt Fuller, Martin Traverso (Facebook)

Analytics Data Lake Data Lakehouse Delta Hive Iceberg Kafka Oracle SQL Trino data data-engineering +2 more

Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. In the second edition of this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's a data lake using Hive, a modern lakehouse with Iceberg or Delta Lake, a different system like Cassandra, Kafka, or SingleStore, or a relational database like PostgreSQL or Oracle. Analysts, software engineers, and production engineers learn how to manage, use, and even develop with Trino and make it a critical part of their data platform. Authors Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization. Explore Trino's use cases, and learn about tools that help you connect to Trino for querying and processing huge amounts of data. Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more. Deploy and secure Trino at scale, monitor workloads, tune queries, and connect more applications. Learn how other organizations apply Trino successfully.

Cassandra: The Definitive Guide, (Revised) Third Edition, 3rd Edition

2022-01-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eben Hewitt, Jeff Carpenter

Cloud Computing Data Modelling Docker ELK Kafka Kubernetes Spark data data-engineering nosql-databases

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This revised third edition, updated for Cassandra 4.0 and new developments in the Cassandra ecosystem (including deployments in Kubernetes with K8ssandra), provides technical details and practical examples to help you put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. Developers, DBAs, and application architects looking to solve a database scaling issue or future-proof an application will learn how to harness Cassandra's speed and flexibility. Understand Cassandra's distributed and decentralized structure. Use the Cassandra Query Language (CQL) and cqlsh (the CQL shell). Create a working data model and compare it with an equivalent relational model. Design and develop applications using client drivers. Explore cluster topology and learn how nodes exchange data. Maintain a high level of performance in your cluster. Deploy Cassandra onsite, in the cloud, or with Docker and Kubernetes. Integrate Cassandra with Spark, Kafka, Elasticsearch, Solr, and Lucene.

Cassandra: The Definitive Guide, 3rd Edition

2020-04-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eben Hewitt, Jeff Carpenter

Data Modelling Java JavaScript Python data data-engineering nosql-databases

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition, updated for Cassandra 4.0, provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s nonrelational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility. Understand Cassandra’s distributed and decentralized structure. Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell. Create a working data model and compare it with an equivalent relational model. Develop sample applications using client drivers for languages including Java, Python, and Node.js. Explore cluster topology and learn how nodes exchange data.

Big Data Simplified

2019-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sayan Goswami, Sourabh Mukherjee, Amit Kumar Das

Big Data Data Science Hadoop Hive IoT Kafka MongoDB NoSQL Python Spark data data-engineering

Big Data Simplified blends technology with strategy and delves into applications of big data in specialized areas, such as recommendation engines, data science, and the Internet of Things (IoT), and enables a practitioner to make the right technology choice. The steps to strategize a big data implementation are also discussed in detail. This book presents a holistic approach to the topic, covering a wide landscape of big data technologies like Hadoop 2.0 and package implementations, such as Cloudera. In-depth discussion of associated technologies, such as MapReduce, Hive, Pig, Oozie, Apache ZooKeeper, Flume, Kafka, Spark, Python, and NoSQL databases like Cassandra, MongoDB, GraphDB, etc., is also included.

Mastering Apache Cassandra 3.x - Third Edition

2018-10-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tejaswi Malepati, Aaron Ploetz

Analytics Big Data Data Analytics Spark data data-engineering nosql-databases

This expert guide, "Mastering Apache Cassandra 3.x," is designed for individuals looking to achieve scalable and fault-tolerant database deployment using Apache Cassandra. From mastering the foundational components of Cassandra architecture to advanced topics like clustering and analytics integration with Apache Spark, this book equips readers with practical, actionable skills. What this Book will help me do: Understand and deploy Apache Cassandra clusters for fault-tolerant and scalable databases. Use advanced features of CQL3 to streamline database queries and operations. Optimize and configure Cassandra nodes to improve performance for demanding applications. Monitor and manage Cassandra clusters effectively using best practices. Combine Cassandra with Apache Spark to build robust data analytics pipelines. Author(s): Aaron Ploetz and Tejaswi Malepati are experienced technologists and software professionals with extensive expertise in distributed database systems and big data algorithms. They've combined their industry knowledge and teaching backgrounds to create accessible and practical guides for learners worldwide. Their collaborative work is focused on demystifying complex systems for maximum learning impact. Who is it for? This book is ideal for database administrators, software developers, and big data specialists seeking to expand their skill set into scalable data storage using Cassandra. Readers should have a basic understanding of database concepts and some programming experience. If you're looking to design robust databases optimized for modern big data use cases, this book will serve as a valuable resource.

Designing Fast Data Application Architectures

2018-06-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sean Glover, Stavros Kontopoulos, Gerard Maas

Kafka Spark Data Streaming data data-engineering

Today’s digital companies demand real-time insights and immediate action for everything from purchase to fulfillment, recommendation, and more. As a result, many organizations are adopting fast data applications to accelerate the value they extract from data as it flows into the system. With this practical ebook, you’ll learn the common architectural patterns that form the foundation of successful fast data deployments. Engineers from Lightbend identify the key characteristics of fast data architectures, separate them into functional blocks, and show you how to implement those functions using components like those in the SMACK stack (Spark, Mesos, Akka, Cassandra, and Kafka), as well as others. Architects will learn how to choose, combine, and run SMACK stack technologies to build the resilient, scalable, and responsive systems that your company requires. This ebook examines: the anatomy of fast data applications (the application model, streaming data sources, processing engines, and data sinks); functional composition of the SMACK stack and extensions; the event backbone that connects all the major components of a fast data platform together; compute engines for transforming data into valuable insights; storage systems that form the transition between the fast data domain and client applications; patterns you can use in the data serving layer, including data-driven microservices; and container orchestrators in the substrate layer that provide resources to services, frameworks, and applications.

Seven NoSQL Databases in a Week

2018-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Xun (Brian) Wu, Sudarshan Kadambi

DynamoDB Apache HBase Java MongoDB Neo4j NoSQL Python RDBMS Redis data data-engineering nosql-databases

Learn the fundamentals of seven essential NoSQL databases in just one week with this book. Covering MongoDB, DynamoDB, Redis, Cassandra, Neo4j, InfluxDB, and HBase, you'll explore their functionalities and practical applications. Designed to give you a working understanding of NoSQL database types, this guide helps aspiring DBAs and developers comprehend and utilize modern data solutions. What this Book will help me do: Master the fundamentals of MongoDB, including high-performance, high-availability, and scaling features. Gain hands-on experience with Neo4j to perform database queries and integrate with Python and Java applications. Learn efficient querying with Redis for storage and retrieval tasks. Understand Cassandra's powerful solution for scalable and fault-tolerant systems. Get well-versed with HBase for creating tables, and reading and writing data efficiently. Author(s): Sudarshan Kadambi and Xun (Brian) Wu bring a wealth of experience in database technologies. They have worked extensively in the software development and database management fields. With their practical and concise teaching approach, the authors make complex topics accessible for readers. Who is it for? This book is ideal for budding DBAs and developers looking to understand NoSQL databases. It is particularly useful for those transitioning from relational databases who want to learn about modern database technologies. Suitable for both beginners and those with some database knowledge, it aims to bridge skill gaps and expand the reader's technical expertise.

Complete Guide to Open Source Big Data Stack

2018-01-18 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Frampton

Big Data Cloud Computing Cloud Storage DataViz Hadoop Kafka Spark apache-spark data data-engineering

See how a Mesos-based big data stack is created and which components it uses. You will use currently available Apache systems, both full and incubating projects. The components are introduced by example and you learn how they work together. In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examines Apache Brooklyn. After that, he uses each chapter to introduce one piece of the big data stack, sharing how to source the software and how to install it. You learn by simple example, step by step and chapter by chapter, as a real big data stack is created. The book concentrates on Apache-based systems and shares detailed examples of cloud storage, release management, resource management, processing, queuing, frameworks, data visualization, and more. What You’ll Learn: Install a private cloud onto the local cluster using Apache CloudStack. Source, install, and configure Apache Brooklyn, Mesos, Kafka, and Zeppelin. See how Brooklyn can be used to install Mule ESB on a cluster and Cassandra in the cloud. Install and use DCOS for big data processing. Use Apache Spark for big data stack data processing. Who This Book Is For: Developers, architects, IT project managers, database administrators, and others charged with developing or supporting a big data system. It is also for anyone interested in Hadoop or big data, and those experiencing problems with data size.

Expert Apache Cassandra Administration

2017-12-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sam R. Alapati

Amazon EC2 Big Data Data Modelling Docker ELK Spark data data-engineering nosql-databases

Follow this handbook to build, configure, tune, and secure Apache Cassandra databases. Start with the installation of Cassandra and move on to the creation of a single instance, and then a cluster of Cassandra databases. Cassandra is increasingly a key player in many big data environments, and this book shows you how to use Cassandra with Apache Spark, a popular big data processing framework. Also covered are day-to-day topics of importance such as the backup and recovery of Cassandra databases, using the right compression and compaction strategies, and loading and unloading data. Expert Apache Cassandra Administration provides numerous step-by-step examples starting with the basics of a Cassandra database, and going all the way through backup and recovery, performance optimization, and monitoring and securing the data. The book serves as an authoritative and comprehensive guide to the building and management of simple to complex Cassandra databases. The book takes you through building a Cassandra database from installation of the software and creation of a single database, through to complex clusters and data centers; provides numerous examples of actual commands in a real-life Cassandra environment that show how to confidently configure, manage, troubleshoot, and tune Cassandra databases; and shows how to use the Cassandra configuration properties to build a highly stable, available, and secure Cassandra database that always operates at peak efficiency. What You'll Learn: Install the Cassandra software and create your first database. Understand the Cassandra data model and the internal architecture of a Cassandra database. Create your own Cassandra cluster, step by step. Run a Cassandra cluster on Docker. Work with Apache Spark by connecting to a Cassandra database. Deploy Cassandra clusters in your data center or on Amazon EC2 instances. Back up and restore mission-critical Cassandra databases. Monitor, troubleshoot, and tune production Cassandra databases, and cut your spending on resources such as memory, servers, and storage. Who This Book Is For: Database administrators, developers, and architects who are looking for an authoritative and comprehensive single volume for all their Cassandra administration needs. Also for administrators who are tasked with setting up and maintaining highly reliable and high-performing Cassandra databases. An excellent choice for big data administrators, database administrators, architects, and developers who use Cassandra as their key data store, to support high-volume online transactions, or as a decentralized, elastic data store.

Learning Apache Cassandra - Second Edition

2017-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sandeep Yarabarla, Graham Doman

Java NoSQL RDBMS data data-engineering nosql-databases

Learning Apache Cassandra is an engaging and in-depth guide to understanding the concepts and practical applications of Apache Cassandra, one of the most robust distributed NoSQL databases available. By the end of this book, you will have the necessary skills to design and manage scalable, high-performance database solutions tailored for modern applications. What this Book will help me do: Set up Apache Cassandra and its multi-node clusters confidently and efficiently. Master schema design principles, including the use of composite keys, collections, and user-defined types. Implement efficient query strategies with secondary indexes and materialized views. Understand data distribution strategies and tune consistency levels for different application requirements. Dive into advanced topics like user-defined functions, batch operations, and Java client optimizations for scalable database architecture. Author(s): Sandeep Yarabarla brings practical expertise and deep knowledge to the subject of Apache Cassandra. With hands-on industry experience designing scalable database solutions, the author ensures complex topics are presented through clear and actionable insights. This is coupled with real-world scenarios to help you apply your learning effectively. Who is it for? This book is ideal for developers and IT professionals interested in learning Apache Cassandra from scratch or enhancing their NoSQL database expertise. It is particularly suited for those transitioning from relational databases to NoSQL systems. Even without prior coding experience, readers can expect to follow along and achieve practical results.

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by George Tillmann

Big Data Data Modelling Hadoop NoSQL Oracle SQL data data-engineering data-models

Design great databases, from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems. Key to the framework is a method for taking the logical data model, which is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models. The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS. Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded in good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience. Presents an end-to-end framework from logical data modeling through physical schema definition. Includes lessons learned, techniques, and tricks that can turn a database disaster into a success. Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server. What You'll Learn: Create logical data models that accurately reflect the real world of the user. Create usage scenarios reflecting how applications will use a new database. Merge static data models with dynamic process models to create resilient yet flexible database designs. Support application requirements by creating responsive database schemas in any database architecture. Cope with big data and unstructured data for transaction processing and decision support systems. Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop. Who This Book Is For: System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems.

Fast Data Processing Systems with SMACK Stack

2016-12-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Raúl Estrada

Big Data Kafka Scala Spark data data-engineering smack-stack

Fast Data Processing Systems with SMACK Stack introduces you to the SMACK stack, a combination of Spark, Mesos, Akka, Cassandra, and Kafka. You will learn to integrate these technologies to build scalable, efficient, and real-time data processing platforms tailored for solving critical business challenges. What this Book will help me do: Understand the concepts of fast data pipelines and design scalable architectures using the SMACK stack. Gain expertise in functional programming with Scala and leverage its power in data processing tasks. Build and optimize distributed databases using Apache Cassandra for scaling extensively. Deploy and manage real-time data streams using Apache Kafka to handle massive messaging workloads. Implement cost-effective cluster infrastructures with Apache Mesos for efficient resource utilization. Author(s): Raúl Estrada is an expert in distributed systems and big data technologies. With years of experience implementing SMACK-based solutions across industries, Estrada offers a practical viewpoint to designing scalable systems. Their blend of theoretical knowledge and applied practices ensures readers receive actionable guidance. Who is it for? This book is perfect for software developers, data engineers, or data scientists looking to deepen their understanding of real-time data processing systems. If you have a foundational knowledge of the technologies in the SMACK stack or wish to learn how to combine these cutting-edge tools to solve complex problems, this is for you. Readers with an interest in building efficient big data solutions will find tremendous value here.

Data modeling with Cassandra

2016-12-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eben Hewitt, Jeff Carpenter

Data Modelling data data-engineering nosql-databases

In this lesson, you’ll learn how to design data models for Cassandra, including a data modeling process and notation. To apply this knowledge, we’ll design the data model for a sample application. This will help show how all the parts fit together. Along the way, we’ll use a tool to help us manage our CQL (Cassandra Query Language) scripts. What you’ll learn, and how you can apply it: You will learn common patterns and antipatterns for data modeling in Cassandra. This lesson will cover the concepts around data modeling and will compare a Cassandra data model with an equivalent relational database model. You’ll learn about defining queries and about logical and physical database modeling. You’ll learn how to optimize your model for performance, and finally you’ll learn how to implement your model schema using CQL. This lesson is for you because… You are an application developer or architect who wants to learn how data is stored and processed in Cassandra. You are a database administrator who wants to learn about Cassandra. Prerequisites: Helpful but not essential to have a basic understanding of relational vs. distributed databases. Helpful but not essential to understand Cassandra Query Language, CQL. Materials or downloads needed in advance: None.

Optimizing Cassandra performance

2016-12-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eben Hewitt, Jeff Carpenter

Data Modelling data data-engineering nosql-databases

In this lesson, we look at how to tune Cassandra to improve performance. There are a variety of settings in the configuration file and on individual tables. Although the default settings are appropriate for many use cases, there might be circumstances in which you need to change them. We’ll look at how and why to make these changes. We also see how to use the cassandra-stress test tool that ships with Cassandra to generate load against Cassandra and quickly see how it behaves under stress test circumstances. We can then tune Cassandra appropriately and feel confident that we’re ready to deploy to a production environment. What you’ll learn, and how you can apply it: You’ll learn how to monitor and analyze Cassandra performance. You’ll learn about Cassandra features such as caching, memtables, commit logs, SSTables, hinted handoff, compaction, and threading to improve responsiveness, consistency, and speed and reduce data loss. We’ll also look at timeout properties and JVM settings. This lesson is for you because… You are a developer, database administrator, or architect who wants to learn how to tune Cassandra. Prerequisites: An understanding of Cassandra architecture and data model. If you want to run cassandra-stress, Cassandra installed with a running Cassandra cluster. Materials or downloads needed: A Cassandra cluster if you want to run cassandra-stress.

Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka

2016-09-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Isaac Ruiz, Raul Estrada

Big Data Docker Kafka Scala Spark SQL Data Streaming data data-engineering

Learn how to integrate full-stack open source big data architecture and to choose the correct technology (Scala/Spark, Mesos, Akka, Cassandra, and Kafka) in every layer. Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large data sets in a timely manner. In many cases now, organizations need more than one paradigm to perform efficient analyses. Big Data SMACK explains each of the full-stack technologies and, more importantly, how to best integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation. This book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by every technology. It covers the six main concepts of big data architecture and how to integrate, replace, and reinforce every layer: the language (Scala); the engine (Spark: SQL, MLlib, Streaming, GraphX); the container (Mesos, Docker); the view (Akka); the storage (Cassandra); and the message broker (Kafka). What You Will Learn: Make big data architecture without using complex Greek letter architectures. Build a cheap but effective cluster infrastructure. Make queries, reports, and graphs that business demands. Manage and exploit unstructured and NoSQL data sources. Use tools to monitor the performance of your architecture. Integrate all technologies and decide which ones replace and which ones reinforce. Who This Book Is For: Developers, data architects, and data scientists looking to integrate the most successful big data open stack architecture and to choose the correct technology in every layer.

Cassandra 3.x High Availability - Second Edition

2016-08-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Robbie Strickland

DevOps data data-engineering nosql-databases

Cassandra 3.x High Availability is an in-depth guide to mastering the high availability features of Apache Cassandra. This book takes you through its architecture, implementing solutions to achieve zero downtime, and configuring clusters for fault tolerance and scalability. With practical examples and tips, it is a go-to resource for designing robust Cassandra-powered systems. What this Book will help me do: Understand the architecture of Apache Cassandra and its high availability mechanisms. Master replication and tunable consistency levels for optimal data distribution. Learn to scale out your Cassandra deployments with multiple data centers. Acquire skills in creating efficient and scalable data models for fault-tolerant systems. Prevent system failures by avoiding anti-patterns and managing graceful failover scenarios. Author(s): Robbie Strickland has extensive experience working as a developer and architect with distributed database systems. Specializing in Apache Cassandra, Strickland focuses on designing systems with high availability, scalability, and fault tolerance. Their practical teaching style ensures readers gain actionable knowledge to build robust database solutions. Who is it for? This book is ideal for developers and DevOps engineers familiar with Cassandra basics who wish to deepen their expertise. If your goal is to build highly available and fault-tolerant systems, this book will guide you step by step. It suits professionals managing data-intensive applications and looking to optimize their database strategy using Cassandra.

Sams Teach Yourself Apache Spark™ in 24 Hours

2016-08-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

AI/ML API Big Data Cloud Computing Data Engineering Kafka NoSQL Python Scala Spark SQL Data Streaming +3 more

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data. Learn how to • Discover what Apache Spark does and how it fits into the Big Data landscape • Deploy and run Spark locally or in the cloud • Interact with Spark from the shell • Make the most of the Spark Cluster Architecture • Develop Spark applications with Scala and functional Python • Program with the Spark API, including transformations and actions • Apply practical data engineering/analysis approaches designed for Spark • Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output • Optimize Spark solution performance • Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra) • Leverage cutting-edge functional programming techniques • Extend Spark with streaming, R, and Sparkling Water • Start building Spark-based machine learning and graph-processing applications • Explore advanced messaging technologies, including Kafka • Preview and prepare for Spark’s next generation of innovations. Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.

Cassandra: The Definitive Guide, 2nd Edition

2016-07-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eben Hewitt, Jeff Carpenter

Cloud Computing Data Modelling Docker ELK Hadoop Java JavaScript Python Spark data data-engineering nosql-databases

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition, updated for Cassandra 3.0, provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility. Understand Cassandra’s distributed and decentralized structure. Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell. Create a working data model and compare it with an equivalent relational model. Develop sample applications using client drivers for languages including Java, Python, and Node.js. Explore cluster topology and learn how nodes exchange data. Maintain a high level of performance in your cluster. Deploy Cassandra on site, in the cloud, or with Docker. Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene.

Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark

2016-06-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Zubair Nabi

AI/ML Analytics AWS Lambda BI Big Data ETL/ELT Apache HBase Hive IoT Kafka Redis Spark +4 more

Learn the right cutting-edge skills and knowledge to leverage Spark Streaming to implement a wide array of real-time, streaming applications. This book walks you through end-to-end real-time application development using real-world applications, data, and code. Taking an application-first approach, each chapter introduces use cases from a specific industry and uses publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation. The domains covered in Pro Spark Streaming include social media, the sharing economy, finance, online advertising, telecommunication, and IoT. In the last few years, Spark has become synonymous with big data processing. DStreams enhance the underlying Spark processing engine to support streaming analysis with a novel micro-batch processing model. Pro Spark Streaming by Zubair Nabi will enable you to become a specialist in latency-sensitive applications by leveraging the key features of DStreams, micro-batch processing, and functional programming. To this end, the book includes ready-to-deploy examples and actual code. Pro Spark Streaming will act as the bible of Spark Streaming.

What You'll Learn
• Discover Spark Streaming application development and best practices
• Work with the low-level details of discretized streams
• Optimize production-grade deployments of Spark Streaming via configuration recipes and instrumentation using Graphite, collectd, and Nagios
• Ingest data from disparate sources including MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver
• Integrate and couple with HBase, Cassandra, and Redis
• Take advantage of design patterns for side-effects and maintaining state across the Spark Streaming micro-batch model
• Implement real-time and scalable ETL using data frames, SparkSQL, Hive, and SparkR
• Use streaming machine learning, predictive analytics, and recommendations
• Mesh batch processing with stream processing via the Lambda architecture

Who This Book Is For
Data scientists, big data experts, BI analysts, and data architects.

Big Data Analytics with Spark: A Practitioner’s Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing

2016-01-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mohammed Guller

AI/ML Analytics Avro BI Big Data Data Analytics ETL/ELT Apache HBase HDFS Kafka Parquet Scala +6 more

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics with Spark shows you how to use Spark and leverage its easy-to-use features to increase your productivity. You learn to perform fast data analysis using its in-memory caching and advanced execution engine, employ in-memory computing capabilities for building high-performance machine learning and low-latency interactive analytics applications, and much more. Moreover, the book shows you how to use Spark as a single integrated platform for a variety of data processing tasks, including ETL pipelines, BI, live data stream processing, graph analytics, and machine learning. The book also includes a chapter on Scala, the hottest functional programming language, and the language that underlies Spark. You’ll learn the basics of functional programming in Scala, so that you can write Spark applications in it. What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, such as HDFS, Avro, Parquet, Kafka, Cassandra, HBase, Mesos, and so on. It also provides an introduction to machine learning and graph concepts. So the book is self-sufficient; all the technologies that you need to know to use Spark are covered. The only thing that you are expected to have is some programming knowledge in any language.
