talk-data.com

Topic

Amazon EC2

cloud_computing iaas aws


Top Events

O'Reilly Data Engineering Books · 4
AWS re:Invent 2024 · 3
Databricks DATA + AI Summit 2023 · 3
The Performance Engine - Architecting for Real-Time Scale · 1
Airflow Summit 2023 · 1
Airflow Summit 2020 · 1

Top Speakers

Nati Cohen (AWS) · 2
David Hows · 1
Sam R. Alapati · 1
Eelco Plugge · 1
Zohar Donenhirsh · 1
Eric Redmond · 1
Jim R. Wilson · 1
Alina Aven · 1
Aaron Feng (Roblox) · 1
Peter Membrey · 1
Naresh Yegireddi (PlayStation) · 1
Matthew Liem (AWS) · 1

Activities

4 activities · Newest first

High Performant File System Workloads for AI and HPC on AWS using IBM Spectrum Scale

2021-03-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sanjay Sudam

AI/ML AWS Cloud Computing ELK IBM Linux data data-engineering

This IBM® Redpaper® publication is intended to facilitate the deployment and configuration of IBM Spectrum® Scale based high-performance storage solutions for scalable data and AI workloads on Amazon Web Services (AWS). The paper focuses on configuration, testing results, and tuning guidelines for running these storage solutions on AWS. Lab validation was conducted with Red Hat Linux nodes connected to IBM Spectrum Scale, using various Amazon Elastic Compute Cloud (EC2) instances. Simultaneous workloads were simulated across multiple Amazon EC2 nodes running Red Hat Linux to determine scalability against the IBM Spectrum Scale clustered file system. The solution architecture, configuration details, and performance tuning demonstrate how to maximize data and AI application performance with IBM Spectrum Scale on AWS.

Expert Apache Cassandra Administration

2017-12-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sam R. Alapati

Big Data Cassandra Data Modelling Docker ELK Spark data data-engineering nosql-databases

Follow this handbook to build, configure, tune, and secure Apache Cassandra databases. Start with the installation of Cassandra and move on to the creation of a single instance, and then a cluster of Cassandra databases. Cassandra is increasingly a key player in many big data environments, and this book shows you how to use Cassandra with Apache Spark, a popular big data processing framework. Also covered are day-to-day topics of importance such as the backup and recovery of Cassandra databases, using the right compression and compaction strategies, and loading and unloading data.

Expert Apache Cassandra Administration provides numerous step-by-step examples, starting with the basics of a Cassandra database and going all the way through backup and recovery, performance optimization, and monitoring and securing the data. The book serves as an authoritative and comprehensive guide to the building and management of simple to complex Cassandra databases.

The book:
Takes you through building a Cassandra database, from installation of the software and creation of a single database through to complex clusters and data centers
Provides numerous examples of actual commands in a real-life Cassandra environment that show how to confidently configure, manage, troubleshoot, and tune Cassandra databases
Shows how to use the Cassandra configuration properties to build a highly stable, available, and secure Cassandra database that always operates at peak efficiency

What You'll Learn
Install the Cassandra software and create your first database
Understand the Cassandra data model and the internal architecture of a Cassandra database
Create your own Cassandra cluster, step by step
Run a Cassandra cluster on Docker
Work with Apache Spark by connecting to a Cassandra database
Deploy Cassandra clusters in your data center or on Amazon EC2 instances
Back up and restore mission-critical Cassandra databases
Monitor, troubleshoot, and tune production Cassandra databases, and cut your spending on resources such as memory, servers, and storage

Who This Book Is For
Database administrators, developers, and architects who are looking for an authoritative and comprehensive single volume for all their Cassandra administration needs. Also for administrators who are tasked with setting up and maintaining highly reliable and high-performing Cassandra databases. An excellent choice for big data administrators, database administrators, architects, and developers who use Cassandra as their key data store, to support high-volume online transactions, or as a decentralized, elastic data store.

The Definitive Guide to MongoDB: A complete guide to dealing with Big Data using MongoDB, Second Edition

2013-11-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by David Hows , Eelco Plugge , Peter Membrey , Tim Hawkins

Azure Big Data Cloud Computing Data Management Linux MongoDB NoSQL Python SQL data data-engineering nosql-databases

The Definitive Guide to MongoDB, Second Edition, is updated for the latest version and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen and experienced MongoDB authors Peter Membrey and Eelco Plugge share their expertise and experience in teaching you everything you need to know to become a MongoDB pro.

The Definitive Guide to MongoDB, Second Edition, starts with the basics, including how to install on Windows, Linux, and OS X, and how MongoDB handles your data. Then you'll learn how to develop with MongoDB in both PHP and Python, including an example blog application built with a PHP driver. Finally, you'll dig into more advanced but extremely important MongoDB features, including optimization, replication, and sharding -- load-balancing that makes MongoDB ideal for dealing with Big Data. If you're dealing with data, MongoDB should be on your must-learn list. The Definitive Guide to MongoDB, Second Edition, is just the book you need.

What you'll learn
Set up MongoDB on all major server platforms, including Windows, Linux, OS X, and cloud platforms like Rackspace, Azure, and Amazon EC2
Work with GridFS and the new aggregation framework
Work with your data using non-SQL commands
Write applications using either PHP or Python
Optimize MongoDB
Master MongoDB administration, including replication, replication tagging, and tag-aware sharding

Who this book is for
Database admins and developers who need to get up to speed on MongoDB and its Big Data, NoSQL approach to dealing with data management.

Seven Databases in Seven Weeks

2012-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eric Redmond , Jim R. Wilson

Big Data Cloud Computing Data Management DynamoDB ELK Apache HBase Java Linux MongoDB Neo4j NoSQL RDBMS +5 more

Data is getting bigger and more complex by the day, and so are the choices in handling that data. As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core of each technology: Redis, Neo4J, CouchDB, MongoDB, HBase, Riak, and Postgres.

With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases (relational, key/value, columnar, document, and graph) and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once.

Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.

What You Need:
To get the most out of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OS X or Linux preferred; Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.