MongoDB

Stretching The Elastic Stack with Philipp Krenn - Episode 23

2018-03-19 · Data Engineering Podcast Listen

podcast_episode

by Philipp Krenn (Elastic) , Tobias Macey

API Cassandra Cloud Computing Data Engineering Data Management Datadog ELK GitHub Kibana Logstash Neo4j NoSQL

Summary

Search is a common requirement for applications of all varieties. Elasticsearch was built to make it easy to include search functionality in projects built in any language. From that foundation, the rest of the Elastic Stack has been built, expanding to many more use cases in the process. In this episode Philipp Krenn describes the various pieces of the stack, how they fit together, and how you can use them in your infrastructure to store, search, and analyze your data.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial and get a sweet new T-Shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Philipp Krenn about the Elastic Stack and the ways that you can use it in your systems.

Interview

Introduction How did you get involved in the area of data management? The Elasticsearch product has been around for a long time and is widely known, but can you give a brief overview of the other components that make up the Elastic Stack and how they work together? Beyond the common pattern of using Elasticsearch as a search engine connected to a web application, what are some of the other use cases for the various pieces of the stack? What are the common scaling bottlenecks that users should be aware of when they are dealing with large volumes of data? What do you consider to be the biggest competition to the Elastic Stack as you expand the capabilities and target usage patterns? What are the biggest challenges that you are tackling in the Elastic stack, technical or otherwise? What are the biggest challenges facing Elastic as a company in the near to medium term? Open source as a business model: https://www.elastic.co/blog/doubling-down-on-open?utm_source=rss&utm_medium=rss What is the vision for Elastic and the Elastic Stack going forward and what new features or functionality can we look forward to?

Contact Info

@xeraa on Twitter xeraa on GitHub Website Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Elastic Vienna – Capital of Austria What Is Developer Advocacy? NoSQL MongoDB Elasticsearch Cassandra Neo4J Hazelcast Apache Lucene Logstash Kibana Beats X-Pack ELK Stack Metrics APM (Application Performance Monitoring) GeoJSON Split Brain Elasticsearch Ingest Nodes PacketBeat Elastic Cloud Elasticon Kibana Canvas SwiftType

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Database Refactoring Patterns with Pramod Sadalage - Episode 22

2018-03-12 · Data Engineering Podcast Listen

podcast_episode

by Pramod Sadalage , Tobias Macey

Agile/Scrum CI/CD Data Engineering Data Management DevOps Docker DWH GitHub Java Linux Neo4j NoSQL +1 more

Summary

As software lifecycles move faster, the database needs to be able to keep up. Practices such as version controlled migration scripts and iterative schema evolution provide the necessary mechanisms to ensure that your data layer is as agile as your application. Pramod Sadalage saw the need for these capabilities during the early days of the introduction of modern development practices and co-authored a book to codify a large number of patterns to aid practitioners, and in this episode he reflects on the current state of affairs and how things have changed over the past 12 years.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Pramod Sadalage about refactoring databases and integrating database design into an iterative development workflow.

Interview

Introduction How did you get involved in the area of data management? You first co-authored Refactoring Databases in 2006. What was the state of software and database system development at the time and why did you find it necessary to write a book on this subject? What are the characteristics of a database that make them more difficult to manage in an iterative context? How does the practice of refactoring in the context of a database compare to that of software? How has the prevalence of data abstractions such as ORMs or ODMs impacted the practice of schema design and evolution? Is there a difference in strategy when refactoring the data layer of a system when using a non-relational storage system? How has the DevOps movement and the increased focus on automation affected the state of the art in database versioning and evolution? What have you found to be the most problematic aspects of databases when trying to evolve the functionality of a system? Looking back over the past 12 years, what has changed in the areas of database design and evolution?

How has the landscape of tooling for managing and applying database versioning changed since you first wrote Refactoring Databases? What do you see as the biggest challenges facing us over the next few years?

Contact Info

Website pramodsadalage on GitHub @pramodsadalage on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Database Refactoring

Website Book

Thoughtworks Martin Fowler Agile Software Development XP (Extreme Programming) Continuous Integration

The Book Wikipedia

Test First Development DDL (Data Definition Language) DML (Data Modification Language) DevOps Flyway Liquibase DBMaintain Hibernate SQLAlchemy ORM (Object Relational Mapper) ODM (Object Document Mapper) NoSQL Document Database MongoDB OrientDB CouchBase CassandraDB Neo4j ArangoDB Unit Testing Integration Testing OLAP (On-Line Analytical Processing) OLTP (On-Line Transaction Processing) Data Warehouse Docker QA==Quality Assurance HIPAA (Health Insurance Portability and Accountability Act) PCI DSS (Payment Card Industry Data Security Standard) Polyglot Persistence Toplink Java ORM Ruby on Rails ActiveRecord Gem

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

2018-02-11 · Data Engineering Podcast Listen

podcast_episode

by Mike Freedman (Timescale) , Ajay Kulkarni (Timescale) , Tobias Macey

Amazon RDS Azure Cloud Computing Cloudflare Data Engineering Data Management Databricks DevOps Docker ELK GCP GitHub +14 more

Summary

As communications between machines become more commonplace the need to store the generated data in a time-oriented manner increases. The market for timeseries data stores has many contenders, but they are not all built to solve the same problems or to scale in the same manner. In this episode the founders of TimescaleDB, Ajay Kulkarni and Mike Freedman, discuss how Timescale was started, the problems that it solves, and how it works under the covers. They also explain how you can start using it in your infrastructure and their plans for the future.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Ajay Kulkarni and Mike Freedman about TimescaleDB, a scalable timeseries database built on top of PostgreSQL.

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what Timescale is and how the project got started? The landscape of time series databases is extensive and oftentimes difficult to navigate. How do you view your position in that market and what makes Timescale stand out from the other options? In your blog post that explains the design decisions for how Timescale is implemented you call out the fact that the inserted data is largely append only which simplifies the index management. How does Timescale handle out of order timestamps, such as from infrequently connected sensors or mobile devices? How is Timescale implemented and how has the internal architecture evolved since you first started working on it?

What impact has the 10.0 release of PostgreSQL had on the design of the project? Is Timescale compatible with systems such as Amazon RDS or Google Cloud SQL?

For someone who wants to start using Timescale what is involved in deploying and maintaining it? What are the axes for scaling Timescale and what are the points where that scalability breaks down?

Are you aware of anyone who has deployed it on top of Citus for scaling horizontally across instances?

What has been the most challenging aspect of building and marketing Timescale? When is Timescale the wrong tool to use for time series data? One of the use cases that you call out on your website is for systems metrics and monitoring. How does Timescale fit into that ecosystem and can it be used along with tools such as Graphite or Prometheus? What are some of the most interesting uses of Timescale that you have seen? Which came first, Timescale the business or Timescale the database, and what is your strategy for ensuring that the open source project and the company around it both maintain their health? What features or improvements do you have planned for future releases of Timescale?

Contact Info

Ajay

LinkedIn @acoustik on Twitter Timescale Blog

Mike

Website LinkedIn @michaelfreedman on Twitter Timescale Blog

Timescale

Website @timescaledb on Twitter GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Timescale PostgreSQL Citus Timescale Design Blog Post MIT NYU Stanford SDN Princeton Machine Data Timeseries Data List of Timeseries Databases NoSQL Online Transaction Processing (OLTP) Object Relational Mapper (ORM) Grafana Tableau Kafka When Boring Is Awesome PostgreSQL RDS Google Cloud SQL Azure DB Docker Continuous Aggregates Streaming Replication PGPool II Kubernetes Docker Swarm Citus Data

Website Data Engineering Podcast Interview

Database Indexing B-Tree Index GIN Index GIST Index STE Energy Redis Graphite Prometheus pg_prometheus OpenMetrics Standard Proposal Timescale Parallel Copy Hadoop PostGIS KDB+ DevOps Internet of Things MongoDB Elastic DataBricks Apache Spark Confluent New Enterprise Associates MapD Benchmark Ventures Hortonworks 2σ Ventures CockroachDB Cloudflare EMC Timescale Blog: Why SQL is beating NoSQL, and what this means for the future of data

Mastering MongoDB 3.x

2017-11-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Giamas

Big Data Cloud Computing NoSQL data data-engineering nosql-databases

"Mastering MongoDB 3.x" is your comprehensive guide to mastering the world of MongoDB, the leading NoSQL database. This book equips you with both foundational and advanced skills to effectively design, develop, and manage MongoDB-powered applications. Discover how to build fault-tolerant systems and dive deep into database internals, deployment strategies, and much more. What this Book will help me do Gain expertise in advanced querying using indexing and data expressions for efficient data retrieval. Master MongoDB administration for both on-premise and cloud-based environments efficiently. Learn data sharding and replication techniques to ensure scalability and fault tolerance. Understand the intricacies of MongoDB internals, including performance optimization techniques. Leverage MongoDB for big data processing by integrating with complex data pipelines. Author(s) Alex Giamas is a seasoned database developer and administrator with strong expertise in NoSQL technologies, particularly MongoDB. With years of experience guiding teams on creating and optimizing database structures, Alex ensures clear and practical methods for learning the essential aspects of MongoDB. His writing focuses on actionable knowledge and practical solutions for modern database challenges. Who is it for? This book is perfect for database developers, system architects, and administrators who are already familiar with database concepts and are looking to deepen their knowledge in NoSQL databases, specifically MongoDB. Whether you're working on building web applications, scaling data systems, or ensuring fault tolerance, this book provides the guidance to optimize your database management skill set.

MongoDB Administrator's Guide

2017-10-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Cyrus Dasadia

NoSQL Cyber Security data data-engineering nosql-databases

The "MongoDB Administrator's Guide" is an indispensable resource for database administrators and developers looking to gain mastery over administering MongoDB systems. This book offers over 100 practical recipes, designed to simplify the tasks of maintaining, optimizing, and securing MongoDB deployments. What this Book will help me do Deploy and configure production-grade MongoDB environments efficiently. Manage and optimize MongoDB indexing to improve query performance. Implement and maintain high availability through replication and sharding. Ensure database security with robust authentication and authorization. Perform efficient backups, recovery, and database performance monitoring. Author(s) Cyrus Dasadia is a seasoned MongoDB expert with extensive experience in database administration and optimization. Having worked extensively in developing and managing high-performance database systems, Dasadia takes a hands-on and practical approach in his writing. His aim is to guide readers to effectively solve real-world database challenges with MongoDB. Who is it for? This book is ideal for database administrators with a foundational understanding of MongoDB, as well as developers aiming to enhance their administration skills in this NoSQL ecosystem. Whether you're seeking best practices for routine tasks or scalable solutions for enterprise-level applications, this guide has comprehensive coverage tailored for you.

Web Development with MongoDB and Node - Third Edition

2017-09-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bruno Joseph D'mello , Jason Krol , Mithun Satheesh

AWS Azure Cloud Computing HTML JavaScript Microsoft data data-engineering nosql-databases

Explore the power of combining Node.js and MongoDB to build modern, scalable web applications in 'Web Development with MongoDB and Node.' You'll not only learn how to integrate these two technologies effectively, but you'll also gain practical insights into using modern frameworks like Express and Angular to build feature-rich web apps. What this Book will help me do Master core concepts of Node.js and MongoDB for efficient web development. Learn to build and configure a web server using the Express.js framework. Implement data persistence with MongoDB using the Mongoose ODM library. Automate testing using tools like Mocha and streamline workflows with Gulp. Deploy applications to cloud platforms like Heroku, AWS, and Microsoft Azure. Author(s) Jason Krol and Bruno Joseph D'mello, along with Mithun Satheesh, bring extensive experience in web development and technical writing to this book. The authors have collectively worked on cutting-edge web technologies for years and are passionate about sharing their expertise to help developers create efficient web applications. Who is it for? This book is perfect for JavaScript developers at any proficiency level who are looking to expand their skills into full-stack development with Node.js and MongoDB. Even if you have only a basic understanding of JavaScript and HTML, this book will guide you through building complete web applications from scratch. If you're eager to learn and create performant, scalable web apps, this book is for you.

Astronomer with Ry Walker - Episode 6

2017-08-06 · Data Engineering Podcast Listen

podcast_episode

by Ry Walker (Astronomer) , Tobias Macey

Airflow Flink API Astronomer AWS Kinesis AWS Lambda Data Engineering Data Management Docker Druid ELK +12 more

Summary

Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform that lets you skip straight to processing your valuable business data. Ry Walker, the CEO of Astronomer, explains how the company got started, how the platform works, and their commitment to open source.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.dataengineeringpodcast.com/linode?utm_source=rss&utm_medium=rss and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. This is your host Tobias Macey and today I’m interviewing Ry Walker, CEO of Astronomer, the platform for data engineering.

Interview

Introduction How did you first get involved in the area of data management? What is Astronomer and how did it get started? Regulatory challenges of processing other people’s data What does your data pipelining architecture look like? What are the most challenging aspects of building a general purpose data management environment? What are some of the most significant sources of technical debt in your platform? Can you share some of the failures that you have encountered while architecting or building your platform and company and how you overcame them? There are certain areas of the overall data engineering workflow that are well defined and have numerous tools to choose from. What are some of the unsolved problems in data management? What are some of the most interesting or unexpected uses of your platform that you are aware of?

Contact Information

Email @rywalker on Twitter

Links

Astronomer Kiss Metrics Segment Marketing tools chart Clickstream HIPAA FERPA PCI Mesos Mesos DC/OS Airflow SSIS Marathon Prometheus Grafana Terraform Kafka Spark ELK Stack React GraphQL PostgreSQL MongoDB Ceph Druid Aries Vault Adapter Pattern Docker Kinesis API Gateway Kong AWS Lambda Flink Redshift NOAA Informatica SnapLogic Meteor

JSON at Work

2017-07-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tom Marrs

API Java JavaScript JSON JSON Schema Kafka data data-engineering storage-formats

JSON is becoming the backbone for meaningful data interchange over the internet. This format is now supported by an entire ecosystem of standards, tools, and technologies for building truly elegant, useful, and efficient applications. With this hands-on guide, author and architect Tom Marrs shows you how to build enterprise-class applications and services by leveraging JSON tooling and message/document design. JSON at Work provides application architects and developers with guidelines, best practices, and use cases, along with lots of real-world examples and code samples. You’ll start with a comprehensive JSON overview, explore the JSON ecosystem, and then dive into JSON’s use in the enterprise. Get acquainted with JSON basics and learn how to model JSON data. Learn how to use JSON with Node.js, Ruby on Rails, and Java. Structure JSON documents with JSON Schema to design and test APIs. Search the contents of JSON documents with JSON Search tools. Convert JSON documents to other data formats with JSON Transform tools. Compare JSON-based hypermedia formats, including HAL and jsonapi. Leverage MongoDB to store and access JSON documents. Use Apache Kafka to exchange JSON-based messages between services.

Agile Data Science 2.0

2017-06-13 · O'Reilly Data Science Books O'Reilly Amazon

book

by Russell Jurney

Agile/Scrum Airflow Analytics Data Science ELK JavaScript Kafka Python Scikit-learn Spark data data-science

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools. Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization. Build value from your data in a series of agile sprints, using the data-value pyramid. Extract features for statistical models from a single dataset. Visualize data with charts, and expose different aspects through interactive reports. Use historical data to predict the future via classification and regression. Translate predictions into actions. Get feedback from users after each sprint to keep your project on track.

Pro Tableau: A Step-by-Step Guide

2016-12-23 · O'Reilly Data Science Books O'Reilly Amazon

book

by Seema Acharya , Subhashini Chellappan

Analytics BI Cassandra Data Science Microsoft MySQL NoSQL RDBMS SQL SQL Server Tableau data +3 more

Leverage the power of visualization in business intelligence and data science to make quicker and better decisions. Use statistics and data mining to make compelling and interactive dashboards. This book will help those familiar with Tableau software chart their journey to being a visualization expert. Pro Tableau demonstrates the power of visual analytics and teaches you how to: Connect to various data sources such as spreadsheets, text files, relational databases (Microsoft SQL Server, MySQL, etc.), non-relational databases (NoSQL such as MongoDB, Cassandra), R data files, etc. Write your own custom SQL, etc. Perform statistical analysis in Tableau using R. Use a multitude of charts (pie, bar, stacked bar, line, scatter plots, dual axis, histograms, heat maps, tree maps, highlight tables, box and whisker, etc.). What you'll learn Connect to various data sources such as relational databases (Microsoft SQL Server, MySQL), non-relational databases (NoSQL such as MongoDB, Cassandra), write your own custom SQL, join and blend data sources, etc. Leverage table calculations (moving average, year over year growth, LOD (Level of Detail), etc.). Integrate Tableau with R. Tell a compelling story with data by creating highly interactive dashboards. Who this book is for All levels of IT professionals, from executives responsible for determining IT strategies to systems administrators, to data analysts, to decision makers responsible for driving strategic initiatives, etc. The book will help those familiar with Tableau software chart their journey to being a visualization expert.

Advanced R: Data Programming and the Cloud

2016-11-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Joshua F. Wiley , Matt Wiley

Cloud Computing Data Management GitHub data data-science data-science-tools r

Program for data analysis using R and learn practical skills to make your work more efficient. This book covers how to automate running code and the creation of reports to share your results, as well as writing functions and packages. Advanced R is not designed to teach advanced R programming nor to teach the theory behind statistical procedures. Rather, it is designed to be a practical guide moving beyond merely using R to programming in R to automate tasks. This book will show you how to manipulate data in modern R structures and includes connecting R to databases such as SQLite, PostgreSQL, and MongoDB. The book closes with a hands-on section to get R running in the cloud. Each chapter also includes a detailed bibliography with references to research articles and other resources that cover relevant conceptual and theoretical topics. What You Will Learn Write and document R functions. Make an R package and share it via GitHub or privately. Add tests to R code to ensure it works as intended. Build packages automatically with GitHub. Use R to talk directly to databases and do complex data management. Run R in the Amazon cloud. Generate presentation-ready tables and reports using R. Who This Book Is For Working professionals, researchers, or students who are familiar with R and basic statistical techniques such as linear regression and who want to learn how to take their R coding and programming to the next level.

Beginning Hibernate: For Hibernate 5

2016-11-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dave Minter , Joseph Ottinger , Jeff Linwood

Big Data Java NoSQL data data-engineering database-management-tools hibernate object-relational-mapping

Get started with the Hibernate 5 persistence layer and gain a clear introduction to the current standard for object-relational persistence in Java. This updated edition includes the new Hibernate 5.0 framework as well as coverage of NoSQL, MongoDB, and other related technologies, ranging from applications to big data. Beginning Hibernate is ideal if you're experienced in Java with databases (the traditional, or connected, approach), but new to open-source, lightweight Hibernate. The book keeps its focus on Hibernate without wasting time on nonessential third-party tools, so you'll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger with Dave Minter and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. They present their material in a lively, example-based manner, not a dry, theoretical, hard-to-read fashion. What You'll Learn Build enterprise Java-based transaction-type applications that access complex data with Hibernate. Work with Hibernate 5 using a present-day build process. Use Java 8 features with Hibernate. Integrate into the persistence life cycle. Map using Java's annotations. Search and query with the new version of Hibernate. Integrate with MongoDB using NoSQL. Keep track of versioned data with Hibernate Envers. Who This Book Is For Experienced Java developers interested in learning how to use and apply object-relational persistence in Java and who are new to the Hibernate persistence framework.

Practical Data Analysis - Second Edition

2016-09-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Hector Cuesta , Dr. Sampath Kumar

AI/ML Big Data Pandas Spark data data-science data-science-tasks exploratory-data-analysis

Practical Data Analysis provides a hands-on guide to mastering essential data analysis techniques using tools like Pandas, MongoDB, and Apache Spark. With step-by-step instructions, you'll explore how to process diverse data types, apply machine learning methods, and uncover actionable insights that can drive innovative projects and business solutions. What this Book will help me do Master data acquisition, formatting, and visualization techniques to prepare your data for analysis. Understand and apply machine learning algorithms for tasks like classification and forecasting. Learn to analyze textual data, such as performing sentiment analysis and text classification. Effectively work with databases using tools like MongoDB and handle big data with Apache Spark. Develop data-driven applications using real-world examples like image similarity searches and social network graph analysis. Author(s) Hector Cuesta and Dr. Sampath Kumar are experienced data scientists and educators. They have considerable experience applying data analysis techniques in various domains and a passion for teaching these skills. Their practical approach to data analysis ensures an engaging learning experience for readers. Who is it for? This book is ideal for developers and data enthusiasts aiming to incorporate practical data analysis into their projects. It is perfectly suited for readers with basic programming, statistics, and linear algebra knowledge. Even if you're new to professional data analysis, you'll find the step-by-step examples approachable. This book guides you in transforming raw data into valuable insights.

Mastering Redis

2016-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeremy Nelson (Insight)

ELK NoSQL Redis data data-engineering nosql-databases

"Mastering Redis" is your comprehensive guide to truly leveraging the power of the Redis data structure server. This hands-on resource offers detailed insights into scaling data with Redis clusters, optimizing memory, scripting with Lua, and integrating Redis with other NoSQL technologies to create robust, efficient applications. What this Book will help me do Select and utilize the appropriate Redis data structure to solve specific use cases efficiently. Implement Lua scripts on Redis for complex workflows and custom functionality. Optimize Redis configurations to achieve efficient memory usage and server performance. Integrate Redis with other NoSQL databases, such as MongoDB and Elasticsearch, for enhanced capabilities. Set up Redis Clusters and use Redis Sentinel for distributed and highly available setups. Author(s) Vidyasagar N V and Jeremy Nelson bring a wealth of expertise in software development and distributed systems to this book. Vidyasagar has extensive hands-on experience with Redis, enabling him to provide practical insights and best practices. Nelson complements this with deep knowledge of database optimization, making their combined perspective invaluable for anyone diving deep into Redis. Who is it for? This book is aimed at software developers who have an understanding of Redis basics and want to advance their proficiency. It is also targeted at developers aiming to implement Redis in production efficiently. By reading this book, readers will deepen their Redis skills and learn how to integrate it with other technologies to develop scalable, high-performance applications.

MongoDB in Action, Second Edition

2016-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Douglas Garrett, Shaun Verch, Kyle Banker, Tim Hawkins, Peter Bakkum

Analytics Big Data Data Modelling NoSQL data data-engineering nosql-databases

GET MORE WITH MANNING

An eBook copy of the previous edition, MongoDB in Action (First Edition), is included at no additional cost. It will be automatically added to your Manning Bookshelf within 24 hours of purchase.

MongoDB in Action, Second Edition is a completely revised and updated version. It introduces MongoDB 3.0 and the document-oriented database model. This perfectly paced book gives you both the big picture you'll need as a developer and enough low-level detail to satisfy system engineers.

About the Technology

This document-oriented database was built for high availability, supports rich, dynamic schemas, and lets you easily distribute data across multiple servers. MongoDB 3.0 is flexible, scalable, and very fast, even with big data loads.

About the Book

Lots of examples will help you develop confidence in the crucial area of data modeling. You'll also love the deep explanations of each feature, including replication, auto-sharding, and deployment.

What's Inside

Indexes, queries, and standard DB operations
Aggregation and text searching
Map-reduce for custom aggregations and reporting
Deploying for scale and high availability
Updated for Mongo 3.0

About the Reader

Written for developers. No previous MongoDB or NoSQL experience is assumed.

About the Authors

After working at MongoDB, Kyle Banker is now at a startup. Peter Bakkum is a developer with MongoDB expertise. Shaun Verch has worked on the core server team at MongoDB. A Genentech engineer, Doug Garrett is one of the winners of the MongoDB Innovation Award for Analytics. A software architect, Tim Hawkins has led search engineering at Yahoo Europe.

Technical Contributor: Wouter Thielen
Technical Editor: Mihalis Tsoukalos

Quotes

A thorough manual for learning, practicing, and implementing MongoDB - Jeet Marwah, Acer Inc.
A must-read to properly use MongoDB and model your data in the best possible way. - Hernan Garcia, Betterez Inc.
Provides all the necessary details to get you jump-started with MongoDB. - Gregor Zurowski, Independent Software Development Consultant
Awesome! MongoDB in a nutshell. - Hardy Ferentschik, Red Hat

MongoDB Cookbook - Second Edition

2016-01-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Amol Nayak, Cyrus Dasadia

Cloud Computing Data Management Hadoop Java NoSQL Python data data-engineering nosql-databases

Designed to help developers and administrators harness the full potential of MongoDB, this book provides clear instruction and practical guidance no matter your level. By exploring both fundamental aspects like installation and configuration, and advanced topics like using cloud services, this book serves as a comprehensive reference for anyone navigating the NoSQL database capabilities of MongoDB.

What this Book will help me do

Understand how to install and configure MongoDB for different environments, enabling efficient setup and operation.
Master database administration skills, including monitoring and backup strategies, which are essential for stability and performance.
Develop applications with MongoDB using Java and Python, allowing integration into modern tech stacks.
Leverage advanced querying and indexing techniques, improving data retrieval and operational efficiency.
Integrate MongoDB with cloud platforms and tools like Hadoop, enhancing scalability and expanding use cases.

Author(s)

Cyrus Dasadia and Amol Nayak are seasoned database professionals with extensive experience in MongoDB and NoSQL database systems. Their practical approach to technical writing focuses on real-world applications and providing solutions to complex challenges. With backgrounds in software development and data management, they ensure that readers have a hands-on learning experience. Their passion for spreading knowledge makes this book both instructional and engaging.

Who is it for?

This book is ideal for database administrators and software developers interested in adopting or expanding their knowledge of MongoDB. Whether you're a complete novice or someone with experience who seeks hands-on solutions and examples, this book offers value. It's particularly suited for professionals working with Java or Python, as examples focus on these programming languages. Whether you're enhancing your skills for personal projects or looking to implement MongoDB at work, this resource equips you with the know-how.

Practical MongoDB: Architecting, Developing, and Administering MongoDB

2015-12-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shakuntala Gupta Edward, Navin Sabharwal

Data Modelling NoSQL data data-engineering nosql-databases

Practical Guide to MongoDB: Architecting, Developing, and Administering MongoDB begins with a short introduction to the basics of NoSQL databases and then introduces readers to MongoDB, the leading document-based NoSQL database, acquainting them step-by-step with all aspects of MongoDB. Practical Guide to MongoDB covers the data model, underlying architecture, coding with Mongo Shell, and administering the MongoDB platform, among other topics. The book also provides clear guidelines and practical examples for architecting, developing, and deploying applications using the MongoDB platform. Database developers, architects, and administrators will find useful information covering all aspects of the MongoDB platform and how to put it to use practically.

The "one-size-fits-all" thinking regarding traditional RDBMSs has been challenged in the last few years by the emergence of diversified NoSQL databases. More than 120 NoSQL databases are now available in the market, and the leader by far is MongoDB. With so many companies opting for MongoDB as their NoSQL database of choice, there's a need for a practical how-to combined with expert advice for getting the most out of the software.

Practical Guide to MongoDB provides readers with: a solid understanding of NoSQL databases; an understanding of how to get started with MongoDB; methodical coverage of the architecture, development, and administration of MongoDB; and a plethora of how-tos enabling you to use the technology most efficiently to solve the problems you face.

Python Business Intelligence Cookbook

2015-12-22 · O'Reilly Business Intelligence Books O'Reilly Amazon

book

by Robert Dempsey

BI Matplotlib NoSQL Pandas Python programming-languages software-development

Learn how to harness Python for business intelligence tasks with the 'Python Business Intelligence Cookbook.' This guide provides practical recipes that help transform raw data into actionable insights for better decision-making. From preparing and analyzing to visualizing data, you will acquire useful skills for implementing efficient BI systems within your organization.

What this Book will help me do

Master installing and setting up tools like Anaconda and MongoDB for BI work.
Prepare datasets by cleaning, standardizing, and extracting essential data.
Use Pandas and NoSQL databases to analyze data and extract insights.
Build business dashboards utilizing visualization tools like Matplotlib.
Gain the ability to create complete BI systems for various business needs.

Author(s)

Robert Dempsey has extensive experience in Python programming and data analysis. With a passion for teaching and applied business intelligence, Dempsey writes in a straightforward and approachable style, making complex topics accessible to readers. The recipes compiled in this book are built to be both practical and intuitive.

Who is it for?

This book is ideal for data analysts, managers, and professionals who have a basic understanding of Python and want to apply it to business intelligence tasks. It's also helpful for those familiar with BI concepts looking to enhance or modernize their workflows with Python-based tools. If you're seeking to gain actionable insights from data in your business, this book is for you.

The Definitive Guide to MongoDB: A complete guide to dealing with Big Data using MongoDB, Third Edition

2015-12-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by David Hows, Eelco Plugge, Peter Membrey, Tim Hawkins

Big Data JavaScript NoSQL Python data data-engineering nosql-databases

The Definitive Guide to MongoDB, Third Edition, is updated for MongoDB 3 and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. The Third Edition also now includes Node.js along with Python. MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen and experienced MongoDB authors Peter Membrey and Eelco Plugge share their expertise and experience to teach you everything you need to know to become a MongoDB pro.

Web Development with MongoDB and NodeJS - Second Edition

2015-10-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bruno Joseph D'mello, Jason Krol, Mithun Satheesh

API AWS Azure Cloud Computing HTML JavaScript data data-engineering nosql-databases

Discover how to build a full-featured, interactive web application from scratch using Node.js and MongoDB in this comprehensive guide. You will learn to set up your development environment, create a web server with Express.js, and integrate MongoDB for data persistence. By the end of this book, you will have the knowledge and skills to develop and deploy robust web applications ready for the cloud.

What this Book will help me do

Set up a Node.js development environment and connect it to MongoDB.
Develop a web server using Express.js and write integrated APIs.
Implement dynamic HTML pages leveraging the Handlebars template engine.
Build efficient and scalable data-driven features using Mongoose ODM.
Deploy web applications to cloud platforms like Heroku, AWS, or Azure.

Author(s)

This book was co-authored by Mithun Satheesh, Bruno Joseph D'mello, and Jason Krol, who bring years of experience in software development and expertise in modern web technologies. With a focus on practical application and best practices, the authors aim to empower readers to succeed in real-world development projects using the Node.js and MongoDB stack.

Who is it for?

This book is tailored for developers who have a basic understanding of JavaScript and HTML and wish to advance their web development skills. If you are motivated to learn how to leverage Node.js and MongoDB for full-stack development or are curious about building and deploying complete web applications, this book is ideal for you. It addresses learners from early career to experienced developers looking to strengthen their skills in modern development stacks.
