Topic

Kafka

Apache Kafka

distributed_streaming message_queue event_streaming

Activities: 3 tagged

Activity Trend: peak of 20 per quarter, 2020-Q1 to 2026-Q2

Top Events

O'Reilly Data Engineering Books: 60
Data Engineering Podcast: 42
Databricks DATA + AI Summit 2023: 17
Data + AI Summit 2025: 16
DATA MINER Big Data Europe Conference 2020: 15
Google Cloud Next '25: 4
Big Data LDN 2025: 4
DataTalks.Club: 3
IN-PERSON! Apache Kafka® Meetup Septembre: 3
AWS re:Invent 2024: 3
O'Reilly Data Science Books: 3
Big Data LDN 2024: 3

Top Speakers

Tobias Macey: 42
Tom Scott (Streambased): 5
Olena Kutsenko (Confluent): 4
Scott Corrigan (meshIQ): 3
Scott Haines (Databricks): 3
Raúl Estrada: 3
Kir Titievsky (Google): 3
Mehreen Tahir (New Relic): 2
Gerard Maas: 2
Bill Bejeck: 2
Ajay Kulkarni (Timescale): 2
Mike Freedman (Timescale): 2

Activities

Filtering by: O'Reilly Data Science Books

Data Science on AWS

2021-04-07 · O'Reilly Data Science Books

book

by Antje Barth, Chris Fregly

AI/ML, Analytics, AWS Kinesis, Cloud Computing, Data Engineering, Data Science, NLP, Amazon SageMaker, Cyber Security, Data Streaming, data, +1 more

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment
Tie everything together into a repeatable machine learning operations pipeline
Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more

Learning Apache Drill

2018-11-02 · O'Reilly Data Science Books

book

by Paul Rogers, Charles Givre

AI/ML, Cloud Computing, CSV, Hadoop, Apache HBase, HDFS, Hive, JSON, MongoDB, Parquet, RDBMS, S3, +6 more

Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster.

In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you’ll learn how Drill helps you analyze data more effectively to drive down time to insight.

Use Drill to clean, prepare, and summarize delimited data for further analysis
Query file types including logfiles, Parquet, JSON, and other complex formats
Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL
Connect to Drill programmatically using a variety of languages
Use Drill even with challenging or ambiguous file formats
Perform sophisticated analysis by extending Drill’s functionality with user-defined functions
Facilitate data analysis for network security, image metadata, and machine learning

Agile Data Science 2.0

2017-06-13 · O'Reilly Data Science Books

book

by Russell Jurney

Agile/Scrum, Airflow, Analytics, Data Science, ELK, JavaScript, MongoDB, Python, Scikit-learn, Spark, data, data-science

Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

Build value from your data in a series of agile sprints, using the data-value pyramid
Extract features for statistical models from a single dataset
Visualize data with charts, and expose different aspects through interactive reports
Use historical data to predict the future via classification and regression
Translate predictions into actions
Get feedback from users after each sprint to keep your project on track