Data Streaming

Data Scientists at Work

2014-12-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Sebastian Gutierrez

AI/ML Big Data Cloud Computing Data Science DataViz Marketing data data-science data-science-as-a-profession

Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); environmental big data (Andre´ Karpis?ts?enkoEach of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion. , Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). The book features a stimulating foreword by Google's Director of Research, Peter Norvig. Data Scientists at Work parts the curtain on the interviewees’ earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.

Using Flume

2014-09-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hari Shreedharan

API ELK GitHub Hadoop Apache HBase HDFS data data-engineering log-data

How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elastic Search, and other systems. Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use-cases. You’ll learn about Flume’s design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub. Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers Dive into key Flume components, including sources that accept data and sinks that write and deliver it Write custom plugins to customize the way Flume receives, modifies, formats, and writes data Explore APIs for sending data to Flume agents from your own applications Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it’s running

Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data

2014-07-21 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Byron Ellis

Analytics Big Data Data Analytics DWH data data-engineering real-time-analytics streaming-messaging

Construct a robust end-to-end solution for analyzing and visualizing streaming data Real-time analytics is the hottest topic in data analytics today. In Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data, expert Byron Ellis teaches data analysts technologies to build an effective real-time analytics platform. This platform can then be used to make sense of the constantly changing data that is beginning to outpace traditional batch-based analysis platforms. The author is among a very few leading experts in the field. He has a prestigious background in research, development, analytics, real-time visualization, and Big Data streaming and is uniquely qualified to help you explore this revolutionary field. Moving from a description of the overall analytic architecture of real-time analytics to using specific tools to obtain targeted results, Real-Time Analytics leverages open source and modern commercial tools to construct robust, efficient systems that can provide real-time analysis in a cost-effective manner. The book includes: A deep discussion of streaming data systems and architectures Instructions for analyzing, storing, and delivering streaming data Tips on aggregating data and working with sets Information on data warehousing options and techniques Real-Time Analytics includes in-depth case studies for website analytics, Big Data, visualizing streaming and mobile data, and mining and visualizing operational data flows. The book's "recipe" layout lets readers quickly learn and implement different techniques. All of the code examples presented in the book, along with their related data sets, are available on the companion website.

Google BigQuery Analytics

2014-06-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Siddartha Naidu , Jordan Tigani (MotherDuck)

Analytics API BigQuery Hadoop Python Tableau data data-engineering google-bigquery

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results. Features a companion website that includes all code and data sets from the book Uses real-world examples to explain everything analysts need to know to effectively use BigQuery Includes web application examples coded in Python

IBM InfoSphere Streams: Accelerating Deployments with Analytic Accelerators

2014-02-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tetsuya Shimada , Robert Uleman , Oliver Brandt , Roger Rea , Bharath Devaraju , Peter Nicholls , Ankit Pasricha , John Thorson , Kevin Foster , Chris Howard , Chuck Ballard , Daniel Farrell , Sandra Tucker , Norbert Schulz

AI/ML Analytics Big Data Hadoop HDFS IBM NLP SPSS data data-engineering infosphere

This IBM® Redbooks® publication describes visual development, visualization, adapters, analytics, and accelerators for IBM InfoSphere® Streams (V3), a key component of the IBM Big Data platform. Streams was designed to analyze data in motion, and can perform analysis on incredibly high volumes with high velocity, using a wide variety of analytic functions and data types. The Visual Development environment extends Streams Studio with drag-and-drop development, provides round tripping with existing text editors, and is ideal for rapid prototyping. Adapters facilitate getting data in and out of Streams, and V3 supports WebSphere MQ, Apache Hadoop Distributed File System, and IBM InfoSphere DataStage. Significant analytics include the native Streams Processing Language, SPSS Modeler analytics, Complex Event Processing, TimeSeries Toolkit for machine learning and predictive analytics, Geospatial Toolkit for location-based applications, and Annotation Query Language for natural language processing applications. Accelerators for Social Media Analysis and Telecommunications Event Data Analysis sample programs can be modified to build production level applications. Want to learn how to analyze high volumes of streaming data or implement systems requiring high performance across nodes in a cluster? Then this book is for you. Please note that the additional material referenced in the text is not available from IBM.

Joe Celko’s Complete Guide to NoSQL

2013-10-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Joe Celko

Big Data NoSQL SQL data data-engineering nosql-databases

Joe Celko's Complete Guide to NoSQL provides a complete overview of non-relational technologies so that you can become more nimble to meet the needs of your organization. As data continues to explode and grow more complex, SQL is becoming less useful for querying data and extracting meaning. In this new world of bigger and faster data, you will need to leverage non-relational technologies to get the most out of the information you have. Learn where, when, and why the benefits of NoSQL outweigh those of SQL with Joe Celko's Complete Guide to NoSQL. This book covers three areas that make today's new data different from the data of the past: velocity, volume and variety. When information is changing faster than you can collect and query it, it simply cannot be treated the same as static data. Celko will help you understand velocity, to equip you with the tools to drink from a fire hose. Old storage and access models do not work for big data. Celko will help you understand volume, as well as different ways to store and access data such as petabytes and exabytes. Not all data can fit into a relational model, including genetic data, semantic data, and data generated by social networks. Celko will help you understand variety, as well as the alternative storage, query, and management frameworks needed by certain kinds of data. Gain a complete understanding of the situations in which SQL has more drawbacks than benefits so that you can better determine when to utilize NoSQL technologies for maximum benefit Recognize the pros and cons of columnar, streaming, and graph databases Make the transition to NoSQL with the expert guidance of best-selling SQL expert Joe Celko

Instant PostgreSQL Backup and Restore How-to

2013-03-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shaun Thomas

data data-engineering postgresql relational-databases

Are you tasked with managing and protecting your PostgreSQL databases? "Instant PostgreSQL Backup and Restore How-to" provides practical, step-by-step guidance for backing up and restoring both simple and complex PostgreSQL databases safely and efficiently. You'll learn essential skills to ensure your critical data is always secure and available. What this Book will help me do Master the process of backing up and restoring PostgreSQL databases effectively. Learn to target specific data for backup with partial dumps for higher flexibility. Utilize advanced compression techniques to optimize backup time and storage. Implement streaming replication for up-to-date standby servers. Apply file system snapshot techniques to ensure consistent online binary backups. Author(s) The authors of this book are experienced database administrators and PostgreSQL experts. They bring years of hands-on expertise in safeguarding and managing enterprise-level databases. Known for their engaging teaching style, they focus on delivering clear instructions and actionable insights to enable all database professionals to succeed with PostgreSQL. Who is it for? This book is designed for database administrators and IT professionals responsible for the durability, reliability, and recovery of data housed in PostgreSQL systems. It is well-suited for professionals ranging from beginners looking to understand PostgreSQL backup basics to experienced admins seeking to refine advanced restoration techniques. Readers should possess a basic familiarity with database concepts but do not need prior experience with PostgreSQL backup procedures.

Getting Started with Storm

2012-08-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Gabriel Eisbruch , Dario Simonassi , Jonathan Leibiusky

Analytics Big Data Java JavaScript Python Redis data data-engineering storm streaming-messaging

Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you’ll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives. Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing. Learn how to program Storm components: spouts for data input and bolts for data transformation Discover how data is exchanged between spouts and bolts in a Storm topology Make spouts fault-tolerant with several commonly used design strategies Explore bolts—their life cycle, strategies for design, and ways to implement them Scale your solution by defining each component’s level of parallelism Study a real-time web analytics system built with Node.js, a Redis server, and a Storm topology Write spouts and bolts with non-JVM languages such as Python, Ruby, and Javascript

Programming Microsoft® SQL Server® 2012

2012-07-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Leonard Lobel and Andrew Brust

Azure BI C#/.NET Cloud Computing Microsoft SQL SQL Server data data-engineering microsoft-sql-server relational-databases

Your essential guide to key programming features in Microsoft SQL Server 2012 Take your database programming skills to a new level—and build customized applications using the developer tools introduced with SQL Server 2012. This hands-on reference shows you how to design, test, and deploy SQL Server databases through tutorials, practical examples, and code samples. If you’re an experienced SQL Server developer, this book is a must-read for learning how to design and build effective SQL Server 2012 applications. Discover how to: Build and deploy databases using the SQL Server Data Tools IDE Query and manipulate complex data with powerful Transact-SQL enhancements Integrate non-relational features, including native file streaming and geospatial data types Consume data with Microsoft ADO.NET, LINQ, and Entity Framework Deliver data using Windows Communication Foundation (WCF) Data Services and WCF RIA Services Move your database to the cloud with Windows Azure SQL Database Develop Windows Phone cloud applications using SQL Data Sync Use SQL Server BI components, including xVelocity in-memory technologies

IBM InfoSphere Streams: Assembling Continuous Insight in the Information Revolution

2011-10-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Vitali N. Zoubov , Roger Rea , Deepak Rajan , Bugra Gedik , Michael P. Koranda , Kevin Foster , Chuck Ballard , Mike Spicer , Senthil Nathan , Andy Frenkiel , Brian Williams

Analytics Big Data Data Analytics HTML IBM data data-engineering infosphere

In this IBM® Redbooks® publication, we discuss and describe the positioning, functions, capabilities, and advanced programming techniques for IBM InfoSphere™ Streams (V2), a new paradigm and key component of IBM Big Data platform. Data has traditionally been stored in files or databases, and then analyzed by queries and applications. With stream computing, analysis is performed moment by moment as the data is in motion. In fact, the data might never be stored (perhaps only the analytic results). The ability to analyze data in motion is called real-time analytic processing (RTAP). IBM InfoSphere Streams takes a fundamentally different approach to Big Data analytics and differentiates itself with its distributed runtime platform, programming model, and tools for developing and debugging analytic applications that have a high volume and variety of data types. Using in-memory techniques and analyzing record by record enables high velocity. Volume, variety and velocity are the key attributes of Big Data. The data streams that are consumable by IBM InfoSphere Streams can originate from sensors, cameras, news feeds, stock tickers, and a variety of other sources, including traditional databases. It provides an execution platform and services for applications that ingest, filter, analyze, and correlate potentially massive volumes of continuous data streams. This book is intended for professionals that require an understanding of how to process high volumes of streaming data or need information about how to implement systems to satisfy those requirements. See: http://www.redbooks.ibm.com/abstracts/sg247865.html for the IBM InfoSphere Streams (V1) release.

Oracle 10g Developing Media Rich Applications

2011-04-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lynne Dunckley , Larry Guros

Java Oracle Cyber Security data data-engineering oracle-11g oracle-database-solutions

Oracle 10g Developing Media Rich Applications is focused squarely on database administrators and programmers as the foundation of multimedia database applications. With the release of Oracle8 Database in 1997, Oracle became the first commercial database with integrated multimedia technology for application developers. Since that time, Oracle has enhanced and extended these features to include native support for image, audio, video and streaming media storage; indexing, retrieval and processing in the Oracle Database, Application Server; and development tools. Databases are not only words and numbers for accountants, but they also should utilize a full range of media to satisfy customer needs, from race car engineers, to manufacturing processes to security. The full range of audio, video and integration of media into databases is mission critical to these applications. This book details the most recent features in Oracle’s multimedia technology including those of the Oracle10gR2 Database and the Oracle9i Application Server. The technology covered includes: object relational media storage and services within the database, middle tier application development interfaces, wireless delivery mechanisms, and Java-based tools. * Gives broad coverage to integration of multimedia features such as audio and instrumentation video, from race cars to analyze performance, to voice and picture recognition for security data bases. As well as full multimedia for presentations * Includes field tested examples in enterprise environments * Provides coverage in a thorough and clear fashion developed in a London University Professional Course

21 Recipes for Mining Twitter

2011-01-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by Matthew Russell (Digital Reasoning)

API Python data data-science data-science-tasks web-scraping

Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This short and concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools. Each recipe offers a discussion of how and why the solution works, so you can quickly adapt it to fit your particular needs. The recipes include techniques to: Use OAuth to access Twitter data Create and analyze graphs of retweet relationships Use the streaming API to harvest tweets in realtime Harvest and analyze friends and followers Discover friendship cliques Summarize webpages from short URLs This book is a perfect companion to O’Reilly's Mining the Social Web.

Hadoop in Action

2010-11-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Chuck Lam

API Big Data Hadoop Hive Java NoSQL data data-engineering

Hadoop in Action introduces the subject and teaches you how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming. About the Technology Big data can be difficult to handle using traditional databases. Apache Hadoop is a NoSQL applications framework that runs on distributed clusters. This lets it scale to huge datasets. If you need analytic information from your data, Hadoop's the way to go. About the Book What's Inside Introduction to MapReduce Examples illustrating ideas in practice Hadoop's Streaming API Other related tools, like Pig and Hive About the Reader This book requires basic Java skills. Knowing basic statistical concepts can help with the more advanced examples. About the Author Chuck Lam is a Senior Engineer at RockYou! He has a PhD in pattern recognition from Stanford University. Quotes A guide for beginners, a source of insight for advanced users. - Philipp K. Janert, Principal Value, LLC A nice mix of the what, why, and how of Hadoop. - Paul Stusiak, Falcon Technologies Corp. Demystifies Hadoop. A great resource! - Rick Wagner, Acxiom Corp. Covers it all! Plus, gives you sweet extras no one else does. - John S. Griffin, Overstock.com An excellent introduction to Hadoop and MapReduce. - Kenneth DeLong, BabyCenter, LLC

Knowledge Discovery from Data Streams

2010-05-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Joao Gama

data data-science data-science-tasks exploratory-data-analysis

Exploring how to extract knowledge structures from evolving and time-changing data, this book presents a coherent overview of state-of-the-art research in learning from data streams. It covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also explores advanced areas, such as ubiquitous data stream mining; addresses several challenges of data mining in the future, when stream mining will be at the core of many applications; and includes pseudo-code of more than 30 streaming-like algorithms.

Programming Microsoft® SQL Server™ 2008

2008-10-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Leonard Lobel, Andrew Brust, and Stephen Forte

Microsoft SQL XML data data-engineering microsoft-sql-server relational-databases

Extend your programming skills with a comprehensive study of the key features of SQL Server 2008. Delve into the new core capabilities, get practical guidance from expert developers, and put their code samples to work. This is a must-read for Microsoft .NET and SQL Server developers who work with data access—at the database, business logic, or presentation levels. Discover how to: Query complex data with powerful Transact-SQL enhancements Use new, non-relational features: hierarchical tables, native file streaming, and geospatial capabilities Exploit XML inside the database to design XML-aware applications Consume and deliver your data using Microsoft LINQ, Entity Framework, and data binding Implement database-level encryption and server auditing Build and maintain data warehouses Use Microsoft Excel to build front ends for OLAP cubes, and MDX to query them Integrate data mining into applications quickly and effectively. Get code samples on the Web.

Analyze customer sentiment in real time

· Google Cloud Next '25

demo

AI/ML BigQuery Cloud Computing GCP Looker product-alloydb-for-postgresql product-bigquery product-datastream product-vertex-ai

Build fully integrated streaming pipelines on Google Cloud and learn how to leverage AlloyDB, Datastream, BigQuery, Looker, and Vertex AI for real-time data analysis.

Kayak to AI: Take a virtual kayaking adventure, guided by Gemini

· Google Cloud Next '25

demo

AI/ML API LLM product-cloud-run product-gemini-20 product-pubsub product-vertex-ai-search

Kayak through Seattle’s lakes with AI! Discover how Gemini can take you where you want to go, through rapid prototyping with its function calling, streaming, and multimodal APIs. Learn how you can build your own immersive AI experiences faster than ever.

On-demand: Get Real with Real-Time Intelligence to Transform Data into Action

· Microsoft Ignite 2025

talk

Analytics

Harness the power of real-time analytics and digital twins to achieve critical operational tasks. In this lab, you'll learn how to transform physical systems into dynamic digital replicas, enhancing simulations and optimizing operations. Discover how to build end-to-end solutions for event-driven scenarios, streaming data, and data logs. These practical steps will empower you to drive smart decision-making and foster innovation within your organization.

Talk #1: 🇬🇧 Thinking Concurrently: Understanding React’s New Mental Model

· React Paris #011

talk

by ariel shulman (Factify)

React

The most important addition in React 18 is something we hope you never have to think about: concurrency. — React 18 docs Well, in this talk — we are going to think about it! React’s new mental model is not easy to wrap our heads around, and that’s where this talk comes in. We’ll explore how concurrency works in React from the ground up, what problems it solves, and how SSR and streaming components fit into the picture. By the end of the talk, you’ll understand how React’s concurrent model ties into UX principles — and how to make the most of it in your apps.

talk-data.com

Activity Trend

Top Events

Top Speakers

Data Scientists at Work

Using Flume

Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data

Google BigQuery Analytics

IBM InfoSphere Streams: Accelerating Deployments with Analytic Accelerators

Joe Celko’s Complete Guide to NoSQL

Instant PostgreSQL Backup and Restore How-to

Getting Started with Storm

Programming Microsoft® SQL Server® 2012

IBM InfoSphere Streams: Assembling Continuous Insight in the Information Revolution

Oracle 10g Developing Media Rich Applications

21 Recipes for Mining Twitter

Hadoop in Action

Knowledge Discovery from Data Streams

Programming Microsoft® SQL Server™ 2008

Analyze customer sentiment in real time

Kayak to AI: Take a virtual kayaking adventure, guided by Gemini

On-demand: Get Real with Real-Time Intelligence to Transform Data into Action

Talk #1: 🇬🇧 Thinking Concurrently: Understanding React’s New Mental Model