talk-data.com

Topic: BigQuery (Google BigQuery)

Tags: data_warehouse · analytics · google_cloud · olap

315 activities tagged

Activity Trend: peak of 17 activities per quarter, 2020-Q1 through 2026-Q1

Activities

315 activities · Newest first

Google BigQuery Analytics

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets. Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.

- Features a companion website that includes all code and data sets from the book
- Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
- Includes web application examples coded in Python
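For a flavor of what "writing code to communicate with the BigQuery API" looks like today, here is a minimal sketch using the current google-cloud-bigquery Python client against a public dataset; the client library postdates the book's own examples, and the project ID is a placeholder:

```python
# Minimal sketch: running a query through the BigQuery API with the
# google-cloud-bigquery client. The project ID is a placeholder;
# credentials are resolved from the environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# query() submits the job; result() blocks until it completes and
# returns an iterator of Row objects.
for row in client.query(query).result():
    print(f"{row.name}: {row.total}")
```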

Data Just Right: Introduction to Large-Scale Data & Analytics

Making Big Data Work: Real-World Use Cases and Examples, Practical Code, Detailed Solutions. Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on "Big Data" have been little more than business polemics or product catalogs. Data Just Right is different: it's a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist. Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that's where you can derive the most value. Manoochehri shows how to address each of today's key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You'll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today's leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.

Coverage includes:
- Mastering the four guiding principles of Big Data success, and avoiding common pitfalls
- Emphasizing collaboration and avoiding problems with siloed data
- Hosting and sharing multi-terabyte datasets efficiently and economically
- "Building for infinity" to support rapid growth
- Developing a NoSQL Web app with Redis to collect crowd-sourced data
- Running distributed queries over massive datasets with Hadoop, Hive, and Shark
- Building a data dashboard with Google BigQuery
- Exploring large datasets with advanced visualization
- Implementing efficient pipelines for transforming immense amounts of data
- Automating complex processing with Apache Pig and the Cascading Java library
- Applying machine learning to classify, recommend, and predict incoming information
- Using R to perform statistical analysis on massive datasets
- Building highly efficient analytics workflows with Python and Pandas
- Establishing sensible purchasing strategies: when to build, buy, or outsource
- Previewing emerging trends and convergences in scalable data technologies and the evolving role of the Data Scientist

Routine tasks such as data wrangling and pipeline maintenance often keep data teams from higher-value analysis and insights-led decision-making. This session showcases how intelligent data agents in BigQuery can help automate complex data engineering tasks. You'll learn how to use natural language prompts to streamline work across ingestion and transformation, from data cleaning and formatting to loading results into BigQuery tables, accelerating the time it takes to build and validate data pipelines.

Businesses need to predict what customers want and create personalized experiences to gain a competitive advantage and drive revenue. They need to deliver customized, tailored interactions that increase customer acquisition, loyalty, and satisfaction. Join Fullstory's Head of Data Products to learn how data and engineering teams can supercharge tools like Dialogflow and BigQuery with unprecedented behavioral data to accurately forecast and create experiences that outpace the competition and keep customers coming back for more. By attending this session, your contact information may be shared with the sponsor for relevant follow-up for this event only.

Big Data is Dead: Long Live Hot Data 🔥

Over the last decade, Big Data was everywhere. Let's set the record straight on what is and isn't Big Data. We've been consumed by a conversation about data volumes when we should be focused on the more immediate task at hand: simplifying our work.

Some of us may have Big Data, but our quest to derive insights from it is measured in small slices of work that fit on your laptop or in your hand. Easy data is here, so let's make the most of it.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-is-dead/
Small Data Manifesto: https://motherduck.com/blog/small-data-manifesto/
Small Data SF: https://www.smalldatasf.com/

Explore the "Small Data" movement, a counter-narrative to the prevailing big data conference hype. This talk challenges the assumption that data scale is the most important feature of every workload, defining big data as any dataset too large for a single machine. We'll unpack why this distinction is crucial for modern data engineering and analytics, setting the stage for a new perspective on data architecture.

Delve into the history of big data systems, starting with the non-linear hardware costs that plagued early data practitioners. Discover how Google's foundational papers on GFS, MapReduce, and Bigtable led to the creation of Hadoop, fundamentally changing how we scale data processing. We'll break down the "big data tax"—the inherent latency and system complexity overhead required for distributed systems to function, a critical concept for anyone evaluating data platforms.

Learn about the architectural cornerstone of the modern cloud data warehouse: the separation of storage and compute. This design, popularized by systems like Snowflake and Google BigQuery, allows storage to scale almost infinitely while compute resources are provisioned on-demand. Understand how this model paved the way for massive data lakes but also introduced new complexities and cost considerations that are often overlooked.
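As a hedged illustration of that separation using the Python client (project ID and bucket path are hypothetical placeholders): the data sits in object storage, and compute is attached only while the query runs.

```python
# Sketch of storage/compute separation: Parquet files live in Cloud
# Storage, and BigQuery provisions compute on demand to query them.
# Project ID and bucket path are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Describe files in object storage as an ad-hoc external table.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/events/*.parquet"]

job_config = bigquery.QueryJobConfig(
    table_definitions={"events": external_config}
)

# Compute exists only for the lifetime of this query; storage scales
# independently of it.
rows = client.query("SELECT COUNT(*) AS n FROM events",
                    job_config=job_config).result()
for row in rows:
    print(row.n)
```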

We examine the cracks appearing in the big data paradigm, especially for OLAP workloads. While systems like Snowflake are still dominant, the rise of powerful alternatives like DuckDB signals a shift. We reveal the hidden costs of big data analytics, exemplified by a petabyte-scale query costing nearly $6,000, and argue that for most use cases, it's too expensive to run computations over massive datasets.
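A quick back-of-the-envelope check of that figure, assuming BigQuery's published on-demand rate of $6.25 per TiB scanned (rates vary by region and edition and change over time):

```python
# Back-of-the-envelope cost of scanning a petabyte at on-demand pricing.
# The $6.25/TiB rate is an assumption based on BigQuery's published
# on-demand price; actual rates vary by region and over time.
PRICE_PER_TIB_USD = 6.25   # assumed on-demand rate
PIB_IN_TIB = 1024          # 1 PiB = 1024 TiB

cost = PRICE_PER_TIB_USD * PIB_IN_TIB
print(f"Full scan of 1 PiB: ${cost:,.0f}")  # -> $6,400
```

That lands in the same ballpark as the talk's roughly $6,000 figure.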

The key to efficient data processing isn't your total data size, but the size of your "hot data" or working set. This talk argues that the revenge of the single node is here, as modern hardware can often handle the actual data queried without the overhead of the big data tax. This is a crucial optimization technique for reducing cost and improving performance in any data warehouse.
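Here is a minimal sketch of the working-set idea with DuckDB, assuming a hypothetical hive-partitioned Parquet layout: the filter on the partition column means only the hot slice is ever read, regardless of how large the full dataset is.

```python
# Sketch: querying only the "hot" working set on a single machine.
# The events/day=.../part.parquet layout is a hypothetical
# hive-partitioned dataset; partition pruning on `day` means only the
# recent files are actually read.
import duckdb

con = duckdb.connect()  # in-process: no cluster, no big data tax

result = con.execute(
    """
    SELECT user_id, COUNT(*) AS events
    FROM read_parquet('events/day=*/*.parquet', hive_partitioning = true)
    WHERE day >= '2024-01-01'   -- restrict the scan to the hot slice
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 20
    """
).fetchall()
print(result)
```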

Discover the core principles for designing systems in a post-big data world. We'll show that since only 1 in 500 users run true big data queries, prioritizing simplicity over premature scaling is key. For low latency, process data close to the user with tools like DuckDB and SQLite. This local-first approach offers a compelling alternative to cloud-centric models, enabling faster, more cost-effective, and innovative data architectures.
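And for the local-first pattern the talk describes, even the Python standard library gives you an embedded database running next to the user; a minimal sketch with sqlite3 (schema and rows invented for illustration):

```python
# Minimal local-first sketch: an embedded SQLite database runs inside
# the application process, next to the user, with zero network hops.
# The schema and rows are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")  # or a file on the user's device
con.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, 5.00), (3, 42.50)],
)

# The analytics query is served locally: low latency, no warehouse
# round trip, no per-TiB scan charges.
(total,) = con.execute("SELECT SUM(amount) FROM orders").fetchone()
print(f"Total order value: ${total:.2f}")
```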

In this game you will create and manage permissions for Google Cloud resources, run structured queries on BigQuery and Cloud SQL, create several VPC networks and VM instances, test connectivity across networks, and monitor a Google Compute Engine VM instance with Cloud Monitoring.

Step into Etsy’s "Museum of Extraordinary Objects" where Gemini on Vertex AI curates 100M+ unique goods from makers around the world. Discover how Google AI connects Etsy's extraordinary items with the right buyers—transforming the art of finding what you love, faster.

In this hands-on lab, you'll explore data with BigQuery's intuitive table explorer and data insight features, enabling you to gain valuable insights without writing SQL queries from scratch. Learn how to generate key insights from order item data, query location tables, and interact with your data seamlessly. By the end, you’ll be equipped to navigate complex datasets and uncover actionable insights quickly and efficiently.

If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account with both your work and personal email addresses. You will also need to authenticate your account (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite.

Is your outdated data infrastructure hindering your ability to leverage the full potential of AI and machine learning? This session explores how migrating to BigQuery can empower you to modernize your data infrastructure and unlock new opportunities for innovation with all of your data. Hear how PayPal and Intesa Sanpaolo transformed their data platforms with BigQuery to get the most value from their data lakes and warehouses, and the lessons they learned along the way.