talk-data.com

Topic: BigQuery (Google BigQuery)

Tags: data_warehouse · analytics · google_cloud · olap

315 activities tagged

Activity Trend: peak of 17 activities per quarter, 2020-Q1 through 2026-Q1

Activities

315 activities · Newest first

Google BigQuery Analytics

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets. Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.

- Features a companion website that includes all code and data sets from the book
- Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
- Includes web application examples coded in Python
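For a flavor of what "writing code to communicate with the BigQuery API" looks like today, here is a minimal sketch using the current google-cloud-bigquery Python client against a public dataset; the client library postdates the book's own examples, and the project ID is a placeholder:

```python
# Minimal sketch: running a query through the BigQuery API with the
# google-cloud-bigquery client. The project ID is a placeholder;
# credentials are resolved from the environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# query() submits the job; result() blocks until it completes and
# returns an iterator of Row objects.
for row in client.query(query).result():
    print(f"{row.name}: {row.total}")
```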

Data Just Right: Introduction to Large-Scale Data & Analytics

Making Big Data Work: Real-World Use Cases and Examples, Practical Code, Detailed Solutions. Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on "Big Data" have been little more than business polemics or product catalogs. Data Just Right is different: it's a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist. Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that's where you can derive the most value. Manoochehri shows how to address each of today's key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You'll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today's leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.

Coverage includes:
- Mastering the four guiding principles of Big Data success, and avoiding common pitfalls
- Emphasizing collaboration and avoiding problems with siloed data
- Hosting and sharing multi-terabyte datasets efficiently and economically
- "Building for infinity" to support rapid growth
- Developing a NoSQL Web app with Redis to collect crowd-sourced data
- Running distributed queries over massive datasets with Hadoop, Hive, and Shark
- Building a data dashboard with Google BigQuery
- Exploring large datasets with advanced visualization
- Implementing efficient pipelines for transforming immense amounts of data
- Automating complex processing with Apache Pig and the Cascading Java library
- Applying machine learning to classify, recommend, and predict incoming information
- Using R to perform statistical analysis on massive datasets
- Building highly efficient analytics workflows with Python and Pandas
- Establishing sensible purchasing strategies: when to build, buy, or outsource
- Previewing emerging trends and convergences in scalable data technologies and the evolving role of the Data Scientist

Routine tasks such as data wrangling and pipeline maintenance often keep data teams from higher-value analysis and insights-led decision-making. This session showcases how intelligent data agents in BigQuery can help automate complex data engineering tasks. You'll learn how to use natural language prompts to streamline work across ingestion and transformation, from data cleaning and formatting to loading results into BigQuery tables, accelerating the time it takes to build and validate data pipelines.

Businesses need to predict what customers want and create personalized experiences to gain a competitive advantage and drive revenue. They need to deliver customized, tailored interactions that increase customer acquisition, loyalty, and satisfaction. Join Fullstory's Head of Data Products to learn how data and engineering teams can supercharge tools like Dialogflow and BigQuery with unprecedented behavioral data to accurately forecast and create experiences that outpace the competition and keep customers coming back for more. By attending this session, your contact information may be shared with the sponsor for relevant follow-up for this event only.

Big Data is Dead: Long Live Hot Data 🔥

Over the last decade, Big Data was everywhere. Let's set the record straight on what is and isn't Big Data. We've been consumed by a conversation about data volumes when we should be focused on the more immediate task at hand: simplifying our work.

Some of us may have Big Data, but our quest to derive insights from it is measured in small slices of work that fit on your laptop or in your hand. Easy data is here, so let's make the most of it.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-is-dead/
Small Data Manifesto: https://motherduck.com/blog/small-data-manifesto/
Small Data SF: https://www.smalldatasf.com/

Explore the "Small Data" movement, a counter-narrative to the prevailing big data conference hype. This talk challenges the assumption that data scale is the most important feature of every workload, defining big data as any dataset too large for a single machine. We'll unpack why this distinction is crucial for modern data engineering and analytics, setting the stage for a new perspective on data architecture.

Delve into the history of big data systems, starting with the non-linear hardware costs that plagued early data practitioners. Discover how Google's foundational papers on GFS, MapReduce, and Bigtable led to the creation of Hadoop, fundamentally changing how we scale data processing. We'll break down the "big data tax"—the inherent latency and system complexity overhead required for distributed systems to function, a critical concept for anyone evaluating data platforms.

Learn about the architectural cornerstone of the modern cloud data warehouse: the separation of storage and compute. This design, popularized by systems like Snowflake and Google BigQuery, allows storage to scale almost infinitely while compute resources are provisioned on-demand. Understand how this model paved the way for massive data lakes but also introduced new complexities and cost considerations that are often overlooked.
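As a hedged illustration of that separation using the Python client (project ID and bucket path are hypothetical placeholders): the data sits in object storage, and compute is attached only while the query runs.

```python
# Sketch of storage/compute separation: Parquet files live in Cloud
# Storage, and BigQuery provisions compute on demand to query them.
# Project ID and bucket path are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Describe files in object storage as an ad-hoc external table.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/events/*.parquet"]

job_config = bigquery.QueryJobConfig(
    table_definitions={"events": external_config}
)

# Compute exists only for the lifetime of this query; storage scales
# independently of it.
rows = client.query("SELECT COUNT(*) AS n FROM events",
                    job_config=job_config).result()
for row in rows:
    print(row.n)
```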

We examine the cracks appearing in the big data paradigm, especially for OLAP workloads. While systems like Snowflake are still dominant, the rise of powerful alternatives like DuckDB signals a shift. We reveal the hidden costs of big data analytics, exemplified by a petabyte-scale query costing nearly $6,000, and argue that for most use cases, it's too expensive to run computations over massive datasets.
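A quick back-of-the-envelope check of that figure, assuming BigQuery's published on-demand rate of $6.25 per TiB scanned (rates vary by region and edition and change over time):

```python
# Back-of-the-envelope cost of scanning a petabyte at on-demand pricing.
# The $6.25/TiB rate is an assumption based on BigQuery's published
# on-demand price; actual rates vary by region and over time.
PRICE_PER_TIB_USD = 6.25   # assumed on-demand rate
PIB_IN_TIB = 1024          # 1 PiB = 1024 TiB

cost = PRICE_PER_TIB_USD * PIB_IN_TIB
print(f"Full scan of 1 PiB: ${cost:,.0f}")  # -> $6,400
```

That lands in the same ballpark as the talk's roughly $6,000 figure.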

The key to efficient data processing isn't your total data size, but the size of your "hot data" or working set. This talk argues that the revenge of the single node is here, as modern hardware can often handle the actual data queried without the overhead of the big data tax. This is a crucial optimization technique for reducing cost and improving performance in any data warehouse.
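Here is a minimal sketch of the working-set idea with DuckDB, assuming a hypothetical hive-partitioned Parquet layout: the filter on the partition column means only the hot slice is ever read, regardless of how large the full dataset is.

```python
# Sketch: querying only the "hot" working set on a single machine.
# The events/day=.../part.parquet layout is a hypothetical
# hive-partitioned dataset; partition pruning on `day` means only the
# recent files are actually read.
import duckdb

con = duckdb.connect()  # in-process: no cluster, no big data tax

result = con.execute(
    """
    SELECT user_id, COUNT(*) AS events
    FROM read_parquet('events/day=*/*.parquet', hive_partitioning = true)
    WHERE day >= '2024-01-01'   -- restrict the scan to the hot slice
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 20
    """
).fetchall()
print(result)
```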

Discover the core principles for designing systems in a post-big data world. We'll show that since only 1 in 500 users run true big data queries, prioritizing simplicity over premature scaling is key. For low latency, process data close to the user with tools like DuckDB and SQLite. This local-first approach offers a compelling alternative to cloud-centric models, enabling faster, more cost-effective, and innovative data architectures.
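And for the local-first pattern the talk describes, even the Python standard library gives you an embedded database running next to the user; a minimal sketch with sqlite3 (schema and rows invented for illustration):

```python
# Minimal local-first sketch: an embedded SQLite database runs inside
# the application process, next to the user, with zero network hops.
# The schema and rows are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")  # or a file on the user's device
con.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, 5.00), (3, 42.50)],
)

# The analytics query is served locally: low latency, no warehouse
# round trip, no per-TiB scan charges.
(total,) = con.execute("SELECT SUM(amount) FROM orders").fetchone()
print(f"Total order value: ${total:.2f}")
```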

In this game you will create and manage permissions for Google Cloud resources, run structured queries on BigQuery and Cloud SQL, create several VPC networks and VM instances, test connectivity across networks, and monitor a Google Compute Engine VM instance with Cloud Monitoring.

Step into Etsy’s "Museum of Extraordinary Objects" where Gemini on Vertex AI curates 100M+ unique goods from makers around the world. Discover how Google AI connects Etsy's extraordinary items with the right buyers—transforming the art of finding what you love, faster.

In this hands-on lab, you'll explore data with BigQuery's intuitive table explorer and data insight features, enabling you to gain valuable insights without writing SQL queries from scratch. Learn how to generate key insights from order item data, query location tables, and interact with your data seamlessly. By the end, you’ll be equipped to navigate complex datasets and uncover actionable insights quickly and efficiently.

If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account with both your work and personal email addresses. You will also need to authenticate your account (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite.

Is your outdated data infrastructure hindering your ability to leverage the full potential of AI and machine learning? This session explores how migrating to BigQuery can empower you to modernize your data infrastructure and unlock new opportunities for innovation with all of your data. Hear how PayPal and Intesa Sanpaolo transformed their data platforms with BigQuery to get the most value from their data lakes and warehouses, and the lessons they learned along the way.