talk-data.com

Topic

BigQuery

Google BigQuery

data_warehouse analytics google_cloud olap

27 tagged

Activity Trend: peak of 17 per quarter, 2020-Q1 to 2026-Q1

Activities

27 activities · Newest first

The JupyterLab Extension Ecosystem: Trends & Signals from PyPI and GitHub

What does the JupyterLab extension ecosystem actually look like in 2025? While extensions drive much of JupyterLab's practical value, their overall landscape remains largely unexplored. This talk analyzes public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health: monthly downloads by category, release recency, star-download relationships, and the rise of AI-focused extensions. I will present my approach for building this analysis pipeline and offer lessons learned. Finally, I will demonstrate an open, read-only web catalog built on this data set.

Unleash the power of dbt on Google Cloud: BigQuery, Iceberg, DataFrames and beyond

The data world has long been divided, with data engineers and data scientists working in silos. This fragmentation creates a long, difficult journey from raw data to machine learning models. We've unified these worlds through the Google Cloud and dbt partnership. In this session, we'll show you an end-to-end workflow that simplifies the data-to-AI journey. The availability of dbt Cloud on Google Cloud Marketplace streamlines getting started, and its integration with BigQuery's new Apache Iceberg tables creates an open foundation. We'll also highlight how BigQuery DataFrames' integration with dbt Python models lets you perform complex data science at scale, all within a single, streamlined process. Join us to learn how to build a unified data and AI platform with dbt on Google Cloud.

How to Build an Open Lakehouse: Best Practices for Interoperability

Building an open data lakehouse? Start with the right blueprint. This session walks through common reference architectures for interoperable lakehouse deployments across AWS, Google Cloud, Azure and tools like Snowflake, BigQuery and Microsoft Fabric. Learn how to design for cross-platform data access, unify governance with Unity Catalog and ensure your stack is future-ready — no matter where your data lives.

Sponsored by: Onehouse | Open By Default, Fast By Design: One Lakehouse That Scales From BI to AI

You already see the value of the lakehouse. But are you truly maximizing its potential across all workloads, from BI to AI? In this session, Onehouse unveils how our open lakehouse architecture unifies your entire stack, enabling true interoperability across formats, catalogs, and engines. From lightning-fast ingestion at scale to cost-efficient processing and multi-catalog sync, Onehouse helps you go beyond trade-offs. Discover how Apache XTable (Incubating) enables cross-table-format compatibility, how OpenEngines puts your data in front of the best engine for the job, and how OneSync keeps data consistent across Snowflake, Athena, Redshift, BigQuery, and more. Meanwhile, our purpose-built lakehouse runtime slashes ingest and ETL costs. Whether you’re delivering BI, scaling AI, or building the next big thing, you need a lakehouse that’s open and powerful. Onehouse opens everything—so your data can power anything.

Sigma Data Apps Product Releases & Roadmap | The Data Apps Conference

Organizations today require more than dashboards—they need applications that combine insights with data collection and action capabilities to drive meaningful change. In this session, Stipo Josipovic (Director of Product) will showcase the key innovations enabling this shift, from expanded write-back capabilities to workflow automation features.

You'll learn about Sigma's growing data app capabilities, including:

Enhanced write-back features: Redshift and upcoming BigQuery support, bulk data entry, and form-based collection for structured workflows

Advanced security controls: Conditional editing and row-level security for precise data governance

Intuitive interface components: Containers, modals, and tabbed navigation for app-like experiences

Powerful Actions framework: API integrations, notifications, and automated triggers to drive business processes

This session covers both recently released features and Sigma's upcoming roadmap, including detail views, simplified form-building, and new API actions to integrate with your tech stack. Discover how Sigma helps organizations move beyond analysis to meaningful action.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial

Chloe Caron: How Well do LLMs Detect Anomalies in Your Data?

🌟 Session Overview 🌟

Session Name: How Well do LLMs Detect Anomalies in Your Data?

Speaker: Chloe Caron

Session Description: Data quality challenges can severely impact businesses, causing a reported average 12% revenue loss for US companies (according to an Experian report). In this talk, we will follow a journey into constructing an anomaly detector, exploring LLMs, prompt engineering, and data type impacts. Along this path, we will analyze the use of multiple tools, including OpenAI, BigQuery, and Mistral. By the end, you will have gained insights on how to boost the accuracy of your anomaly detector and strengthen your data quality strategy.

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀

🔗 Find Other Years' Videos:

2023 Big Data Conference Europe: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g

2022 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT

2021 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/

👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/

🐦 Twitter: @BigDataConfEU, @europe_rpa

🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/

🎥 YouTube: http://www.youtube.com/@DATAMINERLT

Karim Wadie: Fail-Safe BigQuery: Disaster Recovery Automation Techniques

🌟 Session Overview 🌟

Session Name: Fail-Safe BigQuery: Disaster Recovery Automation Techniques

Speaker: Karim Wadie

Session Description: Disaster recovery planning is critical for business continuity against unforeseen events, the most frequent of which are human errors. To guard against this, organizations need to define a backup strategy for their BigQuery tables and execute it at scale. For that, we introduce BQ Backup Manager[1], an open-source solution developed by Google Consulting Services that acts as a framework for defining varying backup policies across the organization and automating their execution on thousands of tables.

Join our session to learn more about the framework's features, architecture, and how it can immediately benefit your organization or customer.

[1] https://github.com/GoogleCloudPlatform/bq-backup-manager

Big Data is Dead: Long Live Hot Data 🔥

Over the last decade, Big Data was everywhere. Let's set the record straight on what is and isn't Big Data. We have been consumed by a conversation about data volumes when we should focus more on the immediate task at hand: Simplifying our work.

Some of us may have Big Data, but our quest to derive insights from it is measured in small slices of work that fit on a laptop or in your hand. Easy data is here; let's make the most of it.

📓 Resources

Big Data is Dead: https://motherduck.com/blog/big-data-is-dead/

Small Data Manifesto: https://motherduck.com/blog/small-data-manifesto/

Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us

LinkedIn: https://linkedin.com/company/motherduck

X/Twitter: https://twitter.com/motherduck

Blog: https://motherduck.com/blog/


Explore the "Small Data" movement, a counter-narrative to the prevailing big data conference hype. This talk challenges the assumption that data scale is the most important feature of every workload, defining big data as any dataset too large for a single machine. We'll unpack why this distinction is crucial for modern data engineering and analytics, setting the stage for a new perspective on data architecture.

Delve into the history of big data systems, starting with the non-linear hardware costs that plagued early data practitioners. Discover how Google's foundational papers on GFS, MapReduce, and Bigtable led to the creation of Hadoop, fundamentally changing how we scale data processing. We'll break down the "big data tax"—the inherent latency and system complexity overhead required for distributed systems to function, a critical concept for anyone evaluating data platforms.

Learn about the architectural cornerstone of the modern cloud data warehouse: the separation of storage and compute. This design, popularized by systems like Snowflake and Google BigQuery, allows storage to scale almost infinitely while compute resources are provisioned on-demand. Understand how this model paved the way for massive data lakes but also introduced new complexities and cost considerations that are often overlooked.

We examine the cracks appearing in the big data paradigm, especially for OLAP workloads. While systems like Snowflake are still dominant, the rise of powerful alternatives like DuckDB signals a shift. We reveal the hidden costs of big data analytics, exemplified by a petabyte-scale query costing nearly $6,000, and argue that for most use cases, it's too expensive to run computations over massive datasets.
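The cost argument above comes down to simple arithmetic for any warehouse that bills by bytes scanned. The sketch below is only an illustration: the per-TiB rate is an assumed placeholder, not a price quoted in the talk, so it will not reproduce the talk's figure exactly. It compares a full petabyte scan with a query that touches only a small "hot" slice.

```python
TIB = 2**40
PIB = 2**50
GIB = 2**30

# Assumption: a flat on-demand rate in USD per TiB scanned.
# This number is a placeholder for illustration, not a quoted list price.
ASSUMED_USD_PER_TIB = 5.0


def on_demand_scan_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Cost of a query that is billed purely by the bytes it scans."""
    return (bytes_scanned / TIB) * usd_per_tib


# Scanning the whole petabyte versus a 10 GiB working set.
full_scan = on_demand_scan_cost(1 * PIB, ASSUMED_USD_PER_TIB)
hot_slice = on_demand_scan_cost(10 * GIB, ASSUMED_USD_PER_TIB)

print(f"full petabyte scan: ${full_scan:,.2f}")
print(f"10 GiB hot slice:   ${hot_slice:,.4f}")
```

Under this assumed rate the two queries differ by roughly five orders of magnitude in cost, which is the gap the talk's "hot data" argument relies on.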

The key to efficient data processing isn't your total data size, but the size of your "hot data" or working set. This talk argues that the revenge of the single node is here, as modern hardware can often handle the actual data queried without the overhead of the big data tax. This is a crucial optimization technique for reducing cost and improving performance in any data warehouse.

Discover the core principles for designing systems in a post-big data world. We'll show that since only 1 in 500 users runs true big data queries, prioritizing simplicity over premature scaling is key. For low latency, process data close to the user with tools like DuckDB and SQLite. This local-first approach offers a compelling alternative to cloud-centric models, enabling faster, more cost-effective, and innovative data architectures.

Coalesce 2024: Needle in the (data) stack: How Spotify powers Salesforce

Spotify has absurd quantities of data. This is a huge asset, but it makes it difficult to power their frontline partnership team in Salesforce with the relevant cuts of that data they need. After struggling with both ad-hoc solutions and Salesforce consultant-led solutions, they've landed on a flexible, secure, and automated data strategy: they use dbt and Hightouch to refine critical data in Google BigQuery, sync updated records to Salesforce, and then close the loop for intelligence and analytics.

They'll share their optimal solution, with no caveats, for the real, everyday data issues that many teams encounter at scale with Salesforce.

Speaker: Tim Leonard, Sr Insights Manager, Spotify

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements

Coalesce 2024: implementing real-time slowly changing dimensions (SCD) type 2 with dbt

Learn how Hostinger, a leading European provider of web hosting solutions, leverages dbt to implement slowly changing dimensions (SCD)...and not go bankrupt while doing it at scale. They'll cover how they used the Debezium connector to tackle the challenges of change data capture (CDC), and how leveraging dynamic incremental predicates allowed them to scale their solution at a fraction of the cost using BigQuery.
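For readers new to the term, a type 2 slowly changing dimension keeps one row per version of a record, each with a validity interval. The sketch below shows that core idea in plain Python. It is not Hostinger's implementation: the names `DimensionRow` and `build_scd2` are hypothetical, and CDC ingestion, dbt, and incremental predicates are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class DimensionRow:
    key: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


def build_scd2(changes: Iterable[Tuple[str, str, datetime]]) -> List[DimensionRow]:
    """Turn (key, value, changed_at) change events into SCD type 2 rows."""
    history: List[DimensionRow] = []
    current: Dict[str, DimensionRow] = {}
    # Process events per key in time order so intervals close correctly.
    for key, value, changed_at in sorted(changes, key=lambda c: (c[0], c[2])):
        open_row = current.get(key)
        if open_row is not None:
            if open_row.value == value:
                continue  # event repeats the current value: no new version
            open_row.valid_to = changed_at  # close the previous version
        new_row = DimensionRow(key, value, changed_at)
        history.append(new_row)
        current[key] = new_row
    return history


# Hypothetical change events, as a CDC feed might deliver them.
changes = [
    ("u1", "free", datetime(2024, 1, 1)),
    ("u1", "pro", datetime(2024, 3, 1)),
    ("u2", "free", datetime(2024, 2, 1)),
    ("u1", "pro", datetime(2024, 4, 1)),  # duplicate value, ignored
]
rows = build_scd2(changes)
for row in rows:
    print(row)
```

In a warehouse the same logic is usually expressed as a merge against the existing dimension table; limiting how much of that table the merge has to read is where the cost savings described in the talk would come from.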

Speakers:

Augustinas Karvelis, Analytics Engineer, Hostinger

Valentinas Mitalauskas, Analytics Engineering Lead, Hostinger

Damian Filonowicz: Lessons Learned from the GCP Data Migration

Join Damian Filonowicz as he shares 'Lessons Learned from the GCP Data Migration.' 🌐 Discover how PAYBACK tackled challenges in shifting data to the cloud, navigated privacy regulations, and uncovered insights about Google Cloud services like Cloud Dataflow, Cloud DLP, BigQuery, and more. Gain valuable suggestions for future endeavors in this enlightening presentation! 🚀🔍 #DataMigration #GCP #lessonslearned

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Karim Wadie: Don’t Worry About BigQuery PII: How Google Cloud Helped a Large Telco Automate Data Governance at Scale

Discover how Google Cloud helped a large Telco automate data governance at scale in BigQuery with Karim Wadie. 📊🔒 Learn about the technical solution, GCP concepts, and see a live demo to fast-track your cloud journey. 🌐💡 #DataGovernance #BigQuery #googlecloud

Angelika Postaremczak: Best Practices for Storing Data in BigQuery

Join Angelika Postaremczak in an enlightening session on 'Best Practices for Storing Data in BigQuery' and discover the keys to optimizing data storage for lightning-fast queries without breaking the bank! 🚀💾 Explore table design strategies, data partitioning, clustering, and resource management through Infrastructure as Code for maximizing the potential of cloud data storage. ☁️📊 #BigQuery #datastorage
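To see why partitioning matters for cost, here is a toy model of partition pruning in plain Python. It is a sketch of the general idea only, not BigQuery's engine and not material from the session: the functions `partition_by_day` and `scan` are hypothetical, and clustering is not modeled.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def partition_by_day(rows: List[dict]) -> Dict[date, List[dict]]:
    """Group rows into one bucket per event_date, like a date-partitioned table."""
    partitions: Dict[date, List[dict]] = defaultdict(list)
    for row in rows:
        partitions[row["event_date"]].append(row)
    return partitions


def scan(partitions: Dict[date, List[dict]], start: date, end: date) -> Tuple[List[dict], int]:
    """Return matching rows and how many rows had to be read.

    Only partitions whose date falls inside [start, end] are touched.
    """
    matched: List[dict] = []
    rows_read = 0
    for day, bucket in partitions.items():
        if start <= day <= end:
            rows_read += len(bucket)
            matched.extend(bucket)
    return matched, rows_read


# Hypothetical table: 30 days of events, 100 rows per day.
table = [
    {"event_date": date(2023, 11, d), "customer": c}
    for d in range(1, 31)
    for c in range(100)
]
partitions = partition_by_day(table)
matched, rows_read = scan(partitions, date(2023, 11, 21), date(2023, 11, 24))

print(f"rows in table: {len(table)}, rows read: {rows_read}")
```

A filter on the partitioning column lets the query skip 26 of the 30 daily buckets; without partitioning, the same filter would have to read every row.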

Transforming healthcare by putting data in the driver’s seat at Vida Health - Coalesce 2023

In this session, Vida Health’s senior director of data, mobile, and web engineering shares a story that can help other data and business leaders capitalize on the opportunities being created by current technology innovations, market realities, and real-world problems. This includes a playbook on how Vida Health uses modern data technologies like dbt Cloud, Fivetran, Looker, BigQuery, BigQueryML/dbtML, Vertex AI, LLMs, and more to put data in the driver’s seat to solve meaningful problems in complex industries like healthcare.

Speaker: Trenton Huey, Senior Director, Data and Frontend Engineering, Vida Health

Register for Coalesce at https://coalesce.getdbt.com

Leveraging dbt Cloud to transform loan warehousing - Coalesce 2023

Learn how dv01 uses dbt Cloud and BigQuery to create a scalable and modern data pipeline for offerings in loan warehousing analytics. These products serve an esoteric niche of finance and are run by a team of financial analysts with deep industry expertise.

With the challenge of tracking the performance of millions of loans from various sources and file structures, the team initially relied on Excel-based workflows. However, as the client base grew, they needed a reliable solution: a scalable data pipeline with dbt Cloud and BigQuery that allows the team to scale into a growing market and provide innovative new products and services.

Explore the transformative power of dbt Cloud in modernizing unscalable data processes, fostering skill development, and driving success in the specialized world of loan warehousing finance.

Speaker: David Maguire, Data Engineer, dv01

Domesticating a feral cat data stack - Coalesce 2023

Lauren Benezra has been volunteering with a local cat rescue since 2018. She recently took on the challenge of rebuilding their data stack from scratch, replacing a Jenga tower of incomprehensible Google Sheets with a more reliable system backed by the Modern Data Stack. By using Airtable, Airbyte, BigQuery, dbt Cloud and Census, her role as Foster Coordinator has transformed: instead of digging for buried information while wrangling cats, she now serves up accurate data with ease while... well... wrangling cats.

Viewers will learn that it's possible to run an extremely scalable and reliable stack on a shoestring budget, and will come away with actionable steps to put Lauren's hard-won lessons into practice in their own volunteering projects or as the first data hire in a tiny startup.

Speaker: Lauren Benezra, Senior Analytics Engineer, dbt Labs

Scaling dbt and BigQuery to infinity and beyond - Coalesce 2023

Bluecore works with the largest retail brands around the world to engage shoppers and keep them coming back. In this talk, you’ll learn how the team at Bluecore went about creating, scaling, and maturing an analytics data warehouse in BigQuery to orchestrate 10,000+ models every 30 minutes without bankrupting the company.

Speakers: Adam Whitaker, Analytics Lead, Bluecore; Nicole Dallar-Malburg, Analytics Engineer, Bluecore

Vector and Raster Data Unification Through H3 | M. Colic | Tech Lead Public Sector UK&I | Databricks

Milos Colic, Tech Lead Public Sector UK&I at Databricks, demonstrates how raster and vector geospatial data can be standardised into a unified domain.

This unification facilitates an easy plugin/plugout capability for all raster and vector layers. Databricks used these principles to design an easy, scalable and extensible Flood Risk for Physical Assets solution using H3 as a unification grid.

To learn more about H3 check out: https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/h3

FinOps Examples using Google Cloud BigQuery by Aliz.ai - Zoltán Guth & Gergely Schmidt, Aliz.ai

This talk was recorded at Crunch Conference 2022. Zoltán Guth and Gergely Schmidt of Aliz.ai spoke about FinOps examples using Google Cloud BigQuery.

"In this talk we will cover the basics of the FinOps concept and introduce it through real-life examples using BigQuery."

The event was organized by Crafthub.

You can watch the rest of the conference talks on our channel.

If you are interested in more speakers, tickets and details of the conference, check out our website: https://crunchconf.com/ If you are interested in more events from our company: https://crafthub.events/

The Story of DevRel at Snowflake - How We Got Here | Snowflake

ABOUT THE TALK: In this talk, Felipe Hoffa and Daniel Myers present an honest take on their wildly different approaches to Developer Relations and how both have been critical in building Snowflake's world-class developer community and ecosystem from the ground up. Learn how they define DevRel KPIs and metrics, the daily challenges they face, and the lessons they have learned along the way. You might even get inspired to become a Developer Advocate after understanding the different ways to engage with the Snowflake community and what's next for Snowflake Developer Relations.

ABOUT THE SPEAKERS: Felipe Hoffa is the Data Cloud Advocate at Snowflake. Previously he worked at Google, as a Developer Advocate on Data Analytics for BigQuery, after joining as a Software Engineer. He moved from Chile to San Francisco in 2011. His goal is to inspire developers and data scientists around the world to analyze and understand their data in ways they never could before.

Daniel Myers is in Developer Relations and previously held roles at different companies, including Google, Cisco, and Fujitsu. In addition, he led and founded multiple startups.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data-related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL:

Twitter: https://twitter.com/DataCouncilAI

LinkedIn: https://www.linkedin.com/company/datacouncil-ai/