talk-data.com

Topic: Data Collection (146 tagged)

Activity trend: peak of 17 activities per quarter, 2020-Q1 to 2026-Q1

Activities: 146, newest first

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:

Cross-platform data unification: a unified view of player actions across web, mobile, and in-game
Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
Scalable infrastructure: supporting terabytes of data during launches and live events
AI & ML use cases: churn prediction and personalized in-game recommendations

This session explores Supercell’s data journey and AI-driven player engagement strategies.

Sponsored by: Oxylabs | Web Scraping and AI: A Quiet but Critical Partnership

Behind every powerful AI system lies a critical foundation: fresh, high-quality web data. This session explores the symbiotic relationship between web scraping and artificial intelligence that's transforming how technical teams build data-intensive applications. We'll showcase how this partnership enables crucial use cases: analyzing trends, forecasting behaviors, and enhancing AI models with real-time information. Technical challenges that once made web scraping prohibitively complex are now being solved through the very AI systems they help create. You'll learn how machine learning revolutionizes web data collection, making previously impossible scraping projects both feasible and maintainable, while dramatically reducing engineering overhead and improving data quality. Join us to explore this quiet but critical partnership that's powering the next generation of AI applications.

Summary

In this episode of the Data Engineering Podcast, Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an AI agent for data exploration and using AI for metadata enrichment and workload optimization. He discusses the challenges of integrating AI with data systems, innovations like SQL functions for AI tasks and vector databases, and the limitations of traditional architectures in handling AI workloads. Alex also shares his vision for the future of Starburst, including support for new data formats and AI-driven data exploration tools.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

This is a pharmaceutical ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome, also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business, automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like, so you can stop fighting over column names and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.

This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.

Your host is Tobias Macey and today I'm interviewing Alex Albu about how Starburst is extending the lakehouse to support AI workloads.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by outlining the interaction points of AI with the types of data workflows that you are supporting with Starburst?
What are some of the limitations of warehouse and lakehouse systems when it comes to supporting AI systems?
What are the points of friction for engineers who are trying to employ LLMs in the work of maintaining a lakehouse environment?
Methods such as tool use (exemplified by MCP) are a means of bolting on AI models to systems like Trino. What are some of the ways that is insufficient or cumbersome?
Can you describe the technical implementation of the AI-oriented features that you have incorporated into the Starburst platform?
What are the foundational architectural modifications that you had to make to enable those capabilities?
For the vector storage and indexing, what modifications did you have to make to Iceberg?
What was your reasoning for not using a format like Lance?
For teams who are using Starburst and your new AI features, what are some examples of the workflows that they can expect?
What new capabilities are enabled by virtue of embedding AI features into the interface to the lakehouse?
What are the most interesting, innovative, or unexpected ways that you have seen Starburst AI features used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI features for Starburst?
When is Starburst/lakehouse the wrong choice for a given AI use case?
What do you have planned for the future of AI on Starburst?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Starburst
Podcast Episode
AWS Athena
MCP == Model Context Protocol
LLM Tool Use
Vector Embeddings
RAG == Retrieval Augmented Generation
AI Engineering Podcast Episode
Starburst Data Products
Lance
LanceDB
Parquet
ORC
pgvector
Starburst Icehouse

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Securing Databricks Using Databricks as SIEM

This session showcases how we leverage Databricks product capabilities to prevent and mitigate security risks for Databricks itself. It demonstrates how Databricks can serve as a powerful Security Information and Event Management (SIEM) platform, offering advanced capabilities for data collection and threat detection, and explores data collection from diverse data sources alongside real-time threat detection.

Optimize Cost and User Value Through Model Routing AI Agent

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by considering factors such as latency, multi-modality, API costs, user need, and prompt complexity. Model routing helps optimize performance and cost while improving scalability and user satisfaction. This session gives an overview of training cost-effective models on AI gateway logs, user feedback, prompts, and model features to design an intelligent model-routing AI agent. It covers different strategies for model routing, deployment in Mosaic AI, re-training, and evaluation through A/B testing and end-to-end Databricks workflows. It also delves into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through VectorDB, and RL policy-based exploration.
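The core routing idea described above can be sketched in a few lines: estimate how demanding the prompt is, then pick the cheapest model whose quality meets that need. This is a minimal illustration; the model names, costs, quality scores, and the complexity heuristic are all invented for the sketch, not Mosaic AI specifics.

```python
from dataclasses import dataclass

# Hypothetical model catalog: names, costs, and quality scores are
# invented for this sketch, not any vendor's real figures.
@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # USD (assumed)
    quality: float             # 0..1 offline eval score (assumed)

CATALOG = [
    Model("small-fast", 0.0002, 0.70),
    Model("mid-tier",   0.0010, 0.85),
    Model("frontier",   0.0100, 0.95),
]

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned complexity model: longer prompts
    and reasoning keywords push the score up."""
    score = min(len(prompt) / 2000, 1.0)
    if any(k in prompt.lower() for k in ("prove", "debug", "step by step")):
        score = max(score, 0.8)
    return score

def route(prompt: str, budget_per_1k: float = 0.01) -> Model:
    """Cheapest model within budget whose quality meets the estimated
    need; fall back to the affordable pool if none qualifies."""
    need = estimate_complexity(prompt)
    affordable = [m for m in CATALOG if m.cost_per_1k_tokens <= budget_per_1k]
    capable = [m for m in affordable if m.quality >= need]
    pool = capable or affordable
    return min(pool, key=lambda m: m.cost_per_1k_tokens)

print(route("What is 2+2?").name)                       # small-fast
print(route("Prove this invariant step by step").name)  # mid-tier
```

In the session's framing, the complexity estimator and quality scores would be learned from gateway logs, user feedback, and A/B tests rather than hard-coded as they are here.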

Sigma Data Apps Product Releases & Roadmap | The Data Apps Conference

Organizations today require more than dashboards—they need applications that combine insights with data collection and action capabilities to drive meaningful change. In this session, Stipo Josipovic (Director of Product) will showcase the key innovations enabling this shift, from expanded write-back capabilities to workflow automation features.

You'll learn about Sigma's growing data app capabilities, including:

Enhanced write-back features: Redshift and upcoming BigQuery support, bulk data entry, and form-based collection for structured workflows
Advanced security controls: conditional editing and row-level security for precise data governance
Intuitive interface components: containers, modals, and tabbed navigation for app-like experiences
Powerful Actions framework: API integrations, notifications, and automated triggers to drive business processes

This session covers both recently released features and Sigma's upcoming roadmap, including detail views, simplified form-building, and new API actions to integrate with your tech stack. Discover how Sigma helps organizations move beyond analysis to meaningful action.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial


Behavioural data is fast becoming a cornerstone of modern business strategy. Not just for media measurement or advertising optimisation, but across product, pricing, logistics, and platform development. It tells us what people actually do, not just what they say they do. As traditional market research struggles with low engagement and recall bias, brands are turning to digital behavioural data to make sharper, faster decisions. Whether it's tracking consumer journeys in the app economy or identifying early adoption trends (like the impact of AI tools on category disruption), the value lies in real, observable behaviour at scale. But that shift raises new questions around data ownership, consent, and fairness. And the rise of AI is only accelerating both the opportunity and the complexity.

In the latest episode of Hub & Spoken, Jason Foster, CEO & Founder of Cynozure, speaks to Chris Havemann, CEO of RealityMine, discussing:

The transition from survey-based research to behavioural data analysis
The impact of AI on interpreting digital interactions
Ethical considerations surrounding data consent and transparency
Building trust through clear data collection and usage practices

Learn from Chris's 25+ years in data and insight, and explore how behavioural signals are reshaping everything from media to market intelligence.

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.

The like button has transformed how we interact online, becoming a cornerstone of digital engagement with over 7 billion clicks daily. What started as a simple user interface solution has evolved into a powerful data collection tool that companies use to understand customer preferences, predict trends, and build sophisticated recommendation systems. The data behind these interactions forms what experts call the 'like graph', a valuable network of connections that might be one of your company's most underutilized assets.

Bob Goodson is President and Founder of Quid, a Silicon Valley–based company whose AI models are used by a third of the Fortune 50. Before starting Quid, he was the first employee at Yelp, where he played a role in the genesis of the like button and observed firsthand the rise of the social media industry. After Quid received an award in 2016 from the World Economic Forum for “Contributions to the Future of the Internet,” Bob served a two-year term on WEF’s Global Future Council for Artificial Intelligence & Robotics. While at Oxford University doing graduate research in language theory, Bob co-founded Oxford Entrepreneurs to connect scientists with business-minded students. Bob is co-author of a new book, Like: The Button That Changed the World, focused on the origins of the ubiquitous like button in social media.

In the episode, Richie and Bob explore the origins of the like button, its impact on user interaction and business, the evolution of social media features, the significance of relational data, the future of social networks in the age of AI, and much more.

Links Mentioned in the Show:

Bob’s book: Like: The Button That Changed the World
Connect with Bob
Course: Analyzing Social Media Data in Python
Related Episode: How I Nearly Got Fired For Running An A/B Test with Vanessa Larco, Former Partner at New Enterprise Associates
Rewatch sessions from RADAR: Skills Edition

New to DataCamp?

Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Sarah McKenna joins me to chat about all things web scraping. We discuss its applications, the evolution of alternative data, and AI's impact on the industry. We also discuss privacy concerns, the challenges of bot blocking, and the importance of data quality. Sarah shares ideas on how to get started with web scraping and the ethical considerations surrounding copyright and data collection.

Automating Data Quality via Shift Left for Real-Time Web Data Feeds at Industrial Scale | Sarah McKenna | Shift Left Data Conference 2025

Real-time web data is one of the hardest data streams to automate with trust: websites don't want to be scraped, change constantly without notice, and employ sophisticated bot-blocking mechanisms to stop automated data collection. At Sequentum we cut our teeth on web data and have built a general-purpose cloud platform for any type of data ingestion and enrichment that our clients can transparently audit and ultimately trust to deliver their mission-critical data on time and with quality, fueling their business decision making.

Data Insight Foundations: Step-by-Step Data Analysis with R

This book is an essential guide designed to equip you with the vital tools and knowledge needed to excel in data science. Master the end-to-end process of data collection, processing, validation, and imputation using R, and understand fundamental theories to achieve transparency with literate programming, renv, and Git--and much more. Each chapter is concise and focused, rendering complex topics accessible and easy to understand. Data Insight Foundations caters to a diverse audience, including web developers, mathematicians, data analysts, and economists, and its flexible structure enables you to explore chapters in sequence or navigate directly to the topics most relevant to you. While examples are primarily in R, a basic understanding of the language is advantageous but not essential. Many chapters, especially those focusing on theory, require no programming knowledge at all. Dive in and discover how to manipulate data, ensure reproducibility, conduct thorough literature reviews, collect data effectively, and present your findings with clarity.

What You Will Learn

Data Management: master the end-to-end process of data collection, processing, validation, and imputation using R.
Reproducible Research: understand fundamental theories and achieve transparency with literate programming, renv, and Git.
Academic Writing: conduct scientific literature reviews and write structured papers and reports with Quarto.
Survey Design: design well-structured surveys and manage data collection effectively.
Data Visualization: understand data visualization theory and create well-designed and captivating graphics using ggplot2.

Who This Book Is For

Career professionals such as research and data analysts transitioning from academia to a professional setting where production quality significantly impacts career progression. Some familiarity with data analytics processes and an interest in learning R or Python is ideal.

Grokking Relational Database Design

A friendly illustrated guide to designing and implementing your first database. Grokking Relational Database Design makes the principles of designing relational databases approachable and engaging. Everything in this book is reinforced by hands-on exercises and examples. In Grokking Relational Database Design, you’ll learn how to: Query and create databases using Structured Query Language (SQL) Design databases from scratch Implement and optimize database designs Take advantage of generative AI when designing databases A well-constructed database is easy to understand, query, manage, and scale when your app needs to grow. In Grokking Relational Database Design you’ll learn the basics of relational database design including how to name fields and tables, which data to store where, how to eliminate repetition, good practices for data collection and hygiene, and much more. You won’t need a computer science degree or in-depth knowledge of programming—the book’s practical examples and down-to-earth definitions are beginner-friendly. About the Technology Almost every business uses a relational database system. Whether you’re a software developer, an analyst creating reports and dashboards, or a business user just trying to pull the latest numbers, it pays to understand how a relational database operates. This friendly, easy-to-follow book guides you from square one through the basics of relational database design. About the Book Grokking Relational Database Design introduces the core skills you need to assemble and query tables using SQL. The clear explanations, intuitive illustrations, and hands-on projects make database theory come to life, even if you can’t tell a primary key from an inner join. As you go, you’ll design, implement, and optimize a database for an e-commerce application and explore how generative AI simplifies the mundane tasks of database designs. 
What's Inside

Define entities and their relationships
Minimize anomalies and redundancy
Use SQL to implement your designs
Security, scalability, and performance

About the Reader

For self-taught programmers, software engineers, data scientists, and business data users. No previous experience with relational databases assumed.

About the Authors

Dr. Qiang Hao and Dr. Michail Tsikerdekis are both professors of Computer Science at Western Washington University.

Quotes

"If anyone is looking to improve their database design skills, they can’t go wrong with this book." - Ben Brumm, DatabaseStar
"Goes beyond SQL syntax and explores the core principles. An invaluable resource!" - William Jamir Silva, Adjust
"Relational database design is best done right the first time. This book is a great help to achieve that!" - Maxim Volgin, KLM
"Provides necessary notions to design and build databases that can stand the data challenges we face." - Orlando Méndez, Experian
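The "eliminate repetition" principle the book teaches can be sketched with Python's built-in sqlite3: instead of storing a customer's name on every order row, factor it into its own table and reference it by primary key. The schema, table names, and sample rows below are invented for illustration, not taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE purchase_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    placed_at   TEXT NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO purchase_order VALUES (10, 1, '2024-01-05')")
conn.execute("INSERT INTO purchase_order VALUES (11, 1, '2024-02-09')")

# A join recovers the flat, denormalized view without duplicating the name.
rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM purchase_order o JOIN customer c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Ada', 10), ('Ada', 11)]
```

The join produces the convenient flat view on demand, while an update to a customer's name touches exactly one row: the basic payoff of normalization.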

podcast_episode
by Val Kroll, Michael Tiffany (Fulcra Dynamics), Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus, OH), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Every listener of this show is keenly aware that they are enabling the collection of various forms of hyper-specific data. Smartphones are movement and light biometric data collection machines. Many of us augment this data with a smartwatch, a smart ring, or both. A connected scale? Sure! Maybe even a continuous glucose monitor (CGM)! But… why? And what are the ramifications both for changing the ways we move through life for the better (Live healthier! Proactive wellness!) and for the worse (privacy risks and bad actors)? We had a wide-ranging discussion with Michael Tiffany, co-founder and CEO of Fulcra Dynamics, that took a run at these topics and more. Why, it's possible you'll get so excited by the content that one of your devices will record a temporary spike in your heart rate! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Historically speaking, digital analytics has focused predominantly on client-side tracking, but recent shifts in regulations, privacy and technology have driven analysts towards server-side solutions - primarily server-side tag management. While server-side solutions are starting to be more widely considered, full server-side tracking remains an underutilized opportunity. This talk unpacks the differences between client-side and server-side tracking (not tagging!), explores how server-side tracking can improve data quality, and demonstrates how integrating both approaches can elevate your behavioural data collection strategy.
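To make the client-side vs. server-side distinction concrete, here is a minimal sketch of the server-side collection step: the server validates the payload and stamps trusted fields itself instead of trusting whatever the browser sends. The required fields, enrichment choices, and IP-truncation rule are illustrative assumptions, not any specific vendor's tracking schema.

```python
import time
import uuid

# Fields a client must supply; everything else is set server-side.
REQUIRED_FIELDS = {"event_name", "user_id"}

def collect(payload: dict, client_ip: str) -> dict:
    """Validate a client-sent event and enrich it with server-set fields."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    event = dict(payload)
    # Server-side stamps: the browser cannot spoof, drop, or block these,
    # which is a large part of the data-quality argument for this approach.
    event["event_id"] = str(uuid.uuid4())
    event["received_at"] = time.time()
    # Coarsen the IPv4 address before storage as a privacy measure.
    event["ip_truncated"] = ".".join(client_ip.split(".")[:3]) + ".0"
    return event

print(collect({"event_name": "page_view", "user_id": "u1"},
              "203.0.113.7")["ip_truncated"])  # 203.0.113.0
```

In a real deployment this function would sit behind a first-party HTTP endpoint and forward validated events to the warehouse, complementing rather than replacing client-side collection.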

Discover how the Digital Analytics team at Itaú, the largest private bank in Latin America, built a high-quality, unified, and standardized tagging ecosystem that simplified data collection, ensured consistent data quality across various measurement tools, and created a comprehensive, customer-centric data strategy across over 10 apps, thousands of developers, and hundreds of business operations.

Episode Description

Ever feel like your phone knows you a little too well? One Google search, and suddenly ads follow you across the internet like a digital stalker. AI-powered personalization has long relied on collecting massive amounts of personal data—but what if it didn’t have to? In this episode of Data & AI with Mukundan, we explore a game-changing shift in AI: personalized experiences without intrusive tracking. Two groundbreaking techniques, Sequential Layer Expansion and FedSelect, are reshaping how AI learns from users while keeping their data private.

We’ll break down:
✅ Why AI personalization has been broken until now
✅ How these new models improve AI recommendations without privacy risks
✅ Real-world applications in streaming, e-commerce, and healthcare
✅ How AI can respect human identity while scaling globally

The future of AI is personal, but it doesn’t have to be invasive. Tune in to discover how AI can work for you—without spying on you.

Key Takeaways

🔹 The Problem: Why AI Personalization Has Been Broken
Streaming services, e-commerce, and healthcare AI often make irrelevant or generic recommendations.
Most AI models collect massive amounts of user data, stored on centralized servers—risking leaks, breaches, and misuse.
AI personalization has been a “one-size-fits-all” approach that doesn’t truly adapt to individual needs.

🔹 The Solution: AI That Learns Without Spying on You

✨ Sequential Layer Expansion – AI that grows with you
Instead of static AI models, this method builds in layers, adapting over time.
It learns only what’s relevant to you, reducing unnecessary data collection.
Think of it like training for a marathon—starting small and progressively improving.

✨ FedSelect – AI that fine-tunes only what matters
Instead of changing an entire AI model, it selectively updates the most relevant parameters.
Think of it like tuning a car—you upgrade what’s needed instead of replacing the whole engine.
Everything happens locally on your device, meaning your raw data never leaves.

🔹 Real-World Impact: How This Changes AI for You
🎬 Streaming Services – Netflix finally gets your taste right—without tracking you across the web.
🛍️ E-commerce – Shopping apps suggest what you actually need, not random trending items.
🏥 Healthcare – AI-powered health plans tailored to your genes and habits—without sharing your medical data.

🔹 The Bigger Picture: Why This Matters for the Future of AI
Personalized AI at scale: AI adapts to billions of users while remaining privacy-first.
AI that respects human identity: you control your AI, not the other way around.
The end of surveillance-style tracking: no more creepy ads following you around.

🌟 AI can be personal—without being invasive. That’s the future we should all demand.

FedSelect: https://arxiv.org/abs/2404.02478 | Sequential Layer Expansion: https://arxiv.org/abs/2404.17799

🔔 Subscribe, rate, and review for more AI insights!
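The selective-update idea attributed to FedSelect above can be illustrated with a toy sketch: score each parameter's relevance to this user, freeze the rest, and apply gradient steps on-device only where the mask allows. The scores, selection rule, and update step here are invented stand-ins for exposition, not the paper's actual method.

```python
def select_mask(scores, fraction=0.25):
    """Mark the top `fraction` of parameters (by relevance score) trainable."""
    k = max(1, int(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]

def local_update(params, grads, mask, lr=0.5):
    """One gradient step, applied only where the mask allows."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]

params = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.5, 0.2]             # hypothetical per-parameter relevance
mask = select_mask(scores, fraction=0.5)  # selects indices 0 and 2
print(local_update(params, [1.0, 1.0, 1.0, 1.0], mask))
# [0.5, 2.0, 2.5, 4.0]
```

Because only the masked parameters change and the update runs locally, the server never needs the raw data, only (at most) the small set of personalized weights.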

IAPP CIPP / US Certified Information Privacy Professional Study Guide, 2nd Edition

Prepare for success on the IAPP CIPP/US exam and further your career in privacy with this effective study guide - now includes a downloadable supplement to get you up to date on the current CIPP exam for 2024-2025! Information privacy has become a critical and central concern for small and large businesses across the United States. At the same time, the demand for talented professionals able to navigate the increasingly complex web of legislation and regulation regarding privacy continues to increase. Written from the ground up to prepare you for the United States version of the Certified Information Privacy Professional (CIPP) exam, Sybex's IAPP CIPP/US Certified Information Privacy Professional Study Guide also readies you for success in the rapidly growing privacy field. You'll efficiently and effectively prepare for the exam with online practice tests and flashcards as well as a digital glossary. The concise and easy-to-follow instruction contained in the IAPP/CIPP Study Guide covers every aspect of the CIPP/US exam, including the legal environment, regulatory enforcement, information management, private sector data collection, law enforcement and national security, workplace privacy and state privacy law, and international privacy regulation. 
This study guide:

Provides the information you need to gain a unique and sought-after certification that allows you to fully understand the privacy framework in the US
Is fully updated to prepare you to advise organizations on the current legal limits of public and private sector data collection and use
Includes 1 year of free access to the Sybex online learning center, with chapter review questions, full-length practice exams, hundreds of electronic flashcards, and a glossary of key terms, all supported by Wiley's support agents, available 24x7 via email or live chat to assist with access and login questions

Perfect for anyone considering a career in privacy or preparing to tackle the challenging IAPP CIPP exam as the next step to advance an existing privacy role, the IAPP CIPP/US Certified Information Privacy Professional Study Guide offers you an invaluable head start for success on the exam and in your career as an in-demand privacy professional.

Data Science for Decision Makers

Data Science for Decision Makers is an essential guide for executives, managers, entrepreneurs, and anyone seeking to harness the power of data to drive business success. In today's fast-paced and increasingly digital world, the ability to make informed decisions based on data-driven insights is vital. This book serves as a bridge between the complex world of data science and the strategic decision-making process, providing readers with the knowledge and tools they need to leverage data effectively. With a clear focus on practical application, this book demystifies key concepts in data science, from data collection and analysis to predictive modeling and visualization. Via real-world examples, case studies, and actionable insights, readers will learn how to extract insights from data and translate them into actionable strategies that drive organizational growth. Written in a reader-friendly manner, this book caters to both novice and experienced professionals alike. Whether you're a seasoned executive looking to sharpen your strategic acumen or a manager seeking to enhance your team's data literacy, this essential reference provides the necessary foundation to navigate the complex landscape of data science with confidence.

Data Science Essentials For Dummies

Feel confident navigating the fundamentals of data science. Data Science Essentials For Dummies is a quick reference on the core concepts of the exploding and in-demand data science field, covering data collection along with dataset cleaning, processing, and visualization. This direct and accessible resource helps you brush up on key topics and is right to the point—eliminating review material, wordy explanations, and fluff—so you get what you need, fast.

Strengthen your understanding of data science basics
Review what you've already learned or pick up key skills
Effectively work with data and provide accessible materials to others
Jog your memory on the essentials as you work and get clear answers to your questions

Perfect for supplementing classroom learning, reviewing for a certification, or staying knowledgeable on the job, Data Science Essentials For Dummies is a reliable reference that's great to keep on hand as an everyday desk reference.