talk-data.com

Topic: Data Collection (16 tagged)

Activity Trend: peak of 17 activities/qtr, 2020-Q1 to 2026-Q1

Activities: 16 activities · Newest first

Bridging Accessibility and AI: Sign Language Recognition & Inclusive Design with Sheida Rashidi

As AI continues to shape human-computer interaction, there’s a growing opportunity and responsibility to ensure these technologies serve everyone, including people with communication disabilities. In this talk, I will present my ongoing work in developing a real-time American Sign Language (ASL) recognition system, and explore how integrating accessible design principles into AI research can expand both usability and impact.

The core of the talk will cover the Sign Language Recogniser project (available on GitHub), in which I used MediaPipe Studio together with TensorFlow, Keras, and OpenCV to train a model that classifies ASL letters from hand-tracking features.

I’ll share the methodology: data collection, feature extraction via MediaPipe, model training, and demo/testing results. I’ll also discuss challenges encountered, such as dealing with gesture variability, lighting and camera differences, latency constraints, and model generalization.
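For readers who want a concrete starting point, here is a minimal sketch of that pipeline, assuming the 21 hand landmarks MediaPipe returns per frame and a 26-class (A-Z) letter classifier; the project's actual features and architecture may differ.

```python
# Minimal sketch: MediaPipe hand landmarks -> Keras letter classifier.
import cv2
import mediapipe as mp
import numpy as np
from tensorflow import keras

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def extract_features(bgr_frame):
    """Flatten the 21 (x, y, z) hand landmarks into a 63-dim feature vector."""
    results = hands.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None  # no hand detected in this frame
    lm = results.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).flatten()

# A small dense classifier over landmark features (hypothetical layer sizes).
model = keras.Sequential([
    keras.layers.Input(shape=(63,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(26, activation="softmax"),  # one class per ASL letter
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```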

Beyond the technical implementation, I’ll reflect on the broader implications: how accessibility-focused AI projects can promote inclusion, how design decisions affect trust and usability, and how women in AI & data science can lead innovation that is both rigorous and socially meaningful. Attendees will leave with actionable insights for building inclusive AI systems, especially in domains involving rich human modalities such as gesture or sign.

The Elephant in the room between data collection and data science with Katya Kovalenko

Whether you call it wrangling, cleaning, or preprocessing, data prep is often the most expensive and time-consuming part of the analytical pipeline. It may involve converting data into machine-readable formats, integrating across many datasets, or detecting outliers, and it can be a large source of error if done manually. A lack of machine-readable or integrated data limits connectivity across fields and restricts data accessibility, sharing, and reuse, making it a significant contributor to research waste.

For students, it is perhaps the greatest barrier to adopting quantitative tools and advancing their coding and analytical skills. AI tools are available for automating the cleanup and integration, but due to the one-of-a-kind nature of these problems, these approaches still require extensive human collaboration and testing. I review some of the common challenges in data cleanup and integration, approaches for understanding dataset structures, and strategies for developing and testing workflows.
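As a hedged illustration of the kind of cleanup steps discussed above, here is a short pandas sketch over a hypothetical CSV; the file, column names, and thresholds are invented, and every real dataset needs its own tailored workflow.

```python
# Common cleanup steps: standardize names, coerce types, flag outliers.
import pandas as pd

df = pd.read_csv("field_measurements.csv")  # hypothetical input file

# Standardize column names into a machine-readable form.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Coerce types: unparseable dates/numbers become NaT/NaN for later review.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df["value"] = pd.to_numeric(df["value"], errors="coerce")

# Flag (rather than silently drop) outliers using the 1.5*IQR rule.
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = ~df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
```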

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory or VRAM, and measured inference speed across varying token lengths for both prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs; we will also share our experiences and challenges with the data collection, along with insights into general patterns.
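As a rough sketch of how such throughput numbers can be measured, the harness below times tokens per second for one configuration; `generate` is a purely hypothetical stand-in for whatever inference backend is under test.

```python
# Hypothetical timing harness for LLM inference throughput.
import time

def tokens_per_second(generate, prompt_tokens: list[int],
                      max_new_tokens: int) -> float:
    """Measure text-generation throughput for one server/model configuration."""
    start = time.perf_counter()
    new_tokens = generate(prompt_tokens, max_new_tokens)  # hypothetical callable
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed

# The benchmark sweeps token lengths separately for prompt processing and
# generation, e.g.: for prompt_len in (128, 512, 2048, 8192): ...
```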

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:

- Cross-platform data unification: a unified view of player actions across web, mobile, and in-game
- Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
- Scalable infrastructure: supporting terabytes of data during launches and live events
- AI & ML use cases: churn prediction and personalized in-game recommendations

This session explores Supercell’s data journey and AI-driven player engagement strategies.
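The "streaming event data into Delta Lake" pattern above might look roughly like the following PySpark Structured Streaming sketch; the Kafka broker, topic, schema, and paths are hypothetical, not Supercell's actual pipeline.

```python
# Hedged sketch: stream enriched behavioral events into a Delta table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("player-events").getOrCreate()

event_schema = StructType([
    StructField("player_id", StringType()),
    StructField("event_name", StringType()),
    StructField("platform", StringType()),    # web / mobile / in-game
    StructField("event_time", TimestampType()),
])

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "enriched-events")            # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*"))

(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/player_events")
    .start("/delta/player_events"))
```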

Sponsored by: Oxylabs | Web Scraping and AI: A Quiet but Critical Partnership

Behind every powerful AI system lies a critical foundation: fresh, high-quality web data. This session explores the symbiotic relationship between web scraping and artificial intelligence that's transforming how technical teams build data-intensive applications. We'll showcase how this partnership enables crucial use cases: analyzing trends, forecasting behaviors, and enhancing AI models with real-time information. Technical challenges that once made web scraping prohibitively complex are now being solved through the very AI systems they help create. You'll learn how machine learning revolutionizes web data collection, making previously impossible scraping projects both feasible and maintainable, while dramatically reducing engineering overhead and improving data quality. Join us to explore this quiet but critical partnership that's powering the next generation of AI applications.

Optimize Cost and User Value Through Model Routing AI Agent

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by weighing factors such as latency, multi-modality, API costs, user needs, and prompt complexity. Model routing helps optimize performance and cost while improving scalability and user satisfaction. This session gives an overview of training cost-effective routing models on AI gateway logs, user feedback, and prompt and model features to design an intelligent model-routing AI agent. It covers different strategies for model routing, deployment in Mosaic AI, re-training, and evaluation through A/B testing and end-to-end Databricks workflows. It will also delve into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through VectorDB, and RL policy-based exploration.
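As a deliberately simplified sketch of the routing idea (not the session's actual agent), one could embed each prompt and train a classifier on labeled gateway-log examples to decide which model tier serves it; all names and training data below are hypothetical.

```python
# Toy model router: embed the prompt, predict whether it needs the large model.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical training data: prompts labeled 1 if the cheap model's answer
# was judged insufficient (derived from gateway logs + user feedback).
prompts = ["what is 2+2", "summarize this legal contract and flag the risks"]
needs_big_model = [0, 1]

router = LogisticRegression().fit(encoder.encode(prompts), needs_big_model)

def route(prompt: str) -> str:
    """Return the model tier that should serve this prompt."""
    p = router.predict_proba(encoder.encode([prompt]))[0, 1]
    return "large-model" if p > 0.5 else "small-model"
```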

Sigma Data Apps Product Releases & Roadmap | The Data Apps Conference

Organizations today require more than dashboards—they need applications that combine insights with data collection and action capabilities to drive meaningful change. In this session, Stipo Josipovic (Director of Product) will showcase the key innovations enabling this shift, from expanded write-back capabilities to workflow automation features.

You'll learn about Sigma's growing data app capabilities, including:

- Enhanced write-back features: Redshift and upcoming BigQuery support, bulk data entry, and form-based collection for structured workflows
- Advanced security controls: conditional editing and row-level security for precise data governance
- Intuitive interface components: containers, modals, and tabbed navigation for app-like experiences
- Powerful Actions framework: API integrations, notifications, and automated triggers to drive business processes

This session covers both recently released features and Sigma's upcoming roadmap, including detail views, simplified form-building, and new API actions to integrate with your tech stack. Discover how Sigma helps organizations move beyond analysis to meaningful action.


Automating Data Quality via Shift Left for Real-Time Web Data Feeds at Industrial Scale | Sarah McKenna | Shift Left Data Conference 2025

Real-time web data is one of the hardest data streams to automate with trust: websites don't want to be scraped, change constantly without notice, and employ sophisticated bot-blocking mechanisms to stop automated data collection. At Sequentum we cut our teeth on web data and have built a general-purpose cloud platform for any type of data ingestion and enrichment, one our clients can transparently audit and ultimately trust to deliver their mission-critical data on time and with quality to fuel their business decision-making.
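As a loose illustration of what an automated, shift-left quality gate on a web data feed might check before delivery (field names and thresholds here are hypothetical, not Sequentum's implementation):

```python
# Hypothetical pre-delivery quality gate for one scraped batch.
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = {"url", "price", "scraped_at"}  # scraped_at: tz-aware datetime

def validate_batch(records: list[dict], min_rows: int = 1000) -> list[str]:
    """Return a list of quality violations for one delivery batch."""
    errors = []
    if len(records) < min_rows:
        errors.append(f"row count {len(records)} below expected {min_rows}")
    stale_cutoff = datetime.now(timezone.utc) - timedelta(hours=1)
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            errors.append(f"record {i}: missing fields {sorted(missing)}")
        elif rec["scraped_at"] < stale_cutoff:
            errors.append(f"record {i}: stale (scraped_at={rec['scraped_at']})")
    return errors  # empty list means the batch may ship
```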

Continuous Data Pipeline for Real time Benchmarking & Data Set Augmentation | Teleskope

ABOUT THE TALK: Building and curating representative datasets is crucial for accurate ML systems, and monitoring metrics post-deployment helps improve the model. Models working over unstructured language data may face data shifts, leading to unpredictable inferences. Open-source APIs and annotation tools streamline annotation and reduce analyst workload.

This talk discusses generating datasets and real-time precision/recall splits to detect data shifts, prioritize data collection, and retrain models.
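A minimal sketch of that monitoring idea: score a recent labeled window and flag a possible data shift when precision or recall drops below a baseline (the threshold here is hypothetical).

```python
# Rolling precision/recall check to flag possible data shifts.
from sklearn.metrics import precision_score, recall_score

BASELINE = 0.90  # hypothetical acceptance threshold

def detect_shift(y_true: list[int], y_pred: list[int]) -> bool:
    """True if the latest labeled window suggests the model has drifted."""
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    return precision < BASELINE or recall < BASELINE

# Windows that trigger here can be prioritized for annotation and retraining.
```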

ABOUT THE SPEAKER: Ivan Aguilar is a data scientist at Teleskope focused on building scalable models for detecting PII/PHI/secrets and other compliance-related entities within customers' clouds. Prior to joining Teleskope, Ivan was an ML engineer at Forge.AI, a Boston-based shop working on information extraction, content extraction, and other NLP-related tasks.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.


Protecting PII/PHI Data in Data Lake via Column Level Encryption

A data breach is a concern for any company that collects data, including Northwestern Mutual. Every measure is taken to prevent identity theft and fraud for our customers; however, those measures are still not sufficient if the security around them is not updated periodically. Multiple layers of encryption are the most common approach used to avoid breaches, but unauthorized internal access to this sensitive data still poses a threat.

This presentation will walk you through the following steps:

- Designing encryption at the column level
- Protecting PII data that is used as a key for joins
- Enabling authorized users to decrypt data at run time
- Rotating the encryption keys when needed

At Northwestern Mutual, a combination of Fernet and AES encryption libraries, user-defined functions (UDFs), and Databricks secrets was used to develop a process to encrypt PII information. Access was provided only to those with a business need to decrypt it, which helps mitigate the internal threat. This was also done without duplicating data or metadata (views/tables). Our goal is to help you understand how you can build a secure data lake for your organization that eliminates threats of data breach both internally and externally. Associated blog: https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html
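The pattern described (and detailed in the linked blog) can be condensed into a sketch like the one below; key handling is simplified to a generated key here, whereas in practice the key would come from a Databricks secret scope.

```python
# Condensed sketch of Fernet-based column-level encryption via PySpark UDFs.
from cryptography.fernet import Fernet
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
key = Fernet.generate_key()  # in practice: dbutils.secrets.get(scope, key_name)

@udf(StringType())
def encrypt_pii(value):
    """Encrypt one column value with the shared Fernet key."""
    return Fernet(key).encrypt(value.encode()).decode() if value else value

@udf(StringType())
def decrypt_pii(token):
    """Decrypt; grant access only to users with a business need."""
    return Fernet(key).decrypt(token.encode()).decode() if token else token

df = spark.createDataFrame([("123-45-6789", "Ada")], ["ssn", "name"])
encrypted = df.withColumn("ssn", encrypt_pii("ssn"))  # in place, no duplication
```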


Unifying Data Science and Business: AI Augmentation/Integration in Production Business Applications

Why is it so hard to integrate machine learning into real business applications? In 2019, Gartner predicted that AI augmentation would solve this problem and create $2.9 trillion of business value and 6.2 billion hours of worker productivity in 2021. A new realm of business science methods, encompassing AI-powered analytics that let people with domain expertise make smarter decisions faster and with more confidence, has also emerged as a solution to this problem. Dr. Harvey will demystify why integration challenges still account for $30.2 billion in annual global losses and discuss what it takes to integrate AI/ML code or algorithms into real business applications, including the effort that goes into making each component (data collection, preparation, training, and serving) production-ready so organizations can use the results of integrated models repeatedly with minimal user intervention. Finally, Dr. Harvey will discuss AISquared’s integration with Databricks and MLflow to accelerate the integration of AI by unifying data science with business. By adding five lines of code to your model, users can leverage AISquared’s model integration API framework, which provides a quick and easy way to integrate models directly into live business applications.


Distributed Machine Learning at Lyft

Data collection, preprocessing, and feature engineering are the fundamental steps in any machine learning pipeline. After feature engineering, being able to parallelize training across multiple low-cost machines reduces both cost and time, and training models in a distributed manner speeds up hyperparameter tuning. How can we unify these stages of the ML pipeline in one distributed training platform? And on Kubernetes, no less?

Our ML platform is based entirely on Kubernetes because of its scalability and the rapid bootstrapping time of resources. In this talk we will demonstrate how Lyft uses Spark on Kubernetes and Fugue (our home-grown unifying compute abstraction layer) to design a holistic, end-to-end ML pipeline system for distributed feature engineering, training, and prediction on our ML platform on top of Spark on K8s. We will also do a deep dive into how we abstract and hide infrastructure complexities so that our data scientists and research scientists can focus only on the business logic for their models through simple pythonic APIs and SQL. We let the users focus on "what to do" while the platform takes care of "how to do it". We will share our challenges, learnings, and the fun we had while implementing it. Using Spark on K8s has helped us achieve large-scale data processing at 90% less cost, at times bringing processing time down from 2 hours to less than 20 minutes.
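As a taste of the "simple pythonic APIs" idea, the sketch below uses the open-source Fugue `transform` function to run the same pandas-style logic locally or on Spark; the data and feature logic are invented for illustration, and Lyft's internal platform wraps considerably more than this.

```python
# Fugue lets the same pandas-style function run on different engines.
import pandas as pd
from fugue import transform
from pyspark.sql import SparkSession

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Business logic only: the platform decides where this runs."""
    df["fare_per_km"] = df["fare"] / df["distance_km"]
    return df

rides = pd.DataFrame({"fare": [12.0, 30.0], "distance_km": [3.0, 11.0]})

# Same function, two engines: no engine runs on pandas locally,
# passing a SparkSession distributes the work on Spark (on K8s at Lyft).
local = transform(rides, add_features, schema="*,fare_per_km:double")
spark = SparkSession.builder.getOrCreate()
distributed = transform(rides, add_features,
                        schema="*,fare_per_km:double", engine=spark)
```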


ANALYTICS IN THE AGE OF THE MODERN DATA STACK

The pace of change in the analytics sector has increased dramatically since 2012, with tons of new tools paving the way to the birth of the Modern Data Stack. The rapid explosion of tools has been met with a rapid explosion of restrictions, challenging the status quo of data collection, processing, and storage. How does that reflect on analytics and its future?

SERVER-SIDE TAGGING: DATA QUALITY OR DATA QUANTITY?

Simo explores the latest and greatest paradigm in Google's marketing stack: server-side tagging in Google Tag Manager. The benefits of moving data collection server-side are obvious – or are they? The same tools and mechanisms that help with data governance and oversight can be abused due to the opaqueness that comes with moving data collection server-side. In this talk, Simo takes an honest look at just what problems server-side tagging seeks to address, and whether it actually manages to do what it’s set out to do.