talk-data.com

Topic: Data Science
Tags: machine_learning, statistics, analytics
Tagged activities: 1516

Activity Trend: peak of 68 activities per quarter (2020-Q1 to 2026-Q1)

Activities (1516 · newest first)

What Works: Practical Lessons in Applying Privacy-Enhancing Technologies (PET) in Data Science

Privacy-Enhancing Technologies (PETs) promise to bridge the gap between data utility and privacy — but how do they perform in practice? In this talk, we’ll share real-world insights from our hands-on experience testing and implementing leading PET solutions across various data science use cases. We explored tools such as differential privacy libraries, homomorphic encryption frameworks, federated learning, and multi-party computation. Some lived up to their promise; others revealed critical limitations. You’ll walk away with a clear understanding of which PET solutions work best for which types of data and analysis, what trade-offs to expect, and how to set realistic goals when integrating PETs into your workflows. This session is ideal for data professionals and decision-makers who want to innovate responsibly while navigating privacy risks.
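As a taste of one of these techniques, here is a minimal sketch of the Laplace mechanism, a standard differential-privacy building block; the function name, parameters, and values are illustrative, not taken from any specific library covered in the talk:

```python
import numpy as np

def laplace_release(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a statistic with epsilon-differential privacy via Laplace noise."""
    scale = sensitivity / epsilon  # smaller epsilon -> more noise -> stronger privacy
    return true_value + np.random.default_rng().laplace(loc=0.0, scale=scale)

# Counting queries have sensitivity 1: one person changes the count by at most 1.
noisy_count = laplace_release(true_value=1204.0, sensitivity=1.0, epsilon=0.5)
print(f"privately released count: {noisy_count:.1f}")
```

The epsilon parameter makes the utility-privacy trade-off the talk discusses explicit: lower values give stronger privacy guarantees at the cost of noisier answers.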

Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated commit compilation and benchmarking across two repos to pinpoint the commit that degraded multi-file Parquet loading. The hunt forced us to challenge our assumptions and rethink performance monitoring for the Python data science library Polars.
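For readers who want to try the approach, a hypothetical timing probe along these lines can be handed to `git bisect run`, which treats exit code 0 as "good" and 1 as "bad"; the threshold and data path are assumptions for illustration, not the speakers' actual setup:

```python
# bench_parquet.py -- a timing probe for use with `git bisect run`:
# exit 0 marks the checked-out commit good, exit 1 marks it bad.
import sys
import time

import polars as pl

THRESHOLD_SECONDS = 2.0  # assumed acceptable wall-clock time for this workload

start = time.perf_counter()
df = pl.scan_parquet("data/*.parquet").collect()  # multi-file Parquet load
elapsed = time.perf_counter() - start

print(f"loaded {df.height} rows in {elapsed:.2f} s")
sys.exit(0 if elapsed <= THRESHOLD_SECONDS else 1)
```

After `git bisect start <bad-commit> <good-commit>`, running `git bisect run` with this script (preceded by a rebuild of Polars at each step, which the speakers automated with Bash and uv) narrows the regression to a single commit without manual checkouts.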

This session explores how the University of Oxford leverages a unified approach to high-performance computing infrastructure and scalable data platforms across the Big Data Institute and the Centre for Human Genetics to advance biomedical research university-wide.

This session will discuss:

  • Breakthroughs enabled by HPC and secure data platforms in health research
  • Infrastructure needs for biomedical innovation and large-scale data science
  • Oxford’s partnership journey with Dell Technologies and NVIDIA and its real-world impact
  • How scalable AI infrastructure is accelerating research outcomes

Data governance often begins with Data Defense — centralized stewardship focused on compliance and regulatory needs, built on passive metadata, manual documentation, and heavy SME reliance. While effective for audits, this top-down approach offers limited business value. 

Data governance has since shifted to a Data Offense model that drives monetization of critical data assets by focusing on analytics and data science outcomes: improved decision-making and better customer and associate experiences. This involves integrating data quality and observability with a shift-left approach, prioritized by tangible impact on business outcomes, improved governance maturity, and accelerated resolution of business-impacting issues.

The next phase advances Data Stewardship toward AI-augmented and autonomous stewardship — embedding SME knowledge into automated workflows, managing critical assets autonomously, and delivering actionable context through proactive, shift-left observability, producer–consumer contracts, and SLAs built into data product development.
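To make the contract idea concrete, here is a minimal, hypothetical sketch of a producer-consumer data contract with a schema promise and a freshness SLA; the field names and thresholds are invented for illustration and do not reference any particular governance tool:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    """A producer's promises to consumers: expected schema plus a freshness SLA."""
    required_columns: dict[str, str]  # column name -> expected dtype
    max_staleness: timedelta          # freshness SLA

    def validate(self, schema: dict[str, str], last_updated: datetime) -> list[str]:
        # Shift-left observability: run this in the producer's pipeline,
        # before bad data ever reaches consumers.
        violations = [
            f"missing or mistyped column: {col} (expected {dtype})"
            for col, dtype in self.required_columns.items()
            if schema.get(col) != dtype
        ]
        if datetime.now(timezone.utc) - last_updated > self.max_staleness:
            violations.append("freshness SLA breached")
        return violations

contract = DataContract(
    required_columns={"customer_id": "int64", "order_total": "float64"},
    max_staleness=timedelta(hours=24),
)
print(contract.validate(
    schema={"customer_id": "int64", "order_total": "float64"},
    last_updated=datetime.now(timezone.utc) - timedelta(hours=2),
) or "contract satisfied")
```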

The data landscape is fickle, and once-coveted roles like 'DBA' and 'Data Scientist' have faced challenges. Now, the spotlight shines on Data Engineers, but will they suffer the same fate? This talk dives into historical trends.

In the early 2010s, DBA/data warehouse was the sexiest job. Data Warehouse became the “No Team.”

In the mid-2010s, data scientist was the sexiest job. Data Science became the “mistaken for” team.

Now, data engineering is the sexiest job. Data Engineering became the “confused team.” The confusion runs rampant with questions about the industry: What is a data engineer? What do they do? Should we have all kinds of nuanced titles for variations? Just how technical should they be?

Together, let’s look back at history for ways data engineering can avoid the same fate as data warehousing and data science. This talk offers a thought-provoking discussion on navigating the exciting yet challenging world of data engineering. Let's avoid the pitfalls of the past and shape a future where data engineers thrive as essential drivers of innovation and success.

Development teams often embrace Agile ways of working, yet the systems we build can still struggle to adapt when business needs shift. In this talk, we’ll share the journey of how a cross-functional data science team at the LEGO Group evolved its machine learning architecture to handle real-world complexity and change.

We’ll highlight how new modelling strategies, advanced feature engineering, and modern MLOps pipelines were designed not only for performance, but for flexibility. You’ll gain insight into how we architected a resilient ML system that supports changing requirements, scales with ease, and enables faster iteration. Expect actionable ideas on how to future-proof your own ML solutions and ensure they remain relevant in dynamic business contexts.

Powered by: Women in Data®

Face To Face
by Alex Read (EDF), Joe Herbert (Matillion)

This session features Joe, Matillion's Principal Solution Architect, in conversation with Alex Read from EDF, a Lighthouse customer. EDF has already experienced the business value of Maia, with a 75% reduction in data science product delivery time and the streamlining of hundreds of jobs into a unified system. The discussion will highlight how Maia delivers unprecedented productivity gains by helping customers move from legacy systems to an AI-driven data future, addressing tech consolidation and eliminating data friction. The Lighthouse program grants participants direct access to Matillion's product and engineering teams, fostering joint development and shared roadmaps, and aims to produce public-facing case studies.

Edmund Optics stands at the forefront of advanced manufacturing, distributing more than 34,000 products and customised solutions in optics, photonics and imaging to a range of industries across the globe. Just a year ago, Edmund Optics began an ambitious journey to transform its data science capabilities, aiming to use Machine Learning (ML) and AI to deliver real value to their business and customers.  

Join us for an engaging panel discussion featuring Daniel Adams, Global Analytics Manager at Edmund Optics, as he shares the company's remarkable transformation from having no formal data science capabilities to deploying multiple ML and AI models in production—all within just 12 months. Daniel will highlight how Edmund Optics cultivated internal enthusiasm for data solutions, built trust, and created momentum to push the boundaries of what’s possible with data. 

In this session, Daniel will reveal three key lessons learned on the journey from “data zero” to “data hero.” If you’re navigating a similar path, don’t miss this opportunity to discover actionable insights and strategies that can empower your own internal data initiatives.

In today’s landscape, data truly is the new currency. But unlocking its full value requires overcoming silos, ensuring trust and quality, and then applying the right AI and analytics capabilities to create real business impact. In this session, we’ll explore how Oakbrook Finance is tackling these challenges head-on — and the role that Fivetran and Databricks play in enabling that journey.

Oakbrook Finance is a UK-based consumer lender transforming how people access credit. By combining advanced data science with a customer-first approach, Oakbrook delivers fair, transparent, and flexible credit solutions — proving that lending can be both innovative and human-centred.

Discover how Google Cloud's AI-native platform is transforming data science, moving beyond traditional methods to empower you with an intuitive experience, an open ecosystem, and the ability to build intelligent, data-native AI agents. This shift eliminates integration headaches and scales your impact, enabling you to innovate faster and drive real-world outcomes. Explore how these advancements unify your workflows and unlock unprecedented possibilities for real-time, agent-driven insights.

The Big Book of Data Science. Part I: Data Processing

There are already excellent books on software programming for data processing and data transformation, Wes McKinney’s for instance. This book, which reflects my own industrial and teaching experience, tries to flatten the steep learning curve newcomers to the field must climb before they are ready to tackle real data science and AI challenges. In this regard, this book differs from others in that:

It assumes zero software programming knowledge. This instructional design is intentional given the book’s aim to open the practice of data science to anyone interested in data exploration and analysis irrespective of their previous background.

It follows an incremental approach to facilitate the assimilation of sometimes arcane software techniques for manipulating data.

It is practice-oriented, ensuring readers can apply what they learn in their daily work.

It illustrates how to use generative AI to help you become a more productive data scientist and AI engineer.

By reading this book and working through its labs, you will develop the software programming skills required to contribute successfully to the data understanding and data preparation stages of any data-related project. You will become proficient at manipulating and transforming datasets in industrial contexts and at producing clean, reliable datasets that can drive accurate analysis and informed decision-making. Moreover, you will be prepared to develop and deploy dashboards and visualizations that support your insights and conclusions in the deployment stage.
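For a flavour of the data-preparation skills the book targets, here is a minimal pandas sketch (pandas being the library behind Wes McKinney’s book mentioned above); the column names and values are invented for illustration:

```python
import pandas as pd

# Typical data-preparation steps: drop incomplete records, fix types,
# and produce a clean, analysis-ready table.
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-06", None],
    "amount": ["19.99", "5.00", "12.50"],
})

clean = (
    raw.dropna(subset=["order_date"])  # remove rows missing a date
       .assign(
           order_date=lambda d: pd.to_datetime(d["order_date"]),
           amount=lambda d: d["amount"].astype(float),
       )
)
print(clean.dtypes)
```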

Data modelling and evaluation are not covered in this book. We are working on a second installment of the book series illustrating the application of statistical and machine learning techniques to derive data insights.