talk-data.com

Topic

Big Data

Tags: data_processing · analytics · large_datasets

1217 activities tagged

Activity Trend

Peak 28 activities/quarter, 2020-Q1 to 2026-Q1

Activities

1217 activities · Newest first

How is AI transforming business intelligence? Find out how to act today at our demo session at Big Data & IA Paris.

On the agenda:

- How to move from fragmented analytics to an approach built on trust and security and centred on the user

- The latest advances in Strategy Mosaic, the first open semantic layer, built by AI and for AI

- How to industrialise the governance of your BI and AI tools across all of your data

The steel industry is characterised by complex production chains and demanding physico-chemical processes, generating a major need for modelling to optimise productivity.

Our strategy of consolidating industrial information and providing real-time data streams allows us to multiply and accelerate the deployment of both physical and AI-based models.

The main topics covered are:

• Freeing and de-siloing data through a hybrid architecture based on Kafka

• Simplifying the integration of components into our industrial systems

• Increasing team autonomy through self-service tools (data catalogue, querying, modelling platforms)

The session closes with a concrete modelling case illustrating the operational benefits of this architecture, with a focus on real-time process control to master product quality using hybrid solutions that couple physical modelling with data-driven models.
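As a rough illustration of the real-time layer described above (the abstract names Kafka but no particular client; the broker address, topic name, consumer group, and payload below are all hypothetical), a minimal producer/consumer pair with the confluent-kafka Python client might look like this:

```python
import json
from confluent_kafka import Producer, Consumer

BROKER = "localhost:9092"          # hypothetical broker address
TOPIC = "furnace.sensor-readings"  # hypothetical topic name

# Producer side: publish a process measurement as JSON.
producer = Producer({"bootstrap.servers": BROKER})
reading = {"sensor_id": "temp-42", "celsius": 1538.2, "ts": "2025-01-01T00:00:00Z"}
producer.produce(TOPIC, value=json.dumps(reading).encode("utf-8"))
producer.flush()

# Consumer side: a model-serving process subscribes to the same stream.
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "quality-model",  # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value()))  # feed into a physical or ML model here
consumer.close()
```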

A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

Every dataset has a story — and when it comes to geospatial data, it’s a story deeply rooted in space and scale. But working with geospatial information is often a hidden challenge: massive file sizes, strange formats, projections, and pipelines that don't scale easily.

In this talk, we'll follow the life of a real-world geospatial dataset, from its raw collection in the field to its transformation into meaningful insights. Along the way, we’ll uncover the key steps of building a robust, scalable open-source geospatial pipeline.

Drawing on years of experience at Camptocamp, we’ll explore:

  • How raw spatial data is ingested and cleaned
  • How vector and raster data are efficiently stored and indexed (PostGIS, Cloud Optimized GeoTIFFs, Zarr)
  • How modern tools like Dask, GeoServer, and STAC (SpatioTemporal Asset Catalogs) help process and serve geospatial data
  • How to design pipelines that handle both "small data" (local shapefiles) and "big data" (terabytes of satellite imagery)
  • Common pitfalls and how to avoid them when moving from prototypes to production

This journey will show how the open-source ecosystem has matured to make geospatial big data accessible — and how spatial thinking can enrich almost any data project, whether you are building dashboards, doing analytics, or setting the stage for machine learning later on.
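Picking up the ingestion-and-storage steps listed above, here is a minimal sketch of the vector path, assuming GeoPandas with a local PostGIS instance (the input file name, connection string, and table name are hypothetical):

```python
import geopandas as gpd
from sqlalchemy import create_engine

# Read a local shapefile (the "small data" path) and normalise its projection.
gdf = gpd.read_file("parcels.shp")  # hypothetical input file
gdf = gdf.to_crs(epsg=4326)         # reproject to WGS84

# Basic cleaning: drop missing or invalid geometries before loading.
gdf = gdf[gdf.geometry.notna() & gdf.geometry.is_valid]

# Store the cleaned vectors in PostGIS for indexed spatial queries.
engine = create_engine("postgresql://user:pass@localhost/gis")  # hypothetical DSN
gdf.to_postgis("parcels", engine, if_exists="replace")
```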

Comment exploiter la gestion de la data pour transformer vos masses de données en décisions rapides et efficaces, et piloter une supply chain plus visible, plus réactive et parfaitement maîtrisée ? Cette session vous plonge au cœur de la révolution des supply chains pilotées par la data et l’IA, où la visibilité en temps réel et la réactivité immédiate deviennent les leviers essentiels de performance. Apprenez à analyser instantanément des volumes colossaux d’informations, détecter les inefficiences, anticiper les risques, et agir avec agilité pour garantir une supply chain fluide, résiliente et optimisée.

Au programme : des cas concrets issus de grands groupes, une vision puissante du pilotage intelligent de la supply chain à l’ère du Big Data, et une démonstration live impressionnante révélant tout le potentiel de la data combinée à l’IA au service de votre chaîne logistique.

CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

Built on top of Software Heritage - the largest public archive of source code - the CodeCommons collaboration is building a large-scale, metadata-rich source code dataset designed to make training AI models on code more transparent, sustainable, and fair. Code will be enriched with contextual information such as issues, pull request discussions, licensing data, and provenance. In this presentation, we will outline the goals and structure of both the Software Heritage and CodeCommons projects, and discuss our particular contribution to CodeCommons' big data infrastructure.
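For context, Software Heritage exposes a public REST API over the archive; a small sketch of querying it from Python follows (the search term "flask" is arbitrary, and this illustrates the archive's public API rather than CodeCommons tooling itself):

```python
import requests

# Search archived origins (repositories) by URL pattern via the public API.
resp = requests.get(
    "https://archive.softwareheritage.org/api/1/origin/search/flask/",
    params={"limit": 5},
)
resp.raise_for_status()
for origin in resp.json():
    print(origin["url"])  # each origin records an archived repository URL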

From spreadsheets to strategy: what does data look like from the CEO's chair? For this episode, we sat down with Anna Lee, CEO of Flybuys and former CFO/COO of THE ICONIC, to get her view on data-led leadership and what great looks like in data and analytics. Discover how Anna's journey from finance to the corner office has shaped her approach to leveraging evidence for strategic decision-making. From productive curiosity to informed pragmatism to how data teams can build trust with leadership, this is a candid conversation about analytics from the top down. Whether you're embedded in a squad or building the next big data platform, this one's for anyone who's ever wondered what it takes to truly influence the C-suite!

This episode's Measurement Bite from show sponsor Recast is an overview of the fundamental problem of causal inference from Michael Kaminsky! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

On average, one woman is killed by an abusive partner or ex every five days in England and Wales (Refuge).  

Clare’s Law, introduced in 2014, was intended to help prevent such tragedies, allowing Police to proactively disclose domestic violence history to victims potentially at risk. Yet while the law gave the Police new powers, limited officer capacity could slow the response to these requests, and siloed data across systems made it a challenge to proactively identify safeguarding risks.

Bedfordshire Police, in partnership with Palantir, has built a solution to address these challenges - harnessing AI and big data to transform their force’s safeguarding approach from reactive to proactive, enabling faster identification and safeguarding of those at risk.

Emily Fitzsimons (Deployment Strategist, Palantir) alongside Bedfordshire Police will explore the challenges created for Police Public Protection Units by an increasingly complex data landscape - and how they're using the latest technologies, like AI, to overcome them.

Join us to see how we're transforming safeguarding to help reduce domestic violence by empowering potential victims with vital information before it's too late.

Powered by Women in Data®

Everyone’s talking about GenAI. But at Big Data London, you want more than hype. 

In this session, Simon Devine (Founder of Hopton Analytics) shares how the East of England Co-op embedded GenBI – Pyramid’s generative AI tool – into their business intelligence platform to improve how decisions are made across the organisation. 

This wasn’t a flashy experiment. It was a carefully planned rollout of AI-generated explanations, natural language querying, and explainable analytics – designed to support busy operational teams, reduce report backlogs, and drive smarter decisions at scale. 

Simon will take you behind the scenes of the project: how it was planned, what hurdles had to be overcome, and the governance structures that helped it succeed. You'll hear honest reflections on what worked, what didn’t, and what they’d do differently.

 Whether you’re a data leader looking for real-world use cases, a BI owner exploring GenAI adoption, or a transformation lead trying to unlock value from your reporting stack – this session will give you practical insight, not just theory.

 Come for the lived experience. Leave with ideas you can actually use.

Data leaders today face a familiar challenge: complex pipelines, duplicated systems, and spiraling infrastructure costs. Standardizing around Kafka for real-time and Iceberg for large-scale analytics has gone some way towards addressing this but still requires separate stacks, leaving teams to stitch them together at high expense and risk.

This talk will explore how Kafka and Iceberg together form a new foundation for data infrastructure: one that unifies streaming and analytics into a single, cost-efficient layer. By standardizing on these open technologies, organizations can reduce data duplication, simplify governance, and unlock both instant insights and long-term value from the same platform.

You will come away with a clear understanding of why this convergence is reshaping the industry, how it lowers operational risk, and the advantages it offers for building durable, future-proof data capabilities.
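As a rough sketch of what this convergence can look like in practice (assuming Spark with the Kafka connector and Iceberg runtime on the classpath; the broker, topic, checkpoint path, and table names are hypothetical), a structured-streaming job can move Kafka events straight into an Iceberg table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-iceberg").getOrCreate()

# Read events from Kafka as an unbounded streaming DataFrame.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "orders")                        # hypothetical topic
    .load()
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
)

# Append each micro-batch to an Iceberg table in a pre-configured catalog.
query = (
    events.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # hypothetical path
    .toTable("lake.db.orders")                                # hypothetical table
)
query.awaitTermination()
```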

In an era where data drives decisions, Knight Frank is redefining what it means to be in property with purpose. This session explores how Knight Frank harnesses the power of big data not just to optimise real estate strategies, but to create meaningful social impact across communities. From urban regeneration to inclusive housing initiatives, data is at the heart of their mission to build a better future.

A key highlight of the talk will be Knight Frank’s collaboration with Girls in Data, a pioneering initiative aimed at empowering young women to pursue careers in data and analytics. Through mentorship, workshops, and hands-on experience, Knight Frank is helping to close the gender gap in tech and foster the next generation of data leaders.

Join us to discover how data can be a force for good, driving equity, opportunity, and transformation in the property sector and beyond.

Powered by Women in Data®

Your AI is only as good as your data. Downtime, pipeline failures, and blind spots threaten revenue, compliance, and trust. Join Acceldata at Big Data London to explore Agentic Data Management (ADM), where AI agents autonomously resolve issues, optimize pipelines, and ensure governance. Powered by the xLake Reasoning Engine, ADM delivers trusted, AI-ready data with self-healing operations. Hear how enterprises like Dun & Bradstreet boosted reliability and compliance. Ideal for data leaders, engineers, architects, analysts, product managers, and governance heads seeking autonomous data excellence. Visit Booth M70 for live demos.

If you want to scare a Data Engineer with four words, ‘big data, high concurrency’ will probably do it. As data moved from the realm of BI reporting to being a customer-facing commodity, serving huge volumes of data to thousands of unforgiving app users is no small challenge. In this session, Connor Carreras will share (and demo!) how a major martech platform uses Firebolt to serve data about millions of websites to their worldwide customers with consistent millisecond response times. After this session, you will know how you can build low-latency data applications yourself. You’ll also have a deep understanding of what it takes for modern high-performance query engines to do well on these workloads.

Face To Face
by Gavi Regunath (Advancing Analytics), Simon Whiteley (Advancing Analytics), Holly Smith (Databricks)

We’re excited to be back at Big Data LDN this year—huge thanks to the organisers for hosting Databricks London once more!

Join us for an evening of insights, networking, and community with the Databricks Team and Advancing Analytics!

🎤 Agenda:

6:00 PM – 6:10 PM | Kickoff & Warm Welcome

Grab a drink, say hi, and get the lowdown on what’s coming up. We’ll set the scene for an evening of learning and laughs.

6:10 PM – 6:50 PM | The Metadata Marathon: How three projects are racing forward – Holly Smith (Staff Developer Advocate, Databricks)

With the enormous amount of discussion about open storage formats between nerds and even not-nerds, it can be hard to keep track of who’s doing what and how any of it actually affects day-to-day data projects.

Holly will take a closer look at the three big projects in this space: Delta, Hudi, and Iceberg. They’re all trying to solve similar data problems and have tackled the various challenges in different ways. Her talk will start with the very basics of how we got here and the history, before diving deep into the underlying tech, their roadmaps, and their impact on the data landscape as a whole.
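For a concrete feel of what these formats do, here is a minimal sketch using the deltalake Python package as one example of the genre (the path and data are made up); all three formats share the core idea of a metadata layer that gives ACID table semantics on top of plain data files:

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Write a small DataFrame as a Delta table: Parquet data files plus a
# _delta_log/ directory of JSON commits that versions the table.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
write_deltalake("/tmp/demo_table", df)  # hypothetical path

# Append a second commit; the log records it as a new table version.
write_deltalake("/tmp/demo_table", df, mode="append")

dt = DeltaTable("/tmp/demo_table")
print(dt.version())          # -> 1 (two commits: versions 0 and 1)
print(dt.to_pandas().shape)  # all rows from both commits
```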

6:50 PM – 7:10 PM | What’s New in Databricks & Databricks AI – Simon Whiteley & Gavi Regunath

Hot off the press! Simon and Gavi will walk you through the latest and greatest from Databricks, including shiny new AI features and platform updates you’ll want to try ASAP.

7:10 PM onwards | Q&A Panel + Networking

Your chance to ask the experts anything—then stick around for drinks, snacks, and some good old-fashioned data geekery.

Join us for an unmissable evening of insight, discussion, and lively debate at The High Performance Data and AI Debate, hosted by Chris Tabb — a unique Big Data London special running from 6:00–8:00 PM. This fast-paced, interactive event brings together some of the brightest minds in data and AI to tackle the most pressing questions shaping the future of teams, architecture, and products in an AI-first world.

The evening kicks off at 6:00 PM with a welcome and free drinks. Then, across three rapid-fire 20-minute debates, our expert panels will explore:

AI & Data – Teams (Chair: Eevamaija Virtanen)

Mehdi Ouazza, Paul Rankin, Jesse Anderson, Hugo Lu

AI & Data – Architecture (Chair: Adi Polak)

Chris Freestone, David Richardson, Nick White, Karl Ivo Sokolov

AI & Data – Products (Chair: Jai Parmar)

Kelsey Hammock, Jean-Georges (jgp) Perrin, Taylor McGrath, Jon Cooke

Refuel with free pizza at 6:50 PM, then stay for the Town Hall Debate, where all speakers return to the stage for an open-floor Q&A — your chance to challenge their ideas, share perspectives, and shape the conversation.

Expect fresh perspectives, healthy disagreement, and practical takeaways you can bring back to your organisation. Whether you’re leading a data team, designing cutting-edge architectures, or building AI-powered products, this is your space to engage with the people shaping what’s next.

Face To Face
by Jeremiah Stone (SnapLogic), Dr Mary Osbourne (SAS), Mike Ferguson (Big Data LDN), David Kalmuk (IBM Core Software), Chris Aberger (Alation), Vivienne Wei (Salesforce)

In this, the 10th year of Big Data LDN, its flagship Great Data Debate keynote panel sees conference chair and leading industry analyst Mike Ferguson welcome executives from leading software vendors to discuss key topics in data management and analytics. Panellists will debate the challenges and success factors in building an agentic enterprise, the importance of unified data and AI governance, the implications of key industry trends in data management, how best to deal with real-world customer challenges, how to build a modern data and analytics (D&A) architecture, and on-the-horizon issues that companies should be planning for today.

Attendees will learn best practices for data and analytics implementation in a modern, data- and AI-driven enterprise from seasoned executives and an experienced industry analyst in a packed, unscripted, candid discussion.

Last year, Big Data London’s GenAI theatres were packed. Fast forward 12 months, and AI is everywhere. So, this AI lark is easy now… right?

Lifting the lid on the AI bubble, reality is starting to bite. AI initiatives are stalling, models are drifting, and demonstrating tangible business value is really hard. Why? Because we’ve all sprinted into the AI future without first packing the essentials: high-quality, trusted data; a shared language for decision-making; solid governance; and the skilled people to make it all work.

In 2025, the organisations that will see the best returns from their AI programmes are those that have gone back to the future by pressing rewind to get their data foundations right before scaling the shiny stuff.

Join Andy Crossley, CTO at Oakland, alongside Alex Pearce, Chief Microsoft Strategist at Softcat, for a no-holds-barred conversation about the realities of AI in practice.

Lifting the lid on:

• Why so many AI projects fail to deliver real value

• The critical data foundations every business needs to succeed

• Real-world lessons from organisations discovering that AI is far more complex than the hype suggests

The good news? You’ll leave with practical, actionable steps to start unlocking value from your AI investments.

We can’t promise all the answers, but this session will reassure you that you are not alone. We aim to inspire new thinking and provide the guidance you need to navigate the most common pitfalls on the path to making AI work for you.

Explore how the University of Oxford leverages a unified approach to high-performance computing infrastructure and scalable data platforms across the Big Data Institute and the Centre for Human Genetics to advance biomedical research across the entire University.

This session will discuss:

  • Breakthroughs enabled by HPC and secure data platforms in health research
  • Infrastructure needs for biomedical innovation and large-scale data science
  • Oxford’s partnership journey with Dell Technologies and NVIDIA and its real-world impact
  • How scalable AI infrastructure is accelerating research outcomes

DuckDB is well-loved by SQL-ophiles for handling their small-data workloads. How do you make it scale? What happens when you feed it Big Data? What is this DuckLake thing I've been hearing about? This talk will help answer these questions from real-world experience running a DuckDB service in the cloud.
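For a flavour of both sides of that question, here is a minimal sketch with the duckdb Python package (the file paths and catalog name are made up, and the DuckLake attach syntax reflects the extension's early releases, so treat it as an assumption):

```python
import duckdb

con = duckdb.connect()

# Query Parquet files directly: the classic DuckDB local-analytics path.
con.sql("SELECT count(*) FROM read_parquet('events/*.parquet')").show()

# DuckLake layers a SQL-managed lakehouse catalog on top of DuckDB.
con.sql("INSTALL ducklake")
con.sql("LOAD ducklake")
con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake")  # hypothetical catalog file
con.sql("CREATE TABLE lake.events AS SELECT * FROM read_parquet('events/*.parquet')")
```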