Speaker

Justine BEL-LETOILE

Activities

2

talks

Talks & appearances

Balancing Privacy and Utility: Efficient PII Detection and Replacement in Textual Data

2025-10-01 · PyData Paris 2025

talk

with Elizaveta Clouet, Justine BEL-LETOILE

NLP

Anonymizing free-text data is harder than it seems. While structured databases have well-established anonymization techniques, textual data (such as invoices, resumes, or medical records) poses unique challenges. Personally identifiable information (PII) can appear anywhere, in unpredictable formats, which raises the question of how to modify it while preserving the dataset's usefulness.

Let's explore a practical, open-source two-step approach to text anonymization: (1) detecting PII using NER models and (2) replacing it while preserving key dataset characteristics (e.g. document formatting, statistical distributions). We will demonstrate how to build a robust pipeline leveraging tools such as pre-trained PII detection models, GLiNER for fine-tuning, and Faker for generating meaningful replacements.

Ideal for those with a basic understanding of NLP, this session offers practical insights for anyone working with sensitive textual data.
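
The two-step shape described above (detect, then replace) can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the speakers' pipeline: regular expressions stand in for the NER model, a seeded random generator stands in for Faker, and the names `detect_pii`, `fake_value`, and `anonymize` are hypothetical.

```python
import random
import re

# Stand-in detector. The talk uses NER models for this step; regexes are
# used here only so the sketch runs without downloading a model.
# Note that regexes cannot find names such as "Jane", which is why NER is needed.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ]{7,}\d"),
}


def detect_pii(text):
    """Step 1: return non-overlapping (start, end, label) spans, sorted by start."""
    spans = sorted(
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    )
    result, last_end = [], 0
    for start, end, label in spans:
        if start >= last_end:  # drop spans overlapping an earlier one
            result.append((start, end, label))
            last_end = end
    return result


def fake_value(label, original, rng):
    """Step 2 helper: build a replacement of the same kind (stand-in for Faker)."""
    if label == "EMAIL":
        user = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        return f"{user}@example.com"
    # PHONE: keep spacing and the leading "+", swap each digit for a random one.
    return "".join(str(rng.randrange(10)) if c.isdigit() else c for c in original)


def anonymize(text, seed=0):
    """Replace each detected span; identical originals get identical replacements."""
    rng = random.Random(seed)
    mapping = {}
    pieces, cursor = [], 0
    for start, end, label in detect_pii(text):
        original = text[start:end]
        if original not in mapping:
            mapping[original] = fake_value(label, original, rng)
        pieces.append(text[cursor:start])
        pieces.append(mapping[original])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Keeping a mapping from each original value to its replacement means a value that recurs in a document is replaced consistently, and swapping phone digits in place keeps the original formatting, which are two small examples of preserving dataset characteristics.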

Leveraging LLMs to build supervised datasets suitable for smaller models

2024-09-25 · PyData Paris 2024

talk

with Cérès Carton, Justine BEL-LETOILE

LLM NLP

For some natural language processing (NLP) tasks, depending on your production constraints, a simpler custom model can be a strong alternative to off-the-shelf large language models (LLMs), as long as you have enough high-quality data to build it. The stumbling block is how to obtain such data. Going over some practical cases, we will see how we can leverage the help of LLMs during this phase of an NLP project. How can they help us select the data to work on, or (pre)annotate it? Which model is suitable for which task? What are common pitfalls, and where should you focus your efforts?
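
One way to picture LLM-assisted pre-annotation is sketched below. It is an illustration under assumptions rather than the method presented in the talk: the agreement rule is a choice made for this example, `pre_annotate` is a hypothetical name, and the two keyword functions are stubs standing in for real LLM calls (for instance, two different prompts or models).

```python
from collections import Counter


def pre_annotate(texts, labelers, min_agreement=1.0):
    """Split texts into auto-labeled examples and ones routed to human review.

    labelers: callables mapping text -> label. In practice each would wrap
    an LLM call; here they are hypothetical stand-ins.
    """
    accepted, to_review = [], []
    for text in texts:
        votes = Counter(labeler(text) for labeler in labelers)
        label, count = votes.most_common(1)[0]
        if count / len(labelers) >= min_agreement:
            accepted.append((text, label))  # labelers agree: keep as training data
        else:
            to_review.append((text, dict(votes)))  # disagreement: ask a human
    return accepted, to_review


# Stub labelers standing in for LLM calls (assumption, for illustration only).
def labeler_a(text):
    return "invoice" if "total" in text.lower() else "other"


def labeler_b(text):
    return "invoice" if "invoice" in text.lower() else "other"


texts = ["Invoice total: 40 EUR", "Total attendance was 40", "Hello there"]
accepted, to_review = pre_annotate(texts, [labeler_a, labeler_b])
```

The accepted pairs can feed the training set of a smaller supervised model, while the disputed examples show where human annotation effort is likely best spent.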