talk-data.com

Topic: pdf

Activity Trend: peak of 3 activities per quarter (2020-Q1 to 2026-Q2)

Top Events

[AI Alliance] Workshop: Preparing High Quality Datasets with Data Prep Kit · 1
[AI Alliance] Workshop: Hands-on with Docling · 1
Data Science Retreat Demo Day #42 · 1

Activities

4 activities · Newest first

Automated ESRS-Tagging Pipeline for CSRD Compliance

2025-07-10 · Data Science Retreat Demo Day #42

talk

CSV bert csrd compliance esrs tagging ixbrl xbrl

This project delivers a fully automated software pipeline that converts raw sustainability reports into ESRS-tagged, XBRL-ready disclosures for CSRD compliance. The tool ingests diverse file formats (PDF, iXBRL, CSV), classifies content using a fine-tuned BERT model, validates completeness and consistency against ESRS rules, and exports compliant XBRL packages. By automating what is traditionally a 6–12-week manual process, the tool reduces turnaround to 1–2 days and lowers costs by up to €500K.
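The validation stage described above can be illustrated with a minimal sketch. The talk summary does not describe its actual rules, so everything here is an assumption: the `TaggedPassage` type, the three disclosure codes (a tiny illustrative subset), and the confidence threshold are hypothetical, and the classifier output is supplied by hand rather than produced by a fine-tuned BERT model.

```python
from dataclasses import dataclass

# Hypothetical subset of required disclosure codes, for illustration only.
# A real ESRS rule set is far larger and more structured.
REQUIRED_DISCLOSURES = {"E1-1", "E1-6", "S1-6"}


@dataclass
class TaggedPassage:
    text: str          # passage extracted from the report
    esrs_code: str     # label assigned by the classifier
    confidence: float  # classifier confidence in [0, 1]


def validate(passages, required=REQUIRED_DISCLOSURES, min_confidence=0.5):
    """Return (missing_codes, low_confidence_passages).

    A code counts as covered only if at least one passage carries it
    with confidence at or above the threshold (an assumed rule).
    """
    confident = [p for p in passages if p.confidence >= min_confidence]
    low_confidence = [p for p in passages if p.confidence < min_confidence]
    covered = {p.esrs_code for p in confident}
    missing = sorted(set(required) - covered)
    return missing, low_confidence
```

Calling `validate` on a list of tagged passages returns the required codes that are not yet covered, plus the passages a reviewer might want to check by hand before export.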

Data Prep Kit Workshop

2025-03-27 · [AI Alliance] Workshop: Preparing High Quality Datasets with Data Prep Kit

workshop

HTML Python data prep kit google colab

Hands-on workshop on cleaning and preparing high-quality datasets using Data Prep Kit. Topics include extracting content from PDF and HTML files, cleaning up markup, detecting and removing spam, scoring and removing low-quality documents, identifying and removing PII (personally identifiable information), and detecting and removing HAP (hate, abuse, and profanity) content. More about Data Prep Kit: https://github.com/IBM/data-prep-kit
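Two of the steps listed above, PII removal and low-quality document filtering, can be sketched with the Python standard library alone. This is not Data Prep Kit's API: the regular expressions, the scoring heuristic, the placeholder tokens, and the `min_score` threshold are all illustrative assumptions, and a production pipeline would use much more robust detectors.

```python
import re

# Simplified patterns (assumptions); real PII detection needs far more coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def quality_score(text):
    """Fraction of whitespace-separated tokens that are purely alphabetic
    once surrounding punctuation is stripped. A crude, hypothetical proxy."""
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(1 for t in tokens if t.strip(".,;:!?\"'()").isalpha())
    return wordlike / len(tokens)


def clean_corpus(docs, min_score=0.6):
    """Drop documents scoring below the threshold, then redact PII in the rest."""
    return [redact_pii(d) for d in docs if quality_score(d) >= min_score]
```

Scoring runs before redaction here so that the placeholder tokens do not affect a document's score; the opposite order would work as well with a slightly different threshold.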

Docling Hands-on Workshop

2025-03-13 · [AI Alliance] Workshop: Hands-on with Docling

Hands-on workshop

HTML Python docling docx google colab ocr

Hands-on session on using Docling to extract and clean up data from PDF, HTML, and DOCX files. It covers getting started with Docling, extracting content from documents, handling table and image data, and extracting content from scanned PDFs using OCR.

Getting started with Docling

2025-03-13 · [AI Alliance] Workshop: Hands-on with Docling

Hands-on workshop

HTML Python docling docx google colab ocr

Hands-on workshop on using Docling to extract and clean data from documents, including PDF and HTML files and scanned PDFs that require OCR. Key activities: getting started with Docling; extracting content from PDF and HTML files; handling table and image data; extracting content from scanned PDFs using OCR.
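As a rough idea of what "getting started with Docling" can look like, the sketch below converts one document to Markdown. It assumes the `docling` package is installed and is not taken from the workshop materials; `output_path_for` and the `out` directory are hypothetical helpers added for illustration.

```python
from pathlib import Path


def convert_to_markdown(source: str) -> str:
    """Convert a local PDF, HTML, or DOCX file to Markdown with Docling."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


def output_path_for(source: str, out_dir: str = "out") -> Path:
    """Hypothetical helper: map an input file to a .md path under out_dir."""
    return Path(out_dir) / (Path(source).stem + ".md")


def convert_and_save(source: str, out_dir: str = "out") -> Path:
    """Convert one document and write the Markdown next to the others."""
    target = output_path_for(source, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(convert_to_markdown(source), encoding="utf-8")
    return target
```

A call such as `convert_and_save("reports/annual.pdf")` would write `out/annual.md`. The first conversion may take a while because Docling may need to download its models.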