talk-data.com

Topic: pdf (4 tagged)

Activity Trend: peak of 3 activities per quarter, 2020-Q1 to 2026-Q1

Activities

This project delivers a fully automated software pipeline that converts raw sustainability reports into ESRS-tagged, XBRL-ready disclosures for CSRD compliance. The tool ingests diverse file formats (PDF, iXBRL, CSV), classifies content using a fine-tuned BERT model, validates completeness and consistency against ESRS rules, and exports compliant XBRL packages. By automating what is traditionally a 6–12-week manual process, the tool reduces turnaround to 1–2 days and lowers costs by up to €500K.
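The classify-then-validate stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: a keyword lookup stands in for the fine-tuned BERT model, the `KEYWORDS` map and `REQUIRED_TOPICS` set are hypothetical and do not reproduce the actual ESRS rule set, and ingestion and XBRL export are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword map standing in for the fine-tuned BERT classifier.
KEYWORDS = {
    "E1": ("emission", "climate", "greenhouse"),
    "S1": ("employee", "workforce"),
    "G1": ("corruption", "bribery", "governance"),
}

# Assumed set of topics a report must cover; NOT the real ESRS rules.
REQUIRED_TOPICS = {"E1", "S1", "G1"}


@dataclass
class TaggedSegment:
    text: str
    topic: Optional[str]  # None when no keyword matched


def classify(segment: str) -> TaggedSegment:
    """Assign the first topic whose keywords appear in the segment."""
    lowered = segment.lower()
    for topic, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return TaggedSegment(segment, topic)
    return TaggedSegment(segment, None)


def missing_topics(segments: list) -> set:
    """Completeness check: required topics that no segment covers."""
    covered = {s.topic for s in segments if s.topic is not None}
    return REQUIRED_TOPICS - covered


def run_pipeline(raw_segments: list) -> dict:
    """Classify every segment, then report which required topics are absent."""
    tagged = [classify(s) for s in raw_segments]
    return {"tagged": tagged, "missing": sorted(missing_topics(tagged))}
```

Calling `run_pipeline` on a list of text segments returns the tagged segments plus a sorted list of uncovered topics, which a later stage could use to block export until the report is complete.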

Hands-on workshop on cleaning and preparing high-quality datasets using Data Prep Kit. Topics include extracting content from PDFs and HTML, cleaning up markup, detecting and removing spam, scoring and removing low-quality documents, identifying and removing PII (personally identifiable information), and detecting and removing HAP (hate, abuse, profanity) content. More about Data Prep Kit: https://github.com/IBM/data-prep-kit
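A few of the cleaning steps listed above (markup removal, PII redaction, low-quality filtering) can be illustrated with the Python standard library alone. This sketch does not use or mirror the Data Prep Kit API; the function names, the email-only PII pattern, and the alphabetic-ratio quality score with its 0.6 threshold are all assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html: str) -> str:
    """Drop HTML tags and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Simplified email pattern; real PII detection covers far more than emails.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("<EMAIL>", text)


def quality_score(text: str) -> float:
    """Fraction of alphabetic characters among non-space characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def clean_corpus(docs, min_score=0.6):
    """Strip markup, redact PII, and keep documents scoring above the threshold."""
    cleaned = []
    for doc in docs:
        text = redact_pii(strip_markup(doc))
        if quality_score(text) >= min_score:
            cleaned.append(text)
    return cleaned
```

Each step is a separate function so that individual stages can be swapped out, for example replacing the crude quality score with a model-based one.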