Hands-on session exploring how to use Docling for data extraction and cleanup across PDFs, HTML, and DOCX. Includes getting started with Docling, extracting content from documents, handling table and image data, and extracting content from scanned PDF documents using OCR.
talk-data.com
Topic
docx
2
tagged
Activity Trend
2
peak/qtr
2020-Q1
2026-Q1
Hands-on workshop on using Docling to extract and clean data from documents, including PDFs, HTML, and OCR for scanned PDFs. Key activities: getting started with Docling; extracting content from PDFs/HTML; handling table and image data; extracting content from scanned PDFs using OCR.