Day 2 focuses on visual dataset curation with FiftyOne and iterative improvement of image classification models.
Focus on building and training neural networks with PyTorch.
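For a flavor of the material covered, a minimal PyTorch training loop might look like the sketch below. The model, data, and hyperparameters are placeholders for illustration, not the workshop's actual code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy tensors standing in for a real image dataset
X = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Small feed-forward classifier as a placeholder model
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```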
Focus on visual dataset curation with FiftyOne and iterative improvement of image classification models.
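A minimal sketch of what dataset curation in FiftyOne can look like, using a public zoo dataset as a stand-in for the workshop data:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small public dataset as a stand-in for the workshop data
dataset = foz.load_zoo_dataset("cifar10", split="test", max_samples=500)

# Tag a random slice of samples for manual review
view = dataset.take(50, seed=51)
view.tag_samples("needs_review")

# Open the interactive curation UI; wait() keeps it alive when run as a script
session = fo.launch_app(dataset)
session.wait()
```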
Hands-on workshop on cleaning and preparing high-quality datasets using Data Prep Kit. Topics include extracting content from PDFs and HTML, cleaning up markup, detecting and removing SPAM content, scoring and removing low-quality documents, identifying and removing PII data, and detecting and removing HAP (Hate Abuse Profanity) speech. More about Data Prep Kit: https://github.com/IBM/data-prep-kit
Hands-on workshop on using Data Prep Kit to extract content from PDFs/HTML, clean up data, remove SPAM, score and remove low-quality documents, identify and remove PII data, and detect and remove HAP (Hate Abuse Profanity) speech to improve dataset quality. Code will be run in Google Colab using Python.
Hands-on workshop on using Data Prep Kit to clean and prepare high-quality datasets: extract content from PDFs/HTML, clean up markup, remove SPAM, score and filter out low-quality documents, identify and remove PII data, and detect hate and abusive language. Prerequisites: comfortable with Python; the workshop runs in Google Colab.
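Data Prep Kit ships these steps as ready-made transforms. Purely as a conceptual illustration of the PII-removal step, here is a plain-Python, regex-based sketch; the patterns and function name are illustrative and are not Data Prep Kit's API.

```python
import re

# Illustrative patterns only; DPK's PII transform is far more thorough
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[ -]?)?(?:\(\d{3}\)[ -]?)?\d{3}[ -]?\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace obvious email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

doc = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567."
print(redact_pii(doc))
# -> Contact Jane at [EMAIL] or [PHONE].
```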
Hands-on session to explore Data Prep Kit and accelerate data preparation for building robust LLM applications. Topics include getting started with Data Prep Kit, extracting content from PDFs, DOCX, and HTML, cleanup of excess markup, detecting/removing duplicate documents, and removing low-quality and spam documents. Attendees should be comfortable with Python; workshop code will run in Google Colab.
Hands-on workshop to explore IBM Data Prep Kit for data preparation, including getting started, extracting content from PDFs, DOCX, and HTML, cleaning markup, deduplicating data, and removing low-quality or spam documents. The session will be run in Google Colab and is suitable for LLM app developers, data scientists, and data engineers. Prerequisites: comfortable with Python.
Hands-on session to explore Data Prep Kit and how to accelerate data preparation for LLM applications. The workshop covers getting started with Data Prep Kit, extracting content from PDFs, DOCX, and HTML, cleaning markup, deduplicating content, and detecting/removing low-quality or spam documents.
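Again, Data Prep Kit provides these steps as configurable transforms. Purely to illustrate what the exact-deduplication step does, a hash-based sketch in plain Python (not DPK's actual API) could be:

```python
import hashlib

def dedupe_exact(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document, dropping exact duplicates."""
    seen: set[str] = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["Hello world", "Hello world", "Another document"]
print(dedupe_exact(docs))  # ['Hello world', 'Another document']
```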
Hands-on workshop exploring Docling for data wrangling and document extraction. Topics include getting started with Docling, extracting content from PDFs and HTML, handling tables and images, and extracting content from scanned PDFs using OCR.
Hands-on session exploring how to use Docling for data extraction and cleanup across PDFs, HTML, and DOCX. Includes getting started with Docling, extracting content from documents, handling table and image data, and extracting content from scanned PDF documents using OCR.
Hands-on workshop on using Docling to extract and clean data from documents, including PDFs, HTML, and OCR for scanned PDFs. Key activities: getting started with Docling; extracting content from PDFs/HTML; handling table and image data; extracting content from scanned PDFs using OCR.
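A minimal sketch of Docling's basic conversion flow, assuming a local PDF path as a placeholder (OCR and table handling are configured separately and are not shown):

```python
from docling.document_converter import DocumentConverter

# Convert a document (PDF, DOCX, HTML, ...) into Docling's unified representation
converter = DocumentConverter()
result = converter.convert("report.pdf")  # placeholder path

# Export the parsed content, e.g. as Markdown for downstream processing
markdown = result.document.export_to_markdown()
print(markdown[:500])
```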
Contents: Piwik PRO offers plenty of APIs for every conceivable purpose, including targeted retrieval of consolidated figures, raw data, and reports. We will look at what is needed to use the API, how to retrieve data, and how to put that data to use for different purposes. We will use Python and Google Colab notebooks as the foundation and start from scratch, so that anyone who wishes can retrace the individual steps with their own data, in parallel or later, and build on them. Programming skills are not strictly required; that, too, is an advantage of the tool stack we will take a closer look at in this training.
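As a rough sketch of the pattern the training walks through: authenticate against the Piwik PRO API and pull a report with Python's requests library. The account domain, credentials, endpoint path, and query body below are placeholders and assumptions; consult the Piwik PRO API documentation for the exact parameters.

```python
import requests

ACCOUNT = "https://example.piwik.pro"   # placeholder account domain
CLIENT_ID = "YOUR_CLIENT_ID"            # placeholder credentials
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

# 1) Obtain an access token (assumed OAuth2 client-credentials flow)
token_resp = requests.post(
    f"{ACCOUNT}/auth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    },
)
token = token_resp.json()["access_token"]

# 2) Query the Analytics API (endpoint and body are illustrative)
query = {
    "website_id": "YOUR_SITE_ID",
    "columns": [{"column_id": "sessions"}],
    "date_from": "2024-01-01",
    "date_to": "2024-01-31",
}
report = requests.post(
    f"{ACCOUNT}/api/analytics/v1/query/",
    headers={"Authorization": f"Bearer {token}"},
    json=query,
)
print(report.json())
```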