Selenium

AI-Powered Web Scraping: From Data Collection to Strategic Insights

2025-12-09 · PyData Eindhoven 2025

talk

by Yevhenii

AI/ML Airflow Cloud Computing Data Collection Kubernetes Marketing Playwright Python

Companies today are hungry for external data to stay competitive, but actually getting and making sense of that data isn’t easy. Standard web scraping often produces messy or incomplete results, and modern anti-bot systems make reliable collection even tougher.

In this talk, I’ll share how pairing Python’s scraping frameworks (like Scrapy, Playwright, and Selenium) with AI/ML can turn raw, unstructured data into clear, actionable insights.

We’ll look at:

1) How to build scrapers that still work in 2025.

2) Ways to use AI to automatically clean, enrich, and classify data.

3) Real-world applications of sentiment analysis for reviews and social media.

4) Case studies showing how SMEs have used these pipelines to sharpen marketing and product strategies.

By the end, you’ll see how to design pipelines that don’t just gather data, but deliver real strategic value. The session will focus on practical Python tools, scalable deployment (Airflow, Kubernetes, cloud platforms), and key lessons learned from hands-on projects at the intersection of scraping and AI.

Spoonfuls of TDD and a pinch of AI: ingredients for a robust automation framework

2025-08-06 · Spoonfuls of TDD and a pinch of AI ingredients for a robust automation framework

talk

ai tdd

In this talk, we will explore how Test Driven Development (TDD) serves as a foundation for building robust automation frameworks like Selenium. The speaker shares how TDD has improved development processes and how AI can act as a helpful pair programmer, with a coding demo that builds a new Selenium API using TDD and AI. The talk also covers how to structure prompts so that user stories and test cases provide context, and how to get AI to provide valuable code suggestions.

The test automation toolbox: exploring frameworks built on WebDriver — the API for browser automation

2025-05-14 · The test automation toolbox: exploring frameworks built on WebDriver

talk

Java JavaScript Python c# ruby webdriver

This session explores the power and versatility of WebDriver, the standard API for browser automation, and its broad adoption across programming languages. We’ll dive into the WebDriver ecosystem, examining open-source frameworks built in Java, C#, Ruby, Python, and JavaScript. Attendees will gain insights into how these frameworks leverage WebDriver, the challenges of cross-language implementation, and best practices for choosing the right tool. We’ll also assess framework health using key GitHub metrics and discuss ways to contribute to the open-source automation community. Whether you’re a tester, developer, or QA engineer, this talk will help you navigate the test automation toolbox more effectively.

Hands-On Web Scraping with Python - Second Edition

2023-10-06 · O'Reilly Data Science Books O'Reilly Amazon

book

by Anish Chapagain

API Data Science Pandas Plotly Python data data-science data-science-tasks web-scraping

In "Hands-On Web Scraping with Python," you'll learn how to harness the power of Python libraries to extract, process, and analyze data from the web. This book provides a practical, step-by-step guide for beginners and data enthusiasts alike.

What this Book will help me do: Master the use of Python libraries like requests, lxml, Scrapy, and Beautiful Soup for web scraping. Develop advanced techniques for secure browsing and data extraction using APIs and Selenium. Understand the principles behind regex and PDF data parsing for comprehensive scraping. Analyze and visualize data using data science tools such as Pandas and Plotly. Build a portfolio of real-world scraping projects to demonstrate your capabilities.

Author(s): Anish Chapagain, the author of "Hands-On Web Scraping with Python," is an experienced programmer and instructor who specializes in Python and data-related technologies. With his vast experience in teaching individuals from diverse backgrounds, Anish approaches complex concepts with clarity and a hands-on methodology.

Who is it for? This book is perfect for aspiring data scientists, Python beginners, and anyone who wants to delve into web scraping. Readers should have a basic understanding of how websites work but no prior coding experience is required. If you aim to develop scraping skills and understand data analysis, this book is the ideal starting point.

Hands-On Web Scraping with Python

2019-07-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Anish Chapagain

API Python Cyber Security data data-science data-science-tasks web-scraping

This book, "Hands-On Web Scraping with Python", is your comprehensive guide to mastering web scraping techniques and tools. Harnessing the power of Python libraries like Scrapy, Beautiful Soup, and Selenium, you'll learn how to extract and analyze data from websites effectively and efficiently.

What this Book will help me do: Master the foundational concepts of web scraping using Python. Efficiently use libraries such as Scrapy, Beautiful Soup, and Selenium for data extraction. Handle advanced scenarios such as forms, logins, and dynamic content in scraping. Leverage XPath, CSS selectors, and Regex for precise data targeting and processing. Improve scraping reliability and manage challenges like cookies, API use, and web security.

Author(s): Anish Chapagain is an accomplished Python programmer and an expert in web scraping methodologies. With years of experience in applying Python to solve practical data challenges, they bring a clear and insightful approach to teaching these skills. Readers appreciate their practical examples and ready-to-use guidance for real-world applications.

Who is it for? This book is designed for Python developers and data enthusiasts eager to master web scraping. Whether you're a beginner looking to deep dive into new techniques or an analyst needing reliable data extraction methods, this book offers clear guidance. A basic understanding of Python is recommended to fully benefit from this text.

Practical Web Scraping for Data Science: Best Practices and Examples with Python

2018-04-18 · O'Reilly Data Science Books O'Reilly Amazon

book

by Seppe vanden Broucke, Bart Baesens

Data Science HTML JavaScript Python SAS SPSS data data-science data-science-tasks web-scraping

This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool for any data scientist’s arsenal, as many data science projects start by obtaining an appropriate data set. Starting with a brief overview of scraping and real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a solid foundation. Along with a quick Python primer, they cover Selenium for JavaScript-heavy sites, and web crawling in detail. The book finishes with a recap of best practices and a collection of examples that bring together everything you've learned and illustrate various data science use cases.

What You'll Learn: Leverage well-established best practices and commonly used Python packages. Handle today's web, including JavaScript, cookies, and common web scraping mitigation techniques. Understand the managerial and legal concerns regarding web scraping.

Who This Book is For: A data science oriented audience that is probably already familiar with Python or another programming language or analytical toolkit (R, SAS, SPSS, etc.). Students or instructors in university courses may also benefit. Readers unfamiliar with Python will appreciate the quick Python primer in chapter 1, which covers the basics and provides pointers to other guides.

Python Web Scraping Cookbook

2018-02-09 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mei Lu, Lazar Telebak, Michael Heydt

AWS Cloud Computing Data Engineering JavaScript Python data data-science data-science-tasks web-scraping

Python Web Scraping Cookbook is your comprehensive guide to building efficient and functional web scraping tools using Python. With practical recipes, you'll learn to overcome the challenges of dynamic content, captcha, and irregular web structures while deploying scalable solutions.

What this Book will help me do: Master the use of Python libraries like BeautifulSoup and Scrapy for scraping data. Perfect techniques for handling JavaScript-heavy sites using Selenium. Learn to overcome web scraping challenges, such as captchas and rate-limiting. Design scalable scraping pipelines with cloud deployment in AWS. Understand web data extraction techniques with XPath, CSS selectors, and more.

Author(s): Michael Heydt is a seasoned software engineer and technical author with a focus on data engineering and cloud solutions. Having worked with Python extensively, he brings real-world insights into web scraping. His practical approach simplifies complex concepts.

Who is it for? This book is perfect for Python developers and data enthusiasts keen to master web scraping techniques. If you're a programmer with insights into Python scripting and wish to scrape, analyze, and utilize web data efficiently, this book is for you.

Python Web Scraping - Second Edition

2017-05-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Katharine Jarmul (Cape Privacy)

Data Collection JavaScript Python data data-science data-science-tasks web-scraping

"Python Web Scraping" is a practical guide to extracting and processing online data using the Python programming language. With this book, you'll learn step-by-step how to build web scrapers and crawlers that can handle a range of data sources and structures. After reading this, you will be equipped to tackle real-world web scraping challenges effectively.

What this Book will help me do: Learn how to extract structured data from standard webpages using Python. Gain proficiency with libraries such as Selenium and PyQt for handling dynamic and JavaScript-dependent content. Build concurrent scrapers to efficiently process large volumes of web pages in parallel. Understand and implement form interaction automation for data extraction from complex websites. Develop advanced scrapers using Scrapy to handle sophisticated web crawling tasks.

Author(s): Katharine Jarmul is an experienced data scientist and programmer with extensive knowledge in Python. They bring practical expertise from working on real-world web scraping projects. In their work, they focus on creating content that empowers readers by demystifying complex technical topics.

Who is it for? This book is perfect for software developers eager to dive into web scraping using Python, even if they're new to the subject. If you have basic to intermediate Python skills and want to automate data collection and processing, this is the book for you. The techniques here are valuable for tackling diverse data extraction scenarios.

talk-data.com
