talk-data.com

Speaker

Zdravko Hvarlingov

2 talks

Senior Data Engineer at Financial Times


Talks & appearances



Investigative journalism often relies on uncovering hidden patterns in vast amounts of unstructured and semi-structured data. At the FT, we leverage Airflow to orchestrate AI-powered pipelines that transform complex, fragmented datasets into structured insights. Our Storyfinding team works closely with journalists to automate tedious data processing, enabling them to tell stories that might otherwise go untold.

This talk will explore how we use Airflow to process and analyze text, documents, and other difficult-to-structure data sources, combining AI, machine learning, and advanced computational techniques to extract meaningful entities, relationships, and patterns. We’ll also showcase our connection analysis workflows, which link various datasets to reveal previously hidden chains of people and companies, a crucial capability for investigative reporting.

Attendees will learn:

- How Airflow can orchestrate AI-driven pipelines for handling unstructured and semi-structured data.
- Techniques for automating connection analysis to support investigative journalism.
- Lessons from our experience working with journalists to develop data-driven storytelling and storyfinding capabilities.
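As a rough illustration of the kind of pipeline described above (a minimal sketch, not the FT’s actual code; every task name, the placeholder task bodies, and the source URI are hypothetical), such a workflow could be expressed with Airflow’s TaskFlow API along these lines:

```python
# Hypothetical sketch of an AI-assisted document pipeline using Airflow's
# TaskFlow API (Airflow 2.x). Task bodies are illustrative placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def document_insight_pipeline():
    @task
    def extract_text(source_uri: str) -> list[str]:
        # Normalise raw sources (PDFs, filings, scraped pages) to plain text;
        # in practice this would call an OCR or document-parsing service.
        return [f"document text from {source_uri}"]

    @task
    def extract_entities(texts: list[str]) -> list[dict]:
        # Run NER (e.g. an LLM or a spaCy model) over each document to pull
        # out people, companies, and the relationships between them.
        return [{"text": t, "entities": []} for t in texts]

    @task
    def analyse_connections(records: list[dict]) -> None:
        # Cross-reference extracted entities against other datasets
        # (company registries, earlier stories) to surface hidden links.
        for record in records:
            print(record)

    analyse_connections(extract_entities(extract_text("s3://bucket/leaked-docs/")))


document_insight_pipeline()
```

The appeal of structuring it this way is that each stage (parsing, entity extraction, connection analysis) is an independent, retryable Airflow task, so a failed model call does not force reprocessing of the whole corpus.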

Inside the Financial Times, we’ve been gradually moving our batch data processing from a custom solution to Airflow. To enable the various teams within the company to use Airflow more effectively, we’ve been extending the system’s self-service capabilities, including giving teams ownership of their DAGs and separating resources such as connections.

Batch data ingestion processes are the main ETL-like jobs we run on Airflow. Creating a new job used to be a manual, repetitive task: receiving the data specification, creating the requisite tables in our data warehouse, and writing the DAG that would move the data there. Airflow allowed us to automate this process to a degree that surprised us, completely removing the need to write DAG code. We will use the talk to describe what the current process of creating a new ETL workflow looks like and our plans for further improvements.
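One common way to remove hand-written DAG code, offered here purely as an assumed sketch rather than the FT’s implementation, is to generate DAGs from declarative specifications; the spec format, dataset names, and load logic below are invented for illustration:

```python
# Hypothetical spec-driven DAG generation: each ingestion job is described
# declaratively (here as dicts; in practice e.g. YAML files), and DAG objects
# are generated in a loop, so no per-dataset DAG code is written by hand.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Illustrative data specifications; the names and fields are assumptions.
INGESTION_SPECS = [
    {"name": "trades", "source": "s3://raw/trades/", "table": "dw.trades"},
    {"name": "quotes", "source": "s3://raw/quotes/", "table": "dw.quotes"},
]


def ingest(source: str, table: str) -> None:
    # Placeholder for the real load step (e.g. a COPY into the warehouse).
    print(f"loading {source} into {table}")


for spec in INGESTION_SPECS:
    with DAG(
        dag_id=f"ingest_{spec['name']}",
        schedule="@daily",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="load",
            python_callable=ingest,
            op_kwargs={"source": spec["source"], "table": spec["table"]},
        )

    # Assigning each DAG into the module's global namespace is what lets the
    # Airflow scheduler discover the generated DAGs.
    globals()[dag.dag_id] = dag
```

With a pattern like this, onboarding a new dataset reduces to adding a specification, which is what makes a fully self-service, no-DAG-code workflow possible.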