This presentation introduces the “Stile ecosystem”, developed at ING Analytics to shorten the time to market of machine learning models for the instant lending domain. The main problem it addresses is the duality between Spark and Pandas in feature generation: Spark is used during development, where billions of transactions stored in the data warehouse must be processed, while Pandas is used in production, where applications are scored one by one in real time. Gilles will explain how the model development template works, with a specific focus on feature creation. He will also show how Pandas and PySpark are integrated behind common functionalities, present the user-friendly testing framework developed to ensure consistency between the two worlds, and, finally, show how to easily trim the code so that it produces only the features required for the final model.

Gilles Verbockhaven is Chapter Lead at ING Retail Banking Analytics, where he manages a team of five Data Scientists. He has worked at ING for 20 years and has experience in various domains, ranging from market risk to modelling. Since 2017, he has worked in machine learning, specializing in the design of analytic solutions for collections and pricing. In his free time, he spends his energy running and biking.
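The idea of trimming the code down to the features required for the final model can be illustrated with a small sketch. This is not the Stile implementation, which the abstract does not describe. It is a plain-Python illustration that assumes each feature is registered together with the features it depends on. The names `REGISTRY`, `required_features`, and `compute`, as well as the example features, are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical registry: feature name -> (dependencies, function).
# Each function receives the raw record and the features computed so far.
Feature = Tuple[List[str], Callable[[dict, dict], float]]

REGISTRY: Dict[str, Feature] = {
    "total_spend": ([], lambda rec, f: float(sum(rec["amounts"]))),
    "n_tx": ([], lambda rec, f: float(len(rec["amounts"]))),
    "avg_spend": (
        ["total_spend", "n_tx"],
        lambda rec, f: f["total_spend"] / f["n_tx"] if f["n_tx"] else 0.0,
    ),
    "max_spend": ([], lambda rec, f: float(max(rec["amounts"], default=0.0))),
}


def required_features(selected: List[str], registry: Dict[str, Feature]) -> List[str]:
    """Return the selected features plus their transitive dependencies,
    ordered so that every dependency comes before the feature using it.
    Assumes the dependency graph has no cycles."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        deps, _ = registry[name]
        for dep in deps:
            visit(dep)
        ordered.append(name)

    for name in selected:
        visit(name)
    return ordered


def compute(record: dict, selected: List[str],
            registry: Dict[str, Feature] = REGISTRY) -> Dict[str, float]:
    """Compute only what the final model needs, skipping unused features."""
    values: Dict[str, float] = {}
    for name in required_features(selected, registry):
        _, fn = registry[name]
        values[name] = fn(record, values)
    # Intermediate dependencies are computed but not returned.
    return {name: values[name] for name in selected}
```

With this layout, a model that uses only `avg_spend` triggers the computation of `total_spend` and `n_tx`, while `max_spend` is never evaluated. In a real-time scoring path, where each application is scored individually, skipping unused features in this way would reduce the work done per request.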
From a Fintech lens: MCP server live-coding & feature selection data hacks