Ever tried building a credit risk model when your data lives in Google Sheets and your loan statuses are about as reliable as weather forecasts? You'll learn practical data science lessons about surviving data quality issues, the critical importance of target variable definition, using genetic algorithms for feature selection, and how engineered transactional features can take your predictions from "probably fine" to "we actually know what we're doing." We'll show how classical ML approaches like logistic regression and XGBoost remain highly effective for binary classification problems, proving that sometimes the fundamentals beat the latest AI trends. Perfect for anyone who's ever wondered how machine learning works when your data isn't clean, your labels aren't perfect, and your stakeholders want results yesterday.
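As a taste of what "classical ML" means here, a minimal sketch comparing the two model families on a synthetic, imbalanced binary target. The dataset, hyperparameters, and metric below are illustrative stand-ins, not from the talk itself:

```python
# Baseline comparison: logistic regression vs. XGBoost on a binary
# (default / no-default) target. Synthetic data stands in for real
# credit features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# ~90% negative class, mimicking the typical rarity of defaults.
X, y = make_classification(n_samples=5_000, n_features=20,
                           weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1_000)),
    ("xgboost", XGBClassifier(n_estimators=300, max_depth=4,
                              learning_rate=0.05, eval_metric="logloss")),
]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC-AUC = {auc:.3f}")
```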
In this tutorial, we will explore a range of feature engineering techniques for time series forecasting using popular machine learning algorithms such as XGBoost, LightGBM, and CatBoost. We'll begin by transforming time series data into a tabular format and demonstrate how to create window and lag features, as well as features that capture seasonality and trends.
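As a rough illustration of that tabular transformation (the `sales` column name and the specific lag choices below are placeholders, not taken from the tutorial), here is a sketch of lag, window, and calendar features built with pandas:

```python
# Turn a univariate daily series into a tabular frame with lag,
# rolling-window, and calendar (seasonality/trend) features.
import pandas as pd

df = pd.DataFrame(
    {"sales": range(100)},
    index=pd.date_range("2023-01-01", periods=100, freq="D"),
)

# Lag features: values from previous time steps.
for lag in (1, 7, 14):
    df[f"sales_lag_{lag}"] = df["sales"].shift(lag)

# Window features: rolling statistics over *past* values only
# (shift(1) first, so the current target never leaks into its own features).
df["sales_roll_mean_7"] = df["sales"].shift(1).rolling(7).mean()
df["sales_roll_std_7"] = df["sales"].shift(1).rolling(7).std()

# Calendar features that let tree models pick up seasonality and trend.
df["dayofweek"] = df.index.dayofweek
df["month"] = df.index.month
df["time_idx"] = range(len(df))  # simple linear trend proxy

df = df.dropna()  # drop rows whose lags precede the start of the series
```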
We'll cover best practices for encoding categorical variables, decomposing time series, identifying outliers, and avoiding common pitfalls such as data leakage and look-ahead bias. Additionally, we’ll touch on more advanced topics like intermittency and hierarchical forecasting.
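One of those pitfalls lends itself to a quick sketch: a target-mean encoding of a categorical column has to be learned from the training window only, then mapped onto later data, or the model sees information from the very period it is predicting. The `store` column and values below are invented for illustration:

```python
# Leakage-safe target (mean) encoding: statistics come from the
# training window only and are merely applied to the future window.
import pandas as pd

train = pd.DataFrame({"store": ["a", "b", "a", "c"], "y": [10, 20, 12, 30]})
test = pd.DataFrame({"store": ["a", "b", "d"]})

# Mean of the target per category, learned from training data only.
means = train.groupby("store")["y"].mean()
global_mean = train["y"].mean()

train["store_enc"] = train["store"].map(means)
# Categories unseen in training (here "d") fall back to the global mean.
test["store_enc"] = test["store"].map(means).fillna(global_mean)
```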
The session will also delve into cross-validation, specifically backtesting techniques suited to time series data. We'll examine why traditional K-fold cross-validation is inappropriate for time-dependent datasets and highlight alternative approaches along with their trade-offs.
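For instance, an expanding-window backtest with scikit-learn's TimeSeriesSplit keeps every validation fold strictly after its training data, which is exactly the ordering that K-fold shuffling destroys (the series length and fold sizes below are arbitrary):

```python
# Expanding-window backtesting: each fold trains on the past and
# validates on the immediately following block.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(24)  # stand-in for 24 monthly observations
tscv = TimeSeriesSplit(n_splits=4, test_size=3)

for fold, (train_idx, val_idx) in enumerate(tscv.split(y)):
    print(f"fold {fold}: train ends at t={train_idx[-1]}, "
          f"validate on t={val_idx[0]}..{val_idx[-1]}")
```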
Finally, we’ll review best practices for evaluating model performance. This includes a comprehensive overview of error metrics, discussing their strengths, weaknesses, and the contexts in which each should be used.
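To make those trade-offs concrete, a small sketch of three common point-forecast metrics, hand-rolled for transparency (the example numbers are made up):

```python
# MAE is robust and stays in the target's units; RMSE penalises large
# errors more heavily; MAPE is scale-free but breaks down near zero
# actuals, which motivates variants like sMAPE and MASE.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    # Undefined if any y_true == 0.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([100.0, 120.0, 90.0])
y_pred = np.array([110.0, 115.0, 95.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```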