talk-data.com

Topic

scikit-learn

Activities

7 activities · Newest first

Scikit-learn now makes it easier to explore estimators by displaying their parameter values and allowing them to be copied. In the next release, each parameter will also include a short documentation preview and a link to the full reference page. More enhancements are on the way to make model inspection even richer and more intuitive. This work blends front-end development with Python. Dea's path into open source and the PyData ecosystem started with a desire for a new career direction and a lifelong curiosity for technical challenges.
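The parameter values described above can also be read programmatically. The following is a minimal sketch, not the display feature itself: it uses the standard `get_params` and `set_params` methods, and the choice of `LogisticRegression` and its parameter values is purely illustrative.

```python
from sklearn.linear_model import LogisticRegression

# Illustrative estimator; any scikit-learn estimator exposes the same methods.
clf = LogisticRegression(C=0.5, max_iter=200)

# get_params() returns the constructor parameters as a plain dict.
params = clf.get_params()
for name in ("C", "max_iter"):
    print(f"{name} = {params[name]!r}")

# set_params() updates parameters in place and returns the estimator.
clf.set_params(C=2.0)
```

In a notebook, evaluating `clf` as the last expression of a cell is what triggers the rich display mentioned in the abstract.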

This presentation introduces the Genetic Algorithms + Feature Importance Feature Selection technique, implemented in the open source Python package felimination. Genetic algorithms are a powerful optimization technique that can be used effectively for feature selection in machine learning models. By combining genetic algorithms with feature importance, we aim to enhance the feature selection process, leading to more robust and interpretable models. We will start by reviewing genetic algorithms, detailing the steps of pool initialization, crossover, mutation, and selection. The presentation will continue by showcasing some code snippets using felimination, a Python package containing a suite of algorithms for feature selection, including the genetic algorithm with feature importance selector. Claudio Salvatore Arcidiacono is a Senior Machine Learning Engineer at Mollie. He has worked in the fintech sector for the past 7 years and has extensive experience with classical machine learning problems. He loves to contribute to data science open source libraries like feature-engine, scikit-learn, and narwhals. He maintains a couple of open source libraries himself (felimination and sklearo). In his free time, he is a coffee scientist, using a data-driven approach to dial in the perfect cup of espresso.
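The four genetic algorithm steps named in the abstract (pool initialization, crossover, mutation, selection) can be sketched as follows. This is a generic illustration under stated assumptions, not the felimination API and not the speaker's method: fitness here is plain cross-validated accuracy rather than a feature-importance-guided score, and the dataset, pool size, mutation rate, and all function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic data (assumption): 12 features, 4 of them informative.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)


def fitness(mask):
    """Mean 3-fold CV accuracy using only the columns selected by `mask`."""
    if not mask.any():
        return 0.0  # an empty feature set cannot be scored
    model = LogisticRegression(max_iter=500)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()


def crossover(a, b):
    """Single-point crossover of two boolean feature masks."""
    point = rng.integers(1, a.size)
    return np.concatenate([a[:point], b[point:]])


def mutate(mask, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    flips = rng.random(mask.size) < rate
    return np.logical_xor(mask, flips)


# Pool initialization: 10 random feature subsets.
pool = [rng.random(X.shape[1]) < 0.5 for _ in range(10)]

for generation in range(5):
    scores = np.array([fitness(m) for m in pool])
    # Selection: keep the better half of the pool as parents.
    order = np.argsort(scores)[::-1]
    parents = [pool[i] for i in order[:5]]
    # Crossover and mutation: refill the pool with children.
    children = []
    while len(children) < 5:
        i, j = rng.choice(len(parents), size=2, replace=False)
        children.append(mutate(crossover(parents[i], parents[j])))
    pool = parents + children

best_mask = max(pool, key=fitness)
print("selected feature indices:", np.flatnonzero(best_mask))
```

Keeping the parents in the next generation means the best subset found so far is never lost, at the cost of slower exploration.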

Scikit-learn is a popular machine learning library. It currently has over 200 estimators ready to use for a vast array of use cases. What if you are working on something special that has not yet found its way into the library? Scikit-learn offers a way to write new compatible estimators that integrate seamlessly with the rest of the library. We will look at what an estimator is, what API scikit-learn estimators expose, why you might want to implement your own, and an example of how to do so. We will end with real-world examples of how other OSS projects use this for their needs.
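As a rough illustration of what a compatible estimator can look like, here is a minimal sketch of a nearest-centroid classifier. The class name `CentroidClassifier` and its `metric` parameter are hypothetical and not taken from the talk; the sketch relies only on the documented conventions of `BaseEstimator`, `ClassifierMixin`, and the validation helpers.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical example: predict the class whose centroid is closest."""

    def __init__(self, metric="euclidean"):
        # Convention: __init__ only stores hyperparameters, unmodified.
        self.metric = metric

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Convention: learned attributes end with an underscore.
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self  # fit returns self so calls can be chained

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        diff = X[:, None, :] - self.centroids_[None, :, :]
        if self.metric == "manhattan":
            dist = np.abs(diff).sum(axis=2)
        else:
            dist = np.sqrt((diff ** 2).sum(axis=2))
        return self.classes_[dist.argmin(axis=1)]


# The custom estimator plugs into pipelines and cross-validation unchanged.
X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), CentroidClassifier())
scores = cross_val_score(pipe, X, y, cv=5)
print("mean accuracy:", scores.mean())
```

Inheriting from `BaseEstimator` supplies `get_params` and `set_params`, which is what lets cloning and grid search work, and `ClassifierMixin` supplies a default accuracy-based `score`.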

Passing metadata such as sample_weight and groups through scikit-learn's cross_validate, GridSearchCV, or a Pipeline to the right estimators, scorers, and CV splitters has been either cumbersome, hacky, or impossible. The new metadata routing mechanism in scikit-learn enables you to pass metadata through these objects. As a use case, we study how you can implement revenue-sensitive scoring while doing a hyperparameter search within a GridSearchCV object.

Today, state-of-the-art technology and scientific research depend strongly on open source libraries. The contributors to these libraries are predominantly white and male. This situation creates problems not only for individual contributors outside of this demographic but also for open source projects, such as loss of career opportunities and less robust technologies, respectively. In recent years there have been a number of recommendations and initiatives to increase the participation of underrepresented groups in open source projects. While these efforts are valuable and much needed, contributor diversity remains a challenge in open source communities. This talk highlights the underlying problems and explores how we can overcome them.