Topic

Pandas

data_manipulation data_analysis python

Activities: 3 tagged

Activity Trend: peak of 17 per quarter, 2020-Q1 to 2026-Q1

Top Events

O'Reilly Data Science Books 72
Data Engineering Podcast 11
O'Reilly Data Visualization Books 10
O'Reilly Data Engineering Books 9
Databricks DATA + AI Summit 2023 8
PyConDE & PyData Berlin 2023 8
SciPy 2025 8
Data + AI Summit 2025 3
The Joe Reis Show 3
PyData Paris 2024 3
DataTopics: All Things Data, AI & Tech 2
April 5-6: FREE 2-Day Deep Learning Fundamentals NVIDIA DLI Certification Course 2

Top Speakers

Tobias Macey 11
Wes McKinney (Posit) 6
Dr. Yasin Ceran (KAIST) 4
Joe Reis (DeepLearning.AI) 3
Michael Heydt 3
Patrick Hoefler 3
Stefanie Molin 3
Thomas Joseph 3
Antonio Rueda-Toicen (Hasso Plattner Institute) 3
Fabio Nelli 3
Robert Thas John 3
Marco Gorelli (Narwhals) 3

Activities

Filtering by: PyData Paris 2024

Open Source Sustainability & Philanthropy: Building Contributor Communities

2024-09-26 · PyData Paris 2024

talk

by Devpriya Dave, Alyssa Wright

Open Source Software, the backbone of today’s digital infrastructure, must be sustainable for the long term. Qureshi and Fang (2011) find that motivating, engaging, and retaining new contributors is what makes open source projects sustainable.

Yet, as Steinmacher et al. (2015) identify, first-time open source contributors often lack timely answers to questions, newcomer orientation, mentors, and clear documentation. Moreover, since the term was first coined in 1998, open source has lagged far behind other technical domains in participant diversity. Trinkenreich et al. (2022) report that only about 5% of projects had women as core developers, and that women authored less than 5% of pull requests but had pull request acceptance rates similar to or even higher than men's. So, how can we achieve more diversity in open source communities and projects?

Bloomberg’s Women in Technology (BWIT) community, Open Source Program Office (OSPO), and Corporate Philanthropy team collaborated with NumFOCUS to develop a volunteer incentive model that aligns business value, philanthropic impact, and individual technical growth. Through it, participating Bloomberg engineers were given the opportunity to convert their hours spent contributing to the pandas open source project into a charitable donation to a non-profit of their choice.

The presenters will discuss how we wove together differing viewpoints: non-profit foundation and for-profit corporation, corporate philanthropy and engineers, first-time contributors and core devs. They will showcase why and how we converted technical contributions into charitable dollars, the difference this community-building model made in creating a diverse and sustained group of new open source contributors, and the viability of extending it to other open source projects and corporate partners to contribute to the long-term sustainability of open source, thereby demonstrating the true convergence of tech and social impact.

NOTE: [1] Qureshi, I., and Fang, Y. "Socialization in open source software projects: A growth mixture modeling approach." 2011. [2] Steinmacher, I., et al. "Social barriers faced by newcomers placing their first contribution in open source software projects." 2015. [3] Trinkenreich, B., et al. "Women’s participation in open source software: A survey of the literature." 2022.

Fast NetworkX and How Accelerated Backends Are Changing Graph Analytics

2024-09-26 · PyData Paris 2024

talk

by Erik Welch, Rick Ratzel

Analytics Python

NetworkX is arguably the most popular graph analytics library available today, but one of its greatest strengths, the pure-Python implementation, is also possibly its biggest weakness. Whether you're a seasoned data scientist or a new student of the fascinating field of graph analytics, you're probably familiar with NetworkX and interested in how to make this extremely easy-to-use library powerful enough to handle realistically large graph workflows that often exceed the limitations of its pure-Python implementation.

This talk will describe a relatively new capability of NetworkX: support for accelerated backends, and how they can benefit NetworkX users by allowing it to finally be both easy to use and fast. Through the use of backends, NetworkX can also be incorporated into workflows that take advantage of similar accelerators, such as Accelerated Pandas (cudf.pandas), to finally make these easy-to-use solutions scale to larger problems.

Attend this talk to learn about how you can leverage the various backends available to NetworkX today to seamlessly run graph analytics on GPUs, use GraphBLAS implementations, and more, all without leaving the comfort and convenience of the most popular graph analytics library available.
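As a rough sketch of what backend dispatch can look like in practice: recent NetworkX releases accept a `backend=` keyword on dispatchable functions. The helper name `centrality_with_fallback` is hypothetical, the `"cugraph"` backend name assumes the separately installed nx-cugraph package, and the fallback logic is an illustration under those assumptions rather than anything taken from the talk.

```python
import networkx as nx


def centrality_with_fallback(G, backend=None):
    """Compute betweenness centrality, preferring an accelerated backend.

    Hypothetical helper: tries the requested backend first and falls back
    to the default pure-Python implementation if that backend cannot be used.
    Returns the scores and the name of the implementation that produced them.
    """
    if backend is not None:
        try:
            # Assumption: a NetworkX version that supports the `backend=` keyword.
            return nx.betweenness_centrality(G, backend=backend), backend
        except (ImportError, TypeError):
            # ImportError: the backend package is not installed.
            # TypeError: this NetworkX version does not accept `backend=`.
            pass
    return nx.betweenness_centrality(G), "networkx"


# Small built-in example graph (34 nodes).
G = nx.karate_club_graph()
scores, used = centrality_with_fallback(G, backend="cugraph")
print(f"computed with: {used}")
print("most central node:", max(scores, key=scores.get))
```

The calling code is the same whichever implementation runs, which is the point the abstract makes about staying inside the familiar NetworkX API.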

Jupylates: spaced repetition for teaching with Jupyter

2024-09-25 · PyData Paris 2024

talk

by Nicolas M. Thiéry, Chiara Marmo

AI/ML GitLab

Jupyter-based environments are gaining a lot of traction for teaching computing, programming, and data science. The narrative structure of notebooks has proven its value for guiding each student, at their own pace, to the discovery and understanding of new concepts or new idioms (e.g. how do I extract a column in pandas?). But these new pieces of knowledge tend to fade quickly and be forgotten. Indeed, long-term acquisition of knowledge and skills requires reinforcement by repetition. This is the foundation of many online learning platforms, like WeBWorK or WIMS, that offer exercises with randomization and automatic feedback, and of popular "AI-powered" apps (e.g. for learning foreign languages) that use spaced repetition algorithms informed by educational science and neuroscience to deliver just the right amount of repetition.

What if you could author such exercises as notebooks, to benefit from everything that Jupyter can offer (think rich narratives, computations, visualization, interactions)? What if you could integrate such exercises right into your Jupyter-based course? What if a learner could get personalized exercise recommendations based on their past learning records, without having to give these sensitive pieces of information away?

That's Jupylates (work in progress). And thanks to the open source scientific stack, it's just a small Jupyter extension.
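To make the idea of spaced repetition concrete, here is a minimal sketch of a Leitner-style scheduler. It is not Jupylates' algorithm, which the abstract does not describe. The names `Card`, `review`, and `due_cards` and the interval values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Days to wait before the next review, per Leitner box (assumed values).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    """One exercise together with its local learning record."""
    prompt: str
    box: int = 0                  # index into INTERVALS
    due: date = date(2024, 1, 1)  # next scheduled review


def review(card, correct, today):
    """Promote the card after a correct answer, reset it after a mistake."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards, today):
    """Return the exercises that should be recommended today."""
    return [c for c in cards if c.due <= today]


today = date(2024, 9, 25)
cards = [Card("Extract a column in pandas"), Card("Filter rows in pandas")]
review(cards[0], correct=True, today=today)   # answered well: seen again later
review(cards[1], correct=False, today=today)  # answered wrong: seen again sooner
for card in cards:
    print(card.prompt, "-> box", card.box, "due", card.due)
```

Because the learning record is just the `box` and `due` fields on each card, a scheduler of this kind can run entirely on the learner's side, in the spirit of the privacy question raised above.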