talk-data.com

Event

PyData Paris 2024

2024-09-25 – 2024-09-27 PyData

Activities tracked

7

Filtering by: Cloud Computing

Sessions & talks

Showing 1–7 of 7 · Newest first

Processing medical images at scale on the cloud

2024-09-26
talk

The MedTech industry is undergoing a major transformation, with continuous innovations promising greater precision, efficiency, and accessibility. In particular, oncology, the branch of medicine that focuses on cancer, stands to benefit from these new technologies, which may enable clinicians to detect cancer earlier and increase patients' chances of survival. Detecting cancerous cells in microscopic photographs of cells (Whole Slide Images, or WSIs) is usually done with segmentation algorithms, a task neural networks (NNs) are very good at. While using ML and NNs for image segmentation is a fairly standard task with established solutions, doing it on WSIs is a different kettle of fish. Most training pipelines and systems have been designed for analytics, meaning huge columns of small individual data points. A single WSI, by contrast, is so large that its file can reach dozens of gigabytes. To allow innovation in medical imaging with AI, we need efficient and affordable ways to store and process these WSIs at scale.
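To give a sense of the sizes involved, here is a minimal sketch of one common way to handle an image too large to load at once: covering it with fixed-size tiles and processing them one by one. The abstract does not say which approach the speakers use, so this is an illustration only; the slide dimensions, the tile size, and the helper names `iter_tiles` and `uncompressed_bytes` are all hypothetical.

```python
from typing import Iterator, Tuple


def iter_tiles(width: int, height: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) boxes that cover a width x height image without overlap.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


def uncompressed_bytes(width: int, height: int, channels: int = 3,
                       bytes_per_channel: int = 1) -> int:
    """Size of the raw pixel data, ignoring any compression or file format overhead."""
    return width * height * channels * bytes_per_channel


# Hypothetical slide dimensions and tile size, chosen only for illustration.
WIDTH, HEIGHT, TILE = 100_000, 80_000, 1024

n_tiles = sum(1 for _ in iter_tiles(WIDTH, HEIGHT, TILE))
size_gb = uncompressed_bytes(WIDTH, HEIGHT) / 1e9
print(f"{size_gb:.1f} GB of raw RGB pixels, split into {n_tiles} tiles")
```

Each tile is small enough to fit in memory on its own, so the tiles can be read, segmented, and written back independently of one another.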

Onyxia: A User-Centric Interface for Data Scientists in the Cloud Age

2024-09-26
talk

In this talk, we'll look into why Insee had to go beyond the usual tools like JupyterHub. As data science grows, it has become important to have tools that are easy to use, can adapt as needs change, and help people work together. The open-source software Onyxia offers a new answer: a user-friendly way to boost creativity in a data environment that relies heavily on containerization and object storage.

Color-composite images from the James Webb Space Telescope

2024-09-26
talk

The astronomical community has built a substantial body of software to visualize and analyze the images obtained with the James Webb Space Telescope (JWST). In this talk, I will present the open-source Python package Jdaviz. I will show you how to visualize publicly available JWST images and build the pretty color images that we have all seen in the media. Half of the talk will be an introduction to JWST and Jdaviz, and the other half will be a hands-on session on a cloud platform (you will only need to create an account) or on your own machine (the package is available on PyPI).

Geoscience at Massive Scale

2024-09-25
talk

When scaling geoscience workloads to large datasets, many scientists and developers reach for Dask, a library for distributed computing that plugs seamlessly into Xarray and offers an Array API that wraps NumPy. Featuring a distributed environment capable of running your workload on large clusters, Dask promises to make it easy to scale from prototyping on your laptop to analyzing petabyte-scale datasets.

Dask has been the de-facto standard for scaling geoscience, but it hasn’t entirely lived up to its promise of operating effortlessly at massive scale. This comes up in a few ways:
- Correctly chunking your dataset has a significant impact on Dask’s ability to scale
- Workers accidentally run out of memory due to:
  - Data being loaded too eagerly
  - Rechunking
  - Unmanaged memory
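The chunking point can be made concrete with a little arithmetic. The sketch below deliberately uses plain Python rather than Dask itself, and only counts chunks and bytes for a few candidate chunk shapes; the dataset shape, the chunk shapes, and the helper name `chunk_stats` are hypothetical. The general trade-off it illustrates is that many tiny chunks mean many tasks to schedule, while a few very large chunks each need a lot of worker memory.

```python
import math
from typing import Sequence, Tuple


def chunk_stats(shape: Sequence[int], chunks: Sequence[int],
                itemsize: int = 8) -> Tuple[int, int]:
    """Return (number_of_chunks, bytes_per_full_chunk) for a regular chunk grid.

    itemsize defaults to 8 bytes, i.e. float64 elements.
    """
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical (time, lat, lon) dataset: ten years of hourly values on a 721 x 1440 grid.
shape = (87_600, 721, 1440)

# Three hypothetical chunking choices: one time step, one day, or a small spatial patch
# spanning the full time axis.
for chunks in [(1, 721, 1440), (24, 721, 1440), (87_600, 10, 10)]:
    n, nbytes = chunk_stats(shape, chunks)
    print(f"chunks={chunks}: {n} chunks of {nbytes / 2**20:.1f} MiB each")
```

Moving data from one of these layouts to another is what rechunking does, which is why it appears in the list above as a source of memory pressure.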

Over the last few months, Dask has addressed many of those pains and continues to do so through:
- Improvements to its scheduling algorithms
- A faster and more memory-stable method for rechunking
- First-of-its-kind logical optimization layer for a distributed array framework (ongoing)

Join us as we dive into real-world geoscience workloads, exploring how Dask empowers scientists and developers to run their analyses at massive scale. Discover the impact of improvements made to Dask, ongoing challenges, and future plans for making it truly effortless to scale from your laptop to the cloud.

Building Large Scale ETL Pipelines with Dask

2024-09-25
talk

Building scalable ETL pipelines and deploying them in the cloud can seem daunting. It shouldn't be. Choosing the right technologies can make this process easy. We will walk through the whole process of developing a composable and scalable ETL pipeline centred around Dask, built entirely with open-source tools, and show how to deploy it to the cloud.

JupyterLite, Emscripten-forge, Xeus, and Mamba -- The computational quartet for in-browser interactive computing

2024-09-25
talk
Thorsten Beier, Jeremy Tuloup, Ian Thomas (Publicis Spine)

JupyterLite is a JupyterLab distribution that runs entirely in the web browser, backed by in-browser language kernels. Unlike standard JupyterLab, where kernels run in separate processes and communicate with the client by message passing, JupyterLite uses kernels that run entirely in the browser, based on JavaScript and WebAssembly.

This means JupyterLite deployments can be scaled to millions of users without individual containers for each user session. Only static files need to be served, which can be done with a simple static web host like GitHub Pages.

This opens up new possibilities for large-scale deployments, eliminating the need for complex cloud computing infrastructure. JupyterLite is versatile and supports a wide range of languages, with the majority of its kernels implemented using Xeus, a C++ library for developing language-specific kernels.

In conjunction with JupyterLite, we present Emscripten-forge, a conda/mamba based distribution for WebAssembly packages. Conda-forge is a community effort and a GitHub organization which contains repositories of conda recipes and thus provides conda packages for a wide range of software and platforms. However, targeting WebAssembly is not supported by conda-forge. Emscripten-forge addresses this gap by providing conda packages for WebAssembly, making it possible to create custom JupyterLite deployments with tailored conda environments containing the required kernels and packages.

In this talk, we delve deep into the JupyterLite ecosystem, exploring its integration with Xeus, Mamba, and Emscripten-forge.

We will demonstrate how this can be used to create sophisticated JupyterLite deployments with custom conda environments and give an outlook for future developments like R packages and runtime package resolution.

Building web-based engineering applications with JupyterLab components

2024-09-25
talk

In the past few years, web-based engineering software has been steadily gaining momentum over traditional desktop-based applications. It represents a significant shift in how engineers access, collaborate, and utilize software tools for design, analysis, and simulation tasks. However, converting desktop-based applications to web applications presents considerable challenges, especially in translating the functionality of desktop interfaces to the web. It requires careful planning and design expertise to ensure intuitive navigation and responsiveness.

JupyterLab provides a flexible, interactive environment for scientific computing. Despite its popularity among data scientists and researchers, the full potential of JupyterLab as a platform for building scientific web applications has yet to be realized.

In this talk, we will explore how its modular architecture and extensive ecosystem facilitate the seamless integration of components for diverse functionalities: from rich user interfaces, accessibility, and real-time collaboration to cloud deployment options. To illustrate the platform's capabilities, we will demo JupyterCAD, a parametric 3D modeler built on top of JupyterLab components.