The MedTech industry is undergoing a revolutionary transformation, with continuous innovations promising greater precision, efficiency, and accessibility. Oncology in particular, the branch of medicine that focuses on cancer, stands to benefit immensely from these new technologies, which may enable clinicians to detect cancer earlier and improve patients' chances of survival. Detecting cancerous cells in microscopic images of tissue (Whole Slide Images, or WSIs) is usually framed as a segmentation task, which neural networks (NNs) are very good at. While image segmentation with ML and NNs is a fairly standard task with established solutions, doing it on WSIs is a different kettle of fish. Most training pipelines and systems have been designed for analytics workloads, meaning huge columns of small individual values. A single WSI, by contrast, is so large that its file can reach dozens of gigabytes. To enable innovation in medical imaging with AI, we need efficient and affordable ways to store and process these WSIs at scale.
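To make the scale problem concrete, here is a minimal sketch of tile-based WSI access using the openslide-python package; the file name and tile size are illustrative assumptions, not part of the talk.

```python
# Minimal sketch: tiled access to a Whole Slide Image with openslide-python.
# "example.svs" and the tile size are illustrative assumptions.
import openslide

slide = openslide.OpenSlide("example.svs")
width, height = slide.dimensions  # full-resolution size, often >100,000 px per side
print(f"Level-0 size: {width} x {height} pixels")

tile_size = 512
for y in range(0, height, tile_size):
    for x in range(0, width, tile_size):
        # read_region returns a small RGBA PIL image that fits in memory,
        # so a segmentation model can be applied patch by patch.
        tile = slide.read_region((x, y), 0, (tile_size, tile_size))
        # ... run the segmentation model on `tile` here ...
```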
In this talk, we'll look into why Insee had to go beyond the usual tools like JupyterHub. As data science teams grow, it becomes important to have tools that are easy to use, can adapt as needs change, and help people work together. The open-source software Onyxia offers a new answer: a user-friendly way to boost creativity in a data environment that makes heavy use of containerization and object storage.
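As a rough illustration of what working against object storage looks like in such an environment, here is a hedged sketch using pandas with s3fs; the endpoint, bucket, and credentials are placeholders and not Onyxia's actual configuration.

```python
# Hedged sketch: reading a dataset from S3-compatible object storage with
# pandas + s3fs. Endpoint, bucket, and credentials are placeholders, not
# Onyxia's actual settings.
import pandas as pd

df = pd.read_parquet(
    "s3://my-bucket/datasets/example.parquet",
    storage_options={
        "key": "ACCESS_KEY",
        "secret": "SECRET_KEY",
        "client_kwargs": {"endpoint_url": "https://object-storage.example.org"},
    },
)
print(df.head())
```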
The astronomical community has built a good amount of software to visualize and analyze the images obtained with the James Webb Space Telescope (JWST). In this talk, I will present the open-source Python package Jdaviz. I will show you how to visualize publicly available JWST images and build the pretty color images that we have all seen in the media. Half the talk will be an introduction to JWST and Jdaviz, and half will be a hands-on session on a cloud platform (you will only need to create an account) or on your own machine (the package is available on PyPI).
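For a flavour of what the hands-on portion might involve, here is a minimal sketch of opening a JWST image in Jdaviz's Imviz viewer; the FITS file name is a placeholder, not a specific dataset from the talk.

```python
# Minimal sketch: displaying a JWST image with Jdaviz's Imviz configuration
# inside a Jupyter notebook. The FITS file name is a placeholder.
from jdaviz import Imviz

imviz = Imviz()
imviz.load_data("jw02731_nircam_f335m_i2d.fits", data_label="NIRCam F335M")
imviz.show()  # opens the interactive viewer in the notebook
```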
When scaling geoscience workloads to large datasets, many scientists and developers reach for Dask, a library for distributed computing that plugs seamlessly into Xarray and offers an Array API that wraps NumPy. Featuring a distributed environment capable of running your workload on large clusters, Dask promises to make it easy to scale from prototyping on your laptop to analyzing petabyte-scale datasets.
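As a minimal illustration of that pairing, the sketch below opens a dataset with Xarray backed by Dask arrays and computes a lazy reduction; the file name, variable name, and chunk sizes are assumptions for illustration.

```python
# Minimal sketch: Xarray backed by Dask. File name, variable name, and
# chunk sizes are assumptions for illustration.
import xarray as xr

ds = xr.open_dataset("sea_surface_temperature.nc", chunks={"time": 100})
print(type(ds["sst"].data))  # dask.array.Array, not an in-memory NumPy array

monthly_mean = ds["sst"].groupby("time.month").mean()  # builds a lazy task graph
result = monthly_mean.compute()  # executes on the local scheduler or a cluster
```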
Dask has been the de-facto standard for scaling geoscience, but it hasn't entirely lived up to its promise of operating effortlessly at massive scale. This comes up in a few ways:
- Correctly chunking your dataset has a significant impact on Dask's ability to scale
- Workers accidentally run out of memory due to:
  - Data being loaded too eagerly
  - Rechunking
  - Unmanaged memory
Over the last few months, Dask has addressed many of those pains and continues to do so through:
- Improvements to its scheduling algorithms
- A faster and more memory-stable method for rechunking
- First-of-its-kind logical optimization layer for a distributed array framework (ongoing)
Join us as we dive into real-world geoscience workloads, exploring how Dask empowers scientists and developers to run their analyses at massive scale. Discover the impact of improvements made to Dask, ongoing challenges, and future plans for making it truly effortless to scale from your laptop to the cloud.
Building scalable ETL pipelines and deploying them in the cloud can seem daunting. It shouldn't be. Leveraging the right technologies can make this process easy. We will discuss the whole process of developing a composable and scalable ETL pipeline centred around Dask, built entirely with open-source tools, and how to deploy it to the cloud.
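As a rough sketch of what such a pipeline can look like, the example below runs a small extract-transform-load step on a Dask cluster; the bucket paths, column names, and cluster setup are illustrative assumptions rather than the talk's actual stack.

```python
# Rough sketch of a Dask-centred ETL step. Bucket paths, column names, and
# the cluster setup are illustrative assumptions.
import dask.dataframe as dd
from dask.distributed import Client

client = Client()  # local cluster; swap in a cloud-hosted cluster for production

raw = dd.read_parquet("s3://raw-bucket/events/*.parquet")           # extract
clean = raw.dropna(subset=["user_id"])                               # transform
daily = clean.groupby("event_date")["amount"].sum().reset_index()
daily.to_parquet("s3://curated-bucket/daily_totals/")                # load
```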
JupyterLite is a JupyterLab distribution that runs entirely in the web browser, backed by in-browser language kernels. Unlike standard JupyterLab, where kernels run in separate processes and communicate with the client by message passing, JupyterLite uses kernels that run entirely in the browser, based on JavaScript and WebAssembly.
This means JupyterLite deployments can be scaled to millions of users without the need for individual containers for each user session: only static files need to be served, which can be done with a simple web server or a static host like GitHub Pages.
This opens up new possibilities for large-scale deployments, eliminating the need for complex cloud computing infrastructure. JupyterLite is versatile and supports a wide range of languages, with the majority of its kernels implemented using Xeus, a C++ library for developing language-specific kernels.
In conjunction with JupyterLite, we present Emscripten-forge, a conda/mamba-based distribution of WebAssembly packages. Conda-forge is a community effort and GitHub organization that hosts repositories of conda recipes and thus provides conda packages for a wide range of software and platforms. However, conda-forge does not support targeting WebAssembly. Emscripten-forge addresses this gap by providing conda packages for WebAssembly, making it possible to create custom JupyterLite deployments with tailored conda environments containing the required kernels and packages.
In this talk, we delve deep into the JupyterLite ecosystem, exploring its integration with Xeus Mamba and Emscripten-forge.
We will demonstrate how this can be used to create sophisticated JupyterLite deployments with custom conda environments and give an outlook for future developments like R packages and runtime package resolution.
In the past few years, web-based engineering software has been steadily gaining momentum over traditional desktop-based applications. It represents a significant shift in how engineers access, collaborate on, and use software tools for design, analysis, and simulation tasks. However, converting desktop-based applications to web applications presents considerable challenges, especially in translating the functionality of desktop interfaces to the web. It requires careful planning and design expertise to ensure intuitive navigation and responsiveness.
JupyterLab provides a flexible, interactive environment for scientific computing. Despite its popularity among data scientists and researchers, the full potential of JupyterLab as a platform for building scientific web applications has yet to be realized.
In this talk, we will explore how its modular architecture and extensive ecosystem facilitate the seamless integration of components for diverse functionalities: from rich user interfaces, accessibility, and real-time collaboration to cloud deployment options. To illustrate the platform's capabilities, we will demo JupyterCAD, a parametric 3D modeler built on top of JupyterLab components.
Summary
Airbyte is one of the most prominent platforms for data movement. Over the past 4 years they have invested heavily in solutions for scaling the self-hosted and cloud operations, as well as the quality and stability of their connectors. As a result of that hard work, they have declared their commitment to the future of the platform with a 1.0 release. In this episode Michel Tricot shares the highlights of their journey and the exciting new capabilities that are coming next.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management. Your host is Tobias Macey and today I'm interviewing Michel Tricot about the journey to the 1.0 launch of Airbyte and what that means for the project.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Airbyte is and the story behind it?
- What are some of the notable milestones that you have traversed on your path to the 1.0 release?
- The ecosystem has gone through some significant shifts since you first launched Airbyte. How have trends such as generative AI, the rise and fall of the "modern data stack", and the shifts in investment impacted your overall product and business strategies?
- What are some of the hard-won lessons that you have learned about the realities of data movement and integration?
- What are some of the most interesting/challenging/surprising edge cases or performance bottlenecks that you have had to address?
- What are the core architectural decisions that have proven to be effective?
- How has the architecture had to change as you progressed to the 1.0 release?
- A 1.0 version signals a degree of stability and commitment. Can you describe the decision process that you went through in committing to a 1.0 version?
- What are the most interesting, innovative, or unexpected ways that you have seen Airbyte used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Airbyte?
- When is Airbyte the wrong choice?
- What do you have planned for the future of Airbyte after the 1.0 launch?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Airbyte
- Podcast Episode
- Airbyte Cloud
- Airbyte Connector Builder
- Singer Protocol
- Airbyte Protocol
- Airbyte CDK
- Modern Data Stack
- ELT
- Vector Database
- dbt
- Fivetran (Podcast Episode)
- Meltano (Podcast Episode)
- dlt
- Reverse ETL
- GraphRAG
- AI Engineering Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
This session explores Gemini's capabilities, architecture, and performance benchmarks. We'll delve into the significance of its extensive context window and address the critical aspects of safety, security, and responsible AI use. Hallucination, a common concern in LLM applications, remains a focal point of ongoing development. This talk will highlight recent advancements aimed at mitigating the risk of hallucination to enhance LLMs' utility across various applications.
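As a hedged illustration (not Google's recommended recipe), the sketch below calls Gemini through the google-generativeai Python SDK and grounds the answer in a supplied document to reduce the chance of hallucination; the model name, file, and prompt are assumptions.

```python
# Hedged sketch: calling Gemini via the google-generativeai SDK and grounding
# the answer in a supplied document. Model name, file, and prompt are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed long-context model name

document = open("large_report.txt").read()  # hypothetical long document
prompt = (
    "Answer using only the document below. If the answer is not in the document, say so.\n\n"
    f"{document}\n\nQuestion: What were the Q3 revenues?"
)
response = model.generate_content(prompt)
print(response.text)
```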
Snowflake had a big challenge: How do you enable a team of 1,000 sales engineers and field CTOs to successfully deploy over 100 new data products per week and demonstrate every feature and capability in the Snowflake AI Data Cloud tailored to different customer needs?
In this session, Andrew Helgeson, Manager of Technology Platform Alliances at Snowflake, and Guy Adams, CTO at DataOps.live, will explain how Snowflake builds and deploys hundreds of data products using DataOps.live. Join us for a deep dive into Snowflake's innovative approach to automating complex data product deployment — and to learn how Snowflake Solutions Central revolutionizes solution discovery and deployment to drive customer success.
Roche is one of the world’s largest biotech companies, as well as a leading provider of in-vitro diagnostics and a global supplier of transformative, innovative solutions across major disease areas. Over the past few years, they’ve migrated to the cloud, adopted a modern data stack, and implemented data mesh in order to double down on improving data reliability.
Join the data team at Roche to learn how they’ve leveraged data observability to support their sociotechnical shift to data mesh. They’ll walk through their multi-year data observability journey, digging into how they implemented Monte Carlo in a global organization. They’ll also share their approach to data mesh at Roche and deep dive into a current use case.
Accor, a world-leading hospitality group offering experiences in more than 110 countries through 5,500 properties, 10,000 food & beverage venues, wellness facilities, and flexible workspaces, relies on its more than 45 hotel brands, from luxury to economy, and its most-awarded traveler loyalty program to connect deeply with customers and increase their lifetime value. With a rich store of data centralized in Snowflake, the team set out to give their marketing and business teams a platform that would allow them to autonomously deliver hyper-personalized experiences and campaigns.
Join the session to learn about Accor’s CDP journey and how Hightouch, as their Composable CDP, helps them drive customer engagement, loyalty, and revenue.
This talk will share lessons learned from building an internal data platform to support several cybersecurity SaaS applications. At Tenable, we have put the data model at the centre of our platform. A centralised data model provides a consistent data experience for your application builders and customers alike, and provides a focus for discussion and standardisation.
The discussion will highlight the following key areas:
1. Choose cloud: the cloud will accelerate your rate of delivery and reduce cognitive load for your team.
2. Get started: platforms need users, and their feedback should drive the evolution of the platform.
3. Maintain a product mindset: treat your data platform like a product by maintaining a backlog while working towards a longer-term vision.
4. Structure your team for success: using lessons learned from Team Topologies, structure your team to reduce cognitive load and keep it focussed on delivering value.
5. Make it easy for teams to onboard onto the platform.
The next big innovation in data management after the separation of compute and storage is open table formats. These formats have truly commoditized storage, allowing you to store data anywhere and run multiple compute workloads without vendor lock-in. This innovation addresses the biggest challenges of cloud data warehousing (performance, usability, and high costs), ushering in the era of the data lakehouse architecture.
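To make the idea concrete, here is a minimal sketch using the deltalake Python package (one open table format implementation); the paths and data are placeholders, and any engine that understands the format could read the same table from the same storage.

```python
# Minimal sketch of the open table format idea using the deltalake package
# (delta-rs). Paths and data are placeholders; any engine that understands
# the format can read the same table from the same storage.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, 3.75]})
write_deltalake("./lake/orders", orders)  # write an open-format table

dt = DeltaTable("./lake/orders")          # read it back, engine-agnostically
print(dt.to_pandas())
```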
In this session, discover how an AI-powered data lakehouse:
• Unlocks data for modern AI use cases
• Enhances performance and enables real-time analytics
• Reduces total cost of ownership (TCO) by up to 75%
• Delivers increased interoperability across the entire data landscape
Join us to explore how the integration of AI with the lakehouse architecture can transform your approach to data management and analytics.
Many organisations know the importance of data culture, especially when undertaking a digital transformation (i.e. a cloud transformation), and the 'holy grail' of getting it right is often well stated. But what about the bad and the ugly, as well as the good? And what does that look like in an organisation the scale of Lloyds Banking Group? This talk is intended to draw back the curtain on our data culture journey here at Lloyds (though not making it all about us) as a way to truly highlight some of the pitfalls, successes, and approaches we have taken and are taking on our data culture journey.
Please join us for a demo of how ICIS is leveraging Cloud databases to enable customers to easily integrate with ICIS intelligence.
Explore a transformative shift in healthcare with Ranjit Gill, CIO of AAH (Hallo Healthcare Group), and Pete Lydon, Director of Sales Engineering at Actian. This session highlights Hallo's adoption of a cloud-first strategy, effectively managing over 21 million billing entries and thousands of daily orders. Learn how cloud analytics has not only streamlined massive data flows but also significantly enhanced patient service delivery, establishing new benchmarks in healthcare efficiency and responsiveness.
See how AAH's cloud strategy optimizes data handling and patient care. Join us to discover the future of healthcare efficiency!
The data engineer role has expanded far beyond data pipeline management. Data engineers are now tasked with managing scalable infrastructure, optimizing cloud resources, and ensuring real-time data processing, all while keeping costs in check, which continues to be quite challenging.
In this session, Revefi will demonstrate Raden, the world’s first AI data engineer. Raden augments data teams with “distinguished engineer level” expertise in data architecture, system performance, optimization, and cost management.
Raden uses GenAI and AI to address these challenges by working with your team as an AutoPilot and/or CoPilot, automating critical functions such as Data Quality, Data Observability, Spend Management, Performance Management, and Usage Management, and allowing your data team to tackle complex use cases with ease.
Join us to discover how you can revamp your data engineering practices and dramatically improve the ROI from your data investments.
Morrisons are driving business transformation with data: in part through near real-time ingestion of disparate datasets, which centralises critical, actionable data within Google Cloud, and in part operationally, by focussing on outcome-driven data teams. Learn how being data-driven is challenging, how data volume can be problematic, but also how the benefits of readily available live data enable success and aid future business growth.