The MedTech industry is undergoing a revolutionary transformation, with continuous innovations promising greater precision, efficiency, and accessibility. Oncology in particular, the branch of medicine that focuses on cancer, stands to benefit immensely from these new technologies, which may enable clinicians to detect cancer earlier and improve patients' chances of survival. Detecting cancerous cells in microscopic images of tissue (Whole Slide Images, or WSIs) is usually framed as a segmentation task, which neural networks (NNs) are very good at. While image segmentation with ML and NNs is a fairly standard task with established solutions, doing it on WSIs is a different kettle of fish. Most training pipelines and systems have been designed for analytics workloads, meaning huge columns of small individual values. A single WSI, by contrast, is so large that its file can reach dozens of gigabytes. To enable innovation in medical imaging with AI, we need efficient and affordable ways to store and process these WSIs at scale.
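To make the scale problem concrete, here is a minimal sketch of tile-based WSI access using the openslide-python package; the file name and tile size are illustrative assumptions, not part of the talk.

```python
# Minimal sketch: tiled access to a Whole Slide Image with openslide-python.
# "example.svs" and the tile size are illustrative assumptions.
import openslide

slide = openslide.OpenSlide("example.svs")
width, height = slide.dimensions  # full-resolution size, often >100,000 px per side
print(f"Level-0 size: {width} x {height} pixels")

tile_size = 512
for y in range(0, height, tile_size):
    for x in range(0, width, tile_size):
        # read_region returns a small RGBA PIL image that fits in memory,
        # so a segmentation model can be applied patch by patch.
        tile = slide.read_region((x, y), 0, (tile_size, tile_size))
        # ... run the segmentation model on `tile` here ...
```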
In this talk, we'll look into why Insee had to go beyond the usual tools like JupyterHub. As data science teams grow, it becomes important to have tools that are easy to use, can adapt as needs change, and help people work together. The open-source software Onyxia offers a new answer: a user-friendly way to boost creativity in a data environment that makes heavy use of containerization and object storage.
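As a rough illustration of what working against object storage looks like in such an environment, here is a hedged sketch using pandas with s3fs; the endpoint, bucket, and credentials are placeholders and not Onyxia's actual configuration.

```python
# Hedged sketch: reading a dataset from S3-compatible object storage with
# pandas + s3fs. Endpoint, bucket, and credentials are placeholders, not
# Onyxia's actual settings.
import pandas as pd

df = pd.read_parquet(
    "s3://my-bucket/datasets/example.parquet",
    storage_options={
        "key": "ACCESS_KEY",
        "secret": "SECRET_KEY",
        "client_kwargs": {"endpoint_url": "https://object-storage.example.org"},
    },
)
print(df.head())
```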
The astronomical community has built a good amount of software to visualize and analyze the images obtained with the James Webb Space Telescope (JWST). In this talk, I will present the open-source Python package Jdaviz. I will show you how to visualize publicly available JWST images and build the pretty color images that we have all seen in the media. Half the talk will be an introduction to JWST and Jdaviz, and half will be a hands-on session on a cloud platform (you will only need to create an account) or on your own machine (the package is available on PyPI).
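For a flavour of what the hands-on portion might involve, here is a minimal sketch of opening a JWST image in Jdaviz's Imviz viewer; the FITS file name is a placeholder, not a specific dataset from the talk.

```python
# Minimal sketch: displaying a JWST image with Jdaviz's Imviz configuration
# inside a Jupyter notebook. The FITS file name is a placeholder.
from jdaviz import Imviz

imviz = Imviz()
imviz.load_data("jw02731_nircam_f335m_i2d.fits", data_label="NIRCam F335M")
imviz.show()  # opens the interactive viewer in the notebook
```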
When scaling geoscience workloads to large datasets, many scientists and developers reach for Dask, a library for distributed computing that plugs seamlessly into Xarray and offers an Array API that wraps NumPy. Featuring a distributed environment capable of running your workload on large clusters, Dask promises to make it easy to scale from prototyping on your laptop to analyzing petabyte-scale datasets.
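As a minimal illustration of that pairing, the sketch below opens a dataset with Xarray backed by Dask arrays and computes a lazy reduction; the file name, variable name, and chunk sizes are assumptions for illustration.

```python
# Minimal sketch: Xarray backed by Dask. File name, variable name, and
# chunk sizes are assumptions for illustration.
import xarray as xr

ds = xr.open_dataset("sea_surface_temperature.nc", chunks={"time": 100})
print(type(ds["sst"].data))  # dask.array.Array, not an in-memory NumPy array

monthly_mean = ds["sst"].groupby("time.month").mean()  # builds a lazy task graph
result = monthly_mean.compute()  # executes on the local scheduler or a cluster
```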
Dask has been the de-facto standard for scaling geoscience, but it hasn't entirely lived up to its promise of operating effortlessly at massive scale. This comes up in a few ways:
- Correctly chunking your dataset has a significant impact on Dask's ability to scale
- Workers accidentally run out of memory due to:
  - Data being loaded too eagerly
  - Rechunking
  - Unmanaged memory
Over the last few months, Dask has addressed many of those pains and continues to do so through:
- Improvements to its scheduling algorithms
- A faster and more memory-stable method for rechunking
- First-of-its-kind logical optimization layer for a distributed array framework (ongoing)
Join us as we dive into real-world geoscience workloads, exploring how Dask empowers scientists and developers to run their analyses at massive scale. Discover the impact of improvements made to Dask, ongoing challenges, and future plans for making it truly effortless to scale from your laptop to the cloud.
Building scalable ETL pipelines and deploying them in the cloud can seem daunting. It shouldn't be. Leveraging the right technologies can make this process easy. We will discuss the whole process of developing a composable and scalable ETL pipeline centred around Dask, built entirely with open-source tools, and how to deploy it to the cloud.
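As a rough sketch of what such a pipeline can look like, the example below runs a small extract-transform-load step on a Dask cluster; the bucket paths, column names, and cluster setup are illustrative assumptions rather than the talk's actual stack.

```python
# Rough sketch of a Dask-centred ETL step. Bucket paths, column names, and
# the cluster setup are illustrative assumptions.
import dask.dataframe as dd
from dask.distributed import Client

client = Client()  # local cluster; swap in a cloud-hosted cluster for production

raw = dd.read_parquet("s3://raw-bucket/events/*.parquet")           # extract
clean = raw.dropna(subset=["user_id"])                               # transform
daily = clean.groupby("event_date")["amount"].sum().reset_index()
daily.to_parquet("s3://curated-bucket/daily_totals/")                # load
```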
JupyterLite is a JupyterLab distribution that runs entirely in the web browser, backed by in-browser language kernels. Unlike standard JupyterLab, where kernels run in separate processes and communicate with the client by message passing, JupyterLite uses kernels that run entirely in the browser, based on JavaScript and WebAssembly.
This means JupyterLite deployments can be scaled to millions of users without the need for individual containers for each user session: only static files need to be served, which can be done with a simple web server or a static host like GitHub Pages.
This opens up new possibilities for large-scale deployments, eliminating the need for complex cloud computing infrastructure. JupyterLite is versatile and supports a wide range of languages, with the majority of its kernels implemented using Xeus, a C++ library for developing language-specific kernels.
In conjunction with JupyterLite, we present Emscripten-forge, a conda/mamba-based distribution of WebAssembly packages. Conda-forge is a community effort and GitHub organization that hosts repositories of conda recipes and thus provides conda packages for a wide range of software and platforms. However, conda-forge does not support targeting WebAssembly. Emscripten-forge addresses this gap by providing conda packages for WebAssembly, making it possible to create custom JupyterLite deployments with tailored conda environments containing the required kernels and packages.
In this talk, we delve deep into the JupyterLite ecosystem, exploring its integration with Xeus Mamba and Emscripten-forge.
We will demonstrate how this can be used to create sophisticated JupyterLite deployments with custom conda environments and give an outlook for future developments like R packages and runtime package resolution.
In the past few years, web-based engineering software has been steadily gaining momentum over traditional desktop-based applications. It represents a significant shift in how engineers access, collaborate on, and use software tools for design, analysis, and simulation tasks. However, converting desktop-based applications to web applications presents considerable challenges, especially in translating the functionality of desktop interfaces to the web. It requires careful planning and design expertise to ensure intuitive navigation and responsiveness.
JupyterLab provides a flexible, interactive environment for scientific computing. Despite its popularity among data scientists and researchers, the full potential of JupyterLab as a platform for building scientific web applications has yet to be realized.
In this talk, we will explore how its modular architecture and extensive ecosystem facilitate the seamless integration of components for diverse functionalities: from rich user interfaces, accessibility, and real-time collaboration to cloud deployment options. To illustrate the platform's capabilities, we will demo JupyterCAD, a parametric 3D modeler built on top of JupyterLab components.
Summary
Airbyte is one of the most prominent platforms for data movement. Over the past 4 years they have invested heavily in solutions for scaling the self-hosted and cloud operations, as well as the quality and stability of their connectors. As a result of that hard work, they have declared their commitment to the future of the platform with a 1.0 release. In this episode Michel Tricot shares the highlights of their journey and the exciting new capabilities that are coming next.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management. Your host is Tobias Macey and today I'm interviewing Michel Tricot about the journey to the 1.0 launch of Airbyte and what that means for the project.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Airbyte is and the story behind it?
- What are some of the notable milestones that you have traversed on your path to the 1.0 release?
- The ecosystem has gone through some significant shifts since you first launched Airbyte. How have trends such as generative AI, the rise and fall of the "modern data stack", and the shifts in investment impacted your overall product and business strategies?
- What are some of the hard-won lessons that you have learned about the realities of data movement and integration?
- What are some of the most interesting/challenging/surprising edge cases or performance bottlenecks that you have had to address?
- What are the core architectural decisions that have proven to be effective?
- How has the architecture had to change as you progressed to the 1.0 release?
- A 1.0 version signals a degree of stability and commitment. Can you describe the decision process that you went through in committing to a 1.0 version?
- What are the most interesting, innovative, or unexpected ways that you have seen Airbyte used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Airbyte?
- When is Airbyte the wrong choice?
- What do you have planned for the future of Airbyte after the 1.0 launch?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Airbyte
- Podcast Episode
- Airbyte Cloud
- Airbyte Connector Builder
- Singer Protocol
- Airbyte Protocol
- Airbyte CDK
- Modern Data Stack
- ELT
- Vector Database
- dbt
- Fivetran (Podcast Episode)
- Meltano (Podcast Episode)
- dlt
- Reverse ETL
- GraphRAG
- AI Engineering Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
This session explores Gemini's capabilities, architecture, and performance benchmarks. We'll delve into the significance of its extensive context window and address the critical aspects of safety, security, and responsible AI use. Hallucination, a common concern in LLM applications, remains a focal point of ongoing development. This talk will highlight recent advancements aimed at mitigating the risk of hallucination to enhance LLMs' utility across various applications.
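As a hedged illustration (not Google's recommended recipe), the sketch below calls Gemini through the google-generativeai Python SDK and grounds the answer in a supplied document to reduce the chance of hallucination; the model name, file, and prompt are assumptions.

```python
# Hedged sketch: calling Gemini via the google-generativeai SDK and grounding
# the answer in a supplied document. Model name, file, and prompt are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed long-context model name

document = open("large_report.txt").read()  # hypothetical long document
prompt = (
    "Answer using only the document below. If the answer is not in the document, say so.\n\n"
    f"{document}\n\nQuestion: What were the Q3 revenues?"
)
response = model.generate_content(prompt)
print(response.text)
```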
Snowflake had a big challenge: How do you enable a team of 1,000 sales engineers and field CTOs to successfully deploy over 100 new data products per week and demonstrate every feature and capability in the Snowflake AI Data Cloud tailored to different customer needs?
In this session, Andrew Helgeson, Manager of Technology Platform Alliances at Snowflake, and Guy Adams, CTO at DataOps.live, will explain how Snowflake builds and deploys hundreds of data products using DataOps.live. Join us for a deep dive into Snowflake's innovative approach to automating complex data product deployment — and to learn how Snowflake Solutions Central revolutionizes solution discovery and deployment to drive customer success.
Roche is one of the world’s largest biotech companies, as well as a leading provider of in-vitro diagnostics and a global supplier of transformative, innovative solutions across major disease areas. Over the past few years, they’ve migrated to the cloud, adopted a modern data stack, and implemented data mesh in order to double down on improving data reliability.
Join the data team at Roche to learn how they’ve leveraged data observability to support their sociotechnical shift to data mesh. They’ll walk through their multi-year data observability journey, digging into how they implemented Monte Carlo in a global organization. They’ll also share their approach to data mesh at Roche and deep dive into a current use case.
Accor, a world-leading hospitality group offering experiences in more than 110 countries through 5,500 properties, 10,000 food & beverage venues, wellness facilities, and flexible workspaces, relies on its more than 45 hotel brands, from luxury to economy, and its most-awarded traveler loyalty program to connect deeply with customers and increase their lifetime value. With a rich store of data centralized in Snowflake, the team set out to give their marketing and business teams a platform that would allow them to autonomously deliver hyper-personalized experiences and campaigns.
Join the session to learn about Accor’s CDP journey and how Hightouch, as their Composable CDP, helps them drive customer engagement, loyalty, and revenue.
This talk will share lessons learned from building an internal data platform to support several cybersecurity SaaS applications. At Tenable, we have put the data model at the centre of our platform. A centralised data model provides a consistent data experience for your application builders and customers alike, and provides a focus for discussion and standardisation.
The discussion will highlight the following key areas:
1. Choose cloud: the cloud will accelerate your rate of delivery and reduce cognitive load for your team.
2. Get started: platforms need users, and their feedback should drive the evolution of the platform.
3. Maintain a product mindset: treat your data platform like a product by maintaining a backlog while working towards a longer-term vision.
4. Structure your team for success: using lessons learned from Team Topologies, structure your team to reduce cognitive load and keep it focussed on delivering value.
5. Make it easy for teams to onboard onto the platform.
The next big innovation in data management after the separation of compute and storage is open table formats. These formats have truly commoditized storage, allowing you to store data anywhere and run multiple compute workloads without vendor lock-in. This innovation addresses the biggest challenges of cloud data warehousing (performance, usability, and high costs), ushering in the era of the data lakehouse architecture.
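To make the idea concrete, here is a minimal sketch using the deltalake Python package (one open table format implementation); the paths and data are placeholders, and any engine that understands the format could read the same table from the same storage.

```python
# Minimal sketch of the open table format idea using the deltalake package
# (delta-rs). Paths and data are placeholders; any engine that understands
# the format can read the same table from the same storage.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, 3.75]})
write_deltalake("./lake/orders", orders)  # write an open-format table

dt = DeltaTable("./lake/orders")          # read it back, engine-agnostically
print(dt.to_pandas())
```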
In this session, discover how an AI-powered data lakehouse:
• Unlocks data for modern AI use cases
• Enhances performance and enables real-time analytics
• Reduces total cost of ownership (TCO) by up to 75%
• Delivers increased interoperability across the entire data landscape
Join us to explore how the integration of AI with the lakehouse architecture can transform your approach to data management and analytics.
Many organisations know the importance of data culture, especially when undertaking a digital transformation (i.e. a cloud transformation), and the 'holy grail' of getting it right is often well stated. But what about the bad and the ugly, as well as the good? And what does that look like in an organisation the scale of Lloyds Banking Group? This talk is intended to draw back the curtain on our data culture journey here at Lloyds (though not making it all about us) as a way to truly highlight some of the pitfalls, successes, and approaches we have taken and are taking on our data culture journey.
Please join us for a demo of how ICIS is leveraging Cloud databases to enable customers to easily integrate with ICIS intelligence.
Explore a transformative shift in healthcare with Ranjit Gill, CIO of AAH (Hallo Healthcare Group), and Pete Lydon, Director of Sales Engineering at Actian. This session highlights Hallo's adoption of a cloud-first strategy, effectively managing over 21 million billing entries and thousands of daily orders. Learn how cloud analytics has not only streamlined massive data flows but also significantly enhanced patient service delivery, establishing new benchmarks in healthcare efficiency and responsiveness.
See how AAH's cloud strategy optimizes data handling and patient care. Join us to discover the future of healthcare efficiency!
The data engineer role has expanded far beyond data pipeline management. Data engineers are now tasked with managing scalable infrastructure, optimizing cloud resources, and ensuring real-time data processing, all while keeping costs in check, which continues to be quite challenging.
In this session, Revefi will demonstrate Raden, the world’s first AI data engineer. Raden augments data teams with “distinguished engineer level” expertise in data architecture, system performance, optimization, and cost management.
Raden uses GenAI and AI to address these challenges by working with your team as an AutoPilot and/or CoPilot, automating critical functions such as Data Quality, Data Observability, Spend Management, Performance Management, and Usage Management, and allowing your data team to tackle complex use cases with ease.
Join us to discover how you can revamp your data engineering practices and dramatically improve the ROI from your data investments.
Morrisons are driving business transformation with data: in part through near real-time ingestion of disparate datasets, which centralises critical, actionable data within Google Cloud, and in part operationally, by focussing on outcome-driven data teams. Learn how being data-driven is challenging, how data volume can be problematic, but also how the benefits of readily available live data enable success and aid future business growth.