talk-data.com talk-data.com

Event

PyData Berlin 2025

2025-09-01 – 2025-09-03 PyData

Activities tracked

4

Filtering by: Cloud Computing ×

Sessions & talks

Showing 1–4 of 4 · Newest first

Search within this event →
Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

2025-09-02 Watch
talk

Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and DuckDB.

In this talk, you'll learn how to turn raw TCP streams into structured data sets, ready for analysis, all running on-premise. We'll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects.

If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control this talk is for you.

Building Reactive Data Apps with Shinylive and WebAssembly

Building Reactive Data Apps with Shinylive and WebAssembly

2025-09-02 Watch
talk

WebAssembly is reshaping how Python applications can be delivered - allowing fully interactive apps that run directly in the browser, without a traditional backend server. In this talk, I’ll demonstrate how to build reactive, data-driven web apps using Shinylive for Python, combining efficient local storage with Parquet and extending functionality with optional FastAPI cloud services. We’ll explore the benefits and limitations of this architecture, share practical design patterns, and discuss when browser-based Python is the right choice. Attendees will leave with hands-on techniques for creating modern, lightweight, and highly responsive Python data applications.

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

2025-09-01 Watch
talk

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory of VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs, and we will also share our experiences and challenges with the data collection and insights into general patterns.

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

2025-09-01 Watch
talk

I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying third-party activity. We streamlined the entire pipeline from automated acquisition and cloud storage to preprocessing that ensures spatial, spectral, and temporal consistency. By leveraging Prefect for orchestration, Anyscale Ray for scalable processing, and the open source STAC standard for metadata indexing, we reduced processing times from days to near real-time. We addressed challenges like inconsistent metadata and diverse sensor types, building a flexible system capable of supporting large-scale geospatial analytics and AI workloads.