Cloud Computing

Peter Hanssens - Building Strong Data Engineering Communities

2025-07-09 · The Joe Reis Show Listen

podcast_episode

by Peter Hanssens (DataEngBytes)

Data Engineering

Peter Hanssens is an Australia-based data engineer, business owner, and community pillar. He runs Cloud Shuttle, a data engineering consultancy and organizes DataEngBytes, a series of meetups and conferences throughout Australia and New Zealand.

We chat about building data engineering communities, running conferences, and much more.

Processing Cloud-optimized data in Python with Serverless Functions (Lithops, Dataplug)

2025-07-08 · SciPy 2025

talk

by Universitat Rovira i Virgili (Pedro Garcia Lopez) , Enrique Molina Giménez

Cloud Storage Data Management GitHub Python

Cloud-optimized (CO) data formats are designed to efficiently store and access data directly from cloud storage without needing to download the entire dataset. These formats enable faster data retrieval, scalability, and cost-effectiveness by allowing users to fetch only the necessary subsets of data. They also allow for efficient parallel data processing using on-the-fly partitioning, which can considerably accelerate data management operations. In this sense, cloud-optimized data is a nice fit for data-parallel jobs using serverless. FaaS provides a data-driven scalable and cost-efficient experience, with practically no management burden. Each serverless function will read and process a small portion of the cloud-optimized dataset, being read in parallel directly from object storage, significantly increasing the speedup.

In this talk, you will learn how to process cloud-optimized data formats in Python using the Lithops toolkit. Lithops is a serverless data processing toolkit that is specially designed to process data from Cloud Object Storage using Serverless functions. We will also demonstrate the Dataplug library that enables Cloud Optimized data managament of scientific settings such as genomics, metabolomics, or geospatial data. We will show different data processing pipelines in the Cloud that demonstrate the benefits of cloud-optimized data management.

AI and ML for Coders in PyTorch

2025-07-08 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Laurence Moroney

AI/ML GenAI NLP PyTorch ai-ml data deep-learning machine-learning

Eager to learn AI and machine learning but unsure where to start? Laurence Moroney's hands-on, code-first guide demystifies complex AI concepts without relying on advanced mathematics. Designed for programmers, it focuses on practical applications using PyTorch, helping you build real-world models without feeling overwhelmed. From computer vision and natural language processing (NLP) to generative AI with Hugging Face Transformers, this book equips you with the skills most in demand for AI development today. You'll also learn how to deploy your models across the web and cloud confidently. Gain the confidence to apply AI without needing advanced math or theory expertise Discover how to build AI models for computer vision, NLP, and sequence modeling with PyTorch Learn generative AI techniques with Hugging Face Diffusers and Transformers

All the SQL a Pythonista needs to know: an introduction to SQL and DataFrames with DuckDB

2025-07-07 · SciPy 2025

talk

by Guen Prawiroatmodjo , Jacob Matson (MotherDuck) , Alex Monahan (MotherDuck)

DuckDB HTML Pandas Polars Python SQL

Structured Query Language (or SQL for short) is a programming language to manage data in a database system and an essential part of any data engineer’s tool kit. In this tutorial, you will learn how to use SQL to create databases, tables, insert data into them and extract, filter, join data or make calculations using queries. We will use DuckDB, a new open source embedded in-process database system that combines cutting edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies), and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the Cloud.

AI in Action : Announces IA Marquantes de Google I/O 2025

2025-07-01 · Google I/O Extended 2025 Paris

talk

by Ibtissem Hattab (Google Cloud)

AI/ML

Ibtissem Hattab (GDE Cloud) : Cette présentation offrira un aperçu général des annonces d'IA les plus percutantes faites lors des deux événements majeurs de Google en mai. Elle synthétisera les thèmes clés et les technologies présentées, des outils destinés aux développeurs aux nouvelles plateformes publicitaires. Ce titre est idéal pour un public large intéressé par un résumé complet mais concis de la dernière stratégie d'IA de Google et de ses applications pratiques.