talk-data.com

Topic: Cloud Computing
Tags: infrastructure, saas, iaas
4055 tagged activities

Activity Trend: peak of 471 activities per quarter, 2020-Q1 to 2026-Q1

Activities

4055 activities · Newest first

Building Reactive Data Apps with Shinylive and WebAssembly

WebAssembly is reshaping how Python applications can be delivered, allowing fully interactive apps that run directly in the browser without a traditional backend server. In this talk, I’ll demonstrate how to build reactive, data-driven web apps using Shinylive for Python, combining efficient local storage with Parquet and extending functionality with optional FastAPI cloud services. We’ll explore the benefits and limitations of this architecture, share practical design patterns, and discuss when browser-based Python is the right choice. Attendees will leave with hands-on techniques for creating modern, lightweight, and highly responsive Python data applications.
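
For readers new to the stack, here is a minimal sketch of a reactive Shiny for Python app of the kind the talk builds on; the inputs and computation are illustrative placeholders, and under Shinylive the same file runs entirely in the browser:

    # Minimal reactive Shiny for Python app; deployable to the browser via Shinylive.
    # The slider and the computation are illustrative placeholders.
    import random

    from shiny import App, render, ui

    app_ui = ui.page_fluid(
        ui.input_slider("n", "Sample size", min=10, max=1000, value=100),
        ui.output_text_verbatim("summary"),
    )

    def server(input, output, session):
        @output
        @render.text
        def summary():
            # Re-runs reactively whenever the slider changes.
            xs = [random.random() for _ in range(input.n())]
            return f"mean of {input.n()} samples: {sum(xs) / len(xs):.3f}"

    app = App(app_ui, server)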

Data Modeling with Snowflake - Second Edition

Data Modeling with Snowflake provides a clear and practical guide to mastering data modeling tailored to the Snowflake Data Cloud. By integrating foundational principles of database modeling with Snowflake's unique features and functionality, this book empowers you to create scalable, cost-effective, and high-performing data solutions.

What this book will help me do:
- Apply universal data modeling concepts within the Snowflake platform effectively.
- Leverage Snowflake's features such as Time Travel and Zero-Copy Cloning for optimized data solutions.
- Understand and utilize advanced techniques like Data Vault and Data Mesh for scalable data architecture.
- Master handling semi-structured data in Snowflake using practical recipes and examples.
- Achieve cost efficiency and resource optimization by aligning modeling principles with Snowflake's architecture.

Author(s): Serge Gershkovich is an accomplished data engineer and seasoned professional in data architecture and modeling. With a passion for simplifying complex concepts, Serge's work leverages his years of hands-on experience to guide readers in mastering both foundational and advanced data management practices. His clear and practical approach ensures accessibility for all levels.

Who is it for? This book is ideal for data developers and engineers seeking practical modeling guidance within Snowflake. It's suitable for data analysts looking to broaden their database design expertise, and for database beginners aiming to get a head start in structuring data. Professionals new to Snowflake will also find its clear explanations of key features aligned with modeling techniques invaluable.
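
To make two of the named features concrete, here is a hedged sketch using the snowflake-connector-python package; the connection parameters and table names are placeholders, not examples from the book:

    # Illustrative sketch of Zero-Copy Cloning and Time Travel from Python.
    # Connection parameters and table names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...",
        warehouse="my_wh", database="my_db", schema="public",
    )
    cur = conn.cursor()

    # Zero-Copy Cloning: create a writable copy without duplicating storage.
    cur.execute("CREATE TABLE orders_dev CLONE orders")

    # Time Travel: query the table as it looked one hour ago.
    cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
    print(cur.fetchone())

    conn.close()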

Summary

In this episode of the Data Engineering Podcast, Serge Gershkovich, head of product at SqlDBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that data modeling is optional or secondary, emphasizing its crucial role in ensuring alignment between business requirements and data structures. The conversation covers challenges in complex environments, the impact of technical decisions on data strategy, and the evolving role of AI in data management. Serge stresses the need for business stakeholders' involvement in data initiatives and a systematic approach to data modeling, warning against relying solely on technical expertise without considering business alignment.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Enterprises today face an enormous challenge: they’re investing billions into Snowflake and Databricks, but without strong foundations, those investments risk becoming fragmented, expensive, and hard to govern. And that’s especially evident in large, complex enterprise data environments. That’s why companies like DirecTV and Pfizer rely on SqlDBM. Data modeling may be one of the most traditional practices in IT, but it remains the backbone of enterprise data strategy. In today’s cloud era, that backbone needs a modern approach built natively for the cloud, with direct connections to the very platforms driving your business forward. Without strong modeling, data management becomes chaotic, analytics lose trust, and AI initiatives fail to scale. SqlDBM ensures enterprises don’t just move to the cloud—they maximize their ROI by creating governed, scalable, and business-aligned data environments. If global enterprises are using SqlDBM to tackle the biggest challenges in data management, analytics, and AI, isn’t it worth exploring what it can do for yours? Visit dataengineeringpodcast.com/sqldbm to learn more.

Your host is Tobias Macey and today I'm interviewing Serge Gershkovich about how and why data modeling is a sociotechnical endeavor.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the activities that you think of when someone says the term "data modeling"?
- What are the main groupings of incomplete or inaccurate definitions that you typically encounter in conversation on the topic?
- How do those conceptions of the problem lead to challenges and bottlenecks in execution?
- Data modeling is often associated with data warehouse design, but it also extends to source systems and unstructured/semi-structured assets. How does the inclusion of other data localities help in the overall success of a data/domain modeling effort?
- Another aspect of data modeling that often consumes a substantial amount of debate is which pattern to adhere to (star/snowflake, data vault, one big table, anchor modeling, etc.). What are some of the ways that you have found effective to remove that as a stumbling block when first developing an organizational domain representation?
- While the overall purpose of data modeling is to provide a digital representation of the business processes, there are inevitable technical decisions to be made. What are the most significant ways that the underlying technical systems can help or hinder the goals of building a digital twin of the business?
- What impact (positive and negative) are you seeing from the introduction of LLMs into the workflow of data modeling?
- How does tool use (e.g. MCP connection to warehouse/lakehouse) help when developing the transformation logic for achieving a given domain representation?
- What are the most interesting, innovative, or unexpected ways that you have seen organizations address the data modeling lifecycle?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with organizations implementing a data modeling effort?
- What are the overall trends in the ecosystem that you are monitoring related to data modeling practices?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

- sqlDBM
- SAP
- Joe Reis
- ERD == Entity Relationship Diagram
- Master Data Management
- dbt
- Data Contracts
- Data Modeling With Snowflake book by Serge (affiliate link)
- Type 2 Dimension
- Data Vault
- Star Schema
- Anchor Modeling
- Ralph Kimball
- Bill Inmon
- Sixth Normal Form
- MCP == Model Context Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2,000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory or VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs, and we will also share our experiences and challenges with the data collection, along with insights into general patterns.
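
To make the metric concrete, here is a toy sketch of how inference throughput (tokens per second) can be measured; the generate function is a stand-in, not Spare Cores' actual harness:

    # Toy harness for measuring text-generation throughput (tokens/second).
    # `generate` is a placeholder for the model server under test.
    import time

    def generate(prompt: str, max_new_tokens: int) -> int:
        """Stand-in for a real LLM call; returns the number of tokens produced."""
        time.sleep(0.01 * max_new_tokens)  # pretend each token takes ~10 ms
        return max_new_tokens

    def tokens_per_second(prompt: str, max_new_tokens: int = 128) -> float:
        start = time.perf_counter()
        n_tokens = generate(prompt, max_new_tokens)
        return n_tokens / (time.perf_counter() - start)

    print(f"{tokens_per_second('Summarize cloud pricing trends.'):.1f} tokens/s")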

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying third-party activity. We streamlined the entire pipeline from automated acquisition and cloud storage to preprocessing that ensures spatial, spectral, and temporal consistency. By leveraging Prefect for orchestration, Anyscale Ray for scalable processing, and the open source STAC standard for metadata indexing, we reduced processing times from days to near real-time. We addressed challenges like inconsistent metadata and diverse sensor types, building a flexible system capable of supporting large-scale geospatial analytics and AI workloads.
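
As a rough illustration of the orchestration pattern described, here is a minimal Prefect flow skeleton; the task names, paths, and STAC/Ray details are hypothetical, not the team's actual code:

    # Hypothetical skeleton of the orchestration layer described; task names
    # and paths are placeholders.
    from prefect import flow, task

    @task(retries=2)
    def acquire(scene_id: str) -> str:
        # download raw imagery into object storage
        return f"s3://raw-imagery/{scene_id}.tif"

    @task
    def preprocess(raw_path: str) -> str:
        # reproject, calibrate, and harmonize bands; could fan out to Ray workers
        return raw_path.replace("raw", "analysis-ready")

    @task
    def index_in_stac(item_path: str) -> None:
        print(f"registered {item_path} in the STAC catalog")

    @flow
    def imagery_pipeline(scene_ids: list[str]) -> None:
        for scene_id in scene_ids:
            index_in_stac(preprocess(acquire(scene_id)))

    if __name__ == "__main__":
        imagery_pipeline(["S2A_20240101", "S2A_20240113"])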

Data Engineering for Cybersecurity

Security teams rely on telemetry—the continuous stream of logs, events, metrics, and signals that reveal what’s happening across systems, endpoints, and cloud services. But that data doesn’t organize itself. It has to be collected, normalized, enriched, and secured before it becomes useful. That’s where data engineering comes in. In this hands-on guide, cybersecurity engineer James Bonifield teaches you how to design and build scalable, secure data pipelines using free, open source tools such as Filebeat, Logstash, Redis, Kafka, and Elasticsearch. You’ll learn how to collect telemetry from Windows (including Sysmon and PowerShell events), Linux files and syslog, and streaming data from network and security appliances. You’ll then transform it into structured formats, secure it in transit, and automate your deployments using Ansible. You’ll also learn how to:
- Encrypt and secure data in transit using TLS and SSH
- Centrally manage code and configuration files using Git
- Transform messy logs into structured events
- Enrich data with threat intelligence using Redis and Memcached
- Stream and centralize data at scale with Kafka
- Automate with Ansible for repeatable deployments
Whether you’re building a pipeline on a tight budget or deploying an enterprise-scale system, this book shows you how to centralize your security data, support real-time detection, and lay the groundwork for incident response and long-term forensics.
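
As a small taste of the streaming step, here is a hedged sketch of publishing one normalized Sysmon-style event to Kafka with the kafka-python client; the broker address, topic, and event shape are illustrative, not taken from the book:

    # Minimal sketch of shipping a structured (normalized) log event to Kafka.
    # Broker address, topic name, and event fields are placeholders.
    import json
    from datetime import datetime, timezone

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    )

    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "host": "web-01",
        "event": {"module": "sysmon", "action": "process_creation"},
        "process": {"name": "powershell.exe", "pid": 4242},
    }

    producer.send("security-telemetry", value=event)
    producer.flush()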

This session introduces Dana, a local-first agent programming language designed for building AI agents. Get a working expert agent in minutes. Features include long-running, multi-step agent workflows expressed on a single line; built-in concurrency for parallel LLM calls with zero async keywords; and deterministic execution with learning loops to improve reliability over time. Ideal for sensitive data, air-gapped environments, or working around cloud API limitations.

Episode Description (Show Notes): Step into the world of IBM Power Systems with insider insights from Tom McPherson, former GM of IBM Power. In this conversation, Tom shares leadership lessons, debunks common misconceptions, and dives deep into the innovations shaping the future of Power infrastructure. From AI integration to hybrid cloud strategies, competitive positioning to compelling client use cases — it’s a powerhouse discussion you won’t want to miss.

Timestamps:
00:49 Meet Tom McPherson
03:00 Leadership Advice
04:58 Hobbies
07:53 IBM Power
10:24 Power 11
13:53 Common Misconception
14:39 Favorite Power Features
21:51 Promise to Profits of AI
25:28 Hybrid Cloud
27:34 Power Competitors
28:36 Compelling Use Cases
29:51 The Future of Power
31:20 Rapid Fire
33:51 Business Partners in Power
35:16 Leadership

Guest Links:
🔗 Tom McPherson on LinkedIn
🌐 IBM Power Systems

Social: #IBMPower #Leadership #HybridCloud #AI #EnterpriseTech #TechInnovation #MakingDataSimple #PowerSystems #BusinessStrategy #DigitalTransformation #CloudComputing #AIinBusiness

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework—the data warehouse—is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse. This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake’s capabilities.

What You Will Learn:
- Master key functionalities of Snowflake
- Set up security and access with clusters
- Bulk load data into Snowflake using the COPY command
- Migrate from a legacy data warehouse to Snowflake
- Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools
- Manage large datasets with Apache Iceberg tables
- Implement continuous data loading with Snowpipe and Dynamic Tables

Who This Book Is For: Data professionals, business analysts, IT administrators, and existing or potential Snowflake users
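
As a taste of the bulk-loading topic in that list, here is a hedged sketch of the COPY command issued from Python; the stage, table, and file-format details are placeholders, not the book's examples:

    # Sketch of bulk loading with COPY INTO from Python; the stage, table,
    # and file-format names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
    cur = conn.cursor()

    cur.execute("""
        COPY INTO sales
        FROM @my_stage/sales/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    for row in cur.fetchall():
        print(row)  # per-file load status returned by COPY

    conn.close()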

Exam Ref DP-300 Administering Microsoft Azure SQL Solutions

Prepare for Microsoft Exam DP-300 and demonstrate your real-world foundational knowledge of Azure database administration, using a variety of methods and tools to perform and automate day-to-day operations, including use of Transact-SQL (T-SQL) and other tools for administrative management purposes. Designed for database administrators, solution architects, data scientists, and other data professionals, this Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the Microsoft Certified: Azure Database Administrator Associate level.

Focus on the expertise measured by these objectives:
- Plan and implement data platform resources
- Implement a secure environment
- Monitor, configure, and optimize database resources
- Configure and manage automation of tasks
- Plan and configure a high availability and disaster recovery (HA/DR) environment

This Microsoft Exam Ref:
- Organizes its coverage by the Skills Measured list published for the exam
- Features strategic, what-if scenarios to challenge you
- Assumes you have subject matter expertise in building database solutions designed to support multiple workloads built with SQL Server on-premises and Azure SQL

About the Exam: Exam DP-300 focuses on core knowledge for implementing and managing the operational aspects of cloud-native and hybrid data platform solutions built on SQL Server and Azure SQL services, using a variety of methods and tools to perform and automate day-to-day operations, including applying knowledge of using Transact-SQL (T-SQL) and other tools for administrative management purposes.

About Microsoft Certification: Passing this exam fulfills your requirements for the Microsoft Certified: Azure Database Administrator Associate certification, demonstrating your ability to administer a SQL Server database infrastructure for cloud, on-premises, and hybrid relational databases using the Microsoft PaaS relational database offerings. See full details at: microsoft.com/learn.
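
As one concrete flavor of the "monitor and optimize" objective, here is an illustrative T-SQL query against a standard dynamic management view, run via pyodbc; the connection-string values are placeholders:

    # Illustrative T-SQL monitoring query run via pyodbc against Azure SQL;
    # the connection-string values are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myserver.database.windows.net;DATABASE=mydb;"
        "UID=admin_user;PWD=..."
    )
    cursor = conn.cursor()

    # Top CPU-consuming queries: a common starting point for tuning work.
    cursor.execute("""
        SELECT TOP 5 total_worker_time, execution_count, total_elapsed_time
        FROM sys.dm_exec_query_stats
        ORDER BY total_worker_time DESC
    """)
    for row in cursor.fetchall():
        print(row)
    conn.close()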

The relationship between humans and AI in the workplace is rapidly evolving beyond simple automation. As companies deploy thousands of AI agents to handle everything from expense approvals to customer success management, a new paradigm is emerging—one where humans become orchestrators rather than operators. But how do you determine which processes should be handled by AI and which require human judgment? What governance structures need to be in place before deploying AI at scale? With the potential to automate up to 80% of business processes, organizations must carefully consider not just the technology, but the human element of AI-driven transformation. Steve Lucas is the Chairman and CEO of Boomi, marking his third tenure as CEO. With nearly 30 years of enterprise software leadership, he has held senior roles at leading cloud organizations including Marketo, iCIMS, Adobe, SAP, Salesforce, and BusinessObjects. He led Marketo through its multi-billion-dollar acquisition by Adobe and drove strategic growth at iCIMS, delivering significant investments and transformation. A proven leader in scaling software companies, Steve is also the author of the national bestseller Digital Impact and holds a business degree from the University of Colorado. In the episode, Richie and Steve explore the importance of choosing the right tech stack for your business, the challenges of managing complex systems, the role of AI in transforming business processes, and the need for effective AI governance. They also discuss the future of AI-driven enterprises and much more.

Links Mentioned in the Show:
- Boomi
- Steve’s Book - Digital Impact: The Human Element of AI-Driven Transformation
- What is the OSI Model?
- Connect with Steve
- Skill Track: AI Business Fundamentals
- Related Episode: New Models for Digital Transformation with Alison McCauley, Chief Advocacy Officer at Think with AI & Founder of Unblocked Future
- Rewatch RADAR AI

New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

In this season of the Analytics Engineering podcast, Tristan is deep into the world of developer tools and databases. If you're following us here, you've almost definitely used Amazon S3 and its Blob Storage siblings. They form the foundation for nearly all data work in the cloud. In many ways, it was the innovations that happened inside of S3 that unlocked all of the progress in cloud data over the last decade. In this episode, Tristan talks with Andy Warfield, VP and senior principal engineer at AWS, where he focuses primarily on storage. They go deep on S3, how it works, and what it unlocks. They close out talking about Iceberg, S3 table buckets, and what this all suggests about the outlines of the S3 product roadmap moving forward. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
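
For readers who haven't touched it directly, the core blob-storage contract is remarkably small; a minimal boto3 sketch, with placeholder bucket and key names:

    # The essence of the blob-storage API in two calls: put an object, get it back.
    # Bucket and key names are placeholders.
    import boto3

    s3 = boto3.client("s3")

    s3.put_object(Bucket="my-data-lake", Key="raw/events.json", Body=b'{"ok": true}')

    obj = s3.get_object(Bucket="my-data-lake", Key="raw/events.json")
    print(obj["Body"].read())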

The SAP S/4HANA Handbook for EPC Projects: An End-to-End Solution for the Engineering, Construction and Operations (EC&O) Industry

The SAP S/4HANA Handbook for EPC Projects equips you with the knowledge and insights needed to successfully manage and execute complex Engineering, Procurement, and Construction (EPC) projects using the power of SAP S/4HANA. Building upon your existing knowledge of SAP solutions, this handbook provides advanced insights into EPC project management and addresses the operational challenges unique to the Engineering, Construction and Operations (EC&O) industry by connecting business processes with relevant SAP solutions. It is an essential guide enabling you to gain a deeper understanding of optimizing your project management capabilities using SAP S/4HANA. Whether you are an SAP Solution Architect in Finance, Human Resources, or Supply Chain Management, or a project manager in the EC&O industry, this book will help you understand how projects can be managed with SAP. We begin by examining the world of EPC, EPC/M (Engineering, Procurement, Construction, and Management), and ETO (Engineer-To-Order) projects. We then look at detailed planning, controlling, and execution solutions for EPC projects with S/4HANA Project System, CPM (Commercial Project Management), PPM (Project & Portfolio Management), S/4HANA add-ons, and SAP cloud solutions, and at integrating these with other engineering and project management software such as Tekla and Primavera through SAP BTP (Business Technology Platform). You will follow a construction company as it secures an EPC contract for a refinery upgrade project and see how SAP is used at every step of the way, from bidding to project closure. Through real-world use cases, supported by tables and visual aids, you will find the practical solutions offered by SAP S/4HANA. The SAP S/4HANA Handbook for EPC Projects is the ultimate resource bridging theory with practical applications, offering a framework to navigate the complexities of modern project management in the EC&O industry.

You Will Learn To:
- Understand project management processes with business use cases and their application in SAP
- Apply detailed planning, scheduling, and resource management strategies, as well as risk and claim management, in large-scale projects
- Master project procurement, ETO manufacturing for projects, product and service quality management, and the handling of project materials, tools, and equipment
- Manage the design and creation of documentation and oversee change management in EPC projects

This Book is For: Project and Portfolio Managers, SAP Solution Architects, and other SAP partners looking for hands-on solutions for the EC&O industry; Engineering and Construction Contractors, Engineering Consultants, and Project Management Services companies seeking business transformation with SAP tools and practices

PhD students, postdocs, and independent researchers often struggle when trying to execute code developed locally in the cloud or on HPC clusters for better performance. This is even more difficult if they can't count on IT staff to set up the necessary infrastructure for them on the remote machine, which is common in third-world countries. Spyder 6.1 will come with a whole set of improvements to address that limitation, from automatically setting up a server to run code remotely on behalf of users, to managing remote Conda environments and the remote file system from the comfort of a local Spyder installation.

Extreme weather events threaten industries and economic stability. NOAA’s National Centers for Environmental Information (NCEI) addresses this through the Industry Proving Grounds (IPG), which modernizes data delivery by collaborating with sectors like re/insurance and retail to develop practical, data-driven solutions. This presentation explores IPG’s technical innovations, including implementing Polars for efficient data processing, AWS for scalability, and CI/CD pipelines for streamlined deployment. These tools enhance data accessibility, reduce latency, and support real-time decision-making. By integrating scientific computing, cloud technology, and DevOps, NCEI improves climate resilience and provides a model for leveraging open-source tools to address global challenges.

The elasticity of the Cloud is very appealing for processing large scientific data. However, enormous volumes of unstructured research data, totaling petabytes, remain untapped in data repositories due to the lack of efficient parallel data access. Even-sized partitioning of these data to enable parallel processing requires a complete re-write to storage, becoming prohibitively expensive for high volumes. In this article we present Dataplug, an extensible framework that enables fine-grained parallel data access to unstructured scientific data in object storage. Dataplug employs read-only, format-aware indexing, allowing users to define dynamically-sized partitions using various partitioning strategies. This approach avoids writing the partitioned dataset back to storage, enabling distributed workers to fetch data partitions on-the-fly directly from large data blobs, efficiently leveraging the high bandwidth capability of object storage. Validations on genomic (FASTQGZip) and geospatial (LiDAR) data formats demonstrate that Dataplug considerably lowers pre-processing compute costs (65.5%–71.31% less) without imposing significant overheads.
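
The core trick (fetching a partition on-the-fly via a byte-range read, instead of rewriting the dataset) can be sketched in a few lines of boto3; the bucket, key, and offsets are placeholders, and Dataplug's real API derives the ranges from its format-aware index:

    # Sketch of an on-the-fly partition fetch: each worker reads only its
    # byte range from the blob. Bucket, key, and offsets are placeholders.
    import boto3

    s3 = boto3.client("s3")

    def fetch_partition(bucket: str, key: str, start: int, end: int) -> bytes:
        # HTTP range request: only bytes [start, end] leave object storage.
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        return resp["Body"].read()

    chunk = fetch_partition("genomics-data", "sample.fastq.gz", 0, 64 * 1024 * 1024 - 1)
    print(f"fetched {len(chunk)} bytes")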

The best way to distribute large scientific datasets is via the Cloud, in Cloud-Optimized formats. But often this data is stuck in archival pre-Cloud file formats such as netCDF.

VirtualiZarr makes it easy to create "Virtual" Zarr datacubes, allowing performant access to huge archival datasets as if they were in the Cloud-Optimized Zarr format, without duplicating any of the original data.

We will demonstrate using VirtualiZarr to generate references to archival files, combine them into one array datacube using xarray-like syntax, commit them to Icechunk, and read the data back with zarr-python v3.
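
A hedged sketch of that workflow, with placeholder file paths; the Icechunk commit step is indicated only as a comment, since the exact call depends on the library versions in use:

    # Sketch of the VirtualiZarr workflow described above; paths are placeholders.
    import xarray as xr
    from virtualizarr import open_virtual_dataset

    # Index each archival netCDF file without copying its bytes.
    virtual_datasets = [
        open_virtual_dataset(path) for path in ["day1.nc", "day2.nc"]
    ]

    # Combine the references into one datacube with ordinary xarray syntax.
    combined = xr.concat(virtual_datasets, dim="time")

    # Persist the references (e.g. as kerchunk JSON); committing to an
    # Icechunk store and reading back with zarr-python v3 follows the same
    # accessor pattern in recent versions.
    combined.virtualize.to_kerchunk("combined_refs.json", format="json")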

MongoDB 8.0 in Action, Third Edition

Deliver flexible, scalable, and high-performance data storage that's perfect for AI and other modern applications with MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. In MongoDB 8.0 in Action, Third Edition you'll find comprehensive coverage of the latest version of MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. Learn to utilize MongoDB’s flexible schema design for data modeling, scale applications effectively using advanced sharding features, integrate full-text and vector-based semantic search, and more. This totally revised new edition delivers engaging hands-on tutorials and examples that put MongoDB into action!

In MongoDB 8.0 in Action, Third Edition you'll:
- Master new features in MongoDB 8.0
- Create your first, free Atlas cluster using the Atlas CLI
- Design scalable NoSQL databases with effective data modeling techniques
- Master Vector Search for building GenAI-driven applications
- Utilize advanced search capabilities in MongoDB Atlas, including full-text search
- Build event-driven applications with Atlas Stream Processing
- Deploy and manage MongoDB Atlas clusters both locally and in the cloud using the Atlas CLI
- Leverage the Atlas SQL interface for familiar SQL querying
- Use MongoDB Atlas Online Archive for efficient data management
- Establish robust security practices including encryption
- Master backup and restore strategies
- Optimize database performance and identify slow queries

MongoDB 8.0 in Action, Third Edition offers a clear, easy-to-understand introduction to everything in MongoDB 8.0 and MongoDB Atlas—including new advanced features such as embedded config servers in sharded clusters and moving an unsharded collection to a different shard. The book also covers Atlas Stream Processing, full-text search, and vector search capabilities for generative AI applications. Each chapter is packed with tips, tricks, and practical examples you can quickly apply to your projects, whether you're brand new to MongoDB or looking to get up to speed with the latest version.

About the Technology: MongoDB is the database of choice for storing structured, semi-structured, and unstructured data like business documents and other text and image files. MongoDB 8.0 introduces a range of exciting new features—from sharding improvements that simplify the management of distributed data, to performance enhancements that stay resilient under heavy workloads. Plus, MongoDB Atlas brings vector search and full-text search features that support AI-powered applications.

About the Book: In MongoDB 8.0 in Action, Third Edition, you’ll learn how to take advantage of all the new features of MongoDB 8.0, including the powerful MongoDB Atlas multi-cloud data platform. You’ll start with the basics of setting up and managing a document database. Then, you’ll learn how to use MongoDB for AI-driven applications, implement advanced stream processing, and optimize performance with improved indexing and query handling. Hands-on projects like creating a RAG-based chatbot and building an aggregation pipeline mean you’ll really put MongoDB into action!

What's Inside:
- The new features in MongoDB 8.0
- Getting familiar with MongoDB’s Atlas cloud platform
- Utilizing sharding enhancements
- Using vector-based search technologies
- Full-text search capabilities for efficient text indexing and querying

About the Reader: For developers and DBAs of all levels. No prior experience with MongoDB required.

About the Author: Arek Borucki is a MongoDB Champion and certified MongoDB and MongoDB Atlas administrator with expertise in distributed systems, NoSQL databases, and Kubernetes.

Quotes:
"An excellent resource with real-world examples and best practices to design, optimize, and scale modern applications." - Advait Patel, Broadcom
"Essential MongoDB resource. Covers new features such as full-text search, vector search, AI, and RAG applications." - Juan Roy, Credit Suisse
"Reflects the author’s practical experience and clear teaching style. It’s packed with real-world examples and up-to-date insights." - Rajesh Nair, MongoDB Champion & community leader
"This book will definitely make you a MongoDB star!" - Vinicios Wentz, JP Morgan & Chase Co.
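
To give a flavor of the vector search capability the book covers, here is an illustrative PyMongo aggregation using Atlas Vector Search; the cluster URI, index name, field paths, and embedding vector are placeholders:

    # Illustrative Atlas Vector Search query with PyMongo; the URI, index
    # name, field paths, and embedding are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
    collection = client["shop"]["products"]

    query_vector = [0.12, -0.07, 0.33]  # stand-in for a real embedding

    pipeline = [
        {
            "$vectorSearch": {
                "index": "product_embeddings",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 100,
                "limit": 5,
            }
        },
        {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

    for doc in collection.aggregate(pipeline):
        print(doc)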

Cubed is a framework for distributed processing of large arrays without a cluster. Designed to respect memory constraints at all times, Cubed can express any NumPy-like array operation as a series of embarrassingly parallel, bounded-memory steps. By using Zarr as persistent storage between steps, Cubed can run in a serverless fashion both on a local machine and on a range of Cloud platforms. After explaining Cubed’s model, we will show how Cubed has been integrated with Xarray and demonstrate its performance on various large array geoscience workloads.
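
A minimal sketch of what that model looks like in code, assuming Cubed's documented Spec and array-API namespace; the work directory and memory budget are placeholders:

    # Minimal Cubed example: a bounded-memory, NumPy-like computation backed
    # by Zarr between steps. The work_dir and memory budget are placeholders.
    import cubed
    import cubed.array_api as xp

    spec = cubed.Spec(work_dir="/tmp/cubed-work", allowed_mem="2GB")

    a = xp.ones((20000, 20000), chunks=(5000, 5000), spec=spec)
    b = xp.mean(a + 1, axis=0)  # lazily builds a plan of bounded-memory steps

    result = b.compute()        # executes locally or on a serverless backend
    print(result.shape)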

The cloud has revolutionized the way we store and process data. However, with the increasing adoption of cloud services, the risk of vendor lock-in has become a major concern for organizations. In this talk, we will explore the similarities between cloud vendor lock-in and the database abstraction layers of the past. As usual, the answer is not simple, and we will dive deeper into the topic of vendor lock-in and how to deal with it: what the pros and cons of a vendor lock-in are, and how you may already be affected by other lock-in effects you are not aware of.
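
To ground the comparison with database abstraction layers, here is a toy Python sketch of the pattern: application code targets a narrow interface so the vendor behind it can be swapped; all names here are invented for illustration:

    # Toy illustration of an abstraction layer against lock-in; every name
    # here is invented for illustration.
    from typing import Protocol

    class ObjectStore(Protocol):
        def put(self, key: str, data: bytes) -> None: ...
        def get(self, key: str) -> bytes: ...

    class InMemoryStore:
        """Stand-in backend; a real one would wrap a vendor SDK."""
        def __init__(self) -> None:
            self._blobs: dict[str, bytes] = {}
        def put(self, key: str, data: bytes) -> None:
            self._blobs[key] = data
        def get(self, key: str) -> bytes:
            return self._blobs[key]

    def archive_report(store: ObjectStore, report: bytes) -> None:
        # Application code sees only the interface, never the vendor SDK;
        # the classic trade-off is portability versus vendor-specific features.
        store.put("reports/latest.bin", report)

    archive_report(InMemoryStore(), b"quarterly numbers")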