talk-data.com

Topic

Data Lake

big_data data_storage analytics

311 tagged

Activity Trend: peak of 28 per quarter, 2020-Q1 to 2026-Q1

Activities

311 activities · Newest first

Advanced SQL

SQL is no longer just a querying language for relational databases; it's a foundational tool for building scalable, modern data solutions across real-time analytics, machine learning workflows, and even generative AI applications. Advanced SQL shows data professionals how to move beyond conventional SELECT statements and tap into the full power of SQL as a programming interface for today's most advanced data platforms.

Written by seasoned data experts Rui Pedro Machado, Hélder Russa, and Pedro Esmeriz, this practical guide explores the role of SQL in streaming architectures (like Apache Kafka and Flink), data lake ecosystems, cloud data warehouses, and ML pipelines. Geared toward data engineers, analysts, scientists, and analytics engineers, the book combines hands-on guidance with architectural best practices to help you extend your SQL skills into emerging workloads and real-world production systems.

Use SQL to design and deploy modern, end-to-end data architectures
Integrate SQL with data lakes, stream processing, and cloud platforms
Apply SQL in feature engineering and ML model deployment
Master pipe syntax and other advanced features for scalable, efficient queries
Leverage SQL to build GenAI-ready data applications and pipelines

From Data Lake Entanglement to Data Mesh Decoupling: Scaling a Self-Service Data Platform

Our data platform journey started with a classic data lake — easy to ingest, hard to evolve. As domains scaled, tight coupling across source systems, pipelines, and data products slowed everything down. In this talk, we share how we re-architected toward a domain-oriented data mesh using PySpark, Delta Lake and DQX to achieve true decoupling. Expect practical lessons on designing independent data products, managing lineage and governance, and scaling self-service without chaos.

AWS re:Invent 2025 - iTTi's Cross-Company Data Mesh Blueprint with Amazon SageMaker (ANT342)

This session shares the journey of implementing a hybrid Data Mesh architecture within a multi-company holding, balancing centralization and decentralization needs. We will cover how our hybrid approach leverages Amazon EMR on EKS for data lake ingestion and Amazon SageMaker to enable self-service data discovery, data product subscription, and consumption, allowing companies within the group to autonomously explore, access, and utilize data products while maintaining centralized governance. Through ITTI - Grupo Vázquez's real-world experience, attendees will learn how this hybrid data mesh architecture successfully addresses diverse data domains, varying governance requirements, and rapid value delivery needs.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - A practitioner’s guide to data for agentic AI (DAT315)

In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. This session focuses on data management using approaches like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with analytical, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.

AWS re:Invent 2025 - Architecting the future: Amazon SageMaker as a data and AI platform (ANT351)

Learn how organizations are modernizing their data architectures using the next generation of Amazon SageMaker to create AI-ready data platforms. This session covers how to design and build modern data architectures that enable both innovation and control, examining real-world patterns for data lake consolidation, cross-account governance, and AI integration. This session is ideal for enterprise architects and technical leaders planning large-scale architectural transformations to support their AI/ML initiatives.

In this episode, I sit down with Mark Freeman and Chad Sanderson (Gable.ai) to discuss the release of their new O’Reilly book, Data Contracts: Developing Production-Grade Pipelines at Scale. They dive deep into the chaotic journey of writing a 350-page book while simultaneously building a venture-backed startup. The conversation takes a sharp turn into the evolution of Data Contracts. While the concept started with data engineers, Mark and Chad explain why they pivoted their focus to software engineers. They argue that software engineers are facing a "Data Lake Moment," prioritizing speed over craftsmanship, resulting in massive technical debt and integration failures.

Gable: https://www.gable.ai/
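
For readers new to the idea, a data contract can be pictured as an explicit, checkable agreement about the shape of the records a producer emits. The sketch below is only an illustration in plain Python and is not taken from the book or from Gable's product; the names FieldSpec, ORDER_CONTRACT, and violations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a hypothetical contract: name, expected type, nullability."""
    name: str
    dtype: type
    nullable: bool = False


# Hypothetical contract for an "orders" record (illustrative only).
ORDER_CONTRACT = [
    FieldSpec("order_id", int),
    FieldSpec("customer_email", str),
    FieldSpec("discount_code", str, nullable=True),
]


def violations(record: dict, contract: list[FieldSpec]) -> list[str]:
    """Return a list of human-readable contract violations for one record."""
    problems = []
    for spec in contract:
        if spec.name not in record:
            problems.append(f"missing field: {spec.name}")
            continue
        value = record[spec.name]
        if value is None:
            if not spec.nullable:
                problems.append(f"null not allowed: {spec.name}")
        elif not isinstance(value, spec.dtype):
            problems.append(
                f"wrong type for {spec.name}: "
                f"expected {spec.dtype.__name__}, got {type(value).__name__}"
            )
    return problems
```

A producer could run a check like this before publishing, so that a schema change fails at the source instead of in a downstream pipeline.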

AWS re:Invent 2025 - Using graphs over your data lake to power generative AI applications (DAT447)

In this session, learn about new Amazon Neptune capabilities for high-performance graph analytics and queries over data lakes to unlock the implicit and explicit relationships in your data, driving more accurate, trustworthy generative AI responses. We'll demonstrate building knowledge graphs from structured and unstructured data, combining graph algorithms (PageRank, Louvain clustering, path optimization) with semantic search, and executing Cypher queries on Parquet and Iceberg formats in Amazon S3. Through code samples and benchmarks, learn advanced architectures to use Neptune for multi-hop reasoning, entity linking, and context enrichment at scale. This session assumes familiarity with graph concepts and data lake architectures.
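
As background for one of the graph algorithms named in the abstract, here is a minimal PageRank sketch using power iteration in plain Python. It does not use Amazon Neptune or any of its APIs, and it is not the session's code; the function name pagerank, the toy graph, and the damping and iteration defaults are assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each node to the list of nodes it links to; every link
    target is assumed to also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph of table-to-table references.
toy_graph = {
    "orders": ["customers", "products"],
    "customers": ["orders"],
    "products": ["orders"],
    "reviews": ["products"],
}
ranks = pagerank(toy_graph)
top_node = max(ranks, key=ranks.get)
```

In this toy graph, "orders" ends up with the highest score because both "customers" and "products" pass all of their rank to it.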

In this episode, Tristan Handy sits down with Chang She, a co-creator of pandas and now CEO of LanceDB, to explore the convergence of analytics and AI engineering. The team at LanceDB is rebuilding the data lake from the ground up with AI as a first principle, starting with a new AI-native file format called Lance. Tristan traces Chang's journey from being one of the original contributors to the pandas library to building a new infrastructure layer for AI-native data. Learn why vector databases alone aren't enough, why agents require new architecture, and how LanceDB is building an AI lakehouse for the future. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Unlock the power of Microsoft Sentinel with this hands-on lab. Hunt threats using KQL across the data lake and leverage graph-based insights to detect anomalies, investigate alerts, and strengthen SOC workflows.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

As security data grows, it is critical for organizations to balance business outcomes and budgets. In this session, discover how Microsoft Sentinel data lake simplifies data management, accelerates AI adoption and optimizes costs. Serving as the foundation of the Microsoft Sentinel platform, it unifies all security data for greater visibility, deeper analysis, and contextual awareness. We’ll share real-world examples, tools, and best practices to help you maximize Microsoft Sentinel value.

Unlock the power of Microsoft Sentinel with this hands-on lab. Hunt threats using KQL across the data lake and leverage graph-based insights to detect anomalies, investigate alerts, and strengthen SOC workflows.

Partners: Grow Your Modern SecOps Practice with the Unified Platform

Step into the future of security operations! Unlock insights and actionable guidance for partners ready to amplify their security practice and grow their revenue streams through the Modern SecOps with Unified Platform solution play. Learn how to capitalize on cutting-edge tools like the new Microsoft Sentinel data lake & Microsoft Sentinel graph to transform threat detection, response, and investigation. Explore GTM resources & incentives to drive impactful customer engagements at every stage.

Breaches are inevitable. Disasters are optional. Watch Illumio Insights, an AI-powered cloud detection and response solution, transform complex threat detection into visual containment in minutes. As a Sentinel data lake launch partner, Insights is accessible from within Sentinel. Insights Agent within Security Copilot identifies lateral movement risk, contains threats instantly, and suggests microsegmentation policies. Stop intruders and turn threat containment into a 5-minute “wow” moment.

Unlock the power of Microsoft Sentinel with this hands-on lab. Hunt threats using KQL across the data lake and leverage graph-based insights to detect anomalies, investigate alerts, and strengthen SOC workflows.

Bringing in custom application data, on-premises logs, or migrating from another SIEM into Microsoft Sentinel can be complex and full of hidden pitfalls. In this session, we’ll share strategies to overcome common ingestion challenges, improve SOC detections and hunting, optimize data retention, and get the most value from the Sentinel data lake. Join the discussion to exchange real-world lessons learned and approaches that work.

Connection Pods accommodate up to 15 people. Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Discover how to build a cost-efficient log management solution using Microsoft Sentinel and Azure Data Lake. In this session, we’ll walk through a real-world example that demonstrates how to optimize data ingestion, storage, and retention without compromising visibility or compliance. Learn practical strategies to reduce costs while maintaining a scalable and secure logging architecture.

Unlock the power of Microsoft Sentinel with this hands-on lab. Hunt threats using KQL across the data lake and leverage graph-based insights to detect anomalies, investigate alerts, and strengthen SOC workflows.
