talk-data.com

Topic: ETL/ELT
Tags: data_integration, data_transformation, data_loading
480 tagged activities

Activity Trend: peak of 40 activities/quarter, 2020-Q1 through 2026-Q1

Activities (480 · Newest first)

Amazon Redshift Cookbook - Second Edition

Amazon Redshift Cookbook provides practical techniques for utilizing AWS's managed data warehousing service effectively. With this book, you'll learn to create scalable and secure data analytics solutions, tackle data integration challenges, and leverage Redshift's advanced features like data sharing and generative AI capabilities.

What this book will help me do: Create end-to-end data analytics solutions from ingestion to reporting using Amazon Redshift. Optimize the performance and security of Redshift implementations to meet enterprise standards. Leverage Amazon Redshift for zero-ETL ingestion and advanced concurrency scaling. Integrate Redshift with data lakes for enhanced data processing versatility. Implement generative AI and machine learning solutions directly within Redshift environments.

Author(s): Shruti Worlikar, Harshida Patel, and Anusha Challa are seasoned data experts who bring together years of experience with Amazon Web Services and data analytics. Their combined expertise enables them to offer actionable insights, hands-on recipes, and proven strategies for implementing and optimizing Amazon Redshift-based solutions.

Who is it for? This book is best suited for data analysts, data engineers, and architects who are keen on mastering modern data warehouse solutions using Redshift. Readers should have some knowledge of data warehousing and familiarity with cloud concepts. Ideal for professionals looking to migrate on-premises systems or build cloud-native analytics pipelines leveraging Redshift.

Bigtable has been a core piece of application infrastructure for Google and companies such as Snap, Spotify, and many other massive platforms for over 20 years. In this session, we'll discuss the fundamental changes to Bigtable's processing capabilities made available via SQL. These let you bring more data transformations directly into Bigtable, enabling extract, load, and transform (ELT) workflows that take advantage of Bigtable's flexible schema to increase data freshness, while reducing the time and cost of running other data processing services to prepare data for your real-time application.

Applications of the future require a database that transcends historical paradigms. They require advanced in-database capabilities like Graph RAG, vector and full-text search without compromising on critical database properties of compliance, scale, and availability. In this talk, you'll learn how Spanner's native search and interoperable multi-model capabilities enable your developers to build intelligent, global applications on a single, zero-ETL (extract, transform, and load) data platform.

NVIDIA GPUs accelerate batch ETL workloads with significant cost savings and performance gains. In this session, we will delve into optimizing Apache Spark on GCP Dataproc using the G2 accelerator-optimized machine series with L4 GPUs via the RAPIDS Accelerator for Apache Spark, showcasing up to 14x speedups and 80% cost reductions for Spark applications. We will demonstrate this acceleration through a reference AI architecture for financial transaction fraud detection and walk through performance measurements.
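
As a rough sketch of what enabling the accelerator involves, the snippet below builds a PySpark session with the RAPIDS Accelerator plugin turned on; the GPU resource amounts and input paths are placeholder assumptions, and on Dataproc these properties are typically supplied at cluster or job submission time rather than hard-coded.

    # Minimal PySpark sketch: enabling the RAPIDS Accelerator for Apache Spark.
    # Property names follow the RAPIDS Accelerator docs; values here are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("gpu-accelerated-etl")
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # load the RAPIDS plugin
        .config("spark.rapids.sql.enabled", "true")              # run supported SQL ops on the GPU
        .config("spark.executor.resource.gpu.amount", "1")       # one GPU per executor (assumption)
        .config("spark.task.resource.gpu.amount", "0.25")        # share the GPU across 4 tasks (assumption)
        .getOrCreate()
    )

    # A typical batch ETL step: read, aggregate, write (paths are placeholders).
    df = spark.read.parquet("gs://your-bucket/transactions/")
    daily = df.groupBy("merchant_id", "txn_date").sum("amount")
    daily.write.mode("overwrite").parquet("gs://your-bucket/daily_totals/")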

Unstructured data makes up the majority of all new data, a trend that has been growing exponentially since 2018. At these volumes, vector embeddings require indexes to be trained so that nearest neighbors can be efficiently approximated, avoiding the need for exhaustive lookups. However, training these indexes puts intense demand on vector databases to maintain high ingest throughput. In this session, we will explain how the NVIDIA cuVS library is turbocharging vector database ingest with GPUs, providing speedups of 5-20x and improving data readiness.
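
The core pattern the session describes (train an index on the vectors so later queries probe only a few partitions instead of scanning everything) can be sketched with the widely used FAISS library as a CPU stand-in. This is not the cuVS API, and the dimensions and data below are made up for illustration.

    # Illustrative ANN pattern with FAISS (a stand-in, not the NVIDIA cuVS API):
    # train an IVF index, ingest vectors, then run approximate nearest-neighbor search.
    import numpy as np
    import faiss

    d, n_train, n_queries = 128, 100_000, 5           # toy sizes for illustration
    xb = np.random.random((n_train, d)).astype("float32")
    xq = np.random.random((n_queries, d)).astype("float32")

    quantizer = faiss.IndexFlatL2(d)                   # coarse quantizer for the partitions
    index = faiss.IndexIVFFlat(quantizer, d, 1024)     # 1024 inverted lists (clusters)
    index.train(xb)                                    # the training step the session refers to
    index.add(xb)                                      # high-throughput ingest happens here
    index.nprobe = 16                                  # probe only 16 of 1024 lists per query
    distances, neighbors = index.search(xq, 10)        # approximate top-10 neighbors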

This Session is hosted by a Google Cloud Next Sponsor.
Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.

This talk will demonstrate how the SAP user community can use Looker and its Explore Assistant chatbot to gain insights into SAP ERP data stored in Google Cloud's BigQuery using natural language prompts. We will address common challenges in accessing and analyzing SAP data, such as ETL processes and complex data models, provide an introduction to generative AI and large language models (LLMs), and give an overview of the Looker Explore Assistant chatbot's capabilities.

Join us to learn how you can build on Google’s intelligent, open, and unified Data Cloud to accelerate your AI transformation. This session covers deep integrations between BigQuery and Google’s operational databases, such as Spanner, AlloyDB, Bigtable, and Cloud SQL. Mercado Libre will share how Spanner and Bigtable Data Boost enable near-zero impact analytics on their operational data. Plus, discover how Datastream and change streams simplify data movement to BigQuery, and how reverse ETL (extract, transform, and load) from BigQuery powers operational analytics.

Build robust ETL pipelines on Google Cloud! This hands-on lab teaches you to use Dataflow (Python) and BigQuery to ingest and transform public datasets. Learn design considerations and implementation details to create effective data pipelines for your needs.
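
As a hedged sketch of what such a lab pipeline might look like (the bucket, table, and field names below are placeholders, not the lab's actual assets), a minimal Dataflow-ready Apache Beam pipeline in Python that reads a CSV file, transforms each row, and writes it to BigQuery could be:

    # Minimal Apache Beam (Python) pipeline: ingest a CSV, transform, load to BigQuery.
    # Runs locally with the DirectRunner; pass --runner=DataflowRunner to run on Dataflow.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_row(line: str) -> dict:
        # Assumed CSV layout: name,value — adjust to the dataset you actually use.
        name, value = line.split(",")
        return {"name": name.strip(), "value": int(value)}

    options = PipelineOptions()  # supply --project, --region, --temp_location for Dataflow
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://your-bucket/input.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse_row)
            | "Write" >> beam.io.WriteToBigQuery(
                "your-project:your_dataset.your_table",          # placeholder table
                schema="name:STRING,value:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )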

If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account for both your work domain and personal email address. You will need to authenticate your account as well (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite. You can follow this link to sign up!

session
by Tom Varco (Mattel), Vinay Balasubramaniam (Google Cloud), Geeta Banda (Google Cloud), TJ Allard (Mattel), Abhishek Kashyap (Google Cloud)

BigQuery is unifying data management, analytics, governance, and AI. Join this session to learn about the latest innovations in BigQuery to help you get actionable insights from your multimodal data and accelerate AI innovation with a secure data foundation and new-gen AI-powered experiences. Hear how Mattel utilized BigQuery to create a no-code, shareable template for data processing, analytics, and AI modeling, leveraging their existing data and streamlining the entire workflow from ETL to AI implementation within a single platform.

Redpanda, a leading Kafka API-compatible streaming platform, now supports storing topics in Apache Iceberg, seamlessly fusing low-latency streaming with data lakehouses using BigQuery and BigLake on GCP. Iceberg Topics eliminate complex and inefficient ETL between streams and tables, making real-time data instantly accessible for analysis in BigQuery. This push-button integration eliminates the need for costly connectors or custom pipelines, enabling both simple and sophisticated SQL queries across streams and other datasets. By combining Redpanda and Iceberg, GCP customers gain a secure, scalable, and cost-effective solution that transforms their agility while reducing infrastructure and human capital costs.
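
To make "instantly accessible for analysis" concrete, here is a hedged sketch of querying such a table from BigQuery with the official Python client. The dataset and table names are invented for illustration; the actual Iceberg table exposed via BigLake would depend on your Redpanda topic and configuration.

    # Hedged sketch: querying an Iceberg-backed table registered in BigQuery/BigLake.
    # Dataset and table names are placeholders for whatever your Redpanda topic maps to.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials
    query = """
        SELECT user_id, COUNT(*) AS events
        FROM `your_project.your_dataset.orders_topic_iceberg`
        GROUP BY user_id
        ORDER BY events DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.user_id, row.events)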

This Session is hosted by a Google Cloud Next Sponsor.
Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.

Databricks Certified Data Engineer Associate Study Guide

Data engineers proficient in Databricks are currently in high demand. As organizations gather more data than ever before, skilled data engineers on platforms like Databricks become critical to business success. The Databricks Data Engineer Associate certification is proof that you have a complete understanding of the Databricks platform and its capabilities, as well as the essential skills to effectively execute various data engineering tasks on the platform.

In this comprehensive study guide, you will build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse and its tools and benefits. You'll also learn to develop ETL pipelines in both batch and streaming modes. Moreover, you'll discover how to orchestrate data workflows and design dashboards while maintaining data governance. Finally, you'll dive into the finer points of exactly what's on the exam and learn to prepare for it with mock tests.

Author Derar Alhussein teaches you not only the fundamental concepts but also provides hands-on exercises to reinforce your understanding. From setting up your Databricks workspace to deploying production pipelines, each chapter is carefully crafted to equip you with the skills needed to master the Databricks Platform. By the end of this book, you'll know everything you need to ace the Databricks Data Engineer Associate certification exam with flying colors, and start your career as a certified data engineer from Databricks!

You'll learn how to:
Use the Databricks Platform and Delta Lake effectively
Perform advanced ETL tasks using Apache Spark SQL
Design multi-hop architecture to process data incrementally
Build production pipelines using Delta Live Tables and Databricks Jobs
Implement data governance using Databricks SQL and Unity Catalog

Derar Alhussein is a senior data engineer with a master's degree in data mining. He has over a decade of hands-on experience in software and data projects, including large-scale projects on Databricks. He currently holds eight certifications from Databricks, showcasing his proficiency in the field. Derar is also an experienced instructor, with a proven track record of success in training thousands of data engineers, helping them to develop their skills and obtain professional certifications.
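
To give a feel for the multi-hop (medallion) ETL pattern the guide covers, here is a hedged PySpark sketch of a bronze-to-silver hop on Delta Lake. Paths, columns, and thresholds are invented for illustration and are not taken from the book; Delta support is assumed (default on Databricks, or install delta-spark locally).

    # Hedged sketch of an incremental bronze -> silver hop on Delta Lake with PySpark.
    # Paths and column names are placeholders, not examples from the study guide.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("multi-hop-demo").getOrCreate()

    # Bronze: land the raw JSON events as-is, plus an ingestion timestamp.
    raw = spark.read.json("/data/raw/orders/")
    (raw.withColumn("ingested_at", F.current_timestamp())
        .write.format("delta").mode("append").save("/delta/bronze/orders"))

    # Silver: cleaned, deduplicated records ready for analytics.
    bronze = spark.read.format("delta").load("/delta/bronze/orders")
    silver = (bronze
              .dropDuplicates(["order_id"])
              .filter(F.col("amount") > 0)
              .select("order_id", "customer_id", "amount", "ingested_at"))
    silver.write.format("delta").mode("overwrite").save("/delta/silver/orders")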

Deepti Srivastava, founder of Snow Leopard AI and former Spanner product lead at Google Cloud, joined Yuliia to chat about what's wrong with current approaches to AI integration. Deepti introduces a paradigm shift away from ETL pipelines toward federated, real-time data access for AI applications. She explains how Snow Leopard's intelligent data retrieval platform enables enterprises to connect AI systems directly to operational data sources without compromising security or freshness. Through practical examples, Deepti explains why conventional RAG approaches with vector stores are not good enough for business-critical AI applications, and how a systems-thinking approach to AI infrastructure can unlock greater value while reducing unnecessary data movement.
Deepti's LinkedIn - https://www.linkedin.com/in/thedeepti/
Snowleopard.ai - http://snowleopard.ai/

With the proliferation of SaaS ELT tools, many organizations don't realize that Google BigQuery offers many ways to ingest data from different platforms for free. This presentation will walk through the most important native export and data transfer mechanisms and show how data from these platforms can be integrated to enable a comprehensive view of an organization's digital marketing efforts. Various use cases will also be presented to generate tangible insights from this integrated data that help increase the bottom line.
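
One of the free, native ingestion paths of the kind the talk alludes to is a plain load job from Cloud Storage; a hedged sketch with the official Python client follows. The bucket, dataset, and table names are placeholders, and which exports are free of charge depends on the source platform.

    # Hedged sketch: native (no ELT tool) batch load from Cloud Storage into BigQuery.
    # Bucket, dataset, and table names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,  # let BigQuery infer the schema for this illustration
    )
    load_job = client.load_table_from_uri(
        "gs://your-bucket/marketing_export/events.csv",   # e.g. a platform export file
        "your_project.marketing.events",                  # destination table
        job_config=job_config,
    )
    load_job.result()  # wait for completion
    print(client.get_table("your_project.marketing.events").num_rows, "rows loaded")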

Essential Data Analytics, Data Science, and AI: A Practical Guide for a Data-Driven World

In today’s world, understanding data analytics, data science, and artificial intelligence is not just an advantage but a necessity. This book is your thorough guide to learning these innovative fields, designed to make the learning practical and engaging.

The book starts by introducing data analytics, data science, and artificial intelligence. It illustrates real-world applications and addresses the ethical considerations tied to AI. It also explores ways to obtain data for practice and real-world scenarios, including the concept of synthetic data. Next, it uncovers Extract, Transform, Load (ETL) processes and explains how to implement them using Python. Further, it covers artificial intelligence and the pivotal role played by machine learning models. It explains feature engineering, the distinction between algorithms and models, and how to harness their power to make predictions. Moving forward, it discusses how to assess machine learning models after their creation, with insights into various evaluation techniques. It emphasizes the crucial aspects of model deployment, including the pros and cons of on-device versus cloud-based solutions. It concludes with real-world examples, encouraging you to embrace AI while dispelling fears and fostering an appreciation for the transformative potential of these technologies. Whether you’re a beginner or an experienced professional, this book offers valuable insights that will expand your horizons in the world of data and AI.

What you will learn: What synthetic data and telemetry data are. How to analyze data using tools like Python and Tableau. What feature engineering is. The practical implications of artificial intelligence.

Who this book is for: Data analysts, scientists, and engineers seeking to enhance their skills, explore advanced concepts, and stay up to date with ethics. Business leaders and decision-makers across industries interested in understanding the transformative potential and ethical implications of data analytics and AI in their organizations.
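
Since the book implements ETL in Python, a minimal hedged sketch of the extract-transform-load pattern using only the standard library is shown below. The file name, columns, and SQLite target are invented for illustration and are not taken from the book.

    # Minimal ETL sketch in pure Python: extract from CSV, transform, load into SQLite.
    # File name, columns, and table are illustrative placeholders.
    import csv
    import sqlite3

    def extract(path: str) -> list[dict]:
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows: list[dict]) -> list[tuple]:
        cleaned = []
        for row in rows:
            try:
                amount = float(row["amount"])
            except (KeyError, ValueError):
                continue                      # drop malformed records
            cleaned.append((row["customer_id"].strip(), round(amount, 2)))
        return cleaned

    def load(records: list[tuple], db_path: str = "sales.db") -> None:
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", records)
        con.commit()
        con.close()

    load(transform(extract("raw_sales.csv")))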

Dave and Johnny run Estuary, a data integration company focused on real-time ETL and ELT. We're also friends, so we decided to have a chat.

In this episode, we chat about the current state of the data integration space, running a startup while raising kids, and much more.

Estuary

AWS re:Invent 2024 - Deep dive into Amazon DynamoDB zero-ETL integrations (DAT348)

Amazon DynamoDB is a serverless, NoSQL, fully managed database with single-digit millisecond performance at any scale. DynamoDB lends itself to easy integration with several other AWS services. In this session, dive deep into zero-ETL integrations between Amazon DynamoDB and Amazon SageMaker Lakehouse, Amazon OpenSearch Service, and Amazon Redshift. Learn from AWS experts about how these integrations can reduce operational burden and cost, allowing you to focus on creating value from data instead of preparing data for analysis.


AWS re:Invent 2024 - AI-powered data integration and governance with Amazon Q Developer (ANT352-NEW)

Discover how the AI-driven capabilities of Amazon Q Developer streamline data integration across AWS services, such as AWS Glue, Amazon SageMaker Catalog, Amazon Redshift, Amazon SageMaker AI, and more. Learn how data engineers and ETL developers can build complex jobs, troubleshoot, and explore data using natural language through an intuitive chat interface in Amazon SageMaker Unified Studio. Join this session to see how Amazon Q Developer enhances productivity and accelerates workflows, transforming the way you handle data integration.


AWS re:Invent 2024 - Zero-ETL replication to Amazon SageMaker Lakehouse & Amazon Redshift (ANT353-NEW)

In today’s data-driven landscape, organizations rely on enterprise applications to manage critical business processes. However, extracting and integrating this data into data warehouses and data lakes can be complex. This session explores a new zero-ETL capability that simplifies ingesting data to Amazon SageMaker Lakehouse and Amazon Redshift via AWS Glue from enterprise applications such as Salesforce, ServiceNow, and Zendesk. See how zero-ETL automates the extract and load process, expanding your analytics and machine learning solutions with valuable SaaS data.


AWS re:Invent 2024 - Analyze Amazon Aurora & RDS data in Amazon Redshift with zero-ETL (DAT331)

Discover the power of Amazon Aurora and Amazon RDS zero-ETL integrations with Amazon Redshift. Zero-ETL integrations help unify your data across applications and data sources for holistic insights. This session explores how Amazon Aurora and Amazon RDS zero-ETL integrations with Amazon Redshift remove the need to build and manage complex data pipelines, enabling analytics and machine learning using Amazon Redshift on petabytes of transactional data from your relational databases. In this session, learn about key zero-ETL integration functionalities like data filtering, AWS CloudFormation support, and more.
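
For orientation, creating such an integration programmatically looks roughly like the hedged boto3 sketch below. The ARNs and integration name are placeholders, the exact response fields may differ, and in practice many teams create zero-ETL integrations from the console or CloudFormation instead.

    # Hedged sketch: creating an Aurora -> Amazon Redshift zero-ETL integration with boto3.
    # ARNs and the integration name are placeholders; the console or CloudFormation
    # (which the session mentions) are common alternatives to calling the API directly.
    import boto3

    rds = boto3.client("rds")
    response = rds.create_integration(
        SourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
        TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/my-namespace",
        IntegrationName="orders-zero-etl",
    )
    # Response fields are assumptions; .get() avoids a hard dependency on exact key names.
    print(response.get("Status"), response.get("IntegrationArn"))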


Frank Munz: A Journey in Space with Apache Kafka data streams from NASA

🌟 Session Overview 🌟

Session Name: Supernovas, Black Holes, and Streaming Data: A Journey in Space with Apache Kafka data streams from NASA
Speaker: Frank Munz
Session Description: In this fun, hands-on, and in-depth How-To, we explore NASA's GCN project, which publishes various events in space as Kafka topics.

The focus of my talk is on end-to-end data engineering, from consuming the data and ELT-ing the stream, to using generative AI tools for analytics.

We will analyze GCN data in real time, specifically targeting the data stream from exploding supernovas. This data triggers dozens of terrestrial telescopes to potentially reposition and point toward the event.
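
As a hedged sketch of the consuming side, subscribing to such a stream with the standard confluent-kafka Python client looks roughly like this; the broker address, credentials, and topic name are placeholders (NASA's GCN service has its own client packages and real topic names).

    # Hedged sketch: consuming an astronomy alert stream with confluent-kafka (Python).
    # Broker, consumer group, and topic are placeholders, not real GCN endpoints.
    import json
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "broker.example.org:9092",   # placeholder broker
        "group.id": "supernova-watchers",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["example.supernova.alerts"])       # placeholder topic

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None or msg.error():
                continue
            alert = json.loads(msg.value())
            # e.g. hand the alert to downstream ELT or telescope-scheduling logic
            print(alert.get("event_id"), alert.get("ra"), alert.get("dec"))
    finally:
        consumer.close()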

The speaker will kick off the session by contrasting various ways of ingesting and transforming the data, discussing their trade-offs: Should you use a declarative data pipeline, or can a data analyst manage with SQL only? Alternatively, when would it be better to follow the classic approach of orchestrating Spark notebooks to get the data ingested?

He will answer the question: Does a data engineer working with streaming data benefit from generative AI-based tools and assistants today? Is it worth it, or is it just hype?

The demo is easy to replicate at home, and Frank will share the notebooks in a GitHub repository so you can analyze real NASA data yourself!

This session is ideal for data engineers, data architects who enjoy some coding, generative AI enthusiasts, or anyone fascinated by technology and the sparkling stars in the night sky.

While the focus is clearly on tech, the demo will run on the open-source and open-standards-based Databricks Intelligence Platform (so inevitably, you'll get a high-level overview here too).

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨


Sonal Goyal: Open Source Entity Resolution - Needs and Challenges

🌟 Session Overview 🌟

Session Name: Open Source Entity Resolution - Needs and Challenges
Speaker: Sonal Goyal
Session Description: Real-world data contains multiple records belonging to the same customer. These records can be in single or multiple systems and have variations across fields, which makes it hard to combine them, especially with growing data volumes. This hurts customer analytics - establishing lifetime value, loyalty programs, or marketing channels is impossible when the base data is not linked. No AI algorithm for segmentation can produce the right results when there are multiple copies of the same customer lurking in the data. No warehouse can live up to its promise if the dimension tables have duplicates.

With a modern data stack and DataOps, we have established patterns for the E and L in ELT for building data warehouses, data lakes, and delta lakes. However, the T - getting data ready for analytics - still needs a lot of effort. Modern tools like dbt are actively and successfully addressing this. What is also needed is a quick and scalable way to resolve entities, building a single source of truth for core business entities after extraction and before or after loading.

This session will cover the problem of entity resolution, its practical applications, and the challenges in building an entity resolution system. It will also cover Zingg, an open source framework for building entity resolution systems (https://github.com/zinggAI/zingg/).
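
To make the matching problem concrete, here is a hedged, deliberately simple Python sketch of pairwise record matching with blocking and fuzzy string similarity. Real systems such as Zingg learn match rules from labeled pairs and scale on Spark, so treat this only as an illustration of the core idea, with made-up fields and thresholds.

    # Toy entity-resolution sketch: block on a cheap key, then fuzzy-match within blocks.
    # Field names and the threshold are illustrative; production systems (e.g. Zingg)
    # learn match rules from labeled pairs instead of hard-coding them.
    from difflib import SequenceMatcher
    from collections import defaultdict
    from itertools import combinations

    records = [
        {"id": 1, "name": "Jon Smith",  "email": "jon.smith@example.com"},
        {"id": 2, "name": "John Smith", "email": "jsmith@example.com"},
        {"id": 3, "name": "Ana Gomez",  "email": "ana.gomez@example.com"},
    ]

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    # Blocking: only compare records that share a cheap key (first letter of surname here).
    blocks = defaultdict(list)
    for r in records:
        blocks[r["name"].split()[-1][0].lower()].append(r)

    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = 0.7 * similarity(a["name"], b["name"]) + 0.3 * similarity(a["email"], b["email"])
            if score > 0.75:                      # illustrative threshold
                matches.append((a["id"], b["id"], round(score, 2)))

    print(matches)   # [(1, 2, 0.94)] — records 1 and 2 look like the same customer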
