talk-data.com

Topic

Data Streaming

realtime event_processing data_flow

739 tagged

Activity Trend: peak of 70 per quarter, 2020-Q1 to 2026-Q1

Activities

739 activities · Newest first

In this podcast episode, we talked with Adrian Brudaru about the past, present and future of data engineering.

About the speaker: Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted. As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, DLT Hub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like DBT in the ecosystem

🔗 CONNECT WITH ADRIAN BRUDARU
LinkedIn - / data-team
Website - https://adrian.brudaru.com/

🔗 CONNECT WITH DataTalksClub
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
LinkedIn - /datatalks-club
Twitter - /datatalksclub
Website - https://datatalks.club/

Databricks Certified Data Engineer Associate Study Guide

Data engineers proficient in Databricks are currently in high demand. As organizations gather more data than ever before, skilled data engineers on platforms like Databricks become critical to business success. The Databricks Data Engineer Associate certification is proof that you have a complete understanding of the Databricks platform and its capabilities, as well as the essential skills to effectively execute various data engineering tasks on the platform.

In this comprehensive study guide, you will build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse and its tools and benefits. You'll also learn to develop ETL pipelines in both batch and streaming modes. Moreover, you'll discover how to orchestrate data workflows and design dashboards while maintaining data governance. Finally, you'll dive into the finer points of exactly what's on the exam and learn to prepare for it with mock tests.

Author Derar Alhussein teaches you not only the fundamental concepts but also provides hands-on exercises to reinforce your understanding. From setting up your Databricks workspace to deploying production pipelines, each chapter is carefully crafted to equip you with the skills needed to master the Databricks Platform. By the end of this book, you'll know everything you need to ace the Databricks Data Engineer Associate certification exam with flying colors, and start your career as a certified data engineer from Databricks!

You'll learn how to:
• Use the Databricks Platform and Delta Lake effectively
• Perform advanced ETL tasks using Apache Spark SQL
• Design multi-hop architecture to process data incrementally
• Build production pipelines using Delta Live Tables and Databricks Jobs
• Implement data governance using Databricks SQL and Unity Catalog

Derar Alhussein is a senior data engineer with a master's degree in data mining. He has over a decade of hands-on experience in software and data projects, including large-scale projects on Databricks. He currently holds eight certifications from Databricks, showcasing his proficiency in the field. Derar is also an experienced instructor, with a proven track record of success in training thousands of data engineers, helping them to develop their skills and obtain professional certifications.

Supported by Our Partners
• WorkOS: The modern identity platform for B2B SaaS
• CodeRabbit: Cut code review time and bugs in half
• Augment Code: AI coding assistant that pro engineering teams love

How do you architect a live streaming system to deal with more load than has ever been handled before? Today, we hear from an architect of such a system: Ashutosh Agrawal, formerly Chief Architect of JioCinema (and currently Staff Software Engineer at Google DeepMind). We take a deep dive into video streaming architecture, tackling the complexities of live streaming at scale (at tens of millions of parallel streams) and the challenges engineers face in delivering seamless experiences.

We talk about the following topics:
• How large-scale live streaming architectures are designed
• Tradeoffs in optimizing performance
• Early warning signs of streaming failures and how to detect them
• Why capacity planning for streaming is SO difficult
• The technical hurdles of streaming in APAC regions
• Why Ashutosh hates APMs (Application Performance Management systems)
• Ashutosh’s advice for those looking to improve their systems design expertise
• And much more!

Timestamps
(00:00) Intro
(01:28) The world record-breaking live stream and how support works with live events
(05:57) An overview of streaming architecture
(21:48) The differences between internet streaming and traditional television
(22:26) How adaptive bitrate streaming works
(25:30) How throttling works on the mobile tower side
(27:46) Leading indicators of streaming problems and the data visualization needed
(31:03) How metrics are set
(33:38) Best practices for capacity planning
(35:50) Which resources are planned for in capacity planning
(37:10) How streaming services plan for future live events with vendors
(41:01) APAC specific challenges
(44:48) Horizontal scaling vs. vertical scaling
(46:10) Why auto-scaling doesn’t work
(47:30) Concurrency: the golden metric to scale against
(48:17) User journeys that cause problems
(49:59) Recommendations for learning more about video streaming
(51:11) How Ashutosh learned on the job
(55:21) Advice for engineers who would like to get better at systems
(1:00:10) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:
• Software architect archetypes https://newsletter.pragmaticengineer.com/p/software-architect-archetypes
• Engineering leadership skill set overlaps https://newsletter.pragmaticengineer.com/p/engineering-leadership-skillset-overlaps
• Software architecture with Grady Booch https://newsletter.pragmaticengineer.com/p/software-architecture-with-grady-booch

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Episode Description

Ever feel like your phone knows you a little too well? One Google search, and suddenly, ads follow you across the internet like a digital stalker. AI-powered personalization has long relied on collecting massive amounts of personal data, but what if it didn’t have to? In this episode of Data & AI with Mukundan, we explore a game-changing shift in AI: personalized experiences without intrusive tracking. Two groundbreaking techniques, Sequential Layer Expansion and FedSelect, are reshaping how AI learns from users while keeping their data private.

We’ll break down:
✅ Why AI personalization has been broken until now
✅ How these new models improve AI recommendations without privacy risks
✅ Real-world applications in streaming, e-commerce, and healthcare
✅ How AI can respect human identity while scaling globally

The future of AI is personal, but it doesn’t have to be invasive. Tune in to discover how AI can work for you without spying on you.

Key Takeaways

🔹 The Problem: Why AI Personalization Has Been Broken
• Streaming services, e-commerce, and healthcare AI often make irrelevant or generic recommendations.
• Most AI models collect massive amounts of user data, stored on centralized servers, risking leaks, breaches, and misuse.
• AI personalization has been a “one-size-fits-all” approach that doesn’t truly adapt to individual needs.

🔹 The Solution: AI That Learns Without Spying on You

✨ Sequential Layer Expansion – AI that grows with you
• Instead of static AI models, this method builds in layers, adapting over time.
• It learns only what’s relevant to you, reducing unnecessary data collection.
• Think of it like training for a marathon: starting small and progressively improving.

✨ FedSelect – AI that fine-tunes only what matters
• Instead of changing an entire AI model, it selectively updates the most relevant parameters.
• Think of it like tuning a car: you upgrade what’s needed instead of replacing the whole engine.
• Everything happens locally on your device, meaning your raw data never leaves.

🔹 Real-World Impact: How This Changes AI for You
🎬 Streaming Services – Netflix finally gets your taste right, without tracking you across the web.
🛍️ E-commerce – Shopping apps suggest what you actually need, not random trending items.
🏥 Healthcare – AI-powered health plans tailored to your genes and habits, without sharing your medical data.

🔹 The Bigger Picture: Why This Matters for the Future of AI
• Personalized AI at scale: AI adapts to billions of users while remaining privacy-first.
• AI that respects human identity: You control your AI, not the other way around.
• The end of surveillance-style tracking: No more creepy ads following you around.

🌟 AI can be personal without being invasive. That’s the future we should all demand.

FedSelect: https://arxiv.org/abs/2404.02478 | Sequential Layer Expansion: https://arxiv.org/abs/2404.17799

🔔 Subscribe, rate, and review for more AI insights!

A look inside at the data work happening at a company making some of the most advanced technologies in the industry. Rahul Jain, data engineering manager at Snowflake, joins Tristan to discuss Iceberg, streaming, and all things Snowflake.  For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Bobur Umurzokov: Shaping the Future of Real-time Data Pipeline

🌟 Session Overview 🌟

Session Name: Shaping the Future of Real-time Data Pipeline
Speaker: Bobur Umurzokov
Session Description: The rise of real-time data processing has transformed business operations, yet navigating its technical challenges remains complex. Organizations often wrestle with managing distinct batch and streaming data workflows, each presenting unique difficulties. Batch processing, while effective for large datasets, can be costly, slow, and not well-suited for streaming API integration. On the other hand, streaming, despite its speed and low latency, often has restricted functionality.

This talk is prepared for developers, data engineers, and tech visionaries eager to explore how to build an efficient, dynamic, and unified data pipeline for both scenarios using streaming platforms in Python. You will see, with examples, how simple it is to make your batch code run in streaming with serverless infrastructure from day one.
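As a rough illustration of the idea of running the same code in batch and in streaming, here is a minimal Python sketch. It is not the platform or API shown in the talk; the names `enrich`, `batch_source`, and `stream_source` and the event fields are hypothetical. The transformation is written once against a plain iterable, so it can consume a finite list (batch) or a generator that yields events one at a time (a stand-in for a stream).

```python
from typing import Dict, Iterable, Iterator, List


def enrich(events: Iterable[Dict]) -> Iterator[Dict]:
    """Transformation written once; it only assumes it can iterate over events."""
    for event in events:
        if event["amount"] <= 0:
            continue  # drop invalid records
        # Add a derived field without mutating the input event.
        yield {**event, "amount_cents": round(event["amount"] * 100)}


def batch_source() -> List[Dict]:
    """Hypothetical batch input: the whole dataset is available up front."""
    return [
        {"id": 1, "amount": 2.5},
        {"id": 2, "amount": -1.0},
        {"id": 3, "amount": 0.99},
    ]


def stream_source() -> Iterator[Dict]:
    """Hypothetical streaming input: events arrive one at a time."""
    for event in batch_source():
        yield event


# The same function handles both inputs.
batch_result = list(enrich(batch_source()))
stream_result = list(enrich(stream_source()))
```

In a real streaming platform the generator would be replaced by a connector to a message broker, and the results would be emitted incrementally rather than collected into a list.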

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀 🔗 Find Other Years' Videos: 2023 Big Data Conference Europe https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g 2022 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT 2021 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/ 👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/ 🐦 Twitter: @BigDataConfEU, @europe_rpa 🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/ 🎥 YouTube: http://www.youtube.com/@DATAMINERLT

Alexey Novakov: Streamhouse Architecture with Flink and Paimon

🌟 Session Overview 🌟

Session Name: Streamhouse Architecture with Flink and Paimon
Speaker: Alexey Novakov
Session Description: Today, many data teams choose lakehouse architecture for their data platforms. But what if they process all data in streaming mode? Then they end up building a streaming lakehouse, or 'streamhouse' for short! This means they use stream processing engines to ingest, transform, and analyze business data in near real-time. However, they still want to use inexpensive storage infrastructure. How can they achieve that?

This talk introduces data teams to tools like Apache Paimon in combination with Flink. Paimon has been built with a strong focus on streaming workflows, serving as a table format in a lakehouse. It takes the stream processing approach in lakehouse architecture to the next level compared to other table formats that are more oriented towards batch data. After this talk, data teams will know how to use Paimon and Flink to build a cost-efficient and fast data layer for different data processing scenarios.

Takahiko Saito: Empowering Real-Time ML Inference and Training with GRIS

🌟 Session Overview 🌟

Session Name: Empowering Real-Time ML Inference and Training with GRIS: A Deep Dive into High Availability and Low Latency Data Solutions
Speaker: Takahiko Saito
Session Description: In the rapidly evolving landscape of machine learning (ML) and data processing, the need for real-time data delivery systems that offer high availability, low latency, and robust service level agreements (SLAs) has never been more critical. This session introduces GRIS (Generic Real-time Inference Service), a cutting-edge platform designed to meet these demands head-on, facilitating real-time ML inference and historical data processing for ML model training.

Attendees will gain insights into GRIS's capabilities, including its support for real-time data delivery for ML inference, products requiring high availability, low latency, and strong SLA adherence, and real-time product performance monitoring. We will explore how GRIS prioritizes use cases off the Netflix critical path, such as choosing, playback, and sign-up processes, while ensuring data delivery for critical real-time monitoring tasks like anomaly detection during product launches and live events.

The session will delve into the key design decisions and challenges faced during the MVP release of GRIS, highlighting its low latency, high availability gRPC API for inference, and the use of Granular Historical Dataset via Iceberg for training. We will discuss the MVP metrics, including feature groups, categories, and aggregation windows, and how these elements contribute to the platform's effectiveness in real-time data processing.
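To make the term "aggregation window" concrete, here is a generic Python sketch of a tumbling-window sum per metric key. It does not reflect how GRIS is implemented, which the description does not detail; the function name `tumbling_window_sums`, the event tuple layout, and the sample metric names are all assumptions for illustration.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_seconds):
    """Sum values per (window start, key).

    events: iterable of (timestamp_seconds, key, value) tuples.
    Each event falls into exactly one fixed-size, non-overlapping window.
    """
    sums = defaultdict(float)
    for ts, key, value in events:
        # Align the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        sums[(window_start, key)] += value
    return dict(sums)


# Hypothetical events: (timestamp in seconds, metric key, value)
events = [
    (0, "plays", 1.0),
    (30, "plays", 2.0),
    (61, "plays", 4.0),
    (65, "signups", 1.0),
]
result = tumbling_window_sums(events, 60)
```

A streaming job would typically keep only the open windows in state and emit each window once it closes, instead of holding all results in one dictionary.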

Furthermore, we will cover the production readiness of GRIS, including streaming jobs, on-call alerts, and data quality measures. The session will provide a comprehensive overview of the MVP data quality framework for GRIS, including online and offline checks, and how these measures ensure the integrity and consistency of data processed by the platform.

Looking ahead, the roadmap for GRIS will be presented, outlining the journey from POC to GA, including the introduction of processor metrics, event-level transaction history, and the next batch of metrics for advanced aggregation types. We will also discuss the potential for a user-facing metrics definition API/DSL and how GRIS is poised to enable new use cases for teams across various domains.

This session is a must-attend for data scientists, ML engineers, and technology leaders looking to stay at the forefront of real-time data processing and ML model training. Whether you're interested in the technical underpinnings of GRIS or its application in real-world scenarios, this session will provide valuable insights into how high availability, low latency data solutions are shaping the future of ML and data analytics.

Frank Munz: A Journey in Space with Apache Kafka data streams from NASA

🌟 Session Overview 🌟

Session Name: Supernovas, Black Holes, and Streaming Data: A Journey in Space with Apache Kafka data streams from NASA
Speaker: Frank Munz
Session Description: In this fun, hands-on, and in-depth How-To, we explore NASA's GCN project, which publishes various events in space as Kafka topics.

The focus of my talk is on end-to-end data engineering, from consuming the data and ELT-ing the stream, to using generative AI tools for analytics.

We will analyze GCN data in real time, specifically targeting the data stream from exploding supernovas. This data triggers dozens of terrestrial telescopes to potentially reposition and point toward the event.

The speaker will kick off the session by contrasting various ways of ingesting and transforming the data, discussing their trade-offs: Should you use a declarative data pipeline, or can a data analyst manage with SQL only? Alternatively, when would it be better to follow the classic approach of orchestrating Spark notebooks to get the data ingested?

He will answer the question: Does a data engineer working with streaming data benefit from generative AI-based tools and assistants today? Is it worth it, or is it just hype?

The demo is easy to replicate at home, and Frank will share the notebooks in a GitHub repository so you can analyze real NASA data yourself!

This session is ideal for data engineers, data architects who enjoy some coding, generative AI enthusiasts, or anyone fascinated by technology and the sparkling stars in the night sky.

While the focus is clearly on tech, the demo will run on the open-source and open-standards-based Databricks Intelligence Platform (so inevitably, you'll get a high-level overview here too).

Natalija Tumanova: Building a Unified Viewing Experience with Multi-Provider TV Recommendation

🌟 Session Overview 🌟

Session Name: Building a Unified Viewing Experience with Multi-Provider TV Recommendation Engine
Speaker: Natalija Tumanova
Session Description: In today’s entertainment landscape, users expect seamless access to their favorite shows, whether from streaming platforms or live TV, across languages and regions. Telia Lithuania provides a TV service offering content from three different platforms, nearly a hundred multilingual linear channels, and a library of films and series for rent.

This presentation will take you on a journey through the design and development of a TV recommendation engine that integrates content from these multiple sources, overcoming the associated challenges across streams, formats, and languages. Starting from the initial prototype, we’ll explore how we tackled complexities such as metadata harmonization and the scarcity of linear TV attributes. As the project evolved into a more advanced AI-driven system, we incorporated machine learning techniques and leveraged large language models (LLMs) for specific subtasks.

Jan Mensch: Insights into Your Cloud Database: How Storage Engines Actually Work

🌟 Session Overview 🌟

Session Name: Insights into Your Cloud Database: How Storage Engines Actually Work
Speaker: Jan Mensch
Session Description: In this session, we will dive into the inner workings of cloud storage engines by exploring Hummock, the storage engine behind RisingWave, a streaming database. We will cover how data writes occur in Hummock, focusing on the crucial role of MemTables in managing data before persistence. You will gain an understanding of Log-Structured Merge (LSM) trees and their importance in optimizing both read and write performance. Additionally, we will explore the function of L0 sublevels in accelerating the compaction process. We’ll discuss Sorted String Tables (SSTs), including how they organize data, their versioning, and how this versioning connects to distributed snapshots in streaming systems. Furthermore, we will examine the necessity of compaction and how it represents a trade-off between read and write amplification. By the end of the session, you will gain valuable insights into the mechanics of LSM storage engines and their role in powering streaming databases.
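
The write path this abstract describes (MemTable, flush to sorted tables, compaction) can be illustrated with a toy in-memory LSM tree. This is a minimal sketch and not Hummock's implementation: the class `TinyLSM`, its `memtable_limit` parameter, and the single-step "merge everything" compaction are assumptions made purely for illustration.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs (SSTs)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # newest writes, not yet sorted
        self.ssts = []                        # sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # "Persisting" a memtable here means freezing it as one sorted run.
        if self.memtable:
            self.ssts.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then each run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.ssts:
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for run in reversed(self.ssts):  # oldest first, newer runs overwrite
            merged.update(run)
        self.ssts = [sorted(merged.items())] if merged else []
```

Before `compact()` a lookup may have to search several runs (read amplification); compaction reduces that to one run at the cost of rewriting data that was already stored (write amplification), which is the trade-off the session refers to.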

Raghav Matta: Leveraging Azure PaaS for Real-time Social Media Analysis

🌟 Session Overview 🌟

Session Name: Leveraging Azure PaaS for Real-time Social Media Analysis by Building Streaming Dashboard
Speaker: Raghav Matta
Session Description: In this session, Raghav and Sundar will delve into a practical business scenario focusing on real-time social media analysis using Azure PaaS offerings.

  1. They will begin by addressing a prevalent business challenge concerning social media sentiment analysis.

  2. Next, the speakers will explore a range of Azure services, including Azure Functions, Logic Apps, Cognitive Services, Stream Analytics, Power BI, and Azure Databricks.

  3. Moving forward, they will demonstrate how to gather live data in real time using the Azure Cognitive Services Bing Web Search API. They will then analyze the data with Azure Stream Analytics and visualize the insights in Power BI.
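
The analysis step in the last item usually comes down to aggregating scored events over fixed time windows. The following is a plain-Python sketch of a tumbling-window average, not Azure Stream Analytics code; the event shape `(timestamp_seconds, sentiment_score)` and the function name `tumbling_average` are assumptions for illustration.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds=60):
    """Average sentiment per fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, sentiment_score) pairs (assumed shape).
    Returns a dict mapping each window's start time to its mean score.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for ts, score in events:
        # Every timestamp falls into exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        totals[window_start][0] += score
        totals[window_start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}
```

Azure Stream Analytics expresses this kind of grouping in its SQL-like query language through windowing functions such as `TumblingWindow`, so a real job would not need hand-written bucketing like this.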

This course combines hands-on labs with a theoretical curriculum aligned with the 'Exam AI-102: Designing and Implementing a Microsoft Azure AI Solution'.

For further information and resources, please refer to:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-twitter-sentiment-analysis-trends
https://microsoftlearning.github.io/AI-102-AIEngineer/Instructions/05-analyze-text.html

AWS re:Invent 2024 - Build large-scale transactional data lakes with open table formats (ANT336)

Transform your data landscape by building large-scale transactional data lakes using open table formats (OTFs) with AWS analytics services. The rise of generative AI and ML demands robust and scalable data infrastructure, and OTFs offer a cutting-edge solution for modern data architectures. Learn best practices for operating tables at scale, focusing on high performance, cost optimization, and operational excellence. This session also covers streaming data challenges, showcasing how OTFs enable seamless schema evolution and strong reliability for streaming workloads.

Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2024

AWS re:Invent 2024 - What’s new: Data streaming on AWS (ANT327)

Learn how AWS is reimagining data streaming with end-to-end managed and serverless capabilities across core infrastructure, systems operations, data integration, data processing, and data management for customers to modernize their data platforms. Learn about new and recent innovations for collecting, processing, and analyzing streaming data, including improved scalability, high resiliency, lower latency, and native integrations with many AWS and third-party services. Join this session to see how you can use AWS streaming solutions to build scalable, resilient data streaming applications for faster insights and improved decision-making.

AWS re:Invent 2024 - Accelerate value from data: Migrating from batch to stream processing (ANT324)

Growing business demand for real-time insights in conventional use cases is pushing data transformation from batch processing toward streaming. From gaming to clickstream to generative AI use cases, batch analytical workloads today demand high throughput, low latency, and simplified ingestion mechanisms for real-time insights and visualizations. Join this session to hear from experts on how to successfully migrate from batch to stream processing using AWS streaming services that provide scalable integrations and real-time capabilities across services such as Amazon Redshift for real-time data warehousing analytics and ELT pipelines.
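
To make the batch-versus-stream distinction concrete, the sketch below computes the same per-key count two ways: once over a complete dataset, and once incrementally as each event arrives. It is a plain-Python illustration with hypothetical names (`batch_counts`, `StreamingCounter`) and an assumed `(key, payload)` event shape; it does not use any AWS service API.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: wait for the full dataset, then compute once."""
    return Counter(key for key, _ in events)


class StreamingCounter:
    """Streaming style: update state per event, so results are always current."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, key, payload):
        # payload is ignored in this sketch; only the key is counted.
        self.counts[key] += 1
        return self.counts[key]  # the latest value is available immediately
```

Both approaches converge on the same totals; the difference is that the streaming version exposes an up-to-date answer after every event instead of only at the end of a batch run.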

AWS re:Invent 2024 - A practitioner’s guide to data for generative AI (DAT319)

In this session, gain the skills needed to deploy end-to-end generative AI applications using your most valuable data. While this session focuses on the Retrieval Augmented Generation (RAG) process, the concepts also apply to other methods of customizing generative AI applications. Discover best practice architectures using AWS database services like Amazon Aurora, Amazon OpenSearch Service, or Amazon MemoryDB along with data processing services like AWS Glue and streaming data services like Amazon Kinesis. Learn data lake, governance, and data quality concepts and how Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and other features tie solution components together.
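
The Retrieval Augmented Generation flow mentioned above has two steps: retrieve the most relevant documents for a question, then pass them to the model as context. The sketch below shows that shape in plain Python under heavy simplifications: `embed` is a toy word-count stand-in for a real embedding model, and all function names are hypothetical. It does not call Amazon Bedrock or any other AWS service.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, documents, k=1):
    """Augment the question with retrieved context before calling a model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In a production setup the document vectors would typically be precomputed and stored in a vector-capable database rather than re-embedded on every query, which is where the database services listed in the abstract come in.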

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. In this episode, we’re joined by a special guest: Alex Gallego, founder and CEO of Redpanda. Together, we dive deep into building data-intensive applications, the evolution of streaming technologies, and balancing high-throughput and low-latency demands.

Key topics covered:
What is Redpanda and why it matters: Redpanda’s mission to redefine data streaming while being the fastest Kafka-compatible option on the market.
Batch vs. streaming data: An accessible guide to understanding the classic debate and how the tech landscape is shifting towards unified data frameworks.
Scaling at speed: The challenges and innovations driving Redpanda’s performance optimizations, from zero-copy architecture to storage engines.
AI, ML, and streaming data integration: How Redpanda empowers real-time machine learning and AI-powered workloads with ease.
Open source vs. enterprise models: Navigating licensing challenges and balancing business goals in the hybrid cloud era.
Leadership and career shifts: Alex’s reflections on moving from technical lead to CEO, blending engineering know-how with company vision.