talk-data.com

Event

Data Universe 2024

2024-04-10 – 2024-04-11 Big Data LDN/Paris

Activities tracked

5

Filtering by: Data Engineering

Sessions & talks

Showing 1–5 of 5 · Newest first

Building Telemetry Curations and Effective Data Pipelines

2024-04-11
Face To Face

Have you ever wondered how a data company does data? In this session, Isaac Obezo, Staff Data Engineer at Starburst, will take you for a peek behind the curtain into Starburst’s own data architecture built to support batch processing of telemetry data within Galaxy data pipelines. Isaac will walk you through our architecture utilizing tools like git, dbt, and Starburst Galaxy to create a CI/CD process allowing our data engineering team to iterate quickly to deploy new models, develop and land data, and create and improve existing models in the data lake. Isaac will also discuss Starburst’s mentality toward data quality, the use of data products, and the process toward delivering quality analytics.

Data Engineering: The Secret Sauce for Supercharging ML Models

2024-04-11
Face To Face

Discover the hidden power of feature engineering in revolutionizing machine learning performance. This talk explores how crafting informative features transforms model outcomes, offering practical techniques and real-world examples. From understanding data intricacies to optimizing model efficacy, learn why feature engineering is the ultimate key to enhancing machine learning success. 

Bottom 10 Neglected Data Engineering Tasks

2024-04-10
Face To Face

Most IT organizations face a constant balance between delivering approved projects (the top-of-mind, important tasks that management wants to launch) and fixing urgent problems (the ones that break systems in unexpected ways). But there's a third bucket of issues: the long-languishing, forgotten, often boring tasks that turn into technical debt.

Take a step back from the Top Ten lists and join Saks' Veronika Durgin as she digs through the Bottom Ten: neglected data engineering tasks that will come back to haunt you. This "forgotten bucket" can always be deferred, but the longer you wait, the more time you'll spend on unplanned activities. There are a variety of lenses through which you can look at it to better understand its impact on the organization, including the hidden costs of build-versus-buy, the need for a single definition of "done", identifying unexpected business dependencies, finding real data to conduct meaningful tests, and the environmental impact of your data.

Data Mesh: How to Supercharge Cross-Company Collaboration & Operational Efficiency

2024-04-10
Face To Face

The data mesh framework, first introduced in 2019, provides a more dexterous and valuable approach to data management by increasing accessibility for teams, partners, and other stakeholders. In this session, Annalect’s Chief Technology Officer, Anna Nicanorova, and Director of Data Engineering, Santhosh Swaminathan, will share how their organization, the data and analytics division of Omnicom Group, was able to simplify the implementation of data mesh and unlock numerous benefits, namely the ability to facilitate seamless collaboration and drive greater operational efficiency.

Under the Hood: Data Engineering behind Industrial Grade GenAI

2024-04-10
Face To Face

GenAI can look deceptively easy when it comes to showing a cool demo, but can prove incredibly hard to productionize. This session will cover the challenges behind industrializing GenAI applications in the enterprise, and the approaches engineers are taking to meet these challenges. Attendees will get to take a look under the hood to see how data engineering and integration techniques can help us go from simple demos to production-grade applications with consistently high-quality results.

We will explore how Retrieval-Augmented Generation (RAG) workflows go from naive to advanced. Techniques discussed will cover a typical GenAI application flow, with topics including multiple and hybrid models, refined data processing, data security, getting transparency in results, combining structured and unstructured data, and putting it all together to get high-performance, cost-effective outcomes. Attendees will leave the session with a framework to understand proposed solutions from their teams and ask the right questions to test whether a solution can become industrial-grade.