talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

287

Filtering by: AI/ML

Sessions & talks

Showing 251–275 of 287 · Newest first

How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins

2022-07-19
video

Robinhood’s data lake is the bedrock foundation that powers business analytics, product experimentation, and other machine learning applications throughout our organization. Come join this session where we will share our journey of building a scalable streaming data lakehouse with Spark, Postgres and other leading open source technologies.

We will lay out our architecture in depth and describe how we perform change data capture (CDC) streaming ingestion and incremental processing of thousands of Postgres tables into our data lake.
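As a rough illustration of the incremental processing idea in this abstract, the sketch below applies a batch of change events to an in-memory table snapshot. The event shape (`lsn`, `op`, `key`, `row`) and the names `TableState` and `apply_changes` are assumptions made for this example; they are not Robinhood's schema or code, and a real lakehouse would write to table storage rather than a dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class TableState:
    """Hypothetical snapshot of one replicated table."""
    rows: Dict[Any, dict] = field(default_factory=dict)  # primary key -> row
    last_lsn: int = 0  # highest log position already applied


def apply_changes(state: TableState, events: Iterable[dict]) -> TableState:
    """Apply change events in log order, skipping ones already applied."""
    for event in sorted(events, key=lambda e: e["lsn"]):
        if event["lsn"] <= state.last_lsn:
            continue  # replayed event: applying it again would be wrong
        if event["op"] == "delete":
            state.rows.pop(event["key"], None)
        else:  # "insert" and "update" are both treated as an upsert
            state.rows[event["key"]] = event["row"]
        state.last_lsn = event["lsn"]
    return state


# Example batch, deliberately out of order.
batch = [
    {"lsn": 2, "op": "update", "key": 1, "row": {"id": 1, "balance": 20}},
    {"lsn": 1, "op": "insert", "key": 1, "row": {"id": 1, "balance": 10}},
    {"lsn": 3, "op": "insert", "key": 2, "row": {"id": 2, "balance": 5}},
    {"lsn": 4, "op": "delete", "key": 2, "row": None},
]
state = apply_changes(TableState(), batch)
```

Tracking the last applied log position is what makes the step incremental: a re-delivered batch leaves the snapshot unchanged instead of reapplying old values.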

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Lessons Learned Running RL Recommendation at Scale in Physical Retail Setting at Starbucks

Lessons Learned Running RL Recommendation at Scale in Physical Retail Setting at Starbucks

2022-07-19 Watch
video

Quick-service restaurants (QSR) are shifting from static boards to dynamic, contextualized recommendations. The brain behind the system connects the Starbucks brand and culture with state-of-the-art AI techniques. This talk reviews some of the tactics and lessons learned from running an RL algorithm and deep item collaborative filtering in production for over a year.
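The abstract does not say which RL algorithm is used. As a generic illustration of learning recommendations from feedback, here is a minimal epsilon-greedy bandit sketch. The class name, item names, and reward values are all hypothetical, and unlike the system described in the talk this sketch ignores context entirely.

```python
import random
from typing import Dict, List, Optional


class EpsilonGreedyRecommender:
    """Toy bandit: mostly recommend the best-known item, sometimes explore."""

    def __init__(self, items: List[str], epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {item: 0 for item in self.items}
        self.values: Dict[str, float] = {item: 0.0 for item in self.items}

    def recommend(self) -> str:
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=lambda item: self.values[item])

    def update(self, item: str, reward: float) -> None:
        # Incremental mean of the rewards observed for this item.
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]


recommender = EpsilonGreedyRecommender(["latte", "mocha"], epsilon=0.0)
recommender.update("latte", 0.0)  # shown, not purchased
recommender.update("mocha", 1.0)  # shown, purchased
best = recommender.recommend()
```

With `epsilon=0.0` the recommender never explores, which keeps the example deterministic; a production setting would keep some exploration so that new items still get shown.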

MLflow Pipelines: Accelerating MLOps from Development to Production

2022-07-19
video

Despite being an emerging topic, MLOps is hard and there are no widely established approaches for MLOps. What makes it even harder is that in many companies the ownership of MLOps usually falls through the cracks between data science teams and production engineering teams. Data scientists are mostly focused on modeling the business problems and reasoning about data, features, and metrics, while the production engineers/ops are mostly focused on traditional DevOps for software development, ignoring ML-specific Ops like ML development cycles, experiment tracking, data/model validation, etc. In this talk, we will introduce MLflow Pipelines, an opinionated approach for MLOps. It provides predefined ML pipeline templates for common ML problems and opinionated development workflows to help data scientists bootstrap ML projects, accelerate model development, and ship production-grade code with little help from production engineers.

Setting up On Shelf Availability Alerts at Scale with Databricks and Azure

2022-07-19
video

Tredence's OSA accelerator is a robust quick-start guide that is the foundation for a full Out of Stock or Supply Chain solution. The OSA solution focuses on driving sales through improved stock availability on the shelves. The following components make up the OSA accelerator:

• Identifying OOS Situation: ML models to identify the out-of-stock scenario in a store at the SKU level, taking into account the level of phantom inventory
• Identifying Off-Sales Behavior: ML models to identify off-sale behavior of a SKU that is attributable to phantom inventory, stock below presentation stock, or improper operations within the store
• Smart Alerts: An alert mechanism for the store manager and merchandizing reps to maintain healthy stock in the store and increase revenue

AI and creativity, and building data products where there's no quantitative metric for success

2022-07-19
video

Analytics Engineering and the Great Convergence | Tristan Handy | Keynote Data + AI Summit 2022

2022-07-19
video

We've come a long way from the way data analysis used to be done. The emergence of the analytics engineering workflow, with dbt at its center, has helped usher in a new era of productivity. Not quite data engineering or data analysis, analytics engineering has enabled new levels of collaboration between two key sets of practitioners.

But that's not the only coming together happening right now. Enabled by the open lakehouse, the worlds of data analysis and AI/ML are also converging under a single roof, hinting at a new future of intertwined workloads and silo-free collaboration. It's a future that's tantalizing, and entirely within reach. Let's talk about making it happen.

AWS Databricks Excitement 2022

2022-07-19
video

Data + AI Summit 2022 was a great opportunity to check in on the partnership between AWS and Databricks!

Data-centric AI development: From Big Data to Good Data | Andrew Ng

2022-07-19
video

Data-centric AI is a growing movement which shifts the engineering focus in AI systems from the model to the data. However, Data-centric AI faces many open challenges, including measuring data quality, data iteration and engineering data as part of the ML project workflow, data management tools, crowdsourcing, data augmentation & data synthesis as well as responsible AI. This talk names the key pillars of Data-centric AI, identifies the trends in Data-centric AI movement, and sets a vision for taking ideas applied intuitively by a handful of experts and synthesizing them into tools that make the application systematic for all.

Day 1 Afternoon Keynote | Data + AI Summit 2022

2022-07-19
video
Eric Sun (Coinbase) , Zaheera Valani (Databricks) , Arsalan Tavakoli (Databricks) , Zhamak Dehghani (Nextdata) , Francois Ajenstat , George Fraser (Fivetran)

Day 1 Afternoon Keynote | Data + AI Summit 2022

• Supercharging our data architecture at Coinbase using Databricks Lakehouse | Eric Sun | Keynote
• Partner Connect & Ecosystem Strategy | Zaheera Valani
• What are ELT and CDC, and why are all the cool kids doing it? | George Fraser
• Analytics without Compromise | Francois Ajenstat
• Fireside Chat with Zhamak Dehghani and Arsalan Tavakoli

Day 1 Morning Keynote | Data + AI Summit 2022

2022-07-19
video
Kerby Johnson , Shant Hovespian , Dave Weinstein (Adobe) , Karthik Ramasamy (Databricks) , Ali Ghodsi (Databricks) , Reynold Xin (Databricks) , Matei Zaharia (Databricks) , Michael Armbrust (Databricks) , Tristan Handy

Day 1 Morning Keynote | Data + AI Summit 2022

• Welcome & "Destination Lakehouse" | Ali Ghodsi
• Apache Spark Community Update | Reynold Xin
• Streaming Lakehouse | Karthik Ramasamy
• Delta Lake | Michael Armbrust
• How Adobe migrated to a unified and open data Lakehouse to deliver personalization at unprecedented scale | Dave Weinstein
• Data Governance and Sharing on Lakehouse | Matei Zaharia
• Analytics Engineering and the Great Convergence | Tristan Handy
• Data Warehousing | Shant Hovespian
• Unlocking the power of data, AI & analytics: Amgen’s journey to the Lakehouse | Kerby Johnson

Get insights on how to launch a successful lakehouse architecture in Rise of the Data Lakehouse by Bill Inmon, the father of the data warehouse. Download the ebook: https://dbricks.co/3ER9Y0K

Financial Services Experience at Data + AI Summit 2022

2022-07-19
video

The future of Financial Services is open with data and AI at its core. Welcome data teams and executives in Financial Services! This year’s Data + AI Summit is jam-packed with talks, demos and discussions on how Financial Services leaders are harnessing the power of data and analytics to digitally transform, minimize risk, accelerate time to market and drive sustainable value creation. To help you take full advantage of the Financial Services industry experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

• Financial Services Industry Forum: Our flagship event for Financial Services attendees at Summit featuring keynotes and panel discussions with ADP, Northwestern Mutual, Point72 Asset Management, S&P Global and EY, followed by networking. More details in the agenda below.
• Financial Services Lounge: Stop by our lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from our partners including Accenture, Avanade, Deloitte and others.
• Session Talks: Over 15 technical talks and demos on topics including hyper-personalization, AI-fueled forecasting, enterprise analytics in cloud, scaling privacy and cybersecurity, MLOps in cryptocurrency, ethical credit scoring and more.

Fireside Chat with Zhamak Dehghani and Arsalan Tavakoli | Keynote Data + AI Summit 2022

2022-07-19
video
Arsalan Tavakoli (Databricks) , Zhamak Dehghani (Nextdata)

Join Zhamak Dehghani, creator of Data Mesh, and Arsalan Tavakoli, co-founder and SVP of Field Engineering at Databricks.

Health Care and Life Sciences Experience at Data + AI Summit 2022

2022-07-19
video

Welcome data teams and executives in the Healthcare and Life Sciences industry! This year’s Data + AI Summit is jam-packed with talks, demos and discussions on the biggest innovations in patient care and drug R&D. To help you take full advantage of the Healthcare and Life Sciences experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

• Healthcare and Life Sciences Industry Forum: Our capstone event for Healthcare and Life Sciences attendees at Summit featuring keynotes and panel discussions with Walgreens, Takeda, Optum, and Humana, followed by networking. More details in the agenda below.
• Healthcare and Life Sciences Lounge: Stop by our industry lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from our partners including ZS Associates, John Snow Labs and others.
• Session Talks: Over 10 technical talks on topics including healthcare NLP, knowledge graphs for R&D, commercial analytics, and predicting hospital readmissions.

Manufacturing Experience at Data + AI Summit 2022

2022-07-19
video

Welcome data teams and executives in the Manufacturing industry! This year’s Data + AI Summit is jam-packed with talks, demos and discussions on the biggest innovations around improving manufacturing operations, building agile supply chains and enabling an AI-driven business. To help you take full advantage of the Manufacturing experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

• Manufacturing Industry Forum: Our capstone event for Manufacturing attendees at Summit featuring keynotes and panel discussions with John Deere, Honeywell and Collins Aerospace, followed by networking. More details in the agenda below.
• Manufacturing Lounge: Stop by our lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from The Global Solution Integrator and Tredence.
• Session Talks: Insightful talks on predicting and preventing machine downtime, real-time process optimization and leveraging informational and operational technology data to make enterprise decisions.

Media and Entertainment Experience at Data + AI Summit 2022

2022-07-19
video

Welcome data teams and executives in Media and Entertainment! This year’s Data + AI Summit is jam-packed with talks, demos and discussions focused on how organizations are using data to personalize, monetize and innovate the audience experience. To help you take full advantage of the Communications, Media & Entertainment experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

• Communications, Media & Entertainment Forum: Our capstone event for the industry at Summit featuring fireside chats and panel discussions with HBO, Warner Bros. Discovery, LaLiga, and Condé Nast, followed by networking. More details in the agenda below.
• Industry Lounge: Stop by our lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from our partners including Cognizant, Fivetran, Labelbox, and Lovelytics.
• Session Talks: Over 10 technical talks on topics including Telecommunication Data Lake Management at AT&T, Data-driven Futbol Analysis from LaLiga, Improving Recommendations with Graph Neural Networks from Condé Nast, Tools for Assisted Spark Version Migrations at Netflix, Real-Time Cost Reduction Monitoring and Alerting with HuuugeGames and much more.

Opening Production Machine Learning | Patrick Wendell MLflow 2.0 | Kasey Uhlenhuth

2022-07-19
video
Kasey Uhlenhuth (Databricks) , Patrick Wendell (Databricks)

Opening Production Machine Learning | Patrick Wendell MLflow 2.0 | Kasey Uhlenhuth | Keynotes Data + AI Summit 2022

Partner Connect & Ecosystem Strategy

2022-07-19
video
Zaheera Valani (Databricks) , Francois Ajenstat , George Fraser (Fivetran)

Data + AI Summit Keynotes from:
• Partner Connect & Ecosystem Strategy (Zaheera Valani)
• What are ELT and CDC, and why are all the cool kids doing it? (George Fraser)
• Analytics without Compromise (Francois Ajenstat)

Powering Geospatial Data Science with Graph Machine Learning

2022-07-19
video

At Iggy we provide easy access to hundreds of geospatial features to help companies make sense of ‘place’. We believe that incorporating ‘place’ into data science and machine learning pipelines can have a huge impact on predictive capabilities in a wide range of fields such as travel, real estate, healthcare, logistics and many more.

Traditionally this data was accessible in a tabular form, but recently we have been experimenting with converting our tabular data into a graph representation, and applying graph machine learning to build derived products. This allows us to leverage the power of graphs and to more effectively model the relationships between different entities in our data.

In this talk we will present:
1. What is geospatial data?
2. Why are we interested in graph representations of geospatial data?
3. What do our graph representations look like?
4. How are we applying graph machine learning?
5. What are some use cases and derived products that we are building using graph machine learning?

Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark

2022-07-19
video

In recent years, new privacy laws and regulations have brought a fundamental shift in the protection of data and privacy, posing new challenges for data applications. To resolve these privacy and security challenges in the big data ecosystem without impacting existing applications, several hardware TEE (Trusted Execution Environment) solutions have been proposed for Apache Spark, e.g., PySpark with Scone, and Opaque. However, to the best of our knowledge, none of them provides full protection for data pipelines in Spark applications. An adversary may still get sensitive information from unprotected components and stages. Furthermore, some of them greatly narrow the range of supported applications, e.g., by supporting only SparkSQL. In this presentation, we will present a new PPMLA (privacy preserving machine learning and analytics) solution built on top of Apache Spark, BigDL, Occlum and Intel SGX. It ensures all Spark components and pipelines are fully protected by Intel SGX, and existing Spark applications written in Scala, Java or Python can be migrated to our platform without any code change. We will demonstrate how to build distributed end-to-end SparkML/SparkSQL workloads with our solution in an untrusted cloud environment and share real-world use cases for PPMLA.

Revolutionizing agriculture with AI: Delivering smart industrial solutions built upon a Lakehouse

2022-07-19
video

John Deere is leveraging big data and AI to deliver ‘smart’ industrial solutions that are revolutionizing agriculture and construction, driving sustainability and ultimately helping to feed the world. The John Deere Data Factory that is built upon the Databricks Lakehouse Platform is at the core of this innovation. It ingests petabytes of data and trillions of records to give data teams fast, reliable access to standardized data sets supporting 100s of ML and analytics use cases across the organization. From IoT sensor-enabled equipment driving proactive alerts that prevent failures, to precision agriculture that maximizes field output, to optimizing operations in the supply chain, finance and marketing, John Deere is providing advanced products, technology and services for customers who cultivate, harvest, transform, enrich, and build upon the land.

ROAPI: Serve Not So Big Data Pipeline Outputs Online with Modern APIs

2022-07-19
video

Data is the key component of any analytics, AI or ML platform. Organizations may not be successful without a platform that can source, transform and quality-check data and present it in a reportable format that can drive actionable insights.

This session will focus on how the Capital One HR team built a low-cost data movement ecosystem that can source data, transform it at scale and build the data storage (Redshift) at a level that can be easily consumed by AI/ML programs, using AWS services in combination with open source software (Spark) and Enterprise Edition Hydrograph (a UI-based ETL tool with Spark as its backend). This presentation mainly demonstrates the flexibility that Apache Spark provides for various types of ETL data pipelines when we code in Spark.

We have been running three types of pipelines for 6+ years, with 400+ nightly batch jobs for $1000/mo: (1) Spark on EC2, (2) a UI-based ETL tool with a Spark backend (on the same EC2), and (3) Spark on EMR. We have a CI/CD pipeline that supports easy integration and code deployment in all non-prod and prod regions (it even supports automated unit testing). We will also demonstrate how this ecosystem can fail over to a different region in less than 15 minutes, making our application highly resilient.

Simplify Global DataOps and MLOps Using Okta’s FIG Automation Library

2022-07-19
video

Think for a moment about an ML pipeline that you have created. Was it tedious to write? Did you have to familiarize yourself with technology outside your normal domain? Did you find many bugs? Did you give up with a “good enough” solution? Even simple ML pipelines are tedious, and with complex ML pipelines, even teams that include Data Engineers and ML Engineers still end up with delays and bugs. Okta’s FIG (Feature Infrastructure Generator) simplifies this with a configuration language for Data Scientists that produces scalable and correct ML pipelines, even highly complex ones. FIG is “just a library” in the sense that you can pip install it. Once installed, FIG will configure your AWS account, creating ETL jobs, workflows, and ML training and scoring jobs. Data Scientists then use FIG’s configuration language to specify features and model integrations. With a single function call, FIG will run an ML pipeline to generate feature data, train models, and create scoring data. Feature generation is performed in a scalable, efficient, and temporally correct manner. Model training artifacts and scoring are automatically labeled and traced. This greatly simplifies the ML prototyping experience. Once it is time to productionize a model, FIG is able to use the same configuration to coordinate with Okta’s deployment infrastructure to configure production AWS accounts, register build and model artifacts, and set up monitoring. This talk will show a demo of using FIG in the development of Okta’s next generation security infrastructure. The demo includes a walkthrough of the configuration language and how that is translated into AWS during a prototyping session. The demo will also briefly cover how FIG interacts with Okta’s deployment system to make productionization seamless.
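The abstract does not define "temporally correct". One common reading, assumed here, is that a feature value attached to a training example must come from data available at or before that example's timestamp, so no future information leaks in. The sketch below shows that lookup in plain Python; `point_in_time_lookup` and the sample data are hypothetical and are not part of FIG.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def point_in_time_lookup(history: List[Tuple[int, float]],
                         as_of: int) -> Optional[float]:
    """Return the latest feature value with timestamp <= as_of.

    `history` must be sorted by timestamp. Returns None when no value
    had been observed yet at `as_of`.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)  # number of entries with ts <= as_of
    return history[idx - 1][1] if idx else None


# Hypothetical feature history: (timestamp, login_count_7d)
login_history = [(10, 1.0), (20, 2.0), (30, 3.0)]

# A label observed at t=25 may only see the value recorded at t=20.
feature_at_25 = point_in_time_lookup(login_history, 25)
```

Using the most recent value overall (3.0 here) would leak information from t=30 into a training example observed at t=25.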

Supercharging our data architecture at Coinbase using Databricks Lakehouse | Eric Sun

2022-07-19
video
Eric Sun (Coinbase)

Coinbase is neither simply a finance company nor a tech company — it’s a crypto company. This distinction has big implications for how we work with the Blockchain, Product and Financial data that we need to drive our hypergrowth. We’ve recently enabled a Lakehouse architecture based upon Databricks to unify these complex and varied data sets, to deliver a high performance, continuous ingestion framework at an unprecedented scale. We can now support both ETL and ML workloads on one platform to deliver innovative batch and streaming use cases, and democratize data much faster by enabling teams to use the tools of their choice, while greatly reducing end-to-end latency and simplifying maintenance and operations. In this keynote, we will share our journey to the Lakehouse, and some of the lessons learned as we built an open data architecture at scale.

Time Series Forecasting with PyCaret

2022-07-19
video

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive.

This presentation will demo the time series forecasting use case using PyCaret's new low-code time series forecasting module.

Towards a Modular Future: Reimagining and Rebuilding Kedro-viz for Visualizing Modular Pipelines

2022-07-19
video

Kedro is an open-source framework for creating portable pipelines through modular data science code, and provides a powerful interactive visualisation tool called ‘Kedro-Viz’, a webapp that magically generates a highly powerful and informational visualisation of the pipeline.

In 2020, the Kedro project introduced an important set of features to support Modular Pipelines, which allows users to set up a series of pipelines that are logically isolated and re-usable to form higher level pipelines.
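One way to picture logically isolated, reusable pipelines is namespacing: the same template of nodes is instantiated several times, with node and dataset names prefixed so the copies do not collide. The sketch below is plain Python written for illustration only. `Node` and `with_namespace` are hypothetical names, not the Kedro API.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical pipeline step described only by its names."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def with_namespace(nodes: Iterable[Node], namespace: str,
                   shared: AbstractSet[str] = frozenset()) -> List[Node]:
    """Copy a pipeline template under a namespace.

    Dataset names listed in `shared` are left unprefixed so that several
    copies can read the same input.
    """
    def rename(dataset: str) -> str:
        return dataset if dataset in shared else f"{namespace}.{dataset}"

    return [
        Node(
            name=f"{namespace}.{node.name}",
            inputs=tuple(rename(d) for d in node.inputs),
            outputs=tuple(rename(d) for d in node.outputs),
        )
        for node in nodes
    ]


template = [
    Node("clean", ("raw",), ("clean_data",)),
    Node("train", ("clean_data", "params"), ("model",)),
]

# Two isolated copies of the same template form a higher-level pipeline.
combined = (with_namespace(template, "us", {"params"})
            + with_namespace(template, "eu", {"params"}))
```

A visualization tool can then group nodes by their namespace prefix, which is roughly the kind of nested structure the talk describes having to render.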

With this paradigm shift comes the need to reimagine the visualization of the pipeline on Kedro-viz, in that it needs to introduce a series of redesigns and new features to support this new representation of pipeline structure.

As a core contributor and team member of the Kedro-viz project throughout the past year, I have witnessed the journey of this transition through shipping the core features for modular pipelines on Kedro-viz.

This talk will focus on my experience as a front-end developer as I walk through the unique architecture and data ingestion setup for this project. I will deep-dive into the unique set of problems and assumptions we have to make in accommodating this new modular pipeline setup, and our approach to solving them within a front-end (React + Redux) context.

Needless to say, I will also share the mistakes and lessons learned along the way, and how they paved the path towards the app architecture choices for our next set of features in ML experiment tracking.

This talk is for the curious data practitioner who is up for exposure to a fresh set of problems beyond the typical data science domain, and for those who are up for a ride through the mind-boggling details of the unique set up of front end development and data visualisation for data science.
