talk-data.com talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

43

Filtering by: Data Science ×

Sessions & talks

Showing 26–43 of 43 · Newest first

Search within this event →
Data Boards: A Collaborative and Interactive Space for Data Science

Data Boards: A Collaborative and Interactive Space for Data Science

2022-07-19 Watch
video

Databricks enables many organizations to harness the power of data; but while Databricks enables collaboration across Data Scientists and Data Engineers, there is still opportunity to begin democratizing access to domain experts. Successfully achieving this requires a rethinking of the classic analytics user interfaces, towards interactive systems with highly collaborative visual interfaces. Current visualization and workflow tools are ill-suited to bringing the full team together. I will present Northstar, a novel system we developed for Interactive Data Exploration at MIT / Brown University, now commercialized by Einblick. I will explain why Northstar required us to completely rethink the analytics stack, from the interface to the “guts,” and highlight the techniques we developed to provide a truly novel user-interface which enables creating code optional analysis over Databricks, where all user personas can collaborate together very large datasets and use complex ML operations.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Survey of Production ML Tech Stacks

Survey of Production ML Tech Stacks

2022-07-19 Watch
video

Production machine learning demand stitching together many tools ranging from open source standards to cloud-specific and third party solutions. This session surveys the current ML deployment technology landscape to contextualize which tools solve for which features off production ML systems such as CI/CD, REST endpoint, and monitoring. It'll help answer the questions: what tools are out there? Where do I start with the MLops tech stack for my application? What are the pros and cons of open source versus managed solutions? This talk takes a features driven approach to tool selection for MLops tacks to provide best practices in the most rapidly evolving field of data science.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Turning Big Biology Data into Insights on Disease – The Power of Circulating Biomarkers

Turning Big Biology Data into Insights on Disease – The Power of Circulating Biomarkers

2022-07-19 Watch
video

Profiling small molecules in human blood across global populations gives rise to a greater understanding of the varied biological pathways and processes that contribute to human health and diseases. Herein, we describe the development of a comprehensive Human Biology Database, derived from nontargeted molecular profiling of over 300,000 human blood samples from individuals across diverse backgrounds, demographics, geographical locations, lifestyles, diseases, and medication regimens, and its applications to inform drug development.

Approximately 11,000 circulating molecules have been captured and measured per sample using Sapient’s high-throughput, high-specificity rapid liquid chromatography-mass spectrometry (rLC-MS) platform. The samples come from cohorts with adjudicated clinical outcomes from prospective studies lasting 10-25 years, as well as data on individuals’ diet, nutrition, physical exercise, and mental health. Genetic information for a subset of subjects is also included and we have added microbiome sequencing data from over 150,000 human samples in diverse diseases.

An efficient data science environment is established to enable effective health insight mining across this vast database. Built on a customized AWS and Databricks “infrastructure-as-code” Terraform configuration, we employ streamlined data ETL and machine learning-based approaches for rapid rLC-MS data extraction. In mining the database, we have been able to identify circulating molecules potentially causal to disease; illuminate the impact of human exposures like diet and environment on disease development, aging, and mortality over decades of time; and support drug development efforts through identification of biomarkers of target engagement, pharmacodynamics, safety, efficacy, and more.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Unifying Data Science and Business: AI Augmentation/Integration in Production Business Applications

Unifying Data Science and Business: AI Augmentation/Integration in Production Business Applications

2022-07-19 Watch
video

Why is it so hard to integrate Machine Learning into real business applications? In 2019 Gartner predicted that AI augmentation would solve this problem and would create will create $2.9 trillion of business value and 6.2 billion hours of worker productivity in 2021. A new realm of business science methods that encompass AI-powered analytics that allows people with domain expertise to make smarter decisions faster and with more confidence have also emerged as a solution to this problem. Dr. Harvey will demystify why integration challenges still account for $30.2 billion in annual global losses and discuss what it takes to integrate AI/ML code or algorithms into real business applications and the effort that goes into making each component, including data collection, preparation, training, and serving production-ready, enabling organizations to use the results of integrated models repeatedly with minimal user intervention. Finally, Dr. Harvey will discuss AISquared’s integration with Databricks and MLFlow to accelerate the integration of AI by unifying data science with business. By adding five lines of code to your model, users can now leverage AISquared’s model integration API framework which provides a quick and easy way to integrate models directly into live business applications.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Deep Dive into the New Features of Apache Spark 3.2 and 3.3

Deep Dive into the New Features of Apache Spark 3.2 and 3.3

2022-07-19 Watch
video

Apache Spark has become the most widely-used engine for executing data engineering, data science and machine learning on single-node machines or clusters. The number of monthly maven downloads of Spark has rapidly increased to 20 million.

We will talk about the higher-level features and improvements in Spark 3.2 and 3.3. The talk also dives deeper into the following features + Introducing pandas API on Apache Spark to unify small data API and big data API. + Completing the ANSI SQL compatibility mode to simplify migration of SQL workloads. + Productionizing adaptive query execution to speed up Spark SQL at runtime. + Introducing RocksDB state store to make state processing more scalable

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Delta Live Tables: Modern Software Engineering and Management for ETL

Delta Live Tables: Modern Software Engineering and Management for ETL

2022-07-19 Watch
video

Data engineers have the difficult task of cleansing complex, diverse data, and transforming it into a usable source to drive data analytics, data science, and machine learning. They need to know the data infrastructure platform in depth, build complex queries in various languages and stitch them together for production. Join this talk to learn how Delta Live Tables (DLT) simplifies the complexity of data transformation and ETL. DLT is the first ETL framework to use modern software engineering practices to deliver reliable and trusted data pipelines at any scale. Discover how analysts and data engineers can innovate rapidly with simple pipeline development and maintenance, how to remove operational complexity by automating administrative tasks and gaining visibility into pipeline operations, how built-in quality controls and monitoring ensure accurate BI, data science, and ML, and how simplified batch and streaming can be implemented with self-optimizing and auto-scaling data pipelines.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Evolution of Data Architectures and How to Build a Lakehouse

Evolution of Data Architectures and How to Build a Lakehouse

2022-07-19 Watch
video

Data architectures are the key and part of a larger picture to building robust analytical and AI applications. One must take a holistic view of the entire data analytics realm when it comes to planning for data science initiatives.

Through this talk, learn about the evolution of the data landscape and why Lakehouses are becoming a de facto for organizations building scalable data architectures. A lakehouse architecture combines data management capability including reliability, integrity, and quality from the data warehouse and supports all data workloads including BI and AI with the low cost and open approach of data lakes.

Data Practitioners will also learn some core concepts of building an efficient Lakehouse with Delta Lake.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

How AARP Services, Inc. automated SAS transformation to Databricks using LeapLogic

How AARP Services, Inc. automated SAS transformation to Databricks using LeapLogic

2022-07-19 Watch
video

While SAS has been a standard in analytics and data science use cases, it is not cloud-native and does not scale well. Join us to learn how AARP automated the conversion of hundreds of complex data processing, model scoring, and campaign workloads to Databricks using LeapLogic, an intelligent code transformation accelerator that can transform any and all legacy ETL, analytics, data warehouse and Hadoop to modern data platforms.

In this session experts from AARP and Impetus will share about collaborating with Databricks and how they were able to: • Automate modernization of SAS marketing analytics based on coding best practices • Establish a rich library of Spark and Python equivalent functions on Databricks with the same capabilities as SAS procedures, DATA step operations, macros, and functions • Leverage Databricks-native services like Delta Live Tables to implement waterfall techniques for campaign execution and simplify pipeline monitoring

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Building a Data Science as a Service Platform in Azure with Databricks

Building a Data Science as a Service Platform in Azure with Databricks

2022-07-19 Watch
video

Machine learning in the enterprise is rarely delivered by a single team. In order to enable Machine Learning across an organisation you need to target a variety of different skills, processes, technologies, and maturity's. To do this is incredibly hard and requires a composite of different techniques to deliver a single platform which empowers all users to build and deploy machine learning models.

In this session we discuss how Databricks enabled a data science as a service platform for one of the UKs largest household insurers. We look at how this platform is empowering users of all abilities to build models, deploy models and realise and return on investment earlier.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Data Policy in the Past, Present, and Future

Data Policy in the Past, Present, and Future

2022-07-19 Watch
video
Jacob Pasner, PhD (Senator Ron Wyden's technology policy team (Office of Senator Ron Wyden))

Federal, State, and Local governments are doing their best to play catch-up as the modern world is revolutionized by the collection and utilization of vast data repositories. This leaves them in the awkward position of implementing their own data modernization projects while simultaneously cleaning up the regulatory mess created by the widely accepted "Go fast and break things" mentality of industry. This talk draws up on Jacob Pasner, PhD's data science background and his yearlong experience as a Science and Technology Policy fellow on Senator Ron Wyden's technology policy team. The discussion will center on his work navigating the complex inner workings of Federal executive and legislative branch bureaucracies while drafting first-of-its kind government data legislation, advocating for congressional modernization projects, and leading data privacy oversight of industry. Note: The opinions presented here are Jacob Pasner's alone and do not represent the views of Senator Ron Wyden or his office.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Hassle-Free Data Ingestion into the Lakehouse

Hassle-Free Data Ingestion into the Lakehouse

2022-07-19 Watch
video

Ingesting data from hundreds of different data sources is critical before organizations can execute advanced analytics, data science, and machine learning. Unfortunately, ingesting and unifying this data to create a reliable single source of truth is usually extremely time-consuming and costly. In this session, discover how Databricks simplifies data ingestion, at low latency, with SQL-only ingestion capabilities. We will discuss and demonstrate how you can easily and quickly ingest any data into the lakehouse. The session will also cover newly-released features and tools that make data ingestion even simpler on the Databricks Lakehouse Platform.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks

How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks

2022-07-19 Watch
video

Data driven personalization is an insurmountable challenge for AT&T’s data science team because of the size of datasets and complexity of data engineering. More often these data preparation tasks not only take several hours or days to complete but some of these tasks fail to complete affecting productivity.

In this session, the AT&T Data Science team will talk about how RAPIDS Accelerator for Apache Spark and Photon runtime on Databricks can be leveraged to process these extremely large datasets resulting in improved content recommendation, classification, etc while reducing infrastructure costs. The team will compare speedups and costs to the regular Databricks runtime Apache Spark environment. The size of tested datasets vary from 2TB - 50TB which consists of data collected from for 1 day to 31 days.

The talk will showcase the results from both RAPIDS accelerator for Apache Spark and Databricks Photon runtime.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

MLflow Pipelines: Accelerating MLOps from Development to Production

MLflow Pipelines: Accelerating MLOps from Development to Production

2022-07-19 Watch
video

Despite being an emerging topic, MLOps is hard and there are no widely established approaches for MLOps. What makes it even harder is that in many companies the ownership of MLOps usually falls through the cracks between data science teams and production engineering teams. Data scientists are mostly focused on modeling the business problems and reasoning about data, features, and metrics, while the production engineers/ops are mostly focused on traditional DevOps for software development, ignoring ML-specific Ops like ML development cycles, experiment tracking, data/model validation, etc. In this talk, we will introduce MLflow Pipelines, an opinionated approach for MLOps. It provides predefined ML pipeline templates for common ML problems and opinionated development workflows to help data scientists bootstrap ML projects, accelerate model development, and ship production-grade code with little help from production engineers.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Powering Geospatial Data Science with Graph Machine Learning

Powering Geospatial Data Science with Graph Machine Learning

2022-07-19 Watch
video

At Iggy we provide easy access to hundreds of geospatial features to help companies make sense of ‘place’. We believe that incorporating ‘place’ into data science and machine learning pipelines can have a huge impact on predictive capabilities in a wide range of fields such as travel, real estate, healthcare, logistics and many more.

Traditionally this data was accessible in a tabular form, but recently we have been experimenting with converting our tabular data into a graph representation, and applying graph machine learning to build derived products. This allows us to leverage the power of graphs and to more effectively model the relationships between different entities in our data.

In this talk we will present: 1. What is geospatial data? 2. Why are we interested in graph representations of geospatial data? 3. What do our graph representations look like? 4. How are we applying graph machine learning? 5. What are some use cases and derived products that we are building using graph machine learning?

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Towards a Modular Future: Reimagining and Rebuilding Kedro-viz for Visualizing Modular Pipelines

Towards a Modular Future: Reimagining and Rebuilding Kedro-viz for Visualizing Modular Pipelines

2022-07-19 Watch
video

Kedro is an open-source framework for creating portable pipelines through modular data science code, and provides a powerful interactive visualisation tool called ‘Kedro-Viz’, a webapp that magically generates a highly powerful and informational visualisation of the pipeline.

In 2020, the Kedro project introduced an important set of features to support Modular Pipelines, which allows users to set up a series of pipelines that are logically isolated and re-usable to form higher level pipelines.

With this paradigm shift comes the need to reimagine the visualization of the pipeline on Kedro-viz, in that it needs to introduce a series of redesigns and new features to support this new representation of pipeline structure.

As a core contributor and team member to the Kedro-viz project throughout the past year, I have witnessed the journey of this transition through shipping the core features for modular pipelines on Kedro-viz.

This talk will focus on my experience as a front end developer as I walk through the unique architecture and data ingestion setup for this project. I will deep-dive into the unique set of problems and assumptions we have to make in accommodating this new modular pipeline setup, and our approach for solving them within a Front End(React + Redux) context.

Not to say I will definitely share the mistakes and learnings along the way, and how this paved the path towards the app architecture choices for our next set of features in ML experiment tracking.

This talk is for the curious data practitioner who is up for exposure to a fresh set of problems beyond the typical data science domain, and for those who are up for a ride through the mind-boggling details of the unique set up of front end development and data visualisation for data science.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

What to Know about Data Science and Machine Learning in 2022   Peter Norvig

What to Know about Data Science and Machine Learning in 2022 Peter Norvig

2022-07-19 Watch
video
Peter Norvig (Google)

After writing an AI textbook in 2020 and a Data Science textbook in 2022, Peter Norvig reflects on how we teach and learn about these fields has changed in recent years. New data and new algorithms have become available, but more importantly, ethical and societal issues have become more prominent, and the question of what exactly we are trying to optimize is paramount.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Day 2 Morning Keynote |  Data + AI Summit 2022

Day 2 Morning Keynote | Data + AI Summit 2022

2022-07-19 Watch
video
Ganesh Jayaram , Manish Amde (Intuit) , Kasey Uhlenhuth (Databricks) , Peter Norvig (Google) , Andrew Ng , Hilary Mason (Hidden Door) , Michael Armbrust (Databricks) , Stacy Kerkela (Databricks) , Patrick Wendell (Databricks) , Alon Amit (Intuit)

Day 2 Morning Keynote | Data + AI Summit 2022 Production Machine Learning | Patrick Wendell MLflow 2.0 | Kasey Uhlenhuth Revolutionizing agriculture with AI: Delivering smart industrial solutions built upon a Lakehouse architecture | Ganesh Jayaram Intuit’s Data Journey to the Lakehouse: Developing Smart, Personalized Financial Products for 100M+ Consumers & Small Businesses | Alon Amit and Manish Amde Workflows | Stacy Kerkela Delta Live Tables | Michael Armbrust AI and creativity, and building data products where there's no quantitative metric for success, such as in games, or web-scale search, or content discovery | Hilary Mason What to Know about Data Science and Machine Learning in 2022 | Peter Norvig Data-centric AI development: From Big Data to Good Data | Andrew Ng

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Unlocking the power of data, AI & analytics: Amgen’s journey to the Lakehouse | Kerby Johnson

Unlocking the power of data, AI & analytics: Amgen’s journey to the Lakehouse | Kerby Johnson

2022-07-19 Watch
video

In this keynote, you will learn more about Amgen's data platform journey from data warehouse to data lakehouse. They’’ll discuss our decision process and the challenges they faced with legacy architectures, and how they designed and implemented a sustaining platform strategy with Databricks Lakehouse, accelerating their ability to democratize data to thousands of users.
Today, Amgen has implemented 400+ data science and analytics projects covering use cases like clinical trial optimization, supply chain management and commercial sales reporting, with more to come as they complete their digital transformation and unlock the power of data across the company.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/