talk-data.com

Speaker

Chang She

Activities

talks

CEO / Co-founder LanceDB

Chang She is the CEO and cofounder of LanceDB, the developer-friendly, open-source database for multi-modal AI. A serial entrepreneur, Chang has been building DS/ML tooling for nearly two decades and is one of the original contributors to the pandas library. Prior to founding LanceDB, Chang was VP of Engineering at TubiTV, where he focused on personalized recommendations and ML experimentation.

Bio from: PyData Berlin 2025

Filter by Event / Source

PyData Berlin 2025 1 PyData Seattle 2025 1 DuckCon #3 San Francisco 2023 1 Databricks DATA + AI Summit 2023 1 The Analytics Engineering Podcast 1 Data + AI Summit 2025 1

Talks & appearances

6 activities · Newest first

Search activities →

Building a multimodal lakehouse for AI (w/ Chang She)

2025-11-23 · The Analytics Engineering Podcast Listen

podcast_episode

with Tristan Handy (dbt Labs) , Chang She (LanceDB)

AI/ML Analytics Analytics Engineering Data Lake Data Lakehouse dbt

In this episode, Tristan Handy sits down with Chang She — a co-creator of Pandas and now CEO of LanceDB — to explore the convergence of analytics and AI engineering. The team at LanceDB is rebuilding the data lake from the ground up with AI as a first principle, starting with a new AI-native file format called Lance. Tristan traces Chang's journey as one of the original contributors to the pandas library to building a new infrastructure layer for AI-native data. Learn why vector databases alone aren't enough, why agents require new architecture, and how LanceDB is building a AI lakehouse for the future. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Keynote: Chang She - Never Send a Human to do an Agent's Search

2025-11-08 · PyData Seattle 2025 Watch

talk

Keynote by Chang She

AI-Ready Data in Action: Powering Smarter Agents

2025-09-01 · PyData Berlin 2025 Watch

talk

with Chang She (LanceDB) , Violetta Mishechkina

AI/ML Lance

This hands-on workshop focuses on what AI engineers do most often: making data AI-ready and turning it into production-useful applications. Together with dltHub and LanceDB, you’ll walk through an end-to-end workflow: collecting and preparing real-world data with best practices, managing it in LanceDB, and powering AI applications with search, filters, hybrid retrieval, and lightweight agents. By the end, you’ll know how to move from raw data to functional, production-ready AI setups without the usual friction. We will touch upon multi-modal data and going to production with this end-to-end use case.

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

2025-06-10 · Data + AI Summit 2025 Watch

talk

with Chang She (LanceDB) , Zero Qu (Databricks)

AI/ML Lance Vector DB

If you're building AI applications, chances are you're solving a retrieval problem somewhere along the way. This is why vector databases are popular today. But if we zoom out from just vector search, serving AI applications also requires handling KV workloads like a traditional feature store, as well as analytical workloads to explore and visualize data. This means that building an AI application often requires multiple data stores, which means multiple data copies, manual syncing, and extra infrastructure expenses. LanceDB is the first and only system that supports all of these workloads in one system. Powered by Lance columnar format, LanceDB completely breaks open the impossible triangle of performance, scalability, and cost for AI serving. Serving AI applications is different from previous waves of technology, and a new paradigm demands new tools.

Vector Data Lakes

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

with Tony Wang , Chang She (LanceDB)

AI/ML Analytics Cloud Computing Cloud Storage Data Lakehouse Databricks

Vector databases such as ElasticSearch and Pinecone offer fast ingestion and querying on vector embeddings with ANNs. However, they typically do not decouple compute and storage, making them hard to integrate in production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, they get loaded back into the vector database in another ETL process. This is reminiscent of loading data from OLTP database to cloud storage, then loading said data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test one billion vector dataset, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic QuadrantTM CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Bringing AI to DuckDB with Lance columnar format for multi-modal AI – DuckCon #3 (San Francisco)

· DuckCon #3 San Francisco 2023 Watch

video

AI/ML DuckDB Lance

Speaker: Chang She (LanceDB) Slides: https://blobs.duckdb.org/events/duckcon3/chang-she-lancedb-bringing-ai-to-duckdb-with-lance-columnar-format.pdf