talk-data.com

Topic

Lance

file_format vector_db embeddings open_table_format data_lake

7 tagged

Activity Trend: peak of 3 activities per quarter, 2020-Q1 to 2026-Q1

Activities

7 activities · Newest first

Supercharging Multimodal Feature Engineering with Lance and Ray

Efficient feature engineering is key to unlocking modern multimodal AI workloads. In this talk, we’ll dive deep into how Lance, an open-source format with built-in indexing, random access, and data evolution, works seamlessly with Ray’s distributed compute and UDF capabilities. We’ll walk through practical pipelines for preprocessing, embedding computation, and hybrid feature serving, highlighting concrete patterns attendees can take home to supercharge their own multimodal pipelines. See https://lancedb.github.io/lance/integrations/ray to learn more about this integration.

AI-Ready Data in Action: Powering Smarter Agents

This hands-on workshop focuses on what AI engineers do most often: making data AI-ready and turning it into useful production applications. Together with dltHub and LanceDB, you’ll walk through an end-to-end workflow: collecting and preparing real-world data following best practices, managing it in LanceDB, and powering AI applications with search, filters, hybrid retrieval, and lightweight agents. By the end, you’ll know how to move from raw data to functional, production-ready AI setups without the usual friction. We will also touch on multimodal data and on taking this end-to-end use case to production.
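Hybrid retrieval usually means running a vector search and a full-text search, then merging the two ranked result lists. The sketch below uses Reciprocal Rank Fusion (RRF), one common merging technique; it is an illustration under that assumption, not the workshop's code or LanceDB's API. The document IDs are hypothetical, and k=60 is simply the conventional RRF constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents ranked highly by several retrievers
    rise to the top; k damps the influence of any single list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); break ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a vector search and a full-text (keyword) search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

for doc_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
    print(f"{doc_id}: {score:.4f}")
# Expected order: doc_a, doc_c, doc_b, doc_d
```

Because RRF uses only ranks, it needs no normalization between vector distances and keyword relevance scores, which live on different scales.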

Bridging Big Data and AI: Empowering PySpark With Lance Format for Multi-Modal AI Data Pipelines

PySpark has long been a cornerstone of big data processing, excelling at data preparation, analytics, and machine learning tasks within traditional data lakes. However, the rise of multimodal AI and vector search introduces challenges beyond its capabilities. Spark’s new Python data source API enables integration with emerging AI data lakes built on the multimodal Lance format. Lance delivers unparalleled value with its zero-copy schema evolution and robust support for data with large record sizes (e.g., images, tensors, and embeddings), simplifying multimodal data storage. Its advanced indexing for semantic and full-text search, combined with rapid random access, enables high-performance, SQL-level analytics on AI data. By unifying PySpark's robust processing capabilities with Lance's AI-optimized storage, data engineers and scientists can efficiently manage and analyze the diverse data types required for cutting-edge AI applications within a familiar big data framework.
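The idea behind zero-copy schema evolution can be illustrated with a toy, in-memory model: data lives in immutable per-column files, and each table version is a manifest listing which files make up the table. Adding a column (for example, a feature derived from existing data) writes only the new column's file plus a new manifest; existing files are never rewritten, and older versions stay readable. The classes `ToyColumnarTable` and `Manifest` are hypothetical and only mimic the concept; they are not Lance's on-disk format or the pylance API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """One table version: maps each column name to the ID of the file holding it."""
    version: int
    columns: dict


@dataclass
class ToyColumnarTable:
    """Hypothetical table with immutable column files and versioned manifests."""
    files: dict = field(default_factory=dict)      # file ID -> list of values
    manifests: list = field(default_factory=list)  # every committed version
    writes: int = 0                                # number of files ever written

    def _write_file(self, values):
        file_id = f"file_{self.writes}"
        self.files[file_id] = list(values)  # written once, never modified
        self.writes += 1
        return file_id

    def create(self, data):
        columns = {name: self._write_file(vals) for name, vals in data.items()}
        self.manifests.append(Manifest(version=1, columns=columns))

    @property
    def latest(self):
        return self.manifests[-1]

    def read(self, version=None):
        manifest = self.latest if version is None else self.manifests[version - 1]
        return {name: self.files[fid] for name, fid in manifest.columns.items()}

    def add_column(self, name, fn, source):
        """Derive a new column from an existing one; old files are reused as-is."""
        new_values = [fn(v) for v in self.read()[source]]
        columns = dict(self.latest.columns)  # copy file references, not data
        columns[name] = self._write_file(new_values)
        self.manifests.append(Manifest(self.latest.version + 1, columns))


table = ToyColumnarTable()
table.create({"id": [1, 2, 3], "caption": ["cat", "dog", "bird"]})
files_before = table.writes
table.add_column("caption_len", len, source="caption")

print(table.read())                    # includes the new caption_len column
print(table.writes - files_before)     # 1: only the new column's file was written
print(list(table.read(version=1)))     # version 1 is still readable: ['id', 'caption']
```

A real format must also handle fragments, row alignment, and concurrent commits; the sketch only shows why adding a column need not copy existing data.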

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

If you're building AI applications, chances are you're solving a retrieval problem somewhere along the way. This is why vector databases are popular today. But if we zoom out from vector search alone, serving AI applications also requires handling key-value (KV) workloads like a traditional feature store, as well as analytical workloads to explore and visualize data. As a result, building an AI application often requires multiple data stores, which means multiple copies of the data, manual syncing, and extra infrastructure expense. LanceDB is the first and only system to support all of these workloads in one place. Powered by the Lance columnar format, LanceDB completely breaks open the "impossible triangle" of performance, scalability, and cost for AI serving. Serving AI applications is different from previous waves of technology, and a new paradigm demands new tools.
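To make the "one system, several workloads" idea concrete, here is a toy sketch in plain Python, assuming a single in-memory copy of the rows: a key index for feature-store-style lookups, a brute-force nearest-neighbor search, and a group-by aggregate all read the same data. The rows, field names, and functions are hypothetical; this is not LanceDB's API, and a real system would use columnar storage and vector/scalar indexes rather than Python lists.

```python
import math

# Hypothetical rows: one copy of the data holds keys, features, and embeddings.
ROWS = [
    {"id": "u1", "country": "NZ", "clicks": 12, "embedding": [0.9, 0.1]},
    {"id": "u2", "country": "US", "clicks": 3, "embedding": [0.2, 0.8]},
    {"id": "u3", "country": "NZ", "clicks": 7, "embedding": [0.8, 0.3]},
]
BY_ID = {row["id"]: row for row in ROWS}  # key index over the same rows


def lookup(key):
    """KV / feature-store style point read."""
    return BY_ID[key]


def nearest(query, k=1):
    """Brute-force vector search by Euclidean distance (no ANN index)."""
    return sorted(ROWS, key=lambda r: math.dist(query, r["embedding"]))[:k]


def clicks_by_country():
    """Analytical aggregate (sum of clicks per country) over the same rows."""
    totals = {}
    for row in ROWS:
        totals[row["country"]] = totals.get(row["country"], 0) + row["clicks"]
    return totals


print(lookup("u2")["clicks"])
print([r["id"] for r in nearest([1.0, 0.0], k=2)])
print(clicks_by_country())
```

The point of the sketch is the absence of copies: no sync job has to keep a separate feature store or warehouse in step with the vector index.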

Bring enhanced manageability to SQL Server anywhere with Azure Arc | OD45

Join this discussion, featuring live demos, to discover how connecting your SQL Servers to Azure can enhance your management, security, and governance capabilities. SQL Server enabled by Azure Arc is a hybrid cloud solution that lets you manage, secure, and govern your SQL Server estate from Azure, wherever it runs. Our experts will also explore different options for deploying Azure Arc to your SQL Servers at scale.

To learn more, please check out these resources:
* https://aka.ms/Ignite23CollectionsOD45
* https://aka.ms/ArcSQL

Speakers:
* Dhananjay Mahajan
* Lance Wright
* Nikita Takru
* Raj Pochiraju

Session Information: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

OD45 | English (US) | Data

MSIgnite

Data warehouse as a product: Design to delivery - Coalesce 2023

Every day, Trade Me gets 1.5 million new listings and 20 million listing views. With all that data comes the difficulty of managing a complex data ecosystem. This got the Trade Me team thinking: "Which problems are we trying to solve? How can we increase speed to customer value?" Using these questions as a framework, the team developed a new mission statement: "To build a data warehouse that analysts love to use." In this session, Trade Me shares exactly how they achieved that vision, with a focus on planning, data operating models, and database architecture.

Speaker: Lance Witheridge, Data Modernisation Lead, Trade Me

Register for Coalesce at https://coalesce.getdbt.com