talk-data.com talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

287

Filtering by: AI/ML ×

Sessions & talks

Showing 51–75 of 287 · Newest first

Search within this event →
Sponsored by: Avanade | Accelerating Adoption of Modern Analytics and Governance at Scale

Sponsored by: Avanade | Accelerating Adoption of Modern Analytics and Governance at Scale

2023-07-26 Watch
video

To unlock all the competitive advantage Databricks offers your organization, you might need to update your strategy and methodology for the platform. With over 1,000+ Databricks projects completed globally in the last 18 months, we are going to share our insights on the best building blocks to target as you search for efficiency and competitive advantage.

These building blocks supporting this include enterprise metadata and data management services, data management foundation, and data services and products that enable business units to fully use their data and analytics at scale.

In this session, Avanade data leaders will highlight how Databricks’ modern data stack fits Azure PaaS and SaaS (such as Microsoft Fabric) ecosystem, how Unity catalog metadata supports automated data operations scenarios, and how we are helping clients measure modern analytics and governance business impact and value.

Talk by: Alan Grogan and Timur Mehmedbasic

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic QuadrantTM CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: ThoughtSpot | Drive Self-Service Adoption Through the Roof with Embedded Analytics

Sponsored by: ThoughtSpot | Drive Self-Service Adoption Through the Roof with Embedded Analytics

2023-07-26 Watch
video

When it comes to building stickier apps and products to grow your business, there's no greater opportunity than embedded analytics. Data apps that deliver superior user engagement and business value do analytics differently. They take a user-first approach and know how to deliver real-time, AI-powered insights - not just to internal employees - but to an organization’s customers and partners, as well.

Learn how ThoughtSpot Everywhere is helping companies like Emerald natively integrate analytics with other tools in their modern data stack to deliver a blazing-fast and instantly available analytics experience across all the data their users love. Join this session to learn how you can leverage embedded analytics to: Drive higher app engagement Get your app to market faster And create new revenue streams

Talk by: Krishti Bikal and Vika Smilansky

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Why a Major Japanese Financial Institution Chose Databricks To Accelerate its Data AI-Driven Journey

Why a Major Japanese Financial Institution Chose Databricks To Accelerate its Data AI-Driven Journey

2023-07-26 Watch
video
Yuki Saito (NTT DATA)

In this session, NTT DATA presents a case study involving of one of the largest and most prominent financial institutions in Japan. The project involved migration from the largest data analysis platform to Databricks, a project that required careful navigation of very strict security requirements while accommodating the needs of evolving technical solutions so they could support a wide variety of company structures. This session is for those who want to accelerate their business by effectively utilizing AI as well as BI.

NTT DATA is one of the largest system integrators in Japan, providing data analytics infrastructure to leading companies to help them effectively drive the democratization of data and AI as many in the Japanese market are now adding AI into their BI offering.

Talk by: Yuki Saito

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Automating Sensitive Data (PII/PHI) Detection

Automating Sensitive Data (PII/PHI) Detection

2023-07-26 Watch
video

Healthcare datasets contain both personally identifiable information (PII) and personal health information (PHI) that needs to be de-identified in order to protect patient confidentiality and ensure HIPAA compliance. This privacy data is easily detected when it’s provided in columns labeled with names such as “SSN,” First Name,” “Full Name,” and “DOB;” however, it is much harder to detect when it is hidden within columns labeled “Doctor Notes,” “Diagnoses,” or “Comments.” HealthVerity, a leader in the HIPAA-compliant exchange of real-world data (RWD) to uncover patient, payer and genomic insights and power innovation for the healthcare industry, ensures healthcare datasets are de-identified from PII and PHI using elaborate privacy procedures.

During this session, we will demonstrate how to use a low-code/no-code platform to simplify and automate data pipelines that leverage prebuilt ML models to scan data for PHI/PII leakage and quarantine those rows in Unity Catalog when leakage is identified and move them to a Databricks clean room for analysis.

Talk by: Pouya Barrach-Yousefi and Simon King

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

D-Lite: Integrating a Lightweight ChatGPT-Like Model Based on Dolly into Organizational Workflows

D-Lite: Integrating a Lightweight ChatGPT-Like Model Based on Dolly into Organizational Workflows

2023-07-26 Watch
video

DLite is a new instruction-following model developed by AI Squared by fine-tuning the smallest GPT-2 model on the Alpaca dataset. Despite having only 124 million parameters, DLite exhibits impressive ChatGPT-like interactivity and can be fine-tuned on a single T4 GPU for less than $15.00. Due to its small relative size, DLite can be run locally on a wide variety of compute environments, including laptop CPUs, and can be used without sending data to any third-party API. This lightweight property of DLite makes it highly accessible for personal use, empowering users to integrate machine learning models and advanced analytics into their workflows quickly, securely, and cost-effectively.

Leveraging DLite within AI Squared's platform can empower organizations to orchestrate the integration of Dolly/DLite into business workflows, creating personalized versions of Dolly/DLite, chaining models or analytics to contextualize Dolly/Dlite responses/prompts, and curating new datasets leveraging real-time feedback.

Talk by: Jacob Renn and Ian Sotnek

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

IFC's MALENA Provides Analytics for ESG Reviews in Emerging Markets Using NLP and LLMs

IFC's MALENA Provides Analytics for ESG Reviews in Emerging Markets Using NLP and LLMs

2023-07-26 Watch
video

International Finance Corporation (IFC) is using data and AI to build machine learning solutions that create analytical capacity to support the review of ESG issues at scale. This includes natural language processing and requires entity recognition and other applications to support the work of IFC’s experts and other investors working in emerging markets. These algorithms are available via IFC’s Machine Learning ESG Analyst (MALENA) platform to enable rapid analysis, increase productivity, and build investor confidence. In this manner, IFC, a development finance institution with the mandate to address poverty in emerging markets, is making use of its historical datasets and open source AI solutions to build custom-AI applications that democratize access to ESG capacity to read and classify text.

In this session, you will learn the unique flexibility of the Apache Spark™ ecosystem from Databricks and how that has allowed IFC’s MALENA project to connect to scalable data lake storage, use different natural language processing models and seamlessly adopt MLOps.

Talk by: Atiyah Curmally and Blaise Sandwidi

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Increasing Data Trust: Enabling Data Governance on Databricks Using Unity Catalog & ML-Driven MDM

Increasing Data Trust: Enabling Data Governance on Databricks Using Unity Catalog & ML-Driven MDM

2023-07-26 Watch
video

As part of Comcast Effectv’s transformation into a completely digital advertising agency, it was key to develop an approach to manage and remediate data quality issues related to customer data so that the sales organization is using reliable data to enable data-driven decision making. Like many organizations, Effectv's customer lifecycle processes are spread across many systems utilizing various integrations between them. This results in key challenges like duplicate and redundant customer data that requires rationalization and remediation. Data is at the core of Effectv’s modernization journey with the intended result of winning more business, accelerating order fulfillment, reducing make-goods and identifying revenue.

In partnership with Slalom Consulting, Comcast Effectv built a traditional lakehouse on Databricks to ingest data from all of these systems but with a twist; they anchored every engineering decision in how it will enable their data governance program.

In this session, we will touch upon the data transformation journey at Effectv and dive deeper into the implementation of data governance leveraging Databricks solutions such as Delta Lake, Unity Catalog and DB SQL. Key focus areas include how we baked master data management into our pipelines by automating the matching and survivorship process, and bringing it all together for the data consumer via DBSQL to use our certified assets in bronze, silver and gold layers.

By making thoughtful decisions about structuring data in Unity Catalog and baking MDM into ETL pipelines, you can greatly increase the quality, reliability, and adoption of single-source-of-truth data so your business users can stop spending cycles on wrangling data and spend more time developing actionable insights for your business.

Talk by: Maggie Davis and Risha Ravindranath

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

2023-07-26 Watch
video

Many organizations rely on complex cloud data architectures that create silos between applications, users and data. This fragmentation makes it difficult to access accurate, up-to-date information for analytics, often resulting in the use of outdated data. Enter the lakehouse, a modern data architecture that unifies data, AI, and analytics in a single location.

This session explores why the lakehouse is the best data warehouse, featuring success stories, use cases and best practices from industry experts. You'll discover how to unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI. Additionally, you'll learn how Databricks SQL can help lower costs and get started in seconds with on-demand, elastic SQL serverless warehouses, and how to empower analytics engineers and analysts to quickly find and share new insights using their preferred BI and SQL tools such as Fivetran, dbt, Tableau, or Power BI.

Talk by: Miranda Luna and Cyrielle Simeone

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Essential Data Security Strategies for the Modern Enterprise Data Architecture

Essential Data Security Strategies for the Modern Enterprise Data Architecture

2023-07-26 Watch
video

Balancing critical data requirements is a 24-7 task for enterprise-level organizations that must straddle the need to open specific gates to enable self-service data access while closing other access points to maintain internal and external compliance. Data breaches can cost U.S. businesses an average of $9.4 million per occurrence; ignoring this leaves organizations vulnerable to severe losses and crippling costs.

The 2022 Gartner Hype Cycle for Data Security reports that more and more enterprises are modernizing their data architecture with cloud and technology partners to help them collect, store and manage business data; a trend that does not appear to be letting up. According to Gartner®, “by 2025, 30% of enterprises will have adopted the Broad Data Security Platform (bDSP), up from less than 10% in 2021, due to the pent-up demand for higher levels of data security and the rapid increase in product capabilities."

Moving to both a modern data architecture and data-driven culture sets enterprises on the right trajectory for growth, but it’s important to keep in mind individual public cloud platforms are not guaranteed to protect and secure data. To solve this, Privacera pioneered the industry’s first open-standards-based data security platform that integrates privacy and compliance across multiple cloud services.

During this presentation, we will discuss: - Why today’s modern data architecture needs a DSP that works across the entire data ecosystem; Essential DSP prescriptive measures and adoption strategies. - Why faster and more responsible access to data insights helps reduce cost, increases productivity, expedites decision making, and leads to exponential growth.

Talk by: Piet Loubser

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Generative AI at Scale Using GAN and Stable Diffusion

Generative AI at Scale Using GAN and Stable Diffusion

2023-07-26 Watch
video

Generative AI is under the spotlight and it has diverse applications but there are also many considerations when deploying a generative model at scale. This presentation will make a deep dive into multiple architectures and talk about optimization hacks for the sophisticated data pipelines that generative AI requires. The session will cover: - How to create and prepare a dataset for training at scale in single GPU and multi GPU environments. - How to optimize your data pipeline for training and inference in production considering the complex deep learning models that need to be run. - Tradeoff between higher quality outputs versus training time and resources and processing times.

Agenda: - Basic concepts in Generative AI: GAN networks and Stable Diffusion - Training and inference data pipelines - Industry applications and use cases

Talk by: Paula Martinez and Rodrigo Beceiro

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Rapidly Scaling Applied AI/ML with Foundational Models and Applying Them to Modern AI/ML Use Cases

Rapidly Scaling Applied AI/ML with Foundational Models and Applying Them to Modern AI/ML Use Cases

2023-07-26 Watch
video
Nick King (Snowplow)

Today many of us are familiar with foundational models such as LLM/ChatGPT. However, there are many more enterprise foundational models that can be rapidly deployed, trained and applied to enterprise use cases. This approach dramatically increases the performance of AI/ML models in production, but also gives AI teams rapid roadmaps for efficiency and delivering value to the business. Databricks provides the ideal toolset to enable this approach.

In this session, we will provide a logically overview of foundational models available today, demonstrate a real-world use case, and provide a business framework for data scientists and business leaders to collaborate to rapidly deploy these use cases.

Talk by: Nick King

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Sponsored: Accenture | Factory of the Future: Building Digital Twins Using Knowledge Graphs & Gen AI

Sponsored: Accenture | Factory of the Future: Building Digital Twins Using Knowledge Graphs & Gen AI

2023-07-26 Watch
video

Digital twins are the foundation for the Factory of the Future providing the data foundation to answer questions like what is happening and what can be done about it. It requires combining data across the business — from R&D, manufacturing, supply chain, and operations — and with partners, that then is used with AI to make decisions.

This session presents a case study of a digital twin implemented for warehouse controllers designed to alleviate internal decisions and recommendations for next trips, that replaces tribal knowledge and gut-decision making. We share how we use a domain knowledge graph to drive a data-driven approach that combines warehouse data, with simulations, AI models, and domain knowledge. Warehouse controllers use a dispatch control board that provides a list of orders by dispatch date and time, destination, carrier, assignments to the trailers and to the order and dock number. We show how this new semantic layer works with large language models to make it easier to answer questions on what trip to activate and trailer to choose; based on assets available, products in inventory, and what's coming out of manufacturing.

Talk by: Teresa Tung

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Anomalo | Data Archaeology: Quickly Understand Unfamiliar Datasets Using Machine Learning

Sponsored: Anomalo | Data Archaeology: Quickly Understand Unfamiliar Datasets Using Machine Learning

2023-07-26 Watch
video

One of the most daunting and time-consuming activities for data scientists and data analysts is understanding new and unfamiliar data sets. When given such a new data set, how do you understand its shape and structure? How can you quickly understand its important trends and characteristics? The typical answer is hours of manual querying and exploration, a process many call data archaeology.

This session will show a better way to explore new data sets by letting machine learning do the work for you. In particular, we will showcase how Anomalo simplifies the process of understanding and obtaining insights from Databricks tables — without manual querying. With a few clicks, you can generate comprehensive profiles and powerful visualizations that give immediate insight into your data's key characteristics and trends, as well as its shape and structure. With this approach, very little manual data archaeology is required, and you can quickly get to work on getting value out of the data (rather than just exploring it).

Talk by: Elliot Shmukler and Vicky Andonova

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksi

Testing Generative AI Models: What You Need to Know

Testing Generative AI Models: What You Need to Know

2023-07-26 Watch
video

Generative AI shows incredible promise for enterprise applications. The explosion of generative AI can be attributed to the convergence of several factors. Most significant is that the barrier to entry has dropped for AI application developers through customizable prompts (few-shot learning), enabling laypeople to generate high-quality content. The flexibility of models like ChatGPT and DALLE-2 have sparked curiosity and creativity about new applications that they can support. The number of tools will continue to grow in a manner similar to how AWS fueled app development. But excitement must be tampered by concerns about new risks imposed to business and society. Increased capability and adoption also increase risk exposure. As organizations explore creative boundaries of generative models, measures to reduce risk must be put in place. However, the enormous size of the input space and inherent complexity make this task more challenging than traditional ML models.

In this session, we summarize the new risks introduced by the new class of generative foundation models through several examples, and compare how these risks relate to the risks of mainstream discriminative models. Steps can be taken to reduce the operational risk, bias and fairness issues, and privacy and security of systems that leverage LLM for automation. We’ll explore model hallucinations, output evaluation, output bias, prompt injection, data leakage, stochasticity, and more. We’ll discuss some of the larger issues common to LLMs and show how to test for them. A comprehensive, test-based approach to generative AI development will help instill model integrity by proactively mitigating failure and the associated business risk.

Talk by: Yaron Singer

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Unleashing the Magic of Large Language Modeling with Dolly 2.0

Unleashing the Magic of Large Language Modeling with Dolly 2.0

2023-07-26 Watch
video

As the field of artificial intelligence continues to advance at an unprecedented pace, LLMs are becoming increasingly powerful and transformative. LLMs use deep learning techniques to analyze vast amounts of text data, and can generate language that is like human language. These models have been used for a wide range of applications, including language translation, chatbots, text summarization, and more.

Dolly 2.0 is the first open-source, instruction-following LLM that has been fine-tuned on a human-generated instruction dataset – with zero chance of copyright implications. This makes it an ideal tool for research and commercial use, and opens up new possibilities for businesses looking to streamline their operations and enhance their customer service offerings.

In this session, we will provide an overview of Dolly 2.0, discuss its features and capabilities, and showcase its potential through a demo of Dolly in action. Attendees will gain insights into the LLMs, and learn how to maximize the impact of this cutting-edge technology in their organizations. By the end of the session, attendees will have a deep understanding of the capabilities of Dolly 2.0, and will be equipped with the knowledge they need to integrate LLMs into their own operations in order to achieve greater efficiency, productivity, and customer satisfaction.

Talk by: Gavita Regunath

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Weaving the Data Mesh in the Department of Defense

Weaving the Data Mesh in the Department of Defense

2023-07-26 Watch
video

The Chief Digital and AI Office (CDAO) was created to lead the strategy and policy on data, analytics, and AI adoption across the Department of Defense. To enable that vision, the Department must achieve new ways to scale and standardize delivery under a global strategy while enabling decentralized workflows that capture the wealth of data and domain expertise.

CDAO’s strategy and goals are aligned with data mesh principles. This alignment starts with providing enterprise-level infrastructure and services to advance the adoption of data, analytics, and AI, creating the self-service data infrastructure as a platform. And it continues through implementing policy for federated computational governance centered around decentralizing data ownership to become domain-oriented but enforcing the quality and trustworthiness of data. CDAO seeks to expand and make enterprise data more accessible through providing data as a product and leveraging a federated data catalog to designate authoritative data and common data models. This results in domain-oriented, decentralized data ownership to empower the business domains across the Department to increase mission and business impact that result in significant cost savings, saving lives, and data serving as a “public good.”

Please join us in our session as we discuss how the CDAO leverages modern, innovative implementations that accelerate the delivery of data and AI throughout one of the largest distributed organizations in the world; the Department of Defense. We will walk through how this enables delivery in various Department of Defense use cases.

Talk by: Brad Corwin and Cody Ferguson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How Mars Achieved a People Analytics Transformation with a Modern Data Stack

How Mars Achieved a People Analytics Transformation with a Modern Data Stack

2023-07-26 Watch
video

People Analytics at Mars was formed two years ago as part of an ambitious journey to transform our HR analytics capabilities. To transform, we needed to build foundational services to provide our associates with helpful insights through fast results and resolving complex problems. Critical in that foundation are data governance and data enablement which is the responsibility of the Mars People Data Office team whose focus is to deliver high quality and reliable data that is reusable for current and future People Analytics use cases. Come learn how this team used Databricks in helping Mars achieve its People Analytics Transformation.

Talk by: Rachel Belino and Sreeharsha Alagani

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Feeding the World One Plant at a Time

Feeding the World One Plant at a Time

2023-07-26 Watch
video
Naveed Farooqui , Fahad Khan (Volt Active Data)

Join this session to learn how the CVML and Data Platform team at BlueRiver Technology utilized Databricks to maximize savings on herbicide usage and revolutionize Precision Agriculture.

Blue River Technology is an agricultural technology company that uses computer vision and machine learning (CVML) to revolutionize the way crops are grown and harvested. BRT’s See & Spray technology, which uses CVML to identify and precisely determine whether the plant is a weed or a crop so it can deliver a small, targeted dose of herbicide directly to the plant, while leaving the crop unharmed. By using this approach, Blue River significantly reduces the amount of herbicides used in agriculture by over 70% and has a positive impact on the environment and human health.

The technical challenges we seek to overcome are:  - Processing massive petabytes of proprietary data at scale and in real time. Equipment in the field can generate up to 40TBs of data per hour per machine. - Aggregating, curating and visualizing at scale data can often be convoluted, error-prone and complex.  - Streamlining pipelines runs from weeks to hours to ensure continuous delivery of data.  - Abstracting and automating  the infra, deployment and data management from each program. - Building downstream data products based on descriptive analysis, predictive analysis or prescriptive analysis to drive the machine behavior.

The business questions we seek to answer for any machine are:  - Are we getting the spray savings we anticipated? - Are we reducing the use of herbicide at the scale we expected? - Are spraying nozzles performing at the expected rate? - Finding the relevant data to troubleshoot new edge conditions.  - Providing a simple interface for data exploration to both technical and non-technical personas to help improve our model. - Identifying repetitive and new faults in our machines. - Filtering out data based on certain incidents. - Identifying anomalies for e.g. sudden drop in spray saving, like frequency of broad spray suddenly is too high.

How we are addressing and plan to address these challenges: - Designating Databricks as our purposeful DB for all data - using the bronze, silver and gold layer standards. - Processing new machine logs using a Delta Live table as a source both in batch and incremental manner. - Democratize access for data scientists, product managers, data engineers who are not proficient with the robotic software stack via notebooks for quick development as well as real time dashboards.

Talk by: Fahad Khan and Naveed Farooqui

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Building AI-Powered Products with Foundation Models

Building AI-Powered Products with Foundation Models

2023-07-26 Watch
video

Foundation models make for fantastic demos, but in practice, they can be challenging to put into production. These models work well over datasets that match common training distributions (e.g., generating WEBTEXT or internet images), but may fail on domain-specific tasks or long-tail edge case; the settings that matter most to organizations building differentiated products. We propose a data-centric development approach that organizations can use to adapt foundation models to their own private/proprietary datasets.

We'll describe several techniques, including supervision "warmstarts" and interactive prompting (spoiler alert: no code needed). To make these techniques come to life, we'll walk through real case studies describing how we've seen data-centric development drive AI-powered products, from "AI assist" use cases (e.g., copywriting assistants) to "fully automated" solutions (e.g., loan processing engines).

Talk by: Vincent Chen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Comparing Databricks and Snowflake for Machine Learning

Comparing Databricks and Snowflake for Machine Learning

2023-07-26 Watch
video
Michael Green , Don Scott (Microsoft)

Snowflake and Databricks both aim to provide data science toolkits for machine learning workflows, albeit with different approaches and resources. While developing ML models is technically possible using either platform, the Hitachi Solutions Empower team tested which solution will be easier, faster, and cheaper to work with in terms of both user experience and business outcomes for our customers. To do this, we designed and conducted a series of experiments with use cases from the TPCx-AI benchmark standard. We developed both single-node and multi-node versions of these experiments, which sometimes required us to set up separate compute infrastructure outside of the platform, in the case of Snowflake. We also built datasets of various sizes (1GB, 10GB, and 100GB), to assess how each platform/node setup handles scale.

Based on our findings, on the average, Databricks is faster, cheaper, and easier to use for developing machine learning models, and we use it exclusively for data science on the Empower platform. Snowflake’s reliance on third party resources for distributed training is a major drawback, and the need to use multiple compute environments to scale up training is complex and, in our view, an unnecessary complication to achieve best results.

Talk by: Michael Green and Don Scott

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks SQL Serverless Under the Hood: How We Use ML to Get the Best Price/Performance

Databricks SQL Serverless Under the Hood: How We Use ML to Get the Best Price/Performance

2023-07-26 Watch
video
Gaurav Saraf (Databricks) , Mostafa Mokhtar (Databricks) , Jeremy Lewallen (Databricks)

Join this session to learn how Databricks SQL Serverless warehouses use ML to make large improvements in price-performance for both ETL and BI workloads. We will demonstrate how they can cater to an organization’s peak concurrency needs for BI and showcase the latest advancements in resource-based scheduling, autoscaling, and caching enhancements that allow for seamless performance and workload management. We will deep dive into new features such as Predictive I/O and Intelligent Workload Management, and show new price/performance benchmarks.

Talk by: Gaurav Saraf, Mostafa Mokhtar, and Jeremy Lewallen

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

De-Risking Language Models for Faster Adoption

De-Risking Language Models for Faster Adoption

2023-07-26 Watch
video

Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you! We'll be covering the following: 

  • Studying past incidents in the AI Incident Database and using this information to guide debugging.
  • Adhering to authoritative standards, like the NIST AI Risk Management Framework. 
  • Finding and fixing common data quality issues.
  • Applying general public tools and benchmarks as appropriate (e.g., BBQ, Winogender, TruthfulQA).
  • Binarizing specific tasks and debugging them using traditional model assessment and bias testing.
  • Engineering adversarial prompts with strategies like counterfactual reasoning, role-playing, and content exhaustion. 
  • Conducting random attacks: random sequences of attacks, prompts, or other tests that may evoke unexpected responses. 
  • Countering prompt injection attacks, auditing for backdoors and data poisoning, ensuring endpoints are protected with authentication and throttling, and analyzing third-party dependencies. 
  • Engaging stakeholders to help find problems system designers and developers cannot see. 
  • Everyone knows that generative AI is going to be huge. Don't let inadequate risk management ruin the party at your organization!

Talk by: Patrick Hall

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Explainable Data Drift for NLP

Explainable Data Drift for NLP

2023-07-26 Watch
video

Detecting data drift, although far from solved-for tabular data, has become a common approach to monitor ML models in production. For Natural Language Processing (NLP) on the other hand the question remains mostly open. In this session, we will present and compare two approaches. In the first approach, we will demonstrate how by extracting a wide range of explainable properties per document such as topics, language, sentiment, named entities, keywords and more we are able to explore potential sources of drift. We will show how these properties can be consistently tracked over time, how they can be used to detect meaningful data drift as soon as it occurs and how they can be used to explain and fix the root cause.

The second approach we will present is to detect drift by using the embeddings of common foundation models (such as GPT3 in the Open AI model family) and use them to identify areas in the embedding space in which significant drift has occurred. These areas in embedding space should then be characterized in a human-readable way to enable root cause analysis of the detected drift. We will compare the performance and explainability of these two methods and explore the pros and cons of each approach.

Talk by: Noam Bressler

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Making the Shift to Application-Driven Intelligence

Making the Shift to Application-Driven Intelligence

2023-07-26 Watch
video

In the digital economy, application-driven intelligence delivered against live, real-time data will become a core capability of successful enterprises. It has the potential to improve the experience that you provide to your customers and deepen their engagement. But to make application-driven intelligence a reality, you can no longer rely only on copying live application data out of operational systems into analytics stores. Rather, it takes the unique real-time application-serving layer of a MongoDB database combined with the scale and real-time capabilities of a Databricks Lakehouse to automate and operationalize complex and AI-enhanced applications at scale.

In this session, we will show how it can be seamless for developers and data scientists to automate decisioning and actions on fresh application data and we'll deliver a practical demonstration on how operational data can be integrated in real time to run complex machine learning pipelines.

Talk by: Mat Keep and Ashwin Gangadhar

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Qlik | Extracting the Full Potential of SAP Data for Global Automotive Manufacturing

Sponsored by: Qlik | Extracting the Full Potential of SAP Data for Global Automotive Manufacturing

2023-07-26 Watch
video
Matthew Hayes , Bala Amavasai (Celebal Technologies)

Every year, organizations lose millions of dollars due to equipment failure, unscheduled downtime, or unoptimized supply chains because business and operational data is not integrated. During this session you will hear from experts at Qlik and Databricks on how global luxury automotive manufacturers are accelerating the discovery and availability of complex data sets like SAP. Learn how Qlik, Microsoft, and Databricks together are delivering an integrated solution for global luxury automotive manufacturers that combines the automated data delivery capabilities of Qlik Data Integration with the agility and openness of the Databricks Lakehouse platform and AI on Azure Synpase.

We'll explore how to leverage the IT and OT data convergence to extract the full potential of business-critical SAP data, lower IT costs and deliver real-time prescriptive insights, at scale, for more resilient, predictable, and sustainable supply-chains. Learn how organizations can track and manage inventory levels, predict demand, optimize production and help their organizations identify opportunities for improvements.

Talk by: Matthew Hayes and Bala Amavasai

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksi