talk-data.com

Topic

Analytics

data_analysis insights metrics

4552

tagged

Activity Trend

398 peak/qtr
2020-Q1 2026-Q1

Activities

4552 activities · Newest first

Databricks Marketplace: Going Beyond Data and Applications

The demand for third-party data has never been greater, but existing marketplaces simply aren't cutting it. You deserve more than being locked into a walled garden of just datasets and simple applications. You deserve an open marketplace to exchange ML models, notebooks, datasets and more. The Databricks Marketplace is the ultimate solution for your data, AI and analytics needs, powered by open source Delta Sharing. Databricks is revolutionizing the data marketplace space.

Join us for a demo-filled session and learn how Databricks Marketplace is exactly what you need in today’s AI-driven innovation ecosystem. Hear from customers on how Databricks is empowering organizations to leverage shared knowledge and take their analytics and AI to new heights. Take advantage of this rare opportunity to ask questions of the Databricks product team that is building the Databricks Marketplace.

Talk by: Mengxi Chen and Darshana Sivakumar

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Lakehouses: The Best Start to Your Graph Data and Analytics Journey

Data architects and IT executives are continually looking for the best ways to integrate graph data and analytics into their organizations to improve business outcomes. This session outlines how the Data Lakehouse provides the perfect starting point for a successful journey. We will explore how the Data Lakehouse offers a unique combination of scalability, flexibility, and speed to quickly and effectively ingest, pre-process, curate, and analyze graph data to create powerful analytics. Additionally, we will discuss the benefits of using the Data Lakehouse over traditional graph databases and how it can improve time to insight, time to production, and overall satisfaction. At the end of this presentation, attendees will:

  • Understand the benefits of using a Data Lakehouse for graph data and analytics
  • Learn how to get started with a successful Lakehouse implementation (demo)
  • Discover the advantages of using a Data Lakehouse over graph databases
  • Learn where graph databases and the lakehouse integrate and perform better together

Key Takeaways:

  • Data lakehouses provide the perfect starting point for a successful graph data and analytics journey
  • Data lakehouses offer the scalability, flexibility, and speed to quickly and effectively analyze graph data
  • The data lakehouse is a cost-effective alternative to traditional graph databases, shortening your time to insight and de-risking your project
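The curation step described above can start from nothing more than an edge list held in a lakehouse table. A minimal stdlib-only Python sketch of one basic graph metric computed during pre-processing (the edge data is made up for illustration):

```python
from collections import Counter

# Hypothetical edge list, as it might be read from a curated lakehouse table.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

# Degree per node: a simple graph metric often computed while curating
# data before handing it to a dedicated graph engine.
degree = Counter()
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

top_node, top_degree = degree.most_common(1)[0]  # highest-degree node
```

At lakehouse scale the same aggregation would run as a distributed job, but the shape of the computation is identical.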

Talk by: Douglas Moore

Planning and Executing a Snowflake Data Warehouse Migration to Databricks

Organizations are going through a critical phase of data infrastructure modernization, laying the foundation for the future, and adapting to support growing data and AI needs. Organizations that embraced cloud data warehouses (CDW) such as Snowflake have ended up trying to use a data warehousing tool for ETL pipelines and data science. This created unnecessary complexity and resulted in poor performance since data warehouses are optimized for SQL-based analytics only.

Realizing the limitation and pain with cloud data warehouses, organizations are turning to a lakehouse-first architecture. Though a cloud platform to cloud platform migration should be relatively easy, the breadth of the Databricks platform provides flexibility and hence requires careful planning and execution. In this session, we present the migration methodology, technical approaches, automation tools, product/feature mapping, a technical demo and best practices using real-world case studies for migrating data, ELT pipelines and warehouses from Snowflake to Databricks.
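The product/feature mapping mentioned above can be captured as a simple lookup table during planning. The pairings below are illustrative assumptions, not an official migration mapping:

```python
# Illustrative Snowflake-to-Databricks feature mapping for migration planning.
FEATURE_MAP = {
    "virtual warehouse": "SQL warehouse",
    "task": "Workflows job",
    "Snowpipe": "Auto Loader",
    "stream": "Delta Change Data Feed",
}

def target_feature(snowflake_feature: str) -> str:
    # Fall back to a manual-review marker for anything unmapped.
    return FEATURE_MAP.get(snowflake_feature, "needs manual review")
```

In practice each mapping carries caveats, which is exactly why the session stresses careful planning over mechanical translation.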

Talk by: Satish Garla and Ramachandran Venkat

Simplifying Lakehouse Observability: Databricks Key Design Goals and Strategies

In this session, we'll explore Databricks' vision for simplifying lakehouse observability, a critical component of any successful data, analytics, and machine learning initiative. By directly integrating observability solutions within the lakehouse, Databricks aims to provide users with the tools and insights needed to run a successful business on top of the lakehouse.

Our approach is designed to leverage existing expertise and simplify the process of monitoring and optimizing data and ML workflows, enabling teams to deliver sustainable and scalable data and AI applications. Join us to learn more about our key design goals and how Databricks is streamlining lakehouse observability to support the next generation of data-driven applications.

Talk by: Michael Milirud

Sponsored by: Dataiku | Have Your Cake and Eat it Too with Dataiku + Databricks

In this session, we will highlight all parts of the analytics lifecycle using Dataiku + Databricks. Explore, blend, and prepare source data, train a machine learning model and score new data, and visualize and publish results — all using only Dataiku's visual interface. Plus, we will use LLMs for everything from simple data prep to sophisticated development pipelines. Attend and learn how you can truly have it all with Dataiku + Databricks.

Talk by: Amanda Milberg

Writing Data-Sharing Apps Using Node.js and Delta Sharing

JavaScript remains the most-used programming language today, with more repositories on GitHub written in JavaScript than in any other language. However, JavaScript is evolving beyond just a language for web application development into a language built for tomorrow. Everyday tasks like data wrangling, data analysis, and predictive analytics are possible today directly from a web browser. For example, many popular data analytics libraries now offer JavaScript SDKs, such as TensorFlow.js.

Another popular library, Danfo.js, makes it possible to wrangle data using familiar pandas-like operations, shortening the learning curve and arming the typical data engineer or data scientist with another data tool in their toolbox. In this presentation, we’ll explore using the Node.js connector for Delta Sharing to build a data analytics app that summarizes a Twitter dataset.
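For context, Delta Sharing clients, including the Node.js connector, address a shared table through a profile file plus a three-part name. A small Python sketch of that addressing scheme (the share, schema, and table names here are made up):

```python
def table_url(profile: str, share: str, schema: str, table: str) -> str:
    # Delta Sharing clients identify a table as <profile>#<share>.<schema>.<table>.
    return f"{profile}#{share}.{schema}.{table}"

url = table_url("config.share", "tweets_share", "social", "tweets")
# With the Python client, the equivalent of the Node.js flow would be:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

The Node.js connector follows the same pattern, returning data the app can then summarize with a library like Danfo.js.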

Talk by: Will Girten

Clean Room Primer: Using Databricks Clean Rooms to Use More & Better Data in your BI, ML, & Beyond

In this session, we will discuss the foundational changes in the ecosystem, the implications of data insights on marketing, analytics, and measurement, and how companies are coming together to collaborate through data clean rooms in new and exciting ways to power mutually beneficial value for their businesses while preserving privacy and governance.

We will delve into the concept and key features of clean room technology and how they can be used to access more and better data for business intelligence (BI), machine learning (ML), and other data-driven initiatives. By examining real-world use cases of clean rooms in action, attendees will gain a clear understanding of the benefits they can bring to industries like CPG, retail, and media and entertainment. In addition, we will unpack the advantages of using Databricks as a clean room platform, specifically showcasing how interoperable clean rooms can be leveraged to enhance BI, ML and other compute scenarios. By the end of this session, you will be equipped with the knowledge and inspiration to explore how clean rooms can unlock new collaboration opportunities that drive better outcomes for your business and improved experiences for consumers.
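One core mechanism behind clean room technology is that collaborators only ever see aggregates, never row-level data. A toy sketch of an aggregate-only query with a minimum group size; the threshold value is an assumption, not a Databricks default:

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5  # assumed privacy threshold

def safe_average(rows, key, value):
    # Group rows, then suppress any group too small to protect
    # the individuals who contributed to it.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in groups.items() if len(v) >= MIN_GROUP_SIZE}
```

Real clean rooms enforce this kind of constraint at the platform level, so neither party has to trust the other's query code.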

Talk by: Matthew Karasick, and Anil Puliyeril

Lakehouse Architecture to Advance Security Analytics at the Department of State

In 2023, the Department of State surged forward on implementing a lakehouse architecture to get faster, smarter, and more effective on cybersecurity log monitoring and incident response. In addition to getting us ahead of federal mandates, this approach promises to enable advanced analytics and machine learning across our highly federated global IT environment while minimizing costs associated with data retention and aggregation.

This talk will include a high-level overview of the technical and policy challenge and a deeper technical dive on the tactical implementation choices made. We’ll share lessons learned related to governance and securing organizational support, connecting between multiple cloud environments, and standardizing data to make it useful for analytics. And finally, we’ll discuss how the lakehouse leverages Databricks in multicloud environments to promote decentralized ownership of data while enabling strong, centralized data governance practices.
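Standardizing security logs for analytics usually means mapping vendor-specific field names onto a shared schema. A minimal sketch of that idea (the field names are invented for illustration, not the Department of State's schema):

```python
# Invented vendor-to-common field mapping; real schemas vary per log source.
FIELD_MAP = {
    "src_ip": "source_ip",
    "SourceAddress": "source_ip",
    "ts": "event_time",
    "EventTime": "event_time",
}

def normalize(event: dict) -> dict:
    # Rename known fields; pass unknown fields through unchanged.
    return {FIELD_MAP.get(k, k): v for k, v in event.items()}
```

Once every source lands in the common schema, cross-source detection queries become straightforward.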

Talk by: Timothy Ahrens and Edward Moe

Leveraging Data Science for Game Growth and Matchmaking Optimization

For video games, data science solutions can be applied throughout the player lifecycle, from adtech and LTV forecasting to in-game economic system monitoring and experimentation. Databricks is used as a data and computation foundation to power these data science solutions, enabling data scientists to easily develop and deploy them for different use cases.

In this session, we will share insights on how Databricks-powered data science solutions drive game growth and improve player experiences using different advanced analytics, modeling, experimentation, and causal inference methods. We will introduce the business use cases, data science techniques, as well as Databricks demos.
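As a flavor of what matchmaking optimization can involve (the session does not specify its actual method), here is a toy skill-based pairing sketch: sort players by rating and pair adjacent ones, which keeps rating gaps small:

```python
def match_pairs(players):
    # players: list of (name, rating) tuples.
    # Greedy pairing of skill-adjacent players after sorting by rating.
    ordered = sorted(players, key=lambda p: p[1])
    return [(ordered[i][0], ordered[i + 1][0]) for i in range(0, len(ordered) - 1, 2)]
```

Production matchmakers also weigh latency, queue time, and party constraints, which is where the modeling and experimentation discussed in the talk come in.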

Talk by: Zhenyu Zhao and Shuo Chen

Sponsored by: AccuWeather | Weather Matters: How to Harness its Power to Improve Your Bottom Line

AccuWeather provides the world's most sophisticated weather intelligence to make lives and businesses simpler, safer, better, and more informed. To achieve AccuWeather's core mission and its technical and business ambitions, AccuWeather uses Databricks to support its next-generation forecasting engine and historical database, enabling its teams of scientists and engineers to develop accurate, scalable weather data solutions. Businesses can harness the power of AccuWeather's data suite by easily integrating weather information into their own decision support systems and analytics to improve their bottom line.

Talk by: Timothy Loftus and Eric Michielli

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Sponsored by: Infosys | Topaz AI First Innovations

Gain insights into Infosys' Topaz AI First Innovations, including AI-enabled analytics and AI-enabled automation, which help clients achieve significant cost savings, improved efficiency, and better customer experience across industry segments.

Talk by: Neeraj Dixit

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Sponsored: KPMG | Multicloud Enterprise Delta Sharing & Governance using Unity Catalog @ S&P Global

Cloud technologies have revolutionized global data access across a number of industries. However, many enterprise organizations face challenges in adopting these technologies effectively, as comprehensive cloud data governance strategies and solutions are complex and evolving – particularly in hybrid or multicloud scenarios involving multiple third parties. KPMG and S&P Global have harnessed the power of Databricks Lakehouse to create a novel approach.

By integrating Unity Catalog, Delta Sharing, and the KPMG Modern Data Platform, S&P Global has enabled scalable, transformative cross-enterprise data sharing and governance. This demonstration highlights a collaboration between S&P Global Sustainable1 (S1) ESG program and the KPMG ESG Analytics Accelerators to enable large-scale SFDR ESG portfolio analytics. Join us to discover our solution that drives transformative change, fosters data-driven decision-making, and bolsters sustainability efforts in a wide range of industries.

Talk by: Niels Hanson and Dennis Tally

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Sponsored: Matillion | Using Matillion to Boost Productivity w/ Lakehouse and your Full Data Stack

In this presentation, Matillion’s Sarah Pollitt, Group Product Manager for ETL, will discuss how you can use Matillion to load data from popular data sources such as Salesforce, SAP, and over a hundred out-of-the-box connectors into your data lakehouse. You can quickly transform this data using powerful tools like Matillion or dbt, or your own custom notebooks, to derive valuable insights. She will also explore how you can run streaming pipelines to ensure real-time data processing, and how you can extract and manage this data using popular governance tools such as Alation or Collibra, ensuring compliance and data quality. Finally, Sarah will showcase how you can seamlessly integrate this data into your analytics tools of choice, such as Thoughtspot, PowerBI, or any other analytics tool that fits your organization's needs.

Talk by: Rick Wear

Sponsor: Sigma Computing | Using Sigma Input Tables Helps Databricks Data Science & ML Apps

In this talk, we will dive into the powerful analytics combination of Databricks and Sigma for data science and machine learning use cases. Databricks offers a scalable and flexible platform for building and deploying machine learning models at scale, while Sigma enhances this framework with real-time data insights, analysis, and visualization capabilities. Going a step further, we will demonstrate how input tables can be utilized from Sigma to create seamless workflows in Databricks for data science and machine learning. From this workflow, business users can leverage data science and ML models to do ad-hoc analysis and make data-driven decisions.

This talk is perfect for data scientists, data analysts, business users, and anyone interested in harnessing the power of Databricks and Sigma to drive business value. Join us and discover how these two platforms can revolutionize the way you analyze and leverage your data.

Talk by: Mitch Ertle and Greg Owen

Using Open Source Tools to Build Privacy-Conscious Data Systems

With the rapid proliferation of consumer data privacy laws across the world, it is becoming a strict requirement for data organizations to be mindful of data privacy risks. Privacy violation fines are reaching record highs and will only get higher as governments continue to crack down on the runaway abuse of user data. To continue producing value without becoming a liability, data systems must include privacy protections at a foundational level.

The most practical way to do this is to enable privacy as code, shifting privacy left and including it as a foundational part of the organization's software development life cycle. The promise of privacy as code is that data organizations can be liberated from inefficient, manual workflows for producing the compliance deliverables their legal teams need, and instead ship at speed with pre-defined privacy guardrails built into the structure of their preferred workflows.

Despite being an emerging and complex problem, there are already powerful open source tools available designed to help organizations of all sizes achieve this outcome. Fides is an open source privacy-as-code tool, written in Python and TypeScript, that is engineered to tackle a variety of privacy problems throughout the application lifecycle. The most relevant feature for data organizations is the ability to annotate systems and their datasets with data privacy metadata, thus enabling automatic rejection of dangerous or illegal uses. Fides empowers data organizations to be proactive, not reactive, in terms of protecting user privacy and reducing organizational risk. Moving forward, data privacy will need to be top of mind for data teams.
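The annotate-then-reject idea can be sketched in a few lines. The category strings below are loosely modeled on the kind of taxonomy Fides uses but are illustrative only, not its actual API:

```python
# Datasets annotated with the categories of personal data they contain.
DATASET_CATEGORIES = {
    "user_profiles": {"user.contact.email", "user.name"},
    "page_views": {"user.behavior"},
}

# Categories a hypothetical policy forbids for marketing use.
FORBIDDEN_FOR_MARKETING = {"user.contact.email"}

def allowed_for_marketing(dataset: str) -> bool:
    # Automatically reject any dataset carrying a forbidden category.
    return not (DATASET_CATEGORIES[dataset] & FORBIDDEN_FOR_MARKETING)
```

Because the annotations live in code, the policy check can run in CI, which is what "shifting privacy left" means in practice.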

Talk by: Thomas La Piana

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Vector Data Lakes

Vector databases such as Elasticsearch and Pinecone offer fast ingestion and querying on vector embeddings with approximate nearest neighbor (ANN) search. However, they typically do not decouple compute and storage, making them hard to integrate into production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, it gets loaded back into the vector database in another ETL process. This is reminiscent of loading data from an OLTP database to cloud storage, then loading said data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test one billion vector dataset, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.
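The batched search described above amounts to a sequential scan over every stored vector, so per-vector latency matters far less than total throughput. A brute-force sketch in plain Python:

```python
import math

def nearest(query, embeddings):
    # Sequential scan over all stored vectors: the workload is bound by
    # read throughput rather than random-access latency, which is why
    # it suits blob-storage backends.
    best_i, best_d = -1, math.inf
    for i, vec in enumerate(embeddings):
        d = sum((a - b) ** 2 for a, b in zip(query, vec))
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```

A production version would vectorize this with NumPy or a distributed engine over Parquet files, but the access pattern, one full pass over the data, is the same.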

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ for Cloud DBMS: https://dbricks.co/3phw20d

How to Create and Manage a High-Performance Analytics Team

Data science and analytics teams are unique. Large and small corporations want to build and manage analytics teams to convert their data and analytic assets into revenue and competitive advantage, but many are failing before they make their first hire. In this session, the audience will learn how to structure, hire, manage and grow an analytics team. Organizational structure, project and program portfolios, neurodiversity, developing talent, and more will be discussed.

Questions and discussion will be encouraged. The audience will leave with a deeper understanding of how to succeed in turning data and analytics into tangible results.

Talk by: John Thompson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

Using Databricks, we built a “Unified Talent Solution” backed by a robust data and AI engine for analyzing the skills of a combined pool of permanent employees, contractors, part-time employees, and vendors, inferring skill gaps, future trends, and recommended priority areas to bridge talent gaps. This greatly improved our client's operational efficiency, transparency, commercial model, and talent experience. We leveraged a variety of ML algorithms such as boosting, neural networks, and NLP transformers to provide better AI-driven insights.

One inevitable part of developing these models within a typical data science workflow is iteration. MLflow, Databricks' end-to-end ML workflow service, helped streamline this process by organizing runs into experiments that tracked the data used for training and testing, model artifacts, lineage, and the corresponding results and metrics. For checking the health of our models with drift detection, bias, and explainability techniques, MLflow's deployment and monitoring services were leveraged extensively.
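Drift detection of the kind mentioned above can be as simple as comparing a live window's mean against the training window. A simplistic illustration (the threshold is an arbitrary assumption, not the speaker's setting):

```python
def mean_drift(reference, current, threshold=0.25):
    # Flag drift when the current mean moves more than `threshold`
    # reference standard deviations away from the reference mean.
    n = len(reference)
    ref_mean = sum(reference) / n
    ref_std = (sum((x - ref_mean) ** 2 for x in reference) / n) ** 0.5 or 1.0
    cur_mean = sum(current) / len(current)
    return abs(cur_mean - ref_mean) / ref_std > threshold
```

Real monitoring typically uses distribution-level tests (e.g. PSI or KS), but the structure, a reference window, a live window, and a threshold, is the same.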

Our solution, built on the Databricks platform, simplified ML by defining a data-centric workflow that unified best practices from DevOps, DataOps, and ModelOps. Databricks Feature Store allowed us to productionize our models and features jointly. Insights were delivered through visually appealing charts and graphs, built with Power BI, Plotly, and Matplotlib, that answer the business questions most relevant to clients. We also built our own advanced custom analytics platform on top of Delta Lake, since Delta's ACID guarantees allowed us to build a real-time reporting app that displays consistent and reliable data: React for the front end, and Structured Streaming for ingesting data from Delta tables, with live query analytics and ML predictions on real-time data.

Talk by: Nitu Nivedita

Sponsored: Accenture | Databricks Enables Employee Data Domain to Align People w/ Business Outcomes

A global franchise retailer was struggling to understand the value of its employees and had not fostered a data-driven enterprise. During the journey to use facts as the basis for decision making, Databricks became the facilitator of DataMesh and created the pipelines, analytics and source engine for a three-layer — bronze, silver, gold — lakehouse that supports the HR domain and drives the integration of multiple additional domains: sales, customer satisfaction, product quality and more. In this talk, we will walk through:

  • The business rationale and drivers
  • The core data sources
  • The data products, analytics and pipelines
  • The adoption of Unity Catalog for data privacy compliance/adherence and data management
  • Data quality metrics

Join us to see the analytic product and the design behind this innovative view of employees and their business outcomes.

Talk by: Rebecca Bucnis

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ for Cloud DBMS: https://dbricks.co/3phw20d

Sponsored by: Avanade | Enabling Real-Time Analytics with Structured Streaming and Delta Live Tables

Join the panel to hear how Avanade is helping clients enable real-time analytics and tackle the people and process problems that accompany technology, powered by Azure Databricks.

Talk by: Thomas Kim, Dael Williamson, Zoé Durand

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU