talk-data.com

Topic: Databricks (tags: big_data, analytics, spark; 561 tagged activities)

Activity Trend: peak 515 activities/quarter (2020-Q1 to 2026-Q1)

Activities

Filtering by: Databricks DATA + AI Summit 2023
Building Recommendation Systems Using Graph Neural Networks

RECKON (RECommendation systems using KnOwledge Networks) is a machine learning project centered on improving our entity intelligence.

We represent the dataset of our site interactions as a heterogeneous graph. The nodes represent various entities in the underlying data (Users, Articles, Authors, etc.). Edges between nodes represent interactions between these entities (User u has read article v, Article u was written by author v, etc.).

RECKON uses a GNN-based encoder-decoder architecture to learn representations for important entities in our data, leveraging both their individual features and the interactions between them through repeated graph convolutions.
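The heterogeneous graph described above can be sketched in plain Python. The node types, toy feature vectors, and the single mean-aggregation step below are illustrative assumptions for exposition, not RECKON's actual implementation.

```python
# Minimal sketch of a heterogeneous interaction graph and one
# mean-aggregation message-passing step (illustrative only).

# Nodes keyed by (type, id), each with a small feature vector.
nodes = {
    ("user", "u1"): [1.0, 0.0],
    ("article", "a1"): [0.0, 1.0],
    ("article", "a2"): [0.5, 0.5],
    ("author", "w1"): [1.0, 1.0],
}

# Typed edges: (source, relation, target).
edges = [
    (("user", "u1"), "read", ("article", "a1")),
    (("user", "u1"), "read", ("article", "a2")),
    (("article", "a1"), "written_by", ("author", "w1")),
]

def propagate(nodes, edges):
    """One graph-convolution step: each node's new representation is
    the mean of its own features and its neighbors' features."""
    out = {}
    for node, feats in nodes.items():
        neighbor_feats = [nodes[src] for src, _, dst in edges if dst == node]
        neighbor_feats += [nodes[dst] for src, _, dst in edges if src == node]
        stacked = [feats] + neighbor_feats
        out[node] = [sum(col) / len(stacked) for col in zip(*stacked)]
    return out

embeddings = propagate(nodes, edges)
```

In a real GNN the aggregation would be relation-aware and learned; this sketch only shows how information flows between entity types through the edges.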

Personalized recommendations play an important role in improving our users' experience and retaining them. We will walk through some of the techniques we have incorporated in RECKON and an end-to-end build of this product on Databricks, along with a demo.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Cloud Fetch: High-bandwidth Connectivity With BI Tools

Business Intelligence (BI) tools such as Tableau and Microsoft Power BI are notoriously slow at extracting large query results from traditional data warehouses because they typically fetch the data in a single thread through a SQL endpoint that becomes a data transfer bottleneck. Data analysts can connect their BI tools to Databricks SQL endpoints to query data in tables through an ODBC/JDBC protocol integrated in our Simba drivers. With Cloud Fetch, which we released in Databricks Runtime 8.3 and Simba ODBC 2.6.17 driver, we introduce a new mechanism for fetching data in parallel via cloud storage such as AWS S3 and Azure Data Lake Storage to bring the data faster to BI tools. In our experiments using Cloud Fetch, we observed a 10x speed-up in extract performance due to parallelism.
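The parallel-fetch idea can be sketched with a thread pool: the endpoint returns a list of (hypothetical) presigned cloud-storage URLs, and the client downloads the result chunks concurrently instead of through a single stream. The chunk-fetching function below is a synthetic stand-in, not the Simba driver API.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for presigned cloud-storage URLs returned by the SQL endpoint.
chunk_urls = [f"https://example-bucket/results/chunk-{i}" for i in range(8)]

def fetch_chunk(url):
    """Placeholder for an HTTP GET of one result chunk.
    Here we just synthesize some rows so the sketch is runnable."""
    chunk_id = int(url.rsplit("-", 1)[1])
    return [(chunk_id, row) for row in range(3)]

def fetch_all(urls, max_workers=4):
    """Download all chunks in parallel, then concatenate in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(fetch_chunk, urls))  # map preserves input order
    return [row for chunk in chunks for row in chunk]

rows = fetch_all(chunk_urls)
```

The speed-up comes from overlapping many downloads: the single ODBC/JDBC stream is replaced by N independent object-store reads that the client reassembles in order.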

Computational Data Governance at Scale

This talk is about the implementation of a Data Mesh at Fozzy Group. In our experience, the biggest bottleneck in the transition to a Data Mesh is unclear data ownership. This and other issues can be solved with (federated) computational data governance. We will go through the process of building a global data lineage with 200k tables, 40k table replications, and 70k SQL stored procedures. We will also cover our lessons from building a data product culture with explicit and automated tracking of ownership and data quality. Fozzy Group is a holding company comprising about 40 different businesses with 60k employees across various domains: retail, banking, insurance, logistics, agriculture, HoReCa, e-commerce, etc.

Databricks and Enterprise Observability with Overwatch

Join us for a quick discussion and demonstration of how Overwatch can help you understand cost, utilization, workloads, and much more across the Databricks platform today. You will learn what Overwatch is and how to get started, and we will explore several common observability questions it helps customers answer every day.

Databricks Meets Power BI

Databricks and Spark are becoming increasingly popular and are now used as a modern data platform to analyze real-time or batch data. In addition, Databricks offers a great integration for machine learning developers.

Power BI, on the other hand, is a great platform for easy graphical analysis of data: it brings hundreds of different data sources together, lets you analyze them jointly, and makes them accessible on any device.

So let's just bring both worlds together and see how well Databricks works with Power BI.

Data On-Board: The Aerospace Revolution

From airplanes to satellites, through mission systems, the aerospace industry generates a huge amount of data waiting to be exploited. All this information is shaping new concepts and capabilities that, thanks to artificial intelligence, will forever change the industry: autonomous flight, fault prediction, automatic problem detection, and energy efficiency, among many others. To achieve this, we face countless challenges, such as rigorous AI certification and trustworthiness, safety, data integrity, and security, on this exciting Airbus journey: welcome to the aerospace revolution!

Data Policy in the Past, Present, and Future
video
by Jacob Pasner, PhD (technology policy team, Office of Senator Ron Wyden)

Federal, State, and Local governments are doing their best to play catch-up as the modern world is revolutionized by the collection and utilization of vast data repositories. This leaves them in the awkward position of implementing their own data modernization projects while simultaneously cleaning up the regulatory mess created by the widely accepted "go fast and break things" mentality of industry. This talk draws on Jacob Pasner's data science background and his yearlong experience as a Science and Technology Policy fellow on Senator Ron Wyden's technology policy team. The discussion will center on his work navigating the complex inner workings of Federal executive and legislative branch bureaucracies while drafting first-of-its-kind government data legislation, advocating for congressional modernization projects, and leading data privacy oversight of industry. Note: The opinions presented here are Jacob Pasner's alone and do not represent the views of Senator Ron Wyden or his office.

Deep-Dive into Delta Lake

Delta Lake is becoming a de facto standard for storing large amounts of data for analytical purposes in a data lake. But what is behind it? How does it work under the hood? In this session we will dive deep into the internals of Delta Lake by unpacking the transaction log, and also highlight some common pitfalls when working with Delta Lake (and show how to avoid them).
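The core idea of the transaction log can be illustrated with a toy replay: each commit is a list of JSON actions, and the current table state is the set of files that were added and not later removed. This is a deliberately simplified sketch of the mechanism, not Delta Lake's actual log format or reader.

```python
import json

# Toy _delta_log: one JSON-lines string per commit (simplified actions).
commits = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
]

def replay(commits):
    """Replay the log in order: live files = added files minus removed files."""
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live

current_files = replay(commits)
```

Because readers reconstruct state by replaying ordered, append-only commits, time travel and rollbacks fall out naturally: replay fewer commits and you see an older version of the table.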

Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach

As gatekeepers of the financial system, banks play a crucial role in reporting possible instances of financial crime. At the same time, criminals continuously reinvent their approaches to hide their activities among dense transaction data. In this talk, we describe the challenges of detecting money laundering and outline why employing machine learning via MLOps is critically important to identify complex and ever-changing patterns.

In anti-money-laundering, machine learning answers a dire need for vigilance and efficiency where previous-generation systems fall short. We will demonstrate how our open platform facilitates a gradual migration toward a model-driven landscape, using the example of transaction monitoring to showcase applications of supervised and unsupervised learning, human explainability, and model monitoring. This environment enables us to drive change from the ground up in how the bank understands its clients to detect financial crime.
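As one hedged illustration of the unsupervised side, a simple z-score flag on transaction amounts shows the basic shape of such a detector; real transaction-monitoring models are far more sophisticated, and the data below is invented.

```python
# Minimal unsupervised anomaly sketch: flag transactions whose amount
# deviates strongly from the population mean (illustrative only).
amounts = [120.0, 95.0, 110.0, 130.0, 105.0, 9800.0]

def zscore_flags(values, threshold=2.0):
    """Return True for each value more than `threshold` standard
    deviations from the mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [abs(v - mean) / std > threshold for v in values]

flags = zscore_flags(amounts)
```

Production systems replace this rule with learned models, but the workflow is the same: score every transaction, surface outliers, and route them to human investigators with an explanation.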

Disrupting the Prescription Drug Market with AI and Data

The US prescription drug market has known issues: overpriced, unaffordable drugs; razor-thin margins for pharmacies; widespread inefficiencies; and an endemic lack of transparency.

AI and data are key technologies to fix these issues and benefit patients, pharmacies, and providers. AI-driven price optimization brings price transparency and removes inefficiencies, making prescription drugs more affordable. Drug recommendations and personalization empower consumers and providers with better choices, knowledge, and control.

To support all these solutions, we have built our intelligent Pharma AI and Data Platform. Built on the Databricks platform, we delivered it into production in 2.5 months, deploying our innovative AI Optimized Pricing models, supporting tens of thousands of pharmacies, and connecting millions of consumers. Continuing on our journey, we are building an AI prescription recommender, a medication adherence improver, and healthcare personalization.

Elixir: The Wickedly Awesome Batch and Stream Processing Language You Should Have in Your Toolbox

Elixir is a programming language, compatible with Erlang VM bytecode, that is growing in popularity.

In this session I will show how you can apply Elixir towards solving data engineering problems in novel ways.

Examples include:
• How to leverage Erlang's lightweight distributed process coordination to run clusters of workers across Docker containers and perform data ingestion.
• A framework that hooks Elixir functions as steps into Airflow graphs.
• How to consume and process Kafka events directly within Elixir microservices.

For each of the above I'll show real system examples and walk through the key elements step by step. No prior familiarity with Erlang or Elixir will be required.

From PostGIS to Spark SQL: The History and Future of Spatial SQL

In this talk, we'll review the major milestones that have defined Spatial SQL as the powerful tool for geospatial analytics that it is today.

From the early foundations of the JTS Topology Suite and GEOS and their application in the PostGIS extension for PostgreSQL, to the latest implementations in Spark SQL using libraries such as the CARTO Analytics Toolbox for Databricks, Spatial SQL has been a key component of many geospatial analytics products and solutions. It leverages the computing power of different databases with SQL as a lingua franca, allowing easy adoption by data scientists, analysts, and engineers.

The latest innovation in this area is the CARTO Spatial Extension for Databricks, which makes the most of the near-unlimited scalability provided by Spark and the cutting-edge geospatial capabilities that CARTO offers.
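To make the idea of a spatial predicate concrete, here is a minimal pure-Python ray-casting test of the kind that underlies functions such as ST_Contains in GEOS-backed Spatial SQL implementations. It is a simplified sketch, not the libraries' actual code, and it ignores edge cases like points exactly on a boundary.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray from (x, y) to +infinity
    against each polygon edge; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        crosses = (y1 > y) != (y2 > y)  # edge spans the ray's y level
        if crosses and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

# Unit square, counter-clockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Spatial SQL engines expose this kind of geometric test as a SQL function and distribute its evaluation across partitions, which is what makes the same predicate usable on billions of rows.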

GIS Pipeline Acceleration with Apache Sedona

At CKDelta, we ingest and process massive amounts of geospatial data. Using Apache Sedona together with Databricks has accelerated our data pipelines many times over.

In this talk, we'll talk about migrating the existing pipelines to Sedona + PySpark and the pitfalls we encountered along the way.

Git for Data Lakes—How lakeFS Scales Data Versioning to Billions of Objects

Modern data lake architectures rely on object storage as the single source of truth. We use them to store an increasing amount of data, which is increasingly complex and interconnected. While scalable, these object stores provide few safety guarantees: they lack the semantics for atomicity, rollbacks, and reproducibility that are needed for data quality and resiliency.

lakeFS, an open-source data version control system designed for data lakes, solves these problems by introducing concepts borrowed from Git: branching, committing, merging, and rolling back changes to data.

In this talk you'll learn about the challenges with using object storage for data lakes and how lakeFS enables you to solve them.

By the end of the session you'll understand how lakeFS scales its Git-like data model to petabytes of data across billions of objects, without affecting throughput or performance. We will also demo branching, writing data using Spark, and merging it on a billion-object repository.
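The Git-like model can be sketched as metadata-only pointers: a commit maps logical paths to immutable object identifiers, and a branch is just a named pointer to a commit, so branching copies no data. This toy model (the names and structure are our own assumptions) is far simpler than lakeFS itself, but it shows why the operations scale.

```python
# Toy Git-for-data model: commits are immutable path -> object-id maps,
# branches are named pointers to commits. Branching copies no objects.
commits = {}   # commit_id -> {path: object_id}
branches = {}  # branch_name -> commit_id

def commit(branch, changes):
    """Create a new commit on `branch` by applying `changes` to its tip."""
    parent = branches.get(branch)
    snapshot = dict(commits.get(parent, {}))  # metadata copy only
    snapshot.update(changes)
    commit_id = f"c{len(commits)}"
    commits[commit_id] = snapshot
    branches[branch] = commit_id
    return commit_id

def create_branch(name, source):
    """O(1) branch creation: copy the pointer, not the objects."""
    branches[name] = branches[source]

commit("main", {"events/part-0": "obj-aaa"})
create_branch("experiment", "main")
commit("experiment", {"events/part-1": "obj-bbb"})
```

Because every operation touches only the path-to-object mapping, branching and rolling back stay cheap no matter how many terabytes the objects themselves hold.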

Graph-Based Stream Processing

The understanding of complex relationships and interdependencies between different data points is crucial to many decision-making processes.

Graph analytics have found their way into every major industry, from marketing and financial services to transportation. Fraud detection, recommendation engines and process optimization are some of the use cases where real-time decisions are mission-critical, and the underlying domain can be easily modeled as a graph.
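A minimal sketch of the idea: maintain a graph incrementally as edge events stream in, and evaluate a decision rule on each update. The shared-counterparty rule below is a hypothetical fraud heuristic invented for illustration, not taken from the talk.

```python
from collections import defaultdict

# Incrementally maintained adjacency: account -> set of counterparties.
graph = defaultdict(set)

def on_edge(src, dst, alerts):
    """Process one streamed transaction edge and apply a toy rule:
    alert when two accounts would share 2+ counterparties (possible ring)."""
    for other in list(graph):
        if other != src and len(graph[other] & (graph[src] | {dst})) >= 2:
            alerts.append((src, other))
    graph[src].add(dst)

alerts = []
events = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y")]
for src, dst in events:
    on_edge(src, dst, alerts)
```

The point is that the graph is never rebuilt from scratch: each event updates local state and triggers a local check, which is what makes real-time graph decisions feasible on a stream.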

Hassle-Free Data Ingestion into the Lakehouse

Ingesting data from hundreds of different data sources is critical before organizations can execute advanced analytics, data science, and machine learning. Unfortunately, ingesting and unifying this data to create a reliable single source of truth is usually extremely time-consuming and costly. In this session, discover how Databricks simplifies data ingestion, at low latency, with SQL-only ingestion capabilities. We will discuss and demonstrate how you can easily and quickly ingest any data into the lakehouse. The session will also cover newly-released features and tools that make data ingestion even simpler on the Databricks Lakehouse Platform.

How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks

Data-driven personalization is an insurmountable challenge for AT&T's data science team because of the size of the datasets and the complexity of the data engineering. These data preparation tasks often take hours or even days to complete, and some fail outright, affecting productivity.

In this session, the AT&T Data Science team will talk about how the RAPIDS Accelerator for Apache Spark and the Photon runtime on Databricks can be leveraged to process these extremely large datasets, resulting in improved content recommendation, classification, and more, while reducing infrastructure costs. The team will compare speedups and costs against the regular Databricks Runtime Apache Spark environment. The tested datasets range from 2 TB to 50 TB, consisting of data collected over periods of 1 to 31 days.

The talk will showcase results from both the RAPIDS Accelerator for Apache Spark and the Databricks Photon runtime.

How Databricks Is Driving Disruptive Digital Transformation in the Airline Industry

In today’s business climate, leaders need to embrace continual change to stay ahead of potential impacts to their business. For Wizz Air, the fastest-growing airline in Europe, embracing digital transformation has been key to overcoming critical challenges: COVID-related travel restrictions, unpredictable fuel costs, and even war. By migrating and modernizing data with Databricks, Wizz Air is accelerating its transformation at massive scale, driving substantial operational efficiencies and better customer experiences. Join a conversation between leaders from Avanade and Wizz Air to learn how Databricks is helping take the airline to new heights.

How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities

For this talk, Labelbox has invited the Electric Power Research Institute (EPRI) to share how it is using computer vision, drone technology, and Labelbox's training data platform to reduce wildfire risks innate to electricity delivery. This talk is a great starting point for any data team tackling difficult computer vision projects. The Labelbox team will demonstrate how teams can produce their own annotated datasets, as EPRI did, and import them into the Lakehouse for AI with the Labelbox Connector for Databricks.

Mechanical failures of overhead electrical infrastructure, in certain environments, are described in utility wildfire mitigation plans as potential ignition concerns. The utility industry is evaluating drones and new inspection technologies that may support more efficient and timely identification of such at-risk assets. EPRI will present several of its AI initiatives and their impact on wildfire prevention and the proper maintenance of power lines.

How McAfee Leverages Databricks on AWS at Scale

McAfee, a global leader in online protection, enables home users and businesses to stay ahead of fileless attacks, viruses, malware, and other online threats. Learn how McAfee leverages Databricks on AWS to create a centralized data platform as a single source of truth to power customer insights. We will also describe how McAfee uses additional AWS services, specifically Kinesis and CloudWatch, to provide real-time data streaming and to monitor and optimize its Databricks on AWS deployment. Finally, we'll discuss business benefits and lessons learned during McAfee's petabyte-scale migration to Databricks on AWS using Databricks Delta clone technology coupled with network, compute, and storage optimizations on AWS.
