talk-data.com talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

99

tagged

Activity Trend

471 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: Databricks DATA + AI Summit 2023 ×
Optimizing Speed and Scale of User-Facing Analytics Using Apache Kafka and Pinot

Apache Kafka is the de facto standard for real-time event streaming, but what do you do if you want to perform user-facing, ad-hoc, real-time analytics too? That's where Apache Pinot comes in.

Apache Pinot is a realtime distributed OLAP datastore, which is used to deliver scalable real time analytics with low latency. It can ingest data from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources such as Kafka. Pinot is used extensively at LinkedIn and Uber to power many analytical applications such as Who Viewed My Profile, Ad Analytics, Talent Analytics, Uber Eats and many more serving 100k+ queries per second while ingesting 1Million+ events per second.

Apache Kafka's highly performant, distributed, fault-tolerant, real-time publish-subscribe messaging platform powers big data solutions at Airbnb, LinkedIn, MailChimp, Netflix, the New York Times, Oracle, PayPal, Pinterest, Spotify, Twitter, Uber, Wikimedia Foundation, and countless other businesses.

Come hear from Neha Power, Founding Engineer at a StarTree and PMC and committer of Apache Pinot, and Karin Wolok, Head of Developer Community at StarTree, on an introduction to both systems and a view of how they work together.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Orchestration Made Easy with Databricks Workflows

Orchestrating and managing end-to-end production pipelines have remained a bottleneck for many organizations. Data teams spend too much time stitching pipeline tasks and manually managing and monitoring the orchestration process – with heavy reliance on external or cloud-specific orchestration solutions, all of which slow down the delivery of new data. In this session, we introduce you to Databricks Workflows: a fully managed orchestration service for all your data, analytics, and AI, built in the Databricks Lakehouse Platform. Join us as we dive deep into the new workflow capabilities, and understand the integration with the underlying platform. You will learn how to create and run reliable production workflows, centrally manage and monitor workflows, and learn how to implement recovery actions such as repair and run, as well as other new features.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors

At Apple, data scientists and engineers are running enormous Spark workloads to deliver amazing cloud services. Apple Cloud Service supports the ever-increasing scale of Spark workloads and resource requirements with great user experience: from code to deployment management, one interface for all compute backends.

In this talk, Aaruna and Zhou would walk through the lessons we learnt and pitfalls encountered for supporting the service at Apple scale - we would share how Apple Cloud Services effectively orchestrate Spark applications, as well as the seamless switchover among different resource managers - be it in Mesos or Kubernetes, private or on-premise infrastructure. We will also cover the monitoring system and how it helps tuning Spark resource requirements with actual execution analysis.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Beyond Monitoring: The Rise of Data Observability

"Why did our dashboard break?" "What happened to my data?" "Why is this column missing?" If you've been on the receiving end of these messages (and many others!) from downstream stakeholders, you're not alone. Data engineering teams spend 40 percent or more of their time tackling data downtime, or periods of time when data is missing, erroneous, or otherwise inaccurate, and as data systems become increasingly complex and distributed, this number will only increase. To address this problem, data observability is becoming an increasingly important part of the cloud data stack, helping engineers and analysts reduce time to detection and resolution for data incidents caused by faulty data, code, and operational environments. But what does data observability actually look like in practice? During this presentation, Barr Moses, CEO and co-founder of Monte Carlo, will present on how some of today's best data leaders implement observability across their data lake ecosystem and share best practices for data teams seeking to achieve end-to-end visibility into their data at scale. Topics addressed will include: building automated lineage for Apache Spark, applying data reliability workflows, and extending beyond testing and monitoring to solve for unknown unknowns in your data pipelines.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Building Enterprise Scale Data and Analytics Platforms at Amgen

Amgen has developed a suite of enterprise data & analytics platforms powered by modern, cloud native and open source technologies, that have played a vital role in building game changing analytics capabilities within the organization. Our platforms include a mature Data Lake with extensive self service capabilities, a Data Fabric with semantically connected data, a Data Marketplace for advanced cataloging, an intelligent Enterprise search among others to solve for a range of high value business problems. In this talk, we - Amgen and our partner ZS Associates - will share learning from our journey so far, best practices for building enterprise scale data & analytics platforms, and describe several business use cases and how we leverage modern technologies such as Databricks to enable our business teams. We will cover use cases related to Delta Lake, microservices, platform monitoring, fine grained security, and more.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Pushing the limits of scale/performance for enterprise-wide analytics: A fire-side chat with Akamai

With the world’s most distributed compute platform — from cloud to edge — Akamai makes it easy for businesses to develop and run applications, while keeping experiences closer to users and threats farther away. ​So when it was time to scale it’s legacy Hadoop-like infrastructure reaching its capacity limits, while keeping their global operations running uninterrupted, Akamai partnered with Microsoft and Databricks to migrate to Azure Databricks.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Securing Databricks on AWS Using Private Link

Minimizing data transfers over the public internet is among the top priorities for organizations of any size, both for security and cost reasons. Modern cloud-native data analytics platforms need to support deployment architectures that meet this objective. For Databricks on AWS such an architecture is realized thanks to AWS PrivateLink, which allows computing resources deployed on different virtual private networks and different AWS accounts to communicate securely without ever crossing the public internet.

In this session, we want to provide a brief introduction to AWS Private Link and its main use cases in the context of a Databricks deployment: securing communications between control and data plane and securely connecting to the Databricks Web UI. We will then provide step-by-step walkthrough of the steps required in setting up PrivateLink connections with a Databricks deployment and demonstrate how to automate that process using AWS Cloud Formation or Terraform templates.

In this presentation we will cover the following topics: - Brief Introduction to AWS Private Link - How you can use PrivateLink to secure your AWS Databricks deployment - Step-by-step walkthrough of how to set up Private Link - How to automate and scale the setup using AWS CloudFormation or Terraform

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture

Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.

This post explores different architecture to build serverless Kafka and Spark multi-cloud architectures across regions and continents. We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data lakehouse. Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Customer-centric Innovation to Scale Data & AI Everywhere

Imagine a world where you have the flexibility to infuse intelligence into every application, from edge to cloud. In this session, you will learn how Intel is enabling customer-centric innovation and delivering the simplicity, productivity, and performance the developers need to scale their data and AI solutions everywhere. An overview of Intel end-to-end data analytics and AI technologies, developer tools as well as examples of customers use cases will be presented.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Tackling Challenges of Distributed Deep Learning with Open Source Solutions

Deep learning has had an enormous impact in a variety of domains, however, with model and data size growing at a rapid pace, scaling out deep learning training has become essential for practical use.

In this talk, you will learn about the challenges and various solutions for distributed deep learning.

We will first cover some of the common patterns used to scale out deep learning training.

We will then describe some of the challenges with distributed deep learning in practice: Infrastructure and hardware management Spending too much time managing clusters, resources, and the scheduling/placement of jobs or processes. Developer iteration speed. Too much overhead to go from small-scale local ML development to large-scale training Hard to run distributed training jobs in a notebook/interactive environment. Difficulty integrating with open source software. Scale out training while still being able to leverage open source tools such as MLflow, Pytorch Lightning, and Huggingface Managing large-scale training data. Efficiently ingest large amounts of training data to my distributed machine learning model. Cloud compute costs. Leverage cheaper spot instances, without having to restart training in case of node pre-emption. Easily switch between cloud providers to reduce costs without rewriting all my code

Then, we will share the merits of the ML open source ecosystem for distributed deep learning. In particular, we will introduce Ray Train, an open source library built on the Ray distributed execution framework, and show how it’s integrations with other open source libraries (PyTorch, Huggingface, MLflow, etc.) alleviate the pain points above.

We will conclude with a live demo showing large-scale distributed training using these open source tools.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Cloud Native Geospatial Analytics at JLL

Luis Sanz, CEO of CARTO and Yanqing Zeng, Lead Data Scientist at JLL, take us through how cloud native geospatial analytics can be unlocked on the Databricks Lakehouse platform with CARTO. Yanqing will showcase her work on large scale spatial analytics projects to address some of the most critical analysis use cases in Real Estate. Taking a geospatial perspective, Yanqing will share practical examples of how large-scale spatial data and analytics can be used for property portfolio mapping, AI-driven risk assessment, real estate valuation and more.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Data Lake for State Health Exchange Analytics using Databricks

One of the largest State based health exchanges in the country was looking to modernize their data warehouse (DWH) environment to support the vision that every decision to design, implement and evaluate their state-based health exchange portal is informed by timely and rigorous evidence about its consumers’ experiences. The scope of the project was to replace existing Oracle-based DWH with an analytics platform that could support a much broader range of requirements with an ability to provide unified analytics capabilities including machine learning. The modernized analytics platform comprises a cloud native data lake and DWH solution using Databricks. The solution provides significantly higher performance and elastic scalability to better handle larger and varying data volumes with a much lower cost of ownership compared to the existing solution. In this session, we will walk through the rationale behind tool selection, solution architecture, project timeline and benefits expected.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Road to a Robust Data Lake: Utilizing Delta Lake & Databricks to Map 150 Million Miles of Roads

In the past, stream processing over data lakes required a lot of development efforts from data engineering teams, as Itai has shown in his talk at Spark+AI Summit 2019 (https://tinyurl.com/2s3az5td). Today, with Delta Lake and Databricks Auto Loader, this becomes a few minutes' work! Not only that, it unlocks a new set of ways to efficiently leverage your data.

Nexar, a leading provider of dynamic mapping solutions, utilizes Delta Lake and advanced features such as Auto Loader to map 150 million miles of roads a month and provide meaningful insights to cities, mobility companies, driving apps, and insurers. Nexar’s growing dataset contains trillions of images that are used to build and maintain a digital twin of the world. Nexar uses state-of-the-art technologies to detect road furniture (like road signs and traffic lights), surface markings, and road works.

In this talk, we will describe how you can efficiently ingest, process, and maintain a robust Data Lake, whether you’re a mapping solutions provider, a media measurement company, or a social media network. Topics include: * Incremental & efficient streaming over cloud storage such as S3 * Storage optimizations using Delta Lake * Supporting mutable data use-cases with Delta Lake

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Simplifying Migrations to Lakehouse—the Databricks Way

Customers around the world are experiencing tremendous success migrating from legacy on-premises Hadoop architectures to a modern Databricks Lakehouse in the cloud. At Databricks, we have formulated a migration methodology that helps customers sail through this migration journey with ease. In this talk, we will touch upon some of the key elements that minimize risks and simplify the process of migrating to Databricks, and will walk through some of the customer journeys and use cases.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Supercharge your SaaS applications with a modern, cloud-native database

Today’s world demands modern applications that process data at faster speeds and deliver real-time insights. Yet the challenge for most businesses is their data infrastructure isn't designed for data intensity — the idea that high volumes of data should be quickly ingested and processed, no matter how complex or diverse the data sets. How do you meet the demands of a data-intensive application? It starts with the right database. This session gives you a roadmap with key criteria for powering modern, data-intensive applications with a cloud-native database — and how three customers drove up to 100x better performance for their applications.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Survey of Production ML Tech Stacks

Production machine learning demand stitching together many tools ranging from open source standards to cloud-specific and third party solutions. This session surveys the current ML deployment technology landscape to contextualize which tools solve for which features off production ML systems such as CI/CD, REST endpoint, and monitoring. It'll help answer the questions: what tools are out there? Where do I start with the MLops tech stack for my application? What are the pros and cons of open source versus managed solutions? This talk takes a features driven approach to tool selection for MLops tacks to provide best practices in the most rapidly evolving field of data science.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Take Databricks Lakehouse to the Max with Informatica​

The hard part of ML and analytics is not building data models. It’s getting the data right and into production. Join us to learn how Informatica’s Intelligent Data Management Cloud (IDMC) helps you maximize the benefits of the Databricks’ Unified Analytics platform. Learn how our cloud-native capabilities can shorten your time to results. See how to enable more data users to easily load data and develop data engineering workflows on Databricks in ELT mode at scale. Find out how Informatica delivers all the necessary governance and compliance guardrails you need to operate analytics, AI and ML. Accelerate adoption and maximize agility while maintaining control of your data and lowering risk.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

The Future is Open - a Look at Google Cloud’s Open Data Ecosystem

Join Anagha Khanolkar and Mansi Maharana, both Cloud Customer Engineers specialized in Advanced Analytics, to learn about Open Data Analytics on Google Cloud. This session will cover Google Data Cloud's Open Data Analytics portfolio, value proposition, customer stories, trends, and more, and including Databricks on GCP.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

The Future of Data - What’s Next with Google Cloud

Join Bruno Aziza, Head of Data and Analytics, Google Cloud, for an in-depth look at what he is seeing in the future of data and emerging trends. He will also cover Google Cloud’s data analytics practice, including insights into the Data Cloud Alliance, Big Lake, and our strategic partnership with Databricks.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Databricks SQL Under the Hood: What's New with Live Demos

With serverless SQL compute and built-in governance, Databricks SQL lets every analyst and analytics engineer easily ingest, transform, and query the freshest data directly on your data lake, using their tools of choice like Fivetran, dbt, PowerBI or Tableau, and standard SQL. There is no need to move data to another system. All this takes place at virtually any scale, at a fraction of the cost of traditional cloud data warehouses. Join this session for a deep dive into how Databricks SQL works under the hood, and see a live end-to-end demo of the data and analytics on Databricks from data ingestion, transformation, and consumption, using the modern data stack along with Databricks SQL.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/