talk-data.com

Topic

ELK

Elasticsearch/ELK Stack

search_engine log_analysis elk_stack

13

tagged

Activity Trend: peak of 10 activities per quarter, 2020-Q1 to 2026-Q1

Activities

13 activities · Newest first

AWS re:Invent 2025 - SageMaker HyperPod: Checkpointless & elastic training for AI models (AIM3338)

Transform your generative AI model development with checkpointless and elastic training on Amazon SageMaker HyperPod. Learn how checkpointless training eliminates costly downtime by automatically recovering from infrastructure faults in minutes instead of hours, using peer-to-peer state transfer without relying on restarting from checkpoints. Discover how elastic training can dynamically expand to claim idle accelerators or gracefully contract when higher-priority tasks need capacity, all without manual intervention. See how these innovations help you maintain forward training momentum despite infrastructure faults or fluctuations in resource availability, helping you scale and accelerate generative AI model development across hundreds to thousands of AI accelerators.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Balance cost, performance & reliability for AI at enterprise scale (AIM3304)

Deploying generative AI at enterprise scale requires balancing performance, cost, and reliability across diverse business purposes and use cases. Amazon Bedrock offers a complete portfolio of inference options, with on-demand cross-region inference for elastic scaling, on-demand service tiers for balancing performance and cost, including optimization options like prompt caching for improving latency while significantly reducing cost, and batch inference for cost-effective bulk processing. This interactive session covers the tools and approaches needed to architect hybrid inference strategies that enable enterprises to maximize price-performance ratios as AI workloads scale.


AWS re:Invent 2024 - Customer Keynote Autodesk

Design software pioneer Autodesk is transforming computer-aided design (CAD) by harnessing generative AI and Amazon Web Services (AWS). The company is developing advanced AI foundation models, like "Project Bernini," which can generate precise 2D and 3D geometric designs based on physical principles.

By utilizing AWS technologies such as Amazon DynamoDB, Elastic MapReduce (EMR), Amazon SageMaker, and Elastic Fabric Adapter, Autodesk has significantly enhanced its AI development process. These innovations have halved foundation model development time and increased AI productivity by 30%.

Learn more about AWS events: https://go.aws/events


#reInvent2024 #AWSreInvent2024 #AWSEvents

Keyword search is dead! And so are Solr and Elasticsearch? by Daniel Wrigley

Big Data Europe: onsite and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q

Join our next Big Data Europe conference on 22-25 November 2022, where you can learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.

David Pilato: Enriching Postal Addresses With Elastic Stack

Discover the power of enriching postal addresses with the Elastic Stack in this live coding session led by David Pilato. 🌍🛠️ Learn how to transform poorly formatted addresses into valuable location data and vice versa using Elasticsearch, Logstash, and Kibana, with a special emphasis on Elasticsearch's ingest pipelines. Don't miss out on unlocking the potential to map customer locations and enhance your data systems!📍📈 #ElasticStack #AddressEnrichment

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

David Pilato: Search: A New Era

Embark on a journey into the future of search with David Pilato as he unveils 'Search: A New Era.' 🚀🔍 Explore the evolution from traditional TF/IDF to cutting-edge machine learning and models in search technology. Dive deep into topics like vector search, OpenAI's ChatGPT integration, and the latest advancements in search methodologies, including demonstrations on generating music embeddings and more! 🎶💡 #SearchTechnology #MachineLearning #elasticsearch


Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

Many organizations rely on complex cloud data architectures that create silos between applications, users and data. This fragmentation makes it difficult to access accurate, up-to-date information for analytics, often resulting in the use of outdated data. Enter the lakehouse, a modern data architecture that unifies data, AI, and analytics in a single location.

This session explores why the lakehouse is the best data warehouse, featuring success stories, use cases and best practices from industry experts. You'll discover how to unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI. Additionally, you'll learn how Databricks SQL can help lower costs and get started in seconds with on-demand, elastic SQL serverless warehouses, and how to empower analytics engineers and analysts to quickly find and share new insights using their preferred BI and SQL tools such as Fivetran, dbt, Tableau, or Power BI.

Talk by: Miranda Luna and Cyrielle Simeone

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Vector Data Lakes

Vector databases such as Elasticsearch and Pinecone offer fast ingestion and querying on vector embeddings with approximate nearest neighbor (ANN) search. However, they typically do not decouple compute and storage, making them hard to integrate into production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, it gets loaded back into the vector database in another ETL process. This is reminiscent of loading data from an OLTP database to cloud storage, then loading that data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test dataset of one billion vectors, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.
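The access pattern described above, a sequential scan over stored embeddings with no random access, can be illustrated with a minimal sketch. It is not the speakers' implementation: the function name batched_top_k is hypothetical, in-memory NumPy arrays stand in for chunks that would be read from Parquet files, and exact inner-product scoring is assumed rather than any particular index.

```python
import numpy as np


def batched_top_k(queries, chunks, k):
    """Exact top-k neighbors by inner product, scanning chunks sequentially.

    queries: (q, d) float array.
    chunks: iterable of (n_i, d) float arrays; a stand-in for row groups
            that would be streamed from blob storage (assumption).
    Returns (scores, ids), each of shape (q, k), sorted by descending score.
    """
    q = queries.shape[0]
    best_scores = np.full((q, k), -np.inf, dtype=np.float32)
    best_ids = np.full((q, k), -1, dtype=np.int64)
    offset = 0
    for chunk in chunks:
        scores = queries @ chunk.T                        # (q, n_i)
        ids = np.arange(offset, offset + chunk.shape[0])  # global row ids
        # Merge the running top-k with this chunk's candidates.
        all_scores = np.concatenate([best_scores, scores], axis=1)
        all_ids = np.concatenate(
            [best_ids, np.broadcast_to(ids, scores.shape)], axis=1
        )
        top = np.argsort(-all_scores, axis=1)[:, :k]
        best_scores = np.take_along_axis(all_scores, top, axis=1)
        best_ids = np.take_along_axis(all_ids, top, axis=1)
        offset += chunk.shape[0]
    return best_scores, best_ids


# Synthetic unit-norm embeddings; the first three rows double as queries.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)
queries = data[:3]
chunks = np.array_split(data, 7)
scores, ids = batched_top_k(queries, chunks, k=5)
```

Each chunk is read once and only a small running top-k is kept per query, so the cost is dominated by how fast chunks can be streamed, which is consistent with the throughput-bound behavior the abstract describes.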

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d


Scaling Uber's Metric System from Elasticsearch to Pinot | Uber

ABOUT THE TALK: Uber has used real-time systems for years to support time-sensitive, critical use cases. These include Gairos, which was initiated in the Marketplace Org and then widely used across the company since 2014, and uMetric, which has emerged rapidly since 2020.

Continuous effort has gone into the reliability and performance of these real-time platforms to cope with traffic growth, an increasing number of users, and a widening variety of use cases, along with follow-on work such as operating cost, resource planning, and optimization feature development. This presentation shares what was done right to solve these challenges, including fully replacing Elasticsearch with Apache Pinot as the real-time storage of our ecosystem.

ABOUT THE SPEAKERS: Yupeng Fu is a Principal Engineer at Uber, where he leads the Real-time Data platform and the Search platform. He is also an Apache Pinot committer.

Nan Ding is a staff engineer at Uber and leads data platform reliability and performance for the Marketplace uMetric team.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture

Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.

This post explores different ways to build serverless Kafka and Spark multi-cloud architectures across regions and continents. We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data lakehouse. Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.


Comprehensive Patient Data Self-Serve Environment and Executive Dashboards Leveraging Databricks

In this talk, we will outline our data pipelines and demo dashboards developed on top of the resulting Elasticsearch index. This tool enables queries for terms or phrases in the raw documents to be executed together with any associated EMR patient data filters within 1-2 seconds for a dataset containing millions of records/documents. Finally, the dashboards are simple to use and enable Real World Evidence data stakeholders to gain real-time statistical insight into the comprehensive patient information available.
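As a rough illustration of combining a phrase search with structured patient filters, the sketch below builds an Elasticsearch bool query body as a plain Python dictionary. It is not the speakers' actual query: the function name build_patient_query and the field names document_text, patient.gender, and patient.age are hypothetical, and the index mapping is assumed.

```python
import json


def build_patient_query(phrase, gender=None, min_age=None, max_age=None, size=20):
    """Build a search body: phrase match on document text plus structured filters.

    All field names here are hypothetical placeholders.
    """
    filters = []
    if gender is not None:
        filters.append({"term": {"patient.gender": gender}})
    age_range = {}
    if min_age is not None:
        age_range["gte"] = min_age
    if max_age is not None:
        age_range["lte"] = max_age
    if age_range:
        filters.append({"range": {"patient.age": age_range}})
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored clause: the phrase must appear in the raw document.
                "must": [{"match_phrase": {"document_text": phrase}}],
                # Non-scored clauses: structured patient filters.
                "filter": filters,
            }
        },
    }


body = build_patient_query("shortness of breath", gender="F", min_age=40, max_age=65)
print(json.dumps(body, indent=2))
```

Clauses under filter run in filter context, so they narrow the result set without affecting relevance scores, which suits exact-match patient attributes.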


Data Lake for State Health Exchange Analytics using Databricks

One of the largest state-based health exchanges in the country was looking to modernize its data warehouse (DWH) environment to support the vision that every decision to design, implement and evaluate its state-based health exchange portal is informed by timely and rigorous evidence about its consumers’ experiences. The scope of the project was to replace the existing Oracle-based DWH with an analytics platform that could support a much broader range of requirements and provide unified analytics capabilities, including machine learning. The modernized analytics platform comprises a cloud-native data lake and DWH solution using Databricks. The solution provides significantly higher performance and elastic scalability to better handle larger and varying data volumes, with a much lower cost of ownership than the existing solution. In this session, we will walk through the rationale behind tool selection, solution architecture, project timeline and benefits expected.


Data Warehousing on the Lakehouse

Most organizations routinely operate their business with complex cloud data architectures that silo applications, users and data. As a result, there is no single source of truth for analytics, and most analysis is performed with stale data. To solve these challenges, the lakehouse has emerged as the new standard for data architecture, with the promise to unify data, AI and analytic workloads in one place. In this session, we will cover why the data lakehouse is the next best data warehouse. You will hear success stories, use cases, and best practices from experts in the field, and discover how the data lakehouse ingests, stores and governs business-critical data at scale to build a curated data lake for data warehousing, SQL and BI workloads. You will also learn how Databricks SQL can help you lower costs and get started in seconds with instant, elastic SQL serverless compute, and how to empower every analytics engineer and analyst to quickly find and share new insights using their favorite BI and SQL tools, like Fivetran, dbt, Tableau or Power BI.
