Jerry Liu is the CEO and co-founder of LlamaIndex. LlamaIndex is an open-source framework that helps people prep their data for use with large language models in a process called retrieval augmented generation. LLMs are great decision engines, but in order for them to be useful for organizations, they need additional knowledge and context, and Jerry discusses how companies are bringing their data to tailor LLMs for their needs. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
Topic: Retrieval Augmented Generation (RAG) (369 tagged)
Top Events
Create a custom chat-based solution to query and summarize your data within your VPC using Dolly 2.0 and Amazon SageMaker. In this talk, you will learn about Dolly 2.0, Databricks' state-of-the-art, open-source LLM available for commercial use, and Amazon SageMaker, AWS's premier toolkit for ML builders. You will learn how to deploy and customize models to reference your data using retrieval augmented generation (RAG) and additional fine-tuning techniques, all using open-source components available today.
Talk by: Venkat Viswanathan and Karl Albertsen
Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz
Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc
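For readers who want a feel for the pattern the abstract describes, here is a minimal, illustrative sketch of retrieval-augmented prompting against a SageMaker-hosted model. The endpoint name (`dolly-v2-endpoint`), the `{"inputs": ...}` payload shape, and the toy keyword retriever are assumptions for illustration, not the talk's actual setup; a production system would use a real vector store and the deployment described in the session.

```python
# Minimal RAG sketch: retrieve context, then call a SageMaker-hosted LLM.
# Assumptions (not from the talk): an endpoint named "dolly-v2-endpoint"
# already exists and accepts a JSON payload of the form {"inputs": "..."}.
import json

import boto3

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm Pacific, Monday through Friday.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy keyword-overlap retriever; a real system would use a vector store."""
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:k]

def ask(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName="dolly-v2-endpoint",      # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),   # payload shape depends on your serving container
    )
    return response["Body"].read().decode("utf-8")

if __name__ == "__main__":
    print(ask("How long do I have to return an item?"))
```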
In this demo, we will show you the fastest and most functional answer engine and search copilot that exists right now: Perplexity.ai. It can solve a wide array of problems, from giving you fast answers on any topic to planning trips and doing market research on things unfamiliar to you, all in a trustworthy way, without hallucinations and with references in the form of citations. This is made possible by harnessing the power of LLMs along with retrieval augmented generation from traditional search engines and indexes.
We will also show you how information discovery can now be fully personalized to you: personalization through prompt engineering. Finally, we will see use cases of how this search copilot can help you with your day-to-day tasks on a data team, whether you are a data engineer, data scientist, or data analyst.
Talk by: Aravind Srinivas
Highlights: Streaming might favor frontline singles, but some tracks buck the trend. Looking at Spotify, Apple, Amazon, and Deezer's Top 100 charts, we examine what tracks and artists are able to ride the wave of longevity.

Mission: Good morning, it's Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. We're on the socials at "chartmetric" — that's Chartmetric, no "S." Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We'd love to hear from you.

Date: This is your Data Dump for Wednesday, Sept. 18th, 2019.

Post Malone Leads Track Longevity on Streaming Charts
When it comes to streaming, we're trained to think immediacy and expendability, because, let's face it, those are the kinds of qualities that characterize today's digital singles-driven industry. On the streaming charts, however, things aren't that simple, and some tracks can ride out their Top 100 position for more than a year.

Pulling up Spotify's Daily Global Chart on our charts tab, for example, we can scroll down a little to see chart summaries according to many different variables, including "By Time on Chart." Within Spotify's Top 100, Post Malone's "Rockstar" might only be sporting a No. 81 spot, but it's been on the chart for 508 days — that's almost a year and a half. If we extend the Daily Global Chart to include the next 100 tracks, "Closer," by the Chainsmokers and Halsey, might be in a precarious position at No. 199, but the track has enjoyed some 1,103 days on Spotify's Top 200. To be clear, that's three years.

Toggling Apple's Top 100, at No. 58, Travis Scott's "Sicko Mode" claims the top spot in terms of time on chart, with 361 days, or just short of a year. Meanwhile, Amazon's Top 100 features a four-way tie at 210 days: at No. 20, it's "High Hopes," by Panic! At The Disco; No. 41 is Bebe Rexha's "Meant to Be (featuring Florida Georgia Line)"; No. 56 is "Youngblood" by 5 Seconds of Summer; and No. 60 is "Better Now," by, guess who? Post Malone.

Interestingly, Deezer's Top 100 has a six-way tie at 195 days. At No. 10, it's "Con Calma" by Daddy Yankee and Snow, while No. 19 is "Calma" by Pedro Capó and Farruko — ¾ of whom are Puerto Rican who all like to keep it cool. No. 27 is once again Post Malone, but this time with "Sunflower," from the Spider-Man: Into the Spider-Verse soundtrack. No. 66 is "Te Vi" by Piso 21 and Micro Tdh, No. 68 is "Adan Y Eva" by Paulo Londra, and No. 70 is "Giant" by Calvin Harris and Rag'n'Bone Man.

So, while Amazon and Deezer's track longevities might be a bit more evenly spread, they're also significantly lower than the longest-lasting tracks on Apple's and Spotify's charts. Another takeaway here is that Posty has managed to keep tracks from two separate releases, Beerbongs & Bentleys and the Spider-Man soundtrack, relevant — and that's irrespective of his new album, Hollywood's Bleeding, dominating the top of those same charts.

Outro: That's it for your Daily Data Dump for Wednesday, Sept. 18th, 2019. This is Rutger from Chartmetric. Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Wednesday, and we'll see you on Friday!
Highlights: It's time to hit the road again, so we're heading down south to trigger city São Paulo, Brazil. What makes it such an important global music marketplace?

Mission: Good morning, it's Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date: This is your Data Dump for Thursday, June 13th, 2019.

Excursion Thursday: Trigger City São Paulo, Brazil
We're hitting the road again, heading down south to trigger city São Paulo, Brazil, to see what makes it such an important global music marketplace. First, it's important to note that São Paulo is also a state in Brazil — naturally, the state in which São Paulo, the city, is located. Obviously, this presents some major metadata problems, which are compounded by the fact that São Paulo (with a tilde) and "Sao Paulo" (without a tilde) are reported as different cities.

Adjusting for metadata errors, the city, which is Brazil's wealthiest and most populous, is ranked third in the world for non-unique monthly Spotify listeners, based on our calculations from a week in May. For that same week, São Paulo came in ninth for global YouTube views. They're really living up to their city motto, "I am not led; I lead."

It's not just local artists and the longstanding sertanejo style updated for younger people skyrocketing São Paulo with regional streams. Scanning our top artists charts, the city comes up on three of the Top 10 artists — namely, J Balvin, Justin Bieber, and Shawn Mendes — as somewhere people listen most. Of the Top 100 artists globally according to our Cross-Platform Performance metric, São Paulo is in the Top 5 listener cities for 26, or just a bit more than a quarter, of them.

Zooming in a bit and looking at Top Artists by Spotify Monthly Listeners on São Paulo's city page, Brazilian artists do tend to dominate, with the 10 most listened-to artists, except for Lady Gaga, calling Brazil home. On Top Artists by YouTube Views, the Top 10 are all Brazilian as well, but when it comes to Top Artists by Shazam Chart Occurrences, only two Brazilians make the Top 10, suggesting São Paulo locals are loyal to their countrymen and countrywomen on major streaming platforms, but Shazam is where they learn what's happening in the Anglo music world. And they certainly have an ear for British and American hits like "Giant" by Calvin Harris and Rag 'n' Bone Man or "Happier" by Marshmello and Bastille.

With a population comparable to New York City and Los Angeles combined, São Paulo tops each of those cities on the global stage, thanks to a musical ecosystem — not to mention tradition — as robust as the Amazon rainforest and an appetite for pop hits from their neighbors on the northern side of the Tropic of Cancer.

Outro: That's it for your Daily Data Dump for Thursday, June 13th, 2019. This is Rutger from Chartmetric. If you're interested in learning more about trigger cities, check out Jason's in-depth analysis on our blog at blog.chartmetric.io. Free accounts are at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Thursday, and see you tomorrow!
Highlights: Beatport's Top 100 chart keeps highlighting the latest music in the club world, and today we're doing a special artist deep-dive into a Norwegian producer who found streaming success in a far-off land called... Seattle.

Mission: Good morning, it's Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date: This is your Data Dump for Friday, March 29th, 2019.

Charts
Beatport.com is the go-to electronic music marketplace for professional DJs around the world so they can make people dance. With over 60K suppliers and labels, 450K customers, and 35M unique annual visitors, Beatport is a B2B business at its core: providing high-quality downloads so DJs can fill their sonic arsenal. It also provides a weekly Top 100 chart that essentially becomes an up-to-date soundtrack to what clubbers are getting down to globally.

For the week ending March 22nd, "Inside My Head" by UK-based duo Audiojack took the #1 most purchased download for the second week in a row, spending 10 days on the chart. Spinnin' Records had the most tracks with 4, including a David Guetta and Tom Staar track, while legendary London-based Ministry of Sound Recordings had 2. While you may see some familiar Top 40 names such as Childish Gambino in the #3 spot or Calvin Harris & Rag'n'Bone Man at #5, the Beatport Top 100 is really an anti-pop serum: nearly half the chart's tracks have 4-minute-plus run times, zero tracks debuted directly onto the chart, and nearly ⅔ of the list spent less than a week on the chart... meaning lots of track turnover, and lots of opportunities for emerging artists with great dance music.

Artist Highlight in the News
Here's an interesting case of streaming's global nature at its finest: Norway-based house producer Simon Field found unexpected attention in Seattle upon the release of his track "Shake the Tree" on January 25th. Field, who sports a 53 Spotify Popularity Index score and 385K monthly listeners despite having only 10K followers, is an example of an emerging artist that organically over-indexes their stream count in a particular city for an unknown reason.

Upon its release, "Shake the Tree" found a snug spot in the #62 position of the 90-track New Music Friday playlist for that week, which then seemed to feed playlist adds within 24 hours on no less than 20 mid-tier playlists ranging from 10 to 80K followers. While Sony-owned playlist curator Filtr UK's "Dance All Night" was among these lists, Field had no major label support in the release, and yet from Feb 21st to March 20th saw a 455% increase in monthly listeners in Seattle, peaking at 7.4K.

Virtually mirroring Field's rise in Seattle, however, a certain mid-tier playlist called "CloudKid," run by the curator of the same name with 98K followers, added "Shake the Tree" on Feb 16th and removed it on March 21st, after which Field saw an immediate decrease in Spotify listener growth in Seattle following a month-long increase. Coincidence? Possibly, except for the fact that CloudKid is an influential electronic music label and curator who came up on YouTube, with a channel influencing 2.9M subscribers with over 906M total views.

While the connection between Seattle and CloudKid's audience is still unclear, the data suggests, at a minimum, that mid-tier, five-digit follower count playlists help propagate new music. And at its best, Field may be a unique case of cross-platform success, where a veteran YouTube curator's side hustle (here, a Spotify playlist) gave an unknowing artist streaming success thousands of miles away.

Playlist Round-Up: (none)

Outro: That's it for your Daily Data Dump for Friday, March 29th, 2019. This is Jason from Chartmetric. If you're enjoying the podcast, hit that subscribe button so you get the latest episodes at the earliest time. And if you feel like you missed something, you can get full show notes at chartmetric.transistor.fm/episodes. Have a great weekend, see you Monday!
A beginner-friendly workshop covering how LLMs work, NLP basics, transformers & attention, prompt engineering, and building AI agents with Retrieval-Augmented Generation (RAG). Includes a live demo: Your First AI Agent.
Hands-on, beginner-friendly workshop covering LLM basics, Python, retrieval-augmented generation (RAG), prompt engineering, an introduction to LangChain, and workflow automation with LangGraph, including a live demo of building your first AI agent.
Directed Acyclic Graphs (DAGs) are the foundation of most orchestration frameworks. But what happens when you allow an LLM to act as the router? Acyclic graphs now become cyclic, which means you have to design for the challenges resulting from all this extra power. We'll cover the ins and outs of agentic applications and how to best use them in your work as a data practitioner or developer building today.
➡️ Follow Us LinkedIn: https://www.linkedin.com/company/small-data-sf/ X/Twitter : https://twitter.com/smalldatasf Website: https://www.smalldatasf.com/
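To make the DAG-versus-cycle point concrete, here is a tiny, dependency-free sketch in which a router function (standing in for an LLM call) decides the next node, so the graph can revisit nodes instead of executing a fixed acyclic plan. Everything here (node names, the retry cap) is hypothetical.

```python
# Sketch of "LLM as router": a stub router decides the next node, so the
# graph can revisit nodes (a cycle) instead of running a fixed DAG once.
# The route() function stands in for an LLM call and is purely illustrative.

def draft(state):
    state["answer"] = f"draft #{state['attempts']}"
    return state

def critique(state):
    state["attempts"] += 1
    return state

def route(state) -> str:
    # An LLM would judge the draft here; we just cap the number of loops.
    return "done" if state["attempts"] >= 3 else "draft"

state = {"attempts": 0, "answer": ""}
node = "draft"
while node != "done":
    state = {"draft": draft, "critique": critique}[node](state)
    node = "critique" if node == "draft" else route(state)

print(state["answer"])  # the loop ran draft -> critique several times before stopping
```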
Discover LangChain, the open-source framework for building powerful agentic systems. Learn how to augment LLMs with your private data, moving beyond their training cutoffs. We'll break down how LangChain uses "chains," which are essentially Directed Acyclic Graphs (DAGs) similar to data pipelines you might recognize from dbt. This structure is perfect for common patterns like Retrieval Augmented Generation (RAG), where you orchestrate steps to fetch context from a vector database and feed it to an LLM to generate an informed response, much like preparing data for analysis.
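As a rough sketch of the chain-as-DAG pattern described above, the following assumes the langchain-openai, langchain-community, and faiss-cpu packages plus an OPENAI_API_KEY in the environment; the sample texts and model name are illustrative, not from the talk.

```python
# A minimal LCEL-style RAG chain: retrieve -> prompt -> LLM -> parse,
# one directed edge per step, much like a small data pipeline.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# 1. Build a tiny vector store to act as the retrieval step.
vectorstore = FAISS.from_texts(
    ["dbt models are SQL select statements.", "RAG fetches context before generation."],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

def format_docs(docs):
    # Turn retrieved Document objects into a plain-text context block.
    return "\n\n".join(d.page_content for d in docs)

# 2. Orchestrate the steps as a chain.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke("What does RAG do?"))
```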
Dive into the world of AI agents, where the LLM itself determines the application's control flow. Unlike a predefined DAG, this allows for dynamic, cyclic graphs where an agent can iterate and improve its response based on previous attempts. We'll explore the core challenges in building reliable agents: effective planning and reflection, managing shared memory across multiple agents in a cognitive architecture, and ensuring reliability against task ambiguity. Understand the critical trade-offs between the dependability of static chains and the flexibility of dynamic LLM agents.
Introducing LangGraph, a framework designed to solve the agent reliability problem by balancing agent control with agency. Through a live demo in LangGraph Studio, see how to build complex AI applications using a cyclic graph. We'll demonstrate how a router agent can delegate tasks, execute a research plan with multiple steps, and use cycles to iterate on a problem. You'll also see how human-in-the-loop intervention can steer the agent for improved performance, a critical feature for building robust and observable agentic systems.
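Here is a small, self-contained sketch of a cyclic graph in the spirit of the demo described above (it is not the demo itself). It assumes the langgraph package; the nodes are stubs rather than LLM calls, and the conditional edge plays the role of the router.

```python
# A LangGraph-style cyclic graph: a conditional edge lets the graph loop
# back and iterate on its answer before finishing.
from typing import TypedDict

from langgraph.graph import END, StateGraph

class State(TypedDict):
    attempts: int
    answer: str

def generate(state: State) -> State:
    # A real node would call an LLM; this stub just records the attempt.
    n = state["attempts"] + 1
    return {"attempts": n, "answer": f"attempt {n}"}

def should_continue(state: State) -> str:
    # An LLM-based router would grade the answer here; we just cap retries.
    return "done" if state["attempts"] >= 2 else "retry"

graph = StateGraph(State)
graph.add_node("generate", generate)
graph.set_entry_point("generate")
graph.add_conditional_edges("generate", should_continue, {"retry": "generate", "done": END})

app = graph.compile()
print(app.invoke({"attempts": 0, "answer": ""}))  # loops until the router says "done"
```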
Explore some of the most exciting AI agents in production today. See how Roblox uses an AI assistant to generate virtual worlds from a prompt, how TripAdvisor’s agent acts as a personal travel concierge to create custom itineraries, and how Replit’s coding agent automates code generation and pull requests. These real-world examples showcase the practical power of moving from simple DAGs to dynamic, cyclic graphs for solving complex, agentic problems.
This hands-on lab empowers you to build a cutting-edge multimodal question answering system using Google's Vertex AI and the powerful Gemini family of models. By constructing this system from the ground up, you'll gain a deep understanding of its inner workings and the advantages of incorporating visual information into Retrieval Augmented Generation (RAG). This hands-on experience equips you with the knowledge to customize and optimize your own multimodal question answering systems, unlocking new possibilities for knowledge discovery and reasoning.
If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account for both your work domain and personal email address. You will need to authenticate your account as well (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite. You can follow this link to sign up!
It's finally possible to bring the awesome power of Large Language Models (LLMs) to your laptop. This talk will explore how to run and leverage small, openly available LLMs to power common tasks involving data, covering how to select the right models, practical use cases for running small models, and best practices for deploying small models effectively alongside databases.
Bio: Jeffrey Morgan is the founder of Ollama, an open-source tool for getting up and running with large language models. Prior to founding Ollama, Jeffrey founded Kitematic, which was acquired by Docker and evolved into Docker Desktop. He has previously worked at companies including Docker, Twitter, and Google.
Discover how to run large language models (LLMs) locally using Ollama, the easiest way to get started with small AI models on your Mac, Windows, or Linux machine. Unlike massive cloud-based systems, small open source models are only a few gigabytes, allowing them to run incredibly fast on consumer hardware without network latency. This video explains why these local LLMs are not just scaled-down versions of larger models but powerful tools for developers, offering significant advantages in speed, data privacy, and cost-effectiveness by eliminating hidden cloud provider fees and risks.
Learn the most common use case for small models: combining them with your existing factual data to prevent hallucinations. We dive into retrieval augmented generation (RAG), a powerful technique where you augment a model's prompt with information from a local data source. See a practical demo of how to build a vector store from simple text files and connect it to a model like Gemma 2B, enabling you to query your own data using natural language for fast, accurate, and context-aware responses.
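A minimal sketch of the local RAG loop described above, assuming the ollama Python package with `gemma2:2b` pulled for generation and `nomic-embed-text` pulled for embeddings; the embedding model choice and sample documents are assumptions, and the video may build its vector store differently.

```python
# Minimal local RAG with the ollama Python package. Assumes the Ollama server
# is running and the models have been pulled (`ollama pull gemma2:2b`,
# `ollama pull nomic-embed-text`).
import math

import ollama

DOCS = [
    "The quarterly revenue report is stored in warehouse/finance/q3.csv.",
    "On-call rotations are documented in the internal runbook wiki.",
]

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Build an in-memory "vector store" from the text snippets.
index = [(doc, embed(doc)) for doc in DOCS]

def ask(question: str) -> str:
    q = embed(question)
    context = max(index, key=lambda pair: cosine(q, pair[1]))[0]  # top-1 retrieval
    reply = ollama.chat(
        model="gemma2:2b",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
    )
    return reply["message"]["content"]

print(ask("Where is the Q3 revenue report?"))
```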
Explore the next frontier of local AI with small agents and tool calling, a new feature that empowers models to interact with external tools. This guide demonstrates how an LLM can autonomously decide to query a DuckDB database, write the correct SQL, and use the retrieved data to answer your questions. This advanced tutorial shows you how to connect small models directly to your data engineering workflows, moving beyond simple chat to create intelligent, data-driven applications.
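The following sketch hand-rolls the idea rather than using Ollama's tool-calling feature: it asks a local model to write SQL for a question, runs that SQL in DuckDB, and feeds the rows back for a final answer. The table, data, and fence-cleanup logic are illustrative assumptions, not the tutorial's actual setup.

```python
# Hand-rolled "query your database" loop with a local model and DuckDB.
import duckdb
import ollama

con = duckdb.connect()
con.execute("CREATE TABLE orders (region VARCHAR, amount DOUBLE)")
con.execute("INSERT INTO orders VALUES ('west', 120.0), ('east', 80.0), ('west', 45.5)")

question = "What is the total order amount per region?"

# 1. Ask the model to write SQL for the question, given the schema.
raw = ollama.chat(
    model="gemma2:2b",
    messages=[{
        "role": "user",
        "content": "Table orders(region VARCHAR, amount DOUBLE). "
                   f"Write one DuckDB SQL query (no explanation) answering: {question}",
    }],
)["message"]["content"]
sql = raw.strip().strip("`").removeprefix("sql").strip()  # naive cleanup of markdown fences

# 2. Execute the generated SQL locally.
rows = con.execute(sql).fetchall()

# 3. Let the model turn the rows into a natural-language answer.
answer = ollama.chat(
    model="gemma2:2b",
    messages=[{"role": "user", "content": f"Question: {question}\nSQL result rows: {rows}\nAnswer briefly."}],
)["message"]["content"]
print(answer)
```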
Get started with practical applications for small models today, from building internal help desks to streamlining engineering tasks like code review. This video highlights how small and large models can work together effectively and shows that open source models are rapidly catching up to their cloud-scale counterparts. There has never been a better time for developers and data analysts to harness the power of local AI.
Building an assistant capable of answering complex, company-specific questions and executing workflows requires first building a powerful Retrieval Augmented Generation (RAG) system. Founding engineer Eddie Zhou explains how Glean built its RAG system on Google Cloud, combining a domain-adapted search engine with dynamic prompts to harness the full capabilities of Gemini's reasoning engine. By attending this session, your contact information may be shared with the sponsor for relevant follow-up for this event only.
Build smart context systems with retrieval-augmented generation (RAG) pipelines, vector embeddings, or built-in search capabilities.
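One minimal way to build the vector-embedding flavor of such a context system, using the sentence-transformers package as an example embedding provider; the model choice and documents are illustrative.

```python
# Minimal embedding-based retrieval for a "context system": embed documents,
# embed the query, and return the closest documents by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first business day of each month.",
    "The API rate limit is 100 requests per minute per key.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)  # one vector per document

def retrieve_context(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # dot product equals cosine similarity for normalized vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve_context("How often do I get billed?"))
```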
AI is reshaping NetOps from scripted automation to intelligent, data-driven workflows. We will show use cases including incident triage, knowledge retrieval, traffic analysis, and prediction, and contrast legacy monitoring with ML, NLP, and LLM-based approaches. See how RAG, text-to-SQL, and agent workflows enable real-time insights across hybrid data. We will outline data pipelines and MLOps; address accuracy, reliability, cost, and compliance; and weigh build vs. buy. We will also cover API integration and human-in-the-loop guardrails.
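As one concrete illustration of a human-in-the-loop guardrail in an agentic NetOps workflow, the sketch below lets a stubbed model propose a remediation but requires operator approval before anything executes; the alert, device name, and action are hypothetical.

```python
# Human-in-the-loop guardrail sketch: the model may *propose* an action,
# but nothing executes without explicit operator approval.

def propose_remediation(alert: str) -> str:
    # In a real workflow an LLM would generate this from the alert plus runbooks.
    return f"shutdown interface GigabitEthernet0/1 on edge-router-3  # triggered by: {alert}"

def execute(action: str) -> None:
    print(f"[EXECUTED] {action}")  # placeholder for the real device/API call

def run_with_approval(alert: str) -> None:
    action = propose_remediation(alert)
    decision = input(f"Proposed action:\n  {action}\nApprove? [y/N] ").strip().lower()
    if decision == "y":
        execute(action)
    else:
        print("[SKIPPED] Operator rejected the proposed action.")

if __name__ == "__main__":
    run_with_approval("High error rate on edge-router-3 uplink")
```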
This talk will guide you through the initial steps of integrating LM Studio with Elastic. We'll discuss how to establish a connection to LM Studio using a connector, conduct tests, and begin developing a RAG application in Python.
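A rough sketch of the end state such an application might reach, assuming LM Studio's OpenAI-compatible local server on its default port and the Elasticsearch 8.x Python client; the index name, document field, and model id are assumptions, and this does not reproduce the Elastic connector setup covered in the talk.

```python
# RAG sketch: retrieve passages from Elasticsearch, then ask a model served
# by LM Studio's OpenAI-compatible local server (default http://localhost:1234/v1).
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")
llm = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

def answer(question: str) -> str:
    # 1. Retrieve a few matching passages from an existing "docs" index.
    hits = es.search(index="docs", query={"match": {"content": question}}, size=3)
    context = "\n".join(hit["_source"]["content"] for hit in hits["hits"]["hits"])

    # 2. Ask the locally served model to answer from that context.
    chat = llm.chat.completions.create(
        model="local-model",  # whatever model is currently loaded in LM Studio
        messages=[{
            "role": "user",
            "content": f"Use only this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return chat.choices[0].message.content

print(answer("How do I rotate API keys?"))
```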
Build a multimodal search engine with Gemini and Vertex AI. This hands-on lab demonstrates Retrieval Augmented Generation (RAG) to query documents containing text and images. Learn to extract metadata, generate embeddings, and search using text or image queries.
Learn how to build an advanced AI agent using Azure Database for PostgreSQL and Semantic Kernel. This hands-on lab walks you through integrating Retrieval-Augmented Generation (RAG), semantic re-ranking, Semantic Operators, and GraphRAG (using Apache AGE) to enable intelligent legal question answering using real case data. Gain practical AI implementation skills with your own PostgreSQL-backed applications.
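As a hedged sketch of just the vector-retrieval step in a PostgreSQL-backed RAG application, the snippet below uses psycopg and the pgvector extension; the connection string, table schema, and query embedding are placeholders, and the lab's Semantic Kernel, semantic re-ranking, and GraphRAG/Apache AGE pieces are not shown.

```python
# Retrieval step against PostgreSQL + pgvector: order rows by vector distance
# to a query embedding and return the closest matches.
import psycopg

# Assume a table like: cases(id BIGINT, summary TEXT, embedding VECTOR(3)).
# Real embeddings would have hundreds of dimensions; 3 keeps the sketch short.
query_embedding = "[0.12, -0.03, 0.88]"  # produced by your embedding model

with psycopg.connect("postgresql://user:password@localhost:5432/legal") as conn:
    rows = conn.execute(
        """
        SELECT id, summary
        FROM cases
        ORDER BY embedding <=> %s::vector   -- cosine distance: smallest = most similar
        LIMIT 5
        """,
        (query_embedding,),
    ).fetchall()

for case_id, summary in rows:
    print(case_id, summary[:80])
```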
In this hands-on workshop, you’ll learn to build domain-specific AI agents with Foundry Agent Service. Starting from a simple agent, you’ll add system prompts, custom instructions, and knowledge with RAG. You’ll extend it with tool calling (like a pizza calculator) and connect external services via MCP for live menu and order handling. By the end, you’ll have a working Contoso PizzaBot that can answer questions, recommend pizzas, and manage orders.