Welcome back to the podcast! The host, Mukundan Sankar, is an experienced data professional and AI researcher. This episode discusses Retrieval Augmented Generation (RAG) and how it's transforming our relationship with information.

The Problem of Information Overload: We are constantly bombarded with information, making it challenging to find what truly matters. Traditional AI models and search engines can return inaccurate or outdated information, or even fabricate it (AI hallucination).

What is RAG? RAG is an AI approach combining retrieval and generation, offering the best of both worlds. Retrieval: Like a super-powered search engine, it searches vast data sources (documents, articles, reports) for the information most relevant to the user's query. Generation: Takes the retrieved data and summarizes it clearly, concisely, and engagingly.

How RAG Differs from Traditional Methods: RAG goes beyond simple keyword matching; it seeks deeper connections, patterns, and contextual data. It's grounded in real-time data from reliable sources, improving accuracy and trustworthiness.

Real-World Applications of RAG: Personalized News Podcasts: RAG can scan news articles, extract key points, and convert them into an easily digestible audio format; see the host's blog post on applying RAG to convert text news to audio. Research Summarization: It can condense complex research papers and scientific reports into key takeaways, saving users time and effort. Efficient Workflows: RAG can summarize lengthy reports, highlighting the most crucial points for faster decision-making.

The Benefits of RAG: Personalized Learning and Information Processing: RAG filters out irrelevant data and presents only what's useful to the individual. Increased Efficiency: It automates information gathering and summarization, freeing up time for other tasks.

The Importance of Responsible AI Use: While RAG is a powerful tool, its impact depends on our choices. It's crucial to use RAG ethically and thoughtfully to shape a positive future.

What's Next? Don't miss out on future episodes exploring exciting tech trends, data projects, and innovations! If you found this useful, please subscribe to stay updated! Embrace curiosity, keep learning, and stay tuned: the AI revolution is just beginning! You can also find the host on Medium and Substack, including a blog post on the application of RAG to news articles.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit mukundansankar.substack.com
Topic: Retrieval Augmented Generation (RAG)
More on GenAI, hallucinations, RAG, use cases, LLMs, SLMs, and costs with Armand Ruiz, Director, watsonx Client Engineering, and John Webb, Principal, Client Engineering. With this and the previous episode, you'll be wiser on AI than 98% of the world.
00:12 Hallucinations
02:33 RAG Differentiation
06:41 Why IBM in AI
09:23 Use Cases
11:02 The GenAI Resume
13:37 watsonx
15:40 LLMs
17:51 Experience Counts
20:03 AI that Surprises
23:46 AI Skills
26:47 Switching LLMs
27:13 The Cost and SLMs
28:21 Prompt Engineering
29:16 For Fun
LinkedIn: linkedin.com/in/armand-ruiz, linkedin.com/in/john-webb-686136127
Website: https://www.ibm.com/client-engineering
Love what you're hearing? Don't forget to rate us on your favorite platform! Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
This project aims to leverage retrieval augmented generation (RAG) and fine-tuning of LLMs to create an AI-based assistant for mental health, which could be used to support a psychotherapist.
Retrieval is the process of searching a large database for items (images, text, …) that are similar to one or more query items. A classical approach is to transform the database items and the query item into vectors (also called embeddings) with a trained model so that they can be compared via a distance metric. Retrieval has many applications in various fields, e.g. building a visual recommendation system like Google Lens, or RAG (Retrieval Augmented Generation), a technique used to inject specific knowledge into LLMs depending on the query. Vector databases ease the management, serving, and retrieval of these vectors in production, and implement efficient indexes to rapidly search through millions of vectors. They have gained a lot of attention over the past year due to the rise of LLMs and RAG.
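As a rough illustration of that classical approach, here is a minimal sketch assuming the sentence-transformers package and its all-MiniLM-L6-v2 model (both assumptions, not tied to any specific project above): documents and a query become vectors, compared via cosine similarity.

```python
# A minimal embedding-retrieval sketch: encode "database" items and a query,
# then rank by cosine similarity (a dot product once vectors are normalized).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "How to reset a forgotten password",
    "Our refund policy for annual plans",
    "Troubleshooting slow database queries",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)
query_vector = model.encode(["my password no longer works"], normalize_embeddings=True)

# Highest dot product = nearest neighbor under cosine distance.
scores = doc_vectors @ query_vector.T
best = int(np.argmax(scores))
print(documents[best], float(scores[best]))
```

A vector database plays the same role as the `doc_vectors` array here, but with persistence, metadata, and an index that avoids comparing the query against every stored vector.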
Although people working with LLMs are increasingly familiar with the basic principles of vector databases, the finer details and nuances often remain obscure. This lack of clarity hinders the ability to make optimal use of these systems.
In this talk, we will detail two examples of real-life projects (deduplication of real estate adverts using the image embedding model DINOv2, and RAG for a medical company using the text embedding model Ada-2) and take a deep dive into retrieval and vector databases to demystify the key aspects and highlight the limitations: the HNSW index, a comparison of the providers, metadata filtering (the related drop in performance when filtering out too many nodes, and how indexing partially helps), partitioning, reciprocal rank fusion, the performance and limitations of the representations created by SOTA image and text embedding models, and more.
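For context on the HNSW index mentioned above, here is a toy sketch using the hnswlib package (one concrete HNSW implementation; the talk itself compares providers more broadly). The dataset is random and purely illustrative.

```python
# HNSW in a nutshell: a graph-based approximate nearest-neighbor index.
# M and ef trade recall against speed and memory.
import numpy as np
import hnswlib

dim, n = 128, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
# M: graph connectivity per node; ef_construction: build-time search width.
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(vectors, ids=np.arange(n))

# ef at query time: larger values raise recall at the cost of latency.
index.set_ef(64)
labels, distances = index.knn_query(vectors[:1], k=5)
print(labels, distances)

# Note: metadata filtering typically happens outside this core graph search,
# which is why heavy filtering can sharply degrade HNSW performance.
```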
The first episode of The Pragmatic Engineer Podcast is out. Expect similar episodes every other Wednesday. You can add the podcast in your favorite podcast player, and have future episodes downloaded automatically. Listen now on Apple, Spotify, and YouTube.

Brought to you by:
• Codeium: Join the 700K+ developers using the IT-approved AI-powered code assistant.
• TLDR: Keep up with tech in 5 minutes

On the first episode of the Pragmatic Engineer Podcast, I am joined by Simon Willison. Simon is one of the best-known software engineers experimenting with LLMs to boost his own productivity: he's been doing this for more than three years, blogging about it in the open. Simon is the creator of Datasette, an open-source tool for exploring and publishing data. He works full-time developing open-source tools for data journalism, centered on Datasette and SQLite. Previously, he was an engineering director at Eventbrite, joining through the acquisition of Lanyrd, a Y Combinator startup he co-founded in 2010. Simon is also a co-creator of the Django Web Framework. He has been blogging about web development since the early 2000s.

In today's conversation, we dive deep into the realm of GenAI and talk about the following:
• Simon's initial experiments with LLMs and coding tools
• Why fine-tuning is generally a waste of time, and when it's not
• RAG: an overview
• Interacting with GPT's voice mode
• Simon's day-to-day LLM stack
• Common misconceptions about LLMs and ethical gray areas
• How Simon's productivity has increased and his generally optimistic view on these tools
• Tips, tricks, and hacks for interacting with GenAI tools
• And more!

I hope you enjoy this episode.

In this episode, we cover:
(02:15) Welcome
(05:28) Simon's 'scary' experience with ChatGPT
(10:58) Simon's initial experiments with LLMs and coding tools
(12:21) The languages that LLMs excel at
(14:50) To start with LLMs by understanding the theory, or by playing around?
(16:35) Fine-tuning: what it is, and why it's mostly a waste of time
(18:03) Where fine-tuning works
(18:31) RAG: an explanation
(21:34) The expense of running testing on AI
(23:15) Simon's current AI stack
(29:55) Common misconceptions about using LLM tools
(30:09) Simon's stack – continued
(32:51) Learnings from running local models
(33:56) The impact of Firebug and the introduction of open-source
(39:42) How Simon's productivity has increased using LLM tools
(41:55) Why most people should limit themselves to 3-4 programming languages
(45:18) Addressing ethical issues and resistance to using generative AI
(49:11) Are LLMs plateauing? Is AGI overhyped?
(55:45) Coding vs. professional coding, looking ahead
(57:27) The importance of systems thinking for software engineers
(1:01:00) Simon's advice for experienced engineers
(1:06:29) Rapid-fire questions

Where to find Simon Willison:
• X: https://x.com/simonw
• LinkedIn: https://www.linkedin.com/in/simonwillison/
• Website: https://simonwillison.net/
• Mastodon: https://fedi.simonwillison.net/@simon

Referenced:
• Simon's LLM project: https://github.com/simonw/llm
• Jeremy Howard's fast.ai: https://www.fast.ai/
• jq programming language: https://en.wikipedia.org/wiki/Jq_(programming_language)
• Datasette: https://datasette.io/
• GPT Code Interpreter: https://platform.openai.com/docs/assistants/tools/code-interpreter
• OpenAI Playground: https://platform.openai.com/playground/chat
• Advent of Code: https://adventofcode.com/
• Rust programming language: https://www.rust-lang.org/
• Applied AI Software Engineering: RAG: https://newsletter.pragmaticengineer.com/p/rag
• Claude: https://claude.ai/
• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet
• ChatGPT can now see, hear, and speak: https://openai.com/index/chatgpt-can-now-see-hear-and-speak/
• GitHub Copilot: https://github.com/features/copilot
• What are Artifacts and how do I use them?: https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them
• Large Language Models on the command line: https://simonwillison.net/2024/Jun/17/cli-language-models/
• Llama: https://www.llama.com/
• MLC Chat on the App Store: https://apps.apple.com/us/app/mlc-chat/id6448482937
• Firebug: https://en.wikipedia.org/wiki/Firebug_(software)
• NPM: https://www.npmjs.com/
• Django: https://www.djangoproject.com/
• SourceForge: https://sourceforge.net/
• CPAN: https://www.cpan.org/
• OOP: https://en.wikipedia.org/wiki/Object-oriented_programming
• Prolog: https://en.wikipedia.org/wiki/Prolog
• SML: https://en.wikipedia.org/wiki/Standard_ML
• Stable Diffusion: https://stability.ai/
• Chain-of-thought prompting: https://www.promptingguide.ai/techniques/cot
• Cognition AI: https://www.cognition.ai/
• In the Race to Artificial General Intelligence, Where's the Finish Line?: https://www.scientificamerican.com/article/what-does-artificial-general-intelligence-actually-mean/
• Black swan theory: https://en.wikipedia.org/wiki/Black_swan_theory
• Copilot Workspace: https://githubnext.com/projects/copilot-workspace
• Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
• Bluesky Global: https://www.blueskyglobal.org/
• The Atrocity Archives (Laundry Files #1): https://www.amazon.com/Atrocity-Archives-Laundry-Files/dp/0441013651
• Rivers of London: https://www.amazon.com/Rivers-London-Ben-Aaronovitch/dp/1625676158/
• Vanilla JavaScript: http://vanilla-js.com/
• jQuery: https://jquery.com/
• Fly.io: https://fly.io/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
Graph Retrieval Augmented Generation (Graph RAG) is emerging as a powerful addition to traditional vector-search retrieval methods. Graphs are great at representing and storing heterogeneous and interconnected information in a structured manner, effortlessly capturing complex relationships and attributes across different data types. Using open-weights LLMs removes the dependency on an external LLM provider while retaining complete control over the data flows and how the data is shared and stored. In this talk, we leverage the structured nature of graph databases, which organize data as nodes and relationships, to enhance the depth and contextuality of retrieved information in RAG-based applications built on open-weights LLMs. We will show these capabilities with a demo.
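To make the retrieval step concrete, here is a hedged sketch using the official neo4j Python driver. The URI, credentials, and schema (Entity nodes with generic relationships) are illustrative assumptions, not the presenters' actual setup.

```python
# Graph RAG retrieval step: pull the neighborhood of a matched entity and
# flatten nodes and relationships into text an LLM can consume as context.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def fetch_context(entity_name: str, limit: int = 10) -> str:
    query = (
        "MATCH (e:Entity {name: $name})-[r]-(nb) "
        "RETURN e.name AS src, type(r) AS rel, nb.name AS dst LIMIT $limit"
    )
    with driver.session() as session:
        records = session.run(query, name=entity_name, limit=limit)
        return "\n".join(f"{rec['src']} -{rec['rel']}-> {rec['dst']}" for rec in records)

context = fetch_context("Acme Corp")
prompt = f"Answer using only this graph context:\n{context}\n\nQuestion: Who supplies Acme Corp?"
# `prompt` would then be sent to a locally hosted open-weights LLM,
# keeping both the data and the generation step under your control.
```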
Retrieval-augmented generation (RAG) has become a key application for large language models (LLMs), enhancing their responses with information from external databases. However, RAG systems are prone to errors, and their complexity has made evaluation a critical and challenging area. Various libraries (like RAGAS and TruLens) have introduced evaluation tools and metrics for RAGs, but these evaluations involve using one LLM to assess another, raising questions about their reliability. Our study examines the stability and usefulness of these evaluation methods across different datasets and domains, focusing on the effects of the choice of the evaluation LLM, query reformulation, and dataset characteristics on RAG performance. It also assesses the stability of the metrics on multiple runs of the evaluation and how metrics correlate with each other. The talk aims to guide users in selecting and interpreting LLM-based evaluations effectively.
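As a minimal sketch of the LLM-as-judge pattern under discussion, and of measuring its run-to-run stability: `judge_llm` is a hypothetical callable standing in for whichever evaluation LLM (or library such as RAGAS or TruLens) is being studied.

```python
# Because an LLM judge is stochastic, score several runs and report the
# spread; a large standard deviation is itself a warning about reliability.
import statistics

def judge_llm(prompt: str) -> float:
    """Hypothetical: returns a 0-1 faithfulness score from an evaluation LLM."""
    raise NotImplementedError("wire up your evaluation LLM here")

def faithfulness(question: str, context: str, answer: str, runs: int = 5):
    prompt = (
        "Rate from 0 to 1 how faithful the answer is to the context.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer: {answer}\nScore:"
    )
    scores = [judge_llm(prompt) for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)
```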
In this session, we will focus on fine-tuning, continuous pretraining, and retrieval-augmented generation (RAG) to customize foundation models using Amazon Bedrock. Attendees will explore and compare strategies such as prompt engineering, which reformulates tasks into natural language prompts, and fine-tuning, which involves updating the model's parameters based on new tasks and use cases. The session will also highlight the trade-offs between usability and resource requirements for each approach. Participants will gain insights into leveraging the full potential of large models and learn about future advancements aimed at enhancing their adaptability.
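For a flavor of the trade-off between prompt engineering and RAG on Bedrock, here is a hedged sketch using the boto3 Converse API. The model ID and region are assumptions; any model you have been granted access to would do.

```python
# Same question, two strategies: prompt-only vs. retrieval-augmented.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(prompt: str) -> str:
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

question = "What is our parental-leave policy?"
# Prompt engineering alone: the model answers from its training data.
baseline = ask(question)
# RAG: the same question, grounded in a retrieved enterprise document
# (the snippet below is a made-up placeholder for retrieved content).
retrieved = "HR handbook, section 4: employees receive 16 weeks of paid leave."
grounded = ask(f"Answer from this context only:\n{retrieved}\n\nQuestion: {question}")
```

Fine-tuning and continued pretraining, by contrast, change the model's weights rather than its inputs, which is why they cost more in data and compute but require no extra context at inference time.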
Join us as we explore how the use of Graph Retrieval Augmented Generation technology enables our customers to interact with our rich data.
Crafting Tech Stacks to Embrace Traditional and Generative AI in Enterprise Environments

In this talk, Bas will present a reference architecture for machine learning systems that incorporates MLOps standards and best practices. This blueprint promises scalability and effectiveness for ML platforms, integrating modern technological concepts such as feature stores, vector stores, and model registries seamlessly into the architecture. With a spotlight on emerging generative AI techniques like retrieval-augmented generation, attendees will gain valuable insights into harnessing the power of modern AI practices. Additionally, Bas will delve into the aspects of MLOps, including feedback loops and model monitoring, ensuring a holistic understanding of how to operationalize and optimize ML systems for sustained success.
In the rapidly evolving world of enterprise AI, traditional monolithic approaches are giving way to more agile and efficient architectures. This session will delve into how Multi-Agent Retrieval-Augmented Generation Systems (MARS) are transforming enterprise software development for AI applications. Learn about the core components of AI agents, the challenges of integrating LLMs with enterprise data, and how to build scalable, accurate, and high-performing AI applications.
In the era of AI-driven applications, personalization is paramount. This talk explores the concept of Full RAG (Retrieval-Augmented Generation) and its potential to revolutionize user experiences across industries. We examine four levels of context personalization, from basic recommendations to highly tailored, real-time interactions.
The presentation demonstrates how increasing levels of context - from batch data to streaming and real-time inputs - can dramatically improve AI model outputs. We discuss the challenges of implementing sophisticated context personalization, including data engineering complexities and the need for efficient, scalable solutions.
Introducing the concept of a Context Platform, we showcase how tools like Tecton can simplify the process of building, deploying, and managing personalized context at scale. Through practical examples in travel recommendations, we illustrate how developers can easily create and integrate batch, streaming, and real-time context using simple Python code, enabling more engaging and valuable AI-powered experiences.
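As an illustrative sketch of those escalating context levels (not Tecton's actual API; all helper functions are hypothetical), here is how batch, streaming, and real-time signals might merge into a travel-recommendation prompt:

```python
# Each layer of context tightens personalization: batch alone suggests
# broadly, while streaming and real-time signals let the current session
# shape the answer.
def batch_context(user_id: str) -> dict:
    return {"home_airport": "AMS", "past_trips": ["Lisbon", "Oslo"]}   # e.g. nightly job

def streaming_context(user_id: str) -> dict:
    return {"recently_viewed": ["Kyoto ryokans"]}                      # e.g. clickstream

def realtime_context(request) -> dict:
    return {"device": "mobile", "local_time": "23:40"}                 # current request

def build_prompt(user_id: str, request, question: str) -> str:
    context = {**batch_context(user_id), **streaming_context(user_id),
               **realtime_context(request)}
    return f"User context: {context}\nRecommend a trip. {question}"
```

A Context Platform's job is to make each of these functions a managed, low-latency feature lookup rather than a bespoke data pipeline.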
From a data perspective, an ideal scenario is one where practitioners can have a meaningful conversation with their data. In an era where data is both abundant and critical, the need for innovative methods to interact with and understand complex datasets has never been greater. Enter GraphRAG (Graph-based Retrieval-Augmented Generation), a cutting-edge approach that revolutionizes data interaction by seamlessly integrating graph theory with generative AI.
GraphRAG leverages the power of a knowledge graph to represent relationships within data, enabling more intuitive navigation and retrieval of relevant information. By augmenting these capabilities with state-of-the-art generation models, GraphRAG provides users with enriched, context-aware outputs that significantly surpass traditional query-response systems.
Attendees will gain insights into the underlying principles of GraphRAG, its architectural components, and practical applications across various domains, from healthcare to finance. We will demonstrate real-world use cases, showcasing how GraphRAG not only improves efficiency and accuracy in data handling but also democratizes access to complex insights, empowering users to reach their ideal state of conversing with their data. Join us to discover how GraphRAG is paving the way for the future of intelligent data interaction.
Generative AI (GenAI) has garnered significant attention for its potential to revolutionize various industries, from creative arts to data analysis. However, organizations are realizing that implementing GenAI is not as easy as just asking ChatGPT a few questions. Providing the most relevant and accurate contextual data to the LLM is critical if organizations are going to realize the full benefits of GenAI. Retrieval Augmented Generation, or RAG, is a well-understood and effective technique for augmenting the original user prompt with additional, contextual data. However, many examples of RAG grossly oversimplify the reality of enterprise data ecosystems. In this session, we will examine how a Logical Data Fabric can make RAG a practical reality in large, complex organizations and deliver AI-ready data that makes RAG effective and accurate.
A 30-minute demo of how to use Redpanda Connect (powered by Benthos) to generate vector embeddings on streaming text. This session will walk through the architecture and configuration used to seamlessly integrate Redpanda Connect with LangChain, OpenAI, and MongoDB Atlas to build a complete Retrieval Augmented Generation data pipeline.
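The demo itself uses Redpanda Connect configuration; as a rough plain-Python analogue of the same ingestion pipeline (an assumption for illustration, not the session's actual config), one could consume text from a topic, embed it with OpenAI, and store vectors in MongoDB Atlas:

```python
# Streaming RAG ingestion: topic -> embedding -> vector store.
# Topic name, collection names, and connection strings are placeholders.
from kafka import KafkaConsumer          # pip install kafka-python
from openai import OpenAI                # pip install openai
from pymongo import MongoClient          # pip install pymongo

consumer = KafkaConsumer("raw-text", bootstrap_servers="localhost:9092")
openai_client = OpenAI()                 # reads OPENAI_API_KEY from the environment
collection = MongoClient("mongodb://localhost:27017")["rag"]["chunks"]

for message in consumer:
    text = message.value.decode("utf-8")
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    # Atlas Vector Search can then index the `embedding` field for retrieval.
    collection.insert_one({"text": text, "embedding": embedding})
```

Redpanda speaks the Kafka protocol, which is why a standard Kafka consumer works against it here.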
An intro to RAGHack, a global hackathon to develop apps using LLMs and RAG. A large language model (LLM) like GPT-4 can be used for summarization, translation, entity extraction, and question-answering. Retrieval Augmented Generation (RAG) is an approach that sends context to the LLM so that it can provide grounded answers. RAG apps can be developed on Azure using a wide range of programming languages and retrievers (such as AI Search, Cosmos DB, PostgreSQL, and Azure SQL). Get an overview of RAG in this session before diving deep in our follow-up streams.
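The core RAG loop described above fits in a few lines. In this sketch, `search_index` is a hypothetical stand-in for any of the retrievers mentioned (AI Search, Cosmos DB, PostgreSQL, Azure SQL), and the model choice is an assumption:

```python
# RAG in miniature: retrieve relevant chunks, then hand them to the LLM
# as grounding context so it can provide grounded answers.
from openai import OpenAI

client = OpenAI()

def search_index(query: str, k: int = 3) -> list[str]:
    """Hypothetical retriever returning the k most relevant text chunks."""
    raise NotImplementedError

def rag_answer(question: str) -> str:
    chunks = "\n\n".join(search_index(question))
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from the provided sources."},
            {"role": "user", "content": f"Sources:\n{chunks}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```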
“Last week was a great year in GenAI,” jokes Mark Ramsey—and it’s a great philosophy to have as LLM tools especially continue to evolve at such a rapid rate. This week, you’ll get to hear my fun and insightful chat with Mark from Ramsey International about the world of large language models (LLMs) and how we make useful UXs out of them in the enterprise.
Mark shared some fascinating insights about using a company’s website information (data) as a place to pilot a LLM project, avoiding privacy landmines, and how re-ranking of models leads to better LLM response accuracy. We also talked about the importance of real human testing to ensure LLM chatbots and AI tools truly delight users. From amusing anecdotes about the spinning beach ball on macOS to envisioning a future where AI-driven chat interfaces outshine traditional BI tools, this episode is packed with forward-looking ideas and a touch of humor.
Highlights / Skip to:
(0:50) Why is the world of GenAI evolving so fast?
(4:20) How Mark thinks about UX in an LLM application
(8:11) How Mark defines "Specialized GenAI"
(12:42) Mark's consulting work with GenAI / LLMs these days
(17:29) How GenAI can help the healthcare industry
(30:23) Uncovering users' true feelings about LLM applications
(35:02) Are UIs moving backwards as models progress forward?
(40:53) How will GenAI impact data and analytics teams?
(44:51) Will LLMs be able to consistently leverage RAG and produce proper SQL?
(51:04) Where you can find more from Mark and Ramsey International
Quotes from Today's Episode

"With [GenAI], we have a solution that we've built to try to help organizations, and build workflows. We have a workflow that we can run and ask the same question [to a variety of GenAI models] and see how similar the answers are. Depending on the complexity of the question, you can see a lot of variability between the models… [and] we can also run the same question against the different versions of the model and see how it's improved. Folks want a human-like experience interacting with these models… [and] if the model can start responding in just a few seconds, that gives you much more of a conversational type of experience." - Mark Ramsey (2:38)

"[People] don't understand when you interact [with GenAI tools] and it brings tokens back in that streaming fashion, you're actually seeing inside the brain of the model. Every token it produces is then displayed on the screen, and it gives you that typewriter experience back in the day. If someone has to wait, and all you're seeing is a logo spinning, from a UX experience standpoint… people feel like the model is much faster if it just starts to produce those results in that streaming fashion. I think in a design, it's extremely important to take advantage of that [...] as opposed to waiting to the end and delivering the results; some models support that, and other models don't." - Mark Ramsey (4:35)

"All of the data that's on the website is public information. We've done work with several organizations on quickly taking the data that's on their website, packaging it up into a vector database, and making that be the source for questions that their customers can ask. [Organizations] publish a lot of information on their websites, but people really struggle to get to it. We've seen a lot of interest in vectorizing website data, making it available, and having a chat interface for the customer. The customer can ask questions, and it will take them directly to the answer, and then they can use the website as the source information." - Mark Ramsey (14:04)

"I'm not skeptical at all. I've changed much of my [AI chatbot searches] to Perplexity, and I think it's doing a pretty fantastic job overall in terms of quality. It's returning an answer with citations, so you have a sense of where it's sourcing the information from. I think it's important from a user experience perspective. This is a replacement for broken search, as I really don't want to read all the web pages and PDFs you have that might be about my chiropractic care query to answer my actual [healthcare] question." - Brian O'Neill (19:22)
"We've all had great experience with customer service, and we've all had situations where the customer service was quite poor, and we're going to have that same thing as we begin to [release more] chatbots. We need to make sure we try to alleviate having those bad experiences, and have an exit. If someone is running into a situation where they'd rather talk to a live person, have that ability to route them to someone else. That's why the robustness of the model is extremely important in the implementation… and right now, organizations like OpenAI and Anthropic are significantly better at that [human-like] experience." - Mark Ramsey (23:46)

"There's two aspects of these models: the training aspect and then using the model to answer questions. I recommend to organizations to always augment their content and don't just use the training data. You'll still get that human-like experience that's built into the model, but you'll eliminate the hallucinations. If you have a model that has been set up correctly, you shouldn't have to ask questions in a funky way to get answers." - Mark Ramsey (39:11)

"People need to understand GenAI is not a predictive algorithm. It is not able to run predictions, it struggles with some math, so that is not the focus for these models. What's interesting is that you can use the model as a step to get you [the answers]. A lot of the models now support functions… when you ask a question about something that is in a database, it actually uses its knowledge about the schema of the database. It can build the query, run the query to get the data back, and then once it has the data, it can reformat the data into something that is a good response back." - Mark Ramsey (42:02)
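As a hedged sketch of the function-calling pattern Mark describes in that last quote (schema, model choice, and database are all illustrative assumptions): the model proposes a SQL query as a tool call, the application executes it, and the rows go back to the model for a natural-language answer.

```python
# Text-to-SQL via function calling: the LLM emits a tool call containing
# the query; the application, not the model, actually runs it.
import json, sqlite3
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a read-only SQL query against the sales database",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What were total sales last month?"}],
    tools=tools,
)
call = response.choices[0].message.tool_calls[0]
query = json.loads(call.function.arguments)["query"]
rows = sqlite3.connect("sales.db").execute(query).fetchall()
# The rows would then be sent back to the model in a `tool` message so it
# can reformat the data into a good natural-language response.
```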
Links:
• Mark on LinkedIn
• Ramsey International
• Email: mark [at] ramsey.international
• Ramsey International's YouTube Channel