talk-data.com

Topic

LLM

Large Language Models (LLM)

nlp ai machine_learning

1405 tagged

Activity Trend

158 peak/qtr, 2020-Q1 to 2026-Q1

Activities

1405 activities · Newest first

Many data engineers already use large language models to assist with data ingestion, transformation, DataOps, and orchestration. This blog post begins a series that explores the emergence of ChatGPT, Bard, and LLM tools from data pipeline vendors, and their implications for the discipline of data engineering. Published at: https://www.eckerson.com/articles/should-ai-bots-build-your-data-pipelines-examining-the-role-of-chatgpt-and-large-language-models-in-data-engineering

We talked about:

Katharine's background
Katharine's ML privacy startup
GDPR, CCPA, and the “opt-in as the default” approach
What is data privacy?
Finding Katharine's book – Practical Data Privacy
The various definitions of data privacy and “user profiles”
Privacy engineering and privacy-enhancing technologies
Why data privacy is important
What is differential privacy?
The importance of keeping privacy in mind when designing systems
Data privacy on the example of ChatGPT
Katharine's resource suggestions for learning about data privacy

Links:

LinkedIn: https://www.linkedin.com/in/katharinejarmul/

Twitter: https://twitter.com/kjam

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

ChatGPT might be a scrape of the internet, but Personal.ai is AI that's personally yours. Meet Suman Kanuganti, CEO of Personal AI, focused on empowering every individual with an AI extension of their memory.

01:35 Meet Suman Kanuganti
05:57 Starting Aira, addressing needs of the blind
11:41 Bigger dreams - what would Larry do?
16:27 ChatGPT… ok we've said it
17:57 Introducing Personal.ai
26:34 Using Personal.ai
31:23 Innovative use cases
33:57 Now it gets crazy
38:43 It's FREE… to start
42:02 Keeping your data safe
44:41 Predicting the future of AI
48:15 The scary part
51:12 For fun

LinkedIn: linkedin.com/in/kanugantisuman
Website: https://www.personal.ai/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Conversation Simulator: A Real Life Case Leveraging OpenAI's API | Crisis Text Line

ABOUT THE TALK: While we will never replace human to human interaction for crisis intervention, there are plenty of opportunities to build intelligence with AI/ML models that crisis responders could greatly benefit from.

In this talk, Maddie Schults and Mateo Garcia introduce their conversation simulator, a tool built on OpenAI's API that lets them train crisis responders on how to support people in crisis in close-to-real-life situations, and that can help reduce anxiety for new crisis responders as they log on to the platform for the first time.

ABOUT THE SPEAKERS: Maddie Schults is the General Manager at Crisis Text Line. She is a product leader and technologist with over 20 years of experience envisioning, building and launching enterprise software products. At Crisis Text Line, Maddie is responsible for building the Global Product for crisis care intervention and its adoption globally in different countries and languages.

Mateo Garcia is Lead Data Scientist at Crisis Text Line, where he oversees all the Analytics & Data Science efforts. He is a data leader with 7+ years of industry experience scaling data teams from the ground up and building data products at different start-ups and consulting firms.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

LLM's & Semantic Layer: Self Serve has Entered the Chat | Zenlytic

ABOUT THE TALK: Self-serve analytics has always been the promised land for data teams: a fantastic ideal, but not something actually achievable. The combination of LLMs and the semantic layer completely changes that. LLMs and the semantic layer combine high-level intelligence with the context of the business, allowing deep and accurate question answering. Together, they unlock truly self-serve analytics.

ABOUT THE SPEAKER: Paul Blankley is the Co-founder and CTO of Zenlytic. He previously co-founded Ex Quanta AI Studio. He was also a Data Engineer at Capital One.


Although many have become cognizant of AI’s value in recent months, the further back we look, the more exclusive this group of people becomes. In our latest AI-series episode of DataFramed, we gain insight from an expert who has been part of the industry for 40 years. Joaquin Marques, Founder and Principal Data Scientist at Kanayma LLC, has been working in AI since 1983. With experience at major tech companies like IBM, Verizon, and Oracle, Joaquin's knowledge of AI is vast. Today, he leads an AI consultancy, Kanayma, where he creates innovative AI products. Throughout the episode, Joaquin shares his insights on AI's development over the years, its current state, and future possibilities. He also discusses the projects they've worked on at Kanayma, what to consider when building AI products, and how ChatGPT is making chatbots better. Beyond providing insight into the space, he encourages listeners to think about the practical consequences of implementing AI, sharing the finer technical details of many of the solutions he's helped build, along with the thought processes that have guided him when building AI products, with context on practical applications of AI both from his past and from the bleeding edge of today. The discussion examines the complexities of artificial intelligence from the perspective of someone who has focused on this technology for longer than most. Tune in for guidance on how to build AI into your own company's products.

With the advances in AI products and the explosion of ChatGPT in recent months, it is becoming easier to imagine a world where AI and humans work seamlessly together, revolutionizing how we solve complex problems and transforming our daily lives. This is especially the case for data professionals. In this episode of our AI series, we speak to Sarah Schlobohm, Head of AI at Kubrick Group, who leads the training of the next generation of machine learning engineers. With a background in finance and consulting, Sarah has a deep understanding of the intersection between business strategy, data science, and AI. Before her work in finance, Sarah became a chartered accountant, where she honed her skills in financial analysis and strategy. She then worked for one of the world's largest banks, using data science to fight financial crime and making significant contributions to the industry's efforts to combat money laundering and other illicit activities. Sarah shares her extensive knowledge on incorporating AI within data teams for maximum impact, covering a wide array of AI-related topics, including upskilling, productivity, and communication, to help data professionals understand how to integrate generative AI effectively in their daily work. Throughout the episode, Sarah explores the challenges and risks of AI integration, touching on the balance between privacy and utility. She highlights the risks data teams can avoid when using AI products and how to approach using them the right way. She also covers how different roles within a data team might make use of generative AI, as well as how it might affect coding ability going forward. Sarah shares use cases for those in non-data teams, such as marketing, while highlighting what to consider when using outputs from GPT models, and discusses the impact chatbots might have on education, calling attention to the power of AI tutors in schools.
Sarah encourages people to start using AI now, since the barrier to entry is so low and may not stay that way. From automating mundane tasks to enabling human-AI collaboration that makes work more enjoyable, Sarah underscores the transformative power of AI in shaping the future of humanity. Whether you're an AI enthusiast, a data professional, or someone with an interest in either, this episode will provide you with a deeper understanding of the practical aspects of AI implementation.

With the advent of any new technology that promises to make humans' lives easier, replacing conscious actions with automation, there is always backlash. People are often aware of the displacement of jobs, and it is often viewed in a negative light. But how do we shift the collective understanding to one of hope and excitement? What use cases can be shared that will change the opinion of those who are wary of AI? Noelle Silver Russell is the Global AI Solutions & Generative AI & LLM Industry Lead at Accenture, responsible for enterprise-scale industry playbooks for generative AI and LLMs. In this episode of our AI series, Noelle discusses how to prioritize ChatGPT use cases by focusing on the different aspects of value creation that GPT models can bring to individuals and organizations. She addresses common misconceptions surrounding ChatGPT and AI in general, emphasizing the importance of understanding their potential benefits and selecting use cases that maximize positive impact, foster innovation, and contribute to job creation. Noelle draws parallels between today's fast-moving AI projects and the launch of Amazon Alexa, which she worked on, and points out that many of the discussions being raised today were also had 10 years ago. She discusses how companies can now use AI to focus on both business efficiencies and customer experience, no longer having to settle for a trade-off between the two. Noelle explains the best way for companies to approach adding GPT tools into their processes, which focuses on taking a holistic view of implementation. She also recommends use cases for companies that are just beginning to use AI, the challenges they might face when deploying models into production, and how they can mitigate them.
On the topic of job displacement, Noelle draws parallels from when Alexa was launched and the similar criticisms it faced, digging into the fear people have around new technology and how it could be transformed into enthusiasm. Noelle suggests there is a burden on leadership within organizations to create a culture where people are excited to use AI tools rather than feeling threatened by them.

ChatGPT has leaped into the forefront of our lives: everyone from students to multinational organizations sees value in adding a chat interface to an LLM. But OpenAI has been concentrating on this for years, steadily developing one of the most viral digital products this century. In this episode of our AI series, we sit down with Logan Kilpatrick. Logan currently leads developer relations at OpenAI, supporting developers building with DALL-E, the OpenAI API, and ChatGPT. Logan takes us through OpenAI's products, API, and models, and provides insights into the many use cases of ChatGPT. Logan provides fascinating information on ChatGPT's plugins and how they can be used to build agents that help us in a variety of contexts. He also discusses the future integration of LLMs into our daily lives and how it will add structure to the unstructured, difficult-to-leverage data we generate and interact with on a daily basis. Logan also touches on the powerful image input features in GPT-4, how they can help those with partial sight improve their quality of life, and various other use cases. Throughout the episode, we unpack the need for collaboration and innovation, since ChatGPT becomes more powerful when integrated with other pieces of software, and cover key discussion points about today's AI tools, in particular what could be built in-house by OpenAI and what could be built in the public domain. Logan also discusses the ecosystem forming around ChatGPT and how it will all become connected going forward. Finally, Logan shares tips for getting better responses from ChatGPT and the things to consider when integrating it into your organization's product. This episode provides a deep dive into the world of GPT models from within the eye of the storm, offering valuable insights to those interested in AI and its practical applications in our daily lives.

podcast_episode
by Dante DeAntonio (Moody's Analytics), Cris deRitis, Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

Another jobs Friday, another strong jobs report. Dante DeAntonio joins the crew to break down the employment numbers and what they mean for the near-term outlook. The team also discusses the recent banking crisis, the looming debt limit x-date, and the most likely outcomes for both. And is someone using ChatGPT to cheat at the statistics game? For the full transcript, click here Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you.    To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

In this episode, Conor and Bryce continue to chat with Tristan Brindle about his new library Flux, ChatGPT, and more.

Link to Episode 127 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

About the Guest: Tristan Brindle is a freelance programmer and trainer based in London, mostly focussing on C++. He is a member of the UK national body (BSI) and ISO WG21, and can occasionally be found at C++ conferences. He is also a director of C++ London Uni, a not-for-profit organisation offering free beginner programming classes in London and online. He has a few fun projects on GitHub that you can find out about here.

Show Notes

Date Recorded: 2023-04-05
Date Released: 2023-04-28
ADSP Episode 125: NanoRange with Tristan Brindle
ADSP Episode 126: Flux (and Flow) with Tristan Brindle
ChatGPT
cwhy
ChatDBG
commentator
“Performance Matters” by Emery Berger
“Python Performance Matters” by Emery Berger (Strange Loop 2022)
Tristan’s Tweet about scan
Tristan’s Tweet with flux solutions to MCO
ArrayCast Episode 48: Henry Rich Reveals J with Threads
J9.4
J Fold’s
top10 GitHub Repo
Haskell’s mapAdjacent
blackbird C++ Combinator Library
C++ On Sea Conference
CppNorth
Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons Attribution 3.0 Unported (CC BY 3.0) Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

We talked about:

Johannes’s background
Johannes’s Open Source Spotlight demos – Refinery and Bricks
The difficulties of working with natural language processing (NLP)
Incorporating ChatGPT into a process as a heuristic
What is Bricks?
The process of starting a startup – Kern
Making the decision to go with open source
Pros and cons of launching as open source
Kern’s business model
Working with enterprises
Johannes as a salesperson
The team at Kern
Johannes’s role at Kern
How Johannes and Henrik separate responsibilities at Kern
Working with very niche use cases
The short story of how Kern got its funding
Johannes’s resource recommendation

Links:

Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
Bricks' GitHub repo: https://github.com/code-kern-ai/bricks
Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
Discord: https://discord.com/invite/qf4rGCEphW
Kern's website: https://www.kern.ai

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

"A modern AI start-up is a front-end developer plus a prompt engineer" is a popular joke on Twitter. This talk is about LangChain, a Python open-source tool for prompt engineering. You can use it with completely open-source language models or ChatGPT. I will show you how to create a prompt and get an answer from an LLM. As an example application, I will show a demo of an intelligent agent using web search and generating Python code to answer questions about this conference.
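
The core idea the talk demonstrates, filling a prompt template with variables and sending it to a language model, can be sketched in plain Python. This is an illustrative stand-in, not LangChain's actual API; `format_prompt`, `TEMPLATE`, and `fake_llm` are all hypothetical names, and a real application would replace `fake_llm` with a call to ChatGPT or an open-source model.

```python
# A minimal sketch of the prompt-templating idea behind tools like LangChain.
# Names here are illustrative; they do not mirror LangChain's real classes.

def format_prompt(template: str, **variables: str) -> str:
    """Fill the named slots in a prompt template."""
    return template.format(**variables)

TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer the following question about {topic}:\n"
    "Question: {question}\n"
    "Answer:"
)

def fake_llm(prompt: str) -> str:
    """Stand-in for a call to ChatGPT or an open-source model."""
    return "LangChain is an open-source Python tool for prompt engineering."

prompt = format_prompt(TEMPLATE, topic="Python tooling",
                       question="What is LangChain?")
answer = fake_llm(prompt)
print(answer)
```

In a real LangChain application the template object and the model client are composed the same way; the value of the abstraction is that the template, not the calling code, owns the prompt's structure.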

Local Planning Authorities (LPAs) in the UK rely on written representations from the community to inform their Local Plans which outline development needs for their area. With an average of 2000 representations per consultation and 4 rounds of consultation per Local Plan, the volume of information can be overwhelming for both LPAs and the Planning Inspectorate tasked with examining the legality and soundness of plans. In this study, we investigate the potential for Large Language Models (LLMs) to streamline representation analysis.

We find that LLMs have the potential to significantly reduce the time and effort required to analyse representations, with simulations on historical Local Plans projecting a reduction in processing time by over 30%, and experiments showing classification accuracy of up to 90%.

In this presentation, we discuss our experimental process which used a distributed experimentation environment with Jupyter Lab and cloud resources to evaluate the performance of the BERT, RoBERTa, DistilBERT, and XLNet models. We also discuss the design and prototyping of web applications to support the aided processing of representations using Voilà, FastAPI, and React. Finally, we highlight successes and challenges encountered and suggest areas for future improvement.
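
The evaluation described above, scoring several candidate classifiers on labelled representations, can be sketched with a generic harness. The examples and the keyword baseline below are invented placeholders standing in for real representations and for the fine-tuned BERT/RoBERTa/DistilBERT/XLNet models the study actually compared.

```python
# Hedged sketch of a model-comparison harness for representation
# classification. The data and the baseline model are toy stand-ins.

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)

# Toy labelled representations: (text, category)
examples = [
    ("Object to the proposed housing density", "objection"),
    ("We support the new green-belt policy", "support"),
    ("Object to the loss of open space", "objection"),
    ("Fully support the town-centre plan", "support"),
]

# Each entry would be a fine-tuned transformer in the real experiments;
# here a keyword rule illustrates the interface (text -> label).
models = {
    "keyword_baseline": lambda t: "objection" if "object" in t.lower() else "support",
}

for name, model in models.items():
    print(f"{name}: {accuracy(model, examples):.0%}")
```

Keeping every model behind the same `text -> label` interface is what makes it easy to run the comparison across architectures inside a distributed Jupyter environment.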

Large generative models rely upon massive data sets that are collected automatically. For example, GPT-3 was trained with data from “Common Crawl” and “Web Text”, among other sources. As the saying goes, bigger isn’t always better. While powerful, these data sets (and the models they create) often come at a cost, bringing their “internet-scale biases” along with their “internet-trained models.” This raises the question: is unsupervised learning the best future for machine learning?

ML researchers have developed new model-tuning techniques to address the known biases within existing models and improve their performance (as measured by response preference, truthfulness, toxicity, and result generalization), all at a fraction of the initial training cost. In this talk, we will explore these techniques, known as Reinforcement Learning from Human Feedback (RLHF), and how open-source machine learning tools like PyTorch and Label Studio can be used to tune off-the-shelf models using direct human feedback.

Large Language Models (LLMs), like ChatGPT, have shown miraculous performance on various tasks. But there are still unsolved issues with these models: they can be confidently wrong, and their knowledge becomes outdated. GPT also does not have any of the information that you have stored in your own data. In this talk, you'll learn how to use Haystack, an open-source framework, to chain LLMs with other models and components to overcome these issues. We will build a practical application using these techniques. And you will walk away with a deeper understanding of how to use LLMs to build NLP products that work.
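
The chaining pattern the talk describes is retrieve-then-generate: fetch relevant passages from your own data, then prompt the LLM with them so answers are grounded rather than confidently wrong or out of date. The sketch below illustrates the shape of such a pipeline; `retrieve` and `fake_llm` are invented stand-ins, not Haystack's actual API, and the keyword retriever is a deliberately naive placeholder for BM25 or embedding search.

```python
import re

# Hedged sketch of a retrieve-then-generate pipeline of the kind
# frameworks like Haystack let you assemble from components.

DOCUMENTS = [
    "Haystack is an open-source framework for building NLP products.",
    "LLMs can be chained with retrievers to answer questions over your own data.",
]

def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, documents: list) -> str:
    """Naive keyword-overlap retriever; a real pipeline would use BM25 or embeddings."""
    q = tokens(question)
    return max(documents, key=lambda d: len(q & tokens(d)))

def fake_llm(question: str, context: str) -> str:
    """Stand-in for an LLM prompted with the retrieved context."""
    return f"Answer (grounded in context): {context}"

question = "What is Haystack?"
context = retrieve(question, DOCUMENTS)
print(fake_llm(question, context))
```

Because the model only sees retrieved passages, updating the document store updates the system's knowledge without retraining the LLM.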

In this talk, I'll show how large language models such as GPT-3 complement rather than replace existing machine learning workflows. Initial annotations are gathered from the OpenAI API via zero- or few-shot learning, and then corrected by a human decision maker using an annotation tool. The resulting annotations can then be used to train and evaluate models as normal. This process results in higher accuracy than can be achieved from the OpenAI API alone, with the added benefit that you'll own and control the model for runtime.
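
That human-in-the-loop workflow, draft labels from the API, corrections from a reviewer, then ordinary training on the result, can be sketched as a small pipeline. `llm_annotate` and `human_correct` are hypothetical stand-ins for the OpenAI API call and the annotation tool; the point is the shape of the process, not the calls.

```python
# Sketch of the annotate-then-correct workflow described above.
# Both functions are illustrative placeholders, not real APIs.

def llm_annotate(texts):
    """Zero-shot draft labels, as if returned by the OpenAI API."""
    return [("positive" if "great" in t else "negative") for t in texts]

def human_correct(texts, draft_labels, corrections):
    """A human reviewer overrides draft labels where they are wrong."""
    return [corrections.get(t, label) for t, label in zip(texts, draft_labels)]

texts = ["great product", "not great at all", "terrible service"]
drafts = llm_annotate(texts)  # note: the second draft is wrong
final = human_correct(texts, drafts,
                      corrections={"not great at all": "negative"})
print(final)  # ['positive', 'negative', 'negative']
# `final` can now train and evaluate a model you own and control at runtime.
```

The human pass is what lifts accuracy above the raw API output, while the draft labels make the human pass far cheaper than annotating from scratch.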

Summary

Business intelligence has been chasing the promise of self-serve data for decades. As the capabilities of these systems have improved and become more accessible, the target of what self-serve means changes. With the availability of AI powered by large language models, combined with the evolution of semantic layers, the team at Zenlytic have taken aim at this problem again. In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration combined with semantic modeling that enables an intuitive way for everyone in the business to access the data that they need to succeed in their work.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack. Your host is Tobias Macey, and today I'm interviewing Paul Blankley and Ryan Janssen about Zenlytic, a no-code business intelligence tool focused on emerging commerce brands.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Zenlytic is and the story behind it?
Business intelligence is a crowded market. What was your process for defining the problem you are focused on solving and the method to achieve that outcome?
Self-serve data exploration has been attempted in myriad ways over successive generations of BI and data platforms. What are the barriers that have been the most challenging to overcome in that effort?

What are the elements that are coming together now that give you confidence in being able to deliver on that?

Can you describe how Zenlytic is implemented?

What are the evolutions in the understanding and implementation of semantic layers that provide a sufficient substrate for operating on?
How have the recent breakthroughs in large language models (LLMs) improved your ability to build features in Zenlytic?
What is your process for adding domain semantics to the operational aspect of your LLM?

For someone using Zenlytic, what is the process for getting it set up and integrated with their data?
Once it is operational, can you describe some typical workflows for using Zenlytic in a business context?

Who are the target users?
What are the collaboration options available?

What are the most complex engineering/data challenges that you have had to address in building Zenlytic?
What are the most interesting, innovative, or unexpected ways that you have seen Zenlytic used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Zenlytic?
When is Zenlytic the wrong choice?
What do you have planned for the future of Zenlytic?

Contact Info

Paul Blankley (LinkedIn)

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Zenlytic
OLAP Cube
Large Language Model
Starburst Pr

ChatGPT was the iPhone moment for AI, and things are moving insanely quickly. What do generative AI models mean for us, especially children, who are arguably the last of the Pre-AI generation? I dive into some thoughts this week about how we need to work alongside the machines, the impact of generative AI on kids, and so on. Buckle up. We are in for a very interesting next few years as we sort out where AI fits into our day-to-day lives.

#data #datascience #dataengineering #chatgpt #ai


If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Check out my substack: https://joereis.substack.com/