talk-data.com

Topic

GenAI

Generative AI

ai machine_learning llm

1517

tagged

Activity Trend: 192 peak/qtr, 2020-Q1 to 2026-Q1

Activities

1517 activities · Newest first

Coalesce 2024: How Riot Games is building player-first gaming experiences with Databricks and dbt

Riot Games, creator of hit titles like League of Legends and Valorant, is building an ultimate gaming experience by using data and AI to deliver the most optimal player journeys. In this session, you'll learn how Riot's data platform team paired with analytics engineering, machine learning, and insights teams to integrate Databricks Data Intelligence Platform and dbt Cloud to significantly mature its data capabilities. The outcome: a scalable, collaborative analytics environment that serves millions of players worldwide.

You’ll hear how Riot Games:
- Centralized petabytes of game telemetry on Databricks for fast processing and analytics
- Modernized their data platform by integrating dbt Cloud, unlocking governance for modular, version-controlled data transformations and testing for a diverse set of user personas
- Uses Generative AI to automate the enforcement of good documentation and quality code and plans to use Databricks AI to further speed up its ability to unlock the value of data
- Deployed machine learning models for personalized recommendations and player behavior analysis

You'll come away with practical insights on architecting a modern data stack that can handle massive scale while empowering teams across the organization. Whether you're in gaming or any data-intensive industry, you'll learn valuable lessons from Riot's journey to build a world-class data platform.

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements

Coalesce 2024: Generative AI driven near-real-time operational analytics with zero-ETL and dbt Cloud

AWS offers the most scalable, highest performing data services to keep up with the growing volume and velocity of data and to help organizations be data-driven in real time. AWS helps customers unify diverse data sources by investing in a zero-ETL future and enables end-to-end data governance so your teams are free to move faster with data. Data teams running dbt Cloud are able to deploy analytics code, following software engineering best practices such as modularity, continuous integration and continuous deployment (CI/CD), and embedded documentation. In this session, we will dive deeper into how to get near real-time insight on petabytes of transaction data using Amazon Aurora zero-ETL integration with Amazon Redshift and dbt Cloud for your Generative AI workloads.

Speakers: Neela Kulkarni Solutions Architect AWS

Neeraja Rentachintala Director, Product Management Amazon

Coalesce 2024: Supercharge your data pipelines with AI & ML using dbt Labs and Snowflake

Ready to level up your data pipelines with AI and ML? In this session, we'll dive into key Snowflake AI and ML features and teach you how to easily integrate them into dbt pipelines. You'll explore real-world machine learning and generative AI use cases, and see how dbt and Snowflake together deliver powerful, secure results within Snowflake’s governance and security framework. Plus, discover how data scientists, engineers, and analysts can collaborate seamlessly using these tools. Whether you're scaling ML models or embedding AI into your existing workflows, this session will give you practical strategies for building secure, AI-powered data pipelines with dbt and Snowflake.

Speaker: Randy Pettus Senior Partner Sales Engineer Snowflake

Coalesce 2024: How the NBA deploys AI analysts with TextQL

Join us for an insightful session where we delve into the innovative ways the NBA is leveraging Generative AI (GenAI) to revolutionize data insights and transform the world of sports and entertainment analytics.

Speakers: Keelan Smithers Data Product Manager, Analytics Engineering NBA

Mark Hay CTO & Co-Founder TextQL

Coalesce 2024: How Roche is redefining data, analytics & genAI at scale with dbt

Centralize, harmonize, and streamline. That’s how Roche delivers self-service analytics to the thousands of people in its pharma commercial sector. dbt is powering the backend solution that combines over 60 transactional systems into a harmonized, simplified data model. By adopting a version-controlled approach and enabling end-to-end lineage tracking, we achieved a significant reduction in duplication and accelerated time-to-insight for data-driven decision-making. The transition from a heterogeneous technology stack to standardized ways of working has fostered greater flexibility in allocating resources across the organization to address diverse use cases. Additionally, the scalable nature of this platform allows us to easily replicate successful data solutions globally. We further augmented our capabilities by integrating generative AI into our Redshift data warehouse, empowering the creation of innovative data products using dbt. This presentation will share practical lessons learned, architectural insights, and the tangible business impact realized from this data platform modernization.

Speaker: João Antunes Lead Engineer Hoffmann-La Roche

Coalesce 2024: Generative AI, the ADLC and the coming era of analytics engineering

Over the past 8 years, dbt, cloud data warehouses, and the dbt viewpoint have dramatically changed the workflow for data practitioners, raising the bar on what great data work looks like and altering the types of problems we focus on day to day. Jason Ganz lived this transition firsthand, and he now believes we’re on the cusp of another transformation in how data work gets done. Come hear about how new technologies like Generative AI and new workflows like the Analytics Development Lifecycle will transform data work and how to think about that in your own role and career trajectory.

Speaker: Jason Ganz Senior Manager, Developer Experience dbt Labs

podcast_episode
by Vijay Yadav (Center for Mathematical Sciences at Merck) , Joe Reis (DeepLearning.AI)

Vijay Yadav (Director of Data Science at Merck) joins me to chat about a very interesting project he launched at Merck involving LLMs in production. A big part of this discussion is how to make data ready for generative AI.

This is a great example of an LLM-native use case in production; such examples are rare right now. Lots to learn from here. Enjoy!

LinkedIn: https://www.linkedin.com/in/vijay-yadav-ds/

Summary
In this episode of the Data Engineering Podcast, Adrian Brudaru and Marcin Rudolf, co-founders of dltHub, delve into the principles guiding dlt's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the Python ecosystem's growth on dlt, highlighting integrations with high-performance libraries and the benefits of Arrow and DuckDB. The episode concludes with a discussion on the future of dlt, including plans for a portable data lake and the importance of interoperability in data management tools.

Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Imagine catching data issues before they snowball into bigger problems. That’s what Datafold’s new Monitors do. With automatic monitoring for cross-database data diffs, schema changes, key metrics, and custom data tests, you can catch discrepancies and anomalies in real time, right at the source. Whether it’s maintaining data integrity or preventing costly mistakes, Datafold Monitors give you the visibility and control you need to keep your entire data stack running smoothly. Want to stop issues before they hit production? Learn more at dataengineeringpodcast.com/datafold today!
- Your host is Tobias Macey and today I'm interviewing Adrian Brudaru and Marcin Rudolf, cofounders at dltHub, about the growth of dlt and the numerous ways that you can use it to address the complexities of data integration

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what dlt is and how it has evolved since we last spoke (September 2023)?
- What are the core principles that guide your work on dlt and dlthub?
- You have taken a very opinionated stance against managed extract/load services. What are the shortcomings of those platforms, and when would you argue in their favor?
- The landscape of data movement has undergone some interesting changes over the past year. Most notably, the growth of PyAirbyte and the rapid shifts around the needs of generative AI stacks (vector stores, unstructured data processing, etc.). How has that informed your product development and positioning?
- The Python ecosystem, and in particular data-oriented Python, has also undergone substantial evolution. What are the developments in the libraries and frameworks that you have been able to benefit from?
- What are some of the notable investments that you have made in the developer experience for building dlt pipelines?
- How have the interfaces for source/destination development improved?
- You recently published a post about the idea of a portable data lake. What are the missing pieces that would make that possible, and what are the developments/technologies that put that idea within reach?
- What is your strategy for building a sustainable product on top of dlt?
- How does that strategy help to form a "virtuous cycle" of improving the open source foundation?
- What are the most interesting, innovative, or unexpected ways that you have seen dlt used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on dlt?
- When is dlt the wrong choice?
- What do you have planned for the future of dlt/dlthub?

Contact Info
- Adrian: LinkedIn
- Marcin: LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- dlt (Podcast Episode)
- PyArrow
- Polars
- Ibis
- DuckDB (Podcast Episode)
- dlt Data Contracts
- RAG == Retrieval Augmented Generation (AI Engineering Podcast Episode)
- PyAirbyte
- OpenAI o1 Model
- LanceDB
- QDrant Embedded
- Airflow
- GitHub Actions
- Arrow DataFusion
- Apache Arrow
- PyIceberg
- Delta-RS
- SCD2 == Slowly Changing Dimensions
- SQLAlchemy
- SQLGlot
- FSSpec
- Pydantic
- Spacy
- Entity Recognition
- Parquet File Format
- Python Decorator
- REST API Toolkit
- OpenAPI Connector Generator
- ConnectorX
- Python no-GIL
- Delta Lake (Podcast Episode)
- SQLMesh (Podcast Episode)
- Hamilton
- Tabular
- PostHog (Podcast.init Episode)
- AsyncIO
- Cursor.AI
- Data Mesh (Podcast Episode)
- FastAPI
- LangChain
- GraphRAG (AI Engineering Podcast Episode)
- Property Graph
- Python uv

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Databricks Data Intelligence Platform: Unlocking the GenAI Revolution

This book is your comprehensive guide to building robust Generative AI solutions using the Databricks Data Intelligence Platform. Databricks is the fastest-growing data platform offering unified analytics and AI capabilities within a single governance framework, enabling organizations to streamline their data processing workflows, from ingestion to visualization. Additionally, Databricks provides features to train a high-quality large language model (LLM), whether you are looking for Retrieval-Augmented Generation (RAG) or fine-tuning. Databricks offers a scalable and efficient solution for processing large volumes of both structured and unstructured data, facilitating advanced analytics, machine learning, and real-time processing. In today's GenAI world, Databricks plays a crucial role in empowering organizations to extract value from their data effectively, driving innovation and gaining a competitive edge in the digital age. This book will not only help you master the Data Intelligence Platform but also help power your enterprise to the next level with a bespoke LLM unique to your organization. Beginning with foundational principles, the book starts with a platform overview and explores features and best practices for ingestion, transformation, and storage with Delta Lake. Advanced topics include leveraging Databricks SQL for querying and visualizing large datasets, ensuring data governance and security with Unity Catalog, and deploying machine learning and LLMs using Databricks MLflow for GenAI. Through practical examples, insights, and best practices, this book equips solution architects and data engineers with the knowledge to design and implement scalable data solutions, making it an indispensable resource for modern enterprises. 
Whether you are new to Databricks and trying to learn a new platform, a seasoned practitioner building data pipelines, data science models, or GenAI applications, or even an executive who wants to communicate the value of Databricks to customers, this book is for you. With its extensive feature and best practice deep dives, it also serves as an excellent reference guide if you are preparing for Databricks certification exams.

What You Will Learn
- Foundational principles of Lakehouse architecture
- Key features including Unity Catalog, Databricks SQL (DBSQL), and Delta Live Tables
- Databricks Intelligence Platform and key functionalities
- Building and deploying GenAI Applications from data ingestion to model serving
- Databricks pricing, platform security, DBRX, and many more topics

Who This Book Is For
Solution architects, data engineers, data scientists, Databricks practitioners, and anyone who wants to deploy their Gen AI solutions with the Data Intelligence Platform. This is also a handbook for senior execs who need to communicate the value of Databricks to customers. People who are new to the Databricks Platform and want comprehensive insights will find the book accessible.

Azure SQL Revealed: The Next-Generation Cloud Database with AI and Microsoft Fabric

Access detailed content and examples on Azure SQL, a set of cloud services that allows for SQL Server to be deployed in the cloud. This book teaches the fundamentals of deployment, configuration, security, performance, and availability of Azure SQL from the perspective of these same tasks and capabilities in SQL Server. This distinct approach makes this book an ideal learning platform for readers familiar with SQL Server on-premises who want to migrate their skills toward providing cloud solutions to an enterprise market that is increasingly cloud-focused. If you know SQL Server, you will love this book. You will be able to take your existing knowledge of SQL Server and translate that knowledge into the world of cloud services from the Microsoft Azure platform, and in particular into Azure SQL. This book provides information never seen before about the history and architecture of Azure SQL. Author Bob Ward is a leading expert with access to and support from the Microsoft engineering team that built Azure SQL and related database cloud services. He presents powerful, behind-the-scenes insights into the workings of one of the most popular database cloud services in the industry. This book also brings you the latest innovations for Azure SQL including Azure Arc, Hyperscale, generative AI applications, Microsoft Copilots, and integration with the Microsoft Fabric. 
What You Will Learn
- Know the history of Azure SQL
- Deploy, configure, and connect to Azure SQL
- Choose the correct way to deploy SQL Server in Azure
- Migrate existing SQL Server instances to Azure SQL
- Monitor and tune Azure SQL’s performance to meet your needs
- Ensure your data and application are highly available
- Secure your data from attack and theft
- Learn the latest innovations for Azure SQL including Hyperscale
- Learn how to harness the power of AI for generative data-driven applications and Microsoft Copilots for assistance
- Learn how to integrate Azure SQL with the unified data platform, the Microsoft Fabric

Who This Book Is For
This book is designed to teach SQL Server in the Azure cloud to the SQL Server professional. Anyone who operates, manages, or develops applications for SQL Server will benefit from this book. Readers will be able to translate their current knowledge of SQL Server—especially of SQL Server 2019 and 2022—directly to Azure. This book is ideal for database professionals looking to remain relevant as their customer base moves into the cloud.

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. In this special one-year anniversary episode, we reminisce about our journey and dive into some intriguing tech stories:
- WordPress Governance Drama: We discuss recent issues with WordPress. Find out what’s behind the Automattic and WP Engine tension.
- Astral’s Business Model: Charlie Marsh shares insights into how Astral plans to balance open-source ideals with profitability.
- Deno 2.0 Release: Deno 2.0 claims to be a “Cargo for JavaScript.” Check out its new features and see how it compares to Node.js.
- OpenAI’s Soaring Valuation: OpenAI has hit a staggering $150 billion valuation after raising $6.5 billion in new funding.
- Adobe’s GenAI Policy: Adobe clarified their stance on GenAI, ensuring Firefly is only trained on stock images to support creators.
- Instructor Library for LLMs: Discover the Instructor library for turning unstructured data into structured outputs with ease.
- Repo2txt Tool: Convert your GitHub repo into a single text file using Repo2txt for easy analysis.
- Retro PC Fonts Galore: Explore a treasure trove of vintage fonts with the Ultimate Old-School PC Font Pack.
- Bop Spotter – Cultural Surveillance: Bop Spotter uses Shazam to capture the music trends and cultural vibes of San Francisco’s Mission District.

John Gleeson, COO of Storj, joins us on this episode of the Data Unchained podcast live from NAB! John talks with us about bringing together organizations' available bandwidth and storage at lower costs and with lower carbon footprints, while also unifying data sets and getting the most value out of your data.

#data #datascience #dataanalytics #AI #artificialintelligence #storage #genai #LLM #podcast #datastorage #technology #innovation #bandwidth #carbonfootprint #carbonfootprintreduction

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

More on GenAI, Hallucinations, RAG, Use Cases, LLMs, SLMs and costs with Armand Ruiz, Director, watsonx Client Engineering, and John Webb, Principal, Client Engineering. With this and the previous episode you'll be wiser on AI than 98% of the world.

00:12 Hallucinations
02:33 RAG Differentiation
06:41 Why IBM in AI
09:23 Use Cases
11:02 The GenAI Resume
13:37 watson.x
15:40 LLMs
17:51 Experience Counts
20:03 AI that Surprises
23:46 AI Skills
26:47 Switching LLMs
27:13 The Cost and SLMs
28:21 Prompt Engineering
29:16 For Fun

LinkedIn: linkedin.com/in/armand-ruiz, linkedin.com/in/john-webb-686136127
Website: https://www.ibm.com/client-engineering

Love what you're hearing? Don't forget to rate us on your favorite platform! Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Building and managing AI products comes with its own set of unique challenges, especially when those products are under intense scrutiny, as mobile and home assistants have been in recent years. From dealing with the unpredictable nature of machine learning models to ensuring that your product is both ethical and user-friendly, the path to success isn’t always clear. But how do you navigate these complexities and still deliver a product that meets business goals? What key steps can you take to align AI innovation with measurable outcomes and long-term success?

Marily Nika is one of the world's leading thinkers on product management for artificial intelligence. At Google, she manages the generative AI product features for Google Assistant. Marily also founded AI Product Academy, where she runs a bootcamp on AI product management, and she teaches the subject on Maven. Previously, Marily was an AI Product Lead in Meta's Reality Labs, and the AI Product Lead for Google Glass. She is also an Executive Fellow at Harvard Business School.

In the episode, Richie and Marily explore the unique challenges of AI product management, experimentation, ethical considerations in AI product management, collaboration, skills needed to succeed in AI product development, the career path to work in AI as a Product Manager, key metrics for AI products and much more.

Links Mentioned in the Show:
- Komo AI
- Connect with Marily
- Marily’s Course: AI Product Management Bootcamp with Certification
- Skill Track: AI Business Fundamentals
- Related Episode: Building Human-Centered AI Experiences with Haris Butt, Head of Product Design at ClickUp
- Rewatch sessions from RADAR: AI Edition

New to DataCamp? Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Every organization today is exploring generative AI to drive value and push their business forward. But a common pitfall is that AI strategies often don’t align with business objectives, leading companies to chase flashy tools rather than focusing on what truly matters. How can you avoid these traps and ensure your AI efforts are not only innovative but also aligned with real business value?

Leon Gordon is a leader in data analytics and AI, a current Microsoft Data Platform MVP based in the UK, and the founder of Onyx Data. During the last decade, he has helped organizations improve their business performance, use data more intelligently, and understand the implications of new technologies such as artificial intelligence and big data. Leon is an Executive Contributor to Brainz Magazine, a Thought Leader in Data Science for the Global AI Hub, chair for the Microsoft Power BI – UK community group and the DataDNA data visualization community as well as an international speaker and advisor.

In the episode, Adel and Leon explore aligning AI with business strategy, building AI use-cases, enterprise AI-agents, AI and data governance, data-driven decision making, key skills for cross-functional teams, AI for automation and augmentation, privacy and AI and much more.

Links Mentioned in the Show:
- Onyx Data
- Connect with Leon
- Leon’s LinkedIn Course - How to Build and Execute a Successful Data Strategy
- Skill Track: AI Business Fundamentals
- Related Episode: Generative AI in the Enterprise with Steve Holden, Senior Vice President and Head of Single-Family Analytics at Fannie Mae
- Rewatch sessions from RADAR: AI Edition

New to DataCamp? Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

sktime is a widely used scikit-learn compatible library for learning with time series. sktime is easily extensible by anyone, and interoperable with the pydata/numfocus stack.

This talk presents progress, challenges, and newest features off the press, in extending the sktime framework to deep learning and foundation models.

Recent progress in generative AI and deep learning is leading to an ever-exploding number of popular “next generation AI” models for time series tasks like forecasting, classification, segmentation.

Particular challenges of the new AI ecosystem are inconsistent formal interfaces, different deep learning backends, vendor specific APIs and architectures which do not match sklearn-like patterns well – every practitioner who has tried to use at least two such models at the same time (outside sktime) will have their individual painful memories.

We show how sktime brings its unified interface architecture for time series modelling to the brave new AI frontier, using novel design patterns building on ideas from Hugging Face and scikit-learn, to provide modular, extensible building blocks with a simple specification language.
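As a rough illustration of the interface problem described above, the sketch below wraps two models with mismatched, vendor-style APIs behind one fit/predict interface. It is not sktime code: `VendorMeanModel`, `VendorLastValueModel`, the `Forecaster` base class, and both adapters are hypothetical stand-ins, and only the general fit/predict convention is borrowed from scikit-learn-style design.

```python
from abc import ABC, abstractmethod
from statistics import mean


class VendorMeanModel:
    """Hypothetical vendor model with its own API: train() and forecast_n()."""

    def train(self, history):
        self._level = mean(history)

    def forecast_n(self, n):
        return [self._level] * n


class VendorLastValueModel:
    """Hypothetical vendor model exposing one stateless call."""

    def run(self, series, horizon):
        return [series[-1]] * horizon


class Forecaster(ABC):
    """Common interface: fit(y) returns self, predict(steps) returns a list."""

    @abstractmethod
    def fit(self, y): ...

    @abstractmethod
    def predict(self, steps): ...


class MeanAdapter(Forecaster):
    def __init__(self):
        self._model = VendorMeanModel()

    def fit(self, y):
        self._model.train(list(y))
        return self

    def predict(self, steps):
        return self._model.forecast_n(steps)


class LastValueAdapter(Forecaster):
    def __init__(self):
        self._model = VendorLastValueModel()
        self._y = None

    def fit(self, y):
        # The wrapped model keeps no state, so the adapter stores the series.
        self._y = list(y)
        return self

    def predict(self, steps):
        return self._model.run(self._y, steps)


def compare(forecasters, y, steps):
    """Fit every forecaster on the same series and collect its predictions."""
    return {type(f).__name__: f.fit(y).predict(steps) for f in forecasters}


series = [10.0, 12.0, 14.0]
results = compare([MeanAdapter(), LastValueAdapter()], series, steps=2)
print(results)
```

Once every model sits behind the same two methods, code such as `compare` can treat them interchangeably, which is the property a unified interface is meant to provide.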

The GovExperts is the new mini-series from GovEx Data Points spotlighting some of the top minds in public sector data. In our inaugural episode we discuss what generative AI is good at, how cities are interacting with it, and what it means for the workforce.

We’re chatting with Andrew Nicklin, Senior Research Data Manager at GovEx. Andrew takes us from his early days at the NYC parks department to his pivotal role in launching the NYC Open Data platform, and how this experience led him to GovEx at the invitation of founder Beth Blauer.

Discover how cities are already using AI to power chatbots and manage documents, and why Andrew believes AI could help residents feel more comfortable accessing sensitive services like housing or food assistance. Wondering if AI will replace public sector workers? Andrew says rather than take jobs, it will most likely transform them, freeing up public servants to tackle big challenges. Plus, get an exclusive preview of GovEx’s new City Data Explorer, a tool that uses 1.7 million data points to track 40 key metrics across the 100 largest U.S. cities.

Learn more about GovEx
Fill out our listener survey!

The first episode of The Pragmatic Engineer Podcast is out. Expect similar episodes every other Wednesday. You can add the podcast in your favorite podcast player, and have future episodes downloaded automatically. Listen now on Apple, Spotify, and YouTube.

Brought to you by:
• Codeium: Join the 700K+ developers using the IT-approved AI-powered code assistant.
• TLDR: Keep up with tech in 5 minutes

On the first episode of the Pragmatic Engineer Podcast, I am joined by Simon Willison. Simon is one of the best-known software engineers experimenting with LLMs to boost his own productivity: he’s been doing this for more than three years, blogging about it in the open. Simon is the creator of Datasette, an open-source tool for exploring and publishing data. He works full-time developing open-source tools for data journalism, centered on Datasette and SQLite. Previously, he was an engineering director at Eventbrite, joining through the acquisition of Lanyrd, a Y Combinator startup he co-founded in 2010. Simon is also a co-creator of the Django Web Framework. He has been blogging about web development since the early 2000s.

In today’s conversation, we dive deep into the realm of Gen AI and talk about the following:
• Simon’s initial experiments with LLMs and coding tools
• Why fine-tuning is generally a waste of time, and when it’s not
• RAG: an overview
• Interacting with GPT’s voice mode
• Simon’s day-to-day LLM stack
• Common misconceptions about LLMs and ethical gray areas
• How Simon’s productivity has increased and his generally optimistic view on these tools
• Tips, tricks, and hacks for interacting with GenAI tools
• And more!

I hope you enjoy this episode.

In this episode, we cover:
(02:15) Welcome
(05:28) Simon’s ‘scary’ experience with ChatGPT
(10:58) Simon’s initial experiments with LLMs and coding tools
(12:21) The languages that LLMs excel at
(14:50) To start LLMs by understanding the theory, or by playing around?
(16:35) Fine-tuning: what it is, and why it’s mostly a waste of time
(18:03) Where fine-tuning works
(18:31) RAG: an explanation
(21:34) The expense of running testing on AI
(23:15) Simon’s current AI stack
(29:55) Common misconceptions about using LLM tools
(30:09) Simon’s stack – continued
(32:51) Learnings from running local models
(33:56) The impact of Firebug and the introduction of open-source
(39:42) How Simon’s productivity has increased using LLM tools
(41:55) Why most people should limit themselves to 3-4 programming languages
(45:18) Addressing ethical issues and resistance to using generative AI
(49:11) Are LLMs plateauing? Is AGI overhyped?
(55:45) Coding vs. professional coding, looking ahead
(57:27) The importance of systems thinking for software engineers
(1:01:00) Simon’s advice for experienced engineers
(1:06:29) Rapid-fire questions

Where to find Simon Willison:
• X: https://x.com/simonw
• LinkedIn: https://www.linkedin.com/in/simonwillison/
• Website: https://simonwillison.net/
• Mastodon: https://fedi.simonwillison.net/@simon

Referenced:
• Simon’s LLM project: https://github.com/simonw/llm
• Jeremy Howard’s fast.ai: https://www.fast.ai/
• jq programming language: https://en.wikipedia.org/wiki/Jq_(programming_language)
• Datasette: https://datasette.io/
• GPT Code Interpreter: https://platform.openai.com/docs/assistants/tools/code-interpreter
• OpenAI Playground: https://platform.openai.com/playground/chat
• Advent of Code: https://adventofcode.com/
• Rust programming language: https://www.rust-lang.org/
• Applied AI Software Engineering: RAG: https://newsletter.pragmaticengineer.com/p/rag
• Claude: https://claude.ai/
• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet
• ChatGPT can now see, hear, and speak: https://openai.com/index/chatgpt-can-now-see-hear-and-speak/
• GitHub Copilot: https://github.com/features/copilot
• What are Artifacts and how do I use them?: https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them
• Large Language Models on the command line: https://simonwillison.net/2024/Jun/17/cli-language-models/
• Llama: https://www.llama.com/
• MLC chat on the app store: https://apps.apple.com/us/app/mlc-chat/id6448482937
• Firebug: https://en.wikipedia.org/wiki/Firebug_(software)#
• NPM: https://www.npmjs.com/
• Django: https://www.djangoproject.com/
• Sourceforge: https://sourceforge.net/
• CPAN: https://www.cpan.org/
• OOP: https://en.wikipedia.org/wiki/Object-oriented_programming
• Prolog: https://en.wikipedia.org/wiki/Prolog
• SML: https://en.wikipedia.org/wiki/Standard_ML
• Stable Diffusion: https://stability.ai/
• Chain of thought prompting: https://www.promptingguide.ai/techniques/cot
• Cognition AI: https://www.cognition.ai/
• In the Race to Artificial General Intelligence, Where’s the Finish Line?: https://www.scientificamerican.com/article/what-does-artificial-general-intelligence-actually-mean/
• Black swan theory: https://en.wikipedia.org/wiki/Black_swan_theory
• Copilot workspace: https://githubnext.com/projects/copilot-workspace
• Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
• Bluesky Global: https://www.blueskyglobal.org/
• The Atrocity Archives (Laundry Files #1): https://www.amazon.com/Atrocity-Archives-Laundry-Files/dp/0441013651
• Rivers of London: https://www.amazon.com/Rivers-London-Ben-Aaronovitch/dp/1625676158/
• Vanilla JavaScript: http://vanilla-js.com/
• jQuery: https://jquery.com/
• Fly.io: https://fly.io/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

In the last year there hasn’t been a day that passed without us hearing about a new generative AI innovation that will enhance some aspect of our lives. On a number of tasks large probabilistic systems are now outperforming humans, or at least they do so “on average”. “On average” means most of the time, but in many real life scenarios “average” performance is not enough: we need correctness ALL of the time, for example when you ask the system to dial 911.

In this talk we will explore the synergy between deterministic and probabilistic models to enhance the robustness and controllability of machine learning systems. Tailored for ML engineers, data scientists, and researchers, the presentation delves into the necessity of using both deterministic algorithms and probabilistic model types across various ML systems, from straightforward classification to advanced Generative AI models.
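A minimal sketch of one way the two paradigms can be combined, assuming a hypothetical intent router: deterministic rules handle the requests that must never fail (such as an emergency call), and a probabilistic model handles everything else, with a confidence threshold below which the system asks for clarification. The `toy_classifier` function, the rule patterns, the intent names, and the 0.7 threshold are illustrative assumptions, not the speaker's implementation.

```python
import re

# Deterministic layer: checked first, so these requests never depend on a model.
EMERGENCY_PATTERNS = [
    re.compile(r"\b(call|dial)\s+911\b", re.IGNORECASE),
    re.compile(r"\bemergency\b", re.IGNORECASE),
]

CONFIDENCE_THRESHOLD = 0.7  # assumed value, chosen only for illustration


def toy_classifier(text):
    """Stand-in for a learned model: returns (intent, confidence)."""
    lowered = text.lower()
    if "weather" in lowered:
        return "get_weather", 0.92
    if "play" in lowered:
        return "play_music", 0.55
    return "unknown", 0.10


def route(text, classifier=toy_classifier):
    """Apply rules first, then the model, then a safe fallback."""
    for pattern in EMERGENCY_PATTERNS:
        if pattern.search(text):
            return {"intent": "call_emergency", "source": "rule"}
    intent, confidence = classifier(text)
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"intent": intent, "source": "model"}
    return {"intent": "ask_clarification", "source": "fallback"}


for utterance in ["Please dial 911 now", "What's the weather today?", "play something"]:
    print(utterance, "->", route(utterance))
```

The ordering is the design choice here: because the rule layer runs before the model, a misclassification cannot override a safety-critical request, while low-confidence model outputs degrade to a clarification prompt instead of a wrong action.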

You will learn about the unique advantages each paradigm offers and gain insights into how to most effectively combine them for optimal performance in real-world applications. I will walk you through my past and current experiences in working with simple and complex NLP models, and show you what kind of pitfalls, shortcuts, and tricks are possible to deliver models that are both competent and reliable.

The session will be structured into a brief introduction to both model types, followed by case studies in classification and generative AI, concluding with a Q&A segment.