talk-data.com

Topic: LLM

Large Language Models (LLM)

Tags: nlp, ai, machine_learning

1405 tagged

Activity trend: peak of 158 per quarter, 2020-Q1 to 2026-Q1

Activities

1405 activities · Newest first

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don’t), where industry insights meet laid-back banter. Whether you’re a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let’s get into the heart of data, unplugged style!

This week, we dive into:

The creative future with AI: is generative AI helping or hurting creators?
Environmental concerns of AI: the hidden costs of AI’s growing capabilities. How much energy do these models actually consume, and is it worth it?
AI copyright controversies: Mark Zuckerberg’s LLaMA model faces criticism for using copyrighted materials like content from the notorious LibGen database.
Trump vs. AI regulation: The former president repeals Biden’s AI executive order, creating a Wild West approach to AI development in the U.S. How will this impact innovation and global competition?
Search reimagined with Perplexity AI: A new era of search blending conversational AI and personalized data unification. Could this be the future of information retrieval?
Apple Intelligence on pause: Apple's AI-generated news alerts face a bumpy road. For more laughs, check out the dedicated subreddit AppleIntelligenceFail.
Rhai scripting for Rust: Empowering Rust developers with an intuitive embedded scripting language to make extensibility a breeze.
Poisoned text for scrapers: Exploring creative ways to protect web content from unauthorized scraping by AI systems.
The rise of the AI Data Engineer: Is this a new role in data science, or are we just rebranding existing skills?

Data Hackers News is on the air! The week's hottest topics, with the top news in data, AI, and technology, which you can also find in our weekly newsletter, now on the Data Hackers podcast!

Press play and listen to this week's Data Hackers News now!

To keep up with everything happening in the data field, subscribe to the weekly newsletter:

https://www.datahackers.news/

Meet our Data Hackers News commentators:

Monique Femme

Paulo Vasconcellos

Stories and topics discussed:

Perplexity AI wants to buy TikTok;

Google announces Gemini for Gmail and Google Docs for free;

OpenAI announces the end of o3 testing and says o3-mini will arrive in the coming weeks.

Mentioned in the episode: As Tendências para Dados e AI em 2025 - Data Hackers Podcast #100

Other Data Hackers channels:

Website

LinkedIn

Instagram

TikTok

YouTube

Where Data Science Meets Shrek: How BuzzFeed uses AI

By introducing a range of AI-enhanced products that amplify creativity and interactivity across our platforms, BuzzFeed has been able to connect with the largest global audience of young people online to cement its role as the defining digital media company of the AI era. Notably, some of BuzzFeed's most successful tools and content experiences thrive on the power of small, focused datasets. Still wondering how Shrek fits into the picture? You'll have to watch!

Video from: https://smalldatasf.com/

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-...
Small Data Manifesto: https://motherduck.com/blog/small-dat...
Why Small Data?: https://benn.substack.com/p/is-excel-...
Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us
LinkedIn: / motherduck
X/Twitter: / motherduck
Bluesky: motherduck.com
Blog: https://motherduck.com/blog/


Discover how BuzzFeed's Data team, led by Gilad Cohen, harnesses AI for creative purposes, leveraging large language models (LLMs) and generative image capabilities to enhance content creation. This video explores how machine learning teams build tools to create new interactive media experiences, focusing on augmenting creative workflows rather than replacing jobs, allowing readers to participate more deeply in the content they consume.

We dive into the core data science problem of understanding what a piece of content is about, a crucial step for improving content recommendation systems. Learn why traditional methods fall short and how the team is constantly seeking smaller, faster, and more performant models. This exploration covers the evolution from earlier architectures like DistilBERT to modern, more efficient approaches for better content representation, clustering, and user personalization.

A key technique explored is the use of text embeddings, which are dense, low-dimensional vector representations of data. This video provides an accessible explanation of embeddings as a form of compressed knowledge, showing how BuzzFeed creates a unique vector for each article. This allows for simple vector math to find semantically similar content, forming a foundational infrastructure for powerful ranking and recommender systems.
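The "simple vector math" mentioned above is typically cosine similarity between embedding vectors. The following is a minimal sketch under stated assumptions: the article IDs and the 3-dimensional vectors are made up for illustration (real embeddings come from a model and usually have hundreds of dimensions), and nothing here is taken from BuzzFeed's actual system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank every other article by cosine similarity to the query article."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, one vector per article (assumption, not real data).
ARTICLE_EMBEDDINGS = {
    "shrek-quiz": [0.9, 0.1, 0.0],
    "ogre-recipes": [0.8, 0.2, 0.1],
    "tax-tips": [0.0, 0.1, 0.9],
}

neighbors = most_similar("shrek-quiz", ARTICLE_EMBEDDINGS, top_k=2)
for article_id, score in neighbors:
    print(f"{article_id}: {score:.3f}")
```

At production scale the same ranking is usually delegated to a vector index rather than a Python loop, but the comparison being made is the same.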

Explore how BuzzFeed leverages generative image capabilities to create new interactive formats. The journey began with Midjourney experiments and evolved to building custom tools by fine-tuning a Stable Diffusion XL model using LoRA (Low-Rank Adaptation). This advanced technique provides greater control over image output, enabling the rapid creation of viral AI generators that respond to trending topics and allow for massive user engagement.
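For readers unfamiliar with LoRA, the general idea is to freeze a pretrained weight matrix and learn only a low-rank update to it. The formula below describes the technique in general, not any detail of BuzzFeed's setup, and the symbols are the conventional ones rather than anything from the talk.

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Here W_0 is the frozen pretrained weight, B and A are the only trained matrices, r is the chosen rank, and α is a scaling constant. The update therefore needs r(d + k) trainable parameters instead of dk, which is what makes fine-tuning a large image model comparatively cheap.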

Finally, see a practical application of machine learning for content optimization. BuzzFeed uses its vast historical dataset from Bayesian A/B testing to train a model that predicts headline performance. By generating multiple headline candidates with an LLM like Claude and running them through this predictive model, they can identify the winning headline. This showcases how to use unique, in-house data to build powerful tools that improve click-through rates and drive engagement, pointing to a significant transformation in how media is created and consumed.
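The generate-then-score loop described above can be sketched as follows. This is an illustration under stated assumptions only: the candidate headlines are hard-coded where an LLM call would normally sit, and `toy_predict_ctr` is a hypothetical stand-in for a model trained on historical A/B test results, not BuzzFeed's model.

```python
from typing import Callable, List, Tuple


def pick_headline(
    candidates: List[str], predict_ctr: Callable[[str], float]
) -> Tuple[str, float]:
    """Score every candidate and return the one with the highest predicted CTR."""
    scored = [(headline, predict_ctr(headline)) for headline in candidates]
    return max(scored, key=lambda pair: pair[1])


def toy_predict_ctr(headline: str) -> float:
    """Hypothetical scorer: a made-up rule that rewards questions and numbers."""
    score = 0.02
    if headline.endswith("?"):
        score += 0.01
    if any(ch.isdigit() for ch in headline):
        score += 0.015
    return score


# In a real pipeline these would be generated by an LLM (assumption).
candidates = [
    "Which Shrek Character Are You?",
    "17 Shrek Moments That Still Hold Up",
    "A Look Back At Shrek",
]

best, ctr = pick_headline(candidates, toy_predict_ctr)
print(best, round(ctr, 3))
```

Keeping the scorer behind a plain callable means the toy rule can be swapped for a trained model without touching the selection logic.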

How do you make data analytics fun and engaging? In this episode, I chat with YouTube sensation Thu Vu. We discuss Python's growing significance and trends in the data job market, and get a sneak peek into her new initiative, Python for AI Projects.

💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter
🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training
👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa
👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator

⌚ TIMESTAMPS
05:54 - Creating cool projects with Local LLMs
13:48 - Learning and Teaching Python for AI
24:09 - Trends in Data and Tech Job Market

🔗 CONNECT WITH THU VU
🎥 YouTube Channel: https://www.youtube.com/@Thuvu5
🤝 LinkedIn: https://www.linkedin.com/in/thu-hien-vu-3766b174/
📸 Instagram: https://www.instagram.com/thuvu.analytics/
🎵 TikTok: https://www.tiktok.com/@thuvu.datanalytics
💻 Website: https://thuhienvu.com/
Free Data Science & AI tips: thu-vu.ck.page/49c5ee08f6
Master Python for AI projects: python-course-earlybird.framer.website

🔗 CONNECT WITH AVERY
🎥 YouTube Channel: https://www.youtube.com/@averysmith
🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/
📸 Instagram: https://instagram.com/datacareerjumpstart
🎵 TikTok: https://www.tiktok.com/@verydata
💻 Website: https://www.datacareerjumpstart.com/

Mentioned in this episode:

Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

With GenAI and LLMs comes great potential to delight and damage customer relationships—both during the sale, and in the UI/UX. However, are B2B AI product teams actually producing real outcomes, on the business side and the UX side, such that customers find these products easy to buy, trustworthy and indispensable? 

What is changing with customer problems as a result of LLM and GenAI technologies becoming more readily available to implement into B2B software? Anything?

Is your current product or feature development being driven by the fact you might be able to now solve it with AI? The “AI-first” team sounds like it’s cutting edge, but is that really determining what a customer will actually buy from you? 

Today I want to talk to you about the interplay of GenAI, customer trust (both user and buyer trust), and the role of UX in products using probabilistic technology.  

These thoughts are based on my own perceptions as a “user” of AI “solutions,” (quotes intentional!), conversations with prospects and clients at my company (Designing for Analytics), as well as the bright minds I mentor over at the MIT Sandbox innovation fund. I also wrote an article about this subject if you’d rather read an abridged version of my thoughts.

Highlights/ Skip to:

AI and LLM-Powered Products Do Not Turn Customer Problems into “Now” and “Expensive” Problems (4:03)
Trust and Transparency in the Sale and the Product UX: Handling LLM Hallucinations (Confabulations) and Designing for Model Interpretability (9:44)
Selling AI Products to Customers Who Aren’t Users (13:28)
How LLM Hallucinations and Model Interpretability Impact User Trust of Your Product (16:10)
Probabilistic UIs and LLMs Don’t Negate the Need to Design for Outcomes (22:48)
How AI Changes (or Doesn’t) Our Benchmark Use Cases and UX Outcomes (28:41)
Closing Thoughts (32:36)

Quotes from Today’s Episode

“Putting AI or GenAI into a product does not change the urgency or the depth of a particular customer problem; it just changes the solution space. Technology shifts in the last ten years have enabled founders to come up with all sorts of novel ways to leverage traditional machine learning, symbolic AI, and LLMs to create new products and disrupt established products; however, it would be foolish to ignore these developments as a product leader. All this technology does is change the possible solutions you can create. It does not change your customer situation, problem, or pain, either in the depth, or severity, or frequency. In fact, it might actually cause some new problems. I feel like most teams spend a lot more time living in the solution space than they do in the problem space. Fall in love with the problem and love that problem regardless of how the solution space may continue to change.” (4:51)

“Narrowly targeted, specialized AI products are going to beat solutions trying to solve problems for multiple buyers and customers. If you’re building a narrow, specific product for a narrow, specific audience, one of the things you have on your side is a solution focused on a specific domain used by people who have specific domain experience. You may not need a trillion-parameter LLM to provide significant value to your customer. AI products that have a more specific focus and address a very narrow ICP I believe are more likely to succeed than those trying to serve too many use cases, especially when GenAI is being leveraged to deliver the value. I think this can be true even for platform products as well. Narrowing the audience you want to serve also narrows the scope of the product, which in turn should increase the value that you bring to that audience, in part because you probably will have fewer trust, usability, and utility problems resulting from trying to leverage a model for a wide range of use cases.” (17:18)

“Probabilistic UIs and LLMs are going to create big problems for product teams, particularly if they lack a set of guiding benchmark use cases. I talk a lot about benchmark use cases as a core design principle and data-rich enterprise products. Why? Because a lot of B2B and enterprise products fall into the game of ‘adding more stuff over time.’ ‘Add it so you can sell it.’ As products and software companies begin to mature, you start having product owners and PMs attached to specific technologies or parts of a product. Figuring out how to improve the customer’s experience over time against the most critical problems and needs they have is a harder game to play than simply adding more stuff, especially if you have no benchmark use cases to hold you accountable. It’s hard to make the product indispensable if it’s trying to do 100 things for 100 people.” (22:48)

“Product is a hard game, and design and UX is by far not the only aspect of product that we need to get right. A lot of designers don’t understand this, and they think if they just nail design and UX, then everything else solves itself. The reason the design and experience part is hard is that it’s tied to behavior change, especially if you are ‘disrupting’ an industry, incumbent tool, application, or product. You are in the behavior-change game, and it’s really hard to get it right. But when you get it right, it can be really amazing and transformative.” (28:01)

“If your AI product is trying to do a wide variety of things for a wide variety of personas, it’s going to be harder to determine appropriate benchmarks and UX outcomes to measure and design against. Given LLM hallucinations, the increased problem of trust, model drift problems, etc., your AI product has to actually innovate in a way that is both meaningful and observable to the customer. It doesn’t matter what your AI is trying to “fix.” If they can’t see what the benefit is to them personally, it doesn’t really matter if technically you’ve done something in a new and novel way. They’re just not going to care because that question of what’s in it for me is always sitting behind, in their brain, whether it’s stated out loud or not.” (29:32)

Links

Designing for Analytics mailing list

AI-Powered Search

Apply cutting-edge machine learning techniques, from crowdsourced relevance and knowledge graph learning to Large Language Models (LLMs), to enhance the accuracy and relevance of your search results. Delivering effective search is one of the biggest challenges you can face as an engineer. AI-Powered Search is an in-depth guide to building intelligent search systems you can be proud of. It covers the critical tools you need to automate ongoing relevance improvements within your search applications.

Inside you’ll learn modern, data-science-driven search techniques like:

Semantic search using dense vector embeddings from foundation models
Retrieval augmented generation (RAG)
Question answering and summarization combining search and LLMs
Fine-tuning transformer-based LLMs
Personalized search based on user signals and vector embeddings
Collecting user behavioral signals and building signals boosting models
Semantic knowledge graphs for domain-specific learning
Semantic query parsing, query-sense disambiguation, and query intent classification
Implementing machine-learned ranking models (Learning to Rank)
Building click models to automate machine-learned ranking
Generative search, hybrid search, multimodal search, and the search frontier

AI-Powered Search will help you build the kind of highly intelligent search applications demanded by modern users. Whether you’re enhancing your existing search engine or building from scratch, you’ll learn how to deliver an AI-powered service that can continuously learn from every content update, user interaction, and the hidden semantic relationships in your content. You’ll learn both how to enhance your AI systems with search and how to integrate large language models (LLMs) and other foundation models to massively accelerate the capabilities of your search technology.

About the Technology
Modern search is more than keyword matching. Much, much more. Search that learns from user interactions, interprets intent, and takes advantage of AI tools like large language models (LLMs) can deliver highly targeted and relevant results. This book shows you how to up your search game using state-of-the-art AI algorithms, techniques, and tools.

About the Book
AI-Powered Search teaches you to create a search that understands natural language and improves automatically the more it is used. As you work through dozens of interesting and relevant examples, you’ll learn powerful AI-based techniques like semantic search on embeddings, question answering powered by LLMs, real-time personalization, and Retrieval Augmented Generation (RAG).

What's Inside
Sparse lexical and embedding-based semantic search
Question answering, RAG, and summarization using LLMs
Personalized search and signals boosting models
Learning to Rank, multimodal, and hybrid search

About the Reader
For software developers and data scientists familiar with the basics of search engine technology.

About the Author
Trey Grainger is the Founder of Searchkernel and former Chief Algorithms Officer and SVP of Engineering at Lucidworks. Doug Turnbull is a Principal Engineer at Reddit and former Staff Relevance Engineer at Spotify. Max Irwin is the Founder of Max.io and former Managing Consultant at OpenSource Connections.

Quotes
Belongs on the shelf of every search practitioner! - Khalifeh AlJadda, Google
A treasure map! Now you have decades of semantic search knowledge at your fingertips. - Mark Moyou, NVIDIA
Modern and comprehensive! Everything you need to build world-class search experiences. - Kelvin Tan, SearchStax
Kick starts your ability to implement AI search with easy to understand examples. - David Meza, NASA

As AI continues to advance, natural language processing (NLP) is at the forefront, transforming how businesses interact with data. From chatbots to document analysis, NLP offers numerous applications. But with the advent of generative AI, professionals face new challenges: When is it appropriate to use traditional NLP techniques versus more advanced models? How do you balance the costs and benefits of these technologies? Explore the strategic decisions and practical applications of NLP in the modern business world.

Meri Nova is the founder of Break Into Data, a data careers company. Her work focuses on helping people switch to a career in data, and using machine learning to improve community engagement. Previously, she was a data scientist and machine learning engineer at Hyloc. Meri is the instructor of DataCamp's 'Retrieval Augmented Generation with LangChain' course.

In the episode, Richie and Meri explore the evolution of natural language processing, the impact of generative AI on business applications, the balance between traditional NLP techniques and modern LLMs, the role of vector stores and knowledge graphs, the exciting potential of AI in automating tasks and decision-making, and much more.

Links Mentioned in the Show:
Meri’s Breaking Into Data Handbook on GitHub
Break Into Data Discord Group
Connect with Meri
Skill Track: Artificial Intelligence (AI) Leadership
Related Episode: Industry Roundup #2: AI Agents for Data Work, The Return of the Full-Stack Data Scientist and Old languages Make a Comeback
Rewatch sessions from RADAR: Forward Edition

New to DataCamp?
Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

2025 promises to be another transformative year for data and AI. From groundbreaking advancements in reasoning models to the rise of new challengers in generative AI, the field shows no signs of slowing down. Last week Jonathan and Martijn scored their 2024 predictions, and scored highly, but what's in store for 2025? Building on the insights from their 2024 predictions, we'll assess the future of generative AI, the evolving role of AI in education, the growing importance of synthetic data, and much more.

In the episode, Richie, Jo, and Martijn discuss whether OpenAI and Google will maintain their dominance or face disruption from new players like Meta’s Llama and xAI’s Grok, the implications of recent breakthroughs in AI reasoning, the rise of short-form video generation AI in social media and advertising, the challenges Europe faces in keeping pace with the US and China in AI innovation, and much more.

Links Mentioned in the Show:
Data & AI Trends & Predictions 2025
Skill Track: AI Business Fundamentals
Related Episode: Reviewing Our Data Trends & Predictions of 2024 with DataCamp's CEO & COO, Jonathan Cornelissen & Martijn Theuwissen
Rewatch sessions from RADAR: Forward Edition

New to DataCamp?
Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we explore:

OpenAI’s O3: Features, O1 Comparison, Release Date & more.
Advent of Code: How LLMs performed on the 2024 coding challenges.
DeepSeek V3: A breakthrough AI model developed for a fraction of GPT-4’s cost, yet rivaling top benchmarks.
Shadow Workspace: How Cursor compares to Copilot with features like integrated models, documentation, and search.
Bolt.new: Why it’s poised to revolutionize web app development with prompt-driven innovation.
O1 Preview’s Chess Hack: When smarter means “cheater” in a fascinating experiment against Stockfish.
Pydantic AI: A new tool bringing structure and intelligence to Python’s AI workflows.
RightTyper: A tool to infer and apply type hints for cleaner, more efficient Python code.
Doom: The Gallery Experience: A whimsical take on art appreciation in a retro gaming environment.
Suno V4: The next-gen music generator, featuring "Bart, the Data Dynamo."
Ghostty Terminal: The terminal emulator developers are raving about.

Data Hackers News is on the air! The week's hottest topics, with the top news in data, AI, and technology, which you can also find in our weekly newsletter, now on the Data Hackers podcast!

Press play and listen to this week's Data Hackers News now!

To keep up with everything happening in the data field, subscribe to the weekly newsletter:

https://www.datahackers.news/

Meet our Data Hackers News commentators:

Monique Femme

Paulo Vasconcellos

Stories and topics discussed:

Meta plans to fill Instagram and Facebook with AI bots;

OpenAI's new model (o3) angers researchers over its lack of transparency.

Other Data Hackers channels:

Website

LinkedIn

Instagram

TikTok

YouTube

AI is not just about writing code; it's about improving the entire software development process. From generating documentation to automating code reviews, AI tools are becoming indispensable. But how do you ensure the quality of AI-generated code? What strategies can you employ to maintain high standards while leveraging AI's capabilities? These are the questions developers must consider as they incorporate AI into their workflows.

Eran Yahav is an associate professor at the Computer Science Department at the Technion – Israel Institute of Technology and co-founder and CTO of Tabnine (formerly Codota). Prior to that, he was a research staff member at the IBM T.J. Watson Research Center in New York (2004-2010). He received his Ph.D. from Tel Aviv University (2005) and his B.Sc. from the Technion in 1996. His research interests include program analysis, program synthesis, and program verification. Eran is a recipient of the prestigious Alon Fellowship for Outstanding Young Researchers, the Andre Deloro Career Advancement Chair in Engineering, the 2020 Robin Milner Young Researcher Award (POPL talk here), the ERC Consolidator Grant, as well as multiple best paper awards at various conferences.

In the episode, Richie and Eran explore AI's role in software development, the balance between AI assistance and manual coding, the impact of generative AI on code review and documentation, the evolution of developer tools, the future of AI-driven workflows, and much more.

Links Mentioned in the Show:
Tabnine
Connect with Eran
Course: Working with the OpenAI API
Related Episode: Getting Generative AI Into Production with Lin Qiao, CEO and Co-Founder of Fireworks AI
Rewatch sessions from RADAR: Forward Edition

New to DataCamp?
Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business

Summary

In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the necessity of human oversight to ensure accuracy and trust.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months (sometimes years), burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world; like in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.

Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of business.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?
How does that reconciliation relate to the practice of "master data management"?
What are the scaling challenges with the current set of practices for reconciling data?
ML has been applied to data cleaning for a long time in the form of entity resolution, etc. How has the landscape evolved or matured in recent years?
What (if any) transformative capabilities do LLMs introduce?
What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?
What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?
What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?
What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?
When is ML/AI the wrong choice for data cleaning/reconciliation?
What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Tamr
Master Data Management
CERN
LHC
Michael Stonebraker
Conway's Law
Expert Systems
Information Retrieval
Active Learning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

As we look back at 2024, we're highlighting some of our favourite episodes of the year, and with 100 of them to choose from, it wasn't easy! The four guests we'll be recapping with are: Lea Pica - A celebrity in the data storytelling and visualisation space. Richie and Lea cover the full picture of data presentation, how to understand your audience, how to leverage hollywood storytelling and more. Out December 19.Alex Banks - Founder of Sunday Signal. Adel and Alex cover Alex’s journey into AI and what led him to create Sunday Signal, the potential of AI, prompt engineering at its most basic level, chain of thought prompting, the future of LLMs and more. Out December 23.Don Chamberlin - The renowned co-inventor of SQL. Richie and Don explore the early development of SQL, how it became standardized, the future of SQL through NoSQL and SQL++ and more. Out December 26.Tom Tunguz - general Partner at Theory Ventures, a $235m VC firm. Richie and Tom explore trends in generative AI, cloud+local hybrid workflows, data security, the future of business intelligence and data analytics, AI in the corporate sector and more. Out December 30. Rapid change seems to be the new norm within the data and AI space, and due to the ecosystem constantly changing, it can be tricky to keep up. Fortunately, any self-respecting venture capitalist looking into data and AI will stay on top of what’s changing and where the next big breakthroughs are likely to come from. We all want to know which important trends are emerging and how we can take advantage of them, so why not learn from a leading VC.  Tomasz Tunguz is a General Partner at Theory Ventures, a $235m early-stage venture capital firm. He blogs sat tomtunguz.com & co-authored Winning with Data. He has worked or works with Looker, Kustomer, Monte Carlo, Dremio, Omni, Hex, Spot, Arbitrum, Sui & many others. 
He was previously the product manager for Google's social media monetization team, including the Google-MySpace partnership, and managed the launches of AdSense into six new markets in Europe and Asia. Before Google, Tunguz developed systems for the Department of Homeland Security at Appian Corporation. In the episode, Richie and Tom explore trends in generative AI, the impact of AI on professional fields, cloud+local hybrid workflows, data security, changes in data warehousing through the use of integrated AI tools, the future of business intelligence and data analytics, and the challenges and opportunities surrounding AI in the corporate sector. You'll also get to discover Tom's picks for the hottest new data startups. Links Mentioned in the Show: Tom’s Blog; Theory Ventures; Article: What Air Canada Lost In ‘Remarkable’ Lying AI Chatbot Case; [Course] Implementing AI Solutions in Business; Related Episode: Making Better Decisions using Data & AI with Cassie Kozyrkov, Google's First Chief Decision Scientist; Sign up to RADAR: AI...

For our 200th episode, we bring you a special guest and take a walk down memory lane, to the creation and development of one of the most popular programming languages in the world. Don Chamberlin is renowned as the co-inventor of SQL (Structured Query Language), the predominant database language globally, which he developed with Raymond Boyce in the mid-1970s. Chamberlin's professional career began at IBM Research in Yorktown Heights, New York, following a summer internship there during his academic years. His work on IBM's System R project led to the first SQL implementation and significantly advanced IBM’s relational database technology. His contributions were recognized when he was made an IBM Fellow in 2003 and later a Fellow of the Computer History Museum in 2009 for his pioneering work on SQL and database architectures.
Chamberlin also contributed, as part of the W3C, to the development of XQuery, an XML query language that became a W3C Recommendation in January 2007. Additionally, he holds fellowships with ACM and IEEE and is a member of the National Academy of Engineering. In the episode, Richie and Don explore his early career at IBM and the development of his interest in databases alongside Ray Boyce, the Database Task Group (DBTG), the transition to relational databases and the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++ and much more. Links Mentioned in the Show: The first-ever journal paper on SQL. SEQUEL: A Structured English Query Language; Don’s Book: SQL++ for SQL Users: A Tutorial; System R: Relational approach to database management; SQL Courses; SQL Articles, Tutorials and Code-Alongs; Related Episode: Scaling Enterprise Analytics with...

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style! In this episode, we wrap up the Rootsconf mini-series with a thrilling finale: Sophie De Coppel and Warre Dreesen's workshop from our internal knowledge-sharing event.

AI Hunger Games: A showdown between AI language models like GPT-4, Claude, and Gemini. Who aced coding, games, and social interactions?

Human vs. Machine: Fun experiments like “Find the Human” and “The Chameleon Game” highlight where humans and AI shine, and where they stumble.

Model Personalities Explored: Discover why some models seem nerdy, others boastful, and how creativity plays a role in performance.

Engineering Insights: Behind the scenes on implementing and testing AI models in competitive scenarios, from Advent of Code puzzles to group chat debates.

Join the fun as hosts and guests break down the playful and thought-provoking ways we’re pushing AI to its limits. Let the games begin!

"Ready to dive deep into the future of intelligent systems? Meet Peter Voss, Founder and CEO of Aigo.ai, who coined the term 'Artificial General Intelligence' and is pioneering hyper-personalized chatbots WITH a brain. Join us as we explore his revolutionary ideas and why Aigo.ai is leading the charge in AI innovation."

#AIInnovation #PeterVoss #FutureOfAI #HyperPersonalization #BeyondChatGPT #TechPodcast #ArtificialIntelligence #MachineLearning #PersonalizedTech

01:56 Meet Peter Voss
08:23 Passion for Intelligent Systems
12:54 Why only Aigo
16:31 ChatGPT? A Different View
22:03 A Use Case by Example
30:53 What is Included, What is Not
34:08 Who are your Clients
36:10 The Engagement
38:57 The Business Case
41:59 AI that Reasons
44:26 The Definition of AGI
46:51 For Fun

LinkedIn: linkedin.com/in/vosspeter
Twitter: @peterevoss
Website: aigo.ai/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Since the launch of ChatGPT, one of the trending terms outside of ChatGPT itself has been prompt engineering. This act of carefully crafting your instructions is treated as alchemy by some and science by others. So what makes an effective prompt? Alex Banks has been building and scaling AI products since 2021. He writes Sunday Signal, a newsletter offering a blend of AI advancements and broader thought-provoking insights. His expertise extends to social media, on X/Twitter and LinkedIn, where he educates a diverse audience on leveraging AI to enhance productivity and transform daily life.
In the episode, Alex and Adel cover Alex’s journey into AI and what led him to create Sunday Signal, the potential of AI, prompt engineering at its most basic level, strategies for better prompting, chain of thought prompting, prompt engineering as a skill and career path, building your own AI tools rather than using consumer AI products, AI literacy, the future of LLMs and much more. Links Mentioned in the Show: [Alex’s Free Course on DataCamp] Understanding Prompt Engineering; Sunday Signal; Principles by Ray Dalio: Life and Work; Related Episode: [DataFramed AI Series #1] ChatGPT and the OpenAI Developer Ecosystem; Rewatch sessions from RADAR: The Analytics Edition. New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

I had an interesting conversation yesterday with a young gentleman upgrading my Google Fiber. While he was originally pursuing a career as a software developer, he and his friends decided against it after seeing the progress of ChatGPT over the last couple of years.

As a father of two teenage boys, I often think about the nature of work, including whether writing code will be relevant for future generations. Here, I rant about at least part (not all) of what's on my mind. This is a big topic, and you'll see me ranting more about it.

Your data project doesn't end once you have results. In order to have impact, you need to communicate those results to others. Presentations filled with endless tables and technical jargon can easily become tedious, leading your audience to lose interest or misunderstand your point. Data storytelling provides a solution to this: by creating a narrative around your results you can increase engagement and understanding from your audience. This is an art, and so many factors contribute to visualizing data and creating a compelling story that it can be overwhelming. However, with the right approach, creating data stories can become second nature. In this special episode of DataFramed, we join forces with the Present Beyond Measure podcast to glean the best data presentation practices from one of the leading voices in the space.
Lea Pica is the Founder and Host of the Present Beyond Measure podcast and a seasoned digital analytics practitioner, social media marketer and blogger with over 11 years of experience building search marketing and digital analytics practices for companies like Scholastic, Victoria’s Secret and Prudential. Present Beyond Measure’s mission is to bring their teachings to the digital marketing and web analytics communities, and empower anyone responsible for presenting data to an audience. In the full episode, Richie and Lea cover the full picture of data presentation, how to understand your audience, how to leverage Hollywood storytelling, data storyboarding and visualization, the use of imagery in presentations, cognitive load management, the use of throughlines in presentations, how to improve your speaking and engagement skills, data visualization techniques in business settings and much more. Links Mentioned in the Show: Present Beyond Measure; Lea’s Book; Connect with Lea on LinkedIn; Hollywood Storytelling; [Course] Data Storytelling Concepts. New to DataCamp? Learn on the go using the DataCamp mobile app (https://www.datacamp.com/mobile)...

Data Hackers News is on the air! The hottest topics of the week, with the top news in Data, AI, and Technology, which you can also find in our weekly newsletter, now on the Data Hackers Podcast!

Press play and listen now to this week's Data Hackers News!

To keep up with everything happening in the data field, subscribe to the weekly newsletter:

https://www.datahackers.news/

Meet our Data Hackers News commentators:

Monique Femme

Paulo Vasconcellos

Gabriel Lages

Stories/topics discussed:

OpenAI co-founder says the way AI is built is about to change;

Meta does not want OpenAI to become a for-profit company and appeals to the US government;

Ilya Sutskever's talk

Other Data Hackers channels:

Website

LinkedIn

Instagram

TikTok

YouTube