AI/ML

Artificial Intelligence/Machine Learning

9014 activities · Newest first

Pandas and scikit-learn have become staples in the machine learning toolkit for processing and modeling tabular data in Python. However, when data size scales up, these tools become slow or run out of memory. Ibis provides a unified, Pythonic, dataframe-like interface to 20+ execution backends, including dataframe libraries, databases, and analytics engines. Ibis enables users to leverage these powerful tools without rewriting their data engineering code (or learning SQL). IbisML extends the benefits of using Ibis to the ML workflow by letting users preprocess their data at scale on any Ibis-supported backend.

In this tutorial, you'll build an end-to-end machine learning project to predict the live win probability after each move during chess games.

Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools, like pixi, provide automatic reproducibility for all dependencies while offering a high-level interface well suited to researchers.

This tutorial will provide a practical introduction to using pixi to easily create scientific and AI/ML environments that benefit from hardware acceleration, across multiple machines and platforms. The focus will be on applications using the PyTorch and JAX Python machine learning libraries with CUDA enabled, as well as deploying these environments to production settings in Linux container images.

As AI systems advance, interpretability becomes necessary for addressing transparency, bias, risk, and regulatory compliance. This workshop teaches core interpretability techniques, including SHAP (game-theoretic feature attribution), Gini importance (decision-tree impurity analysis), LIME (local surrogate models), and permutation importance (feature shuffling), which together provide global and local explanations for model decisions. Through hands-on building of interpretability tools and visualizations, we explore how these methods support bias detection and clinical trust in healthcare diagnostics, and how they help develop more effective strategies in finance. These techniques are essential for building interpretable AI that addresses the challenges of black-box models.
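Of the techniques named in this abstract, permutation importance is the easiest to sketch from scratch: shuffle one feature column at a time and measure how much the model's score drops. The following is a minimal standard-library illustration, not the workshop's material. The toy dataset, `toy_model`, and the helper names are all hypothetical and chosen only for this example.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: the label depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """A stand-in "trained" model that uses feature 0 and ignores feature 1."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because `toy_model` never reads feature 1, shuffling that column cannot change its predictions, so its importance is exactly zero, while shuffling feature 0 costs a large share of the accuracy. With a real model, scoring on held-out data rather than training data gives a more honest estimate.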

This tutorial will explore GPU-accelerated clustering techniques using RAPIDS cuML, optimizing algorithms like K-Means, DBSCAN, and HDBSCAN for large datasets. Traditional clustering methods struggle with scalability, but GPU acceleration significantly enhances performance and efficiency.

Participants will learn to leverage dimensionality reduction techniques (PCA, t-SNE, UMAP) for better data visualization and apply hyperparameter tuning with Optuna and cuML. The session also includes real-world applications like topic modeling in NLP and customer segmentation. By the end, attendees will be equipped to implement, optimize, and scale clustering algorithms effectively, unlocking faster and more powerful insights in machine learning workflows.
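For readers unfamiliar with what K-Means actually computes, here is a rough CPU-only sketch of Lloyd's algorithm in plain Python. It does not use RAPIDS cuML or a GPU, and it is not the tutorial's code. The farthest-point initialization, the helper names, and the two synthetic blobs are assumptions made purely for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a simple deterministic farthest-point init."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two well-separated 2-D blobs of 50 points each.
rng = random.Random(0)
blob_a = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
blob_b = [(rng.uniform(9, 11), rng.uniform(9, 11)) for _ in range(50)]
points = blob_a + blob_b

labels, centroids = kmeans(points, k=2)
print(centroids)
```

Each iteration compares every point with every centroid, which is the part that becomes expensive on large datasets and that a GPU implementation can parallelize.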

Summary

In this episode of the Data Engineering Podcast, Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the critical role of data at Two Sigma, balancing data quality with delivery speed, and the socio-technical challenges of building a foundational data platform that supports research and operational needs while maintaining regulatory compliance and data quality. Effie also shares insights into treating data as code, leveraging modern data warehouses, and the evolving role of data engineers in a rapidly changing technological landscape.

Announcements
• Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
• This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.
• Your host is Tobias Macey, and today I'm interviewing Effie Baram about data engineering in the finance sector.

Interview
• Introduction
• How did you get involved in the area of data management?
• Can you start by outlining the role of data in the context of Two Sigma?
• What are some of the key characteristics of the types of data sources that you work with?
• Your role is leading "foundational data engineering" at Two Sigma. Can you unpack that title and how it shapes the ways that you think about what you build?
• How does the concept of "foundational data" influence the ways that the business thinks about the organizational patterns around data?
• Given the regulatory environment around finance, how does that impact the ways that you think about the "what" and "how" of the data that you deliver to data consumers?
• Being the foundational team for data use at Two Sigma, how have you approached the design and architecture of your technical systems?
• How do you think about the boundaries between your responsibilities and the rest of the organization?
• What are the design patterns that you have found most helpful in empowering data consumers to build on top of your work?
• What are some of the elements of sociotechnical friction that have been most challenging to address?
• What are the most interesting, innovative, or unexpected ways that you have seen the ideas around "foundational data" applied in your organization?
• What are the most interesting, unexpected, or challenging lessons that you have learned while working with financial data?
• When is a foundational data team the wrong approach?
• What do you have planned for the future of your platform design?

Contact Info
• LinkedIn

Parting Question
• From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
• Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
• Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
• If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
• 2Sigma
• Reliability Engineering
• SLA == Service-Level Agreement
• Airflow
• Parquet File Format
• BigQuery
• Snowflake
• dbt
• Gemini Assist
• MCP == Model Context Protocol
• dtrace

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

What is it like to keep the data engine of one of Brazil's largest retail groups running nonstop, while experimenting with Generative AI technologies that few companies in the world have dared to put into production? In this special episode, we invited Lucas Eduardo Wichinevsky, Rodrigo Lucchesi, and Marcelle Araujo Chiriboga Carvalho of Grupo Boticário to open the black box of Machine Learning Engineering. As a reminder, you can find all of the Data Hackers community podcasts on Spotify, iTunes, Google Podcast, Castbox, and many other platforms.

Guests in this episode:
• Marcelle Chiriboga, Data Science Manager for Stores and Franchises at Grupo Boticário
• Lucas Eduardo Wichinevsky, Data Science Manager for Tech Corporate at Grupo Boticário
• Rodrigo Lucchesi, Data Science Manager for Demand and RGM at Grupo Boticário

Our hosts from Data Hackers:
• Monique Femme, Head of Community Management at Data Hackers
• Paulo Vasconcellos, Co-founder of Data Hackers and Principal Data Scientist at Hotmart

In this episode of Data Unchained, we are joined by RJ Kedziora, CEO and co-founder of Estenda Solutions, for a deep dive into the massive and urgent data challenges facing healthcare. We explore the decades-long lag in healthcare data integration, how AI is quietly transforming clinical workflows, and the critical difference between data overload and actionable insight. From diabetic retinopathy surveillance to AI-powered ambient listening and personalized health tracking, this conversation breaks down how innovation is finally closing the gap between medical technology and real-world care. Learn how standards like FHIR are reshaping access, what’s next in regulatory shifts, and why patient data ownership is the next frontier. If you're in digital health, biotech, or AI, this is the episode to listen to.


Welcome to the cozy corner of the tech world! Datatopics is your go-to spot for relaxed discussions around tech, news, data, and society. This week, co-host Ben is joined by Jackie Janssen, former Chief Data Officer at CM, author of AI: De Hype Voorbij, and an evangelist for pragmatic, human-centered AI. Together, they trace the winding path from early tech roles to enterprise transformation, exploring how AI can actually serve humans (and not just the hype machine).

In this episode:
• Leadership in AI transformation: From KBC to CM, lessons on creating cultural buy-in.
• Building effective data teams: Why the first hire isn’t always a data engineer.
• AI governance: What makes a strong AI Council and why CEOs should care.
• Product and process thinking: How MLOps, data factories, and product mindsets intersect.
• Agents and autonomy: The future of work with AI teammates, not just tools.
• The human edge in a machine world: A preview of Jackie’s next book on rediscovering humanity in the age of AI.

Curious about Jackie’s take on AI agents, cultural inertia, or what really makes a great data strategy tick? Tune in, you might just find a new way to think about your tech stack and your team.

Welcome to DataFramed Industry Roundups! In this series of episodes, we sit down to discuss the latest and greatest in data & AI. In this episode, with special guest DataCamp COO Martijn, we touch upon the hype and reality of AI agents in business, the McKinsey vs. Ethan Mollick debate on simple vs. complex agents, Meta's $15B stake in Scale AI and what it means for data and talent, Apple’s rumored $20B bid for Perplexity amid AI struggles, the EU’s push to treat AI skills like reading and math, the first fully AI-generated NBA ad and what it means for creative industries, a new benchmark for deep research tools, and much more.

Links Mentioned in the Show:
• Meta bought Scale AI
• Apple rumoured to be trying to acquire Perplexity for $20Bn
• McKinsey's Seizing the Agentic AI Advantage report
• The first fully AI-generated NBA Ad
• EU Generative AI Outlook report
• Mary Meeker's Trend in AI report
• Deep research benchmark
• Rewatch RADAR AI

New to DataCamp?
• Learn on the go using the DataCamp mobile app
• Empower your business with world-class data and AI skills with DataCamp for business

Real-Time vs Historical Insights: What’s Best for Your Business? | Data & AI NXT 2025

Daniel Esteban Vesga and Oscar Narvaez debate the trade-offs between real-time data and deep historical analysis, and how to balance speed with long-term context in decision-making.

What’s Data NXT? It’s Globant’s global event on Data & AI, where tech leaders explore the power of intelligent agents shaping the future of business.

👉 Don’t forget to subscribe to Globant’s channel and hit the 🔔 to stay updated on all our events!

#NXTConference #GlobantNXTConference #DataNXT #💚

The Future of Business Intelligence: AI in Action | Data & AI NXT 2025

Nacho Vuotto, Esteban Bertuccio, Carlos Alarcón, and Sergio Soliz explain how BI is shifting from static dashboards to autonomous, insight-generating platforms using AI agents.

Unlock the Power of Synthetic Data & Digital Twins | Data & AI NXT 2025

Carla Molgora, Ana Lía Villarreal and Cristina Granda discuss how synthetic data and digital twins are accelerating product development, simulations and risk analysis in AI-powered environments.

Master Governance for AI Agents | Data & AI NXT 2025

Roberto Contreras explores how to build governance frameworks for agentic systems, ensuring transparency, accountability and ethical behavior in AI-driven organizations.

Explore the Ethics of Autonomous AI | Data & AI NXT 2025

Avijeet Dutta, Dr. Shivani Rai Gupta, Jyothish Jayaraman and Andres Tenorio dive into the ethical risks of autonomous AI, tackling accountability, bias and the human-in-the-loop challenge.

Unlock the Secrets of Data Success with Sportian’s Playbook | Data & AI NXT 2025

In this session from Data & AI NXT 2025, Leandro Mora (Chief Data Officer, Sportian) unveils how Sportian is redefining performance, operations, and fan engagement through a cutting‑edge data mesh strategy powered by AI.

Supported by Our Partners
• WorkOS: The modern identity platform for B2B SaaS.
• Statsig: The unified platform for flags, analytics, experiments, and more.
• Sonar: Code quality and code security for ALL code.

What happens when a company goes all in on AI? At Shopify, engineers are expected to utilize AI tools, and they’ve been doing so for longer than most. Thanks to early access to models from GitHub Copilot, OpenAI, and Anthropic, the company has had a head start in figuring out what works. In this live episode from LDX3 in London, I spoke with Farhan Thawar, VP of Engineering, about how Shopify is building with AI across the entire stack. We cover the company’s internal LLM proxy, its policy of unlimited token usage, and how interns help push the boundaries of what’s possible.

In this episode, we cover:
• How Shopify works closely with AI labs
• The story behind Shopify’s recent Code Red
• How non-engineering teams are using Cursor for vibecoding
• Tobi Lütke’s viral memo and Shopify’s expectations around AI
• A look inside Shopify’s LLM proxy, used for privacy, token tracking, and more
• Why Shopify places no limit on AI token spending
• Why AI-first isn’t about reducing headcount, and why Shopify is hiring 1,000 interns
• How Shopify’s engineering department operates and what’s changed since adopting AI tooling
• Farhan’s advice for integrating AI into your workflow
• And much more!

Timestamps
(00:00) Intro
(02:07) Shopify’s philosophy: “hire smart people and pair with them on problems”
(06:22) How Shopify works with top AI labs
(08:50) The recent Code Red at Shopify
(10:47) How Shopify became early users of GitHub Copilot and their pivot to trying multiple tools
(12:49) The surprising ways non-engineering teams at Shopify are using Cursor
(14:53) Why you have to understand code to submit a PR at Shopify
(16:42) AI tools' impact on SaaS
(19:50) Tobi Lütke’s AI memo
(21:46) Shopify’s LLM proxy and how they protect their privacy
(23:00) How Shopify utilizes MCPs
(26:59) Why AI tools aren’t the place to pinch pennies
(30:02) Farhan’s projects and favorite AI tools
(32:50) Why AI-first isn’t about freezing headcount and the value of hiring interns
(36:20) How Shopify’s engineering department operates, including internal tools
(40:31) Why Shopify added coding interviews for director-level and above hires
(43:40) What has changed since Shopify added AI tooling
(44:40) Farhan’s advice for implementing AI tools

The Pragmatic Engineer deepdives relevant for this episode:
• How Shopify built its Live Globe for Black Friday
• Inside Shopify's leveling split
• Real-world engineering challenges: building Cursor
• How Anthropic built Artifacts

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
