talk-data.com

Event

Data Skeptic

2014-05-23 – 2025-11-23 Podcasts

Activities tracked

394

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Sessions & talks

Showing 1–25 of 394 · Newest first


Designing Recommender Systems for Digital Humanities

2025-11-23 Listen
podcast_episode
Kyle Polich, Florian Atzenhofer-Baumgartner (Graz University of Technology)

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challenges of building a recommender system for cultural heritage materials, including dealing with sparse user-item interaction matrices, the cold start problem, and the need for multi-modal similarity approaches that can handle text, images, metadata, and historical context. The platform leverages various embedding techniques and gives users control over weighting different modalities—whether they're searching based on text similarity, visual imagery, or diplomatic features like issuers and receivers. A key insight from Florian's research is the importance of balancing serendipity with utility, collection representation to prevent bias, and system explainability while maintaining effectiveness. The discussion also touches on unique evaluation challenges in non-commercial recommendation contexts, including Florian's "research funnel" framework that considers discovery, interaction, integration, and impact stages. Looking ahead, Florian envisions recommendation systems becoming standard tools for exploration across digital archives and cultural heritage repositories throughout Europe, potentially transforming how researchers discover and engage with historical materials. 
The new version of Monasterium.net, set to launch with enhanced semantic search and recommendation features, represents an important step toward making cultural heritage more accessible and discoverable for everyone.  
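The user-controlled weighting of modalities mentioned above can be pictured as a weighted average of per-modality similarities. The following is a minimal sketch under that assumption, not Monasterium.net's actual implementation: the function names, the two toy "embeddings", and the modality keys `text` and `image` are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(doc_a, doc_b, weights):
    """Weighted average of per-modality cosine similarities.

    doc_a and doc_b map a modality name to an embedding vector;
    weights maps the same modality names to user-chosen weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one modality needs a positive weight")
    score = sum(w * cosine(doc_a[m], doc_b[m]) for m, w in weights.items())
    return score / total


# Hypothetical toy embeddings: identical text, unrelated imagery.
charter_a = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
charter_b = {"text": [1.0, 0.0], "image": [1.0, 0.0]}

text_only = combined_similarity(charter_a, charter_b, {"text": 1.0, "image": 0.0})
image_only = combined_similarity(charter_a, charter_b, {"text": 0.0, "image": 1.0})
balanced = combined_similarity(charter_a, charter_b, {"text": 1.0, "image": 1.0})
```

Shifting weight from `text` to `image` moves the score for the same pair of charters from a perfect match to no match, which is the kind of control the episode describes giving to users.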

Shilling Attacks on Recommender Systems

2025-11-05 Listen
podcast_episode

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone. The discussion delves deep into collaborative filtering, explaining both user-user and item-item approaches that create similarity matrices to predict user preferences. However, these systems face various shilling attacks of increasing sophistication: random attacks use minimal information with average ratings, while segmented attacks strategically target popular items (like Taylor Swift albums) to build credibility before promoting target items. Bandwagon attacks focus on highly popular items to connect with genuine users, and average attacks leverage item rating knowledge to appear authentic. User-user collaborative filtering proves particularly vulnerable, requiring as few as 500 fake profiles to impact recommendations, while item-item filtering demands significantly more resources. Aditya addresses detection through machine learning techniques that analyze behavioral patterns using methods like PCA to identify profiles with unusually high correlation and suspicious rating consistency. 
However, this remains an evolving challenge as attackers adapt strategies, now using large language models to generate more authentic-seeming fake reviews. His research with the MovieLens dataset tested detection algorithms against synthetic attacks, highlighting how these concerns extend to modern e-commerce systems. While companies rarely share attack and detection data publicly to avoid giving attackers advantages, academic research continues advancing both offensive and defensive strategies in recommender systems security.
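To make the mechanism concrete, here is a minimal sketch of user-user collaborative filtering and an average attack that pushes one item. It assumes a toy ratings table, plain cosine similarity over co-rated items, and a similarity-weighted mean as the predictor. None of this is taken from Aditya's research or from any production system, and all names and numbers are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = [item for item in u if item in v]
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """User-user CF: similarity-weighted mean of other users' ratings."""
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        sim = cosine(ratings[user], profile)
        num += sim * profile[item]
        den += abs(sim)
    return num / den if den else None


def average_attack(ratings, target, n_fake, max_rating=5.0):
    """Copy ratings and inject n_fake shill profiles.

    Each fake profile rates every other item at its observed mean
    (to look authentic) and gives the target item the maximum rating.
    """
    sums, counts = {}, {}
    for profile in ratings.values():
        for item, r in profile.items():
            sums[item] = sums.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    injected = dict(ratings)
    for k in range(n_fake):
        fake = {i: sums[i] / counts[i] for i in sums if i != target}
        fake[target] = max_rating
        injected[f"fake_{k}"] = fake
    return injected


# Hypothetical genuine users; alice has not rated "target" yet.
genuine = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "target": 1},
    "carol": {"A": 1, "B": 2, "C": 5, "target": 2},
}

clean = predict(genuine, "alice", "target")
attacked = predict(average_attack(genuine, "target", n_fake=3), "alice", "target")
print(f"before attack: {clean:.2f}, after attack: {attacked:.2f}")
```

In this toy setup three fake profiles are enough to lift alice's predicted rating for the target item well above what the genuine users imply. Real systems have far more users, so the number of profiles needed is correspondingly larger.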

The Small World Hypothesis

2025-04-21 Listen
podcast_episode

Kyle discusses the history and proof for the small world hypothesis.

Networks of the Mind

2025-02-18 Listen
podcast_episode
Kyle Polich, Yoed Kennet (Technion – Israel Institute of Technology)

A man goes into a bar… This is the beginning of a riddle that our guest, Yoed Kennet, an assistant professor at the Technion's Faculty of Data and Decision Sciences, uses to measure creativity in subjects. In our talk, Yoed speaks about how to combine cognitive science and network science to explore the complexities and decode the mysteries of the human mind. The listeners will learn how network science provides tools to map and analyze human memory, revealing how problem-solving and creativity emerge from changes in semantic memory structures. Key insights include the role of memory restructuring during moments of insight, the connection between semantic networks and creative thinking, and how understanding these processes can improve problem-solving and analogical reasoning. Real-life applications span enhancing creativity in the workplace, building tools to combat cognitive rigidity in aging, and improving learning strategies by fostering richer, more flexible mental networks.


Want to listen ad-free? Try our Graphs Course? Join Data Skeptic+ for $5 / month or $50 / year: https://plus.dataskeptic.com

Change Point Detection Algorithms

2021-11-08 Listen
podcast_episode
Kyle Polich, Gerrit van den Burg (The Alan Turing Institute)

Gerrit van den Burg, Postdoctoral Researcher at The Alan Turing Institute, joins us today to discuss his work "An Evaluation of Change Point Detection Algorithms."

Time Series for Good

2021-11-01 Listen
podcast_episode
Kyle Polich, Bahman Rostami-Tabar (Cardiff University)

Bahman Rostami-Tabar, Senior Lecturer in Management Science at Cardiff University, joins us today to talk about his work "Forecasting and its Beneficiaries."

Long Term Time Series Forecasting

2021-10-25 Listen
podcast_episode
Kyle Polich, Henning Lange (University of Washington), Alex Mallen (University of Washington)

Alex Mallen, Computer Science student at the University of Washington, and Henning Lange, a Postdoctoral Scholar in Applied Math at the University of Washington, join us today to share their work "Deep Probabilistic Koopman: Long-term Time-Series Forecasting Under Periodic Uncertainties."

Fast and Frugal Time Series Forecasting

2021-10-17 Listen
podcast_episode
Kyle Polich, Fotios Petropoulos (University of Bath)

Fotios Petropoulos, Professor of Management Science at the University of Bath in the U.K., joins us today to talk about his work "Fast and Frugal Time Series Forecasting."

Causal Inference in Educational Systems

2021-10-11 Listen
podcast_episode
Kyle Polich, Manie Tadayon (University of California, Los Angeles (UCLA))

Manie Tadayon, a PhD graduate from the ECE department at University of California, Los Angeles, joins us today to talk about his work "Comparative Analysis of the Hidden Markov Model and LSTM: A Simulative Approach."

Boosted Embeddings for Time Series

2021-10-04 Listen
podcast_episode
Kyle Polich, Sankeerth Rao Karingula (Palo Alto Networks)

Sankeerth Rao Karingula, ML Researcher at Palo Alto Networks, joins us today to talk about his work "Boosted Embeddings for Time Series Forecasting."

Works Mentioned: "Boosted Embeddings for Time Series Forecasting" by Sankeerth Rao Karingula, Nandini Ramanan, Rasool Tahmasbi, Mehrnaz Amjadi, Deokwoo Jung, Ricky Si, Charanraj Thimmisetty, Luisa Polania Cabrera, Marjorie Sayer, Claudionor Nunes Coelho Jr.
https://www.linkedin.com/in/sankeerthrao/
https://twitter.com/sankeerthrao3
https://lod2021.icas.cc/

Change Point Detection in Continuous Integration Systems

2021-09-27 Listen
podcast_episode
David Daly (MongoDB), Kyle Polich

David Daly, Performance Engineer at MongoDB, joins us today to discuss "The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System."

Works Mentioned: "The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System" by David Daly, William Brown, Henrik Ingo, Jim O'Leary, David Bradford.

Social Media: David's website, David's Twitter, MongoDB.

Applying k-Nearest Neighbors to Time Series

2021-09-20 Listen
podcast_episode
Kyle Polich, Samya Tajmouati (University of Science of Kenitra, Morocco)

Samya Tajmouati, a PhD student in Data Science at the University of Science of Kenitra, Morocco, joins us today to discuss her work Applying K-Nearest Neighbors to Time Series Forecasting: Two New Approaches.

Ultra Long Time Series

2021-09-13 Listen
podcast_episode
Kyle Polich, Dr. Feng Li (Central University of Finance and Economics)

Dr. Feng Li (@f3ngli) is an Associate Professor of Statistics in the School of Statistics and Mathematics at Central University of Finance and Economics in Beijing, China. He joins us today to discuss his work "Distributed ARIMA Models for Ultra-long Time Series."

MiniRocket

2021-09-06 Listen
podcast_episode
Kyle Polich, Angus Dempster (Monash University)

Angus Dempster, PhD student at Monash University in Australia, comes on today to talk about "MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification." MINIROCKET reformulates ROCKET, gaining a 75x speedup on larger datasets with essentially the same accuracy. In this episode, we talk about the insights that enabled this speedup as well as use cases.

ARiMA is not Sufficient

2021-08-30 Listen
podcast_episode
Chongshou Li (Southwest Jiaotong University), Kyle Polich

Chongshou Li, Associate Professor at Southwest Jiaotong University in China, joins us today to talk about his work Why are the ARIMA and SARIMA not Sufficient.

Comp Engine

2021-08-23 Listen
podcast_episode
Kyle Polich, Ben Fulcher (University of Sydney, School of Physics)

Ben Fulcher, Senior Lecturer at the School of Physics at the University of Sydney in Australia, comes on today to talk about his project Comp Engine. Follow Ben on Twitter: @bendfulcher. For posts about time series analysis: @comptimeseries. Website: comp-engine.org

Detecting Ransomware

2021-08-16 Listen
podcast_episode
Kyle Polich, Nitin Pundir (University of Florida; Florida Institute for Cybersecurity Research)

Nitin Pundir, a PhD candidate at the University of Florida who works at the Florida Institute for Cybersecurity Research, comes on today to talk about his work "RanStop: A Hardware-assisted Runtime Crypto-Ransomware Detection Technique." FICS Research Lab: https://fics.institute.ufl.edu/ LinkedIn: https://www.linkedin.com/in/nitin-pundir470/

GANs in Finance

2021-08-09 Listen
podcast_episode
Kyle Polich, Florian Eckerli (Zurich University of Applied Sciences)

Florian Eckerli, a recent graduate of Zurich University of Applied Sciences, comes on the show today to discuss his work Generative Adversarial Networks in Finance: An Overview.

Predicting Urban Land Use

2021-08-02 Listen
podcast_episode
Kyle Polich, Daniel Omeiza (University of Oxford)

Today on the show we have Daniel Omeiza, a doctoral student in the computer science department of the University of Oxford, who joins us to talk about his work Efficient Machine Learning for Large-Scale Urban Land-Use Forecasting in Sub-Saharan Africa.

Opportunities for Skillful Weather Prediction

2021-07-26 Listen
podcast_episode
Kyle Polich, Elizabeth Barnes (Colorado State University)

Today on the show we have Elizabeth Barnes, Associate Professor in the department of Atmospheric Science at Colorado State University, who joins us to talk about her work "Identifying Opportunities for Skillful Weather Prediction with Interpretable Neural Networks." Find more from the Barnes Research Group on their site. Weather is notoriously difficult to predict. Complex systems are demanding of computational power. Further, the chaotic nature of, well, nature, makes accurate forecasting especially difficult the longer into the future one wants to look. Yet all is not lost! In this interview, we explore the use of machine learning to help identify certain conditions under which the weather system has entered an unusually predictable position in its normally chaotic state space.

Predicting Stock Prices

2021-07-19 Listen
podcast_episode
Kyle Polich, Andrea Fronzetti Colladon (University of Perugia)

Today on the show, Andrea Fronzetti Colladon (@iandreafc), currently working at the University of Perugia and inventor of the Semantic Brand Score, joins us to talk about his work studying human communication and social interaction. We discuss the paper "Look inside. Predicting Stock Prices by Analyzing an Enterprise Intranet Social Network and Using Word Co-Occurrence Networks."

N-Beats

2021-07-12 Listen
podcast_episode
Kyle Polich, Boris Oreshkin (Unity Technologies)

Today on the show we have Boris Oreshkin (@boreshkin), a Senior Research Scientist at Unity Technologies, who joins us to talk about his work "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting."

Works Mentioned: "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting" by Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio. https://arxiv.org/abs/1905.10437

Social Media: LinkedIn, Twitter.

Translation Automation

2021-07-06 Listen
podcast_episode

Today we are back with another episode discussing AI in the workplace. AI has facilitated, is facilitating, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of a role, scaling the person's effectiveness. Carl Stimson, a freelance Japanese-to-English translator, comes on the show to talk about his work in translation and his perspective on how AI will change translation in the future.

Time Series at the Beach

2021-06-28 Listen
podcast_episode
Kyle Polich, Shane Ross (Virginia Tech University)

Shane Ross, Professor of Aerospace and Ocean Engineering at Virginia Tech University, comes on today to talk about his work "Beach-level 24-hour forecasts of Florida red tide-induced respiratory irritation."

Automatic Identification of Outlier Galaxy Images

2021-06-21 Listen
podcast_episode
Kyle Polich, Lior Shamir (Kansas University)

Lior Shamir, Associate Professor of Computer Science at Kansas University, joins us today to talk about the recent paper "Automatic Identification of Outliers in Hubble Space Telescope Galaxy Images." Follow Lior on Twitter: @shamir_lior