talk-data.com

Event

Data Skeptic

2014-05-23 – 2025-11-23 Podcasts

Activities tracked

81

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Filtering by: AI/ML

Sessions & talks

Showing 1–25 of 81 · Newest first


Shilling Attacks on Recommender Systems

2025-11-05 Listen
podcast_episode

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone.

The discussion delves deep into collaborative filtering, explaining both user-user and item-item approaches that create similarity matrices to predict user preferences. However, these systems face various shilling attacks of increasing sophistication: random attacks use minimal information with average ratings, while segmented attacks strategically target popular items (like Taylor Swift albums) to build credibility before promoting target items. Bandwagon attacks focus on highly popular items to connect with genuine users, and average attacks leverage item rating knowledge to appear authentic. User-user collaborative filtering proves particularly vulnerable, requiring as few as 500 fake profiles to impact recommendations, while item-item filtering demands significantly more resources.

Aditya addresses detection through machine learning techniques that analyze behavioral patterns using methods like PCA to identify profiles with unusually high correlation and suspicious rating consistency.
However, this remains an evolving challenge as attackers adapt strategies, now using large language models to generate more authentic-seeming fake reviews. His research with the MovieLens dataset tested detection algorithms against synthetic attacks, highlighting how these concerns extend to modern e-commerce systems. While companies rarely share attack and detection data publicly to avoid giving attackers advantages, academic research continues advancing both offensive and defensive strategies in recommender systems security.
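The collaborative filtering mechanics described above can be sketched concretely. Below is a minimal, hypothetical item-item example (illustrative only, not code from Aditya's research): cosine similarity between item rating columns builds the similarity matrix, and a similarity-weighted average of a user's other ratings predicts an unseen rating. It also shows why the attack works: fake profiles that co-rate a popular item and a target item inflate exactly these similarity entries.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, cols = items); 0 means unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

n_items = R.shape[1]
# Item-item similarity matrix: each entry compares two items' rating columns.
S = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)]
              for i in range(n_items)])

def predict(user, item):
    """Predict a rating as a similarity-weighted average of the user's other ratings."""
    rated = [j for j in range(n_items) if R[user, j] > 0 and j != item]
    weights = np.array([S[item, j] for j in rated])
    ratings = np.array([R[user, j] for j in rated])
    return weights @ ratings / weights.sum()

print(round(predict(0, 2), 2))  # → 2.09, low: user 0 disliked the similar item 3
```

A shilling attacker targets `S`: enough fake profiles rating both a bandwagon item and the target item drag the target's similarity toward items genuine users already like.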

Boosted Embeddings for Time Series

2021-10-04 Listen
podcast_episode
Kyle Polich , Sankeerth Rao Karingula (Palo Alto Networks)

Sankeerth Rao Karingula, ML Researcher at Palo Alto Networks, joins us today to talk about his work "Boosted Embeddings for Time Series Forecasting."

Works Mentioned Boosted Embeddings for Time Series Forecasting by Sankeerth Rao Karingula, Nandini Ramanan, Rasool Tahmasbi, Mehrnaz Amjadi, Deokwoo Jung, Ricky Si, Charanraj Thimmisetty, Luisa Polania Cabrera, Marjorie Sayer, Claudionor Nunes Coelho Jr https://www.linkedin.com/in/sankeerthrao/ https://twitter.com/sankeerthrao3  https://lod2021.icas.cc/ 

Predicting Urban Land Use

2021-08-02 Listen
podcast_episode
Kyle Polich , Daniel Omeiza (University of Oxford)

Today on the show we have Daniel Omeiza, a doctoral student in the computer science department of the University of Oxford, who joins us to talk about his work Efficient Machine Learning for Large-Scale Urban Land-Use Forecasting in Sub-Saharan Africa.

Opportunities for Skillful Weather Prediction

2021-07-26 Listen
podcast_episode
Kyle Polich , Elizabeth Barnes (Colorado State University)

Today on the show we have Elizabeth Barnes, Associate Professor in the Department of Atmospheric Science at Colorado State University, who joins us to talk about her work Identifying Opportunities for Skillful Weather Prediction with Interpretable Neural Networks. Find more from the Barnes Research Group on their site. Weather is notoriously difficult to predict. Modeling such complex systems demands substantial computational power, and the chaotic nature of, well, nature makes accurate forecasting especially difficult the longer into the future one wants to look. Yet all is not lost! In this interview, we explore the use of machine learning to help identify conditions under which the weather system has entered an unusually predictable position in its normally chaotic state space.

Translation Automation

2021-07-06 Listen
podcast_episode

Today we are back with another episode discussing AI in the workplace. AI has, is, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role; other times it may automate a particular part of a role, scaling a worker's effectiveness. Carl Stimson, a freelance Japanese-to-English translator, comes on the show to talk about his work in translation and his perspective on how AI will change translation in the future.

Detecting Drift

2021-06-11 Listen
podcast_episode
Kyle Polich , Sam Ackerman (IBM Research Labs)

Sam Ackerman, Research Data Scientist at IBM Research Labs in Haifa, Israel, joins us today to talk about his work Detection of Data Drift and Outliers Affecting Machine Learning Model Performance Over Time. Check out Sam's IBM statistics/ML blog at: http://www.research.ibm.com/haifa/dept/vst/ML-QA.shtml  
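Sam's specific techniques aren't detailed in the show notes, but the general idea of drift detection can be illustrated. The following is a generic sketch (an assumption for illustration, not IBM's method): flag drift when a hand-rolled two-sample Kolmogorov-Smirnov statistic between a training-time reference window and a live window exceeds a threshold.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between the empirical CDFs of a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    # Empirical CDF of each sample, evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=2000)  # feature values at training time
stable    = rng.normal(0.0, 1.0, size=2000)  # live data, same distribution
drifted   = rng.normal(0.8, 1.0, size=2000)  # live data, shifted mean

THRESHOLD = 0.1  # illustrative cutoff; in practice derive it from a p-value
print(ks_statistic(reference, stable) > THRESHOLD)   # → False
print(ks_statistic(reference, drifted) > THRESHOLD)  # → True
```

The same per-feature comparison can run on a schedule against rolling windows, with alerts feeding a retraining decision.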

They're Coming for Our Jobs

2021-05-03 Listen
podcast_episode

AI has, is, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of their role, scaling their effectiveness. Unless progress in AI inexplicably halts, the tasks done by humans vs. machines will continue to evolve. Today's episode is a speculative conversation about what the future may hold. Co-Host of Squaring the Strange Podcast, Caricature Artist, and an Academic Editor, Celestia Ward joins us today! Kyle and Celestia discuss whether or not her jobs as a caricature artist or as an academic editor are under threat from AI automation. Mentions https://squaringthestrange.wordpress.com/ https://twitter.com/celestiaward The legendary Dr. Jorge Pérez and his work studying unicorns Supernormal stimulus International Society of Caricature Artists Two Heads Studios

Pandemic Machine Learning Pitfalls

2021-04-26 Listen
podcast_episode
Kyle Polich , Derek Driggs (University of Cambridge)

Today on the show we have Derek Driggs, a PhD student at the University of Cambridge, who comes on to discuss the work Common Pitfalls and Recommendations for Using Machine Learning to Detect and Prognosticate for COVID-19 Using Chest Radiographs and CT Scans. Help us vote for the next theme of Data Skeptic! Vote here: https://dataskeptic.com/vote

Flesch Kincaid Readability Tests

2021-04-19 Listen
podcast_episode

Given a document in English, how can you estimate the ease with which someone will find they can read it? Does it require a college level of reading comprehension, or is it something a much younger student could read and understand? While these questions are useful to ask, they don't admit a simple answer. One option is to use one of the two (essentially identical) Flesch Kincaid Readability Tests. These are simple calculations which provide a rough estimate of reading ease. In this episode, Kyle shares his thoughts on this tool and when it could be appropriate to use as part of your feature engineering pipeline towards a machine learning objective. For empirical validation of these metrics, Kyle's analysis compares English-language Wikipedia pages with "Simple English" Wikipedia pages, yielding an intuitively pleasing histogram that summarizes the distribution of Flesch reading ease scores for 1000 pages examined from both Wikipedias.
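For reference, both tests are simple linear formulas over word, sentence, and syllable counts. The sketch below implements the standard published coefficients; reliably counting syllables is the fiddly part and is assumed to happen upstream.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease: higher scores mean easier text (90+ is grade-school level)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: 100 words across 4 sentences with 150 syllables.
print(round(flesch_reading_ease(100, 4, 150), 2))   # → 54.56
print(round(flesch_kincaid_grade(100, 4, 150), 2))  # → 11.86
```

Either score can then be appended as a numeric feature in a machine learning pipeline, as Kyle describes.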

Earthquake Detection with Crowd-sourced Data

2020-12-25 Listen
podcast_episode
Kyle Polich , Suzan van der Lee (Northwestern University) , Omkar Ranadive (Northwestern University)

Have you ever wanted to hear what an earthquake sounds like? Today on the show we have Omkar Ranadive, a Computer Science master's student at Northwestern University, who collaborates with Suzan van der Lee, an Earth and Planetary Sciences professor at Northwestern University, on the crowd-sourcing project Earthquake Detective. Email Links: Suzan: [email protected] Omkar: [email protected] Works Mentioned: Paper: Applying Machine Learning to Crowd-sourced Data from Earthquake Detective https://arxiv.org/abs/2011.04740 by Omkar Ranadive, Suzan van der Lee, Vivan Tang, and Kevin Chao Github: https://github.com/Omkar-Ranadive/Earthquake-Detective Earthquake Detective: https://www.zooniverse.org/projects/vivitang/earthquake-detective Thanks to our sponsors! Brilliant.org is an awesome platform with interesting courses, like Quantum Computing! There is something for you and surely something for the whole family! Get 20% off Brilliant Premium at http://brilliant.com/dataskeptic

Face Mask Sentiment Analysis

2020-11-27 Listen
podcast_episode
Kyle Polich , Jonathan Lai (University of Rochester) , Jiebo Luo (University of Rochester) , Neil Yeung (University of Rochester)

As the COVID-19 pandemic continues, the public (or at least those with Twitter accounts) are sharing their personal opinions about mask-wearing via Twitter. What does this data tell us about public opinion? How does it vary by demographic? What, if anything, can make people change their minds? Today we speak to Neil Yeung and Jonathan Lai, undergraduate students in the Department of Computer Science at the University of Rochester, and Professor of Computer Science Jiebo Luo, to discuss their recent paper, Face Off: Polarized Public Opinions on Personal Face Mask Usage during the COVID-19 Pandemic. Works Mentioned https://arxiv.org/abs/2011.00336 Emails: Neil Yeung [email protected] Jonathan Lai [email protected] Jiebo Luo [email protected] Thanks to our sponsors! Springboard School of Data offers a comprehensive career program encompassing data science, analytics, engineering, and machine learning. All courses are online and tailored to fit the lifestyle of working professionals. Up to 20 Data Skeptic listeners will receive $500 scholarships. Apply today at springboard.com/dataskeptic Check out Brilliant's group theory course to learn about object-oriented design! Brilliant is great for learning something new or to get an easy-to-look-at review of something you already know. Check them out at Brilliant.org/dataskeptic to get 20% off of a year of Brilliant Premium!

Sybil Attacks on Federated Learning

2020-11-13 Listen
podcast_episode
Kyle Polich , Clement Fung (Carnegie Mellon University)

Clement Fung, a Societal Computing PhD student at Carnegie Mellon University, discusses his research in security of machine learning systems and a defense against targeted sybil-based poisoning called FoolsGold. Works Mentioned: The Limitations of Federated Learning in Sybil Settings Twitter: @clemfung Website: https://clementfung.github.io/ Thanks to our sponsors: Brilliant - Online learning platform. Check out Geometry Fundamentals! Visit Brilliant.org/dataskeptic for 20% off Brilliant Premium!
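FoolsGold's core observation is that sybils, coordinating toward one poisoning objective, submit unusually similar updates. The sketch below is a simplified illustration of that idea (the actual defense operates on clients' update histories with additional re-scaling steps): down-weight any client whose gradient update is a near-duplicate of another's.

```python
import numpy as np

def sybil_weights(updates):
    """Down-weight clients whose updates look like near-duplicates of another's.

    updates: (n_clients, n_params) array of gradient updates.
    Returns per-client aggregation weights in [0, 1].
    Simplified sketch; FoolsGold proper uses update histories and re-scaling.
    """
    unit = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    sim = unit @ unit.T            # pairwise cosine similarity
    np.fill_diagonal(sim, -1.0)    # ignore self-similarity
    max_sim = sim.max(axis=1)      # each client's closest match
    return np.clip(1.0 - max_sim, 0.0, 1.0)

honest_a = np.array([1.0, 0.2, -0.9])
honest_b = np.array([-0.3, 1.1, -0.4])
sybil_1  = np.array([0.90, 0.90, 0.90])
sybil_2  = np.array([0.91, 0.89, 0.90])  # near-identical to sybil_1

w = sybil_weights(np.stack([honest_a, honest_b, sybil_1, sybil_2]))
print(w.round(2))  # sybils' weights collapse to ~0; honest clients stay near 0.8
```

Honest clients, working on genuinely different local data, rarely collide in update direction, so their contributions survive aggregation while the coordinated sybils cancel themselves out.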

BetterHelp - Convenient, professional, and affordable online counseling. Take 10% off your first month at betterhelp.com/dataskeptic

Black Boxes Are Not Required

2020-06-05 Listen
podcast_episode

Deep neural networks are undeniably effective. They rely on such a high number of parameters that they are appropriately described as "black boxes". While black boxes lack desirable properties like interpretability and explainability, in some cases their accuracy makes them incredibly useful. But does achieving "usefulness" require a black box? Can we be sure an equally valid but simpler solution does not exist? Cynthia Rudin helps us answer that question. We discuss her recent paper with co-author Joanna Radin titled (spoiler warning)… Why Are We Using Black Box Models in AI When We Don't Need To? A Lesson From An Explainable AI Competition

Interpretable AI in Healthcare

2020-05-15 Listen
podcast_episode

Jayaraman Thiagarajan joins us to discuss the recent paper Calibrating Healthcare AI: Towards Reliable and Interpretable Deep Predictive Models.

Self-Explaining AI

2020-05-02 Listen
podcast_episode

Dan Elton joins us to discuss self-explaining AI. What could be better than an interpretable model? How about a model which explains itself in a conversational way, engaging in a back and forth with the user? We discuss the paper Self-explaining AI as an alternative to interpretable AI, which presents a framework for self-explaining AI.

Visualization and Interpretability

2020-01-31 Listen
podcast_episode

Enrico Bertini joins us to discuss how data visualization can be used to help make machine learning more interpretable and explainable. Find out more about Enrico at http://enrico.bertini.io/. More from Enrico with co-host Moritz Stefaner on the Data Stories podcast!

Interpretability

2020-01-07 Listen
podcast_episode

Machine learning has shown a rapid expansion into every sector and industry. With increasing reliance on models and increasing stakes for the decisions of models, questions of how models actually work are becoming increasingly important to ask. Welcome to Data Skeptic Interpretability. In this episode, Kyle interviews Christoph Molnar about his book Interpretable Machine Learning. Thanks to our sponsor, the Gartner Data & Analytics Summit going on in Grapevine, TX on March 23 – 26, 2020. Use discount code: dataskeptic. Music Our new theme song is #5 by Big D and the Kids Table. Incidental music by Tanuki Suit Riot.

Jumpstart Your ML Project

2019-12-15 Listen
podcast_episode

Seth Juarez joins us to discuss the toolbox of options available to a data scientist to jumpstart or extend their machine learning efforts.

Serverless NLP Model Training

2019-12-10 Listen
podcast_episode

Alex Reeves joins us to discuss some of the challenges around building a serverless, scalable, generic machine learning pipeline. This is a technical deep dive on architecting solutions and a discussion of some of the design choices made.

Ancient Text Restoration

2019-12-01 Listen
podcast_episode

Thea Sommerschield joins us this week to discuss the development of Pythia - a machine learning model trained to assist in the reconstruction of ancient language text.

NLP for Developers

2019-11-20 Listen
podcast_episode
Kyle Polich , Lance Olson (Microsoft)

While at MS Build 2019, Kyle sat down with Lance Olson from the Applied AI team about how tools like cognitive services and cognitive search enable non-data scientists to access relatively advanced NLP tools out of box, and how more advanced data scientists can focus more time on the bigger picture problems.

Talking to GPT-2

2019-10-31 Listen
podcast_episode

GPT-2 is yet another in a succession of models like ELMo and BERT which adopt a similar deep learning architecture and train an unsupervised model on a massive text corpus. As we have been covering recently, these approaches are showing tremendous promise, but how close are they to an AGI? Our guest today, Vazgen Davidyants, wondered exactly that, and had conversations with a chatbot running GPT-2. We discuss his experiences as well as some novel thoughts on artificial intelligence.

Catastrophic Forgetting

2019-07-15 Listen
podcast_episode

Kyle and Linhda discuss some high-level theory of mind and overview the machine learning concept of catastrophic forgetting.
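As a concrete illustration of the concept (a toy example constructed for this summary, not from the episode): train a tiny logistic-regression model on task A, then continue training the very same weights on a conflicting task B, and performance on task A collapses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, lr=0.5, steps=200):
    """Plain gradient descent on logistic loss; returns updated weights."""
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
    return w

def accuracy(w, X, y):
    return ((sigmoid(X @ w) > 0.5) == y).mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
task_a = (X[:, 0] > 0).astype(float)  # task A: label is the sign of feature 0
task_b = (X[:, 0] < 0).astype(float)  # task B: the opposite labeling

w = np.zeros(2)
w = train(w, X, task_a)
acc_before = accuracy(w, X, task_a)   # high: the model has learned task A

w = train(w, X, task_b)               # keep training the SAME weights on task B
acc_after = accuracy(w, X, task_a)    # task A performance collapses

print(acc_before, acc_after)
```

The sequential updates for task B overwrite the very parameters that encoded task A, which is the essence of catastrophic forgetting; techniques like replay buffers or elastic weight consolidation exist to mitigate it.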

The Death of a Language

2019-06-01 Listen
podcast_episode

USC students from the CAIS++ student organization have created a variety of novel projects under the mission statement of "artificial intelligence for social good". In this episode, Kyle interviews Zane and Leena about the Endangered Languages Project.