talk-data.com

Topic: ai safety (2 tagged)

Activity Trend: 2020-Q1 to 2026-Q1 (peak: 1 activity/quarter)

Activities

2 activities · Newest first

AI safety discourse often splits into immediate-harm and catastrophic-risk framings. In this keynote, I argue that the two research streams would benefit from increased cross-talk and more synergistic projects. Framing attention and resources as zero-sum between the two communities is incorrect and serves neither side's goals. Recent theoretical work, including work on accumulative existential risk, unifies risk pathways between the two fields. Building on this, I suggest concrete synergies that are already in place, as well as opportunities for future collaboration.

I will discuss how shared research and monitoring infrastructure, such as the UK AI Safety Institute's Inspect evaluation framework, can benefit both areas; how methodological approaches from human behavioral science, currently used in immediate-harms research, can be ported into an AI behavioral science applied to existential-risk research; and how technical solutions from catastrophic-risk research can be applied to mitigate immediate societal harms. We share the goal of building a better, safer future for everyone. Let's work together!

Accelerate your professional development with hands-on training, talks, workshops, networking events, 10+ tracks, and more at the ODSC West AI Training Conference (San Francisco and virtual). More here - https://odsc.ai/

Abstract: We will navigate the alignment challenges and safety considerations of LLMs, addressing both their limitations and capabilities, with a particular focus on techniques related to instruction prefix tuning and their theoretical limitations with respect to alignment. Additionally, I will discuss fairness across languages in common tokenizers used in LLMs. Finally, I will address safety considerations for agentic systems, illustrating how seemingly minor changes, such as altering the desktop background, can be exploited to trigger a chain of sequenced harmful actions. I will also explore the transferability of these vulnerabilities across different agents.
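The cross-language tokenizer-fairness point can be made concrete with a stdlib-only sketch (the sample sentences and the byte-level framing are illustrative assumptions, not taken from the talk): byte-level BPE tokenizers of the kind commonly used in LLMs start from UTF-8 bytes, and non-Latin scripts cost more bytes per character, so comparable sentences in different languages begin tokenization from very different sequence lengths.

```python
# Illustrative sentences (assumed, not from the talk): roughly the same
# greeting in English and in Hindi (Devanagari script).
samples = {
    "English": "Hello, how are you?",
    "Hindi": "नमस्ते, आप कैसे हैं?",
}

def byte_cost(text: str) -> int:
    # Byte-level BPE starts from the UTF-8 byte sequence, so byte length
    # is the pre-merge sequence length the tokenizer must compress.
    return len(text.encode("utf-8"))

for lang, text in samples.items():
    ratio = byte_cost(text) / len(text)
    print(f"{lang}: {len(text)} chars, {byte_cost(text)} bytes "
          f"({ratio:.2f} bytes/char)")
```

ASCII characters take one byte each, while Devanagari characters take three, so before any merges the Hindi sentence already carries a much longer byte sequence than the English one; how well the learned merges close that gap depends on the tokenizer's training data, which is the fairness question at issue.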