Safety Devolution in AI Agents

Yu, Cheng; Stroebl, Benedikt; Yang, Diyi; Papakyriakopoulos, Orestis

dc.contributor.author	Yu, Cheng
dc.contributor.author	Stroebl, Benedikt
dc.contributor.author	Yang, Diyi
dc.contributor.author	Papakyriakopoulos, Orestis
dc.date.accessioned	2025-05-28T07:33:21Z
dc.date.available	2025-05-28T07:33:21Z
dc.date.issued	2025-05-20
dc.description	Through systematic experiments and analysis of safety and alignment in retrieval-augmented AI agents, this paper shows that access to external information has a structural effect on agent behavior.
dc.description.abstract	As retrieval-augmented AI agents become more embedded in society, their safety properties and ethical behavior remain insufficiently understood. In particular, the growing integration of LLMs and AI agents raises critical questions about how they engage with and are influenced by their environments. This study investigates how expanding retrieval access—from no external sources to Wikipedia-based retrieval and open web search—affects model reliability, bias propagation, and harmful content generation. Through extensive benchmarking of censored and uncensored LLMs and AI Agents, our findings reveal a consistent degradation in refusal rates, bias sensitivity, and harmfulness safeguards as models gain broader access to external sources, culminating in a phenomenon we term safety devolution. Notably, retrieval-augmented agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval. This effect persists even under strong retrieval accuracy and prompt-based mitigation, suggesting that the mere presence of retrieved content reshapes model behavior in structurally unsafe ways. These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-augmented and increasingly autonomous AI systems.
dc.identifier.citation	Cheng Yu, Benedikt Stroebl, Diyi Yang, Orestis Papakyriakopoulos. Safety Devolution in AI Agents. arXiv:2505.14215 [cs.CY], 2025. https://arxiv.org/abs/2505.14215
dc.identifier.other	2505.14215v1
dc.identifier.uri	http://data.inu.ac.kr/handle/123456789/943
dc.language.iso	en_US
dc.relation.ispartofseries	2505.14215v1
dc.subject	Retrieval-Augmented Generation
dc.subject	AI Agents
dc.subject	Safety Devolution
dc.subject	Refusal Rate
dc.title	Safety Devolution in AI Agents
dc.type	Preprint

Files

Original bundle

Name: Safety Devolution in AI Agents.pdf
Size: 592.8 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

AI Ethics & Social Impact