Human-in-the-feedback-loop: Epistemic harms of sycophantic AI
Conference presentation
Australasian Association of Philosophy (AAP) Conference, 2025 (Brisbane, Australia, 06-Jul-2025–10-Jul-2025)
2025
Abstract
Generative AI language models are increasingly positioned as epistemic tools, used to aid enquiry and, more generally, to help us find things out. However, they suffer from flaws that both limit their usefulness as epistemic tools and risk causing epistemic harm. While AI bias and hallucination have been discussed as epistemically harmful, sycophancy remains an underexplored trait. Sycophantic AI models produce outputs that match user beliefs rather than truthful ones. This draws parallels with other forms of algorithmic feedback loops and epistemic bubbles, which can limit users' ability to see beyond their own perspective and to acquire knowledge. Sycophancy in AI has been attributed to stages of the training process in which models learn from human feedback to reflect user preferences. This work further sketches a possible application of vice epistemology to language models. It does so not by ascribing agency to these models, but by asking whether aggregating human preferences in the training process can manifest a kind of collective epistemic vice. I will ask whether epistemically harmful character traits, arising from a collective training process, can meaningfully qualify as (non-agential) epistemic vices of AI.
Details
- Title
- Human-in-the-feedback-loop: Epistemic harms of sycophantic AI
- Authors
- Declan Humphreys - University of the Sunshine Coast, Queensland, School of Science, Technology and Engineering
- Conference details
- Australasian Association of Philosophy (AAP) Conference, 2025 (Brisbane, Australia, 06-Jul-2025–10-Jul-2025)
- Date published
- 2025
- Organisation Unit
- School of Science, Technology and Engineering
- Language
- English
- Record Identifier
- 991141839302621
- Output Type
- Conference presentation