Infoscience

conference paper

Self-Recognition in Language Models

Davidson, Tim Ruben; Surkov, Viacheslav; Veselovskyy, Veniamin; et al.
July 9, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

A rapidly growing number of applications rely on a small set of closed-source language models (LMs). This dependency might introduce novel security risks if LMs develop self-recognition capabilities. Inspired by human identity verification methods, we propose a novel approach for assessing self-recognition in LMs using model-generated "security questions". Our test can be externally administered to monitor frontier models as it does not require access to internal model parameters or output probabilities. We use our test to examine self-recognition in ten of the most capable open- and closed-source LMs currently publicly available. Our extensive experiments found no empirical evidence of general or consistent self-recognition in any examined LM. Instead, our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin. Moreover, we find indications that preferences about which models produce the best answers are consistent across LMs. We additionally uncover novel insights on position bias considerations for LMs in multiple-choice settings.
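The abstract describes a multiple-choice setup in which an LM is shown several candidate answers and asked to pick its own, with position bias as a confound. A minimal sketch of that kind of trial loop follows. It is not the authors' implementation: the `Judge` callable, `run_self_recognition_trials`, the toy judges, and the toy answers are all hypothetical names and data introduced only for illustration, and real models are replaced by simple stand-in functions.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a judge receives a question and a list of candidate
# answers, and returns the index of the answer it believes is its own.
Judge = Callable[[str, List[str]], int]


def run_self_recognition_trials(
    judge_name: str,
    judge: Judge,
    answers: Dict[str, Dict[str, str]],  # question -> {model name -> answer}
    n_shuffles: int = 4,
    seed: int = 0,
) -> Tuple[float, Counter]:
    """Return (fraction of trials in which the judge picked its own answer,
    counts of which option position was picked)."""
    rng = random.Random(seed)
    hits, trials = 0, 0
    picked_positions: Counter = Counter()
    for question, by_model in answers.items():
        for _ in range(n_shuffles):
            # Shuffle the option order on every trial so that a preference
            # for a fixed position cannot pass for self-recognition.
            options = list(by_model.items())
            rng.shuffle(options)
            choice = judge(question, [text for _, text in options])
            picked_positions[choice] += 1
            hits += int(options[choice][0] == judge_name)
            trials += 1
    return hits / trials, picked_positions


# Toy stand-ins for real models (assumptions for illustration only).
def longest_answer_judge(question: str, options: List[str]) -> int:
    """Picks the longest option, a crude proxy for the 'best-looking' answer."""
    return max(range(len(options)), key=lambda i: len(options[i]))


def first_position_judge(question: str, options: List[str]) -> int:
    """Always picks the first option, i.e. pure position bias."""
    return 0


toy_answers = {
    "Name a number you like.": {
        "model_a": "7",
        "model_b": "I would pick 13, for reasons of habit.",
        "model_c": "42.",
    },
    "Name a colour you like.": {
        "model_a": "Blue",
        "model_b": "Probably teal, somewhere between blue and green.",
        "model_c": "Red, I think.",
    },
}

if __name__ == "__main__":
    for name in ("model_a", "model_b"):
        acc, positions = run_self_recognition_trials(
            name, longest_answer_judge, toy_answers
        )
        print(name, acc, dict(positions))
```

In this toy setting the longest-answer judge selects the same model's answer no matter which model it is supposed to be, which loosely mirrors the abstract's observation that LMs tend to pick the "best" answer regardless of origin. Tallying picked positions separately gives a simple way to spot a judge that favours one slot.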

Name: 2407.06946v2.pdf
Type: Main Document
Version: Submitted version (Preprint)
Access type: openaccess
License Condition: N/A
Size: 1.03 MB
Format: Adobe PDF
Checksum (MD5): b134fe03592c9b158f1ada63e91c0657

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved