Quantifying Training Data Retention in Large Language Models: An Analysis of Pretraining Factors and Mitigation Strategies
Large language models (LLMs) have demonstrated remarkable capabilities but face a significant challenge: they inadvertently memorize portions of their training data, raising serious privacy and copyright concerns. Most existing mitigations are reactive, filtering model outputs or fine-tuning after problematic content has already been encoded into the model. This thesis takes a proactive approach, investigating methods to control memorization during the pretraining phase. The study leverages Goldfish Loss, a modification to the training objective designed to discourage verbatim memorization of long sequences such as copyrighted documents, and compares two experimental conditions: Dense Gutenberg (extreme data repetition) and Sparse Gutenberg (sparse text inclusion). The experiments uncover a critical phenomenon, dubbed the Offset Effect: minor shifts in a prompt's starting position can dramatically alter whether a model reproduces memorized text. The study also reveals a connection between memorization and text degradation: when models cannot retrieve memorized content, whether because of mitigation strategies or limited exposure, they tend to generate repetitive, lower-quality text, suggesting that retrieval failure drives degenerative output. Building on these quantitative findings, this thesis argues for adding an offset dimension to verbatim memorization evaluation frameworks, enabling more accurate assessment and proactive mitigation of memorization risks in large language models.
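To make the training-time intervention concrete, the sketch below shows one simple variant of a goldfish-style objective: a fraction of token positions is excluded from the next-token loss, so the model is never supervised on a complete verbatim sequence. The static 1-in-k mask, the value k=4, and the tensor shapes are illustrative assumptions, not the thesis's exact configuration; a hashed variant would derive the mask from the preceding context rather than the absolute position.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, labels: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Sketch of a goldfish-style loss: drop every k-th target token
    from supervision so no document is reinforced verbatim end to end."""
    # Standard causal shift: the logits at position t predict token t+1.
    shift_logits = logits[:, :-1, :].contiguous()   # (batch, seq-1, vocab)
    shift_labels = labels[:, 1:].contiguous()       # (batch, seq-1)

    # Static 1-in-k drop mask (assumed here for simplicity): every k-th
    # target position is excluded. A hashed variant would compute this
    # mask from a hash of the preceding tokens instead.
    seq_len = shift_labels.size(1)
    keep = (torch.arange(seq_len, device=labels.device) % k) != (k - 1)

    # Per-token cross-entropy, then zero out the dropped positions.
    per_token = F.cross_entropy(
        shift_logits.transpose(1, 2),  # (batch, vocab, seq-1)
        shift_labels,
        reduction="none",
    )
    masked = per_token * keep
    denom = keep.sum() * per_token.size(0)
    return masked.sum() / denom.clamp(min=1)
```

In pretraining, a term like this would stand in for the standard cross-entropy objective; the intuition is that with roughly 1/k of each passage never supervised, greedy regurgitation of the full passage is no longer the loss-minimizing behavior.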
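The Offset Effect likewise suggests a simple evaluation recipe: probe the same passage with prompts that start at several different token positions and compare the greedy continuations against the ground truth. The sketch below illustrates such an offset-aware extraction probe under assumed parameters; the function name, the prompt and continuation lengths, and the exact-match score are hypothetical stand-ins for the thesis's actual metric.

```python
import torch

def verbatim_at_offsets(model, passage_ids, offsets, prompt_len=32, gen_len=32):
    """Probe one tokenized passage at several starting offsets and return,
    per offset, the fraction of greedily generated tokens that exactly
    match the true continuation."""
    model.eval()
    scores = {}
    for off in offsets:
        prompt = passage_ids[off : off + prompt_len]
        target = passage_ids[off + prompt_len : off + prompt_len + gen_len]
        if len(prompt) < prompt_len or not target:
            continue  # passage too short at this offset
        input_ids = torch.tensor([prompt])
        with torch.no_grad():
            out = model.generate(input_ids,
                                 max_new_tokens=len(target),
                                 do_sample=False)  # greedy decoding
        gen = out[0, len(prompt):].tolist()
        scores[off] = sum(g == t for g, t in zip(gen, target)) / len(target)
    return scores

# Example usage with a Hugging Face model and a tokenized Gutenberg passage:
# scores = verbatim_at_offsets(model, tokenizer(passage)["input_ids"],
#                              offsets=range(0, 256, 16))
```

Sweeping the offset rather than probing a single canonical starting position is what exposes the effect: a passage that appears unmemorized at offset 0 may be reproduced near-perfectly a few tokens later.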