A dataset of scientific citations in U.S. patent Office Actions
We present a curated dataset of about 850,000 citations extracted from Office Actions issued by examiners at the United States Patent and Trademark Office (USPTO). These references, historically underused because they were difficult to access, provide a granular view of the patent examination process and complement traditional front-page citation data. We classify each citation into one of 14 categories and focus on the 265,000 references to scientific literature, which we parse, clean, and disambiguate using machine learning and external bibliographic services. To enhance reusability, disambiguated records are linked to OpenAlex, a comprehensive research metadata platform. The dataset enables new research on examiner behavior, science-technology linkages, and the construction of citation-based metrics. All data and code are openly available to facilitate reuse across disciplines.
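The disambiguation step (matching a raw, free-text citation to a clean bibliographic record) can be illustrated with a deliberately simple stand-in. The sketch below is not the authors' machine-learning pipeline. It assumes a hypothetical dictionary of candidate records, for example titles returned by a bibliographic service, keyed by placeholder IDs such as `"W1"`. It scores each candidate by the share of its title tokens that appear in the citation string and keeps the best candidate only if that share clears an assumed threshold.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def best_match(raw_citation: str, candidates: dict, threshold: float = 0.8):
    """Return (candidate_id, score) for the best-matching title, or None.

    Score = fraction of the candidate title's tokens found in the citation.
    The candidate IDs and the threshold are illustrative assumptions.
    """
    citation_tokens = set(normalize(raw_citation).split())
    best_id, best_score = None, 0.0
    for cand_id, title in candidates.items():
        title_tokens = set(normalize(title).split())
        if not title_tokens:  # skip empty titles to avoid division by zero
            continue
        score = len(title_tokens & citation_tokens) / len(title_tokens)
        if score > best_score:
            best_id, best_score = cand_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


raw = "Smith J. et al., Deep learning for protein folding, Nature 2020, vol. 5, pp. 1-10"
candidates = {
    "W1": "Deep learning for protein folding",
    "W2": "Protein structure prediction with graphs",
}
print(best_match(raw, candidates))  # ('W1', 1.0)
```

Token containment is used here instead of whole-string similarity because a raw citation also carries authors, venue, and pages, which would dilute a direct comparison against a bare title.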
Main Document: s41597-026-06720-7_reference.pdf (accepted version, open access, CC BY; Adobe PDF, 1.2 MB; checksum 896b3d49d1cfce5313dfc6fa88b16041)