Rationalization through Concepts
Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features, such as relevant text snippets, from which the model computes the outcome. However, a single overall selection does not provide a complete explanation, e.g., when the decision weighs several aspects. To this end, we present a novel self-interpretable model called ConRAT. Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT extracts a set of text snippets as concepts and infers which ones are described in the document. It then explains the outcome with a linear aggregation of concepts. Two regularizers drive ConRAT to build interpretable concepts. In addition, we propose two techniques to further boost the rationale and predictive performance. Experiments on both single- and multi-aspect sentiment classification tasks show that ConRAT is the first to generate concepts that align with human rationalization while using only the overall label. Furthermore, it outperforms state-of-the-art methods trained on each aspect label independently.
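To picture the final prediction step described above (concepts combined by a linear aggregation), here is a minimal sketch in PyTorch. The class name, dimensions, and masking scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConceptAggregator(nn.Module):
    """Hypothetical sketch: combine per-concept scores into an overall
    prediction with a single linear layer, loosely following the abstract's
    description of ConRAT's linear aggregation of concepts."""

    def __init__(self, num_concepts: int, num_classes: int):
        super().__init__()
        self.aggregate = nn.Linear(num_concepts, num_classes)

    def forward(self, concept_scores: torch.Tensor,
                concept_mask: torch.Tensor) -> torch.Tensor:
        # concept_scores: (batch, num_concepts), e.g. a sentiment score for
        # each extracted concept snippet.
        # concept_mask:   (batch, num_concepts), 1 if the concept is inferred
        # to be described in the document, else 0.
        return self.aggregate(concept_scores * concept_mask)

# Toy usage: 3 documents, 5 concepts, binary sentiment.
model = ConceptAggregator(num_concepts=5, num_classes=2)
scores = torch.randn(3, 5)
mask = torch.randint(0, 2, (3, 5)).float()
logits = model(scores, mask)   # (3, 2) overall prediction
```

Because the aggregation is linear, each concept's contribution to the final logit can be read off directly from the layer weights, which is what makes this kind of explanation inspectable.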