Historically, hardware acceleration technologies have either been application-specific, therefore lacking in flexibility, or fully programmable, thereby suffering from notable inefficiencies on an application-by-application basis. To address the growing need for domain-specific acceleration technologies, this paper describes a design methodology (i) to automatically generate a domain-specific coarse-grained array from a set of representative applications and (ii) to introduce limited forms of architectural generality to increase the likelihood that additional applications can be successfully mapped onto it. In particular, coarse-grained arrays generated using our approach are intended to be integrated into customizable processors that use application-specific instruction set extensions to accelerate performance and reduce energy; rather than implementing these extensions using application-specific integrated circuit (ASIC) logic, which lacks flexibility, they can be synthesized onto our reconfigurable array instead, allowing the processor to be used for a variety of applications in related domains. Results show that our array is around 2x slower and 15x larger than an ultimately efficient ASIC implementation, and thus far more efficient than field-programmable gate arrays (FPGAs), which are known to be 3-4x slower and 20-40x larger. Additionally, we estimate that our array is usually around 2x larger and 2x slower than an accelerator synthesized using traditional datapath merging, which has, if any, very limited flexibility beyond the design set of DFGs.