This paper proposes a novel queue-based programming abstraction, Parallel Dispatch Queue (PDQ), that enables efficient parallel execution of fine-grain software communication protocols. Parallel systems often use fine-grain software handlers to integrate a network message into computation. Executing such handlers in parallel requires access synchronization around resources. Much as a monitor construct in a concurrent language protects accesses to a set of data structures, PDQ allows messages to include a synchronization key protecting handler accesses to a group of protocol resources. By simply synchronizing messages in a queue prior to dispatch, PDQ not only eliminates the overhead of acquiring/releasing synchronization primitives but also prevents busy-waiting within handlers. In this paper, we study PDQ's impact on software protocol performance in the context of fine- grain distributed shared memory (DSM) on an SMP cluster. Simulation results running shared-memory applications indicate that: (i) parallel software protocol execution using PDQ significantly improves performance in fine- grain DSM, (ii) tight integration of PDQ and embedded processors into a single custom device can offer performance competitive or better than an all- hardware DSM, and (iii) PDQ best benefits cost-effective systems that use idle SMP processors (rather than custom embedded processors) to execute protocols. On a cluster of 4 16-way SMPs, a PDQ-based parallel protocol running on idle SMP processors improves application performance by a factor of 2.6 over a system running a serial protocol on a single dedicated processor