Software routers promise to enable the fast deployment of new, sophisticated kinds of packet processing without the need to buy and deploy expensive new equipment. The challenge is offering such programmability while at the same time achieving a competitive level of performance. Recent work has demonstrated that software routers are capable of high performance, but only for conventional, simple workloads (like packet forwarding and IP routing) and, even that, after careful manual calibration. In contrast, we are interested in achieving high performance in the context of a software router running multiple sophisticated packet-processing applications. In particular: first, we identify the main factors that affect packet-processing performance on a modern multicore general-purpose server---cache misses, cache contention, load-balancing across processing cores; then, we formulate an optimization problem that takes as input a particular server architecture and a packet processing flow, and determines how to parallelize the router's functionality across the available cores so as to maximize its throughput.