For a class of sensor networks, the task is to monitor an underlying physical phenomenon over space and time through an imperfect observation process. The sensors can communicate back to a central data collector over a noisy channel. The key parameters in such a setting are the fidelity (or distortion) at which the underlying physical phenomenon can be estimated by the data collector, and the cost of operating the sensor network. This is a network joint source-channel communication problem, involving both compression and communication. It is well known that these two tasks may not be addressed separately without sacrificing optimality, and the optimal performance is generally unknown. This paper presents a lower bound on the best achievable end-to-end distortion as a function of the number of sensors, their total transmit power, the number of degrees of freedom of the underlying source process, and the spatio-temporal communication bandwidth. Particular coding schemes are studied, and it is shown that in some cases, the lower bound is tight in a scaling-law sense. By contrast, it is shown that the standard practice of separating source from channel coding may incur an exponential penalty in terms of communication resources, as a function of the number of sensors. Hence, such code designs effectively prevent scalability. Finally, it is outlined how the results extend to cases involving missing synchronization and channel fading.