Numerous methods have been developed for inference of gene regulatory networks from expression data; however, their strengths and weaknesses remain poorly understood. Accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we present the new version (3.0) of GeneNetWeaver (GNW), an open-source tool for in silico benchmark generation and performance profiling of network inference methods. GNW can be launched directly from any web browser and has an intuitive graphical user interface. GNW generates biologically plausible in silico gene networks and simulated expression data, which can be used as benchmarks for network inference methods. Realistic network structures are generated by extracting modules from known biological interaction networks. These networks are then endowed with dynamics using a kinetic model of transcription and translation, where transcriptional regulation is modeled using a thermodynamic approach that allows for both independent and synergistic interactions. Finally, these models are used to produce synthetic gene expression data by simulating different biological experiments. Simulations can be run either deterministically or stochastically, the latter to model internal noise in the dynamics of the networks, and experimental noise can be added using a model of the noise observed in microarrays. Another important feature of GNW is the systematic evaluation of predictions from different inference methods on the in silico networks of a benchmark. For a set of network predictions from one or several inference methods, GNW automatically generates a comprehensive report in PDF format. These reports include standard metrics used to assess the accuracy of network inference methods, such as precision-recall and receiver operating characteristic (ROC) curves.
Furthermore, the reports include a network motif analysis, in which the performance of inference methods is profiled on local connectivity patterns. This analysis often reveals systematic prediction errors, thereby indicating potential ways to improve network reconstruction. We use GNW to provide an annual network inference challenge for the DREAM project. In the past three editions, a total of 91 teams submitted about 900 network predictions to evaluate the performance of their methods on GNW-generated benchmarks.
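
To make the standard metrics mentioned above concrete, the following is a minimal sketch of how precision-recall and ROC points could be computed for a ranked list of predicted edges against a known in silico network. It is not GNW's implementation: the function names `pr_and_roc` and `trapezoid_area`, the three-gene network, and the ranking are all hypothetical and chosen only for illustration.

```python
def pr_and_roc(ranked_edges, gold_edges, n_possible):
    """Walk down a ranked edge list and record precision, recall and
    false-positive rate after each prediction.

    ranked_edges: list of (regulator, target) pairs, most confident first.
    gold_edges:   set of true (regulator, target) pairs.
    n_possible:   total number of candidate edges (positives + negatives).
    """
    n_pos = len(gold_edges)
    n_neg = n_possible - n_pos
    tp = fp = 0
    precision, recall, fpr = [], [], []
    for edge in ranked_edges:
        if edge in gold_edges:
            tp += 1
        else:
            fp += 1
        precision.append(tp / (tp + fp))
        recall.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return precision, recall, fpr


def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve given by points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))


# Hypothetical three-gene example: 3 * 2 = 6 possible directed edges
# (self-loops excluded), two of which are true.
gold = {("A", "B"), ("B", "C")}
ranking = [("A", "B"), ("A", "C"), ("B", "C"),
           ("C", "A"), ("B", "A"), ("C", "B")]

precision, recall, fpr = pr_and_roc(ranking, gold, n_possible=6)

# The ROC curve starts at the origin; the ranking covers every candidate
# edge, so the curve ends at (1, 1).
auroc = trapezoid_area([0.0] + fpr, [0.0] + recall)
```

In this sketch the ranking lists every candidate edge, so the ROC curve is complete. A truncated prediction list would need an explicit convention for the unranked edges, which is an assumption left out here.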