Constraint-based metabolic control analysis for rational strain engineering

The advancements in genome editing techniques over the past years have rekindled interest in rational metabolic engineering strategies. While Metabolic Control Analysis (MCA) is a well-established method for quantifying the effects of metabolic engineering interventions on flows in metabolic networks and metabolic concentrations, it fails to account for the physiological limitations of the cellular environment and metabolic engineering design constraints. We report here a constraint-based framework based on MCA, Network Response Analysis (NRA), for the rational genetic strain design that incorporates biologically relevant constraints, as well as genome editing restrictions. The NRA core constraints being similar to the ones of Flux Balance Analysis, allow it to be used for a wide range of optimization criteria and with various physiological constraints. We show how the parametrization and introduction of biological constraints enhance the NRA formulation compared to the classical MCA approach, and we demonstrate its features and its ability to generate multiple alternative optimal strategies given several user-defined boundaries and objectives. In summary, NRA is a sophisticated alternative to classical MCA for rational metabolic engineering that accommodates the incorporation of physiological data at metabolic flux, metabolite concentration, and enzyme expression levels.

constraints. We report here a constraint-based framework based on MCA, Network Response 23 Analysis (NRA), for the rational genetic strain design that incorporates biologically relevant 24 constraints, as well as genome editing restrictions. The NRA core constraints being similar to 25 the ones of Flux Balance Analysis, allow it to be used for a wide range of optimization criteria 26 and with various physiological constraints. We show how the parametrization and 27 introduction of biological constraints enhance the NRA formulation compared to the classical 28 MCA approach, and we demonstrate its features and its ability to generate multiple 29 alternative optimal strategies given several user-defined boundaries and objectives. In 30 summary, NRA is a sophisticated alternative to classical MCA for rational metabolic 31 Introduction 36 Recent improvements in genome editing techniques have paved the way for more 37 sophisticated and performant metabolic engineering designs for achieving desired 38 physiological states of host organisms. Two approaches for reaching the targeted states exist: 39 (i) integrating heterologous pathways to disruptively overcome native control patterns, and 40 (ii) modifying the endogenous regulatory architecture by removal of the existing control loops 41 (Bailey, 1991). The former method can be rather arduous because it requires testing if the 42 integration of DNA fragments into the original genome sequence perturbs cellular regulation 43 in the desired fashion. The latter technique demands knowledge about cellular control so that 44 the DNA sequence can be modified effectively and without unwanted side effects. 45 Mathematical models are nowadays becoming an indispensable part of strain design. 46 Available gene-protein-reaction associations of various organisms provide invaluable 47 information about cellular metabolism and enable the elaboration of these models. The 48 models can be studied computationally to interrogate and analyze cellular behavior and 49 derive metabolic engineering strategies for improved cellular performance (Gombert and 50 Nielsen, 2000). Strain design requires the identification and engineering of pathways toward 51 the production of desired compounds (Hadadi and Hatzimanikatis, 2015), and mathematical 52 models can provide an invaluable insight in the process of selection of deletions, insertions, 53 and up-and down-regulation of genes encoding for metabolic enzymes. Reviews of the most 54 prominent computational tools and workflows for the strain design are provided elsewhere 55 (Costa et al., 2016;Long et al., 2015;Wang et al., 2017). 56 Metabolic control analysis (MCA) is a mathematical formalism that uses models to quantify 57 the distribution of control over metabolic states in a network such as fluxes and 58 concentrations (Kacser et al., 1995). In MCA, Control Coefficients (CCs) quantify how a given 59 metabolic flux or metabolite concentration would respond to perturbations of the system 60 parameters. This information is used in traditional rational metabolic design to identify the 61 rate-limiting steps of the network and select potential targets for engineering. Strain 62 engineering typically requires a holistic approach where one simultaneously analyzes the 63 effects of genetic manipulations on specific productivity of desired molecules, maximum 64 achievable yield, energetic and redox requirements, etc. Simultaneous analysis of these 65 effects is a cumbersome task using classical MCA tools, especially if the design requires 66 124 Figure 1. The    140 Importantly, the NRA formulation allows us to prevent thermodynamically infeasible designs 141 because it naturally includes thermodynamic constraints regarding the Gibbs free energy 142 change ( . # q ) of each reaction (eq. 15). Furthermore, the proposed formulation allows 143 imposing additional design criteria such as production robustness and operational 144 parameters (Eqs. 16-19). The NRA optimization problems can be solved with the TFA toolbox 145 Here, we examined the effects of the imposed physiological and design constraints on the 160 strain design for improved glucose uptake. To this end, we analyzed the achievable glucose 161 uptake rates with 2-fold, 5-fold, and 10-fold maximum allowable deviation of enzyme 162 activities from the reference level for a set of designs ranging from 1 to 25 gene manipulations 163 ( Figure 2a). The metabolite concentrations were subject to the thermodynamic feasibility 164 constraints (Methods), and within the predefined physiological ranges (10nM -0.1M) for each 165 cellular compartment. We allowed the fluxes to increase up to 10-fold of their reference level, 166 and both fluxes and enzyme activities could reduce to zero. The latter means that solutions 167 can include potential gene knockouts. As a mean to investigate the sensitivity of obtained 168 solutions, we repeated the study for one reference and 18 extreme sets of control coefficients 169 (Methods). 170 As the allowable enzyme activity change (Eqs. 10 and 23) increased from 2-to 10-fold, the 171 predicted attainable glucose uptake rate was about the same for up to 10 gene 172 manipulations, indicating that for a small number of gene manipulations the upper limits on 173 enzyme activity were not a limiting factor ( Figure 2a). However, starting from 13 gene 174 manipulations, the difference between the predictions increased considerably. As expected, 175 the higher limits on enzyme activity, the larger predicted improvement of glucose uptake was 176 observed. For example, NRA predicted for 25 gene manipulations that glucose uptake rate 177 would increase by 26%, 39%, and 46% for 2-, 5-, and 10-fold change in enzyme activity, 178 respectively. Interestingly, the predicted fold change of the glucose uptake across the 179 nineteen studied reference and extreme CC-sets varied similarly for the designs with 13 or

243
Next, we focused on finding which were the metabolites whose concentration constraints 244 should be violated to improve glucose uptake. To this end, we studied the case of four 245 violations and two, four, and seven gene manipulations. For each gene manipulation study, 246 we obtained the unique sets of four metabolite concentrations violating constraints (Table  247 3a). The three gene manipulation studies involved, in total, seven species with concentrations 248 violating the thermodynamic constraints. Among the seven species, peroxisomal protons 249 appeared in all three studies. Moreover, irrespectively of the study, to achieve a higher 250 glucose uptake, the concentrations of protons (both cytosolic and peroxisomal), AMP, and 251 phenylalanine needed to be increased, while the ones for CTP, dCTP and glutamine needed 252 to be decreased. The violations ranged from 2% for the case of cytosolic hydrogen to 57% for 253 the case of CTP (Supplementary Table S1). 254 This analysis provides an opportunity to focus on each of the identified molecules, draw 255 hypothesis about their role in the system limitations, and investigate these interplays and 256 ways to overcome them in vitro. For example, it suggested that the pH value in compartments 257 can be a limiting factor for metabolic design. 258 259

Metabolite concentrations
NRA design for Pyruvate production considers together specific production rate and yield 264 Pyruvate (pyruvic acid) is widely used in the food, chemical, and pharmaceutical industries. It 265 is a precursor for the synthesis of various amino acids, and has been used for the production 266 of antioxidants, food additives and supplements, pharmaceutical precursors, and biofuels 267 were subject to the thermodynamic feasibility constraints and the physiological ranges (10nM 283 -0.1M). Given these constraints, we first performed an optimization to determine the 284 maximum yield of pyruvate from glucose. Then, we added the pyruvate yield to be at least 285 90% of this value to the set of constraints and maximized the specific pyruvate productivity 286 rate. In this manner, we were able to implicitly account for the potential tradeoffs of yield 287 and productivity that can occur in such designs. 288 We generated 51 alternative designs with five gene manipulations providing at least 99% of 289 the maximum specific productivity rate of pyruvate and fulfilling the imposed constraints. The 290 alternative designs involved the manipulation of genes corresponding to 48 distinct enzymes 291 (Supplementary Table S2 The transport of pyruvate from the cytosol to the periplasm (PYRt2rpp) appeared as a target 299 in all designs with 50-fold upregulation of the PYRt2rpp encoding gene (Figure 4). The 300 upregulation of glycolytic enzymes and enzymes leading to pyruvate synthesis would also 301 improve pyruvate production, with the most prominent target being glycerate kinase 302 (GLYCK2). We also observed knockouts (or significant downregulations) with the majority of 303 downregulated genes involving the consumption of pyruvate towards the formation of 304 byproducts. Among these, the periplasmic transport of glycerate (GLYCAt2rpp) was present 305 in most generated sets, being replaced by the extracellular transport of citrate (CITtex) in a 306 few cases (Figure 4). We also observed the knockout of PPS (Phosphoenolpyruvate synthase), 307 which is associated with the conversion of pyruvate to phosphoenolpyruvate. 308 309

371
The thicker arrow in the colored boxes, the higher change in the activity occurred.

373
The fifth group, composed of alternatives 5-7, 9-12, 21-24, 28, 30, 31, 33, 43, 45, and the 374 second group have in common manipulations (i), (ii), and (iv) (Figure 4 and Supplementary 375 Table S2). Additionally, the fifth group involved either a very slight upregulation of glycolytic 376 enzymes PGM, ENO, and PGK (alternatives 11, 12, 21-24, 31, 43, and 45)  Comparison with targets determined by looking only at unconstrained specific productivity 388 We proceeded by examining how different are the targets obtained with the NRA design from 389 the ones determined by looking only the specific productivity rate of pyruvate. This 390 comparison will reveal how the physiology and design constraints affect our design decisions. 391 To this end, we computed the mean values of the control coefficient of the specific 392 productivity rate of pyruvate with respect to network enzyme activities, and then ranked 393 them according to their absolute value. Most of the top 15 enzymes represent either 394 extracellular transports such as oxygen uptake and ammonium secretion, as well as glycolysis 395 reactions leading to the synthesis of pyruvate (Table 4). Interestingly, the majority of these 396 enzymes do not appear as targets in any of the NRA alternatives (Table 4 and Supplementary  397   Table S2). Some of these enzymes exhibit a large control over multiple fluxes and 398 concentrations across the metabolic network. These are, therefore, severely constrained by 399 the imposed specifications in the constrained NRA design. This suggests that the NRA 400 formulation will favor parameters that have less control over the network, ensuring that 401 cellular balance will not be excessively perturbed. 402  Since the model is constrained to grow on minimal media with glucose as its sole carbon 463 source, the choice of the representative set will have a strong impact on the design criteria 464 we wish to explore. To investigate the variability in results that this choice can induce, we 465 additionally selected several "extreme" CC-sets through the use of PCA. We used nine 466 principal components to describe the space of CCs with respect to glucose uptake, which lead 467 to a coverage of 96.63% of the space variance. We selected the minimum and maximum 468 corresponding CC-sets for each component (2 x 9), leading to a total of 19 sets. We then 469 constructed 19 NRA models with these CC-sets and used them in the performed studies. Since the activity of an enzyme in the metabolic network could either be increased or 497 decreased, but not both at the same time, we made use of integer variables in the 498 formulation. Therefore, we split the catalytic activity deviations of our system, ' , into the 499 continuous variables ' j and ' k , which denote the upregulation and downregulation of the 500 gene encoding for enzyme k, respectively (Eqs. 11-14). As these should not have nonzero 501 values simultaneously, we define the integer binary variables ' jj and ' kj . ' jj equals one 502 if the gene catalyzing the enzyme k is upregulated and equals zero otherwise. In contrast, 503 ' kj equals zero in the case of upregulation, and it is one for downregulation. As expressed in 504 Eq. 12, only one of these variables can be active at a time, since deregulation cannot occur in 505 both directions simultaneously, or they can both be inactive for the case of no change in the 506 respective enzyme's catalytic activity. To complete the formulation, these variables are 507 further coupled to the above defined split enzymatic deviation variables through Eqs. 13 and 508 14. The integer binary variable ' is equal to zero if the activity of enzyme k is modified in the 509 solution, and it equals to one otherwise (Eq. 11). ξ is a constant selected to be larger than the 510 absolute value of the largest enzymatic deviation constraints, X and X , defined in Eq. 10. 511 512

Software and optimization parameters 513
The computations were made on a Mac Pro workstation running Mac OS X version 10.11.6, 514 equipped with a 2.7 GHz 12-Core Intel Xeon E5 processor and 32GB DDR3 memory, using 515 MATLAB version R2016a and the IBM CPLEX solver version 12.5.1. Time limits for the solver 516 were set as following: in Figure 2(a), for 2-fold (blue line) to 10 minutes, for 5-fold (orange 517 line) to 30 minutes, and for 10-fold (yellow line) to 3 hours; in Figure 2(b), for all cases to 10 518 mins; in Figure 3, for all cases to 30 minutes; in Table 2, for all cases to 30 minutes; in the 519 pyruvate case study (Figures 4-7), for all cases 3 hours. 520 521

522
The NRA framework enables the consistent and sophisticated design of metabolic engineering 523 strategies using MCA-based control coefficients. NRA is computationally faster and simpler 524 than other approaches since the derivation of control coefficients does not require the 525 numerical integration of non-linear kinetic models, and offers the implementation of a wide 526 variety of metabolic engineering criteria. To our knowledge, this type of approach has never 527 been applied to large or genome scale kinetic models of metabolism. Using a previously 528 published large-scale kinetic model of E. coli, we demonstrated that the NRA formulation can 529 be applied to large-scale metabolic networks. We used the PCA method to select a number 530 of representative sets of kinetic parameters among their population, in order to effectively 531 represent the uncertainty and flexibility of the kinetic model in respect to parametrization. 532 One of the main advantages of NRA is that, being a constraint-based modeling method, it can 533 accommodate the integration of biologically relevant bounds and constraints, which ensure 534 that the proposed strategies are consistent with the entire system capabilities and limitations 535 thereof. Since the NRA model predictions can be sensitive to the user-defined bounds on the 536 allowable reaction flux, metabolite concentration and enzymatic expression deviations, the 537 importance of including relevant physiological constraints, such as thermodynamic feasibility 538 constraints, was discussed extensively. Focusing on the case of pyruvate production, a 539 compound of great industrial interest, viable metabolic engineering strategies were shown to 540 be readily derived using this formulation. Alternative solutions could also be generated and 541 evaluated on their efficiency and potential implementation. We believe that this formulation 542 will provide a refined alternative to computational genetic design, due to its simplicity and 543 modularity, and that it will continue to be enhanced through the introduction of ever-growing 544 omics data, and additional specialized constraints and objectives.  Table S1: Metabolite concentration violation magnitudes for designs with two, four, and 555 seven gene manipulations. 556 Table S2: List of the 51 generated alternative designs with the corresponding manipulations 557 and magnitudes of manipulations, pyruvate productivity, and yield. 558 Table S3: List of aerobically grown E.coli model reactions, metabolites, and parameters 559 considered in the study. 560