Optimized Data Exploration of Recombinant Fermentations using Neural Network Simulations

E. Dürrschmid and K. Bayer

 Institute of Applied Microbiology, University of Agricultural Sciences, Vienna

Muthgasse 18B, A-1190 Wien, Austria

 Abstract: A novel approach to bioprocess optimization is to focus on the signal processing capabilities of the cell. Analysing specific signal molecules, such as guanosine tetraphosphate, gives a better understanding of the relation between metabolic load and recombinant protein production and thus helps to make use of the cell's synthetic capacity. Because these signal molecules are difficult to analyse, it would be highly beneficial to model their appearance and concentration. Kohonen's Self Organizing Maps were used to improve the selection of input data; furthermore, neural network models (Radial Basis Function networks) representing offline data were developed, and first results are presented.

Keywords: Fermentation Processes, Process Control, Neural-network models, Radial base function networks, Productivity

 Introduction

One of the major problems in the optimization of recombinant bioprocesses today is the lack of knowledge about the key variables for optimal process supervision.

Because the bioprocess is viewed as a black box, most approaches to optimization are empirical. A bioprocess cannot yet be controlled as well as, for example, a chemical process, because the establishment of novel measurement methods and the discovery of key variables that give deeper insight into metabolic state and capacity lag far behind. The result is an inability to monitor relevant variables, which makes highly effective process control difficult.

Molecular biologists have made many attempts to unravel the structure of metabolic control that enables a single-cell organism to respond to environmental changes with adaptive processes. This structure lets the cell handle tasks of very different metabolic relevance, from the demand for a higher concentration of a single enzyme to, for example, the regulation of adaptation to a sudden depletion of the nutrient supply.

Recent findings point to a hierarchically structured organization of the cell's metabolic reactions. This structure enables the cell to activate even widely distributed and complex mechanisms in response to intrinsic changes (Lengeler, 1996).

 

  1. Intracellular signal processing and metabolic complexity

A novel approach to advanced bioprocess control is to regard the bacterial cell as a signal processing machine that transduces environmental signals across its membrane, interprets them, and reacts in a nearly optimal way to keep at least its essential metabolic functions in equilibrium for as long as possible.

This calls for a broader view of the limits of intracellular capacity, which are reached when the cell is forced to exploit its energy and building-block pools for recombinant protein synthesis.

To cope with complex tasks such as coordinating adaptation processes that involve hundreds of genes distributed widely over the genome, microbial cells have evolved a great variety of regulatory networks. The occurrence of a so-called 'bottleneck' cannot be attributed entirely to the reaction catalyzed by an individual enzyme. Instead, a multitude of global regulatory networks, reactions and flux rates has to be considered when analysing the potential capacity of a given metabolic pathway.

The hierarchical structure of metabolic control falls into three regulatory levels:

Physiological flexibility is evidently an evolutionary criterion and forces microbial cells to maintain a plethora of idling metabolic pathways. Coordinating such a complex network requires sophisticated control strategies, which are realized in its hierarchical structure. The formation of 'modulatory signals' is triggered by environmental changes. These global signal molecules are distributed throughout the metabolic network to specific receptors, where they are interpreted and generate individual local responses targeted at the overall goal of adaptation (e.g. the stringent response network). A stimulon may thus affect up to a third of all cell activities and therefore contains most of the information about the metabolic state of the cell.

From the viewpoint of bioprocess optimization, it would be highly beneficial to monitor the concentration of the molecules that control high-level regulatory units such as the stimulons, for example the intracellular concentration of guanosine tetraphosphate, a signal molecule that triggers the stringent response (Cserjan-Puschmann et al., 1997).

However, the analysis of these molecules is difficult, is prone to errors from complex chemical reactions and analytical hardware, and carries all the disadvantages common to offline analysis. The main problems are that offline samples cannot be taken at a high enough frequency and that the results become available too late for the operator to react properly.

On the other hand, a great amount of data is monitored online and is easily available. The use of these data could be improved by establishing novel methods of process modelling such as neural network simulations. The goal should be to establish neural network representations of the interdependencies between easily obtainable online data and offline data.

 

Materials and Methods

The fermentation data used were obtained from fermentations using recombinant E. coli HMS 174 (DE3)pet11arhSOD including

 

Modelling of bioprocesses requires tools that are capable of coping with the nonlinearities encountered in most of the examined biosystems. Most statistical tools that are widely used in system modelling assume linear relationships (e.g. Partial Least Squares, Principal Component Analysis) and for this reason can produce suboptimal system representations.

Neural network simulations make it possible to obtain nonlinear representations of complex systems without specifying the huge set of parameters required by classical mathematical models, whose development times are too long to be efficient.

Surveys of modelling approaches show that neural networks achieve superior results in modelling biological systems compared to linear statistical methods and are therefore the most promising approach (Warnes et al., 1995; Joseph et al., 1992).

Radial Basis Function (RBF) networks were chosen as the type of neural network simulation because of their rapid convergence and short implementation period.

RBF networks are structurally similar to feedforward networks. They consist of an input layer that distributes the data to a hidden layer carrying the Gaussian radial basis functions, and an output layer where the outputs are read off.

The input layer has the sole task of distributing the input patterns to the hidden layer over connections with fixed weights. In the hidden layer, the distance between the input pattern and the centre of each radial basis function is used to compute that unit's output. These results are passed over weighted connections to the output layer, where they are linearly combined to give the network's output (Leonard and Kramer, 1991).
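The layer structure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the MATLAB program used in this work: the function names, the synthetic data, the choice of every sixth training pattern as a centre, and the value of `spread` are all hypothetical, and `spread` is taken here as the standard deviation of the Gaussian, which may differ from the toolbox's definition of the spread constant.

```python
import numpy as np


def rbf_design_matrix(X, centers, spread):
    """Gaussian hidden-layer activations for every input pattern."""
    # Squared Euclidean distance between each pattern and each centre.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * spread ** 2))


def fit_rbf(X, y, centers, spread):
    """Least-squares fit of the linear output layer (weights plus bias)."""
    H = rbf_design_matrix(X, centers, spread)
    H = np.hstack([H, np.ones((H.shape[0], 1))])  # bias column
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return weights


def predict_rbf(X, centers, spread, weights):
    """Hidden-layer activations followed by the linear output transform."""
    H = rbf_design_matrix(X, centers, spread)
    H = np.hstack([H, np.ones((H.shape[0], 1))])
    return H @ weights


# Synthetic stand-in data (hypothetical, NOT the fermentation measurements).
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(60, 2))
y_train = np.sin(3.0 * X_train[:, 0]) + X_train[:, 1] ** 2

centers = X_train[::6]  # every sixth pattern serves as a centre (assumption)
weights = fit_rbf(X_train, y_train, centers, spread=0.3)
y_fit = predict_rbf(X_train, centers, 0.3, weights)
train_mse = float(np.mean((y_fit - y_train) ** 2))
```

Only the output weights are fitted in this sketch, by a single linear least-squares solve with fixed centres and spread, which is one reason RBF networks are quick to set up.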

RBF networks require only a few parameters to be set and are therefore easy to establish and use.

The programs used for RBF network evaluation were written in MATLAB™ using the Neural Network Toolbox, a program package that makes the complex methods of neural network simulation easily accessible.

A persistent problem in neural network implementation is the optimal selection of input values. In contrast to sigmoidal feedforward networks, RBF networks have the disadvantage of not being able to selectively ignore inputs that carry no information, and therefore require careful input selection (Montague and Morris, 1994).

Input selection often requires long periods of empirical testing to find optimal sets and consumes considerable personnel and computing resources when neural network applications are implemented. A novel and highly efficient approach to selecting inputs of high relevance was found in a software tool based on Kohonen Self Organizing Maps (Viscovery SOMine™), which quickly extracts underlying features from the fermentation data matrices.

Viscovery SOMine™ is a commercially available software package (Eudaptics, Austria) based on the concept of Kohonen's Self Organizing Map. Kohonen networks implement a nonlinear mapping that fills the multidimensional space of the input data with neurons, concentrating them in the regions that contain most of the data points.

Kohonen Self Organizing Map (SOM). SOMs are realized by simulating a two-dimensional layer of neurons that are interconnected such that each neuron has four connected neighbours. Each neuron is fed in parallel with the information contained in the input data. The neuron c with the smallest distance to the input vector (the winning neuron) is activated most strongly:

$$ \lVert x - w_c \rVert = \min_i \lVert x - w_i \rVert \qquad \text{(Eq. 1)} $$

where x is the input vector and w_i the weight vector of the neuron with index i, is called nearest-neighbour classification. The weight vectors of the neurons neighbouring the winning neuron are adjusted in the same direction as that of the winning neuron and therefore become more similar with each iteration, whereas the neurons in the outer regions of the map are fed with the outlying data points of the input matrix. These operations explain the tendency of SOMs to fill the data space autonomously (so-called self-organization) (Bouton and Pages, 1992).
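The winner search and the neighbourhood update described above can be sketched as follows. This is a minimal illustration and not the algorithm implemented in Viscovery SOMine: the function names, map size, learning-rate schedule and synthetic two-cluster data are hypothetical, and a Gaussian neighbourhood on the grid is assumed here in place of the fixed four-neighbour connectivity.

```python
import numpy as np


def winner(weights, x):
    """Grid index of the neuron whose weight vector is closest to x (Eq. 1)."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, radius0=3.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise every weight vector with a randomly drawn data point.
    picks = rng.integers(0, len(data), size=rows * cols)
    weights = data[picks].astype(float).reshape(rows, cols, data.shape[1])
    # Grid position of every neuron, needed for the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        decay = 1.0 - t / n_iter
        lr = lr0 * decay                        # learning rate shrinks towards 0
        radius = 0.5 + (radius0 - 0.5) * decay  # neighbourhood shrinks towards 0.5
        x = data[rng.integers(len(data))]
        c = winner(weights, x)
        # Neurons close to the winner on the grid are pulled towards x as well.
        grid_sq_dist = ((grid - np.array(c)) ** 2).sum(axis=2)
        h = np.exp(-grid_sq_dist / (2.0 * radius ** 2))
        weights += lr * h[:, :, None] * (x - weights)
    return weights


# Synthetic two-cluster data (hypothetical, NOT fermentation data).
rng = np.random.default_rng(42)
cluster_a = rng.normal(0.0, 0.3, size=(100, 2))
cluster_b = rng.normal(5.0, 0.3, size=(100, 2))
data = np.vstack([cluster_a, cluster_b])

som = train_som(data)
win_a = winner(som, np.array([0.0, 0.0]))
win_b = winner(som, np.array([5.0, 5.0]))
```

Because every update moves a weight vector part of the way towards a data point, the trained weights stay inside the range spanned by the data, which is the sense in which the map fills the data space.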

The major advantage of the SOM network is its ability to map a multidimensional input matrix of online and offline data onto a lower-dimensional output matrix that brings out the relevant relationships. Viscovery SOMine™ maps the output data into two-dimensional representations in which each cluster of data is coloured differently. This makes the visual exploration of nonlinear relationships simple, because each element's part in the clustering of the data matrix can be shown separately as a coloured feature map (Fig. 2). This approach to data preselection provides rational and highly efficient input selection criteria for modelling with Radial Basis Function networks.

Results

Prior to the application of Viscovery SOMine™, a large number of empirical test runs were performed to find an optimal input data set. The analysis of the fermentation matrices showed the already known high correlation between optical density and biomass dry weight (BTS), but also a good correlation between consumed base and BTS (see Fig. 2). The best results were obtained with these two input variables, whereas different or additional inputs only added noise to the estimation. With the SOM analysis it is therefore not necessary to compute large numbers of RBF test runs to find the optimal combination of input data: the output of the Viscovery SOMine™ calculations is analyzed visually, and the input data with the highest correlation to the desired output are chosen for the RBF modelling process.
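The selection criterion stated above (keep the inputs with the highest correlation to the desired output) can be illustrated numerically. The sketch below ranks candidate inputs by absolute Pearson correlation. This is a crude linear stand-in for the visual inspection of SOM feature maps, not the procedure used in this work, and the variable names and synthetic signals are hypothetical.

```python
import numpy as np


def rank_inputs_by_correlation(candidates, target):
    """Return (name, |Pearson r|) pairs, strongest correlation first."""
    scores = {
        name: abs(float(np.corrcoef(values, target)[0, 1]))
        for name, values in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in signals (hypothetical, NOT measured fermentation data).
rng = np.random.default_rng(1)
biomass = np.linspace(3.0, 17.0, 50)
candidates = {
    "optical_density": 2.5 * biomass + rng.normal(0.0, 1.0, 50),
    "consumed_base": 0.8 * biomass + rng.normal(0.0, 1.0, 50),
    "unrelated_signal": rng.normal(0.0, 1.0, 50),
}

ranking = rank_inputs_by_correlation(candidates, biomass)
selected = [name for name, _ in ranking[:2]]  # keep the two best inputs
```

A linear score of this kind would miss purely nonlinear dependencies, which is the case the SOM-based exploration is meant to cover.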

Using optical density (which can be determined within a few minutes of sampling) and consumed base, very promising results were obtained. The fermentation modelled consists of a batch phase and a fed-batch phase, the latter with the growth rate controlled by substrate limitation (Fig. 3).

The mean squared error (MSE) of the estimation is 2.995 for estimated biomass values ranging from a minimum of 3.4 g/l to a maximum of 17.43 g/l. A Radial Basis Function network with a spread constant of 128.5 was used for modelling.
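For orientation, the reported error can be related to the biomass range. The following short calculation assumes that the MSE is expressed in (g/l)^2, which is not stated explicitly, and introduces the symbols y_k (measured biomass), \hat{y}_k (estimated biomass) and N (number of samples) for illustration only:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2 = 2.995,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \approx 1.73\ \mathrm{g/l},
\qquad
\frac{\mathrm{RMSE}}{y_{\max} - y_{\min}} = \frac{1.73}{17.43 - 3.4} \approx 0.12
```

Under that assumption, the typical estimation error corresponds to roughly 12 % of the span of biomass values covered.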

The application of Viscovery SOMine™ greatly improved the selection of the input data set: all other combinations of input variables that were tested merely added noise to the model and increased the MSE.

Figure 3: Real values of biomass concentration (thick line) and estimated values (thin line).

Conclusion

The key advantage of SOMs is the formation of clusters, which helps to reduce the input space to representative features by a self-organization process. The underlying structure is thus kept while the dimensionality of the space is reduced. The input-output combinations selected in this way are modelled by RBF networks, which achieve very good results because meaningless inputs have been eliminated.

The combination of SOM and RBF networks is a powerful tool for fermentation data, enabling rapid recognition of interdependencies and subsequent modelling. The established methods will be used in particular to screen for interdependencies between the online data described above and highly relevant offline data such as guanosine tetraphosphate concentration.

 

References

Bouton, C. and Pages, G. (1992) Self-Organization of the Kohonen Algorithm. Proceedings of the Aspects Théoriques des Réseaux de Neurones, Paris, July 1992.

Cserjan-Puschmann, M., Kramer, W., Dürrschmid, E. and Bayer, K. (1997) Quantification of the Signal Molecule Guanosine-5'-Diphosphate-3'-Diphosphate and other Nucleotides in industrial relevant Cell Culture Samples. Submitted to BioTechniques.

Joseph, B., Wang, F.H. and Shieh, D.S.S. (1992) Exploratory Data Analysis: A Comparison of Statistical Methods with Artificial Neural Networks. Comp. Chem. Engng. 16 (4), 413-423.

Lengeler, J.W. (1996) Basic Concepts of Microbial Physiology and Metabolic Control, p. 232. Proc. Bioprocess Engineering Course, Saltsjöbaden, Sweden, June 14-18, 1996.

Leonard, J.A. and Kramer, M.A. (1991) Radial Basis Function Networks for Classifying Process Faults. IEEE Proceedings 2 (3), 31-38.

Montague, G. and Morris, J. (1994) Neural Network Contributions in Biotechnology. Tibtech 12, 312-324.

Warnes, M.R., Glassey, J., Montague, G.A. and Kara, B. (1995) On-Data based Modelling Techniques for Fermentation Processes. Process Biochemistry 31 (2), 147-155.