O’Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering & System Safety, 91(10), 1290-1300.
This is an excellent paper describing the essence of the Bayesian statistical approach to model calibration and validation. It places a strong emphasis on emulators, i.e. statistical models that represent the output of a numerical model, but the common theme among all Bayesian works is the uncertainty associated with the model, whether that uncertainty is attached to the emulator or to the numerical model itself.
The paper is laid out to deconstruct the taxonomy of the statistical approach to model uncertainty quantification, which is somewhat different from the model developer's approach to V&V.
A simulator is the numerical model in question. This includes the conceptual model and the numerical implementation. [This distinction is subtle and important as the Roy approach carefully separates all of these parts of a model.] A simulator is given the symbology $f(\,.)$.
“The outputs y of a simulator are a prediction of the real-world phenomena that are simulated by the model, but as such will inevitably be imperfect. There will be uncertainty about how close the true real-world quantities will be to the outputs y. This uncertainty arises from many sources, particularly uncertainty in the correct values to give the inputs x and uncertainty about the correctness of the model $f(\,.)$ in representing the real-world system;”
The purpose of the process is to “quantify, analyse and reduce uncertainty in simulator outputs”.
This section is trying to draw the distinction between Bayesian and frequentist statistics. Frequentist statistics is familiar, in that it is the probability we all know and hate, i.e. the probability that something might happen given infinitely many repetitions of the event. This is described as aleatory uncertainty, as it concerns the outcomes of random events.
As expected, a great deal of uncertainty is not due to inherent randomness, but comes from not knowing enough about a parameter, i.e. from lack of knowledge. It is not random in itself. This is an epistemic uncertainty. “Within the Bayesian framework we are free to quantify all kinds of uncertainty, whether aleatory or epistemic, through probabilities.”
[Bayesian probability is often presented in probability density functions where the area under the curve represents the probability for an interval. This is why there are sometimes high numbers on pdf graphs. See probabilitydensity.pdf.]
Emulators are statistical models of the outputs of simulators. This approximation to $f(\,.)$ is given as $\hat{f}(\,.)$. The important distinction is made that $\hat{f}(\mathbf{x})$ is the mean of the statistical emulator and that there is a statistical distribution about the mean, which describes the uncertainty of the emulator to the true $f(\mathbf{x})$.
Emulators are built by training them on data from $f(\,.)$, providing an estimate of $f(\,.)$. The emulator must then (roughly as the paper lays out):
- reproduce the training outputs exactly, with no uncertainty at those input points;
- give plausible predictions at untried inputs, together with a probability distribution expressing the uncertainty about $f(\mathbf{x})$ there.
The best choice of emulator to achieve this is a Gaussian Process emulator (GP).
The uncertainty created by using an emulator as an approximation to $f(\,.)$ is called code uncertainty.
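A minimal sketch of these ideas, assuming a toy 1-D problem, a squared-exponential covariance with fixed (hand-picked) hyperparameters, and `sin` standing in for an expensive simulator; a real GP emulator would estimate the hyperparameters from the training runs:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_emulator(x_train, y_train, x_new, length_scale=1.0, variance=1.0, jitter=1e-10):
    """Posterior mean and standard deviation of a zero-mean GP emulator."""
    K = sq_exp_kernel(x_train, x_train, length_scale, variance)
    K += jitter * np.eye(len(x_train))            # numerical stability only
    Ks = sq_exp_kernel(x_new, x_train, length_scale, variance)
    Kss = sq_exp_kernel(x_new, x_new, length_scale, variance)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha                             # the emulator mean, \hat{f}(x)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)     # spread about the mean: code uncertainty
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# A few "expensive" simulator runs (here the stand-in simulator is sin)
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
mean, sd = gp_emulator(x_train, y_train, np.array([1.5, 1.0]))
```

Note how the code uncertainty `sd` collapses to (essentially) zero at the training point $x=1$, where the simulator output is known, and is positive between the training points.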
A large section is dedicated to demonstrating the application of a GP emulator, but that is not necessarily important for the Bayesian concepts.
“If the input vector $\mathbf{x}$ is uncertain, we can consider it as a random vector. In statistics, we denote this by writing it as $\mathbf{X}$. The output is of course now also a random variable, and we write $Y=f(\mathbf{X})$”.
The probability distribution for $\mathbf{X}$ is denoted as $G$.
Monte Carlo techniques can be used: sample inputs from $G$ and run them through the emulator. However, for a GP the effect of uncertain inputs $\mathbf{X}$ can be calculated analytically: $G$ can be used to get the emulated mean $\hat{M}$ and an associated uncertainty distribution, which will be normal assuming $G$ is normal. The simulator has thus only been required to generate the GP emulator.
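The Monte Carlo route can be sketched as follows, with an illustrative normal $G$ and a cheap stand-in for $f(\,.)$ (in practice the whole point is that $f$ is too expensive for this, which is why the emulator, or the analytic GP result, is preferred):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Stand-in simulator output; a real f(.) would be expensive to evaluate."""
    return x ** 2 + np.sin(x)

# G: input uncertainty, here X ~ N(1, 0.2^2) (an illustrative choice)
samples = rng.normal(loc=1.0, scale=0.2, size=100_000)
y = f(samples)

M_hat = y.mean()        # Monte Carlo estimate of the output mean
sd_hat = y.std(ddof=1)  # output spread induced by the input uncertainty
```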
Note that it is the uncertainty model in the GP that is most important here, rather than the GP itself. As seen in other papers, the uncertainty model can be embedded in the simulator and inferences made from that, although this removes a number of the efficiency advantages of using an emulator.
Emulators can be used for sensitivity analysis, reducing the computational burden on the simulator significantly.
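A sketch of variance-based sensitivity analysis driven by a cheap emulator, using the standard pick-freeze Monte Carlo estimator for first-order Sobol' indices; the linear `emulator_mean` and the independent standard-normal inputs are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def emulator_mean(x):
    """Cheap surrogate standing in for the GP emulator mean; linear form is illustrative."""
    return x[:, 0] + 2.0 * x[:, 1]

n, d = 200_000, 2
A = rng.standard_normal((n, d))   # two independent sample matrices
B = rng.standard_normal((n, d))
yA = emulator_mean(A)
var_y = yA.var()

S = []
for i in range(d):
    C = B.copy()
    C[:, i] = A[:, i]             # "freeze" input i, resample the rest
    yC = emulator_mean(C)
    S.append((np.mean(yA * yC) - yA.mean() * yC.mean()) / var_y)
# S[i] estimates the fraction of output variance due to input i alone
```

For this toy case the true indices are $S_1 = 1/5$ and $S_2 = 4/5$, since the output variance is $1 + 4 = 5$. Each index costs only emulator evaluations, not simulator runs.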
Calibration can be achieved for model parameters. The approach gives an uncertainty measure in the inputs which is reduced, but not lost. The validation metric is known as a model discrepancy function which can also be considered as a GP.
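In the Kennedy–O'Hagan style formulation this builds on, an observation $z_i$ of the real process at input $\mathbf{x}_i$ is, roughly, related to the simulator as

$$ z_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \delta(\mathbf{x}_i) + \varepsilon_i $$

where $\boldsymbol{\theta}$ are the calibration parameters, $\delta(\,.)$ is the model discrepancy function (itself given a GP prior), and $\varepsilon_i$ is observation error.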
There seem to be a number of issues with emulators, particularly to do with multiple outputs and computing the GP itself.
There is also a problem that the emulator may be unnaturally smooth compared to the reality of the computation.
Correlation between outputs is an issue. Emulators can be built for multiple outputs separately, but if they are correlated, information may be lost.
Emulators don’t seem to do “dynamic” models, like time series I guess.
Bayesian approaches can be used for validation (as seen in other papers), by setting bounds on the uncertainty in the model emulator, or otherwise.
No general software packages exist for this yet.