The Corner

On Data Samples, Cont’d

From a reader:


As an engineer in experimental R&D, I am, to say the least, scientifically inclined. Your reader points out quite correctly the problem of small data sets. However, there are two other problems, the dirty little secrets of modeling: errors compound, and validation is vital.

The sad truth is that every data manipulation you do makes the data worse. Instrument errors, sampling errors, environmental errors: these all build and increase the uncertainty. And that is before you even use the data. Every calculation has its own errors, and these compound every time it’s used. This is why models become less and less accurate over time: the errors build up. Similarly, this is why the more complicated a model is, the more error-prone it is.
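To make the compounding concrete, here is a minimal sketch of one textbook case, not anything taken from the letter itself. It assumes a result formed by multiplying several independently measured quantities, each with a small relative uncertainty; under those assumptions the relative uncertainties combine in quadrature (to first order). The function name and the 2 percent figure are hypothetical, chosen only for illustration.

```python
import math


def combined_relative_uncertainty(relative_uncertainties):
    """First-order relative uncertainty of a product of independent factors.

    Assumption: the errors are independent and small, so the relative
    uncertainties add in quadrature (root-sum-of-squares).
    """
    return math.sqrt(sum(r * r for r in relative_uncertainties))


# Hypothetical chain: ten multiplied factors, each known to 2 percent.
steps = [0.02] * 10

# Combined uncertainty after 1 factor, 2 factors, ... up to all 10.
running = [
    combined_relative_uncertainty(steps[:n]) for n in range(1, len(steps) + 1)
]

for n, u in enumerate(running, start=1):
    print(f"after {n:2d} factors: about {100 * u:.1f}% relative uncertainty")
```

Under these assumptions the uncertainty never shrinks as factors are added; with equal uncertainties it grows with the square root of the number of factors.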

Which leads to the question: how do you know your model is right? To put it bluntly, it isn’t. All models are wrong. The important part is to determine how wrong, and under what conditions the model gives useful answers. This is called validation.

Consider something as complicated as the entire planet’s atmosphere, ocean, landmasses, and solar behavior. And consider that, unlike most science, there is no experimentation. One can’t buy a bunch of Earths from a supply house and put various conditions on them and record the results.

This makes model validation challenging. In normal validation, experimental data is used to see how well the model’s predictions hold up across a multitude of input conditions. In climate modeling, there is only one set of data, since there is only one Earth.
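The "normal validation" described above can be sketched in a few lines. This is an illustration under stated assumptions, not any particular lab's procedure: `toy_model` is a hypothetical stand-in for the model under test, the measurements are made-up numbers, and root-mean-square error is just one common way to score how wrong a model is.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predictions and measurements."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def toy_model(x):
    """Hypothetical model under test: a simple straight line."""
    return 2.0 * x + 1.0


# Made-up experimental data: one measurement per input condition.
conditions = [0.0, 1.0, 2.0, 3.0]
measurements = [1.1, 2.9, 5.2, 6.8]

predictions = [toy_model(x) for x in conditions]
error = rmse(predictions, measurements)

print(f"RMSE over {len(conditions)} conditions: {error:.3f}")
```

The more distinct input conditions an experimenter can run, the better this score describes where the model can be trusted. With only one historical record to compare against, there are far fewer independent checks available.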

This results in less validation, compared to models that can use experiments. It’s no wonder that climate models have proven themselves woefully inept at predicting current conditions when given past conditions. Give them data up until ten years ago and they can’t predict this year.

Remember that this is what they’re talking about when they say the “science is settled”.

Oh, and a reader sent me here, which is a very interesting read.

