Introduction
I have often been critical on Twitter of adjusting climate data and of the continual creation of new "official" data sets via "deprecation". As a former data developer, I understand why one wants the most current information to be one's first choice in analysis, but I am critical of how deprecation seems to be used as an excuse for providing deceptive information. The deprecation, of course, mostly refers to the adjusted (occasionally labeled "corrected") data from temperature stations. In these Twitter conversations, I am often told that I "don't understand the purpose of homogenization" and that the process is robust and needed to compensate for non-climate issues with stations, such as relocation, missing data, sensor changes, etc. The World Meteorological Organization explains it this way:
"The aim of climate data homogenization is to adjust climate records, if necessary, to remove non-climatic factors so that the temporal variations in the adjusted data reflect only the variations due to climate processes."*
These are certainly valid issues and I've dealt with them many times. For most of my career, data I collected and used was considered proprietary** and only reports and summaries were made publicly available because:
(a) in research, where publication counts matter, we didn't want to help other researchers undercut our work,
(b) we could not be sure whether external users would understand or be aware of the issues described above, and
(c) we didn't want to be held responsible for conclusions that we did not draw from the data.
One thing we concentrated on carefully was site metadata, because field crews occasionally got lost in the woods and because the sites were not geographically fixed (streams move around); one thing we did not do was adjust data we thought were incorrect. We did use models in quality control (QC) where appropriate: when a year failed our QC test, we simply excluded all of that year's data from any analysis that included statistical testing.***
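As a rough illustration of that workflow (this is not the actual code we used; the file name, column names, and QC flag are hypothetical), the point is that flagged years are dropped outright rather than modeled back in:

```python
import pandas as pd

# Hypothetical monitoring data: one row per site-year measurement,
# with a boolean column indicating whether that year failed the QC test.
obs = pd.read_csv("site_measurements.csv")   # columns: site, year, value, qc_failed

# Identify every year in which the QC test failed...
bad_years = set(obs.loc[obs["qc_failed"], "year"])

# ...and exclude ALL data from those years from statistical analyses,
# rather than "correcting" the suspect values with a model.
clean = obs[~obs["year"].isin(bad_years)]

print(f"Kept {len(clean)} of {len(obs)} records after dropping {len(bad_years)} bad years")
```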
But this is about "deprecation". We deprecated analyses by releasing updated reports approximately annually; we did not do it by changing raw data to what the "corrected" data should be according to the QC model.
Official climate modelers do exactly that, and an in-depth case study of an individual station shows what can happen.
Case Study
Overview
One thing one hears is that adjustments are needed to correct for non-climatic issues (see above). These should be explained in the metadata and used to specify where the issues occur. This does not seem to happen in any explicit or standardized way. I regularly ask Twitter's #ClimateHysterians to explain the specific corrections needed at the Global Historical Climatology Network (GHCN) weather station at the Payette National Forest ranger station in New Meadows, Idaho, that I have discussed previously (USC00106388)†. The answer I always get, of course, is "crickets". I assume this is because they have no idea but simply presume it needs to be adjusted to show an increase over time, despite the fact that no such trend can be seen in the raw data:
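For anyone who wants to check that raw-data claim independently, here is a minimal sketch, assuming the station's raw monthly Tmax values have already been exported to a local CSV (the file name and column names are mine, not NOAA's):

```python
import numpy as np
import pandas as pd

# Hypothetical export of raw (unadjusted) monthly Tmax for USC00106388.
raw = pd.read_csv("new_meadows_raw_tmax.csv")   # columns: year, month, tmax_c

# Collapse to annual means, keeping only years with all 12 months present
# so missing winters or summers don't bias the average.
annual = (raw.dropna(subset=["tmax_c"])
             .groupby("year")["tmax_c"]
             .agg(["mean", "count"])
             .query("count == 12"))

# Ordinary least-squares slope, reported in degrees C per decade.
slope_per_year = np.polyfit(annual.index, annual["mean"], 1)[0]
print(f"Raw Tmax trend: {slope_per_year * 10:+.2f} °C/decade over {len(annual)} complete years")
```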
There are, of course, studies that show homogenization to be useful and I have little complaint about it with respect to calculating a global temperature index; however, I dispute the notion that adjustments (which I will call "estimates") at individual stations must necessarily be more accurate or useful than actual measurements, which are temporally stable estimators of local conditions.
Specific Issues
Underlying all discussion of estimated data are two implicit assumptions:
(1) the modeled results are more accurate than the raw data measured at individual stations, and
(2) the older data sets are not useful.
The latter is somewhat reasonable for calculating a global index for monitoring global change, but absurd if it also leads to the notion that vast amounts of local data should not be used. In the case of GHCN, Idaho has just two stations, one at New Meadows and one at Lewiston.†† In contrast, the USHCN data set contains data from 29 locations in Idaho; fortunately, while officially considered deprecated, these stations are still maintained and updated by the National Oceanic and Atmospheric Administration (NOAA). The following shows the New Meadows USHCN data through 2022:
The homogenization process has turned a more-or-less stable lack of trend over a little more than 100 years of observation into a trend suggesting increasing temperature. Careful inspection will also show estimates that have no corresponding raw data in the data set (why they are missing is unknown to me); these can only have been fabricated in some manner. This station has been moved around, as suggested in the image above with the numbered balloons showing locations over time; it is not certain that these are the only locations, but they are the most probable recent locations suggested by NOAA's poor metadata and personal observation. It seems likely that the first location was in the old town of Meadows about 2 miles to the east, which was the post office location at the beginning of the 20th century (the balloon labeled "P" shows the current post office, which the NOAA metadata use as a reference point).
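A sketch of the kind of raw-versus-adjusted comparison shown above, assuming both USHCN monthly series for the station have been saved to local CSVs (the file and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical local copies of the USHCN raw and adjusted monthly Tmax series
# for New Meadows (USH00106388), one row per year/month.
raw = pd.read_csv("ushcn_raw_tmax.csv")        # columns: year, month, tmax_c
adj = pd.read_csv("ushcn_adjusted_tmax.csv")   # columns: year, month, tmax_c

merged = raw.merge(adj, on=["year", "month"], how="outer",
                   suffixes=("_raw", "_adj"))

# Months where an adjusted ("estimated") value exists but no raw observation does.
estimated_only = merged[merged["tmax_c_raw"].isna() & merged["tmax_c_adj"].notna()]
print(f"{len(estimated_only)} adjusted values have no raw counterpart")

# Compare annual-mean trends (°C per decade) for the two series.
for col in ("tmax_c_raw", "tmax_c_adj"):
    annual = merged.dropna(subset=[col]).groupby("year")[col].mean()
    slope = np.polyfit(annual.index, annual.values, 1)[0] * 10
    print(f"{col}: {slope:+.2f} °C/decade")
```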
"Official" Data and Deprecation
Underlying the idea of homogenization is the implication that the results are somehow more accurate than the observations. I don't believe there are any tests of that hypothesis for this site, but all "official" analyses have made changes. The interesting thing about deprecation is that the result (i.e., the reported temperature) continually changes when the homogenization algorithms are applied. NASA acknowledges that with the following statement (emphasis added):
Q. Do the raw data ever change, and why do monthly updates impact earlier global mean data?

A. The raw data always stays the same, except for occasional reported corrections or replacements of preliminary data from one source by reports obtained later from a more trusted source.

These occasional corrections are one reason why monthly updates not only add e.g. global mean estimates for the new month, but may slightly change estimates for earlier months. Another reason for such changes are late reports for earlier months; finally, as more data become available, they impact the results of NOAA/NCEI's homogenization scheme and of NASA/GISS's combination scheme due to the presence of data gaps (see also the answer to the previous question).

This is more obliquely stated in the USHCN documentation (emphasis added [GHCN provides a similar statement]):

At this time, the National Climatic Data Center does not maintain an online ftp archive of daily processed USHCN versions. The latest version always overwrites the previous version and thus represents the latest data, quality control, etc.

I have been looking at the raw and adjusted data for a while, and I had a few older downloads of USHCN maximum monthly temperatures (Tmax) that I decided to look at and found this:
Clearly, the data change (even the data lacking a raw observed counterpart), and not necessarily by small amounts (at least, I don't consider one full °C "small").
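Checking this for oneself is straightforward if older downloads were kept; a minimal sketch, with hypothetical file names for two vintages of the adjusted Tmax series:

```python
import pandas as pd

# Two downloads of the *adjusted* USHCN monthly Tmax for the same station,
# saved at different times (file names are mine, chosen for illustration).
old = pd.read_csv("ushcn_adj_tmax_older_download.csv")   # columns: year, month, tmax_c
new = pd.read_csv("ushcn_adj_tmax_latest_download.csv")

diff = old.merge(new, on=["year", "month"], suffixes=("_old", "_new"))
diff["change_c"] = diff["tmax_c_new"] - diff["tmax_c_old"]

# How many "historical" values changed between downloads, and by how much?
changed = diff[diff["change_c"].abs() > 0.05]   # ignore rounding noise
print(f"{len(changed)} monthly values changed; largest change "
      f"{diff['change_c'].abs().max():.2f} °C")
```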
It gets even more interesting when we look at the GISS visualization and then use the data ourselves. This is a screen grab of the result of plotting from their interactive tool for the New Meadows station:
Note that it says "Based on GHCN data from NOAA-NCEI", leading one to presume that a similar result would accrue by independently plotting the GHCN data. This I have done:
The two data sets do not match up very well until after the station move in 2000. But what's really interesting to me is that the GISS data don't even match the visualization obtained from their website and shown above. It is also apparent from the background image of the above chart, which shows the MMTS station at the New Meadows Ranger Station, that it is poorly sited. Specifically, although I have not measured distances, it is clearly close enough to be influenced by heat exchangers, buildings, and asphalt.
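A sketch of how one might overlay the two downloads to see where they diverge, assuming the annual series from each source have been exported locally (file and column names are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical local exports of the station's annual-mean series as
# downloaded from the GISS station page and from the GHCN archive.
giss = pd.read_csv("giss_new_meadows_annual.csv")   # columns: year, temp_c
ghcn = pd.read_csv("ghcn_new_meadows_annual.csv")   # columns: year, temp_c

both = giss.merge(ghcn, on="year", suffixes=("_giss", "_ghcn"))
both["abs_diff_c"] = (both["temp_c_giss"] - both["temp_c_ghcn"]).abs()

# Plot the two series together and print the years where they disagree most.
plt.plot(both["year"], both["temp_c_giss"], label="GISS download")
plt.plot(both["year"], both["temp_c_ghcn"], label="GHCN download")
plt.ylabel("annual mean temperature (°C)")
plt.legend()
plt.savefig("giss_vs_ghcn_new_meadows.png")

print(both.nlargest(5, "abs_diff_c")[["year", "temp_c_giss", "temp_c_ghcn"]])
```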
Conclusions
It seems that the "official" climate record is not what proponents claim. While adjustments may be useful for estimating trends when station conditions have changed (as they did in this case), particularly when trying to calculate a global index, it can hardly be definitively asserted that the adjusted data points are more accurate than the measured ones. One thing about measured data is that they are stable; they do not change from year to year. We can see that this is not true of the adjusted estimates. One naturally wonders how any value can be considered accurate when it changes regularly; as the database changes, the data change with it, and older estimates are eliminated by "deprecation".
And why does GISS plot station data that doesn't reflect the data it claims to rely on? They almost certainly use a slightly different homogenization method than NOAA, but why do they show a plot that differs from what one gets by downloading and plotting the data independently? This is an absurdity that obviously makes one question the credibility and integrity of the system. A closer look at the plot of the downloaded GISS data above suggests a long-term declining trend at this station, which agrees roughly with the raw USHCN trend. However, the current GHCN homogenized trend is reversed:
What does one do for estimating temperature trends at other Idaho stations? If the data for this station are unreliable, as I think they may be given the above, can they reasonably be extrapolated to other stations? There is another station at Cambridge, Idaho, also surrounded by the Payette National Forest, that is not looked at in detail here; should we assume we can tell what trends it has from this unreliable data from a station about 35 miles to the northeast? It seems unlikely to me.
As a U.S. Forest Service employee I was responsible for collaborating on a variety of assessments, and some that I reviewed contained boilerplate language about increasing air temperatures on the Payette National Forest. That is certainly true if the GHCN adjustments at this station surrounded by the Forest are correct, but that is obviously uncertain if one takes a deeper look. There are also questions when one looks at water temperatures in streams on the Forest, but that is a subject for a different analysis.
__________
* https://community.wmo.int/climate-data-homogenization
** Even now, I think one must invoke the Freedom of Information Act (FOIA) to get data and other non-public information from the Payette National Forest.
*** This applied mainly to a measurement called "cobble embeddedness" that required metrological skill; fuller information can be found here. We were, in fact, attempting to develop a technique for identifying specific sites that had incorrect data, so that we could keep some of the data from a "bad" year, and we were considering methods for estimating cobble embeddedness from other data so that it could be used specifically for projects subject to embeddedness guidelines.
† Also identified as U.S. Historical Climatology Network (USHCN USH00106388).
†† This is incorrect. I have not yet determined whether all USHCN stations are in the GHCN system, but there are more than two. Naming changes probably led to my error. The biggest issue is that GHCN is updated more slowly than USHCN and lags behind by a year or two. In addition, daily data for GHCN v4 do not yet exist. - Added: 04-18-2023, RLN.




