Introduction
One of the arguments used by climate scientists to infill or interpolate data for stations with broken or “suspect” records, and for areas where no measured data are available, is that anomalies (i.e., differences of values from some arbitrary baseline period) are correlated over long distances. I expect this is largely true, though I question its use for describing temperatures in an unsampled area or for creating missing measurements. There are many statistical ways of modeling or interpolating data, and these are not the subject of this essay. Here I just want to look at how well temperatures at dispersed sites are, in fact, correlated, whether there are confounding factors of interest, and what their individual trends are (this last will be in a subsequent post).
Methods
I decided to use the United States Climate Reference Network (USCRN) for this exercise. The USCRN was established by the National Oceanic and Atmospheric Administration (NOAA) in 2003 to provide a well-distributed network of high-quality instruments in relatively pristine areas to monitor climate attributes, theoretically eliminating the need to compensate for the limitations of stations already in existence (NOAA 2023), such as station relocations, data gaps, poor instrument siting, etc. These stations should represent the best longer-term climate monitoring (up to 20 years) available in a spatially well-dispersed network.
Monitoring Stations
I selected five USCRN stations in roughly a circle around my area, which has a United States Historical Climatology Network (USHCN) station nearby in New Meadows, Idaho (below), that I have discussed previously. I also selected one a bit farther away (Moose, Wyoming) to include a site in an ecological setting similar to my area, because the closest five sites were more like desert than the forest river valley where I live. (Note: USHCN is technically “deprecated” and replaced for official purposes by a Global Historical Climatology Network [GHCN] analysis. The GHCN data are processed differently from the homogenization employed by USHCN, but NOAA continues to populate the USHCN database and it includes a more complete record than GHCN.)
The analysis is based around the Murphy, Idaho, site because it is the closest to my area. This site is shown in dark red on the map. I have chosen not to display photos of all of the stations on this page because that would not be a good use of page space; I have included these on a separate page here.
The relevant metadata for these stations and my local GHCN station are listed in the table below. I will place most emphasis on Tmax because it is less smoothed than Tavg (annual average temperature) and because Tmax is a more direct indicator of warming, if it is happening, than Tmin.
Analysis
The obvious working hypothesis is that the correlation coefficients (R) should decrease with distance from the Murphy station. I used the latest USCRN data; the stations have differing record lengths, so for the correlation analyses the data were restricted to the 15 years in which every station had complete, paired records. I looked at actual temperatures rather than anomalies because I wanted data as unprocessed as possible, so I used Average Annual Mean Maximum Temperature (Tmax), Average Annual Mean Minimum Temperature (Tmin), and Average Annual True Mean Temperature (Tavg)*. I wrote program code in RStudio and used the "cor" routine to create a correlation matrix; ggplot2 was used to chart the trend of correlation with distance, and the ggpairs function from the GGally extension to ggplot2 was used to plot the correlation matrices.
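As a rough illustration of the calculation described above, here is a minimal Python sketch (not the R code actually used) that restricts several station records to the years they all share and then builds a Pearson correlation matrix, which is what R's "cor" computes by default. The station names, years, and temperatures are invented placeholders, not USCRN data, and the helper names are my own assumptions.

```python
from math import sqrt

# Placeholder annual mean Tmax values (deg C) by station and year.
# Station names and numbers are invented for illustration; they are not USCRN data.
records = {
    "Station A": {2008: 17.1, 2009: 16.4, 2010: 16.0, 2011: 15.8, 2012: 17.6},
    "Station B": {2009: 12.9, 2010: 12.2, 2011: 12.0, 2012: 13.5, 2013: 13.0},
    "Station C": {2008: 14.0, 2009: 13.1, 2010: 13.4, 2011: 12.5, 2012: 14.2},
}


def common_years(recs):
    """Return the sorted years for which every station has a value."""
    year_sets = [set(series) for series in recs.values()]
    return sorted(set.intersection(*year_sets))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(recs):
    """Correlate every pair of stations over their shared, complete years only."""
    years = common_years(recs)
    cols = {name: [series[y] for y in years] for name, series in recs.items()}
    return {a: {b: pearson(cols[a], cols[b]) for b in cols} for a in cols}


matrix = correlation_matrix(records)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

Restricting to shared years first keeps every coefficient based on the same paired observations, so the entries of the matrix are comparable with one another.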
I will briefly discuss the results, but my intent is mainly to display them without attempting any in-depth analysis. I want to leave most interpretation to the reader because everyone may come to different conclusions as to how they think correlation over distance affects the ability to interpolate temperatures for areas with no or missing data. I intend to take a specific look at trends at these stations individually and in reference to the New Meadows Ranger Station GHCN station in a later post.
Results
The correlations of all temperature metrics among these USCRN stations were somewhat surprising. The Tmax correlation matrix is shown graphically here:
The Arco, Idaho, station is the closest to Murphy and is the most strongly correlated, as expected, but the farthest site from Murphy (Moose, Wyoming) was almost as strongly correlated; not only are they relatively far apart, they occupy very different ecosystems (see photos here). On the basis of distance alone, one would expect the Dillon, Montana, site to be approximately as correlated with Murphy and Spokane, Washington, to be similar in correlation coefficient to Moose, Wyoming. Interestingly, the two sites farthest from each other (John Day, Oregon, and Moose, Wyoming, at 488 miles) are not the least correlated; the least correlated are John Day, Oregon, and Dillon, Montana, 330 miles apart.
With Tavg (the true mean temperature), we see a pattern generally similar to Tmax and quite different from that for Tmin. I think this can be explained by the fact that the variability in maximum temperatures is much higher than for Tmin. This is evident by looking at the standard deviations (SD) for these three temperature metrics (second below): SD is typically quite a bit larger for maximum temperatures than for minimums.
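For readers who want to reproduce the SD comparison for a station of their own, a minimal Python sketch follows. The values are invented placeholders, not the data behind the figures here; the sample standard deviation from the standard library's statistics.stdev uses the same n - 1 denominator as R's "sd".

```python
from statistics import stdev

# Placeholder annual values (deg C) for one hypothetical station.
# These numbers are invented for illustration; they are not USCRN data.
series = {
    "Tmax": [16.4, 16.0, 15.8, 17.6, 16.9],
    "Tmin": [1.2, 1.0, 1.1, 1.5, 1.3],
    "Tavg": [8.6, 8.3, 8.2, 9.4, 8.9],
}

# Sample standard deviation (n - 1 denominator) for each temperature metric.
sd = {metric: stdev(values) for metric, values in series.items()}

for metric, value in sd.items():
    print(f"{metric}: SD = {value:.2f}")
```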
Since one of the issues here was distance, and the correlation matrices showed no consistent pattern with distance, it seemed useful to graph the relationship (below). The correlation of each station's maximum temperatures with those at Murphy, Idaho, has a non-significant relationship with distance from that station.
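The kind of test behind a "non-significant" slope can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R model used for the chart: the distances and correlation coefficients are invented placeholders, the function name is my own, and significance is judged by comparing the t statistic for the slope against tabulated two-tailed 5% critical values of Student's t.

```python
from math import sqrt

# Two-tailed 5% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def slope_test(x, y):
    """Fit y = intercept + slope * x by least squares and t-test the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    df = n - 2
    # For simple linear regression, the t statistic of the slope
    # can be written in terms of the correlation coefficient r.
    t = r * sqrt(df / (1 - r * r))
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "t": t,
        "df": df,
        "significant": abs(t) > T_CRIT_05[df],
    }


# Placeholder values, invented for illustration (not the results reported here):
# distance of each station from the reference station, and its correlation with it.
distance_miles = [100, 200, 300, 400, 500]
correlation = [0.90, 0.70, 0.85, 0.60, 0.80]

result = slope_test(distance_miles, correlation)
print(result)
```

With only five comparison stations there are just three degrees of freedom, so a slope has to be very steep relative to the scatter before it clears the critical value.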
Summary
Distance between stations did not seem to be a reliable indicator of similarity for any temperature metric. This was shown by inconsistency in the correlation coefficients of measured temperatures (the true average is, of course, calculated rather than measured directly) and by a downsloping but non-significant linear model of correlation with distance. The working hypothesis is therefore not supported by these data, and some other factor is probably involved. I suspect that factor is ecological setting, as these sites vary quite a bit in elevation and relationship to orographic features, both of which affect prevailing climate. Some "official" temperature data (I think GHCN, for example) contain some ecological metadata, but I haven't seen any analysis of these (though I haven't really looked yet, either).
_______________
* This is different from the usual (Tmax+Tmin)/2 method of calculating average temperature (the simple average): USCRN reports the simple average, but it also reports a true mean based on all observations.
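The difference between the two averages can be seen in a minimal Python sketch. The observations are invented placeholders for a single day, and the "true mean" here is simply the arithmetic mean of every observation, which is my assumption about what "based on all observations" means rather than a description of USCRN's actual processing.

```python
# Placeholder temperature observations (deg C) taken through one day.
# Invented for illustration; not USCRN data.
observations = [2.0, 1.0, 0.5, 3.0, 8.0, 12.0, 13.0, 9.0]

# Simple average: midpoint of the daily maximum and minimum.
simple_average = (max(observations) + min(observations)) / 2

# True mean (assumed here to be the arithmetic mean of all observations).
true_mean = sum(observations) / len(observations)

print(f"simple average = {simple_average:.2f}, true mean = {true_mean:.2f}")
```

The two differ whenever the daily temperature curve is not symmetric about its midpoint, for example when the cool part of the day lasts longer than the warm part.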






