GYGA Protocol for Weather Data
Step-by-step procedure:
- Minimum data requirements
- For direct use in crop models: >10 years (preferably after 2000) with <20 consecutive missing days for Tmax & Tmin (<10 consecutive missing days for precipitation) and at least 80% of the data for each year are available.
- If the data requirements in (1a) are not met, weather data will be propagated from at least three years (after 2000) of measured daily temperature (Tmin/Tmax), while crude TRMM (or NASA) data will be used for precipitation, in all cases with the aim of having at least 10 years of daily weather data available for simulations. All available years of observed data should be used for calibrating NASA Tmax and Tmin (and Tdew if available). 15-d and 30-d TRMM precipitation totals agree better with observed precipitation than NASA totals do (van Wart et al., 2015).
- If three years of measured data are not available for propagation of temperature, we will use crude NASA data on both temperature and precipitation (at least 10 years). We note that 'crude' NASA performs better than other sources of gridded weather data such as CRU, NCEP, and aWhere (van Wart et al., 2013a).
- Whenever available, we will use measured solar radiation. If not available, crude NASA data on solar radiation will be used. Solar radiation can also be estimated from sunshine hours or temperature but these estimates have to be locally (and satisfactorily) validated against subsets of observed data.
- Calculating reference evapotranspiration (ETo) with some of the classic approaches requires data on dew point temperature (Tdew), humidity, and wind speed. Whenever available, we will use measured data for these variables. If not available, we will use Tdew from crude NASA data, estimated from Tmin, or calibrated from any short-term data on Tdew that may be available. For wind speed, a constant value of 2 m/s is assumed unless observed wind speed data are available for some years; in those cases, the average wind speed from those years will be used to fill in missing data.
In all cases, recent data are desirable (preferably after 2000) to limit possible climate change effects and to be consistent with NASA/TRMM data. We also note that more years of weather data (>10 years) may be desirable for countries with high year-to-year variation in weather and yields (see Grassini et al., 2015).
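The direct-use criteria in (1a) can be expressed as a small screening routine. The following is a minimal sketch, not part of the protocol itself: the function names and the input layout (one list per variable per year, with `None` marking a missing day) are assumptions, the gap test is applied within each year, and ">10 years" is read as "at least `min_years` usable years", with the default of 10 being adjustable.

```python
def longest_gap(values):
    """Length of the longest run of consecutive missing (None) values."""
    longest = run = 0
    for value in values:
        run = run + 1 if value is None else 0
        longest = max(longest, run)
    return longest


def year_is_usable(tmax, tmin, precip):
    """Apply the (1a) limits to one year of daily series.

    Temperature: fewer than 20 consecutive missing days.
    Precipitation: fewer than 10 consecutive missing days.
    All variables: at least 80% of the days present.
    """
    for series, max_gap in ((tmax, 19), (tmin, 19), (precip, 9)):
        available = sum(v is not None for v in series) / len(series)
        if longest_gap(series) > max_gap or available < 0.80:
            return False
    return True


def meets_direct_use_requirements(years, min_years=10):
    """years: dict mapping year -> (tmax, tmin, precip) daily lists."""
    usable = [y for y, (tx, tn, pr) in years.items()
              if year_is_usable(tx, tn, pr)]
    return len(usable) >= min_years
```

A station that fails this check would then be routed to propagation or to crude NASA data, depending on how many years of measured temperature it has.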
- Station location (coordinates) will be double-checked.
- Data will be subjected to quality control (QC):
- Check data against threshold values for each variable (see Appendix I)
- Check date (dd-mm-yyyy) sequence
- Check cases of Tmax<Tmin and repeated values
- Check number of days for leap years
- Check annual data: cumulative annual rainfall, cumulative annual radiation, and average min and max temperatures
- Check non-numerical characters
- Check that data meet the requirements indicated in (1a, b) above.
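Several of the QC checks listed above are mechanical and can be sketched in a few lines. The example below is illustrative only: the record layout (a list of dicts keyed by `date`, `tmax`, `tmin`, `precip`, `srad`), the function names, and the `max_run` limit on repeated temperatures are assumptions, since the protocol does not say how many repeats are acceptable. The limits are the Appendix I thresholds; the annual-total checks are not covered.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

# (lower, upper) limits taken from Appendix I; keys are illustrative names.
LIMITS = {"srad": (1, 50), "tmax": (-40, 50), "tmin": (-40, 50),
          "precip": (0, 300)}


def qc_flags(records, max_run=5):
    """Return (date, message) pairs for suspicious daily records.

    max_run is a hypothetical limit on identical consecutive temperatures.
    """
    flags = []
    prev_date = None
    run = {"tmax": (None, 0), "tmin": (None, 0)}
    for rec in records:
        day = rec["date"]
        # Date sequence: each record must follow the previous one by one day.
        if prev_date is not None and day - prev_date != timedelta(days=1):
            flags.append((day, "break in date sequence"))
        prev_date = day
        numeric = {}
        for var, (low, high) in LIMITS.items():
            value = rec.get(var)
            if value is None:
                continue  # missing values are handled by gap filling
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                flags.append((day, f"{var} is not numerical"))
            else:
                numeric[var] = value
                if not low <= value <= high:
                    flags.append((day, f"{var} outside [{low}, {high}]"))
        if ("tmax" in numeric and "tmin" in numeric
                and numeric["tmax"] < numeric["tmin"]):
            flags.append((day, "tmax < tmin"))
        # Repeated values: count identical consecutive temperatures.
        for var in run:
            last, count = run[var]
            value = numeric.get(var)
            count = count + 1 if value is not None and value == last else 1
            run[var] = (value, count)
            if count == max_run + 1:
                flags.append((day, f"{var} repeated more than {max_run} days"))
    return flags


def years_with_wrong_length(records):
    """Years whose record count differs from 365 (366 in leap years)."""
    counts = Counter(rec["date"].year for rec in records)
    return [year for year, n in sorted(counts.items())
            if n != (366 if calendar.isleap(year) else 365)]
```

Flagged days are candidates for inspection rather than automatic deletion; what happens to them is decided by the gap-filling rules in the next step.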
- Following the principle of including as much observed weather data as possible, we will generate complete 10+ (preferably 15-20) years of daily weather for crop simulations using the following protocols:
- IF data requirements in (1a) are met, THEN use linear interpolation to fill in missing Tmax & Tmin data and NASA to fill in missing precipitation data. See Appendix II for an alternative methodology for countries with dense meteorological networks.
- IF conditions in (1a) are not met
- AND the correlation between NASA and observed data has R² ≥ 0.35, THEN use propagation to fill in missing Tmax & Tmin and TRMM (or NASA) to fill in missing precipitation data. Generate 10 (preferably 15-20) years of weather data (van Wart et al., 2015).
- AND the correlation between NASA and observed data has R² < 0.35, THEN use crude NASA data on temperature (1c above) and TRMM (or NASA) to fill in missing precipitation data.
- QC (as per #3) will be applied again to the weather data after step (4).
- Data will be generated in the GYGA format; model users should convert the data into specific model formats themselves.
We note that the R² criterion alone is sometimes not sufficient to assess the strength of the relationship between observed and NASA data. For example, if R² is relatively high (>0.35) and the slope is near zero, the resulting propagated weather data will exhibit much lower variability than the measured weather data used as the basis for the propagation. This situation is likely to occur in regions with complex topography or when measured weather data are of poor quality. These cases have been identified by visual inspection of the data after propagation, and removed from the public weather data posted on this website when only propagated data can be made publicly available (see Appendix III).
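The decision rule in step (4), together with the slope caveat above, can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: observed temperature is regressed on NASA temperature for the overlapping days, the function names and return labels are hypothetical, and `slope_min` is an invented numeric stand-in for the visual inspection described in the text.

```python
def linear_fit(x, y):
    """Ordinary least squares of y on x: (slope, intercept, r_squared).

    Assumes x and y each contain at least two distinct values.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def choose_fill_method(meets_1a, nasa, observed, r2_min=0.35, slope_min=0.5):
    """Pick a gap-filling route for Tmax or Tmin.

    nasa, observed: paired daily values for the overlapping period.
    slope_min is a hypothetical screening value, not from the protocol.
    """
    if meets_1a:
        return "interpolate"        # linear interpolation of observed data
    slope, _, r2 = linear_fit(nasa, observed)
    if r2 < r2_min:
        return "nasa"               # crude NASA temperature
    if slope < slope_min:
        return "propagate-inspect"  # high R2 but flat slope: check visually
    return "propagate"
```

For example, `choose_fill_method(False, [10, 12, 14, 16, 18], [11, 13, 15, 17, 19])` returns `"propagate"`, whereas an observed series that barely responds to NASA values is routed to `"propagate-inspect"` even when R² is high.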
APPENDIX I - THRESHOLDS FOR QC OF WEATHER DATA
| | Solar radiation (a) | Tmax | Tmin | Precipitation | Mean RH | Mean Tdew | Mean air actual VP | Mean wind speed |
|---|---|---|---|---|---|---|---|---|
| Units | MJ m-2 d-1 | °C | °C | mm/d | % | °C | kPa | m s-1 |
| Upper threshold | 50 | 50 | 50 | 300 | 100 | 50 | 10 | 50 |
| Lower threshold | 1 | -40 | -40 | 0 | 1 | -40 | 0 | 0 |

(a) Thresholds are based on team discussions, supported by Hubbard et al. (2005, 2007). Incident solar radiation is not allowed to exceed the extraterrestrial solar radiation estimated for a particular location and day. Tmax/Tmin: maximum and minimum temperature; RH: relative humidity; Tdew: dew-point temperature; VP: vapour pressure.
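The extraterrestrial-radiation ceiling mentioned in the footnote depends on latitude and day of year. The protocol does not state which formula is used; the sketch below assumes the standard FAO-56 expression for daily extraterrestrial radiation, and the function names are illustrative.

```python
import math


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 d-1), FAO-56 formulation."""
    phi = math.radians(latitude_deg)
    # Inverse relative Earth-Sun distance and solar declination.
    dr = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    delta = 0.409 * math.sin(2 * math.pi * day_of_year / 365 - 1.39)
    # Sunset hour angle; clamped so polar day/night stays inside acos's domain.
    ws = math.acos(max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta))))
    solar_constant = 0.0820  # MJ m-2 min-1
    return (24 * 60 / math.pi) * solar_constant * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws))


def solar_radiation_plausible(srad, latitude_deg, day_of_year,
                              lower=1.0, upper=50.0):
    """Apply the Appendix I limits plus the extraterrestrial ceiling."""
    ceiling = min(upper, extraterrestrial_radiation(latitude_deg, day_of_year))
    return lower <= srad <= ceiling
```

At 20°S on day 246 (3 September) the function gives roughly 32 MJ m-2 d-1, so a reading of 40 MJ m-2 d-1 would be rejected even though it is below the fixed upper threshold of 50.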
APPENDIX II - COUNTRIES WITH DENSE NETWORKS OF METEOROLOGICAL STATIONS WITH LONG-TERM (10+ YEARS) DAILY WEATHER DATA
For countries where the weather station network is dense (defined roughly as 3 weather stations within 150 km of each other) and has long-term (12+ years) daily weather records, as is the case in the USA, Argentina, and Germany, an alternative, perhaps superior, method to (1b) above is to correct and fill data using the correlations between the target weather station and the two adjacent ones. Briefly, weather data for each selected reference weather station (RWS) are subjected to quality control measures to fill in missing data and to identify and correct erroneous values that occur due to technical problems common in weather data acquisition. A spatial regression test (SRT) (Hubbard et al., 2005) was used to check and correct weather data at a given RWS against data from nearby stations, based on the strength of correlation between nearby and reference station data. Developed for use in the Midwest USA, this QC method was found to outperform other QC approaches in a wide variety of climate zones (Hubbard et al., 2007; You et al., 2008). At least the two nearest stations were used with the SRT to identify and correct missing and suspicious values for Tmin, Tmax, dew point temperature, wind speed, and precipitation. Typically about 0.5% of observations were corrected, roughly 2 days per year. Following Hubbard et al. (2005), a daily value was flagged as suspicious if it was more than 3 standard deviations (5 for precipitation) from the SRT value, which is a regression-estimated value based on the 15 days before and after the daily value in question. In rare cases where a single daily record was missing from both the RWS and nearby stations, the average of the preceding and succeeding days was substituted for the missing value. This QC method has been automated and applied successfully to correct weather data in China, the USA, Germany, and Argentina (van Wart et al., 2013b; Aramburu Merlos et al., 2015).
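The flagging step described above can be illustrated with a simplified spatial-regression check. This sketch is not the published SRT: it assumes complete series with no missing values, fits the target on each neighbour over the 15 days either side (leaving out the day being tested, which is a choice made here), and combines the neighbour estimates by inverse-variance weighting. All names and the combination rule are assumptions.

```python
import math


def _fit(x, y):
    """Least-squares line y = a + b*x and its root-mean-square error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)  # assumes x is not constant
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, math.sqrt(sse / n)


def srt_check(target, neighbours, day, half_window=15, f=3.0):
    """Return (estimate, is_suspicious) for target[day].

    target: daily values at the reference station.
    neighbours: list of daily series from nearby stations (same length).
    f: number of standard deviations allowed (3; 5 for precipitation).
    """
    start = max(0, day - half_window)
    stop = min(len(target), day + half_window + 1)
    window = [i for i in range(start, stop) if i != day]
    weighted_sum = weight_total = 0.0
    for series in neighbours:
        slope, intercept, rmse = _fit([series[i] for i in window],
                                      [target[i] for i in window])
        weight = 1.0 / max(rmse, 1e-6) ** 2  # guard against a perfect fit
        weighted_sum += weight * (intercept + slope * series[day])
        weight_total += weight
    estimate = weighted_sum / weight_total
    spread = math.sqrt(len(neighbours) / weight_total)
    return estimate, abs(target[day] - estimate) > f * spread
```

When a value is flagged, the returned estimate is the natural candidate to substitute for it, which mirrors how the text describes suspicious values being corrected.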
APPENDIX III - WHEN PUBLIC POSTING OF WEATHER DATA IS PROHIBITED
When observed weather data cannot be made publicly available, we will (i) simulate yields based on observed weather data, (ii) use the propagation technique to create weather data based on the correlations between NASA and observed weather data, and (iii) make the propagated weather data publicly available through the website, making explicit the difference between the weather data used for simulations and the data posted.
References
Aramburu Merlos, F., Monzon, J.P., Mercau, J.L., Taboada, M., Andrade, F.H., Hall, A.J., Jobbagy, E., Cassman, K.G., Grassini, P., 2015. Potential for crop production increase in Argentina through closure of existing yield gaps. Field Crops Research. Vol. 184, 145-154
Grassini, P., van Bussel, L.G.J., van Wart, J., Wolf, J., Claessens, L., Yang, H., Boogaard, H., de Groot, H., van Ittersum, M.K., Cassman, K.G., 2015. How good is good enough? Data requirements for reliable crop yield simulations and yield-gap analysis. Field Crops Research. Vol. 177, 49-63
Hubbard, K.G., Guttman, B., You, J., Chen, Z., 2007. An improved QC process for temperature in the daily cooperative weather observations. J. Atmos. Ocean. Technol. 24, 206–213
Hubbard, K.G., Goddard, S., Sorensen, W.D., Wells, N., Osugi, T.T., 2005. Performance of quality assurance procedures for an applied climate information system. J. Atmos. Ocean. Technol. 22, 105–112
Van Wart, J., Grassini, P., Yang, H.S., Claessens, L., Jarvis, A., Cassman, K.G. 2015. Creating long-term weather data from the thin air for crop simulation modelling. Agricultural and Forest Meteorology. 208, 49-58
Van Wart, J., van Bussel, L.G.J., Wolf, J., Licker, R., Grassini, P., Nelson, A., Boogaard, H., Gerber, J., Mueller, N.D., Claessens, L., van Ittersum, M.K., Cassman, K.G. 2013a. Use of agro-climatic zones to upscale simulated crop yield potential. Field Crops Research. 143, 44-55
Van Wart, J., Kersebaum, C.K., Peng, S., Milner, M., Cassman, K.G. 2013b. Estimating crop yield potential at regional to national scales. Field Crops Research. 143, 34-43
You, J., Hubbard, K.G., Goddard, S., 2008. Comparison of methods for spatially estimating station temperatures in a quality control system. Int. J. Climatol. 28, 777–787.