GYGA Protocol for Weather Data

Step-by-step procedure:

  1. Minimum data requirements
    1. For direct use in crop models: >10 years (preferably after 1983) with <20 consecutive missing days for Tmax & Tmin (<10 consecutive missing days for precipitation) and at least 80% of the data for each year are available[1]
    2. If data requirements in (1a) are not met, weather data will be propagated based on at least 3 years (after 1983) of daily temperature (Tmin/Tmax) and precipitation with <20% of missing data in each year and <45 consecutive missing days for any variable[2]
    3. If data requirements in (1b) are not met, we will use IRRI-corrected NASA gridded radiation, Tmax, Tmin, Tdew, and precipitation[3].
    4. Whenever available, we will use measured incident solar radiation, wind speed (at 2-m height), and any measure of humidity such as relative humidity or dew-point temperature (Tdew). If these are not available, radiation will be retrieved from NASA-POWER, humidity estimated from NASA Tdew, and wind speed assumed at 2 m/s for calculation of ETo[4].
  2. Station location (coordinates) double checked.
  3. Data will be subjected to quality control (QC):
    1. Check data against threshold values for each variable (see Appendix I)
    2. Check date (dd-mm-yyyy) sequence
    3. Check cases of Tmax<Tmin and repeated values
    4. Check non-numerical characters
    5. Check that data meet the requirements indicated in (1a, b) above.
  4. Following the principle of including as much observed weather data as possible, we will generate complete 10+ (preferably 15-20) years of daily weather for crop simulations using the following protocols:
    1. IF data requirements in (1a) are met, THEN, use linear interpolation to fill out missing Tmax & Tmin data and NASA to fill out missing precipitation data. See appendix II for alternative methodology for countries with dense meteorological networks.
    2. IF conditions in (1a) are not met
      1. AND the correlation between NASA and observed data has R²≥0.35, THEN use propagation to fill in missing Tmax & Tmin and TRMM (or NASA if TRMM is not available) to fill in missing precipitation data[5]. Generate 10 (preferably 15-20) years of weather data.
      2. AND the correlation between NASA and observed data has R²<0.35[6], THEN use IRRI-corrected NASA data (1c above).
  5. QC (as per #3) will be applied again on the weather data after step (4)
  6. Data will be generated in the GYGA format; model users should convert the data into specific model formats themselves.
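
The data-requirement tiers in step 1 can be sketched in code. The following is a minimal illustration, not the GYGA implementation: the record layout (`{year: {"tmax": [...], "tmin": [...], "prec": [...]}}` with `None` for a missing day), the function names, and the choice to count qualifying years individually are all assumptions made for this example. Because (1a) only states a preference for data after 1983, the sketch enforces that cut-off for (1b) only.

```python
from typing import Dict, List, Optional

Series = List[Optional[float]]   # one value per day, None = missing
YearData = Dict[str, Series]     # hypothetical keys: "tmax", "tmin", "prec"


def longest_gap(values: Series) -> int:
    """Length of the longest run of consecutive missing days."""
    longest = run = 0
    for v in values:
        run = run + 1 if v is None else 0
        longest = max(longest, run)
    return longest


def missing_fraction(values: Series) -> float:
    """Share of missing days in one year (an empty year counts as fully missing)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def year_ok_1a(data: YearData) -> bool:
    """(1a): <20 consecutive missing days for Tmax/Tmin, <10 for precipitation,
    and at least 80% of the data available."""
    limits = {"tmax": 20, "tmin": 20, "prec": 10}
    return all(
        longest_gap(data[k]) < limit and missing_fraction(data[k]) <= 0.20
        for k, limit in limits.items()
    )


def year_ok_1b(data: YearData) -> bool:
    """(1b): <20% missing data and <45 consecutive missing days for any variable."""
    return all(
        longest_gap(s) < 45 and missing_fraction(s) < 0.20 for s in data.values()
    )


def classify_station(record: Dict[int, YearData]) -> str:
    """Return '1a', '1b' or '1c' for a station record keyed by year."""
    n_1a = sum(year_ok_1a(d) for d in record.values())
    n_1b = sum(year_ok_1b(d) for y, d in record.items() if y > 1983)
    if n_1a > 10:
        return "1a"   # direct use in crop models
    if n_1b >= 3:
        return "1b"   # propagate from short-term observations
    return "1c"       # fall back to IRRI-corrected NASA data
```

For example, a station with 11 complete years would be classified as "1a", while one with only three post-1983 years containing a 30-day temperature gap would fall into "1b".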




Appendix I: Threshold values used in the QC checks

Variable              Units
Solar radiation a     MJ m-2 d-1
Mean RH
Mean Tdew
Mean air actual VP
Mean wind speed       m s-1
a Incident solar radiation is not allowed to exceed the extraterrestrial solar radiation estimated for the given location and day. Tmax/Tmin: maximum and minimum temperature; RH: relative humidity; Tdew: dew-point temperature; VP: vapour pressure.
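
The upper bound on solar radiation described in note a can be illustrated with a short sketch. The protocol does not state which formula is used for extraterrestrial radiation; the example below assumes the standard daily equation from FAO Irrigation and Drainage Paper 56, and the function names are hypothetical.

```python
import math

GSC = 0.0820  # solar constant in MJ m-2 min-1 (FAO-56 value, assumed here)


def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra in MJ m-2 d-1 (FAO-56 daily equation)."""
    phi = math.radians(lat_deg)
    # Inverse relative Earth-Sun distance and solar declination
    dr = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    delta = 0.409 * math.sin(2 * math.pi * day_of_year / 365 - 1.39)
    # Sunset hour angle; the clamp keeps acos defined during polar day or night
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def radiation_exceeds_limit(srad: float, lat_deg: float, day_of_year: int) -> bool:
    """True if a measured daily radiation value is above the extraterrestrial limit."""
    return srad > extraterrestrial_radiation(lat_deg, day_of_year)
```

At 20° S on day 246 of the year the limit is about 32 MJ m-2 d-1, so a reported value of 40 MJ m-2 d-1 would be flagged while 20 MJ m-2 d-1 would pass.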



Appendix II: Alternative methodology for countries with dense meteorological networks

For countries where weather station networks are dense (defined roughly as 3 weather stations within 150 km of each other) and have long-term (12+ years) daily weather records, as is the case in the USA, Argentina, and Germany, an alternative, and perhaps superior, method to (1b) above is to correct and fill data using the correlations between the target weather station and the two adjacent ones. Briefly, weather data for each selected reference weather station (RWS) are subjected to quality control to fill in missing data and to identify and correct erroneous values caused by technical problems common in weather data acquisition. A spatial regression test (SRT) (Hubbard et al., 2005) is used to check and correct weather data at a given RWS against data from nearby stations, based on the strength of the correlation between nearby and reference station data. Developed for use in the Midwest USA, this QC method was found to outperform other QC approaches in a wide variety of climate zones (Hubbard et al., 2007; You et al., 2008). At least the 2 nearest stations are used with the SRT to identify and correct missing and suspicious values for Tmin, Tmax, dew-point temperature, wind speed, and precipitation. Typically, about 0.5% of observations were corrected, roughly 2 days per year. Following Hubbard et al. (2005), a daily value is flagged as suspicious if it is more than 3 standard deviations (5 for precipitation) from the SRT value, which is a regression-estimated value based on the 15 days before and after the daily value in question. In rare cases where a single daily record was missing from both the RWS and the nearby stations, the average of the preceding and succeeding days was substituted for the missing value. This QC method has been automated and applied successfully to correct weather data in China, USA, Germany, and Argentina (van Wart et al., 2012; Grassini, unpublished).
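
The flagging rule described above can be sketched as follows. This is a simplified illustration of the idea, not the SRT as published by Hubbard et al. (2005): it assumes an ordinary least-squares fit against each neighbour over the 15 days before and after the day in question, and combines the per-neighbour estimates with inverse-variance weights. The function names and the weighting scheme are assumptions made for this example.

```python
import math
from typing import List, Optional, Tuple

Series = List[Optional[float]]  # daily values, None = missing


def _window_fit(target: Series, neighbor: Series, i: int, half_window: int):
    """Fit target ~ neighbor over the window around day i (day i itself excluded).
    Returns (estimate for day i, residual standard error) or None if not possible."""
    lo = max(0, i - half_window)
    hi = min(len(target), i + half_window + 1)
    pairs = [
        (neighbor[j], target[j])
        for j in range(lo, hi)
        if j != i and target[j] is not None and neighbor[j] is not None
    ]
    n = len(pairs)
    if n < 3 or neighbor[i] is None:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    std_err = max(math.sqrt(sse / (n - 2)), 1e-6)  # floor avoids division by zero
    return intercept + slope * neighbor[i], std_err


def srt_check(
    target: Series, neighbors: List[Series], i: int,
    half_window: int = 15, f: float = 3.0,
) -> Tuple[Optional[float], bool]:
    """Return (regression-based estimate, flagged) for day i of the target station.
    Use f=3 for temperature-like variables and f=5 for precipitation."""
    fits = []
    for nb in neighbors:
        fit = _window_fit(target, nb, i, half_window)
        if fit is not None:
            fits.append(fit)
    if not fits:
        return None, False
    weights = [1.0 / s ** 2 for _, s in fits]
    estimate = sum(w * e for w, (e, _) in zip(weights, fits)) / sum(weights)
    pooled_sd = math.sqrt(len(weights) / sum(weights))
    observed = target[i]
    flagged = observed is not None and abs(observed - estimate) > f * pooled_sd
    return estimate, flagged
```

When a value is flagged or missing, the returned estimate is the candidate replacement; whether to accept it remains a judgment for the analyst.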



Appendix III: Observed weather data that cannot be made public

When observed weather data cannot be made publicly available, we will (i) simulate yields based on observed weather data, (ii) use the propagation technique to create weather data based on the correlations between NASA and observed weather data, and (iii) make the propagated weather data publicly available through the website, stating explicitly that the weather data used for the simulations differ from the data posted.



Hubbard, K.G., Guttman, B., You, J., Chen, Z., 2007. An improved QC process for temperature in the daily cooperative weather observations. J. Atmos. Ocean. Technol. 24, 206–213.

Hubbard, K.G., Goddard, S., Sorensen, W.D., Wells, N., Osugi, T.T., 2005. Performance of quality assurance procedures for an applied climate information system. J. Atmos. Ocean. Technol. 22, 105–112.

Van Wart, J., Grassini, P., Yang, H.S., Claessens, L., Jarvis, A., Cassman, K.G., 2015. Creating long-term weather data from the thin air for crop simulation modelling. Agricultural and Forest Meteorology 208, 49–58.

Van Wart, J., Kersebaum, C.K., Peng, S., Milner, M., Cassman, K.G., 2013b. Estimating crop yield potential at regional to national scales. Field Crops Research 143, 34–43.

You, J., Hubbard, K.G., Goddard, S., 2008. Comparison of methods for spatially estimating station temperatures in a quality control system. Int. J. Climatol. 28, 777–787.


[1] Recent data are desirable (preferably after 1983) to avoid possible climate change effects and to be consistent with NASA/TRMM data. Thresholds are based on team discussions, supported by Hubbard et al. (2005, 2007).

[2] Propagated weather data consist of TRMM rain data (or NASA if TRMM is not available) and NASA Tmax, Tmin, and Tdew data corrected based on calibrations with short-term (<10 yrs) observed weather data (van Wart et al., 2015).

[3] 'Crude' NASA performs better than other sources of gridded weather data such as CRU, NCEP, and aWhere (van Wart et al., 2013). Here, we propose to use the IRRI weather database, which consists of NASA data corrected using NOAA ground observations (hereafter called 'IRRI-corrected NASA').

[4] Evaluation of NASA radiation and Tdew against subsets of observed data (and calibration if needed) is highly desirable whenever possible. Solar radiation can also be estimated from sunshine hours or temperature, but these estimates have to be locally (and satisfactorily) validated against subsets of observed data. If observed wind speed data are available for some years, their average will be used to fill in missing data.

[5] All available years of observed data should be used for calibrating NASA Tmax and Tmin (and Tdew if available). 15-d and 30-d TRMM precipitation totals show better agreement than NASA when compared against observed precipitation.

[6] Sometimes the R² criterion may not be sufficient to assess the strength of the relationship between observed and NASA data. For example, if R² is relatively high (>0.35) and the slope is near zero, the resulting propagated weather data will exhibit much lower variability than the measured weather data used as the basis for the propagation. This situation is likely to occur in regions with complex topography or when measured weather data are of poor quality. These cases have been identified by visual inspection of the data after propagation, and removed from the public weather data posted on this website when only propagated data can be made publicly available (see appendix III).
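
The R² rule in step 4 and the slope caveat in note [6] can be sketched together. This is a minimal illustration under stated assumptions: the regression is an ordinary least-squares fit of observed on NASA values, and `slope_min` is a hypothetical cut-off introduced only for this example. The protocol itself relies on visual inspection rather than a numeric slope threshold.

```python
from typing import List, Tuple


def fit_obs_vs_nasa(nasa: List[float], obs: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit obs = intercept + slope * nasa; returns (slope, intercept, R2)."""
    n = len(nasa)
    mx = sum(nasa) / n
    my = sum(obs) / n
    sxx = sum((x - mx) ** 2 for x in nasa)
    syy = sum((y - my) ** 2 for y in obs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(nasa, obs))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def choose_source(
    nasa: List[float], obs: List[float],
    r2_min: float = 0.35, slope_min: float = 0.1,
) -> Tuple[str, bool]:
    """Return (data source, needs_inspection) for paired daily NASA and observed values.
    slope_min is an illustrative value, not a threshold taken from the protocol."""
    slope, _, r2 = fit_obs_vs_nasa(nasa, obs)
    if r2 < r2_min:
        return "irri_corrected_nasa", False
    # R2 passes, but a near-zero slope suggests the case should be inspected
    return "propagate", abs(slope) < slope_min
```

A station pair with a slope close to 1 and a high R² would return `("propagate", False)`, whereas a pair with a high R² but a slope of 0.05 would return `("propagate", True)` and be set aside for inspection.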