Protocols for selection of climate zones, reference weather stations, and upscaling to national levels

Summary

Yield gap estimates are made at several spatial scales: from specific locations within important crop production regions (i.e. points at locations with high harvested crop area density, each with an associated buffer zone), to climate zones (CZs, defined by growing degree days, temperature seasonality, and aridity index), to large administrative units within a country (province/state), and finally to a national average. For relatively large countries, only crops with a total national harvested area of >100,000 ha are evaluated in GYGA; for smaller countries, crops with <100,000 ha are also evaluated. The underpinning principle is to select CZs, and specific locations (points) with associated buffer zones within these CZs, that best represent how a given crop is produced in terms of weather, soils, and cropping system. At each of these spatial scales, cropping system information focuses on the proportion of harvested area, the cropping intensity, and some aspects of management (e.g. sowing date and cultivar maturity). Justification for this approach comes from recent papers by van Ittersum et al. (2013) and Van Wart et al. (2013a). The points are defined as locations with weather data. The buffer zone of a selected point comprises the area within 100 km of the weather station, with a focus on the harvested crop area within that zone. Thus, polygons that define buffer zones are either circular with a 100-km radius, if the entire buffer fits within the CZ in which the station is located, or irregular and "clipped" by CZ boundaries if it does not.

Within these buffer zones, data are collected for the most prominent soil type[1] x cropping system combinations for a given water regime: rainfed, irrigated, or both if there are significant areas under both. For a given buffer zone, yield potential (Yp) and/or water-limited yield potential (Yw) are estimated by simulation with a crop model, using the weather data and information about soil types and cropping systems as inputs. Upscaling proceeds from buffer zones (if there is more than one buffer zone within a CZ) to CZs, and then to sub-national and national levels. This approach requires flexibility in the source of weather data, because selected points with weather data should lie well within the main cropping areas of CZs with large production areas. Where good-quality weather stations with at least 10 years of data are lacking, the second-best option is 20 years of weather data generated from a minimum of 3 years of actual weather data; the third-best option is hybrid weather data (partly observed and partly generated using data from nearby stations that may only have rainfall and/or temperature data); and the last option is derived gridded weather data. Because detailed data on cropping systems and soils are required for each location, one goal of the selection protocol is to minimize the number of points and associated buffer zones needed within a country to obtain a robust estimate of Yp and/or Yw.

A premise of this method is that weather, soil, and cropping system data are equally important for capturing the variation within a climate zone. Data on actual farm yields are also critical for estimating the yield gap (Yg). Selecting CZs and locations with weather data is the starting point of the protocol: it minimizes the number of locations where the other essential data are required, while achieving adequate coverage of crop production area so that the assessment spans a representative range of cropping systems and soils.

The geospatial distribution of crop harvested area is retrieved from the SPAM database (You et al., 2006, 2009). SPAM provides gridded data (5 arc-minute resolution, approximately 10 x 10 km at the equator) on harvested area around the year 2000 for 20 major staple crops, by water regime (rainfed or irrigated); for rainfed agriculture, harvested area is further disaggregated by input level (subsistence, low-input, and high-input). For each grid cell, the harvested area of a rainfed crop is calculated as the sum of the harvested areas reported for the subsistence-, low- and high-input systems, while the harvested area of an irrigated crop is taken as given in the SPAM database. Where national statistics on crop production are available, updated crop harvested area maps can be generated for countries where cropland area has recently expanded (e.g. Argentina and Brazil).
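The per-cell rule for rainfed versus irrigated harvested area amounts to a few lines of Python. This is a minimal sketch: the `SpamCell` record and its field names are hypothetical placeholders, not the column names used in the actual SPAM files.

```python
from dataclasses import dataclass


@dataclass
class SpamCell:
    """Harvested area (ha) of one crop in one 5-arc-minute grid cell.

    Field names are hypothetical; map them to the real SPAM columns.
    """
    rainfed_subsistence_ha: float
    rainfed_low_input_ha: float
    rainfed_high_input_ha: float
    irrigated_ha: float


def rainfed_area(cell: SpamCell) -> float:
    """Rainfed area = sum of the subsistence, low- and high-input systems."""
    return (cell.rainfed_subsistence_ha
            + cell.rainfed_low_input_ha
            + cell.rainfed_high_input_ha)


def irrigated_area(cell: SpamCell) -> float:
    """Irrigated area is taken as reported, with no further aggregation."""
    return cell.irrigated_ha


cell = SpamCell(rainfed_subsistence_ha=120.0, rainfed_low_input_ha=300.0,
                rainfed_high_input_ha=80.0, irrigated_ha=50.0)
print(rainfed_area(cell), irrigated_area(cell))  # 500.0 50.0
```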

The following steps can be distinguished in the protocol to estimate and upscale Yg:

1.  CZ selection. Within a country, identify CZs with >5% of total national harvested crop area for the crop/water regime (irrigated or rainfed) in question. These CZs are the "designated" CZs (DCZs) for yield gap assessment of that crop/water regime in that country. With this approach, the selected DCZs typically contain more than 50% of national crop area; the exceptions are mainly in large countries such as China, India, and the USA (see Tables 2 to 9).

2.  Selection of weather station points. A selected weather station can be either an existing station with long-term weather data of adequate quality for yield gap assessment, or a hypothetical station location where there is large crop area but no existing weather station coverage. Selected weather stations, whether actual or hypothetical, are called reference weather stations (RWS). Hypothetical RWS points are used in addition to existing RWS for a given crop and country when existing weather stations and their buffer zones do not provide 50% coverage of harvested crop area. A recent study in countries with relatively uniform topography found that 40-50% coverage of total harvested crop area within weather station buffer zones is required for a robust estimate of Yp or Yw at national level (Van Wart et al., 2013b). The protocol therefore seeks 50% coverage of national harvested crop area within the buffer zones of the RWS (countries with heterogeneous topography in crop-growing regions may require a larger fraction of total crop area). Selection of RWS proceeds as follows:

(a)  Identify existing qualified weather stations[2] within DCZs. Quantify the harvested area of the crop in question within the buffer zone of each existing qualified weather station located in the DCZs selected in step 1. For each buffer zone, exclude harvested area that falls outside the CZ in which the weather station is located.

(b)  Select RWS from existing weather stations within DCZs. Identify all existing weather stations located within DCZs whose 100-km buffer zone, clipped by the DCZ, contains >1% of national harvested area for the crop in question. Rank these weather stations by clipped harvested crop area. Select the station with the greatest harvested area; then exclude all stations within 180 km of the selected station and re-rank the remaining ones. From the remaining stations, select the one with the greatest harvested area, re-rank, and so forth until the total harvested area in buffer zones of selected stations reaches 50% of total national harvested crop area. If, after 50% coverage is achieved, one or more DCZs with >5% of total national crop area still contain no selected weather station, select an additional existing weather station in the crop production area of each such DCZ (again, its buffer must contain >1% of national harvested area to qualify). If, after selecting among existing weather stations within DCZs, coverage is still below 50%, select among existing weather stations located in other CZs (those with <5% of national crop area) whose clipped buffer zone contains >1% of national crop area. If 50% coverage is still not achieved, proceed to step 2c.

(c)  Selection of hypothetical RWS (if needed). For countries that lack enough existing weather stations to achieve 50% coverage of harvested area within DCZs (as in 2b above), or that have a DCZ without an existing weather station, hypothetical weather stations are selected to achieve 50% coverage and/or to have at least one RWS in each DCZ with >5% of total area. Hypothetical weather stations are located in the areas of greatest crop area density within the DCZ, using a procedure that minimizes the number of hypothetical stations needed. As in step 2a, only harvested area within the DCZ in which the hypothetical weather station is located is counted within its buffer zone.

(d)  The final RWS set. Existing and hypothetical weather stations selected in steps 2a, 2b, and 2c become the RWS for a specific country/crop/water regime (irrigated or rainfed) combination. The set may contain only existing weather stations, or both existing and hypothetical stations. In all cases, however, harvested area within overlapping buffer zones is not double-counted. In most cases, a surprisingly small number of RWS is required to achieve 50% coverage of national crop area (Tables 2 to 9), because production of a given crop is concentrated in a few major production zones. For a few countries and crops, however, production is highly dispersed, or topography is heterogeneous so that there are many small CZs. In these cases, the total harvested area within buffer zones of the selected RWS may not reach 50% coverage.[3]

3.  Backfilling weather data for hypothetical RWS. Minimum weather data requirements are listed in footnote [2]. For countries and crops without adequate numbers and distribution of existing weather stations, we will ask GYGA country agronomists to search for existing weather data near the locations of hypothetical RWS, and will provide them with maps showing the preferred locations of these hypothetical RWS. Sources of data can include: (i) weather stations at experimental field research and crop breeding sites used by universities, national agricultural research institutes, and international CGIAR centers (e.g. ICRISAT, AfricaRice, IITA, CIMMYT, IRRI), and (ii) weather data obtained by collaborating projects that also seek actual weather data (AgMIP, CCAFS, HarvestChoice, etc.). The following preference hierarchy is used to identify additional weather data sources:

(a)  First preference: an existing weather station with good-quality data for 20+ years, located as close as possible to the hypothetical RWS and within the same CZ;

(b)  Second preference: an existing weather station with good-quality data for at least 10 years, located as close as possible to the hypothetical RWS and within the same CZ;

(c)  Third preference: an existing weather station with less than 20 years of weather data (but a minimum of one complete year, preferably 3-5 years). We will generate a long-term weather database for that location by calibration/correlation with NASA POWER data for temperature and solar radiation, and TRMM data for rainfall (the detailed weather generation method is described separately);

(d)  Fourth preference: a hybrid weather database. In some places, long-term rainfall data exist without the other required weather data. Where these long-term rainfall records are close to locations with short-term weather data as in 3c above, a "hybrid" weather database may be created by combining existing weather data, generated weather data, and actual rainfall data;

(e)  Last option: where no weather data are available, we will use the most appropriate source of gridded weather data, such as the TRMM dataset (http://trmm.gsfc.nasa.gov/), which contains satellite-based rainfall data; the CRU TS 3 dataset (Mitchell and Jones, 2005); or the ERA-40 re-analysis dataset (Uppala et al., 2005), the latter two of which include, among other variables, monthly temperature data.

4.  Cropping system and soil data. Collection of data on cropping systems and soil types is tightly focused on the RWS selected in steps 2a, 2b, and 2c above. Unfortunately, few countries collect and report data on cropping systems at sub-national scales. Hence, in many cases country agronomists will be the "expert" source for estimates of the proportion of total harvested area within an RWS buffer zone represented by a given cropping system x soil type combination. Site visits to RWS locations allow collection of information about the area distribution of these systems. Soil parameters will be obtained from existing soil maps and derived crop simulation model parameters (ISRIC-WISE or, if available, national maps). Only the most important cropping system x soil type combinations will be considered.

5.  Actual farm yields. The preferred sources of actual yield data are sub-district or municipality statistics that are as congruent as possible with the crop area distribution within RWS buffer zones. For irrigated crops, the most recent 5-year mean of actual farm yields is preferred over a shorter or longer time series, to avoid an atypical value from an unusual year and to avoid confounding effects of a yield time trend due to adoption of improved technologies (van Ittersum et al., 2013). For rainfed crops, the most recent 10-year mean is preferred because of greater year-to-year yield variability. Where such yield data are not available, actual yield data from household surveys (e.g. those collected by some CGIAR centers, the World Bank, national agricultural research programs, and other institutions) can be used if they were collected in RWS buffer zones or similar areas. Where no sub-national data exist near an RWS or roughly congruent with a CZ, the GYGA country agronomist may lead a targeted survey or use the national average yield. Preferred methods for obtaining actual yield data are described in detail separately.

6.  Simulation of Yp and Yw. Yp and/or Yw will be simulated for each cropping system x soil type x RWS combination (CSxSoilxRWS) in the final RWS set (step 2d). Desired attributes of crop models used for yield gap assessment are listed in Table 1. Simulated Yp and Yw values are upscaled from RWS to CZ level by weighting each RWS x Soil x CS combination by its proportion of harvested area. CZ-level results are then upscaled to national level by weighting each CZ by its proportion of harvested area. Year-to-year variability in Yp and Yw will be evaluated at the RWS buffer zone scale and, by harvested-area-weighted averaging, at CZ and national levels. Because time series of actual farm yields (Ya) at the RWS spatial scale are unlikely to be available in most countries, year-to-year variability in Yg will not be estimated. Instead, Yg will be a fixed value based on the average Yp or Yw at each spatial scale and the associated value of Ya. If Ya is available only at national level, Yg will be computed from a single value of Ya and will vary only to the extent that Yp or Yw varies across spatial scales, from RWS to CZ, administrative units, and nation.
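The DCZ rule in step 1 is a simple threshold on each CZ's share of national harvested area. The Python sketch below assumes that harvested area has already been summed per CZ for one crop/water regime; the CZ codes and areas are hypothetical.

```python
def designated_czs(area_by_cz: dict[str, float], min_share: float = 0.05) -> list[str]:
    """Return CZs holding more than `min_share` of national harvested area,
    largest first (the designated CZs, or DCZs, of step 1)."""
    total = sum(area_by_cz.values())
    dczs = [cz for cz, area in area_by_cz.items() if area > min_share * total]
    return sorted(dczs, key=lambda cz: area_by_cz[cz], reverse=True)


def share_in(area_by_cz: dict[str, float], czs: list[str]) -> float:
    """Fraction of national harvested area that falls in the given CZs."""
    return sum(area_by_cz[cz] for cz in czs) / sum(area_by_cz.values())


# Hypothetical CZ codes and harvested areas (ha) for one crop/water regime.
area_by_cz = {"CZ-A": 400.0, "CZ-B": 300.0, "CZ-C": 200.0, "CZ-D": 60.0, "CZ-E": 40.0}
dczs = designated_czs(area_by_cz)
print(dczs, share_in(area_by_cz, dczs))  # ['CZ-A', 'CZ-B', 'CZ-C', 'CZ-D'] 0.96
```

Note that the threshold is strict (>5%), so a CZ holding exactly 5% of national area is not designated.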
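The CZ-clipped buffer area of step 2a can be approximated on the harvested-area grid. This sketch assumes each grid cell is counted in full when its center lies within 100 km (great-circle distance, mean Earth radius 6371 km) of the station and the cell lies in the station's own CZ; an exact polygon clip would also split cells cut by the buffer edge. `Cell`, `Station`, and the example coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Cell:
    """A harvested-area grid cell (e.g. from SPAM), located by its center."""
    lat: float
    lon: float
    cz: str
    area_ha: float


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    cz: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def clipped_buffer_area(station: Station, cells: list[Cell], radius_km: float = 100.0) -> float:
    """Harvested area in the station's buffer, keeping only cells in the station's own CZ."""
    return sum(c.area_ha for c in cells
               if c.cz == station.cz
               and haversine_km(station.lat, station.lon, c.lat, c.lon) <= radius_km)


station = Station("RWS-1", lat=0.0, lon=0.0, cz="CZ-A")
cells = [
    Cell(0.0, 0.5, "CZ-A", 100.0),   # ~56 km away, same CZ: counted
    Cell(0.0, -0.5, "CZ-B", 50.0),   # inside 100 km but another CZ: clipped out
    Cell(0.0, 1.5, "CZ-A", 80.0),    # ~167 km away: outside the buffer
]
print(clipped_buffer_area(station, cells))  # 100.0
```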
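The ranking loop of step 2b is a greedy selection with a 180-km exclusion radius. The sketch below assumes each candidate station's CZ-clipped buffer has already been expressed as a set of grid-cell IDs (as in step 2a), ranks stations by their own clipped area, and counts coverage as the union of covered cells so that overlapping buffers are not double-counted (step 2d). It does not implement the follow-up rules (an extra station for an uncovered DCZ, or stations in CZs with <5% of national area). All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """An existing weather station inside a DCZ (hypothetical record)."""
    name: str
    lat: float
    lon: float
    cells: frozenset  # IDs of grid cells in its CZ-clipped 100-km buffer


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def select_rws(candidates, cell_area, national_area,
               min_share=0.01, exclusion_km=180.0, target=0.5):
    """Greedy selection of step 2b; returns (selected stations, coverage fraction)."""
    def area(cell_ids):
        return sum(cell_area[c] for c in cell_ids)

    # Only stations whose clipped buffer holds >1% of national area qualify.
    pool = [s for s in candidates if area(s.cells) > min_share * national_area]
    selected, covered = [], set()
    while pool and area(covered) < target * national_area:
        best = max(pool, key=lambda s: area(s.cells))  # largest clipped area
        selected.append(best)
        covered |= best.cells                          # union: shared cells count once
        # Keep only stations farther than 180 km; this also drops `best` itself.
        pool = [s for s in pool
                if haversine_km(best.lat, best.lon, s.lat, s.lon) > exclusion_km]
    return selected, area(covered) / national_area


cell_area = {"c1": 200.0, "c2": 100.0, "c3": 250.0, "c4": 150.0, "c5": 5.0, "c6": 80.0}
candidates = [
    Candidate("S1", 0.0, 0.0, frozenset({"c1", "c2"})),
    Candidate("S2", 0.0, 1.0, frozenset({"c3"})),  # ~111 km from S1
    Candidate("S3", 0.0, 3.0, frozenset({"c4"})),
    Candidate("S4", 0.0, 6.0, frozenset({"c5"})),  # 0.5% of national area: not eligible
    Candidate("S5", 0.0, 5.0, frozenset({"c6"})),
]
selected, coverage = select_rws(candidates, cell_area, national_area=1000.0)
print([s.name for s in selected], coverage)  # ['S1', 'S3', 'S5'] 0.53
```

Here S2 has the second-largest buffer but is dropped because it lies within 180 km of S1. Ranking by area not yet covered, rather than by each station's own clipped area, would be an alternative reading of "re-rank".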
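Step 2c leaves the placement procedure for hypothetical RWS unspecified. One simple option, assumed here and not taken from the protocol, is a greedy heuristic: repeatedly place a hypothetical station at the DCZ grid-cell center whose CZ-clipped 100-km buffer adds the most harvested area not already covered, until 50% coverage is reached. The requirement of at least one RWS per DCZ is not handled. The cell dictionary layout and example values are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def place_hypothetical_rws(cells, dczs, covered, national_area,
                           target=0.5, radius_km=100.0):
    """Greedily place hypothetical RWS at grid-cell centers inside DCZs.

    cells: dict cell_id -> (lat, lon, cz, area_ha)
    covered: IDs already inside buffers of the selected existing RWS
    Returns (cell IDs used as hypothetical RWS, final coverage fraction).
    """
    covered = set(covered)

    def area(ids):
        return sum(cells[i][3] for i in ids)

    placed = []
    while area(covered) < target * national_area:
        best_id, best_buffer, best_gain = None, set(), 0.0
        for cid, (lat, lon, cz, _) in cells.items():
            if cz not in dczs:
                continue
            # CZ-clipped buffer around this candidate location (as in step 2a).
            buffer = {j for j, (la, lo, z, _) in cells.items()
                      if z == cz and haversine_km(lat, lon, la, lo) <= radius_km}
            gain = area(buffer - covered)  # only area not yet covered counts
            if gain > best_gain:
                best_id, best_buffer, best_gain = cid, buffer, gain
        if best_id is None:  # no candidate adds coverage: stop
            break
        placed.append(best_id)
        covered |= best_buffer
    return placed, area(covered) / national_area


cells = {
    "a": (0.0, 0.0, "CZ-A", 300.0),  # already covered by an existing RWS
    "b": (0.0, 3.0, "CZ-A", 150.0),
    "c": (0.0, 3.5, "CZ-A", 100.0),  # ~56 km from "b"
    "d": (0.0, 8.0, "CZ-B", 120.0),
}
placed, coverage = place_hypothetical_rws(cells, dczs={"CZ-A", "CZ-B"},
                                          covered={"a"}, national_area=1000.0)
print(placed, coverage)  # ['b'] 0.55
```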
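The preference hierarchy of step 3 can be encoded as a tier number per candidate source, with distance to the hypothetical RWS as a tie-breaker ("as close as possible"). This is a sketch under stated assumptions: the `WeatherSource` fields and `kind` labels are hypothetical; a station that misses tiers 3a-3b but has at least one complete year is treated as tier 3c, since 3c does not mention the CZ; and the generation of long-term data for tier 3c is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeatherSource:
    """A candidate weather data source near a hypothetical RWS (hypothetical fields)."""
    name: str
    kind: str            # "station", "hybrid" or "gridded"
    years: float         # complete years of observed daily data (0 for gridded)
    good_quality: bool
    same_cz: bool        # located in the same CZ as the hypothetical RWS
    distance_km: float   # distance to the hypothetical RWS


def preference_tier(src: WeatherSource) -> Optional[int]:
    """Map a source to tiers 3a-3e (1 = most preferred); None if unusable."""
    if src.kind == "station":
        if src.good_quality and src.same_cz and src.years >= 20:
            return 1  # 3a: 20+ years, same CZ
        if src.good_quality and src.same_cz and src.years >= 10:
            return 2  # 3b: 10+ years, same CZ
        if src.years >= 1:
            return 3  # 3c: short record, to be extended with generated data
        return None   # less than one complete year: not usable
    if src.kind == "hybrid":
        return 4      # 3d: hybrid database
    if src.kind == "gridded":
        return 5      # 3e: last option
    return None


def choose_source(sources: list[WeatherSource]) -> Optional[WeatherSource]:
    """Best tier first; within a tier, the source closest to the hypothetical RWS."""
    usable = [s for s in sources if preference_tier(s) is not None]
    if not usable:
        return None
    return min(usable, key=lambda s: (preference_tier(s), s.distance_km))


sources = [
    WeatherSource("grid-cell", "gridded", 0, True, True, 0.0),
    WeatherSource("station-1", "station", 4, True, True, 30.0),
    WeatherSource("station-2", "station", 12, True, True, 85.0),
    WeatherSource("station-3", "station", 25, True, False, 20.0),  # other CZ
]
print(choose_source(sources).name)  # station-2
```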
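The averaging window for actual yields in step 5 depends only on the water regime. In this sketch (function name and data hypothetical), "most recent" means the latest years present in the record; gaps in the series are not checked, and the function raises an error rather than silently averaging over fewer years, leaving the fallback (household surveys, a targeted survey, or the national average) to the analyst.

```python
def reference_ya(yield_by_year: dict[int, float], irrigated: bool) -> float:
    """Mean actual farm yield over the most recent 5 (irrigated) or 10 (rainfed) years."""
    window = 5 if irrigated else 10
    recent_years = sorted(yield_by_year)[-window:]
    if len(recent_years) < window:
        raise ValueError(f"need {window} years of yield data, got {len(recent_years)}")
    return sum(yield_by_year[y] for y in recent_years) / window


# Hypothetical municipality-level yields (t/ha) for an irrigated crop.
yields = {2008: 9.0, 2009: 9.5, 2010: 10.0, 2011: 10.5, 2012: 10.0, 2013: 11.0, 2014: 10.5}
print(reference_ya(yields, irrigated=True))  # 10.4
```

With the same seven-year record, `reference_ya(yields, irrigated=False)` raises `ValueError`, because a rainfed crop needs a 10-year mean.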
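The upscaling in step 6 is two rounds of harvested-area-weighted averaging, followed by the yield gap, taken here as Yg = Yp - Ya for irrigated crops or Yg = Yw - Ya for rainfed crops (the standard definition in van Ittersum et al., 2013). The sketch assumes long-term mean simulated yields are already available per RWS x soil x cropping-system combination, each with an area weight in hectares, and that national weighting uses each CZ's total harvested area. Applying the same weighting year by year would give the year-to-year variability in Yp or Yw. All inputs are hypothetical.

```python
from collections import defaultdict


def weighted_mean(values_and_weights):
    """Harvested-area-weighted mean of (value, weight) pairs."""
    total_weight = sum(w for _, w in values_and_weights)
    return sum(v * w for v, w in values_and_weights) / total_weight


def upscale_to_cz(simulations):
    """simulations: iterable of (cz, simulated_yield_t_ha, harvested_area_ha),
    one entry per RWS x soil x cropping-system combination."""
    by_cz = defaultdict(list)
    for cz, sim_yield, area in simulations:
        by_cz[cz].append((sim_yield, area))
    return {cz: weighted_mean(pairs) for cz, pairs in by_cz.items()}


def upscale_to_nation(cz_yield, cz_area):
    """Weight each CZ's yield by that CZ's harvested area."""
    return weighted_mean([(cz_yield[cz], cz_area[cz]) for cz in cz_yield])


def yield_gap(y_ceiling, ya):
    """Yg = Yp - Ya (irrigated) or Yw - Ya (rainfed)."""
    return y_ceiling - ya


# Hypothetical long-term mean simulated Yw (t/ha) with area weights (ha).
simulations = [("CZ-A", 10.0, 600.0), ("CZ-A", 8.0, 200.0), ("CZ-B", 7.0, 500.0)]
cz_area = {"CZ-A": 3000.0, "CZ-B": 1000.0}         # total harvested area per CZ
cz_yw = upscale_to_cz(simulations)                # {'CZ-A': 9.5, 'CZ-B': 7.0}
national_yw = upscale_to_nation(cz_yw, cz_area)   # 8.875
ya_national = 5.0  # only a national Ya is available in this example
print(yield_gap(national_yw, ya_national))        # 3.875
```

Because only a national Ya is used here, Yg differs between CZs (4.5 and 2.0 t/ha) only through differences in Yw, as described above.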

 

Underpinning assumptions and uncertainties:

1.  Availability of weather data. We assume that good weather data can be found within the key crop areas. If not, we must be ready to accept gridded weather data. Generating weather data from incomplete datasets (method to be proposed by Justin et al.) may require a lot of work if it has to be done for many locations. Under this proposal, therefore, we must be ready to accept gridded weather data, at least to get started in year 1 or until actual weather data become available.

2.  When is a weather data source located in an acceptable place? Ideally, it should be at the center of a region with a high density of harvested crop area. At a minimum, step 2b above specifies that the CZ-clipped buffer zone of a weather data point must contain >1% of national harvested crop area.

3.  Uncertainty in crop model simulations. Simulations of Yp and Yw can differ greatly between crop models using the same data set. Some of these differences arise because some models are better suited to, and more rigorously validated for, certain locations or conditions than others. Our preferred model attributes address this concern to a large extent (see Table 1 below). Transparency is also important: the model and model inputs used for every RWS where Yp and Yw are estimated should be available for all to see within GYGA.

4.  What happens if new weather and crop area distribution data become available? Do we need to resample soil and management data within the CZ? In the short term, we fix the location of the buffer zones as determined by steps 2a, 2b and 2c above. If new, better-quality weather data become available according to our preference list under step 3, we assign those weather data to an existing buffer zone. In the longer term (say every 3-4 years), we can update the entire analysis for a country and revise the selection of RWS, and sometimes even the crop area distribution, as improved weather and/or crop distribution data become accessible.

 

Citations

van Ittersum, M.K., Cassman, K.G., Grassini, P., Wolf, J., Tittonell, P., Hochman, Z., 2013. Yield gap analysis with local to global relevance: a review. Field Crops Research 143, 4-17.

Van Wart, J., van Bussel, L.G.J., Wolf, J., Licker, R., Grassini, P., Nelson, A., Boogaard, H., Gerber, J., Mueller, N.D., Claessens, L., van Ittersum, M.K., Cassman, K.G., 2013a. Use of agro-climatic zones to upscale simulated crop yield potential. Field Crops Research 143, 44-55.

Van Wart, J., Kersebaum, K.C., Peng, S., Milner, M., Cassman, K.G., 2013b. Estimating crop yield potential at regional to national scales. Field Crops Research 143, 34-43.

 

Table 1: Desired attributes of crop simulation models

Desired attribute | Explanation
Daily step simulation | Simulation of daily crop growth and development based on weather, soil, and crop physiological attributes
Flexibility to simulate management practices | Key management practices include sowing date, plant density, cultivar maturity, and irrigation
Simulation of fundamental physiological processes | Simulation of key physiological processes such as crop development, net carbon assimilation, biomass partitioning, crop water relations, and grain growth
Crop specificity | Should reflect crop-specific physiological attributes for respiration and photosynthesis, the critical stages and growth periods that define the vegetative and grain-filling periods, and canopy architecture
Minimum requirement of crop 'genetic' coefficients | Minimum requirement of crop-site 'genetic' coefficients, such as maximum leaf area index, date of flowering, etc.
Validation against data from field crops that approach Yp and Yw | Comparison of model outputs (grain yield, aboveground dry matter, crop evapotranspiration) against measured data from field crops managed to achieve Yp (irrigated crops) or Yw (rainfed crops)
User friendly | Models embedded in user-friendly interfaces, with required data inputs and outputs easily visualized, and flexibility to modify default values of internal parameters
Full documentation of model parameterization and availability | Publicly available models, published in the peer-reviewed literature, with full documentation and a description of the data sources from which internal parameters were derived

 

 

Table 2: Rainfed maize CZs with >5% of total national rainfed maize area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | 5 | 88%
Burkina Faso | 5 | 97%
China | 8 | 45%
Ethiopia | 9 | 56%
Ghana | 7 | 88%
India | 4 | 34%
Kenya | 8 | 59%
Mali | 6 | 96%
Niger | <100,000 ha | --
Nigeria | 9 | 63%
Tanzania | 5 | 61%
Uganda | 7 | 77%
USA | 5 | 69%
Zambia | 6 | 93%

 

Table 3: Irrigated maize CZs with >5% of total national irrigated maize area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | <100,000 ha | --
Burkina Faso | <100,000 ha | --
China | 5 | 49%
Ethiopia | <100,000 ha | --
Ghana | <100,000 ha | --
India | 5 | 40%
Kenya | <100,000 ha | --
Mali | <100,000 ha | --
Niger | <100,000 ha | --
Nigeria | <100,000 ha | --
Tanzania | <100,000 ha | --
Uganda | <100,000 ha | --
USA | 5 | 69%
Zambia | <100,000 ha | --

 

Table 4: Rainfed rice CZs with >5% of total national rainfed rice area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | 6 | 96%
Burkina Faso | <100,000 ha | --
China | 9 | 74%
Ethiopia | <100,000 ha | --
Ghana | 7 | 94%
India | 6 | 54%
Kenya | <100,000 ha | --
Mali | 7 | 98%
Niger | <100,000 ha | --
Nigeria | 8 | 62%
Tanzania | 3 | 70%
Uganda | 6 | 96%
USA | <100,000 ha | --
Zambia | <100,000 ha | --

 

Table 5: Irrigated rice CZs with >5% of total national irrigated rice area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | 6 | 95%
Burkina Faso | <100,000 ha | --
China | 6 | 57%
Ethiopia | <100,000 ha | --
Ghana | <100,000 ha | --
India | 6 | 37%
Kenya | <100,000 ha | --
Mali | 5 | 100%
Niger | <100,000 ha | --
Nigeria | 8 | 76%
Tanzania | <100,000 ha | --
Uganda | <100,000 ha | --
USA | 6 | 83%
Zambia | <100,000 ha | --

 

Table 6: Rainfed wheat CZs with >5% of total national rainfed wheat area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | 5 | 96%
Burkina Faso | <100,000 ha | --
China | 5 | 38%
Ethiopia | 8 | 65%
Ghana | <100,000 ha | --
India | 6 | 54%
Kenya | 8 | 73%
Mali | <100,000 ha | --
Niger | <100,000 ha | --
Nigeria | <100,000 ha | --
Tanzania | 3 | 47%
Uganda | <100,000 ha | --
USA | 5 | 34%
Zambia | <100,000 ha | --

 

Table 7: Irrigated wheat CZs with >5% of total national irrigated wheat area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | 4 | 84%
Burkina Faso | <100,000 ha | --
China | 5 | 58%
Ethiopia | <100,000 ha | --
Ghana | <100,000 ha | --
India | 8 | 71%
Kenya | <100,000 ha | --
Mali | <100,000 ha | --
Niger | <100,000 ha | --
Nigeria | <100,000 ha | --
Tanzania | <100,000 ha | --
Uganda | <100,000 ha | --
USA | 3 | 28%
Zambia | <100,000 ha | --

 

Table 8: Rainfed sorghum CZs with >5% of total national rainfed sorghum area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | <100,000 ha | --
Burkina Faso | 4 | 96%
China | 5 | 38%
Ethiopia | 7 | 56%
Ghana | 2 | 91%
India | 6 | 44%
Kenya | 7 | 73%
Mali | 4 | 93%
Niger | 1 | 97%
Nigeria | 8 | 77%
Tanzania | 4 | 69%
Uganda | 6 | 73%
USA | 7 | 48%
Zambia | <100,000 ha | --

 

Table 9: Irrigated sorghum CZs with >5% of total national irrigated sorghum area for GYGA countries.

Country | Number of CZs with >5% of national harvested area | % of national harvested area in these CZs
Bangladesh | <100,000 ha | --
Burkina Faso | <100,000 ha | --
China | 6 | 55%
Ethiopia | <100,000 ha | --
Ghana | <100,000 ha | --
India | <100,000 ha | --
Kenya | <100,000 ha | --
Mali | <100,000 ha | --
Niger | <100,000 ha | --
Nigeria | <100,000 ha | --
Tanzania | <100,000 ha | --
Uganda | <100,000 ha | --
USA | 5 | 57%
Zambia | <100,000 ha | --

 

[1] Information on soil properties is required to estimate Yw in rainfed systems (not needed to estimate Yp of irrigated systems). Essential soil properties include: texture, depth of root zone, and slope.

[2] A qualified weather station has 20+ years of data of acceptable quality. Minimum data: daily maximum/minimum temperature, rainfall, and some index of humidity (relative humidity, dew point temperature, actual vapor pressure, etc.). Daily solar radiation is also required; if it is not available, solar radiation data can be obtained from the NASA POWER database.

[3] Where 50% coverage is not achieved using the above methods, additional CZs with <5% of national crop area, and weather stations whose buffer zones contain <1% of national crop area, could be added if time and resources allow.