Protocol for estimating average actual yields for yield gap determination

GYGA Protocol for estimating actual yields and yield gaps

The crop yield gap is estimated as the difference between the average simulated yield potential (Yp, crop production without water stress) or water-limited yield potential (Yw, rainfed crop production with water stress) and the average on-farm actual yield. In the GYGA project, the finest spatial resolution at which the yield gap is determined is the reference weather station (RWS) buffer as "clipped" by the climate zone in which the RWS is located. Actual yield data need to be disaggregated by water regime in areas where irrigated and rainfed crop systems coexist. Hence, an estimate of actual yield needs to be determined for each crop x water regime x RWS buffer combination.

In countries where long-term crop yield data are available at the appropriate spatial resolution (see below), the number of years of actual yield data included in the calculation of average actual yield will be determined on a case-by-case basis. The principle is to include as many recent years of actual yield data as possible, to account for weather variability, while avoiding the bias due to the technological time-trend (van Ittersum et al., 2013). As a general guideline for data-rich countries that show a steep yield trend (or trend break), we recommend averaging the actual yields reported for the 5 most recent years; if there is no trend, the actual yields reported for the 10 most recent years will be averaged. This approach cannot be followed in data-poor countries where long-term statistics are not available. For these cases, we recommend a minimum of 5 recent years of actual yield data (3-4 years are acceptable if, and only if, no more years are available), recognizing that this may not be sufficient to account for year-to-year variability in yields due to weather, especially in harsh rainfed environments.

Preferably, existing data on crop yields available at the finest level of spatial resolution (typically county, sub-district, district, or municipality) will be used to determine actual yields for each RWS buffer. This is the case, for example, in the USA, Brazil, Argentina, and Europe, where sub-national yield data are available. However, yield data at this level of spatial resolution are not available in some parts of the world (e.g., sub-Saharan Africa). Hence, the procedure below will be used to determine the most appropriate data source for estimating average actual yields. The approach aims to cover the most likely scenarios of yield data availability across countries.

Collection of actual yield data: preference list of data sources:

Always inspect actual yield time series for trends (or trend breaks). In case of trends or trend breaks only use data for the 5 most recent years. If there is no trend, use data from the most recent 10 years. In data-poor countries, a minimum of 3 years of data is acceptable if no more years are available.
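The year-selection rule above can be sketched in Python. This is only an illustrative aid, not part of the GYGA protocol: the protocol does not prescribe how a trend is detected, so the least-squares slope test, the function names, and the `rel_slope_threshold` value (1% of mean yield per year) are all assumptions. Trend breaks are not detected by this sketch and still require visual inspection of the time series.

```python
from statistics import mean


def linear_slope(years, yields):
    """Ordinary least-squares slope of yield regressed on year."""
    y_bar = mean(years)
    v_bar = mean(yields)
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, yields))
    return sxy / sxx


def select_recent_yields(series, rel_slope_threshold=0.01):
    """Pick the years to average from a {year: yield} time series.

    rel_slope_threshold is a hypothetical cut-off: a trend is assumed
    when the absolute slope exceeds this fraction of the mean yield
    per year. Returns (list of (year, yield), has_trend).
    """
    years = sorted(series)
    yields = [series[y] for y in years]
    if len(years) < 3:
        # Below the minimum stated in the protocol for data-poor countries.
        raise ValueError("fewer than 3 years of data; use another data source")
    slope = linear_slope(years, yields)
    has_trend = abs(slope) / mean(yields) > rel_slope_threshold
    window = 5 if has_trend else 10  # 5 most recent years if trend, else 10
    selected = [(y, series[y]) for y in years[-window:]]
    return selected, has_trend
```

If fewer years are available than the chosen window, the slice simply returns all available years, which matches the allowance for shorter series in data-poor countries.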

1. We will use the most recent crop-specific yield statistics available for administrative units at the finest level of spatial resolution (municipality/county/department/sub-district/district). The location and extent of the administrative unit should be (reasonably) congruent with the location and spatial extent of the target RWS buffer. If two or more administrative units (or parts of them) are located within the RWS buffer, a weighted average yield can be estimated based on their relative contribution to the crop harvested area located within the RWS buffer (taken from sub-national statistics if available, otherwise from SPAM); administrative units with only small portions within the RWS buffer are excluded. Administrative-unit data can be accessed directly through the websites of national statistics bureaus (USA, Argentina) or will need to be retrieved by local agronomists from their local statistical bureaus or research stations (Africa). With regard to data quality, large discrepancies between official yield statistics and independent yield measurements have been found in African countries (e.g., Wairegi et al., 2010; Tittonell & Giller, 2013). Hence, whenever possible, validation of national statistics against yield estimates from independent data sources is highly desirable. If validation shows that national statistics are unreliable, see 2 below.

2. If national statistics at the municipality/county/sub-district/district level are not available or are unreliable, estimates of average yield could be retrieved from existing data collected by local agronomists or through farm surveys administered by national agricultural research institutions, universities, CGIAR centers, World Bank (LSMS), private sector, or other on-going projects (e.g., Harvest Choice, Future Harvest, TAPRA survey panel). It is fundamental to ensure that the spatial coverage of the survey is consistent with the RWS buffer and that the survey includes enough years of data to account for weather variability (minimum of 5 years, preferably more). Another source of yield data is on-farm experiments that include a treatment following local 'farmer practices' over several years (e.g., Tittonell et al., 2006; Fermont et al., 2009; Wairegi et al., 2010). This source of data can be useful to determine actual yields as long as the farms where the studies were conducted are representative of the soil/landscape/management of the population of farms within the RWS buffer.

3. If national statistics at the municipality/county/sub-district/district level are not available or are unreliable and the data mentioned in 2 are also not available, we will rely on the most recent yield data reported for larger administrative units (regions/provinces/states), recognizing that the yield reported at this level of spatial resolution may not be representative of the actual average yield at the RWS buffer level. In practice, these coarser-scale yield data will be used as a first estimate of actual yield in the buffer and gradually substituted by more accurate yield data as these become available (see 1 and 2 above). Recent actual yield data at the region/province/state level can be accessed through national statistical bureaus or the FAO/IFPRI/SAGE database (FAO, IFPRI, SAGE, 2006). When these are not available, an estimate of actual yield can be derived from global gridded databases such as MapSPAM v. 2 (International Food Policy Research Institute, 2019), which reports 3-year average actual yields around the year 2010 at grid-cell level.

4. In regions where rainfed and irrigated crops co-exist within the same buffer zone (with both water regimes accounting for >10% of the area sown with a given crop), but for which official yield statistics are not disaggregated by water regime, estimates of actual yield (Ya) for irrigated and/or rainfed crops will be adjusted based on expert opinion, SPAM actual yield maps, etc. The respective yield gaps will be estimated from Yw and rainfed Ya for rainfed crops, and from Yp and irrigated Ya for irrigated crops. These estimates will be refined in the future as better actual yield data disaggregated by water regime become available.

5. If no yield data are available at any sub-national level, we will rely on local GYGA agronomists' estimates of actual yields. These estimates can be based on surveys/interviews conducted within (or as close as possible to) the RWS buffer. Surveys/interviews of a representative sample of farmers are preferred, but if these are not available or impossible to undertake, then surveys/interviews of local agronomists, agricultural input or seed dealers, or others engaged in businesses that deal directly with farmers might be considered, with the aim of determining average crop yields during the most recent 3-year period (preferably a 5-year period). The local GYGA agronomist will determine the most appropriate source of information for determining actual yields using surveys.

6. If fewer than three years of actual yield data are available for option 5, then the final option is to use statistics at the country level (e.g., from FAOSTAT).

When coarse-scale yield data are used as a first estimate of actual yield for an RWS buffer (see 3-6 above), the goal of the GYGA project is to continue to improve data quality over time through substitution by more accurate yield data as these become available (see 1-2 above). In-situ measurements of yields are of limited use for deriving robust estimates of average actual yield because of their limited spatial and temporal coverage and their relatively high requirements of time/effort/resources, which are beyond current GYGA resources.
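The area-weighted average described in option 1, for the case where several administrative units fall within one RWS buffer, can be sketched as follows. This is an illustration under stated assumptions: the function name, the dictionary keys, and the `min_share` cut-off used to exclude units with small portions in the buffer are hypothetical, since the protocol does not quantify what counts as a "small portion".

```python
def weighted_buffer_yield(units, min_share=0.05):
    """Harvested-area-weighted average yield for one RWS buffer.

    units: list of dicts with keys
        'yield_t_ha'        - reported yield of the administrative unit
        'area_in_buffer_ha' - crop harvested area of that unit inside the buffer
    min_share: hypothetical threshold; units contributing less than this
        fraction of the buffer's harvested area are excluded.
    """
    total_area = sum(u["area_in_buffer_ha"] for u in units)
    if total_area <= 0:
        raise ValueError("no harvested area within the buffer")
    # Drop units that contribute only a small portion of the buffer area.
    kept = [u for u in units if u["area_in_buffer_ha"] / total_area >= min_share]
    kept_area = sum(u["area_in_buffer_ha"] for u in kept)
    weighted_sum = sum(u["yield_t_ha"] * u["area_in_buffer_ha"] for u in kept)
    return weighted_sum / kept_area


# Example with made-up numbers: the third unit covers under 5% of the
# buffer's harvested area and is therefore ignored.
example_units = [
    {"yield_t_ha": 4.0, "area_in_buffer_ha": 6000.0},
    {"yield_t_ha": 2.0, "area_in_buffer_ha": 4000.0},
    {"yield_t_ha": 10.0, "area_in_buffer_ha": 100.0},
]
buffer_yield = weighted_buffer_yield(example_units)
```

Weighting by harvested area inside the buffer, rather than by the total area of each unit, keeps the estimate tied to where the crop is actually grown within the buffer.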

Calculation of yield gap at the level of the RWS buffer:

For a given RWS buffer, we will calculate (i) long-term average simulated Yp and Yw (and associated CVs) based on simulated Yp and Yw for all available years of weather data, (ii) average actual yield and (iii) yield gap. The yield gap is calculated as the difference between average Yw or Yp and average actual yield. Yp, Yw, and yield gaps need to be expressed at standard commercial moisture content (APPENDIX 1).
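As a minimal sketch of the three quantities listed above, the following Python function computes the long-term average of simulated Yp or Yw, its coefficient of variation (CV), the average actual yield, and the yield gap. The function and key names are hypothetical, and the use of the sample standard deviation for the CV is an assumption; all yields are assumed to be already expressed at standard moisture content.

```python
from statistics import mean, stdev


def yield_gap_summary(simulated_potential, actual_yields):
    """Summarize one crop x water regime x RWS buffer combination.

    simulated_potential: simulated Yp (irrigated) or Yw (rainfed),
        one value per year of available weather data.
    actual_yields: actual yields for the selected recent years.
    """
    avg_potential = mean(simulated_potential)
    # CV in percent, using the sample standard deviation (an assumption).
    cv_pct = 100.0 * stdev(simulated_potential) / avg_potential
    avg_actual = mean(actual_yields)
    return {
        "avg_potential": avg_potential,
        "cv_pct": cv_pct,
        "avg_actual": avg_actual,
        "yield_gap": avg_potential - avg_actual,
    }


# Example with made-up numbers (t/ha).
summary = yield_gap_summary([8.0, 10.0, 12.0], [4.0, 5.0, 6.0])
```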

In those cases with high year-to-year variability in yield (CV>20%) or presence of a few outliers with large difference in yield compared to the mean, the median yield may be more representative than an arithmetic average. In GYGA, the average farm yields will be reported except for cases in which there are significant outliers or when the distribution of observed yields is highly skewed. Whether to use average or median yields for a given crop and country will be determined on a case-by-case basis by the responsible GYGA team member and this will be clearly documented.
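A simple screening aid for this choice could look like the sketch below. It only flags series whose CV exceeds the 20% mentioned above; outlier and skewness checks are not implemented, and the final decision remains with the responsible GYGA team member. The function name, key names, and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, median, stdev


def central_yield_screen(yields, cv_threshold_pct=20.0):
    """Report mean, median and CV, and suggest which statistic to inspect.

    The suggestion is only a flag for manual review, not a decision rule.
    """
    avg = mean(yields)
    cv_pct = 100.0 * stdev(yields) / avg  # sample standard deviation
    return {
        "mean": avg,
        "median": median(yields),
        "cv_pct": cv_pct,
        "suggested": "median" if cv_pct > cv_threshold_pct else "mean",
    }


# Made-up series (t/ha): one stable, one with a single very high year.
stable = central_yield_screen([2.0, 2.1, 1.9, 2.0, 2.0])
with_outlier = central_yield_screen([2.0, 2.1, 1.9, 2.0, 6.0])
```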


The method used to estimate actual yields and yield gaps will be specified in GYGA with sufficient detail to understand exactly how actual yields and yield gaps were estimated. Per RWS buffer zone the most appropriate data source for actual yield should be selected. Therefore, a description of the method used will be provided for each RWS buffer, which is the smallest scale at which yield gaps are estimated.


APPENDIX 1: Moisture content of harvestable organs for GYGA crops

Crop        Harvestable organ   Moisture content (%)
Maize       grain               15.5
Wheat       grain               13.5
Soybean     grain               13.0
Rice        grain               14.0
Millet      grain               14.0
Sorghum     grain               14.0
Sugarcane   stalk               Fresh mass (~75%)
Sunflower   grain               11.0
Potatoes    tuber               Fresh mass (~75%)
Cassava     root                Fresh mass (~60%)
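The conversion to standard moisture content can be sketched as follows for the grain crops in the table. The moisture values are copied from the table above; the function names are hypothetical, and the sketch assumes that moisture content is expressed on a wet (fresh-weight) basis, which this document does not state explicitly.

```python
# Standard moisture content (%) of grain crops, copied from Appendix 1.
STANDARD_MOISTURE_PCT = {
    "maize": 15.5,
    "wheat": 13.5,
    "soybean": 13.0,
    "rice": 14.0,
    "millet": 14.0,
    "sorghum": 14.0,
    "sunflower": 11.0,
}


def dry_to_standard(dry_yield, crop):
    """Convert a dry-matter yield (0% moisture) to standard moisture.

    Assumes wet-basis moisture: water mass / total fresh mass.
    """
    mc = STANDARD_MOISTURE_PCT[crop]
    return dry_yield / (1.0 - mc / 100.0)


def measured_to_standard(yield_at_measured, measured_moisture_pct, crop):
    """Convert a yield reported at a measured moisture to standard moisture."""
    dry = yield_at_measured * (1.0 - measured_moisture_pct / 100.0)
    return dry_to_standard(dry, crop)
```

For example, under the wet-basis assumption a maize dry-matter yield of 8.45 t/ha corresponds to about 10 t/ha at 15.5% moisture.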

References

FAO, IFPRI, SAGE, 2006. Agro-Maps. A global spatial database of agricultural land-use statistics aggregated by sub-national administrative districts. Available at:

Fermont, A.M., van Asten, P.J.A., Tittonell, P., van Wijk, M.T., Giller, K.E., 2009. Closing the cassava yield gap: An analysis from smallholder farms in East Africa. Field Crops Research 112, 24-36

International Food Policy Research Institute, 2019. Global Spatially-Disaggregated Crop Production Statistics Data for 2010 Version 2.0.

Tegemeo Agricultural Policy Research and Analysis (TAPRA) Project, Kenya.

Tittonell, P., Corbeels, M., van Wijk, M.T., Vanlauwe, B., Giller, K.E., 2006. Combining organic and mineral fertilizers for integrated soil fertility management in smallholder farming systems of Kenya: Explorations using the crop-soil model FIELD. Agronomy Journal 100, 1511-1526.

Tittonell, P., Giller, K.E., 2013. When yield gaps are poverty traps: The paradigm of ecological intensification in African smallholder agriculture. Field Crops Research 143, 76-90.

Van Ittersum, M., Cassman, K.G., Grassini, P., Wolf, J., Tittonell, P., Hochman, Z., 2013. Yield gap analysis with local to global relevance - A review. Field Crops Research 143, 4-17.

Wairegi, L.W.I., van Asten, P.J.A., Tenywa, M.M., Bekunda, M.A., 2010. Abiotic constraints override biotic constraints in East African highland banana systems. Field Crops Research 117, 146-153.

World Bank - Living Standards Measurement Study – Integrated study on agriculture (LSMS-ISA). URL:,,contentMDK:21610833~pagePK:64168427~piPK:64168435~theSitePK:3358997,00.html.


