PMFBY: Crop Yield Estimation: Crop Cutting Experiments and Big Data Technologies

Crop Cutting Experiments (CCE) – Background

CCE refer to a scientifically designed sampling and field experiment scheme to accurately estimate the production of a crop in a region at the end of its cultivation cycle. The scientific design of CCE was developed in India in the 1940s at IASRI for Crop Estimation Surveys, to reliably estimate production of principal food and non-food crops at State and National levels, for developing effective agricultural policies and programmes. In 1970, CCE sampling designs and field experimental methods were adopted by the Ministry of Agriculture, Govt. of India into the General Crop Estimation Surveys Scheme (GCES), for estimation of crop production at national level, based on surveys and estimates by State Governments. The GCES presently covers 68 crops (52 food and 16 non-food) in 22 States and 4 Union Territories. About 500,000 CCE are conducted every year by various State Governments. The survey design adopted in GCES is a stratified, multistage, random sampling design with tehsils/talukas/blocks in a district as strata; villages within strata as primary sampling units in the first stage; two fields within each selected village as second stage sampling units; and plots in selected fields (usually of size 5 m x 5 m) as ultimate sampling units for CCE (CSO, 2007)[1]. Once the plots are selected for CCE, the estimation of crop yields is based on well-tested crop cutting experiment methodology under stringent supervision and well-defined protocols. The harvested produce is measured for biomass weight, grain weight, moisture, and other relevant correction factors for estimating final crop yield. The stratum yield is multiplied by the area of the crop in the stratum, and the stratum totals are aggregated to obtain the district level production, which is in turn aggregated to State and National levels for crop production estimation.
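The aggregation described above can be illustrated with a minimal sketch. Only the 5 m x 5 m plot size and the "stratum yield multiplied by crop area" rule come from the text. The plot weights, moisture readings, crop areas, the 14% reference moisture and all function names are hypothetical, and the moisture adjustment shown is one common form of correction, not necessarily the one prescribed in the GCES manual.

```python
PLOT_AREA_M2 = 5 * 5          # 5 m x 5 m CCE plot, as stated in the text
M2_PER_HECTARE = 10_000
STANDARD_MOISTURE_PCT = 14.0  # assumed reference moisture, for illustration only


def plot_yield_kg_per_ha(grain_kg, moisture_pct):
    """Scale one plot's grain weight to kg/ha at the assumed standard moisture."""
    dry_matter_kg = grain_kg * (100.0 - moisture_pct) / 100.0
    adjusted_kg = dry_matter_kg * 100.0 / (100.0 - STANDARD_MOISTURE_PCT)
    return adjusted_kg * M2_PER_HECTARE / PLOT_AREA_M2


def stratum_mean_yield(plot_yields):
    """Simple average of plot yields within one stratum (tehsil/taluka/block)."""
    return sum(plot_yields) / len(plot_yields)


def district_production_kg(strata):
    """Sum of (stratum mean yield x crop area) over all strata in a district."""
    return sum(mean_yield * area_ha for mean_yield, area_ha in strata)


# Hypothetical CCE results: (grain weight in kg, moisture in %) per plot
block_a_plots = [(8.0, 16.0), (7.2, 15.0), (9.1, 17.5), (8.4, 16.5)]
block_b_plots = [(5.5, 14.0), (6.3, 15.5), (4.9, 14.5), (6.0, 16.0)]

strata = []
for plots, crop_area_ha in [(block_a_plots, 12_000.0), (block_b_plots, 8_500.0)]:
    yields = [plot_yield_kg_per_ha(w, m) for w, m in plots]
    strata.append((stratum_mean_yield(yields), crop_area_ha))

print(f"District production: {district_production_kg(strata) / 1000:.0f} tonnes")
```

The same weighted sum, applied to district totals, would give State and National production figures.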

CCEs in GCES are also subject to concurrent sample checks and audits by NSSO directly, or by State Governments under the supervision/advice of NSSO. The robustness of the statistical sampling design of GCES, and the concurrent checks and audits on CCEs, have ensured that, when CCE are conducted reliably as per design and defined protocols, the standard errors in yield estimates are within an acceptable range for accurately estimating production at National (1-2%), State (1-5%) and District (8-10%) levels (Neelakantan, 2005)[2]. The scheme of crop production estimation under GCES has also been adopted by many countries in Asia, Africa and Europe, following recommendations by FAO in the 1950s, as the world standard for crop yield estimation.

CCE adaptation at Gram Panchayat (GP) level in PMFBY – limitations

Note that the design of CCEs in GCES was not intended to provide crop estimates at sub-district level, but to obtain reliable average yield estimates at National and State levels. For sub-district GP level, Prem Narain (2004)[3] suggests that the “number of crop-cuts required according to statistical considerations is at least 8”. He further suggests (with reference to the area-yield-index based National Agricultural Insurance Scheme (NAIS) introduced in 1999, a forerunner to PMFBY): “special studies should be taken up by the National Statistical Office to develop appropriate ‘Small Area Estimation’ techniques for this purpose”. It may be noted that NAIS had initially stipulated a minimum of 8 CCEs if the insurance unit is a GP, which meets the above statistical requirement. However, the modified NAIS (mNAIS) of 2014 arbitrarily changed the minimum CCE requirement to ‘4 for major crops, and 8 for minor crops’.

In the PMFBY scheme, average crop yield in a GP-area is estimated from only 4 randomly sampled CCE per GP, instead of the more statistically robust 8 CCE stipulated in NAIS. The protocols for conducting CCE in fields are directly adopted from GCES. However, the concurrent supervision, checks and audits of CCE required under GCES are not present in PMFBY. Further, conducting the statistically required number of CCE, strictly as per GCES protocols, in 250,000 GPs is far beyond available resources and physical feasibility. Thus, both sampling and non-sampling errors from CCE can be significantly high in PMFBY and increase basis risk for farmers. As a result, individual farmers can suffer negative impacts on incomes and livelihoods despite insuring their crops. Such losses can erode farmers' confidence in, and acceptance of, crop insurance. This can be disastrous not only for farmers, but also for the insurance industry. These factors limit the applicability of CCE based GP-average-yield estimation for reliable crop insurance liability assessment. Alternative methods are needed to overcome this problem (Prem Narain, 2004). Similar observations about insurance assessments based on CCE at GP level were also made by the IIM Ahmedabad Committee that reviewed the PMFBY in 2018 (IIM Ahmedabad, 2018)[4].
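The effect of moving from 8 to 4 CCE per GP can be illustrated with the textbook formula for the relative standard error of a sample mean under simple random sampling, RSE = CV / sqrt(n). This is a minimal sketch only: the 30% coefficient of variation (CV) of yield among fields in a GP and the 10% target are assumed values, not figures from the studies cited here. The finite population correction and non-sampling errors are ignored.

```python
import math


def relative_standard_error(cv_pct, n):
    """Relative standard error (%) of the mean of n independent CCE."""
    return cv_pct / math.sqrt(n)


def min_sample_size(cv_pct, target_rse_pct):
    """Smallest n whose relative standard error does not exceed the target."""
    return math.ceil((cv_pct / target_rse_pct) ** 2)


ASSUMED_CV_PCT = 30.0  # hypothetical within-GP yield variability

for n in (4, 8):
    rse = relative_standard_error(ASSUMED_CV_PCT, n)
    print(f"{n} CCE per GP: relative standard error = {rse:.1f}%")

print("CCE needed for a 10% target:", min_sample_size(ASSUMED_CV_PCT, 10.0))
```

Under these assumptions the relative standard error is 15% with 4 CCE and about 10.6% with 8 CCE. Halving the sample size therefore inflates the sampling error by a factor of about 1.4, whatever the actual CV.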

Push for reducing CCE by smart sampling in PMFBY

Recognizing the inadequacies in the present model of CCE in PMFBY, the Ministry of Agriculture explored ways to reduce the sample size, for better implementation of CCE, through smart sampling that improves the precision of the GP-area yield estimated from CCE. The most effective way to do this is to exploit available information on one or more auxiliary variables correlated with the GP-area-yield. The best options for relevant auxiliary variables for GP-level yield assessments come from recent advances in high resolution satellite imagery and technology, drone imagery, and digital connectivity through mobiles, when used together with advances in machine learning and analytics. These developments have opened opportunities for improved precision in stratification for sampling, as well as for direct GP-area-yield estimation. However, it is important to evaluate the capabilities of these technologies in mirroring the real-time crop yield distribution among individual farms at the IU/GP level.

The Ministry of Agriculture commissioned a number of pilot studies in 2019, 2020 and 2021, involving government agencies, Agritech start-ups, and insurance companies as partners, to explore the feasibility of adapting emerging imaging technologies, modelling, and data analytics for expediting crop yield assessment by rationally reducing the number of CCE to manageable levels, and also by directly estimating area-yield at IU/GP level. This was to be done by: (i) combining smart sampling[5] approaches with complementary data and information from alternate sources of crop health indicators that capture multiple adverse risks like dry spells, drought, pest attacks, etc., without impacting the quality of field sampling for average yield estimation, and (ii) directly estimating average crop yield at IU/GP level using new technologies. Towards this end, all contract agencies were mandated to adopt a consistent two-step yield estimation approach:

  1. Derive a scientifically designed objective smart sampling strategy to optimize numbers and field locations of CCE to better reflect the actual crop yield distribution in the IU/GP than random sampling, and conduct CCE in a transparent manner (using digital photo/video recording on mobiles) to eliminate human bias and moral hazard.
  2. Develop a scientific protocol to reduce the number of CCE to manageable levels by: (i) developing high resolution pixel level yield proxy maps (weather indices, vegetation indices) at district level; (ii) classifying the district level maps into four classes of equal frequency at Block level, categorized as normal, mild, moderate and severe, each with a relatively homogeneous yield/yield proxy index distribution; and (iii) carrying out a limited number of CCE at locations randomly selected at Block level in Blocks classified as ‘normal’ or ‘mild’, and 4 CCE per GP as per PMFBY protocol in Blocks classified as ‘moderate’ or ‘severe’, with locations chosen as per (1) above.
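The classification and sampling steps in the two-step approach can be sketched as follows. This is a hedged illustration, not the protocol used by the contract agencies. It splits hypothetical field-level proxy values into four equal-frequency classes using quartiles, then draws CCE locations randomly and in proportion to the size of each class. The field identifiers, proxy values, function names and the mapping of the lowest quartile to "severe" are all assumptions.

```python
import random
import statistics


def classify_equal_frequency(proxy_by_unit):
    """Label each unit by the quartile of its yield proxy (lowest = 'severe')."""
    q1, q2, q3 = statistics.quantiles(list(proxy_by_unit.values()), n=4)

    def label(value):
        if value <= q1:
            return "severe"
        if value <= q2:
            return "moderate"
        if value <= q3:
            return "mild"
        return "normal"

    return {unit: label(value) for unit, value in proxy_by_unit.items()}


def proportional_allocation(stratum_sizes, total_samples):
    """Split total_samples across strata in proportion to size (largest remainder)."""
    total = sum(stratum_sizes.values())
    quotas = {s: total_samples * n / total for s, n in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total_samples - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def smart_sample(proxy_by_unit, total_samples, seed=0):
    """Stratify units by proxy class, then sample randomly within each class."""
    rng = random.Random(seed)
    strata = {}
    for unit, cls in classify_equal_frequency(proxy_by_unit).items():
        strata.setdefault(cls, []).append(unit)
    sizes = {cls: len(units) for cls, units in strata.items()}
    alloc = proportional_allocation(sizes, total_samples)
    return {cls: sorted(rng.sample(units, alloc[cls])) for cls, units in strata.items()}


# Hypothetical seasonal vegetation index values for 20 fields
fields = {f"F{i:02d}": 0.20 + 0.03 * i for i in range(20)}
print(smart_sample(fields, total_samples=8, seed=42))
```

Because the classes are of equal frequency, proportional allocation gives each class the same number of CCE. An unequal allocation, for example heavier sampling in the "severe" class, would be a different design choice.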

Though the two-step method has been shown to reduce the number of CCE by 20-30%, it has the following limitations:

  1. Even after the estimated reduction in CCE numbers, the number of CCEs to be carried out will still remain enormous.
  2. The method is in conflict with the definition of the Insurance Unit as the GP-area in PMFBY. In Blocks classified as normal or mild, no CCE may be conducted at all in some GPs, while in others only one or two CCE may be conducted. This may be unacceptable to both farmers and insurance companies.
  3. In moderate to severe Blocks, the high sampling error concerns of only 4 CCE/GP discussed above may be further exacerbated, as yield variability among farms will be high in adverse conditions.
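The scale behind the first limitation can be checked with simple arithmetic on figures quoted in this note (250,000 GPs, 4 CCE per GP, a 20-30% reduction, and about 500,000 CCE per year under GCES). The sketch below makes the simplifying assumption of one notified crop per GP in one season, so it is a lower bound for illustration rather than an official estimate. The function name is hypothetical.

```python
GP_COUNT = 250_000           # number of GPs, as quoted in the text
CCE_PER_GP = 4               # PMFBY minimum per GP
GCES_CCE_PER_YEAR = 500_000  # approximate annual CCE under GCES, from the text


def cce_required(gp_count, cce_per_gp, reduction_pct=0):
    """Total CCE for one crop and season after a percentage reduction."""
    return gp_count * cce_per_gp * (100 - reduction_pct) // 100


baseline = cce_required(GP_COUNT, CCE_PER_GP)
print("Baseline:", baseline)  # 1000000
for pct in (20, 30):
    reduced = cce_required(GP_COUNT, CCE_PER_GP, pct)
    print(f"After {pct}% reduction: {reduced}, ratio to GCES = {reduced / GCES_CCE_PER_YEAR:.1f}")
```

Even at a 30% reduction, a single crop and season would need 700,000 CCE under this assumption, which is more than the whole of GCES conducts in a year across all crops.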

Need for an exclusively technology driven approach to estimate crop yield or alternative index for crop insurance

It is clear from the above, and from the results of the MoA sponsored pilot studies, that smart sampling neither reduces CCE numbers enough for them to be managed effectively, nor significantly reduces the sampling and non-sampling errors in yield assessment at GP-area level. Thus, for implementing the PMFBY’s GP-area-index model, it is more scientific to focus exclusively on a technology driven approach, based on auxiliary variables derived from high resolution RS time series data and machine learning or other analytics tools. The IIM Ahmedabad (2018) report had also recommended an exclusively technology driven approach in its review of the PMFBY scheme, calling for remote sensing satellite technology-driven GP-area crop yield assessment with “minimum human involvement”.

In the technology driven approach, high resolution satellite imagery is used to: (i) estimate the GP-average of yield proxies from corresponding individual pixel level yield proxies like Biomass, NDVI, EVI, LAI, NWSI or other indices of crop condition, and/or (ii) directly determine GP-average crop yield from individual pixel level yield estimates derived from the yield proxy indices using ML and other analytics tools, for direct use in the PMFBY area-yield-index model. Since both yield proxies and estimated yield are assessed per pixel and averaged over all identified crop pixels in the GP (instead of averaging the yield from 4 random CCE, as in the present PMFBY model), the sampling errors in area-index estimation can be reduced significantly, leading to a reduction in basis risk for farmers.
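Option (i), averaging a pixel level proxy over all crop pixels in a GP, can be sketched as below, using NDVI as the proxy with its standard definition (NIR - Red) / (NIR + Red). The pixel records, GP identifiers and reflectance values are hypothetical. A real workflow would read raster bands and a crop mask rather than a list of tuples.

```python
from collections import defaultdict


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel (needs nir + red != 0)."""
    return (nir - red) / (nir + red)


def gp_average_ndvi(pixels):
    """Mean NDVI per GP over crop pixels only.

    pixels: iterable of (gp_id, is_crop, nir_reflectance, red_reflectance).
    Non-crop pixels and pixels with zero total reflectance are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for gp_id, is_crop, nir, red in pixels:
        if not is_crop or nir + red == 0:
            continue
        sums[gp_id] += ndvi(nir, red)
        counts[gp_id] += 1
    return {gp_id: sums[gp_id] / counts[gp_id] for gp_id in sums}


# Hypothetical pixels from two GPs; the third GP-A pixel is not cropland
sample_pixels = [
    ("GP-A", True, 0.60, 0.20),
    ("GP-A", True, 0.55, 0.15),
    ("GP-A", False, 0.30, 0.28),
    ("GP-B", True, 0.35, 0.25),
    ("GP-B", True, 0.40, 0.22),
]
for gp_id, mean_ndvi in gp_average_ndvi(sample_pixels).items():
    print(gp_id, round(mean_ndvi, 3))
```

Option (ii) would replace `ndvi` with a per-pixel yield model and average its output in the same way.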

However, it is important to recognize that ML algorithms learn from data, and more and better data lead to better predictions. Training data is used to build the model in the first place. Input data is then fed to the model to make a prediction, and feedback data is used to improve the prediction over time, as the model learns with each new set of data. It is therefore not surprising that in nearly all pilot studies sponsored by the Ministry, the RMSE for yield predictions using various ML and other algorithms ranged from 10% to 70%. At this level of error, even with smartly sampled CCE, the requirements of crop insurance liability assessment for yield accuracy and consistency will not be satisfied. More training data is necessary to improve prediction accuracy.
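For reference, a percentage RMSE of the kind quoted above can be computed as in the following minimal sketch. The text does not say how the pilot studies normalised RMSE, so dividing by the mean observed (CCE) yield is an assumption here. The yield values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted yields."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed yield (assumed normalisation)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Hypothetical CCE yields and model predictions for five GPs (kg/ha)
cce_yield = [3200.0, 2800.0, 1500.0, 3600.0, 2100.0]
model_yield = [2900.0, 3100.0, 2200.0, 3300.0, 2500.0]
print(f"RMSE = {rmse(cce_yield, model_yield):.0f} kg/ha "
      f"({rmse_percent(cce_yield, model_yield):.1f}% of mean observed yield)")
```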

Clearly, the technology driven approach will need large training data sets generated from historical and current season CCE before yield prediction accuracies can improve to acceptable levels. Therefore, the need for CCE will not be eliminated with a technology-driven approach. The difference in this case is that CCE will provide the data for training sets. Smart sampling for CCE can then be limited to generating field data for ground truth and for training machine learning algorithms. As the ML algorithms learn from the data they become more efficient and will progressively need fewer CCE over time. Further, high resolution real-time and archival RS data from multiple sources for crop health monitoring can make claims traceable and insurance payout decisions accountable, leading to gains in farmer trust and confidence.

The technology based pilot studies carried out under the PMFBY provide valuable insights into national capacities and limitations in testing and scaling high resolution satellite technologies, crop models, machine learning and AI for crop yield estimation at GP level. For supporting field level decisions on insurance in complex landscapes, deeper and more systematic approaches, subject to verifiable standards of rigour in the methods used, are needed to gain confidence that they can accurately capture yield losses under adverse conditions, when farmers suffer the most. Strategically improving the use of time series high resolution satellite data, collecting reliable ground reference data on crop types and yields with CCEs and other means, evaluating prediction accuracies of machine learning algorithms and models, and archiving relevant data systematically in a freely accessible form would facilitate this task.

Scaling technology based GP level yield estimation to National level

The technology based approaches are presently at the proof-of-concept stage. Scaling them to crop-specific national scale agricultural monitoring (~250,000 GPs) for diverse crops and seasons requires a paradigm shift in the use of technologies. The paradigm shift is from scattered pilot studies by multiple private agencies towards adoption of: (i) a single national scale big data technology platform, based on cloud computing architectures, on which ML/AI tools are developed to store and process multi-source, high resolution, time series RS data and automate its conversion into analysis ready data (ARD) for use by implementing agencies and others, and (ii) ML and AI algorithms that can be trained progressively with concurrent field survey data to generate more accurate, reliable, and periodic (monthly) cropland maps and GP-average crop yields. The accuracy and efficiency of the processes will improve as more data (both archival and current season) come into play and the ML/AI algorithms for cropland mapping, yield estimation, and analysis learn from them. In due course the learning processes involved can be automated with AI tools, and the number of CCE reduced to what is needed to provide training datasets for the algorithms, rather than for GP-area sampling for average yield.

A large number of scientific papers and a few operational cloud platforms have demonstrated the potential of combining high resolution remote sensing time series big data and ML/AI based analytics to generate ARD for cropland monitoring, in support of crop insurance. However, only a few operational applications of remote sensing for insurance exist, and on a limited scale. The research and pilot studies typically demonstrate technical possibilities, without considering the implications of scaling the models to countrywide coverage in a sustainable way. Nevertheless, the feasibility of establishing such an open source cloud based platform that generates ARD for cropland mapping and vegetation condition estimation in diverse cropping systems has been demonstrated by the Sen2-Agri project (Defourny et al., 2019)[6] using Sentinel-2 (10 m) and Landsat (30 m) archival and near real-time data. In India, efforts at developing cloud based platforms with multi-source high resolution RS data and ARD for crop condition estimation for use by relevant stakeholders are still in their infancy.

Way Forward

For authentic implementation of technology-driven PMFBY, a central, autonomous technical institution is needed (in place of multiple agencies using diverse approaches) to operationalize a countrywide, open source, high resolution crop monitoring system at pixel/parcel level with specified standards. The institution must have the requisite infrastructure and multidisciplinary expertise in agricultural remote sensing; big data cloud infrastructures, technologies and standards; generation of authentic, multi-source, analysis ready, high resolution agricultural remote sensing time series data sets for crop monitoring; small area crop yield and yield loss assessments; insurance; statistics; and other relevant areas. The institution must process data sets from multiple RS sources into authenticated analysis ready datasets that can be made available in the open domain for all parties involved in crop insurance, namely insurance companies, state governments, scientists, and farmers, to make independent assessments and verifications.

The IIM Ahmedabad led Committee (IIM Ahmedabad, 2018)[7] on performance evaluation of PMFBY had also recommended that technology led crop yield loss assessments for insurance payouts should be entrusted to “some scientific institution of high repute, capability and intent such as the National Remote Sensing Centre (NRSC) of Indian Space Research Organization (ISRO)….. No less credible, talented and independent an institution, which is also devoid of conflicts of interest can be trusted to carry this out. Certainly, no routine bureaucratic, quasi-scientific organization is capable of being an effective conduit”. The present MNCFC of MoA in Delhi, which is engaged exclusively in multi-source high resolution RS based crop forecasting research, can be upgraded to such a larger autonomous institution. The upgraded institution can be renamed the National Institute for Crop Forecasting and Insurance (NICFI)[8].

Note that investments in NICFI are relevant not only to crop insurance, but can simultaneously benefit the agriculture/agribusiness sector in India as a whole through the concurrent value added benefits of high resolution crop monitoring to a range of end users. Examples include: (i) crop area and production monitoring to complement GCES; (ii) more effective drought assessments for crop and water resources management; (iii) more precise agro-advisories to farmers by public and private agencies for crop and land management, including for precision agriculture; (iv) information to agribusinesses for efficient management of their supply chains; and (v) precise long term high resolution data to scientists and policy makers for addressing climate change related concerns, among many others. Thus, investments through NICFI in national agricultural big data infrastructure and technologies, and in the associated institutional structures and systems to implement index insurance more effectively, will be investments in advancing Indian agriculture as a whole and in risk-proofing farmer investments and livelihoods[9].

[1] Central Statistical Organization (2007) Manual on Crop Area and Production Statistics, Central Statistical Organization, Govt. of India, 110 pages.

[2] Neelakantan, M. (2005) Quality aspects of crop statistics in India, problems and prospects, Dr. V.G. Panse Memorial Lecture delivered during the 58th Annual Conference of the Indian Society of Agricultural Statistics held at Central Marine Fisheries Research Institute (I.C.A.R.), Tatapuram P.O., Kochi, on January 20, 2005; Journal of the Indian Society of Agricultural Statistics, 59(1), 7-2.

[3] Prem Narain (2004) Estimation of Crop Yields Revisited, Dr. V.G. Panse Memorial Lecture delivered at the 56th Annual Conference of the Indian Society of Agricultural Statistics at UAS, Dharwad (Karnataka) on December 9, 2002; Journal of the Indian Society of Agricultural Statistics, 55(3), 2002: 273-287.

[4] IIM Ahmedabad (2018) Performance Evaluation of PMFBY (Part 1): Governance Analysis, 160 pp.

[5] Smart sampling is a stratified random sampling method where a crop yield proxy variable is used to stratify the block or village into different strata of high to low yield. CCE locations are then selected randomly and proportionately from each stratum.

[6] Defourny et al. (2019) Near real-time agriculture monitoring at national scale at parcel resolution: Performance assessment of the Sen2-Agri automated system in various cropping systems around the world, Remote Sensing of Environment, 221, 551-568.

[7] IIM Ahmedabad (2018) Performance Evaluation of PMFBY (Part 1): Governance Analysis, 160 pp.

[8] Upgrading MNCFC to NICFI would be a better option than entrusting NRSC with the responsibility as suggested by the IIM Committee, for reasons of greater autonomy, exclusive focus on agriculture, and ease of providing public and private stakeholders with timely open access to not only Indian satellite data but also global public and private sources of RS data.

[9] Keeping in view the general relevance of NICFI to the agriculture sector, the institution can be given an alternative name like National Institute for Agricultural Information Services (on the lines of INCOIS of the Ministry of Earth Sciences).