Statistics on cultivated area (sown area, in hectares, ha) and production (in kg) at the departmental level from 1900 to 1988 were collected from the national agricultural statistics yearbooks (“Annual agricultural statistics” or “Annuaire of agricultural statistics”) compiled by the French Ministry of Agriculture; detailed references are provided in the supplementary information. The issues were scanned manually from photocopies of the original paper documents. Data from 1989 to 2018 were taken from the digital statistics in the Agreste database (“Annual agricultural statistics” compiled by the Statistics and Forecasting Service (SSP), General Secretariat of the Ministry of Agriculture, Agrifood and Forest (MAAF), France); details are provided in the supplementary information. Yields were calculated from total production and sown area for each department, because the yield values printed in the old yearbooks often appear to be incorrect. Yields are given in kilograms per hectare of sown area (kg/ha) and refer to dry mass with a moisture content of 10-16%, depending on the crop.
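The yield calculation described above is a simple ratio of production to sown area. A minimal sketch follows; the function name, the handling of missing or zero areas, and the example numbers are illustrative assumptions and are not taken from the dataset.

```python
from typing import Optional


def yield_kg_per_ha(production_kg: Optional[float],
                    sown_area_ha: Optional[float]) -> Optional[float]:
    """Return yield in kg/ha, or None when it cannot be computed.

    Assumption: a missing value or a non-positive sown area gives a
    missing yield rather than an error.
    """
    if production_kg is None or sown_area_ha is None:
        return None
    if sown_area_ha <= 0:
        return None
    return production_kg / sown_area_ha


# Hypothetical department-year: 3,600,000 kg harvested from 1,200 ha sown.
print(yield_kg_per_ha(3_600_000.0, 1_200.0))  # 3000.0
```

Deriving yields this way keeps the three variables mutually consistent, which a printed yield column cannot guarantee.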
Data are available for ten crops: soft wheat (spring and winter separately), durum wheat, maize, oats (spring and winter), rapeseed (spring and winter), barley (spring and winter), potato, sugar beet, sunflower and wine. The division into spring and winter crops ultimately results in 18 distinct crop-cultivar types. The periods with available data and the correspondence between the French and English crop names are provided in Table 1.
The boundaries of the French departments have changed over time. We use the 96 departments of metropolitan France in their present form and map the historical values to the modern departments as follows. Corsica was a single department until 1975 and was then split into Corse-du-Sud and Haute-Corse. Data for Corsica up to 1975 were divided equally between the two new departments (area, production) or copied to both (yield). Seine and Seine-et-Oise were two departments until 1967 and were subdivided into seven new departments on January 1, 1968. To account for this, we report values for the seven new departments (Essonne, Hauts-de-Seine, Paris, Seine-Saint-Denis, Val-de-Marne, Val-d’Oise, Yvelines) only from 1968 onwards and combine the two former departments into a single counterfactual department (“Seine_SeineOise” in the data tables) until 1967.
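The mapping of historical to modern departments can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the record layout, the label "Corse" for the pre-split department, and the choice to sum area and production when combining Seine and Seine-et-Oise are assumptions. Missing values are not handled here.

```python
CORSICA_NEW = ("Corse-du-Sud", "Haute-Corse")
SEINE_OLD = ("Seine", "Seine-et-Oise")


def harmonize(rows):
    """Map historical department records onto the modern layout.

    rows: list of dicts with keys department, year, area_ha, production_kg
    (hypothetical layout). Returns a new list; the input is not modified.
    """
    out = []
    merged = {}  # year -> [area, production] of the combined Seine unit
    for r in rows:
        dept, year = r["department"], r["year"]
        if dept == "Corse" and year <= 1975:
            # Split area and production equally between the two successors.
            for new_name in CORSICA_NEW:
                out.append({"department": new_name, "year": year,
                            "area_ha": r["area_ha"] / 2,
                            "production_kg": r["production_kg"] / 2})
        elif dept in SEINE_OLD and year <= 1967:
            # Assumption: "combine" means summing area and production.
            acc = merged.setdefault(year, [0.0, 0.0])
            acc[0] += r["area_ha"]
            acc[1] += r["production_kg"]
        else:
            out.append(dict(r))
    for year, (area, prod) in sorted(merged.items()):
        out.append({"department": "Seine_SeineOise", "year": year,
                    "area_ha": area, "production_kg": prod})
    return out
```

Because area and production are both halved for Corsica, their ratio is unchanged, which matches the statement that yields are copied to the two new departments.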
Multiple cropping within a year among this set of crops is accounted for by separate area data, but it is practically non-existent in France⁶.
Some yield values had to be treated as outliers, even after checking for scanning errors. Four criteria were used to define an outlier. First, absolute yield values above a physiologically unattainable threshold were removed; the thresholds were 15 t/ha for barley and durum wheat, 200 t/ha for sugar beet and potato, 20 t/ha for maize, oats and wheat, 10 t/ha for rapeseed and sunflower, and 200 hl/ha for wine. These thresholds were chosen to eliminate visually obvious outliers, which are likely due to inconsistencies between the area and production records. They are set slightly above the maximum yields currently achieved, so this first step remains permissive and removes only obvious errors. In addition, all yield values for winter rapeseed in 1944, spring rapeseed in 1968 and spring barley in 1980 were deleted because of erroneous values in the yearbooks. This first step removed a total of 167 yield data points. Second, the top 1% of yield values across all departments in each decade were removed. Third, values outside the mean ± four standard deviations of each crop-department time series (for yield, area and production separately) were removed. Fourth and finally, a variance filter similar to that of the third step was applied within each decade of a single time series, removing values outside the decadal mean ± two (yield) or three (area, production) decadal standard deviations. The last three filters removed, on average, 3.6% of the yield data and 0.2% of the area and production data (Table 1). The median number of outliers per department was 43 (out of 1,260 data points on average), ranging from 4 (Hauts-de-Seine) to 255 (Nord), with an interquartile range of 35 to 50 outliers. Outliers were masked as missing values to avoid introducing bias through any correction.
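The first, third and fourth filters can be sketched as below. This is one possible reading of the description, not the authors' implementation. The function names, the use of the sample standard deviation, and the definition of a decade as calendar years sharing the same tens digit (for example 1950 to 1959) are assumptions. The thresholds are those quoted above, converted from t/ha to kg/ha; wine (in hl/ha) and the second filter (top 1% per decade) are omitted.

```python
from statistics import mean, stdev

# Step 1 thresholds from the text, converted to kg/ha.
MAX_YIELD_KG_HA = {
    "barley": 15_000, "durum wheat": 15_000,
    "sugar beet": 200_000, "potato": 200_000,
    "maize": 20_000, "oats": 20_000, "wheat": 20_000,
    "rapeseed": 10_000, "sunflower": 10_000,
}


def mask_above_threshold(values, limit):
    """Step 1: replace values above the absolute limit with None."""
    return [None if v is not None and v > limit else v for v in values]


def mask_sd_outliers(values, k):
    """Step 3: mask values farther than k standard deviations from the mean.

    Assumption: mean and sample standard deviation are computed over the
    non-missing values of the series passed in.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return list(values)
    m, s = mean(present), stdev(present)
    return [None if v is not None and abs(v - m) > k * s else v
            for v in values]


def mask_decadal_sd_outliers(years, values, k):
    """Step 4: apply the same filter separately within each decade."""
    out = list(values)
    decades = {}
    for i, y in enumerate(years):
        decades.setdefault(y // 10 * 10, []).append(i)
    for idx in decades.values():
        masked = mask_sd_outliers([out[i] for i in idx], k)
        for i, v in zip(idx, masked):
            out[i] = v
    return out
```

Following the text, a yield series would be passed through `mask_sd_outliers` with `k=4` and then through `mask_decadal_sd_outliers` with `k=2`, while area and production series would use `k=3` in the decadal step. Masked entries stay as `None` rather than being replaced by corrected values.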
In the attached datasets we provide two versions of the full dataset, one without any corrections (“RAW”) and one where the filters described above have been applied (“FILTERED”).
Aggregated national data on area, production and yield from our dataset were validated against national data from 1961 to 2018 provided by FAO (http://faostat3.fao.org/home/E). Area and production data for crops with separate spring and winter data were summed at the department level to test their agreement with the digitized area and production data for the “total” crop.
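The spring-plus-winter consistency test can be illustrated with a small check. The function names and the 1% tolerance are hypothetical choices for this sketch; the text does not state which agreement criterion was used.

```python
def relative_difference(spring, winter, total):
    """Return (spring + winter - total) / total, or None if not computable."""
    if spring is None or winter is None or total is None or total == 0:
        return None
    return (spring + winter - total) / total


def is_consistent(spring, winter, total, tolerance=0.01):
    """True when the summed sub-crops match the total within the tolerance.

    Assumption: the 1% default tolerance is illustrative only.
    """
    diff = relative_difference(spring, winter, total)
    return diff is not None and abs(diff) <= tolerance


# Hypothetical department-year areas in ha.
print(is_consistent(200.0, 800.0, 1000.0))  # True
print(is_consistent(200.0, 900.0, 1000.0))  # False
```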