The protocol

This protocol intends to provide a common basis for model testing, to make results more comparable and to ease their analysis. Tests are proposed on catchments showing non-stationarities. Participants should follow this common evaluation protocol on the study cases they run. Everyone is of course free to do more than what is advised in this protocol and to apply complementary methods. Running only one study case is enough to have results considered in the common assessment (of course, the more the better!).

Since the implementation of some models requires more work to perform the proposed experiments, we define three testing “levels”. Levels 1 and 2 correspond to common evaluation techniques. Level 3 is the most advanced part and should be considered the main objective of the workshop. In this third level, participants are free to use the methods they consider most appropriate. The workshop is designed for hydrologic models running at a daily time step; all criteria will be calculated on the daily discharge time series.

Who can participate?

Basically, anyone with a hydrologic model that can produce daily discharge from precipitation and temperature (or potential evapotranspiration) time series. Level 2 of the protocol will, of course, not be accessible to models that do not need calibration. Since we cannot provide the detailed information about vegetation, elevation or physical characteristics needed by some models (e.g. physically based models), such models can only be tested on basins already studied by the participants in previous works. Information can also be collected from the literature.


After applying the testing protocol, participants are expected to send the results of their discharge simulations to Guillaume Thirel. We will calculate the criteria ourselves to ensure their comparability, although participants are welcome to calculate them as well.


Protocol description


In the following, the terms “complete period” and “pre-defined calibration periods” refer to periods defined in each metadata file. Meteorological data are available for each case study over a couple of years before the beginning of the “complete period”, for initialization/warm-up purposes.



  • Level 1 (beginner): Single model parameterization on the whole period

Define a single parameterization of your model for the complete period, and submit:

  1. The simulated discharge time series
  2. The efficiency criteria computed on the 5 sub-periods specified in the metadata file (note: if you provide us with the simulated discharge time series, we can calculate the scores)


Calibration period: the complete period (Cp)

Criterion computation periods (‘validation periods’): the 5 sub-periods P1 to P5

Note: Level 1 is suited to all participants: model parameters can be either calibrated or defined using catchment descriptors.
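To make the Level 1 procedure concrete, here is a minimal Python sketch (the index-based period boundaries and the example criterion are our own illustrative assumptions; the protocol does not prescribe how periods are represented):

```python
import numpy as np

def level1_scores(obs, sim, sub_periods, criterion):
    """One simulation of the complete period, one score per sub-period.

    sub_periods: list of (start, end) index pairs (end exclusive) marking
                 P1..P5 within the complete-period daily series.
    criterion  : function (obs, sim) -> score, e.g. NSE or bias.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return [criterion(obs[a:b], sim[a:b]) for a, b in sub_periods]

# Toy example: relative bias on a 10-day record split into two sub-periods.
rel_bias = lambda o, s: float(np.mean(s) / np.mean(o) - 1.0)
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0, 1.0]
sim = [1.1, 2.2, 3.3, 4.4, 5.5, 5.0, 4.0, 3.0, 2.0, 1.0]
print(level1_scores(obs, sim, [(0, 5), (5, 10)], rel_bias))  # ≈ [0.1, 0.0]
```

The same call works with any scalar criterion, so the scores for the 5 sub-periods come from a single run over the complete period.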



  • Level 2 (normal): Multi-parameterization on sub-periods

After performing the Level 1 exercise, Level 2 consists in applying the model to simulate the flow time series over the complete period with 5 different parameterizations. Each parameterization should be obtained through calibration on one of the 5 pre-defined periods. We only impose the calibration periods (see the metadata files, periods P1 to P5). Participants are free to use the calibration procedure and criteria they find most appropriate.

Each period yields a parameter set representative of the behaviour of the study catchment during this time interval. For each of the 5 sub-periods, we ask you to perform a calibration, and then to simulate the whole time period. We will then compute a 5×5 matrix of efficiency criteria corresponding to cross-validation on each sub-period:


                              Calibration period
                          P1     P2     P3     P4     P5
Criterion            P1  C1,1   C1,2   C1,3   C1,4   C1,5
computation          P2  C2,1   C2,2   C2,3   C2,4   C2,5
period               P3  C3,1   C3,2   C3,3   C3,4   C3,5
(‘validation         P4  C4,1   C4,2   C4,3   C4,4   C4,5
period’)             P5  C5,1   C5,2   C5,3   C5,4   C5,5
  • where Ci,j represents the value of the given criterion, computed on period Pi, with the simulation obtained using the parameters calibrated on period Pj. The diagonal contains criterion values in calibration; all other cells contain values in validation.
  • Based on the 5 simulated time series produced by each participant, we will also compute the evolution of a few flow quantiles, as summarized in Figure 1.
  • For arid or small catchments, we will calculate the frequency of days with simulated discharge lower than 5% of the average observed discharge.


Figure 1: Summary of the procedure used to generate the evaluation matrix
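The construction of the evaluation matrix can be sketched as follows (a minimal Python illustration; the `simulate` callback standing in for a calibrated model run is a hypothetical placeholder, since calibration itself is model-specific and left to each participant):

```python
import numpy as np

def cross_validation_matrix(obs, periods, simulate, criterion):
    """Build the matrix C with C[i, j] = criterion on period Pi for the
    simulation run with the parameter set calibrated on period Pj.

    periods  : list of (start, end) index pairs for P1..Pn
    simulate : j -> simulated series over the complete period, obtained with
               the parameters calibrated on Pj (model-specific, assumed given)
    criterion: function (obs, sim) -> score
    """
    obs = np.asarray(obs, float)
    n = len(periods)
    C = np.empty((n, n))
    for j in range(n):
        sim = np.asarray(simulate(j), float)  # one model run per parameter set
        for i, (a, b) in enumerate(periods):
            C[i, j] = criterion(obs[a:b], sim[a:b])
    return C

# Toy check with a fake "model" whose parameter set j adds a bias of 0.1 * j.
obs = np.linspace(1.0, 10.0, 10)
rmse = lambda o, s: float(np.sqrt(np.mean((o - s) ** 2)))
C = cross_validation_matrix(obs, [(0, 5), (5, 10)], lambda j: obs + 0.1 * j, rmse)
print(np.round(C, 3))
```

Note that only n model runs are needed for the n×n matrix: each calibrated parameter set is run once over the complete period and then evaluated on every sub-period.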



  • Level 3 (expert): Improvement of model behaviour in non-stationary conditions

The methodologies to be applied in the third level are left to the choice of the participants. The objective is to improve the ability of the model to cope with non-stationary conditions, i.e. to improve the efficiency matrix calculated in Level 2. Be creative and try innovative solutions to the problems you observe.

This may include multi-criteria calibration methods (Guinot et al., 2011; Madsen, 2000; Tiemeyer et al., 2007), or decisions on the choice of parameter sets to address issues such as equifinality and parameter interdependency (Andréassian et al., 2011; Beven, 2006; Beven and Binley, 1992). On catchments with a non-stationary land cover, explicitly accounting for it in the model can be effective. On catchments experiencing a warming trend, identifying a relationship between parameter values and climate variables could be tried.

These are only suggestions, better ideas can obviously be found!



Evaluation criteria (remember that we can compute them for you if you send us the simulated discharges):


  • calculate the bias (over the 5 pre-defined periods)
  • calculate the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970) on discharges (a score sensitive to high flows) and on inverse discharges (sensitive to low flows; see the definition of NSEiQ in Pushpalatha et al., 2012)
  • calculate the simulated quantiles Q0.95 (the quantile exceeded only 5% of the time), Q0.85, Q0.15 and Q0.05
  • for arid or small catchments, calculate the frequency of days with simulated discharge lower than 5% of the average observed discharge
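These criteria can be sketched in Python as follows (a non-authoritative illustration: the small constant added before inverting discharges is our assumption for handling zero-flow days, not a value prescribed by the references):

```python
import numpy as np

def bias(obs, sim):
    """Relative bias: mean(sim) / mean(obs) - 1 (0 means no volume error)."""
    return float(np.mean(sim) / np.mean(obs) - 1.0)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2))

def nse_inverse(obs, sim, eps=None):
    """NSE on inverse discharges, emphasising low flows (in the spirit of
    NSEiQ; the default eps of one hundredth of the mean observed flow is an
    assumption made here to avoid division by zero on dry days)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    if eps is None:
        eps = float(np.mean(obs)) / 100.0
    return nse(1.0 / (obs + eps), 1.0 / (sim + eps))

def flow_quantiles(sim, probs=(0.95, 0.85, 0.15, 0.05)):
    """Empirical discharge quantiles; Q0.95 is exceeded only 5% of the time."""
    return {p: float(np.quantile(sim, p)) for p in probs}

def low_flow_frequency(obs, sim, fraction=0.05):
    """Frequency of days with simulated discharge below `fraction` times the
    average observed discharge (for arid or small catchments)."""
    return float(np.mean(np.asarray(sim, float) < fraction * np.mean(obs)))

# Example on a short synthetic record including near-zero flows.
obs = np.array([0.0, 0.1, 0.5, 2.0, 5.0, 3.0, 1.0, 0.4, 0.2, 0.05])
sim = np.array([0.05, 0.1, 0.6, 1.8, 4.8, 3.2, 0.9, 0.3, 0.1, 0.02])
print(round(bias(obs, sim), 3), round(nse(obs, sim), 3), flow_quantiles(sim))
```

In practice these functions would be applied to each of the 5 pre-defined periods, matching the bullet list above.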



References

Andréassian, V., N. Le Moine, C. Perrin, T. Jakeman, M.-H. Ramos, L. Oudin, T. Mathevet and J. Lerat (2011). All that glitters is not gold: the case of hydrological models' calibration. Hydrological Processes, 26(14), 2206–2210.

Beven, K.J. (2006). A manifesto for the equifinality thesis. Journal of Hydrology, 320, 18–36.

Beven, K.J. and A.M. Binley (1992). The future of distributed models: model calibration and uncertainty prediction. Hydrological Processes, 6, 279–298.

Guinot, V., B. Cappelaere, C. Delenne and D. Ruelland (2011). Towards improved criteria for hydrological model calibration: theoretical analysis of distance- and weak form-based functions. Journal of Hydrology, 401, 1–13.

Madsen, H. (2000). Automatic calibration of a conceptual rainfall-runoff model using multiple objectives. Journal of Hydrology, 235, 276–288.

Nash, J.E. and J.V. Sutcliffe (1970). River flow forecasting through conceptual models. Part I. A discussion of principles. Journal of Hydrology, 10, 282–290.

Pushpalatha, R., C. Perrin, N. Le Moine and V. Andréassian (2012). A review of efficiency criteria suitable for evaluating low-flow simulations. Journal of Hydrology, 420–421, 171–182.

Tiemeyer, B., R. Moussa, B. Lennartz and M. Voltz (2007). MHYDAS-DRAIN: A spatially distributed model for small, artificially drained lowland catchments. Ecological Modelling, 209(1), 2–20.