This protocol aims to provide a common basis for model testing, to make results more comparable and to ease their analysis. Tests are proposed on catchments showing non-stationarities. Participants should follow this common evaluation protocol on the case studies they run. Everyone is of course free to go beyond what is advised in this protocol and to apply complementary methods. Running a single case study is enough to have results considered in the common assessment (of course, the more the better!).
Since performing the proposed experiments requires more implementation work for some models, we define three testing “levels”. Levels 1 and 2 correspond to common evaluation techniques. Level 3 is the most advanced part and should be considered the main objective of the workshop. In this third level, participants are free to use the methods they find most appropriate. The workshop is designed for hydrologic models running at a daily time step. All the criteria will be calculated on the daily discharge time series.
Who can participate?
Basically, anyone with a hydrologic model that can produce daily discharge from precipitation and temperature (or potential evapotranspiration) time series. Of course, Level 2 of the protocol will not be accessible to models that do not need calibration. Since we cannot provide detailed information about vegetation, elevation or other physical characteristics needed by some models (e.g. physically-based models), such models can only be tested on basins that the participants have already studied in previous work. Information can also be collected from the literature.
After applying the testing protocol, participants are expected to send the results of their discharge simulations to Guillaume Thirel. We will also calculate the criteria ourselves to ensure that they are comparable; participants remain free to compute them as well.
Protocol description
In the following, the terms “complete period” and “predefined calibration periods” refer to periods defined in each metadata file. For each case study, meteorological data are available for a couple of years before the beginning of the “complete period” for initialization/warm-up purposes.
Evaluation:
• Level 1 (beginner): Single model parameterization on the whole period
Define a single parameterization of your model for the complete period, and submit:
• The simulated discharge time series
• The efficiency criteria computed on the 5 subperiods specified in the metadata file (Note: if you provide us with the simulated discharge time series, we can calculate the scores)

| Criterion computation period (‘validation period’) | Calibration period: Complete period |
|---|---|
| P1 | C_{1} |
| P2 | C_{2} |
| P3 | C_{3} |
| P4 | C_{4} |
| P5 | C_{5} |
| Complete period | C_{p} |
Note: Level 1 is suitable for all participants: model parameters can be either calibrated or defined using catchment descriptors.
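The Level 1 criteria can be computed by slicing the observed and simulated series on each subperiod. The sketch below is only an illustration: the function names (`select_period`, `criteria_by_period`, `volume_ratio`), the period dates and the discharge values are hypothetical placeholders. The real period boundaries are those given in each metadata file, and any of the evaluation criteria of this protocol can be passed in place of the placeholder criterion.

```python
from datetime import date, timedelta


def select_period(dates, values, start, end):
    """Keep the values whose date lies in [start, end] (both inclusive)."""
    return [v for d, v in zip(dates, values) if start <= d <= end]


def criteria_by_period(dates, obs, sim, periods, criterion):
    """Apply criterion(obs, sim) separately on each named period."""
    return {
        name: criterion(select_period(dates, obs, start, end),
                        select_period(dates, sim, start, end))
        for name, (start, end) in periods.items()
    }


def volume_ratio(obs, sim):
    """Placeholder criterion: simulated volume divided by observed volume."""
    return sum(sim) / sum(obs)


# Hypothetical 10-day example; real periods come from the metadata file.
dates = [date(2000, 1, 1) + timedelta(days=k) for k in range(10)]
obs = [1.0] * 5 + [2.0] * 5
sim = [1.0] * 5 + [3.0] * 5
periods = {
    "P1": (date(2000, 1, 1), date(2000, 1, 5)),
    "P2": (date(2000, 1, 6), date(2000, 1, 10)),
    "Complete period": (date(2000, 1, 1), date(2000, 1, 10)),
}
scores = criteria_by_period(dates, obs, sim, periods, volume_ratio)
print(scores)
```

Days belonging to the warm-up years are excluded simply because they fall outside every listed period.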
• Level 2 (normal): Multi-parameterization on subperiods
After the Level 1 exercise, Level 2 consists of applying the model to simulate the flow time series over the complete period with 5 different parameterizations. Each parameterization should be obtained through calibration on one of the 5 predefined periods. We only impose the calibration periods (see the metadata files, periods P1 to P5). Participants are free to use the calibration procedure and criteria they find most appropriate.
Each period yields a parameter set representative of the behaviour of the study catchment during that time interval. For each of the 5 subperiods, we ask you to perform a calibration and then to simulate the complete period. We will then compute a 5×5 matrix of efficiency criteria corresponding to cross-validation on each subperiod:

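| Criterion computation period (‘validation period’) | Calibration period P1 | Calibration period P2 | Calibration period P3 | Calibration period P4 | Calibration period P5 |
|---|---|---|---|---|---|
| P1 | C_{1,1} | C_{1,2} | … | … | … |
| P2 | C_{2,1} | C_{2,2} | … | … | … |
| P3 | … | … | … | … | … |
| P4 | … | … | … | C_{4,4} | … |
| P5 | … | … | … | … | C_{5,5} |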
• Here C_{i,j} is the value of the given criterion computed on period Pi with the simulation obtained using the parameters calibrated on period Pj. The diagonal holds the criterion values for the calibration case; all other entries are validation values.
• Based on the 5 simulated time series produced by each participant, we will also compute the evolution of a few flow quantiles, as summarized in Figure 1.
• For arid or small catchments, we will calculate the frequency of days with simulated discharge lower than 5% of the average observed discharge.
Figure 1: Summary of the procedure used to generate the evaluation matrix
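As an illustration of how the C_{i,j} matrix is filled, the sketch below loops over validation periods (rows) and calibration periods (columns). The function names, the nested-dictionary layout and the two-period toy values are assumptions made for the example, not part of the protocol; the Nash-Sutcliffe efficiency is used as the criterion, but any other criterion could be passed.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency of sim against obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def evaluation_matrix(obs_by_period, sims, criterion):
    """Return C with C[i][j] = criterion on period i, parameters from period j.

    obs_by_period[pi] : observed discharge on period pi
    sims[pj][pi]      : discharge simulated on period pi with the
                        parameter set calibrated on period pj
    """
    names = list(obs_by_period)
    return [[criterion(obs_by_period[pi], sims[pj][pi]) for pj in names]
            for pi in names]


# Hypothetical toy data with two periods instead of five.
obs_by_period = {"P1": [1.0, 2.0, 3.0], "P2": [2.0, 4.0, 6.0]}
sims = {
    "P1": {"P1": [1.0, 2.0, 3.0], "P2": [2.0, 3.0, 4.0]},
    "P2": {"P1": [2.0, 2.0, 2.0], "P2": [2.0, 4.0, 6.0]},
}
matrix = evaluation_matrix(obs_by_period, sims, nse)
for name, row in zip(obs_by_period, matrix):
    print(name, row)
```

Diagonal entries correspond to calibration and off-diagonal entries to validation, so comparing a row's off-diagonal values with its diagonal value gives a first view of how well the parameters transfer between periods.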
• Level 3 (expert): Improvement of model behaviour in nonstationary conditions
The methodologies to be applied in the third level are left to the choice of the participants. The objective is to improve the ability of the model to cope with nonstationary conditions, i.e. to improve the efficiency matrix calculated in Level 2. Be creative and try innovative solutions to solve the problems you observe.
This may include multi-criteria calibration methods (Guinot et al., 2011; Madsen, 2000; Tiemeyer et al., 2007) or decisions on the choice of parameter sets to address issues such as equifinality, parameter interdependency, etc. (Andréassian et al., 2011; Beven, 2006; Beven and Binley, 1992). On catchments with a nonstationary land cover, explicitly accounting for it in the model can be effective. On catchments experiencing a warming trend, one could try to identify a relationship between parameter values and climate variables.
These are only suggestions; better ideas can obviously be found!
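To make the last suggestion concrete, one possible first step is to regress a calibrated parameter against a climate variable across the five calibration periods. The sketch below fits an ordinary least-squares line; the function name `fit_linear_trend`, the choice of mean air temperature as predictor and all numbers are purely hypothetical and only show the mechanics. Whether such a relationship is meaningful with only five points must be judged case by case.

```python
def fit_linear_trend(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values: mean temperature of P1..P5 and the value of one
# model parameter obtained by calibration on each of these periods.
mean_temperature = [8.0, 8.5, 9.0, 9.5, 10.0]
calibrated_parameter = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_linear_trend(mean_temperature, calibrated_parameter)
print(f"parameter = {slope:.1f} * temperature + {intercept:.1f}")
```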
Evaluation criteria (remember that we can compute them for you if you send us the simulated discharges):
Required:
• Calculate the bias (over the 5 predefined periods)
• Calculate the Nash-Sutcliffe coefficient (Nash and Sutcliffe, 1970) on discharges (score sensitive to high flows) and on the inverse of discharges (sensitive to low flows; see the definition of NSE_{iQ} in Pushpalatha et al., 2012)
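The required criteria can be sketched as follows. Several choices here are assumptions rather than prescriptions of the protocol: the bias is taken as the ratio of simulated to observed volume (the protocol does not fix the formula), and the inverse-flow variant adds a small constant `eps` to avoid division by zero, set to one hundredth of the mean observed discharge, which is our reading of Pushpalatha et al. (2012) and should be checked against the paper. Time steps with missing observations are assumed to have been removed beforehand.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    performance of the mean observed discharge used as a predictor."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def nse_inverse(obs, sim, eps_fraction=0.01):
    """NSE computed on 1 / (Q + eps), which emphasizes low flows.

    eps = eps_fraction * mean(obs) is an assumption (see text above).
    """
    eps = eps_fraction * sum(obs) / len(obs)
    inv_obs = [1.0 / (o + eps) for o in obs]
    inv_sim = [1.0 / (s + eps) for s in sim]
    return nse(inv_obs, inv_sim)


def bias(obs, sim):
    """Volume bias as a ratio: 1 means no bias (assumed definition)."""
    return sum(sim) / sum(obs)


# Hypothetical daily discharges over a short period.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.0, 2.5, 4.0]
print(nse(obs, sim), nse_inverse(obs, sim), bias(obs, sim))
```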
Additional:
• Calculate simulated quantiles: Q_{0.95} (quantile that is exceeded only 5% of the time), Q_{0.85}, Q_{0.15}, Q_{0.05}
• For arid or small catchments, calculate the frequency of days with simulated discharge lower than 5% of the average observed discharge
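The additional criteria can be sketched in the same spirit. The quantile estimator below (linear interpolation between order statistics) is one common convention chosen for the example; the protocol does not specify an estimator. Q_{p} is read as the non-exceedance quantile, so that Q_{0.95} is exceeded only 5% of the time. Function names and data are hypothetical.

```python
def quantile(values, p):
    """Non-exceedance quantile for 0 <= p <= 1, using linear
    interpolation between the sorted values."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def low_flow_frequency(obs, sim, fraction=0.05):
    """Share of days with simulated discharge strictly below
    `fraction` times the mean observed discharge."""
    threshold = fraction * sum(obs) / len(obs)
    return sum(1 for s in sim if s < threshold) / len(sim)


# Hypothetical simulated discharges: 0, 1, ..., 100.
sim = [float(k) for k in range(101)]
for p in (0.95, 0.85, 0.15, 0.05):
    print(f"Q_{p}: {quantile(sim, p):.2f}")

# Hypothetical arid-catchment example: mean observed flow is 10,
# so the threshold is 0.5.
obs_arid = [10.0, 10.0, 10.0, 10.0]
sim_arid = [0.0, 0.4, 0.5, 10.0]
print(low_flow_frequency(obs_arid, sim_arid))
```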
References:
Andréassian V., N. Le Moine, C. Perrin, T. Jakeman, MH Ramos, L. Oudin, T. Mathevet and J. Lerat (2011). All that glitters is not gold: the case of hydrological models' calibration. Hydrol. Process., 26(14), pp. 2206–2210. Link: http://onlinelibrary.wiley.com/doi/10.1002/hyp.9264/abstract
Beven K.J. (2006). A manifesto for the equifinality thesis. J. Hydrol., 320, pp. 18–36. Link: http://www.sciencedirect.com/science/article/pii/S002216940500332X
Beven K.J., A.M. Binley (1992). The future of distributed models: model calibration and uncertainty prediction. Hydrol. Process., 6, pp. 279–298. Link: http://onlinelibrary.wiley.com/doi/10.1002/hyp.3360060305/abstract
Guinot V., B. Cappelaere, C. Delenne, D. Ruelland (2011). Towards improved criteria for hydrological model calibration: theoretical analysis of distance- and weak form-based functions. J. Hydrol., 401, pp. 1–13. Link: http://www.sciencedirect.com/science/article/pii/S0022169411000990
Madsen H. (2000). Automatic calibration of a conceptual rainfall-runoff model using multiple objectives. J. Hydrol., 235, pp. 276–288. Link: http://www.sciencedirect.com/science/article/pii/S0022169400002791
Nash J.E., J.V. Sutcliffe (1970). River flow forecasting through conceptual models. Part I. A discussion of principles. J. Hydrol., 10, pp. 282–290. Link: http://www.sciencedirect.com/science/article/pii/0022169470902556
Pushpalatha R., C. Perrin, N. Le Moine, V. Andréassian (2012). A review of efficiency criteria suitable for evaluating low-flow simulations. J. Hydrol., 420–421, pp. 171–182. Link: http://www.sciencedirect.com/science/article/pii/S0022169411008407
Tiemeyer B., R. Moussa, B. Lennartz, M. Voltz (2007). MHYDAS-DRAIN: A spatially distributed model for small, artificially drained lowland catchments. Ecological Modelling, 209(1), pp. 2–20. Link: http://www.sciencedirect.com/science/article/pii/S0304380007003559