Tools
Longitudinal Profile¶
Item | Description |
---|---|
Description | This tool creates a longitudinal profile time series based on simulation output time series. It supports the calculation points, structures, cross sections and boundaries of MIKE 11 models, the calculation points, structures and cross sections of MIKE Hydro River models, and the nodes of MIKE URBAN models. |
Input items | One or more simulation model objects. |
Tool properties | Time relative to ToF: Time relative to ToF in seconds. Time Series Information: Information about the simulation output time series to include in the longitudinal profile. Data type: One or more types of time series to include. Time series item filter: One or more time series name filters; at least one of the filters must be satisfied. Value Type: One or more values to add from the time series and/or the model object. - Dynamic (Current time step): the tool is animated in MO Desktop and Web, so that changing the current time calls the tool again. - Maximum: the maximum value of the output time series. - Minimum: the minimum value of the output time series. - Bottom Level: the bottom level property from the model object (MIKE URBAN models). - Ground Level: the ground level property from the model object (MIKE URBAN models). - Left Levee Bank: the left levee bank (marker 1) property from the model object (MIKE Hydro River models). - Lowest Point: the lowest point (marker 2) property from the model object (MIKE Hydro River models). - Right Levee Bank: the right levee bank (marker 3) property from the model object (MIKE Hydro River models). |
Output items | A longitudinal profile time series. |
API reference | DHI.Solutions.ScenarioManager.Tools.SimulationTimeSeriesProfileTool |
Scripting | To create an instance of the tool in the scripting environment use tool = app.Tools.CreateNew('Longitudinal Profile'). |
R Statistics¶
R is a language and environment for statistical computing and graphics. It is a
GNU project which is similar to the S
language and environment
which was developed at Bell Laboratories (formerly AT&T, now Lucent
Technologies) by John Chambers and colleagues. R can be considered as a
different implementation of S. There are some important differences, but much
code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling,
classical statistical tests, time-series analysis, classification, clustering,
etc.) and graphical techniques, and is highly extensible. The S language is
often the vehicle of choice for research in statistical methodology, and R
provides an Open Source route to participation in that activity.
R is available as Free Software under the terms of the Free Software
Foundation's GNU General Public
License in source code form. It compiles and
runs on a wide variety of UNIX platforms and similar systems (including FreeBSD
and Linux), Windows and MacOS.
In MIKE OPERATIONS, R for Windows is supported from MIKE Workbench through a
tool that takes R scripts as arguments.
Tools have been introduced in MIKE Workbench to cover the following R scripts.
Packages used in the scripts can be found on the CRAN website
(https://cran.r-project.org/)
- Multi category skill score (packages: verification, abind). Tool: Skill Scores
- Goodness of fit measures (GOF) (packages: hydroGOF, abind). Tool: Goodness of Fit
- Forecast error model (packages: quantreg, abind, MASS, quantregGrowth). Tool: Confidence Intervals
The tools support data collection, writing input files for the scripts,
executing the scripts and parsing the results into MC entities.
Using the tools requires that R is installed on the workstation. The R installer can be downloaded from http://cran.r-project.org/. All tools can be executed directly from MIKE Workbench, without installing MIKE OPERATIONS.
Confidence Intervals¶
The Confidence Intervals tool generates error models from the data collected in the collection spreadsheet (simulated and observed data). An error model is created for each model object variable. The error models are saved in the Document Manager under the relative path /R-Statistics/\<Error model name>/\<Model Object Name>/\<Model object variable name>. (See image below.)
In the spreadsheet manager, a spreadsheet will be created, holding information about the error model created.
The following properties are specified for the Action=Analysis used to create the model:
| Property | Description |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Spreadsheet Path | The full path to the data collection spreadsheet. |
| From Date To Date | Simulations with time of forecast between “From Date” and “To Date” are collected. If left blank, all simulations are collected. |
| Analysis name | The name of the analysis. The name is used for updating error model in the document manager. |
| Confidence Intervals | A comma separated string with quantiles included in the calculation of the error model. The list must use decimal points. Default: "0.05,0.175,0.5,0.875,0.95" |
| Model Type | Model Type value can be 1, 2, 3 or 4: (1) Quantile regression model. (2) Quantile regression model using initial error at time of forecast. (3) Quantile regression model with NQT transformation. (4) Quantile regression model with NQT transformation and using initial error at time of forecast. |
| Use Weights | Whether weights should be used in quantile calibration. Each simulation is weighted by its magnitude rank in the dataset divided by the number of values: the lowest value in a data set of size 10 gets 1/10 as weight and the largest 10/10. |
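The rank-based weighting described for the Use Weights property can be sketched in a few lines. This is a plain-Python illustration of the rule, not the tool's actual implementation; handling of ties (first-seen order) is an assumption.

```python
def rank_weights(values):
    """Weight each value by its magnitude rank divided by the number of values.

    The lowest value in a data set of size n gets weight 1/n and the largest
    n/n, as described for the 'Use Weights' property. Tie handling is an
    assumption (ties keep their original order).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices from lowest to highest
    weights = [0.0] * n
    for rank, i in enumerate(order, start=1):
        weights[i] = rank / n
    return weights
```

For example, in a data set of ten values the smallest receives weight 0.1 and the largest 1.0.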
Action=Execute will create an ensemble time series representing the uncertainty
in the deterministic forecast at the quantiles specified during the generation
of the error model.
The following properties must be specified for Action=Execute.
Property | Description |
---|---|
Confidence Intervals | A comma separated string specifying the quantiles that will be calculated. The list must use decimal points. Default: "0.05,0.175,0.5,0.875,0.95". The confidence intervals specified must be supported by the error model, as specified during its creation, in the range ]0 ; 1[. They can be a subset of the supported confidence intervals. |
Error Model Path | The full path to the error model in the document manager. |
Last Error | The last error when taking the initial error into consideration. The last error is the difference between the last observed value and a previous simulation (obs. value - sim. value). If Last Error is set to 0, it is not taken into account. |
Time Series Path | The full path of the time series to generate confidence intervals for. |
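The idea of turning an error model into an ensemble can be illustrated without quantile regression. The real tool fits quantile-regression models via the quantreg package; this deliberately simplified sketch just applies empirical error quantiles (obs - sim convention) to a deterministic forecast.

```python
def empirical_quantile(values, q):
    """Linear-interpolation empirical quantile of a sample (type-7 style)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def confidence_band(forecast, past_errors, quantiles=(0.05, 0.5, 0.95)):
    """Build one time series per quantile by shifting the deterministic
    forecast with the corresponding quantile of the historical errors."""
    return {
        q: [v + empirical_quantile(past_errors, q) for v in forecast]
        for q in quantiles
    }
```

With a symmetric error history the 0.5 band reproduces the forecast itself, while the outer quantiles spread around it.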
Goodness of Fit¶
The goodness of fit measures summarize how well two continuous datasets fit each other. They might compare distributional parameters (statistical moments, IQR, peak time frequency, etc.), residual errors (RMSE, ME, bias, NRMSE, etc.), data patterns (correlation coefficient, NSE) or concurrent comparisons (linear regression, peak difference, peak percentage error, etc.). The Goodness of Fit tool calculates goodness of fit measures based on collected simulations and observations. The tool generates a result spreadsheet containing goodness of fit measures for each station (model object and model object variable) and for each lead time in the forecast period. Running Action=Analysis requires the following properties.
Property | Description |
---|---|
Spreadsheet Path | The full path to the data collection spreadsheet. |
From Date To Date | Simulations with time of forecast between “From Date” and “To Date” are collected. If left blank, all simulations are collected. |
Results Spreadsheet Path | The full path of the results spreadsheet. |
The rows of the result spreadsheet contain the different performance measures and the columns contain the lead times in ascending order. The table below describes the measures supported by the Goodness of Fit tool. Refer to the “hydroGOF” CRAN package for more information.
Measure | Description |
---|---|
MeanError | Mean error or bias between simulation and observation: ME = (1/n) Σ (Qmt - Qot), where Qot is the observation and Qmt the simulation at time t. |
MeanAbsoluteError | Mean absolute error, the mean of the absolute errors: MAE = (1/n) Σ abs(Qmt - Qot), where Qot is the observation and Qmt the simulation at time t. |
MeanSquaredError | Mean Square Error (MSE) is the mean square error or residual mean square: MSE = SSE/n, where SSE is the sum of squared errors and n the number of observation and estimation pairs. It is a biased estimator, in contrast to the variance, which is unbiased. A value closer to 0 indicates a fit that is more useful for prediction. |
RootMeanSquareError | Root Mean Square Error (RMSE) is also known as the fit standard error and the standard error of the regression. It is an estimate of the standard deviation of the random component in the data and is defined as the square root of the Mean Squared Error: RMSE = sqrt(MSE). As with the mean squared error, an RMSE value closer to 0 indicates a fit that is more useful for prediction. |
RootMeanSquareErrorNormalized | A normalized version of the Root Mean Square Error, divided by the range of the dataset. It ranges from -100% to 100%. |
PercentBias | Percent bias measures the average tendency of the simulated values to be larger or smaller than their observed ones: PBIAS = 100 * Σ (Qmt - Qot) / Σ Qot. The optimal value of PBIAS is 0.0, with low-magnitude values indicating accurate model simulation. Positive values indicate overestimation bias, whereas negative values indicate model underestimation bias. |
RootMeanSquareErrorRatio | Root Mean Square Error Ratio: the RMSE divided by the standard deviation of the observations. |
StandardDeviationRatio | The ratio between the standard deviation of the simulation and the observation: sd(Qm) / sd(Qo). |
NashSutcliffeEfficiency | The Nash–Sutcliffe model efficiency coefficient is used to assess the predictive power of hydrological models. It is defined as NSE = 1 - Σ (Qot - Qmt)² / Σ (Qot - mean(Qo))², where mean(Qo) is the mean of the observed discharges, Qot the observed discharge and Qmt the simulated discharge at time t. Nash–Sutcliffe efficiency can range from −∞ to 1. An efficiency of 1 (E = 1) corresponds to a perfect match of modeled discharge to the observed data. An efficiency of 0 (E = 0) indicates that the model predictions are as accurate as the mean of the observed data, whereas an efficiency less than zero (E \< 0) occurs when the observed mean is a better predictor than the model, i.e. when the residual variance (the numerator in the expression above) is larger than the data variance (the denominator). Essentially, the closer the model efficiency is to 1, the more accurate the model is. |
NashSutcliffeEfficiencyModified | Same as the “NashSutcliffeEfficiency” but using absolute errors instead of squared errors: 1 - Σ abs(Qot - Qmt) / Σ abs(Qot - mean(Qo)), where mean(Qo) is the mean observation, Qot the observation and Qmt the simulation at time t. This makes it less sensitive to extreme values. |
NashSutcliffeEfficiencyRelative | Same as the “NashSutcliffeEfficiency” but relative to the mean of the observation: 1 - Σ ((Qot - Qmt)/Qot)² / Σ ((Qot - mean(Qo))/mean(Qo))², where mean(Qo) is the mean observation, Qot the observation and Qmt the simulation at time t. |
IndexAgreement | The Index of Agreement (d), developed by Willmott (1981) as a standardized measure of the degree of model prediction error, varies between 0 and 1. A value of 1 indicates a perfect match and 0 indicates no agreement at all (Willmott, 1981). It is defined by d = 1 - Σ (Qot - Qmt)² / Σ (abs(Qmt - mean(Qo)) + abs(Qot - mean(Qo)))², where mean(Qo) is the mean observation, Qot the observed discharge and Qmt the simulated discharge at time t. d varies from 0 to 1, with increasing values meaning better agreement between simulation and observations. The index of agreement can detect additive and proportional differences in the observed and simulated means and variances; however, it is overly sensitive to extreme values due to the squared differences (Legates and McCabe, 1999). |
IndexAgreementModified | In the modified index of agreement the squaring has been removed to make it less sensitive to extreme values: md = 1 - Σ abs(Qot - Qmt) / Σ (abs(Qmt - mean(Qo)) + abs(Qot - mean(Qo))), where mean(Qo) is the mean observation, Qot the observed discharge and Qmt the simulated discharge at time t. It varies from 0 to 1, with increasing values meaning better agreement between simulation and observations. The modified version limits the inflation from squaring the terms. |
IndexAgreementRelative | Same as the modified index of agreement but relative to the observation: each difference term is scaled by the observation before summing. It varies from 0 to 1, with increasing values meaning better agreement between simulation and observations. |
PersistenceIndex | Compares the squared error of the simulation with the squared error obtained if the previous observation were used as the forecast: PI = 1 - Σ (Qot - Qmt)² / Σ (Qot - Qot-1)², where Qmt is the simulation and Qot the observation at time t, and Qot-1 the observation at the previous time step t-1. N.b. this score does not seem appropriate when applied to lead times, as the persistence would be the observed value from the previous forecast. |
PearsonCorrelationCoefficient | The Pearson correlation coefficient is the covariance of the observed and simulated values divided by the product of their standard deviations: r = cov(Qo, Qm) / (sd(Qo) * sd(Qm)), where cov is the covariance, sd the standard deviation, Qm the simulation and Qo the observation. |
CoefficientDetermination | The coefficient of determination, denoted r², is the squared value of the coefficient of correlation. R² ranges between 0 and 1 and describes how much of the observed dispersion is explained by the simulation. |
CoefficientDeterminationMultiplied | Same as the coefficient of determination but multiplied by the slope b of the regression line between simulation and observation, to penalize systematic errors in the forecast, since the coefficient of correlation only compares dispersion. |
KlingGuptaEfficiency | Kling-Gupta efficiency (KGE) between simulation and observation. This goodness-of-fit measure was developed by Gupta et al. (2009) to provide a diagnostically interesting decomposition of the Nash-Sutcliffe efficiency (and hence MSE), which facilitates the analysis of the relative importance of its different components (correlation, bias and variability) in the context of hydrological modelling. Kling et al. (2012) proposed a revised version of this index to ensure that the bias and variability ratios are not cross-correlated. The computation of this index involves three main components: (1) r: the Pearson correlation coefficient between simulation and observation (ideal value r = 1). (2) Alpha: the ratio between the standard deviation of the simulated values and the standard deviation of the observed ones (ideal value Alpha = 1). (3) Gamma: the ratio between the coefficient of variation (CV) of the simulated values and the coefficient of variation of the observed ones (ideal value Gamma = 1). KGE can range from −∞ to 1 and improves as it becomes closer to 1. |
VolumetricEfficiency | The Volumetric Efficiency is given by VE = 1 - Σ abs(Qmt - Qot) / Σ Qot, where Qot is the observed discharge and Qmt the simulated discharge at time t. VE ranges between 0 and 1, and the simulation improves as VE increases. |
InterQuantileRange | A measure of the spread in the data, defined as the difference between the upper and lower quartile: IQR = Q3 - Q1. |
PeakTimeFrequencyObserved | Frequency of peaks in the observed time series. |
PeakTimeFrequencySimulated | Frequency of peaks in the simulated time series. |
MedianSimulated | The median value of the simulation |
MedianObserved | The median value of the observation |
Action=Execute will simply list the content of the spreadsheet in a table.
Specify the result spreadsheet and the model object name and variable of the station to display. Run to 'List' or to 'Chart'.
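For reference, several of the definitions in the table above can be written out directly. This is a minimal plain-Python sketch of the formulas; the tool itself delegates these calculations to the hydroGOF R package.

```python
import math

def goodness_of_fit(sim, obs):
    """Compute a handful of the measures from the table above.

    Uses the sim - obs sign convention, so positive MeanError and PercentBias
    mean overestimation, consistent with the PercentBias description.
    """
    n = len(sim)
    mean_obs = sum(obs) / n
    errors = [s - o for s, o in zip(sim, obs)]
    mse = sum(e * e for e in errors) / n
    return {
        "MeanError": sum(errors) / n,
        "MeanAbsoluteError": sum(abs(e) for e in errors) / n,
        "MeanSquaredError": mse,
        "RootMeanSquareError": math.sqrt(mse),
        "PercentBias": 100.0 * sum(errors) / sum(obs),
        "NashSutcliffeEfficiency":
            1.0 - sum(e * e for e in errors)
                / sum((o - mean_obs) ** 2 for o in obs),
    }
```

With sim equal to obs every error measure is 0 and NSE is 1; a constant +1 offset gives a mean error of 1 and a percent bias that depends on the magnitude of the observations.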
Skill Scores¶
The skill of a forecast expresses the relative accuracy of a set of forecasts compared to a reference forecast. The value of a given measure is often hard to assess in itself, but by referencing another forecast the relative improvement is found, which is easier to evaluate. Skill scores are commonly used in this assessment and are formulated as:
SS = (A - Aref) / (Aperf - Aref)
where A is the accuracy of the forecast, Aref the accuracy of the reference and Aperf the value a forecast would take if it were perfect. The skill score therefore measures the improvement over a reference forecast compared to that of a perfect forecast set. Any quality measure can be used as accuracy, both goodness of fit measures and quality attributes for the joint distributions.
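The skill score formulation described above can be written as a trivial function:

```python
def skill_score(accuracy, reference, perfect):
    """SS = (A - Aref) / (Aperf - Aref).

    0 means no improvement over the reference, 1 a perfect forecast,
    and negative values mean the reference forecast is better.
    """
    return (accuracy - reference) / (perfect - reference)
```

For example, an accuracy of 0.8 against a chance accuracy of 0.5 and a perfect score of 1.0 yields a skill of 0.6.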
The reference forecast could, for example, be persistency or the climatology estimated from observations. It might also be of interest to calculate how a set of forecasts from a recalibrated model compares to the old model calibration, using the latter as reference. The Skill Score tool generates a result spreadsheet containing skill score results for each station (model object and model object variable).
Skill scores can be derived from the forecast attributes in a contingency table, making the attributes relative to a reference. A further advantage is that the approach can be extended to multi-category contingency tables if accuracy measures are considered, so that a single scalar value can represent the accuracy of the complete sample joint probability. By accuracy we mean comparing the correct against the incorrect forecasts, disregarding the type of error.
The skill scores “Heidke skill score” (HSS) and “Peirce skill score” (PSS) are commonly used but do not consider the rarity of the event; it is therefore recommended to use the Gerrity skill score (GSS). The forecast categories need to be ordinal, as the GSS rewards a correct forecast of a rare event higher than one of a frequently occurring event, and also takes into account the distance between the forecast and observed category. For flood forecasting, where the categories range from high to low, a flood warning issued as high is penalized harder if the following observed event is low than if it was medium. Running Action=Analysis requires the following properties.
Analysis runs the R package “verification” and creates or updates the results spreadsheet specified.
Property | Description |
---|---|
Spreadsheet Path | The full path to the data collection spreadsheet. |
From Date To Date | Simulations with time of forecast between “From Date” and “To Date” are collected. If left blank, all simulations are collected. |
Results Spreadsheet Path | The full path of the results spreadsheet. |
Crossing Time Tolerance | Maximum number of time steps allowed between the observed and forecasted crossing of a warning threshold. If exceeded, the forecasted warning level is considered a miss. |
Max Lead Time Steps | Time steps after the maximum lead time step value are not included in the calculations. The default of -1 means that all time steps are included. |
The result spreadsheet will contain a sheet per station. The columns will contain the results for each skill score, the rows will contain the thresholds. Action=Execute will simply list the content of the spreadsheet in a table.
Specify the result spreadsheet and the model object name and variable of the station to display.
The table below shows the available skill scores. Please refer to the CRAN package “verification” for more information.
Skill Score | Description |
---|---|
PercentCorrect | The percent correct is the percent of forecasts that are correct. |
BiasScore | Bias score is the bias in frequency between forecasted and observed events. |
CriticalSuccessIndex | The Critical Success Index (CSI) is also called the Threat Score. Its range is 0 to 1, with a value of 1 indicating perfect forecast. The CSI is relatively frequently used, with good reason. Unlike the POD and the FAR, it takes into account both false alarms and missed events, and is therefore a more balanced score. The CSI is somewhat sensitive to the climatology of the event, tending to give poorer scores for rare events. |
HeidkeSkillScore | The Heidke Skill Score (HSS) is in the usual skill score format: Skill = (score value - score for reference forecast) / (perfect score - score for reference forecast). For the HSS, the "score" is the number correct or the proportion correct, and the "reference forecast" is usually the number or proportion correct by chance. The HSS measures the fractional improvement of the forecast over the reference forecast. Like most skill scores, it is normalized by the total range of possible improvement over the reference, which means Heidke Skill Scores can safely be compared on different datasets. The range of the HSS is -∞ to 1. Negative values indicate that the chance forecast is better, 0 means no skill, and a perfect forecast obtains an HSS of 1. The HSS is a popular score, partly because it is relatively easy to compute and perhaps also because the standard forecast, chance, is relatively easy to beat. |
PeirceSkillScore | The Peirce Skill Score (PSS) is similar to the Heidke Skill Score (HSS), except that the reference in the denominator is the hit rate of random and unbiased forecasts. |
GerrityScore | The Gerrity skill score (GSS) is a weighted sum of the elements in the contingency table of possible combinations of forecast and observed categories, where a penalty matrix favours forecasts closer to the observed categories and rewards forecasts of rare events. |
PercentCorrectCategory | Percent correct by category (vector) |
ProbabilityOfDetection | Probability Of Detection (POD) is also called the hit score (H). The range of H is 0 to 1 and the score is positively oriented. A perfect score is 1. Since the formula for H contains reference to “misses” and not to “false alarms”, the hit rate is sensitive only to missed events rather than false alarms. This means that one can always improve the hit rate by forecasting the event more often, which usually results in higher false alarms and, especially for rare events, is likely to result in an overforecasting bias. |
ProbabilityOfFalseDetection | Probability Of False Detection |
FalseAlarmRatio | False Alarm Ratio (FAR) is the fraction of the forecasts of the event associated with non-occurrences. The FAR can be controlled by deliberately underforecasting the event; such a strategy risks increasing the number of missed events, which is not considered in the FAR. For this reason, the POD and the FAR should both be considered for a better understanding of the performance of the forecast. |
Here P are the entries in the contingency table and S a penalty matrix. The GSS takes values between -∞ and 1, where values over 0 indicate that the forecast model is more skillful than climatology and a value of 1 indicates a perfect forecast. It is advisable to use several scores in the assessment, as they reveal different aspects of forecast performance.
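The multi-category scores built on the contingency table follow the same pattern. As an illustration (a sketch under the standard definitions, not the verification package's code), the proportion correct and the Heidke skill score can be computed like this:

```python
def proportion_correct(table):
    """Fraction of forecasts on the diagonal of a square contingency
    table (rows = forecast category, columns = observed category)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n

def heidke_skill_score(table):
    """HSS = (PC - PCchance) / (1 - PCchance), where the chance proportion
    correct is computed from the table's marginal distributions."""
    n = sum(sum(row) for row in table)
    chance = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n)
        for i in range(len(table))
    )
    return (proportion_correct(table) - chance) / (1.0 - chance)
```

A table with all counts on the diagonal gives HSS = 1, while a table that matches its own marginal chance expectation gives HSS = 0.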
R-Statistics¶
The R-Statistics tool calls R-Statistics in a separate process with the specified parameters.
This tool is used by the Confidence Intervals, Goodness of Fit and Skill Scores tools.
Property | Description |
---|---|
Arguments | Argument string used for executing R in batch mode through rscript.exe. |
Working Folder | The working folder containing the scripts specified in the arguments property. |
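The batch-mode invocation implied by the two properties can be sketched as follows. The script name and argument layout here are illustrative assumptions; the actual argument string is whatever the calling tool supplies.

```python
import subprocess

def build_rscript_command(rscript_exe, script, script_args):
    """Assemble a batch-mode command line: <rscript.exe> <script.R> <args...>."""
    return [rscript_exe, script, *script_args]

def run_r_script(rscript_exe, working_folder, script, script_args):
    """Execute the script with the given working folder, mirroring the
    'Working Folder' property. Requires R to be installed on the workstation."""
    cmd = build_rscript_command(rscript_exe, script, script_args)
    return subprocess.run(cmd, cwd=working_folder, capture_output=True, text=True)
```

For example, build_rscript_command("rscript.exe", "goodness_of_fit.R", ["input.csv", "results.csv"]) produces the argument list passed to the process (both file names being hypothetical).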
Data Collection¶
The R-statistics tools “Confidence intervals”, “Goodness of Fit” and “Skill Scores” use the data collection spreadsheet in the format generated by using the DataCollection option. This means that all the tools use the same data collection spreadsheet for carrying out analysis. When collecting data, the following properties are available in the tools:
Property | Description |
---|---|
Update Mapping Sheet | A value indicating whether the mapping sheet should be updated, so that a row in the mapping sheet and a data collection sheet is present for all output time series included in the specified scenario. |
Mask | Mask for finding output time series for mapping. If the mask is not specified, no time series is found for mapping. Regular expressions are used for finding matching output time series. The mask is shown only when updating the mapping sheet. |
Spreadsheet Path | The full path to the data collection spreadsheet. |
Scenario Path | The path of the scenario from where data is collected. |
From Date To Date | Simulations with time of forecast between “From Date” and “To Date” are collected. If left blank, all simulations are collected. |
Period Interval | The time unit used when collecting data. This interval can only be specified when new data collection spreadsheets are created. For existing spreadsheets, the collection interval is specified by the spreadsheet. |
Period Length | The length of a period interval used when collecting data. This interval can only be specified when new data collection spreadsheets are created. For existing spreadsheets, the collection interval is specified by the spreadsheet. |
Data Collection Spreadsheet¶
The image below shows a sample of a data collection spreadsheet.
The data collection spreadsheet contains 3 types of sheets.
- Configuration

  A single sheet containing information about what scenario to collect data from and a description of the intervals.

  1. Data Collection Scenario Path: Full path to the scenario containing forecasted data.
  2. Period Interval: Period interval for time steps when collecting data (seconds, minutes, hours, days, months or years).
  3. Period Interval Length: The length of each time step when collecting data.

- Mapping

  A single sheet containing information about the stations where data should be collected, which output and observation time series to use, and the thresholds for each station.

  1. Sheet Name: The name of the sheet where data from the station is collected.
  2. Model Object Name: Name of the model object containing the output time series.
  3. Model Object Variable: Variable of the time series.
  4. Observation Time Series: Path to the observation time series corresponding to the simulation point.
  5. Thresholds: Thresholds defined for the simulation point. Note that the thresholds should be in descending order.

- Collected data

  A sheet per station where data is collected, containing the forecasted and observed time series.