Quality Control¶
MIKE INFO offers a framework for performing Quality Control of data via scripts. Quality Control can be invoked in Data Exchange and/or used in scripts and jobs to automate control of data (the latter assumes a MIKE OPERATIONS license or an additional purchase of the Real-time license).
The following describes the configuration and use of the component, called DataQC, in Data Exchange.
The concept¶
A spreadsheet is used to hold the configuration of which entities of data should be controlled and how. Scripts are used to perform the actual checks.
Setting up DataQC configuration¶
A spreadsheet is used to configure DataQC. The spreadsheet can be filled using MIKE Workbench.
The spreadsheet can be located anywhere, but it is suggested to name it /DataQC/ValidationConfiguration.
The spreadsheet MUST have all of the following worksheets (case-sensitive names):
- Timeseries
- Featureclass
- Document
- Raster
- Spreadsheet
- Script
Each worksheet defines a set of rules – one rule per line – defining how an entity or set of entities must be validated. The structure of all the sheets is the same.
Figure 1 Spreadsheet layout for DataQC
Each worksheet must contain a header row followed by rows for each of the rules.
A rule is defined by:
- Name – a name of the rule.
- Entity Path – the path of the entity in the corresponding manager/explorer. Wildcards are allowed:
  - ‘*’ can be used in place of a group or entity name
  - ‘**’ can be used in place of a set of groups
  In the example above, ‘/WRIS/**/*’ means that all entities in all groups below the group /WRIS must be checked by the scripts of the rule.
- Target Entity Path – an absolute path where quality-controlled time series are stored. This is not used for pure validation, but in conjunction with the FullQC option, where the scripts return a modified entity which can be stored in a different location. This enables using the DataQC component for e.g. quality control and correction of real-time time series.
- Scripts… – a series of scripts (one per cell) from column D onwards, to be executed by the framework for each of the entities matching the Entity Path pattern. Each script is given with its full path and, if required, additional arguments in parentheses “( )”; for example, a script validating that time series values stay within the range 5 to 19 could be specified as “/DataQC/Validation/ValidateTimeSeriesRange(5, 19)”.
See detailed requirements for the scripts in Creating DataQC scripts.
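As a purely illustrative example (the actual column layout is shown in Figure 1, and the rule name below is hypothetical), a line in the Timeseries worksheet combining the examples above could look like this, with Target Entity Path left empty for pure validation:

```
Name         Entity Path    Target Entity Path    Script 1
Range check  /WRIS/**/*     (empty)               /DataQC/Validation/ValidateTimeSeriesRange(5, 19)
```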
Creating DataQC scripts¶
DataQC scripts are MIKE OPERATIONS scripts defined in Script Manager. The suggested structure of such scripts is shown in Figure 2.
Figure 2 Organization for scripts for DataQC
Scripts in /DataQC/QC are used for FullQC, i.e. quality control and correction of real-time data. Scripts in /DataQC/Validation are used for Validation and are called in MIKE INFO Data Exchange. Scripts in /DataQC/Test are functions used for testing when developing the other scripts.
Examples of validation scripts are:
- ValidateFeatureClass – checks whether a feature class has attributes and shapes.
- ValidateTimeSeriesName – checks the time series name according to the MIKE INFO convention.
- ValidateTimeSeriesType – validates whether the time series name, type and unit match what is expected according to the MIKE INFO convention.
Any number of validation scripts can be defined.
A script function must fulfil three requirements to be used with the framework (a minimal sketch is given after this list):
- Be robust, i.e. it should not raise exceptions, but handle problems and express them as errors on the entity.
- Take at minimum two arguments:
  - EntityPath – the full path of the entity, i.e. a string.
  - Entity – the object of the entity, e.g. an IDataSeries in the case of a time series.
  Additional arguments are possible, but they must then be provided in the spreadsheet.
- Return an object of type DHI.Solutions.IMS.Business.DataQC.ResultInformation with the result of the validation performed by the script. In the case of FullQC, the modified entity should be provided as ResultInformation.Entity; see more details on FullQC below.
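A minimal sketch of such a script function is given below, assuming MIKE OPERATIONS IronPython scripting. The assembly reference, the parameterless ResultInformation constructor and the function name are assumptions for illustration; the script-header comments that declare parameters and ReturnType (see the Coding section below) are omitted. The AddErrorMessage(Caller, Indentation, Message) call is used as described in the Coding section.

```python
# Minimal DataQC validation script sketch (IronPython). Names flagged as assumptions
# are illustrative and may differ in an actual MIKE OPERATIONS installation.
import clr
clr.AddReference("DHI.Solutions.IMS.Business")  # assembly name is an assumption
from DHI.Solutions.IMS.Business.DataQC import ResultInformation


def ValidateExample(entityPath, entity):
    # Requirement 2: take at minimum the entity path (string) and the entity object.
    # Requirement 3: always return a ResultInformation object.
    result = ResultInformation()  # parameterless constructor is an assumption
    result.Entity = entity        # hand back the same (or, for FullQC, a corrected) entity

    # Requirement 1: be robust - convert problems into error messages, never raise.
    try:
        if entity is None:
            result.AddErrorMessage("ValidateExample", 1,
                                   "Entity could not be loaded: " + entityPath)
    except Exception as ex:
        result.AddErrorMessage("ValidateExample", 1, "Unexpected problem: " + str(ex))

    return result
```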
Developing and Testing scripts¶
Developing new scripts involves defining the functionality, coding the script(s) and testing it all.
Below is a walk-through of an example script.
Functionality¶
The script should verify that the name of a time series adheres to the MIKE INFO Rules and Procedures:
- The name must consist of 5 parts separated by underscores (“_”).
- The first part must be one of the country codes:
  - All uppercase
  - Exactly 3 characters long
  - One of the allowed country codes
- The last part must be one of the allowed interval types (“Daily”, “Monthly” or “Hourly”).
- The second-to-last part must be one of the allowed data types (“Q”, “P”, “T” or “WL”).
The script should perform all checks and write an individual error for each rule that is not complied with.
Coding¶
A function called ValidateTimeSeriesName was coded, see Figure 3. The script takes only the default entity path and entity arguments. The green piece of text is the standard way in MIKE OPERATIONS of defining the name and type of arguments to a script – and the ReturnType, which in this case is ResultInformation.
The first two lines define the result and assign the entity to it – returning the same entity as was checked.
Then follow three lines defining arrays of the allowed country codes, time units and data types. They are followed by 5 different checks of the name of the data series (entity). For each of the “if” statements, the error is created using the result.AddErrorMessage call, providing the Caller (~the name of the script), Indentation (~the level of the call – set to 1; if subroutines were used, they could have higher numbers) and Message.
Finally, the result is returned.
DataQC will then interpret the result.
Figure 3 Sample validation script
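The sketch below approximates the logic described above in IronPython; it is not a copy of the script in Figure 3. The country code list is a placeholder, the assembly reference is an assumption, the script-header parameter/ReturnType declarations are omitted, and the entity is assumed to expose a Name property.

```python
# Approximation of the ValidateTimeSeriesName logic (IronPython sketch).
import clr
clr.AddReference("DHI.Solutions.IMS.Business")  # assembly name is an assumption
from DHI.Solutions.IMS.Business.DataQC import ResultInformation


def ValidateTimeSeriesName(entityPath, entity):
    result = ResultInformation()  # parameterless constructor is an assumption
    result.Entity = entity        # return the same entity as was checked

    countryCodes = ["KEN", "TZA", "UGA"]        # placeholder list of allowed country codes
    timeUnits = ["Daily", "Monthly", "Hourly"]  # allowed interval types
    dataTypes = ["Q", "P", "T", "WL"]           # allowed data types
    caller = "ValidateTimeSeriesName"

    parts = entity.Name.split("_")
    if len(parts) != 5:
        result.AddErrorMessage(caller, 1, "Name must consist of 5 parts separated by '_': " + entity.Name)
        return result

    if parts[0] != parts[0].upper():
        result.AddErrorMessage(caller, 1, "Country code must be all uppercase: " + parts[0])
    if len(parts[0]) != 3:
        result.AddErrorMessage(caller, 1, "Country code must be exactly 3 characters long: " + parts[0])
    if parts[0] not in countryCodes:
        result.AddErrorMessage(caller, 1, "Unknown country code: " + parts[0])
    if parts[4] not in timeUnits:
        result.AddErrorMessage(caller, 1, "Last part must be one of: " + ", ".join(timeUnits))
    if parts[3] not in dataTypes:
        result.AddErrorMessage(caller, 1, "Second-to-last part must be one of: " + ", ".join(dataTypes))

    return result
```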
Using¶
Once the method is tested (see below) it can be used from the spreadsheet holding the configuration, for instance with a configuration as shown in Figure 1.
Testing¶
Testing happens at different levels:
- Simple script debugging.
  This involves running a script which prepares a data series to be tested. It should preferably have something wrong with the name which triggers the 5 errors. It should also be validated that a proper name validates OK. Figure 4 shows how a simple test can be carried out by making a test script that starts the validation script (a simplified sketch is also given after the figure captions below). The data series is made in memory, i.e. no data is required in the database to test the names. Stepping through the code in the Script Debugger allows inspection of the calls and results.
- Testing the configuration via Script Manager.
  Once the spreadsheet is configured, the DataQC.Validate call can also be tested using scripting, thus controlling which entities should be tested. Figure 5 shows an example of testing 5 different time series in the database. Each of them is fetched from the database, DataQC.Validate is called and the returned ResultInformation is printed to the screen. The debug result would look similar to Figure 4.
- Using MIKE INFO Data Exchange.
  Once the configuration seems to give satisfactory results, it can be tested with the Data Exchange form. The Dump selection should be set such that it captures the time series used for testing, and the result should reflect the same output as the debugger above. This requires MIKE INFO to be configured to use the spreadsheet configuration, see Setting up DataQC configuration.
Figure 4 Debug result
Figure 5 Validation test routine
Figure 6 DataQC.Validation test routine
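The simplified test sketch below corresponds to the first testing level (simple script debugging). A hypothetical stand-in object replaces a real in-memory IDataSeries, since only the Name property is read by the validation sketch above; the names, paths and the ErrorMessages property used for printing are assumptions, and in practice the results would be inspected in the Script Debugger as shown in Figure 4.

```python
# Hypothetical test routine for the ValidateTimeSeriesName sketch (IronPython).
# Assumes ValidateTimeSeriesName is available in scope (e.g. defined in the same script).

class FakeDataSeries(object):
    """Stand-in for an IDataSeries; only the Name property is needed for name validation."""
    def __init__(self, name):
        self.Name = name


def RunNameValidationTests():
    testNames = [
        "KEN_Station1_River_Q_Daily",    # placeholder name expected to validate OK
        "ken_Station1_River_Q_Daily",    # lowercase country code -> error
        "KENYA_Station1_River_Q_Daily",  # country code not 3 characters -> error
        "KEN_Station1_River_X_Daily",    # unknown data type -> error
        "KEN_Station1_River_Q_Yearly",   # unknown interval type -> error
    ]
    for name in testNames:
        series = FakeDataSeries(name)
        result = ValidateTimeSeriesName("/WRIS/Test/" + name, series)
        print(name)
        for message in result.ErrorMessages:  # property name is an assumption
            print("  " + str(message))


RunNameValidationTests()
```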
Use DataQC in Data Exchange¶
See Data Exchange.
Flags and Reports¶
Flags and Reports is a setting that enables the system administrator to configure data flags and obtain badge reporting. It includes the following two key functionalities:
Flags: A feature where flags can be configured (created, edited or deleted).
Reports: Visualise a report on badge achievements with information on gauge reader, time series frequency, last value and deviation from automatically collected data (if applicable).
Flags¶
Note: The following steps may need to be completed before the ‘Flags and Reports’ button is visible in MIKE INFO:
- In Windows Explorer, go to the directory where the various files for Workbench are stored (e.g. C:\Program Files (x86)\DHI\2019\MIKE OPERATIONS). Open the file called “IMSConfiguration.xml” in Notepad.
- Under the Flags and Reports option, enable the menus to be visible by changing the text from “false” to “true” for the relevant menus to display. Save and close the file.
When opening MIKE INFO after the above changes to the XML file have been made, the main Flags and Reports menu, along with its sub-menus, should be visible.
When selecting the Flags button in ‘Flags and Reports’, a dialog appears where one can specify flag descriptions for time series values. Once the user creates a flag and its associated properties in this window, a time series needs to be selected from the station data window, followed by ‘Update time series’, before the user can assign flags to the various time series values.
When updating the time series, a dialog appears where the user can ‘Edit Flags’ by selecting the descriptions from the created flag that correspond to particular time series values. Once editing is finished, ‘Save changes’ and then ‘Stop Edit’ are clicked, after which the user can display the flags on the time series chart. The user also has the option to display flag data for time series in a tabular view.
Creating Flags¶
- Click on the ‘Settings’ tab at the top-left of the screen. The default view should be ‘Map Settings’.
- Click on ‘Flags and Reports’ (under ‘Tool settings’).
- Two additional options should appear on the right, namely ‘Flags’ and ‘Badges’. Click on ‘Flags’.
- A new window should appear, called ‘Flags’. This may or may not be empty; here a new flag description can be created.
- Click the ‘Create’ flag button.
- A new window should appear, where the name of the new flag type must be specified. Insert a name for the new flag (e.g., ‘Outliers’), then click ‘OK’.
- The name of the newly created flag should appear. Insert some values, descriptions and colours for the newly created flag type (e.g., pass, fail, N/A). Once finished, click ‘Update’ (see figure below).
- The new flag descriptions should be saved under the flag category in the list on the left of the ‘Flags’ window.
Visualising and Editing Flags¶
For a time series dataset that has a flag associated with it, it is possible to show the flags within the chart and table views. An example of this is seen below, where unusually high or low values are flagged as ‘Fail’, to later undergo quality control checks to determine whether these values are accurate. Comments can be included for any of these flagged values under the Comments column (if a Comments column has been added).
Editing Flags¶
This section will explain how to edit flags for time series data.
- Select a station that has an associated time series.
- Select the time series, then click on the ‘Data’ tab and then ‘Update time series’.
- Then click on the ‘Edit Flags’ button (below the tabular view of the time series).
- You should now see a table with the time and time series values, plus a column with the name of the flag that was created (i.e., ‘Outliers’), as seen below.
- Here you can edit the time series values and the flags. Next to an obvious outlier (potentially caused by a gauge error), click on the drop-down arrow under your flags column (i.e., ‘Outliers’) and select ‘Fail’ (see figure below). Comments can be added to any of the time series records if required, under the ‘Comments’ column (if present).
- Click on ‘Save changes’, then ‘Stop Edit’. The edit flags window closes, and the ‘Update time series’ tab is now in view.
- To verify that the flags were edited correctly, click on the ‘Show Flags’ button, then select the flag you created. The outlier that you edited should now appear in the colour you specified at the particular time series values. In this case, the value of 5 has been flagged as a fail outlier (see figure below).
- To show these flags in a normal chart view (i.e., not in the Update time series view), close the ‘Update time series’ tab, then plot a new chart for the time series you used to edit the flags.
- Under the Chart tab, click on the ‘Show flags’ dropdown menu, then select the name of your flag (i.e., ‘Outliers’) and the flag to show.
- Your flag will now be plotted on the chart (see figure below), where you have the option to copy, save or export the chart.
To remove the flags from the chart, follow these steps:
- With your chart with flags plotted on it still in view, click on the ‘Show flags’ dropdown menu (under the Chart tab).
- Untick the name of the flag being displayed.
- The flags will now be removed from the chart, and only the time series itself is plotted.
Alternative ways of editing flags¶
Once flags have been created within a MIKE INFO database, there are several methods of importing time series that already have flag-related data associated with them. These include:
- Importing time series and flags within the same file;
- Importing time series in one file and the corresponding flags in a separate file; and
- Importing time series with a corresponding mapping file.
It is important that the time series and flags are in the correct format. Examples of time series and flags in the same file (Figure 6), time series and flags in different files (Figure 7) and time series with a corresponding mapping file (Figure 8) are shown below.
Figure 6: Time series and flags in the same file.
Figure 7: Time series (left) and flags (right) in different files
Figure 8: Time series (left) and corresponding mapping file (right).
Reports (Badges)¶
When selecting the Reports (Badges) button in ‘Flags and Reports’, a dialog appears where one can generate a summary overview report of all the time series in the database. This information includes the gauge reader, the station the time series is associated with, the time series variable, frequency, last value, deviation from automatically collected data (if applicable), number of submissions and consecutive days.
The report is exported in Excel format. Furthermore, start and end dates can be specified to define the period for which the report is generated.
Creating badge reports¶
- Click on ‘Settings’ (top-left corner); the default view is Map settings.
- Click on ‘Flags and Reports’. Two additional options should appear on the right, namely ‘Flags’ and ‘Badges’. Click on ‘Badges’. A new window should appear, called ‘Badge Report’.
- With the ‘Badge Report’ window in view, there should be a table summary of the various gauges being used to record time series.
- Depending on the last time the table was updated, click the ‘Update report’ button to refresh the badge table.
- If required, select a start and end date (see figure below) that coincides with the period for which the gauges have been recording data.
- Click on the ‘Export’ button. A new window should appear, allowing a folder directory on your PC to be chosen, where the badge report will be exported in Excel format. Select an appropriate folder, then click ‘Save’.
- The badge report will now be found in the specified folder.
- Close the Badge Report window once the export process is complete.