Validate
The validate program validates a trained model against a test file with known (true) labels. This lets you evaluate how well the predictor performs on unknown data.
Parameters
The full usage menu can be retrieved by running the command:
> java -jar cpsign-[version].jar validate
validate
SYNOPSIS
------------------------------------------------------------------------------------------
validate [options]
validate @/tmp/runconfigs/parameters.txt [options]
validate @C:\Users\User\runconfigs\parameters.txt [options]
DESCRIPTION
------------------------------------------------------------------------------------------
Use a test-file with existing (true) labels to validate a Predictor. The normal
execution will only report overall statistics, but it is possible to print all predicted
result to json, smiles or sdf file format.
OPTIONS
------------------------------------------------------------------------------------------
Input:
* -mi | --model-in [URI | path]
Trained CPSign model
* -p | --predict-file [format] [opt args] [URI | path]
File to use for validation. Accepted formats are CSV, SDFfile or JSON.
-ve | --validation-endpoint [text]
(SDFile) Name of field with true label, should match a property in the predict file
(CSV) Name of the column to use for validation, should match header of that column
(JSON) JSON-key for the property with the true response value
Validation:
-co | --confidences [confidence confidence .. ]
(ACP/TCP) Confidences for predictions (e.g. '0.5,0.7,0.9' or '0.5 0.7 0.9'). Should
be in the range [0,1]
Default: 0.8
Output:
-rf | --result-format [id | text]
Output format, options:
(1) json
(2) text | plain
(3) CSV
(4) TSV
Default: 2
--roc
Output the ROC curve (VAP only), the ROC curve has many points and lead to verbose
output. Default is to only print the AUC score
--print-predictions
Print the prediction output in json/csv/sdf format (default is only printing
overall statistics)
-of | --output-format [text]
Output format of predictions (only applicable if the --print flag is given),
options:
(1) json
(2) smiles | plain
(3) sdf | sdf-v2000
(4) sdf-v3000
Default: 1
-o | --output [path]
File to write prediction output to (default is printing to screen). Giving this
parameter sets the --print flag to true
--output-inchi
Generate InChI and InChIKey in the output
--compress
If the outputfile should be compressed (only possible when writing to file)
Encryption:
General:
* --license [URI | path]
Path or URI to license file
-h | --help | man
Get help text
--short
Use shorter help text (used together with the --help argument)
--logfile [path]
Path to a user-set logfile, will be specific for this run
--silent
Silent mode (only print output to logfile)
--echo
Echo the input arguments given to CPSign
--seed [integer]
Set this flag if an explicit RNG seed should be used in tasks that require a RNG
(randomization of training data, splitting in cross-validation, learning algorithms
etc). Not used by all programs.
--progress-bar
Add a Progress bar in the system error output
--progress-bar-ascii
Add a Progress bar in ASCII in the system error output
--time
Print wall-time for all individual steps in execution
------------------------------------------------------------------------------------------
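When validation runs are scripted, it can help to assemble the argument list programmatically. The following is a minimal sketch in Python, not part of CPSign itself: the helper name `build_validate_command` and its parameters are hypothetical, and only flags listed in the usage menu above are used. The jar name and paths are the placeholders from this page.

```python
import shlex


def build_validate_command(jar, license_file, model, predict_file,
                           predict_format="sdf", endpoint=None,
                           confidences=(0.8,), output=None):
    """Assemble an argument list for `validate` (hypothetical helper).

    Only flags documented in the usage menu are emitted.
    """
    # The usage menu states that confidences should be in the range [0,1].
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} is outside [0,1]")

    cmd = ["java", "-jar", jar, "validate",
           "--license", license_file,
           "--model-in", model,
           "--predict-file", predict_format, predict_file]
    if endpoint is not None:
        cmd += ["--validation-endpoint", endpoint]
    cmd += ["--confidences", *[str(c) for c in confidences]]
    if output is not None:
        # Per the usage menu, giving --output also enables printing predictions.
        cmd += ["--output", output]
    return cmd


cmd = build_validate_command(
    jar="cpsign-[version].jar",
    license_file="/path/to/Standard-license.license",
    model="/path/to/model.cpsign",
    predict_file="/path/to/validatefile.sdf",
    endpoint="Ames test categorisation",
    confidences=(0.7, 0.8, 0.9),
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholders have been replaced with real paths.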
Example Usage
Example (ACP Classification):
> java -jar cpsign-[version].jar validate \
--license /path/to/Standard-license.license \
--validation-endpoint "Ames test categorisation" \
-p sdf /path/to/validatefile.sdf \
-co 0.7 0.8 0.9 \
-mi /path/to/model.cpsign
Running with Standard License registered to [Name] at [Company]. Expiry
date is [Date]
Loading model..
Loaded an ACP classification predictor with 2 aggregated models. Model has been trained
from 123 training examples. The model endpoint is 'Ames test categorisation'. Class labels
are 'nonmutagen' and 'mutagen'.
Starting to perform validation..
- Predicted 100/126 molecules
Successfully predicted 126 molecules
==========================================================================================
Validation result for confidence level set to 0.7:
- Accuracy: 0.976
- Single label predictions: 0.992
- Double label predictions: 0.008
- Mean classification confidence: 0.963
- Mean classification credibility: 0.765
Validation result for confidence level set to 0.8:
- Accuracy: 0.984
- Single label predictions: 0.905
- Double label predictions: 0.095
- Mean classification confidence: 0.963
- Mean classification credibility: 0.765
Validation result for confidence level set to 0.9:
- Accuracy: 0.992
- Single label predictions: 0.849
- Double label predictions: 0.151
- Mean classification confidence: 0.963
- Mean classification credibility: 0.765
In this example the model was validated on the same input file it was trained on, so the results are better than should be expected: the accuracies are much higher than the requested confidence levels. When validating on a previously unseen validation set, the accuracies should be close to the requested confidence levels.
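To make the reported statistics more concrete, the sketch below computes accuracy and the fractions of single- and double-label predictions from per-label p-values. It assumes the standard conformal prediction rule, where a label is included in the prediction set if its p-value exceeds 1 - confidence. The function name and the toy p-values are made up for illustration, and CPSign's own implementation may differ in details such as tie handling.

```python
def validation_stats(p_values, true_labels, confidence):
    """Compute conformal classification statistics (illustrative sketch).

    p_values:    list of dicts mapping label -> p-value, one per example
    true_labels: list of the true label for each example
    confidence:  confidence level in [0,1]
    """
    significance = 1.0 - confidence
    n = len(true_labels)
    correct = single = double = 0
    for pvals, truth in zip(p_values, true_labels):
        # Prediction set: all labels that are not rejected at this significance.
        pred_set = {label for label, p in pvals.items() if p > significance}
        correct += truth in pred_set      # counted correct if the true label is in the set
        single += len(pred_set) == 1
        double += len(pred_set) == 2
    return {
        "accuracy": correct / n,
        "single_label": single / n,
        "double_label": double / n,
    }


# Toy p-values (invented for this example, not CPSign output).
p_values = [
    {"mutagen": 0.85, "nonmutagen": 0.05},
    {"mutagen": 0.25, "nonmutagen": 0.60},
    {"mutagen": 0.15, "nonmutagen": 0.40},
    {"mutagen": 0.02, "nonmutagen": 0.95},
]
true_labels = ["mutagen", "nonmutagen", "mutagen", "nonmutagen"]

for conf in (0.7, 0.9):
    print(conf, validation_stats(p_values, true_labels, conf))
```

On this toy data, raising the confidence level turns some single-label predictions into double-label predictions, which raises accuracy at the cost of less specific predictions. This is the same pattern seen in the example output above.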