.. _online-predict:

.. |br| raw:: html

   <br />

Online-predict
==============

The online-predict program performs predictions on individual molecules and/or files of molecules, just like the :ref:`predict` program. The difference is that no model needs to be trained beforehand; instead, predictions are made in an online fashion, with models trained on the fly.

.. contents:: Table of Contents
   :depth: 3
   :backlinks: top

Parameters
----------

The full usage menu can be retrieved by running the command:

.. code-block:: text

   > java -jar cpsign-[version].jar online-predict

                                online-predict
   SYNOPSIS
   ------------------------------------------------------------------------------------------
     online-predict [options]
     online-predict @/tmp/runconfigs/parameters.txt [options]
     online-predict @C:\Users\User\runconfigs\parameters.txt [options]

   DESCRIPTION
   ------------------------------------------------------------------------------------------
     Train and predict new examples on the fly. Will not save the model. Currently only
     available for TCP.

   OPTIONS
   ------------------------------------------------------------------------------------------
   Input:
     -mi | --model-in [URI | path]
                    Precomputed CPSign classification model
     -td | --train-data [URI | path]
                    File with molecules in SMILES, SDF or JSON format (used for deriving
                    the predictive model)
     -sm | --smiles [SMILES]
                    SMILES string to predict, can optionally include a blank space and a
                    molecule name/identfier
     -p | --predict-file [format] [opt args] [URI | path]
                    File to predict. Accepted formats are SMILES, SDF or JSON
     -e | --endpoint [text]
                    Endpoint property that should be used for modeling (the endoint of
                    the model)
     -l | --labels [label label]
                    Label(s) for endpoint values in classification mode. More info can be
                    found running "explain labels"

   Modeling:
     -i | --impl [id | text]
                    Scoring algorithm (i.e. underlying machine learning implementation):
                      (1) LibLinear
                      (2) LibSvm
                      (3) ProbabilisticLibSvm
                    Default: 1
     --cost [number]
                    User defined Cost value in SVM training
                    Default: 50.0
     --gamma [number]
                    User defined Gamma value in SVM training (only used in libsvm)
                    Default: 0.002
     --epsilon [number]
                    User defined tolerance of termination criterion
                    Default: 0.001
     --epsilon-svr [number]
                    User defined epsilon in loss function of epsilon-SVR
                    Default: 0.1
     --nonconf-measure [text]
                    Nonconformity measure that should be used, see documentation for
                    clarifications. Run "explain ncm" to get further information
                    Options (Classification):
                      (11) NegativeDistanceToHyperplane
                      (12) PositiveDistanceToHyperplane
                      (13) ProbabilityEstimates (Only for ProbabilisticLibSVM - slower to
                           compute)
                    Default: 11
     --percentiles [integer]
                    The maximum number of molecules used for calculating percentiles.
                    This will only be used in case image-generation should performed.
                    Default: 1000

   Signature generation:
     -hs | --height-start [integer]
                    Signatures start height
                    Default: 1
     -he | --height-end [integer]
                    Signatures end height
                    Default: 3
     -sg | --signatures-generator [id | text]
                    Type of signatures that should be used, note that stereo-signatures
                    take much longer time to compute. Stereo signatures also requires
                    input data to have stereo information explicitly given in the file.
                    Options:
                      (1) default | normal
                      (2) stereo (experimental mode)
                    Default: 1

   Data manipulation:
     --duplicates [id | text]
                    Resolve/remove potential duplicates which can make it difficult for
                    the SVM to find a good decision plane. Replace duplicates by a single
                    record with a new label or remove all conflicting records.
                    Regression options:
                      (1) median
                      (2) mean
                      (3) min
                      (4) max
                      (5) remove:[maximum allowed difference]
                    Classification options:
                      (5) remove
                      (6) vote
                      (7) keep:[label]
     --filters [id | text]
                    Filters to apply on the records, currently only filters records based
                    on the endpoint value for regression.
                    Options:
                      (1) min:[min]
                      (2) max:[max]
                      (3) range:[min]:[max]

   Prediction:
     -co | --confidences [confidence confidence .. ]
                    Confidences for predictions (e.g. '0.5,0.7,0.9' or '0.5 0.7 0.9').
                    Should be in the range [0,1]
     -cg | --calculate-gradient
                    Calculate the Significant Signature of molecules

   Output:
     -of | --output-format [id | text]
                    Output format of predictions, options:
                      (1) json
                      (2) smiles | plain
                      (3) sdf | sdf-v2000
                      (4) sdf-v3000
                    Default: 1
     -o | --output [path]
                    File to write output to (default is printing to screen)
     --output-inchi
                    Generate InChI and InChIKey in the output
     --compress
                    If the outputfile should be compressed (only possible when writing to
                    file)

   Encryption:

   Gradient image output:
     -gi | --gradient-images
                    Create a Gradient image for each predicted molecule.
     -if | --image-file [path]
                    Path to where generated images should be saved, can either be a path
                    to a specific folder or a full path including a file name (only .png
                    file ending supported). Every image will be named '[name]-[count].png'
                    or '[name]-[$cdk:title].png' where name is either a default name or
                    the specified name to this parameter (e.g. '.' - current folder using
                    default file name, '/tmp/imgs/DefaultImageName.png' - use /tmp/imgs/
                    as directory and use 'DefaultImageName' as file name)
                    Default: imgs/GradientDepiction.png
     -cs | --color-scheme [text]
                    The specified color-scheme (case in-sensitive), options:
                      (1) blue:red
                      (2) red:blue
                      (3) red:blue:red
                      (4) cyan:magenta
                      (5) rainbow
                      custom - contact Aros Bio for custom requirements!
                    Default: 1
     --color-legend
                    Add a color legend at the bottom of the image
     --atom-numbers
                    Depict atom numbers
     --atom-number-color [color name] | [hex color]
                    Color of the atom numbers
                    Default: BLUE
     -ih | --image-height [text]
                    The height of the generated images (in pixels)
                    Default: 400
     -iw | --image-width [integer]
                    The width of the generated images (in pixels)
                    Default: 400

   Significant Signature image output:
     -si | --signature-images
                    Create a Significant Signature image for each predicted molecule
     -sf | --signature-image-file [path]
                    Path to where generated images should be saved, can either be a path
                    to a specific folder or a full path including a file name (only .png
                    file ending supported). Every image will be named '[name]-[count].png'
                    or '[name]-[$cdk:title].png' where name is either a default name or
                    the specified name to this parameter (e.g. '.' - current folder using
                    default file name, '/tmp/imgs/DefaultImageName.png' - use /tmp/imgs/
                    as directory and use 'DefaultImageName' as file name)
                    Default: imgs/SigificantSignatureDepiction.png
     -hc | --highlight-color [color name] | [hex color]
                    The color that should be used for the highlighting of the significant
                    signature
                    Default: BLUE
     --signature-color-legend
                    Add a color legend at the bottom of the image
     --signature-atom-numbers
                    Depict atom numbers
     --signature-atom-number-color [color name] | [hex color]
                    Color of the atom numbers
                    Default: BLUE
     -sh | --signature-image-height [text]
                    The height of the generated images (in pixels)
                    Default: 400
     -sw | --signature-image-width [integer]
                    The width of the generated images (in pixels)
                    Default: 400

   General:
   * --license [URI | path]
                    Path or URI to license file
     -h | --help | man
                    Get help text
     --short
                    Use shorter help text (used together with the --help argument)
     --logfile [path]
                    Path to a user-set logfile, will be specific for this run
     --silent
                    Silent mode (only print output to logfile)
     --echo
                    Echo the input arguments given to CPSign
     --seed [integer]
                    Set this flag if an explicit RNG seed should be used in tasks that
                    require a RNG (randomization of training data, splitting in
                    cross-validation, learning algorithms etc). Not used by all programs.
     --progress-bar
                    Add a Progress bar in the system error output
     --progress-bar-ascii
                    Add a Progress bar in ASCII in the system error output
     --time
                    Print wall-time for all individual steps in execution

   ------------------------------------------------------------------------------------------

The list of parameters is even longer than the one for :ref:`predict`, as there are additional input options as well as options for signature generation and modeling. Once again, parameters can be retrieved one section at a time, for instance:

.. code-block:: text

   > java -jar cpsign-[version].jar online-predict input

                                online-predict
   ------------------------------------------------------------------------------------------
   Input:
     -mi | --model-in [URI | path]
                    Precomputed CPSign classification model
     -td | --train-data [URI | path]
                    File with molecules in SMILES, SDF or JSON format (used for deriving
                    the predictive model)
     -sm | --smiles [SMILES]
                    SMILES string to predict, can optionally include a blank space and a
                    molecule name/identfier
     -p | --predict-file [format] [opt args] [URI | path]
                    File to predict. Accepted formats are SMILES, SDF or JSON
     -e | --endpoint [text]
                    Endpoint property that should be used for modeling (the endoint of
                    the model)
     -l | --labels [label label] | [label,label]
                    Label(s) for endpoint values in classification mode. More info can be
                    found running "explain labels"

Example Usage
-------------

TCP classification with chemical input data:

.. code-block:: bash

   > java -jar cpsign-[version].jar online-predict \
      --license /path/to/Standard-license.license \
      --smiles "O=Cc1ccc(O)c(OC)c1" \
      --endpoint "Ames test categorisation" \
      --labels mutagen,nonmutagen \
      --time \
      --percentiles 0 \
      --train-data sdf data/ames_small.sdf.gz

   Running with Standard License registered to [Name] at [Company]. Expiry date is [Date]

   Reading train file and performing signature generation..
   Successfully parsed 123 molecules. Detected labels: 'mutagen'=64, 'nonmutagen'=59. Generated 1930 new signatures.
   (1 s)

   Training TCP predictor..
   Finished (0 s)

   Starting to do predictions..
   {
     "prediction": {
       "pValues": {
         "nonmutagen": 0.204,
         "mutagen": 0.0
       }
     },
     "molecule": {
       "SMILES": "O=Cc1ccc(O)c(OC)c1"
     }
   }

   Successfully predicted 1 molecule (0 s)

The parameters are largely a mix of those for :ref:`train` and :ref:`predict`, except that there are no arguments for choosing the predictor type, since only TCP classification is available.
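
The JSON output in the example reports one p-value per label. As a minimal sketch of how such output could be post-processed, the snippet below turns the p-values into a prediction set for a chosen confidence level, using the usual conformal prediction rule that a label is kept when its p-value exceeds ``1 - confidence``. The function name ``prediction_set`` is hypothetical and not part of CPSign, and the strict inequality is an assumption; CPSign may treat ties differently.

```python
import json

# Sample output copied from the example run in this section.
SAMPLE_OUTPUT = """
{
  "prediction": {"pValues": {"nonmutagen": 0.204, "mutagen": 0.0}},
  "molecule": {"SMILES": "O=Cc1ccc(O)c(OC)c1"}
}
"""


def prediction_set(raw_json, confidence):
    """Return the sorted labels whose p-value exceeds 1 - confidence.

    Hypothetical helper (not a CPSign API); assumes the JSON layout
    shown in the example output.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in the range [0,1]")
    significance = 1.0 - confidence
    p_values = json.loads(raw_json)["prediction"]["pValues"]
    # Assumption: strict inequality decides label membership.
    return sorted(label for label, p in p_values.items() if p > significance)


if __name__ == "__main__":
    for conf in (0.7, 0.8, 0.9):
        print(conf, prediction_set(SAMPLE_OUTPUT, conf))
```

With the sample p-values, a confidence of 0.8 keeps only ``nonmutagen`` (0.204 is above the significance level of 0.2), while a confidence of 0.7 yields an empty set because no p-value exceeds 0.3.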