Signatures Predictor
A Signatures Predictor is a wrapper placed on top of a Predictor. It adds useful functionality for handling predictions made for chemical compounds described with the signatures descriptor [2].
Instantiation
There is one wrapper class for each Predictor type. Instantiation is done either through CPSignFactory or through the constructor of each class:
CVAP: the SignaturesVAPClassification class or CPSignFactory.createSignaturesVAPClassification
ACP Classification: the SignaturesCPClassification class or CPSignFactory.createSignaturesCPClassification
ACP Regression: the SignaturesCPRegression class or CPSignFactory.createSignaturesCPRegression
TCP Classification: the SignaturesCPClassification class or CPSignFactory.createSignaturesCPClassification
Loading data & Predict
Once instantiated, the Signatures wrapper object can load training data from a range of file types using the fromMolsIterator methods. These methods are only accessible with the Standard or Pro licenses. Two types of iterators can be given to these methods: either an Iterator<IAtomContainer>, which also requires the endpoint to be given (the endpoint value should then be stored as a property in each IAtomContainer), or an Iterator<Pair<IAtomContainer,Double>>, which gives each IAtomContainer together with its associated endpoint value directly.
The iterator can be of any origin, whether a database or a file format of your preference. For convenience, the classes SDFile, CSVFile and JSONFile let you obtain the iterator easily. Note that you can load data from multiple files this way, simply by calling fromChemFile or fromMolsIterator once for each file or data source. CPSign can thereby merge multiple data sources, in multiple formats.
// sdf, csv and delim are assumed to be defined elsewhere (the two source files and the CSV delimiter)
// From an SDFile
List<String> labels = Arrays.asList("0", "1");
String endpoint = "class";
predictor.fromMolsIterator(new SDFile(sdf).getIterator(), endpoint, new NamedLabels(labels));
// From a CSV file (named csvFile so that it does not shadow the csv source argument)
CSVFile csvFile = new CSVFile(csv);
csvFile.setDelimiter(delim);
String endpointHeaderColumn = "target";
predictor.fromMolsIterator(csvFile.getIterator(), endpointHeaderColumn, new NamedLabels(labels));
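The relationship between the two iterator types, and the way repeated loading calls merge several sources, can be illustrated with a small self-contained sketch. It does not use CPSign or CDK: MolRecord is a hypothetical stand-in for IAtomContainer, Map.Entry stands in for Pair, and withEndpoint and loadInto are illustrative helper names, not part of the CPSign API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a molecule that carries properties (NOT the CDK IAtomContainer)
final class MolRecord {
    final String smiles;
    final Map<String, String> properties = new HashMap<>();

    MolRecord(String smiles) { this.smiles = smiles; }

    // Stores a property and returns this record so calls can be chained
    MolRecord with(String key, String value) {
        properties.put(key, value);
        return this;
    }
}

final class EndpointIteratorSketch {

    // Adapts an iterator of property-carrying records into an iterator of
    // (record, endpoint value) pairs by reading the property named `endpoint`
    static Iterator<Map.Entry<MolRecord, Double>> withEndpoint(
            final Iterator<MolRecord> source, final String endpoint) {
        return new Iterator<Map.Entry<MolRecord, Double>>() {
            @Override
            public boolean hasNext() { return source.hasNext(); }

            @Override
            public Map.Entry<MolRecord, Double> next() {
                MolRecord rec = source.next();
                String raw = rec.properties.get(endpoint);
                if (raw == null) {
                    throw new IllegalArgumentException("Missing property: " + endpoint);
                }
                return new SimpleEntry<>(rec, Double.parseDouble(raw));
            }
        };
    }

    // Appends every pair to the shared list; calling this once per source merges the sources
    static void loadInto(List<Map.Entry<MolRecord, Double>> training,
                         Iterator<Map.Entry<MolRecord, Double>> pairs) {
        while (pairs.hasNext()) {
            training.add(pairs.next());
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<MolRecord, Double>> training = new ArrayList<>();

        // First source: endpoint stored under the property "target"
        Iterator<MolRecord> first = Arrays.asList(
                new MolRecord("CCO").with("target", "1.5"),
                new MolRecord("CCN").with("target", "0.7")).iterator();
        loadInto(training, withEndpoint(first, "target"));

        // Second source: endpoint stored under a different property name
        Iterator<MolRecord> second = Arrays.asList(
                new MolRecord("c1ccccc1").with("activity", "2.0")).iterator();
        loadInto(training, withEndpoint(second, "activity"));

        System.out.println("Merged records: " + training.size());
    }
}
```

Because the adapter is lazy, records are read from the underlying source only as the loader consumes them, which is the reason an iterator backed by a database or a large file can be passed in the same way.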
CPSign version 0.6.0 introduced the possibility to use partitions of the data exclusively for either training of models (proper training) or calibration. At the API level this is handled by the Dataset.java class, which holds a single dataset, and by the Problem.java class, which now holds three datasets: dataset, calibrationExclusive and modelingExclusive. These can be manipulated directly if desired. If the datasets are kept in separate files, call the fromMolsIterator methods with an extra argument that takes the RecordType enum, as follows:
// Use records in molsIterator for only calibration set
predictor.fromMolsIterator(molsIterator, RecordType.CALIBRATION_EXCLUSIVE);
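The routing of records into the three datasets can be pictured with the following self-contained sketch. It is not CPSign code: SketchProblem and SketchRecordType are hypothetical names, only the constant CALIBRATION_EXCLUSIVE is taken from the text above (NORMAL and MODELING_EXCLUSIVE are assumptions), and the two "candidates" methods encode one plausible reading of "exclusive", namely that exclusive records are never used for the other purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical enum: only CALIBRATION_EXCLUSIVE is named in the text; the others are assumed
enum SketchRecordType { NORMAL, CALIBRATION_EXCLUSIVE, MODELING_EXCLUSIVE }

// Hypothetical holder mirroring the three datasets described for Problem.java
final class SketchProblem<T> {
    final List<T> dataset = new ArrayList<>();
    final List<T> calibrationExclusive = new ArrayList<>();
    final List<T> modelingExclusive = new ArrayList<>();

    // Drains the iterator into the dataset selected by the record type
    void add(Iterator<T> records, SketchRecordType type) {
        List<T> target;
        switch (type) {
            case CALIBRATION_EXCLUSIVE: target = calibrationExclusive; break;
            case MODELING_EXCLUSIVE:    target = modelingExclusive;    break;
            default:                    target = dataset;              break;
        }
        while (records.hasNext()) {
            target.add(records.next());
        }
    }

    // Records that may end up in the proper training set (assumed semantics)
    List<T> properTrainingCandidates() {
        List<T> out = new ArrayList<>(dataset);
        out.addAll(modelingExclusive);
        return out;
    }

    // Records that may end up in the calibration set (assumed semantics)
    List<T> calibrationCandidates() {
        List<T> out = new ArrayList<>(dataset);
        out.addAll(calibrationExclusive);
        return out;
    }
}

final class RecordTypeSketch {
    public static void main(String[] args) {
        SketchProblem<String> problem = new SketchProblem<>();
        problem.add(Arrays.asList("m1", "m2", "m3").iterator(), SketchRecordType.NORMAL);
        problem.add(Arrays.asList("c1", "c2").iterator(), SketchRecordType.CALIBRATION_EXCLUSIVE);
        problem.add(Arrays.asList("t1").iterator(), SketchRecordType.MODELING_EXCLUSIVE);

        System.out.println("Proper training candidates: " + problem.properTrainingCandidates());
        System.out.println("Calibration candidates: " + problem.calibrationCandidates());
    }
}
```

Keeping the three lists separate, rather than tagging each record, means a later split of the shared dataset cannot accidentally move an exclusive record to the other side.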
Saving and loading predictor models
Both the precomputed data and the finished, trained predictor can be worth saving. The precomputed data can be saved in case you want to train different predictors, possibly using different scoring implementations or parameters. The trained predictor model can be used for later predictions and be distributed to partners, etc. Precomputed models can be saved through the ModelCreator class, whereas trained predictors can be saved either with the ModelCreator class or by calling the save() method of the Signatures wrapper class.
Image generation
To get visual results from the predictions (i.e. of the significant signature), please refer to the Image rendering page.