Model 111BayesianTempModel-10

Calculates the property 111BayesianTempModel-10.

The document contains these additional sections of information:
Cross-validation Results
Enrichment Results
Percentile Results
Category Statistics Results
Non-validated Models Results
Training Data Information
Model Construction Information
Model Construction Parameters

Leave-one-out Cross-Validation Results

This model was built using 5159 samples and validated using leave-one-out cross-validation. Each sample was left out one at a time, a model was built using the rest of the samples, and that model was used to predict the left-out sample. Once all the samples had predictions, a ROC plot was generated and the area under the curve (XV ROC AUC) was calculated.
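The procedure described above can be sketched in a few lines of Python. This is only an illustration, not the code that produced this report: `fit` and `predict` are hypothetical callables standing in for the model builder and scorer, and the AUC is computed with the standard rank-sum (Mann-Whitney) identity, which may differ from how the report integrates the ROC plot.

```python
def leave_one_out_scores(samples, labels, fit, predict):
    """Score each sample with a model trained on all the other samples.

    fit(train_samples, train_labels) and predict(model, sample) are
    hypothetical callables supplied by the caller.
    """
    scores = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        scores.append(predict(model, samples[i]))
    return scores


def roc_auc(scores, labels):
    """Area under the ROC curve from scores and True/False labels."""
    # Rank the scores in ascending order, giving tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one member and one nonmember")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    # Fraction of (member, nonmember) pairs in which the member scores higher.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of members from nonmembers.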

Best Split was calculated by picking the split that minimized the sum of the percent misclassified for category members and for category nonmembers, using the cross-validated score for each sample. Using that split, a contingency table was constructed containing the number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN).

Output                   XV ROC AUC  Best Split  TP/FN     FP/TN     # in Category
111BayesianTempModel-10  0.843       -1.074      2094/660  491/1914  2754
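The Best Split rule can be illustrated with a short Python sketch. Two details are assumptions not stated in this report: scores at or above the split are treated as predicted members, and only observed score values are tried as candidate splits. The search is deliberately simple (quadratic in the number of samples) and needs at least one member and one nonmember.

```python
def contingency(scores, labels, split):
    """Return (TP, FN, FP, TN), predicting 'member' when score >= split."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= split)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < split)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= split)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < split)
    return tp, fn, fp, tn


def misclassification_sum(tp, fn, fp, tn):
    """Fraction of members missed plus fraction of nonmembers accepted."""
    return fn / (tp + fn) + fp / (fp + tn)


def best_split(scores, labels):
    """Observed score that minimizes the summed misclassification rates."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda c: misclassification_sum(*contingency(scores, labels, c)),
    )
```

Applied to the counts reported above, `misclassification_sum(2094, 660, 491, 1914)` is about 0.444: roughly 24.0% of category members and 20.4% of nonmembers are misclassified at the best split.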

Enrichment Results

This model was built using 5159 samples and validated using leave-one-out cross-validation. Each sample was left out one at a time, a model was built using the rest of the samples, and that model was used to predict the left-out sample. Once all the samples had predictions, an enrichment plot was generated and the percentage of true category members captured at each percentage cutoff was calculated. (For example, the column labeled "1%" gives the percentage of true category members (e.g., actives) that were found in the top 1% of the list when sorted by the model score.)

This table shows the output name, the percentage of samples that are in that particular category, the number of category members, and the percentage of true members found. Percentages that are less than 100% are in bold.

Output                   Category %  1%    5%    10%    25%    50%    75%    90%    95%    99%
111BayesianTempModel-10  53.382%     1.8%  8.9%  17.8%  42.7%  75.9%  92.3%  97.9%  99.1%  99.9%
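As a rough illustration of how such enrichment figures can be computed, the sketch below sorts samples by score and counts the members found in the top slice. How the report rounds the size of the top slice is not stated, so the `round` call is an assumption, and both function names are hypothetical.

```python
def percent_captured(scores, labels, top_percent):
    """Percentage of all category members found in the top-scoring slice."""
    n_top = round(len(scores) * top_percent / 100)  # rounding rule is assumed
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = sum(1 for _, is_member in ranked[:n_top] if is_member)
    total = sum(1 for is_member in labels if is_member)
    return 100.0 * found / total


def max_percent_captured(category_percent, top_percent):
    """Best possible capture if every sample in the top slice were a member."""
    return min(100.0, 100.0 * top_percent / category_percent)
```

Because 53.382% of the samples are category members, the top 1% of the list can hold at most about 1.87% of them (`max_percent_captured(53.382, 1)`), so the reported 1.8% is close to that ceiling.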

Percentile Results

This table shows, for each model, the cutoff needed to capture a particular percentage of the good samples. Below each cutoff are the estimated percentages of false positives and true negatives among the non-good samples. The table is designed to help you pick the cutoff value that best balances capturing as many good samples as possible against keeping the number of false positives to a minimum.

The rates shown in this table are estimates derived from the cross-validated data; the actual numbers you would find on your own data may vary.

Cutoffs which lead to 10% or greater false positives are displayed in bold for ease of identification.

Model Name               99%      95%      90%      70%      50%      30%      10%      5%       1%
111BayesianTempModel-10  -16.420  -10.283  -6.952   -3.270   4.182    11.634   15.315   18.647   24.783
  (FP%/TN%)              70%/30%  55%/45%  47%/53%  38%/62%  22%/78%  11%/89%  7%/93%   5%/95%   2%/98%
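One way to use this table is to fix a tolerance for false positives and take the cutoff that captures the most good samples within it. The sketch below hard-codes the values from the table above; the selection rule and the name `pick_cutoff` are illustrative assumptions, not part of the model.

```python
# (percent of good samples captured, score cutoff, estimated false positive %)
# Values transcribed from the percentile table in this report.
PERCENTILE_TABLE = [
    (99, -16.420, 70),
    (95, -10.283, 55),
    (90, -6.952, 47),
    (70, -3.270, 38),
    (50, 4.182, 22),
    (30, 11.634, 11),
    (10, 15.315, 7),
    (5, 18.647, 5),
    (1, 24.783, 2),
]


def pick_cutoff(max_false_positive_percent):
    """Return the row capturing the most good samples within the tolerance.

    Rows are ordered from most to fewest good samples captured, so the
    first row that satisfies the tolerance is the one to use. Returns
    None when no listed cutoff is strict enough.
    """
    for captured, cutoff, false_positive in PERCENTILE_TABLE:
        if false_positive <= max_false_positive_percent:
            return captured, cutoff, false_positive
    return None
```

For example, `pick_cutoff(25)` selects the cutoff 4.182, which is estimated to capture 50% of the good samples with 22% false positives.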

Category Statistics Results

This table shows, for each category, statistics derived from the cross-validated predictions of the model built for that category, as applied to members and non-members of that category. For each group, the table gives the number of members or nonmembers (N), the mean prediction for the subset (Mean), and the estimated standard deviation of the predictions for the subset (StdDev).

(Categories with one or no members do not have a mean and standard deviation, as there are too few predictions upon which to base them during cross-validation. Also, categories may occasionally contain many duplicate or highly similar compounds that predict close or identical values, causing them to have unusually low standard deviation values. These low values may be adjusted when the standard deviations are used to predict, for example, percentile results.)

Output                   Category N  Category Mean (±StdDev)  Noncategory N  Noncategory Mean (±StdDev)
111BayesianTempModel-10  2754        4.18 (±8.77)             2405           -8.53 (±15.84)
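The statistics in this table can be reproduced from cross-validated scores along the following lines. This is a minimal sketch: the use of the sample standard deviation (n - 1 denominator) is an assumption, since the report does not say which estimator is used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def category_stats(scores, labels):
    """N, mean, and standard deviation of scores for members and nonmembers."""
    stats = {}
    for name, flag in (("Category", True), ("Noncategory", False)):
        subset = [s for s, y in zip(scores, labels) if bool(y) == flag]
        if len(subset) > 1:
            stats[name] = {"N": len(subset), "Mean": mean(subset), "StdDev": stdev(subset)}
        else:
            # With one or no samples there is too little data for either statistic.
            stats[name] = {"N": len(subset), "Mean": None, "StdDev": None}
    return stats
```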

Non-validated Models Results

All categories contained enough samples for cross-validation.

Training Data Information

The data used to train the model consisted of 5159 samples. The following are the statistics for the independent (X) properties.

Property      Min    Max          Mean    Std. Dev.
ECFP_14       N/A    N/A          N/A     N/A
Apol          372.5  1.0942e+005  9732.3  5212.6
Num_H_Donors  0      48           1.0832  1.6773
Num_Rings     0      12           2.088   1.65
Wiener        1      5.0715e+005  1097.3  8329.7

The test to identify "good" samples is:

property("Class") is defined AND property("Class") = 1

You can extend this model by adding your own training data to it to create a new model. Use the New Model from Old component to do this. The new training samples must contain the properties as specified above (except that they need not contain properties that can be calculated on-demand). The "good" samples must be marked so that they can be identified by the above test. Because the original training data were not saved with this model, you will not be able to compute cross-validation statistics for the new model.
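When preparing new training samples outside the modeling environment, it can help to check which ones would pass the "good" test. The predicate below is a minimal Python equivalent, assuming each sample is a dictionary of property values and that "Class" holds a number; neither assumption comes from this report.

```python
def is_good(sample):
    """Mirror of: property("Class") is defined AND property("Class") = 1."""
    value = sample.get("Class")  # None when the property is not defined
    return value is not None and value == 1
```

For example, `is_good({"Class": 1})` is true, while samples with `Class` set to 0 or with no `Class` property at all are treated as not good.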

Model Construction Information

Post-processing was performed to remove low-information bins. Low-information bins are those that have normalized estimates in the range [-0.05, 0.05].

For each property, the following table gives the original number of bins (Original), the number removed due to too few samples (TooFew), the number removed due to a poor normalized estimate (Noninformative), and the final number of bins saved in the model (Final).

Property      Original  TooFew  Noninformative  Final
ECFP_14       84655     0       212             84443
Apol          11        0       0               11
Num_H_Donors  5         0       3               2
Num_Rings     6         0       1               5
Wiener        10        0       1               9
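The bin counts can be sanity-checked with a short script: for every property, Original should equal TooFew + Noninformative + Final. The sketch below reads the ECFP_14 row as 84655, 0, 212, 84443, which is the only reading of its digits that satisfies this identity. The `is_low_information` helper restates the [-0.05, 0.05] rule; treating both endpoints as inclusive follows the bracket notation and is otherwise an assumption.

```python
# property: (Original, TooFew, Noninformative, Final), from the table above
BIN_TABLE = {
    "ECFP_14": (84655, 0, 212, 84443),
    "Apol": (11, 0, 0, 11),
    "Num_H_Donors": (5, 0, 3, 2),
    "Num_Rings": (6, 0, 1, 5),
    "Wiener": (10, 0, 1, 9),
}


def is_low_information(normalized_estimate, threshold=0.05):
    """True when a bin's normalized estimate lies in [-threshold, threshold]."""
    return -threshold <= normalized_estimate <= threshold


def check_bin_counts(table):
    """Map each property to whether its removed and final bins add up."""
    return {
        name: original == too_few + noninformative + final
        for name, (original, too_few, noninformative, final) in table.items()
    }
```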

Model Construction Parameters

The following parameter values were specified by the learner component. Some items are internal parameters not exposed by the component. In the course of building the model, certain values may have been adjusted from the values shown below.

Parameter                      Value
LearnedPropertyName            111BayesianTempModel-10
TestForGood                    property("Class") is defined AND property("Class") = 1
UseProperties                  UserSet
PredefinedSet                  ALogP, Molecular_Weight, Num_H_Donors, Num_H_Acceptors, Num_RotatableBonds, Molecular_FractionalPolarSurfaceArea, ECFP_6
UserSet                        ECFP_14,Apol,Num_H_Donors,Num_Rings,Wiener
IgnoreProperties
Additional Options
NumberOfBins                   10
Learn Options                  Validate Models, Remove Uninformative Bins, Equipopulate Bins
Numeric Distance Function      Euclidean
Numeric Scaling                Mean-Center and Scale, Scale by Number of Dimensions
Fingerprint Distance Function  Tanimoto
Model Domain Fingerprint       FCFP_2
DestinationFolder              Administrator/LearnedProperties
Post-Processing Script         resize(#op, 4); #op[1] := 'NormalizedProbability'; #op[2] := 'Enrichment'; #op[3] := 'EstPGood'; #op[4] := 'Prediction'; SetParam('Output Options',#op);
DuplicationEstimate            1.0
GoodDuplicationEstimate        1.0
Additional Properties