simscreen
Performs a fast fingerprint-based similarity screening of molecule databases.
Synopsis
simscreen [-hVvpMTugSNGFBYt] [-c arg] [-l arg] [-o arg] [-r arg] [-m arg] [-f arg] [-e arg] [-b arg] [-n arg] [-x arg] [-P arg] [-Q arg] [-D arg] [-O arg] [--tversky-weight-a arg] [--tversky-weight-b arg] [--ecfp-size arg] [--ecfp-radius arg] [--ecfp-inc-H arg] [--ecfp-inc-chirality arg] [--daylight-size arg] [--daylight-min-path-len arg] [--daylight-max-path-len arg] [--daylight-inc-H arg] [--pharm-2d-size arg] [--pharm-2d-min-tuple-size arg] [--pharm-2d-max-tuple-size arg] [--pharm-2d-bin-size arg] [--pharm-3d-size arg] [--pharm-3d-min-tuple-size arg] [--pharm-3d-max-tuple-size arg] [--pharm-3d-bin-size arg] -q arg -d arg
Mandatory options
-q [ --query ] arg
The query molecule input file.
- Supported Input Formats:
JME Molecular Editor String (*.jme)
MDL Structure-Data Format (*.sdf, *.sd)
MDL Molfile (*.mol)
Daylight SMILES Format (*.smi)
Daylight SMARTS Format (*.sma)
IUPAC International Chemical Identifier (*.inchi, *.ichi)
Native CDPL Format (*.cdf)
Tripos Sybyl MOL2 Format (*.mol2)
Atomic Coordinates XYZ Format (*.xyz)
Chemical Markup Language Format (*.cml)
GZip-Compressed MDL Structure-Data Format (*.sdf.gz, *.sd.gz, *.sdz)
BZip2-Compressed MDL Structure-Data Format (*.sdf.bz2, *.sd.bz2)
GZip-Compressed Native CDPL Format (*.cdf.gz)
BZip2-Compressed Native CDPL Format (*.cdf.bz2)
GZip-Compressed Daylight SMILES Format (*.smi.gz)
BZip2-Compressed Daylight SMILES Format (*.smi.bz2)
GZip-Compressed Tripos Sybyl MOL2 Format (*.mol2.gz)
BZip2-Compressed Tripos Sybyl MOL2 Format (*.mol2.bz2)
GZip-Compressed Atomic Coordinates XYZ Format (*.xyz.gz)
BZip2-Compressed Atomic Coordinates XYZ Format (*.xyz.bz2)
GZip-Compressed Chemical Markup Language Format (*.cml.gz)
BZip2-Compressed Chemical Markup Language Format (*.cml.bz2)
Brookhaven Protein Data Bank Entry Format (*.pdb, *.ent)
Macromolecular Transmission Format (*.mmtf)
Macromolecular Crystallographic Information File Format (*.mmcif, *.cif)
GZip-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.gz, *.ent.gz)
BZip2-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.bz2, *.ent.bz2)
GZip-Compressed Macromolecular Transmission Format (*.mmtf.gz)
BZip2-Compressed Macromolecular Transmission Format (*.mmtf.bz2)
GZip-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.gz, *.cif.gz)
BZip2-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.bz2, *.cif.bz2)
Pharmacophore Screening Database Format (*.psd)
-d [ --database ] arg
The molecule database file to screen.
- Supported Input Formats:
JME Molecular Editor String (*.jme)
MDL Structure-Data Format (*.sdf, *.sd)
MDL Molfile (*.mol)
Daylight SMILES Format (*.smi)
Daylight SMARTS Format (*.sma)
IUPAC International Chemical Identifier (*.inchi, *.ichi)
Native CDPL Format (*.cdf)
Tripos Sybyl MOL2 Format (*.mol2)
Atomic Coordinates XYZ Format (*.xyz)
Chemical Markup Language Format (*.cml)
GZip-Compressed MDL Structure-Data Format (*.sdf.gz, *.sd.gz, *.sdz)
BZip2-Compressed MDL Structure-Data Format (*.sdf.bz2, *.sd.bz2)
GZip-Compressed Native CDPL Format (*.cdf.gz)
BZip2-Compressed Native CDPL Format (*.cdf.bz2)
GZip-Compressed Daylight SMILES Format (*.smi.gz)
BZip2-Compressed Daylight SMILES Format (*.smi.bz2)
GZip-Compressed Tripos Sybyl MOL2 Format (*.mol2.gz)
BZip2-Compressed Tripos Sybyl MOL2 Format (*.mol2.bz2)
GZip-Compressed Atomic Coordinates XYZ Format (*.xyz.gz)
BZip2-Compressed Atomic Coordinates XYZ Format (*.xyz.bz2)
GZip-Compressed Chemical Markup Language Format (*.cml.gz)
BZip2-Compressed Chemical Markup Language Format (*.cml.bz2)
Brookhaven Protein Data Bank Entry Format (*.pdb, *.ent)
Macromolecular Transmission Format (*.mmtf)
Macromolecular Crystallographic Information File Format (*.mmcif, *.cif)
GZip-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.gz, *.ent.gz)
BZip2-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.bz2, *.ent.bz2)
GZip-Compressed Macromolecular Transmission Format (*.mmtf.gz)
BZip2-Compressed Macromolecular Transmission Format (*.mmtf.bz2)
GZip-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.gz, *.cif.gz)
BZip2-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.bz2, *.cif.bz2)
Pharmacophore Screening Database Format (*.psd)
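As a minimal sketch of how the two mandatory options combine into an invocation, the following Python helper assembles an argument list that could be passed to subprocess.run. The helper name build_simscreen_cmd and the file names query.sdf, library.sdf and hits.sdf are hypothetical, and the sketch assumes that the simscreen executable is on the PATH and that long options are typed with two ASCII hyphens. It only builds the command and does not run it.

```python
import shlex


def build_simscreen_cmd(query, database, output=None, report=None):
    """Assemble a simscreen argument list (hypothetical helper, not part of the tool)."""
    # -q/--query and -d/--database are the two mandatory options.
    cmd = ["simscreen", "--query", query, "--database", database]
    if output is not None:
        cmd += ["--output", output]  # hit molecule output file (-o)
    if report is not None:
        cmd += ["--report", report]  # report output file (-r)
    return cmd


cmd = build_simscreen_cmd("query.sdf", "library.sdf", output="hits.sdf")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the input files exist.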
Other options
-h [ --help ] [=arg(=SHORT)]
Print help message and exit (ABOUT, USAGE, SHORT, ALL or ‘name of option’, default: SHORT).
-V [ --version ]
Print version information and exit.
-v [ --verbosity ] [=arg(=VERBOSE)]
Verbosity level of information output (QUIET, ERROR, INFO, VERBOSE, DEBUG, default: INFO).
-c [ --config ] arg
Use file with program options.
-l [ --log-file ] arg
Redirect text output to file.
-p [ --progress ] [=arg(=1)]
Show progress bar (default: true).
-o [ --output ] arg
Hit molecule output file.
- Supported Output Formats:
JME Molecular Editor String (*.jme)
MDL Structure-Data Format (*.sdf, *.sd)
MDL Molfile (*.mol)
Daylight SMILES Format (*.smi)
Daylight SMARTS Format (*.sma)
IUPAC International Chemical Identifier (*.inchi, *.ichi)
Native CDPL Format (*.cdf)
Tripos Sybyl MOL2 Format (*.mol2)
Atomic Coordinates XYZ Format (*.xyz)
Chemical Markup Language Format (*.cml)
GZip-Compressed MDL Structure-Data Format (*.sdf.gz, *.sd.gz, *.sdz)
BZip2-Compressed MDL Structure-Data Format (*.sdf.bz2, *.sd.bz2)
GZip-Compressed Native CDPL Format (*.cdf.gz)
BZip2-Compressed Native CDPL Format (*.cdf.bz2)
GZip-Compressed Daylight SMILES Format (*.smi.gz)
BZip2-Compressed Daylight SMILES Format (*.smi.bz2)
GZip-Compressed Tripos Sybyl MOL2 Format (*.mol2.gz)
BZip2-Compressed Tripos Sybyl MOL2 Format (*.mol2.bz2)
GZip-Compressed Atomic Coordinates XYZ Format (*.xyz.gz)
BZip2-Compressed Atomic Coordinates XYZ Format (*.xyz.bz2)
GZip-Compressed Chemical Markup Language Format (*.cml.gz)
BZip2-Compressed Chemical Markup Language Format (*.cml.bz2)
Brookhaven Protein Data Bank Entry Format (*.pdb, *.ent)
Macromolecular Transmission Format (*.mmtf)
Macromolecular Crystallographic Information File Format (*.mmcif, *.cif)
GZip-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.gz, *.ent.gz)
BZip2-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.bz2, *.ent.bz2)
GZip-Compressed Macromolecular Transmission Format (*.mmtf.gz)
BZip2-Compressed Macromolecular Transmission Format (*.mmtf.bz2)
GZip-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.gz, *.cif.gz)
BZip2-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.bz2, *.cif.bz2)
Pharmacophore Screening Database Format (*.psd)
-r [ --report ] arg
Report output file.
-m [ --mode ] arg
Specifies which of the results obtained for the query/database molecule pairings are of interest (BEST_OVERALL, BEST_PER_QUERY, BEST_PER_QUERY_CONF, default: BEST_PER_QUERY).
-f [ --func ] arg
Function to use for molecule similarity/distance calculation and ranking operations (TANIMOTO_SIM, TVERSKY_SIM, COSINE_SIM, DICE_SIM, MANHATTAN_SIM, MANHATTAN_DIST, HAMMING_DIST, EUCLIDEAN_SIM, EUCLIDEAN_DIST, default: TANIMOTO_SIM).
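The option lists the available functions by name only. The sketch below shows the textbook bit-set definitions of four of them; whether simscreen computes them in exactly this way (for example, how it scores two empty fingerprints) is an assumption. Fingerprints are modelled here as Python sets of on-bit indices, and all function names are illustrative.

```python
import math


def tanimoto_sim(q, d):
    """Shared on-bits divided by the on-bits set in either fingerprint."""
    union = len(q | d)
    return len(q & d) / union if union else 0.0  # 0.0 for two empty sets is a choice


def dice_sim(q, d):
    """Twice the shared on-bits divided by the summed on-bit counts."""
    total = len(q) + len(d)
    return 2 * len(q & d) / total if total else 0.0


def cosine_sim(q, d):
    """Shared on-bits divided by the geometric mean of the on-bit counts."""
    denom = math.sqrt(len(q) * len(d))
    return len(q & d) / denom if denom else 0.0


def hamming_dist(q, d):
    """Number of bit positions in which the two fingerprints differ."""
    return len(q ^ d)


query = {1, 2, 3}
db_mol = {2, 3, 4, 5}
print(tanimoto_sim(query, db_mol))  # 2 shared bits / 5 bits in the union
print(dice_sim(query, db_mol))      # 2 * 2 / (3 + 4)
print(cosine_sim(query, db_mol))    # 2 / sqrt(3 * 4)
print(hamming_dist(query, db_mol))  # bits 1, 4 and 5 differ
```

Similarity functions rank higher values first, while distance functions rank lower values first, which is why the option description speaks of both similarity and distance.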
-e [ --descr ] arg
Type of molecule descriptor to use for similarity/distance calculations (ECFP, DAYLIGHT, PUBCHEM, MACCS, PHARM_2D, PHARM_3D, default: ECFP).
-b [ --best-hits ] arg
Maximum number of best scoring hits to output (default: 1000).
-n [ --max-hits ] arg
Maximum number of found hits at which the screen will terminate (overrides the --best-hits option, default: 0 -> no limit).
-x [ --cutoff ] arg
Similarity/distance cutoff value which determines whether a database molecule is considered as a hit (default: -1.0 -> no cutoff).
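The interplay of the best-hits limit and the cutoff can be pictured as a bounded ranking. The sketch below is not the simscreen implementation: it assumes a similarity function (higher is better), treats a pair as a hit when its score is at least the cutoff, and treats a negative cutoff as "no cutoff" in line with the documented default. The function and molecule names are hypothetical.

```python
import heapq


def collect_best_hits(scored, best_hits=1000, cutoff=-1.0):
    """Keep the `best_hits` highest-scoring (score, name) pairs.

    Assumes higher scores are better and that a negative cutoff
    disables the cutoff test (both are assumptions of this sketch).
    """
    if best_hits < 1:
        raise ValueError("best_hits must be at least 1 in this sketch")
    heap = []  # min-heap: the weakest retained hit sits at heap[0]
    for score, name in scored:
        if cutoff >= 0.0 and score < cutoff:
            continue  # below the cutoff, so not a hit
        if len(heap) < best_hits:
            heapq.heappush(heap, (score, name))
        elif (score, name) > heap[0]:
            heapq.heapreplace(heap, (score, name))  # evict the weakest hit
    return sorted(heap, reverse=True)


scores = [(0.91, "mol_a"), (0.35, "mol_b"), (0.78, "mol_c"), (0.62, "mol_d")]
print(collect_best_hits(scores, best_hits=2))
print(collect_best_hits(scores, cutoff=0.6))
```

A bounded heap keeps memory proportional to the number of requested hits rather than to the size of the database.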
-M [ --merge-hits ] [=arg(=1)]
If true, identified hits are merged into a single, combined hit list. If false, a separate hit list for every query molecule will be maintained (default: false).
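The difference between a merged hit list and per-query hit lists can be illustrated with a small Python sketch. It is only an illustration: the triples, names and scores are made up, and how simscreen treats a database molecule that is hit by several queries in merged mode is not stated here, so the sketch simply keeps every pairing.

```python
from collections import defaultdict


def build_hit_lists(pairs, merge_hits=False):
    """Group (query, db_mol, score) triples into hit lists.

    merge_hits=True  -> one combined list, best score first
    merge_hits=False -> one list per query molecule, best score first
    """
    if merge_hits:
        return sorted(pairs, key=lambda p: p[2], reverse=True)
    per_query = defaultdict(list)
    for query, db_mol, score in pairs:
        per_query[query].append((db_mol, score))
    for hits in per_query.values():
        hits.sort(key=lambda h: h[1], reverse=True)
    return dict(per_query)


pairs = [("q1", "m1", 0.8), ("q2", "m1", 0.9), ("q1", "m2", 0.5)]
print(build_hit_lists(pairs, merge_hits=True))
print(build_hit_lists(pairs, merge_hits=False))
```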
-T [ --split-output ] [=arg(=1)]
If true, for every query molecule a separate report and hit output file will be generated (default: true).
-u [ --output-query ] [=arg(=1)]
If specified, query molecules will be written at the beginning of the hit molecule output file (default: true).
-g [ --single-conf ] [=arg(=1)]
If specified, conformers of the database molecules are treated as individual single conformer molecules (default: false).
-S [ --score-sd-tags ] [=arg(=1)]
If true, similarity/distance score values will be appended as SD-block entries of the output hit molecules (default: true).
-N [ --query-name-sd-tags ] [=arg(=1)]
If true, the query molecule name will be appended to the SD-block of the output hit molecules (default: false).
-G [ --query-idx-sd-tags ] [=arg(=1)]
If true, the query molecule index will be appended to the SD-block of the output hit molecules (default: false).
-F [ --query-conf-sd-tags ] [=arg(=1)]
If true, the query molecule conformer index will be appended to the SD-block of the output hit molecules (default: true).
-B [ --db-idx-sd-tags ] [=arg(=1)]
If true, the database molecule index will be appended to the SD-block of the output hit molecules (default: false).
-Y [ --db-conf-sd-tags ] [=arg(=1)]
If true, the database molecule conformer index will be appended to the SD-block of the output hit molecules (default: true).
-P [ --hit-name-ptn ] arg
Pattern for composing the names of written hit molecules by variable substitution (supported variables: @Q@ = query molecule name, @D@ = database molecule name, @C@ = query molecule conformer index, @c@ = database molecule conformer index, @I@ = query molecule index and @i@ = database molecule index, default: @D@_@c@_@Q@_@C@).
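The variable substitution described above can be sketched in a few lines of Python. This is an illustration of the pattern syntax only, not the tool's code: the function name compose_hit_name and the example molecule names are hypothetical, and whether simscreen counts indices from 0 or 1 is not stated, so the numbers are placeholders.

```python
import re

DEFAULT_PATTERN = "@D@_@c@_@Q@_@C@"


def compose_hit_name(pattern, query_name, db_name,
                     query_conf, db_conf, query_idx, db_idx):
    """Replace the documented @X@ variables in a hit name pattern."""
    values = {
        "Q": query_name,       # query molecule name
        "D": db_name,          # database molecule name
        "C": str(query_conf),  # query molecule conformer index
        "c": str(db_conf),     # database molecule conformer index
        "I": str(query_idx),   # query molecule index
        "i": str(db_idx),      # database molecule index
    }
    # A single regex pass avoids re-substituting text that a molecule
    # name itself might contain.
    return re.sub(r"@([QDCcIi])@", lambda m: values[m.group(1)], pattern)


print(compose_hit_name(DEFAULT_PATTERN, "aspirin", "ZINC123", 0, 2, 1, 57))
print(compose_hit_name("@i@-@D@", "aspirin", "ZINC123", 0, 2, 1, 57))
```

With the default pattern the database molecule name and conformer index come first, followed by the query molecule name and conformer index.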
-t [ --num-threads ] [=arg(=4)]
Number of parallel execution threads (default: no multithreading, implicit value: 4 threads, must be >= 0, 0 disables multithreading).
-Q [ --query-format ] arg
Explicitly specifies the format of the query molecule file by providing one of the supported file extensions (without leading dot!) as argument.
- Supported Input Formats:
JME Molecular Editor String (*.jme)
MDL Structure-Data Format (*.sdf, *.sd)
MDL Molfile (*.mol)
Daylight SMILES Format (*.smi)
Daylight SMARTS Format (*.sma)
IUPAC International Chemical Identifier (*.inchi, *.ichi)
Native CDPL Format (*.cdf)
Tripos Sybyl MOL2 Format (*.mol2)
Atomic Coordinates XYZ Format (*.xyz)
Chemical Markup Language Format (*.cml)
GZip-Compressed MDL Structure-Data Format (*.sdf.gz, *.sd.gz, *.sdz)
BZip2-Compressed MDL Structure-Data Format (*.sdf.bz2, *.sd.bz2)
GZip-Compressed Native CDPL Format (*.cdf.gz)
BZip2-Compressed Native CDPL Format (*.cdf.bz2)
GZip-Compressed Daylight SMILES Format (*.smi.gz)
BZip2-Compressed Daylight SMILES Format (*.smi.bz2)
GZip-Compressed Tripos Sybyl MOL2 Format (*.mol2.gz)
BZip2-Compressed Tripos Sybyl MOL2 Format (*.mol2.bz2)
GZip-Compressed Atomic Coordinates XYZ Format (*.xyz.gz)
BZip2-Compressed Atomic Coordinates XYZ Format (*.xyz.bz2)
GZip-Compressed Chemical Markup Language Format (*.cml.gz)
BZip2-Compressed Chemical Markup Language Format (*.cml.bz2)
Brookhaven Protein Data Bank Entry Format (*.pdb, *.ent)
Macromolecular Transmission Format (*.mmtf)
Macromolecular Crystallographic Information File Format (*.mmcif, *.cif)
GZip-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.gz, *.ent.gz)
BZip2-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.bz2, *.ent.bz2)
GZip-Compressed Macromolecular Transmission Format (*.mmtf.gz)
BZip2-Compressed Macromolecular Transmission Format (*.mmtf.bz2)
GZip-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.gz, *.cif.gz)
BZip2-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.bz2, *.cif.bz2)
Pharmacophore Screening Database Format (*.psd)
This option is useful when the format cannot be auto-detected from the actual extension of the file (because it is missing, misleading or not supported).
-D [ --database-format ] arg
Explicitly specifies the format of the screening database file by providing one of the supported file extensions (without leading dot!) as argument.
- Supported Input Formats:
JME Molecular Editor String (*.jme)
MDL Structure-Data Format (*.sdf, *.sd)
MDL Molfile (*.mol)
Daylight SMILES Format (*.smi)
Daylight SMARTS Format (*.sma)
IUPAC International Chemical Identifier (*.inchi, *.ichi)
Native CDPL Format (*.cdf)
Tripos Sybyl MOL2 Format (*.mol2)
Atomic Coordinates XYZ Format (*.xyz)
Chemical Markup Language Format (*.cml)
GZip-Compressed MDL Structure-Data Format (*.sdf.gz, *.sd.gz, *.sdz)
BZip2-Compressed MDL Structure-Data Format (*.sdf.bz2, *.sd.bz2)
GZip-Compressed Native CDPL Format (*.cdf.gz)
BZip2-Compressed Native CDPL Format (*.cdf.bz2)
GZip-Compressed Daylight SMILES Format (*.smi.gz)
BZip2-Compressed Daylight SMILES Format (*.smi.bz2)
GZip-Compressed Tripos Sybyl MOL2 Format (*.mol2.gz)
BZip2-Compressed Tripos Sybyl MOL2 Format (*.mol2.bz2)
GZip-Compressed Atomic Coordinates XYZ Format (*.xyz.gz)
BZip2-Compressed Atomic Coordinates XYZ Format (*.xyz.bz2)
GZip-Compressed Chemical Markup Language Format (*.cml.gz)
BZip2-Compressed Chemical Markup Language Format (*.cml.bz2)
Brookhaven Protein Data Bank Entry Format (*.pdb, *.ent)
Macromolecular Transmission Format (*.mmtf)
Macromolecular Crystallographic Information File Format (*.mmcif, *.cif)
GZip-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.gz, *.ent.gz)
BZip2-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.bz2, *.ent.bz2)
GZip-Compressed Macromolecular Transmission Format (*.mmtf.gz)
BZip2-Compressed Macromolecular Transmission Format (*.mmtf.bz2)
GZip-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.gz, *.cif.gz)
BZip2-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.bz2, *.cif.bz2)
Pharmacophore Screening Database Format (*.psd)
This option is useful when the format cannot be auto-detected from the actual extension of the file(s) (because it is missing, misleading or not supported).
-O [ --output-format ] arg
Explicitly specifies the hit molecule output file format by providing one of the supported file extensions (without leading dot!) as argument.
- Supported Output Formats:
JME Molecular Editor String (*.jme)
MDL Structure-Data Format (*.sdf, *.sd)
MDL Molfile (*.mol)
Daylight SMILES Format (*.smi)
Daylight SMARTS Format (*.sma)
IUPAC International Chemical Identifier (*.inchi, *.ichi)
Native CDPL Format (*.cdf)
Tripos Sybyl MOL2 Format (*.mol2)
Atomic Coordinates XYZ Format (*.xyz)
Chemical Markup Language Format (*.cml)
GZip-Compressed MDL Structure-Data Format (*.sdf.gz, *.sd.gz, *.sdz)
BZip2-Compressed MDL Structure-Data Format (*.sdf.bz2, *.sd.bz2)
GZip-Compressed Native CDPL Format (*.cdf.gz)
BZip2-Compressed Native CDPL Format (*.cdf.bz2)
GZip-Compressed Daylight SMILES Format (*.smi.gz)
BZip2-Compressed Daylight SMILES Format (*.smi.bz2)
GZip-Compressed Tripos Sybyl MOL2 Format (*.mol2.gz)
BZip2-Compressed Tripos Sybyl MOL2 Format (*.mol2.bz2)
GZip-Compressed Atomic Coordinates XYZ Format (*.xyz.gz)
BZip2-Compressed Atomic Coordinates XYZ Format (*.xyz.bz2)
GZip-Compressed Chemical Markup Language Format (*.cml.gz)
BZip2-Compressed Chemical Markup Language Format (*.cml.bz2)
Brookhaven Protein Data Bank Entry Format (*.pdb, *.ent)
Macromolecular Transmission Format (*.mmtf)
Macromolecular Crystallographic Information File Format (*.mmcif, *.cif)
GZip-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.gz, *.ent.gz)
BZip2-Compressed Brookhaven Protein Data Bank Entry Format (*.pdb.bz2, *.ent.bz2)
GZip-Compressed Macromolecular Transmission Format (*.mmtf.gz)
BZip2-Compressed Macromolecular Transmission Format (*.mmtf.bz2)
GZip-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.gz, *.cif.gz)
BZip2-Compressed Macromolecular Crystallographic Information File Format (*.mmcif.bz2, *.cif.bz2)
Pharmacophore Screening Database Format (*.psd)
This option is useful when the format cannot be auto-detected from the actual extension of the file (because it is missing, misleading or not supported).
--ecfp-size arg
Size of the generated fingerprint (default: 8191).
--ecfp-radius arg
Atom environment radius in number of bonds (default: 2 -> ECFP4).
--ecfp-inc-H arg
Whether or not to include hydrogen atoms (default: false).
--ecfp-inc-chirality arg
Whether or not to regard the chirality of stereo atoms (default: false).
--daylight-size arg
Size of the generated fingerprint (default: 8191).
--daylight-min-path-len arg
Minimum considered atom path length in number of bonds (default: 0).
--daylight-max-path-len arg
Maximum considered atom path length in number of bonds (default: 5).
--daylight-inc-H arg
Whether or not to include hydrogen atoms (default: false).
--pharm-2d-size arg
Size of the generated fingerprint (default: 8191).
--pharm-2d-min-tuple-size arg
Minimum feature tuple size (default: 1).
--pharm-2d-max-tuple-size arg
Maximum feature tuple size (default: 3).
--pharm-2d-bin-size arg
Feature distance bin size (default: 2.0, must be > 0).
--pharm-3d-size arg
Size of the generated fingerprint (default: 8191).
--pharm-3d-min-tuple-size arg
Minimum feature tuple size (default: 1).
--pharm-3d-max-tuple-size arg
Maximum feature tuple size (default: 3).
--pharm-3d-bin-size arg
Feature distance bin size (default: 3.0, must be > 0).
--tversky-weight-a arg
Weight factor of the query molecule fingerprint exclusive bits (default: 1.0).
--tversky-weight-b arg
Weight factor of the database molecule fingerprint exclusive bits (default: 0.0).
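To make the effect of the two Tversky weights concrete, the sketch below implements the standard Tversky index on bit sets, with weight a applied to bits found only in the query fingerprint and weight b applied to bits found only in the database fingerprint, as the two option descriptions state. The formula is the textbook one and is assumed, not confirmed, to match what simscreen computes; the function name and example fingerprints are illustrative.

```python
def tversky_sim(query_bits, db_bits, weight_a=1.0, weight_b=0.0):
    """Tversky index: common / (common + a * query_only + b * db_only)."""
    common = len(query_bits & db_bits)
    query_only = len(query_bits - db_bits)  # bits exclusive to the query
    db_only = len(db_bits - query_bits)     # bits exclusive to the database molecule
    denom = common + weight_a * query_only + weight_b * db_only
    return common / denom if denom else 0.0


query = {1, 2, 3}
db_mol = {2, 3, 4, 5}

# Documented defaults (a = 1.0, b = 0.0): extra bits in the database
# molecule are not penalised.
print(tversky_sim(query, db_mol))

# With a = b = 1.0 the index reduces to the Tanimoto similarity.
print(tversky_sim(query, db_mol, weight_a=1.0, weight_b=1.0))

# A query whose bits are all present in the database molecule scores 1.0
# under the default weights.
print(tversky_sim({2, 3}, db_mol))
```

Under the default weights the score is asymmetric, which rewards database molecules that contain the query's features regardless of what else they contain.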