simscreen

Performs a fast fingerprint-based similarity screening of molecule databases.

Synopsis

simscreen [-hVvpMTugSNGFBYt] [-c arg] [-l arg] [-o arg] [-r arg] [-m arg] [-f arg] [-e arg] [-b arg] [-n arg] [-x arg] [-P arg] [-Q arg] [-D arg] [-O arg] [--tversky-weight-a arg] [--tversky-weight-b arg] [--ecfp-size arg] [--ecfp-radius arg] [--ecfp-inc-H arg] [--ecfp-inc-chirality arg] [--daylight-size arg] [--daylight-min-path-len arg] [--daylight-max-path-len arg] [--daylight-inc-H arg] [--pharm-2d-size arg] [--pharm-2d-min-tuple-size arg] [--pharm-2d-max-tuple-size arg] [--pharm-2d-bin-size arg] [--pharm-3d-size arg] [--pharm-3d-min-tuple-size arg] [--pharm-3d-max-tuple-size arg] [--pharm-3d-bin-size arg] -q arg -d arg

Mandatory options

-q [ --query ] arg

The query molecule input file.

Supported Input Formats:

JME Molecular Editor String (.jme)

MDL Structure-Data Format (.sdf, .sd)

MDL Molfile (.mol)

Daylight SMILES Format (.smi)

Daylight SMARTS Format (.sma)

IUPAC International Chemical Identifier (.inchi, .ichi)

Native CDPL Format (.cdf)

Tripos Sybyl MOL2 Format (.mol2)

Atomic Coordinates XYZ Format (.xyz)

Chemical Markup Language Format (.cml)

GZip-Compressed MDL Structure-Data Format (.sdf.gz, .sd.gz, .sdz)

BZip2-Compressed MDL Structure-Data Format (.sdf.bz2, .sd.bz2)

GZip-Compressed Native CDPL Format (.cdf.gz)

BZip2-Compressed Native CDPL Format (.cdf.bz2)

GZip-Compressed Daylight SMILES Format (.smi.gz)

BZip2-Compressed Daylight SMILES Format (.smi.bz2)

GZip-Compressed Tripos Sybyl MOL2 Format (.mol2.gz)

BZip2-Compressed Tripos Sybyl MOL2 Format (.mol2.bz2)

GZip-Compressed Atomic Coordinates XYZ Format (.xyz.gz)

BZip2-Compressed Atomic Coordinates XYZ Format (.xyz.bz2)

GZip-Compressed Chemical Markup Language Format (.cml.gz)

BZip2-Compressed Chemical Markup Language Format (.cml.bz2)

Brookhaven Protein Data Bank Entry Format (.pdb, .ent)

Macromolecular Transmission Format (.mmtf)

Macromolecular Crystallographic Information File Format (.mmcif, .cif)

GZip-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.gz, .ent.gz)

BZip2-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.bz2, .ent.bz2)

GZip-Compressed Macromolecular Transmission Format (.mmtf.gz)

BZip2-Compressed Macromolecular Transmission Format (.mmtf.bz2)

GZip-Compressed Macromolecular Crystallographic Information File Format (.mmcif.gz, .cif.gz)

BZip2-Compressed Macromolecular Crystallographic Information File Format (.mmcif.bz2, .cif.bz2)

Pharmacophore Screening Database Format (.psd)

-d [ --database ] arg

The molecule database file to screen.

Supported Input Formats:

JME Molecular Editor String (.jme)

MDL Structure-Data Format (.sdf, .sd)

MDL Molfile (.mol)

Daylight SMILES Format (.smi)

Daylight SMARTS Format (.sma)

IUPAC International Chemical Identifier (.inchi, .ichi)

Native CDPL Format (.cdf)

Tripos Sybyl MOL2 Format (.mol2)

Atomic Coordinates XYZ Format (.xyz)

Chemical Markup Language Format (.cml)

GZip-Compressed MDL Structure-Data Format (.sdf.gz, .sd.gz, .sdz)

BZip2-Compressed MDL Structure-Data Format (.sdf.bz2, .sd.bz2)

GZip-Compressed Native CDPL Format (.cdf.gz)

BZip2-Compressed Native CDPL Format (.cdf.bz2)

GZip-Compressed Daylight SMILES Format (.smi.gz)

BZip2-Compressed Daylight SMILES Format (.smi.bz2)

GZip-Compressed Tripos Sybyl MOL2 Format (.mol2.gz)

BZip2-Compressed Tripos Sybyl MOL2 Format (.mol2.bz2)

GZip-Compressed Atomic Coordinates XYZ Format (.xyz.gz)

BZip2-Compressed Atomic Coordinates XYZ Format (.xyz.bz2)

GZip-Compressed Chemical Markup Language Format (.cml.gz)

BZip2-Compressed Chemical Markup Language Format (.cml.bz2)

Brookhaven Protein Data Bank Entry Format (.pdb, .ent)

Macromolecular Transmission Format (.mmtf)

Macromolecular Crystallographic Information File Format (.mmcif, .cif)

GZip-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.gz, .ent.gz)

BZip2-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.bz2, .ent.bz2)

GZip-Compressed Macromolecular Transmission Format (.mmtf.gz)

BZip2-Compressed Macromolecular Transmission Format (.mmtf.bz2)

GZip-Compressed Macromolecular Crystallographic Information File Format (.mmcif.gz, .cif.gz)

BZip2-Compressed Macromolecular Crystallographic Information File Format (.mmcif.bz2, .cif.bz2)

Pharmacophore Screening Database Format (.psd)
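
Since only -q and -d are mandatory, the smallest possible invocation can be sketched as follows. The file names are hypothetical placeholders; only the option letters are taken from this page. The command line is assembled in Python so that it can be passed to subprocess.run unchanged once simscreen is installed.

```python
import shlex

# Hypothetical file names, chosen only for illustration.
query_file = "query.sdf"
database_file = "library.sdf.gz"

# -q and -d are the two mandatory options; everything else has a default.
argv = ["simscreen", "-q", query_file, "-d", database_file]

# Render the argument list as a shell-safe command string.
command = shlex.join(argv)
print(command)

# To actually run the screen (requires simscreen on the PATH):
#   import subprocess
#   subprocess.run(argv, check=True)
```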

Other options

-h [ --help ] [=arg(=SHORT)]

Print help message and exit (ABOUT, USAGE, SHORT, ALL or ‘name of option’, default: SHORT).

-V [ --version ]

Print version information and exit.

-v [ --verbosity ] [=arg(=VERBOSE)]

Verbosity level of information output (QUIET, ERROR, INFO, VERBOSE, DEBUG, default: INFO).

-c [ --config ] arg

Read program options from the given file.

-l [ --log-file ] arg

Redirect text output to a file.

-p [ --progress ] [=arg(=1)]

Show progress bar (default: true).

-o [ --output ] arg

Hit molecule output file.

Supported Output Formats:

JME Molecular Editor String (.jme)

MDL Structure-Data Format (.sdf, .sd)

MDL Molfile (.mol)

Daylight SMILES Format (.smi)

Daylight SMARTS Format (.sma)

IUPAC International Chemical Identifier (.inchi, .ichi)

Native CDPL Format (.cdf)

Tripos Sybyl MOL2 Format (.mol2)

Atomic Coordinates XYZ Format (.xyz)

Chemical Markup Language Format (.cml)

GZip-Compressed MDL Structure-Data Format (.sdf.gz, .sd.gz, .sdz)

BZip2-Compressed MDL Structure-Data Format (.sdf.bz2, .sd.bz2)

GZip-Compressed Native CDPL Format (.cdf.gz)

BZip2-Compressed Native CDPL Format (.cdf.bz2)

GZip-Compressed Daylight SMILES Format (.smi.gz)

BZip2-Compressed Daylight SMILES Format (.smi.bz2)

GZip-Compressed Tripos Sybyl MOL2 Format (.mol2.gz)

BZip2-Compressed Tripos Sybyl MOL2 Format (.mol2.bz2)

GZip-Compressed Atomic Coordinates XYZ Format (.xyz.gz)

BZip2-Compressed Atomic Coordinates XYZ Format (.xyz.bz2)

GZip-Compressed Chemical Markup Language Format (.cml.gz)

BZip2-Compressed Chemical Markup Language Format (.cml.bz2)

Brookhaven Protein Data Bank Entry Format (.pdb, .ent)

Macromolecular Transmission Format (.mmtf)

Macromolecular Crystallographic Information File Format (.mmcif, .cif)

GZip-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.gz, .ent.gz)

BZip2-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.bz2, .ent.bz2)

GZip-Compressed Macromolecular Transmission Format (.mmtf.gz)

BZip2-Compressed Macromolecular Transmission Format (.mmtf.bz2)

GZip-Compressed Macromolecular Crystallographic Information File Format (.mmcif.gz, .cif.gz)

BZip2-Compressed Macromolecular Crystallographic Information File Format (.mmcif.bz2, .cif.bz2)

Pharmacophore Screening Database Format (.psd)

-r [ --report ] arg

Report output file.

-m [ --mode ] arg

Specifies which of the results obtained for the query/database molecule pairings are of interest (BEST_OVERALL, BEST_PER_QUERY, BEST_PER_QUERY_CONF, default: BEST_PER_QUERY).

-f [ --func ] arg

Function to use for molecule similarity/distance calculation and ranking operations (TANIMOTO_SIM, TVERSKY_SIM, COSINE_SIM, DICE_SIM, MANHATTAN_SIM, MANHATTAN_DIST, HAMMING_DIST, EUCLIDEAN_SIM, EUCLIDEAN_DIST, default: TANIMOTO_SIM).
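
To make the difference between some of these functions concrete, here is a minimal sketch using the textbook definitions for binary fingerprints, with each fingerprint modelled as the set of its "on" bit positions. It is an assumption that simscreen's implementations agree with these formulas in every detail (for example, in how empty fingerprints are handled); the function names below are illustrative and not part of the tool.

```python
import math

def tanimoto_sim(a: set, b: set) -> float:
    """Common bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dice_sim(a: set, b: set) -> float:
    """Twice the common bits divided by the total number of set bits."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine_sim(a: set, b: set) -> float:
    """Common bits divided by the geometric mean of the bit counts."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0

def hamming_dist(a: set, b: set) -> int:
    """Number of bit positions in which the fingerprints differ."""
    return len(a ^ b)

# Toy fingerprints: 3 bits in common, 1 only in the query, 2 only in the database molecule.
query_bits = {1, 4, 7, 9}
db_bits = {1, 4, 8, 9, 12}

print(tanimoto_sim(query_bits, db_bits))  # 3 / 6
print(dice_sim(query_bits, db_bits))      # 6 / 9
print(cosine_sim(query_bits, db_bits))    # 3 / sqrt(20)
print(hamming_dist(query_bits, db_bits))  # 3
```

Similarity functions rank higher values as better, while distance functions rank lower values as better, which matters when choosing a cutoff.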

-e [ --descr ] arg

Type of molecule descriptor to use for similarity/distance calculations (ECFP, DAYLIGHT, PUBCHEM, MACCS, PHARM_2D, PHARM_3D, default: ECFP).

-b [ --best-hits ] arg

Maximum number of best scoring hits to output (default: 1000).

-n [ --max-hits ] arg

Maximum number of found hits at which the screen will terminate (overrides the --best-hits option, default: 0 - no limit).

-x [ --cutoff ] arg

Similarity/distance cutoff value which determines whether a database molecule is considered as a hit (default: -1.0 -> no cutoff).
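
How a cutoff and a best-hits limit interact can be sketched as below for a similarity function. This is not the tool's actual code: it assumes that a molecule counts as a hit when its similarity is greater than or equal to the cutoff, and that any negative cutoff disables the test. Both points are assumptions, as are the function and molecule names.

```python
import heapq

def select_hits(scores, cutoff=-1.0, best_hits=1000):
    """Keep molecules passing the cutoff, then return the best-scoring ones.

    scores maps a database molecule name to its similarity score.
    A negative cutoff is treated as "no cutoff" (assumption).
    """
    passing = [
        (score, name)
        for name, score in scores.items()
        if cutoff < 0 or score >= cutoff
    ]
    # nlargest returns at most best_hits entries, highest score first.
    return heapq.nlargest(best_hits, passing)

# Hypothetical similarity scores for three database molecules.
scores = {"mol_a": 0.9, "mol_b": 0.4, "mol_c": 0.7}

print(select_hits(scores))                 # no cutoff: all three, best first
print(select_hits(scores, cutoff=0.5))     # mol_b is dropped
print(select_hits(scores, best_hits=1))    # only the top-scoring molecule
```

For a distance function the comparison and the ordering would be reversed, since smaller values indicate more similar molecules.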

-M [ --merge-hits ] [=arg(=1)]

If true, identified hits are merged into a single, combined hit list. If false, a separate hit list for every query molecule will be maintained (default: false).

-T [ --split-output ] [=arg(=1)]

If true, for every query molecule a separate report and hit output file will be generated (default: true).

-u [ --output-query ] [=arg(=1)]

If specified, query molecules will be written at the beginning of the hit molecule output file (default: true).

-g [ --single-conf ] [=arg(=1)]

If specified, conformers of the database molecules are treated as individual single conformer molecules (default: false).

-S [ --score-sd-tags ] [=arg(=1)]

If true, similarity/distance score values will be appended as SD-block entries of the output hit molecules (default: true).

-N [ --query-name-sd-tags ] [=arg(=1)]

If true, the query molecule name will be appended to the SD-block of the output hit molecules (default: false).

-G [ --query-idx-sd-tags ] [=arg(=1)]

If true, the query molecule index will be appended to the SD-block of the output hit molecules (default: false).

-F [ --query-conf-sd-tags ] [=arg(=1)]

If true, the query molecule conformer index will be appended to the SD-block of the output hit molecules (default: true).

-B [ --db-idx-sd-tags ] [=arg(=1)]

If true, the database molecule index will be appended to the SD-block of the output hit molecules (default: false).

-Y [ --db-conf-sd-tags ] [=arg(=1)]

If true, the database molecule conformer index will be appended to the SD-block of the output hit molecules (default: true).

-P [ --hit-name-ptn ] arg

Pattern for composing the names of written hit molecules by variable substitution (supported variables: @Q@ = query molecule name, @D@ = database molecule name, @C@ = query molecule conformer index, @c@ = database molecule conformer index, @I@ = query molecule index and @i@ = database molecule index, default: @D@_@c@_@Q@_@C@).
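
The variable substitution described above can be illustrated with a small sketch. The function below is hypothetical and only mimics the documented behaviour; the molecule names and index values are made up, and whether simscreen counts indices from zero or one is not stated here, so the numbers are placeholders.

```python
import re

def compose_hit_name(pattern, query_name, db_name,
                     query_conf, db_conf, query_idx, db_idx):
    """Expand the @X@ variables of a hit name pattern in a single pass."""
    values = {
        "@Q@": query_name,       # query molecule name
        "@D@": db_name,          # database molecule name
        "@C@": str(query_conf),  # query molecule conformer index
        "@c@": str(db_conf),     # database molecule conformer index
        "@I@": str(query_idx),   # query molecule index
        "@i@": str(db_idx),      # database molecule index
    }
    # A single regex pass avoids re-expanding text that was just substituted.
    return re.sub(r"@[QDCcIi]@", lambda m: values[m.group(0)], pattern)

# The default pattern from the option description.
default_pattern = "@D@_@c@_@Q@_@C@"

name = compose_hit_name(default_pattern, query_name="lig1", db_name="aspirin",
                        query_conf=0, db_conf=3, query_idx=1, db_idx=42)
print(name)  # aspirin_3_lig1_0
```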

-t [ --num-threads ] [=arg(=4)]

Number of parallel execution threads (default: no multithreading, implicit value: 4 threads, must be >= 0, 0 disables multithreading).

-Q [ --query-format ] arg

Allows the format of the query molecule file to be specified explicitly by providing one of the supported file extensions (without the leading dot!) as argument.

Supported Input Formats:

JME Molecular Editor String (.jme)

MDL Structure-Data Format (.sdf, .sd)

MDL Molfile (.mol)

Daylight SMILES Format (.smi)

Daylight SMARTS Format (.sma)

IUPAC International Chemical Identifier (.inchi, .ichi)

Native CDPL Format (.cdf)

Tripos Sybyl MOL2 Format (.mol2)

Atomic Coordinates XYZ Format (.xyz)

Chemical Markup Language Format (.cml)

GZip-Compressed MDL Structure-Data Format (.sdf.gz, .sd.gz, .sdz)

BZip2-Compressed MDL Structure-Data Format (.sdf.bz2, .sd.bz2)

GZip-Compressed Native CDPL Format (.cdf.gz)

BZip2-Compressed Native CDPL Format (.cdf.bz2)

GZip-Compressed Daylight SMILES Format (.smi.gz)

BZip2-Compressed Daylight SMILES Format (.smi.bz2)

GZip-Compressed Tripos Sybyl MOL2 Format (.mol2.gz)

BZip2-Compressed Tripos Sybyl MOL2 Format (.mol2.bz2)

GZip-Compressed Atomic Coordinates XYZ Format (.xyz.gz)

BZip2-Compressed Atomic Coordinates XYZ Format (.xyz.bz2)

GZip-Compressed Chemical Markup Language Format (.cml.gz)

BZip2-Compressed Chemical Markup Language Format (.cml.bz2)

Brookhaven Protein Data Bank Entry Format (.pdb, .ent)

Macromolecular Transmission Format (.mmtf)

Macromolecular Crystallographic Information File Format (.mmcif, .cif)

GZip-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.gz, .ent.gz)

BZip2-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.bz2, .ent.bz2)

GZip-Compressed Macromolecular Transmission Format (.mmtf.gz)

BZip2-Compressed Macromolecular Transmission Format (.mmtf.bz2)

GZip-Compressed Macromolecular Crystallographic Information File Format (.mmcif.gz, .cif.gz)

BZip2-Compressed Macromolecular Crystallographic Information File Format (.mmcif.bz2, .cif.bz2)

Pharmacophore Screening Database Format (.psd)

This option is useful when the format cannot be auto-detected from the actual extension of the file (because it is missing, misleading, or not supported).

-D [ --database-format ] arg

Allows the format of the screening database file to be specified explicitly by providing one of the supported file extensions (without the leading dot!) as argument.

Supported Input Formats:

JME Molecular Editor String (.jme)

MDL Structure-Data Format (.sdf, .sd)

MDL Molfile (.mol)

Daylight SMILES Format (.smi)

Daylight SMARTS Format (.sma)

IUPAC International Chemical Identifier (.inchi, .ichi)

Native CDPL Format (.cdf)

Tripos Sybyl MOL2 Format (.mol2)

Atomic Coordinates XYZ Format (.xyz)

Chemical Markup Language Format (.cml)

GZip-Compressed MDL Structure-Data Format (.sdf.gz, .sd.gz, .sdz)

BZip2-Compressed MDL Structure-Data Format (.sdf.bz2, .sd.bz2)

GZip-Compressed Native CDPL Format (.cdf.gz)

BZip2-Compressed Native CDPL Format (.cdf.bz2)

GZip-Compressed Daylight SMILES Format (.smi.gz)

BZip2-Compressed Daylight SMILES Format (.smi.bz2)

GZip-Compressed Tripos Sybyl MOL2 Format (.mol2.gz)

BZip2-Compressed Tripos Sybyl MOL2 Format (.mol2.bz2)

GZip-Compressed Atomic Coordinates XYZ Format (.xyz.gz)

BZip2-Compressed Atomic Coordinates XYZ Format (.xyz.bz2)

GZip-Compressed Chemical Markup Language Format (.cml.gz)

BZip2-Compressed Chemical Markup Language Format (.cml.bz2)

Brookhaven Protein Data Bank Entry Format (.pdb, .ent)

Macromolecular Transmission Format (.mmtf)

Macromolecular Crystallographic Information File Format (.mmcif, .cif)

GZip-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.gz, .ent.gz)

BZip2-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.bz2, .ent.bz2)

GZip-Compressed Macromolecular Transmission Format (.mmtf.gz)

BZip2-Compressed Macromolecular Transmission Format (.mmtf.bz2)

GZip-Compressed Macromolecular Crystallographic Information File Format (.mmcif.gz, .cif.gz)

BZip2-Compressed Macromolecular Crystallographic Information File Format (.mmcif.bz2, .cif.bz2)

Pharmacophore Screening Database Format (.psd)

This option is useful when the format cannot be auto-detected from the actual extension of the file(s) (because it is missing, misleading, or not supported).

-O [ --output-format ] arg

Allows the hit molecule output file format to be specified explicitly by providing one of the supported file extensions (without the leading dot!) as argument.

Supported Output Formats:

JME Molecular Editor String (.jme)

MDL Structure-Data Format (.sdf, .sd)

MDL Molfile (.mol)

Daylight SMILES Format (.smi)

Daylight SMARTS Format (.sma)

IUPAC International Chemical Identifier (.inchi, .ichi)

Native CDPL Format (.cdf)

Tripos Sybyl MOL2 Format (.mol2)

Atomic Coordinates XYZ Format (.xyz)

Chemical Markup Language Format (.cml)

GZip-Compressed MDL Structure-Data Format (.sdf.gz, .sd.gz, .sdz)

BZip2-Compressed MDL Structure-Data Format (.sdf.bz2, .sd.bz2)

GZip-Compressed Native CDPL Format (.cdf.gz)

BZip2-Compressed Native CDPL Format (.cdf.bz2)

GZip-Compressed Daylight SMILES Format (.smi.gz)

BZip2-Compressed Daylight SMILES Format (.smi.bz2)

GZip-Compressed Tripos Sybyl MOL2 Format (.mol2.gz)

BZip2-Compressed Tripos Sybyl MOL2 Format (.mol2.bz2)

GZip-Compressed Atomic Coordinates XYZ Format (.xyz.gz)

BZip2-Compressed Atomic Coordinates XYZ Format (.xyz.bz2)

GZip-Compressed Chemical Markup Language Format (.cml.gz)

BZip2-Compressed Chemical Markup Language Format (.cml.bz2)

Brookhaven Protein Data Bank Entry Format (.pdb, .ent)

Macromolecular Transmission Format (.mmtf)

Macromolecular Crystallographic Information File Format (.mmcif, .cif)

GZip-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.gz, .ent.gz)

BZip2-Compressed Brookhaven Protein Data Bank Entry Format (.pdb.bz2, .ent.bz2)

GZip-Compressed Macromolecular Transmission Format (.mmtf.gz)

BZip2-Compressed Macromolecular Transmission Format (.mmtf.bz2)

GZip-Compressed Macromolecular Crystallographic Information File Format (.mmcif.gz, .cif.gz)

BZip2-Compressed Macromolecular Crystallographic Information File Format (.mmcif.bz2, .cif.bz2)

Pharmacophore Screening Database Format (.psd)

This option is useful when the format cannot be auto-detected from the actual extension of the file (because it is missing, misleading, or not supported).

--ecfp-size arg

Size of the generated fingerprint (default: 8191).

--ecfp-radius arg

Atom environment radius in number of bonds (default: 2 -> ECFP4).

--ecfp-inc-H arg

Whether or not to include hydrogen atoms (default: false).

--ecfp-inc-chirality arg

Whether or not to regard the chirality of stereo atoms (default: false).

--daylight-size arg

Size of the generated fingerprint (default: 8191).

--daylight-min-path-len arg

Minimum considered atom path length in number of bonds (default: 0).

--daylight-max-path-len arg

Maximum considered atom path length in number of bonds (default: 5).

--daylight-inc-H arg

Whether or not to include hydrogen atoms (default: false).

--pharm-2d-size arg

Size of the generated fingerprint (default: 8191).

--pharm-2d-min-tuple-size arg

Minimum feature tuple size (default: 1).

--pharm-2d-max-tuple-size arg

Maximum feature tuple size (default: 3).

--pharm-2d-bin-size arg

Feature distance bin size (default: 2.0, must be > 0).

--pharm-3d-size arg

Size of the generated fingerprint (default: 8191).

--pharm-3d-min-tuple-size arg

Minimum feature tuple size (default: 1).

--pharm-3d-max-tuple-size arg

Maximum feature tuple size (default: 3).

--pharm-3d-bin-size arg

Feature distance bin size (default: 3.0, must be > 0).

--tversky-weight-a arg

Weight factor of the query molecule fingerprint exclusive bits (default: 1.0).

--tversky-weight-b arg

Weight factor of the database molecule fingerprint exclusive bits (default: 0.0).
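
The effect of the two Tversky weights can be sketched with the standard Tversky index for binary fingerprints, where the weights scale the bits found only in the query and only in the database molecule, respectively. That simscreen uses exactly this formula is an assumption; the function name and the toy fingerprints are illustrative.

```python
def tversky_sim(query_bits: set, db_bits: set,
                weight_a: float = 1.0, weight_b: float = 0.0) -> float:
    """Standard Tversky index on sets of "on" bit positions."""
    common = len(query_bits & db_bits)
    only_query = len(query_bits - db_bits)  # weighted by weight_a
    only_db = len(db_bits - query_bits)     # weighted by weight_b
    denom = common + weight_a * only_query + weight_b * only_db
    return common / denom if denom else 0.0

# Toy fingerprints: 3 common bits, 1 query-only bit, 2 database-only bits.
query_bits = {1, 4, 7, 9}
db_bits = {1, 4, 8, 9, 12}

# Documented defaults (a = 1.0, b = 0.0): database-only bits are ignored,
# so the score is the fraction of query bits found in the database molecule.
print(tversky_sim(query_bits, db_bits))            # 3 / (3 + 1)

# With a = b = 1.0 the formula reduces to the Tanimoto similarity.
print(tversky_sim(query_bits, db_bits, 1.0, 1.0))  # 3 / (3 + 1 + 2)
```

Under this reading, the default weights favour database molecules that contain the query's features regardless of what else they contain, which behaves like a substructure-style search.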