Introduction
MolShaCS is a C++ program for comparing small molecules by shape and polarity. It uses a Gaussian description of molecular shape and a Gaussian description of the charge distribution across the molecule, both to guide an overlay optimization and to compute similarity indexes.
MolShaCS computes Hodgkin-like similarity indexes. A typical computation writes a log file, a “dat” text file with the similarity indexes and, optionally, a mol2 file with the aligned molecules.
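For reference, the classical Hodgkin index between two molecules A and B is shown below. Each molecule is described by a density ρ built from atom-centred spherical Gaussians (weights p_i, exponents α_i, centres R_i; this notation is introduced here for illustration). The closed-form overlap of two such Gaussians is also given. This is the textbook form, offered as a sketch. The exponents and weights MolShaCS uses, and how it combines the shape and charge terms, are not specified here and may differ.

$$
H_{AB} = \frac{2\int \rho_A(\mathbf{r})\,\rho_B(\mathbf{r})\,d\mathbf{r}}{\int \rho_A^2(\mathbf{r})\,d\mathbf{r} + \int \rho_B^2(\mathbf{r})\,d\mathbf{r}},
\qquad
\rho_X(\mathbf{r}) = \sum_{i \in X} p_i\, e^{-\alpha_i \lvert \mathbf{r} - \mathbf{R}_i \rvert^2}
$$

$$
\int p_i e^{-\alpha_i \lvert \mathbf{r}-\mathbf{R}_i \rvert^2}\, p_j e^{-\alpha_j \lvert \mathbf{r}-\mathbf{R}_j \rvert^2}\, d\mathbf{r}
= p_i p_j \left(\frac{\pi}{\alpha_i+\alpha_j}\right)^{3/2}
\exp\!\left(-\frac{\alpha_i \alpha_j}{\alpha_i+\alpha_j}\,\lvert \mathbf{R}_i-\mathbf{R}_j \rvert^2\right)
$$

Each integral in H_AB therefore reduces to a double sum of this pairwise term over atoms. H_AB equals 1 when the two densities are identical and tends to 0 as they stop overlapping.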
MolShaCS can be used through a Qt graphical user interface or through batch scripts. Details on how to use the program are given in the wiki pages.
MolShaCS does not handle molecular flexibility: all molecules are kept rigid during the overlay and the similarity computation. This is an important limitation of the program.
Usage
MolShaCS can be compiled and used with a Qt graphical user interface or (my preferred method) from the command line. From the command line, type:
[important]
MolShaCS Molshacs.inp
[/important]
where Molshacs.inp is a text file with the following syntax:
refmol_mol2 mol1.mol2
output_prefix MolShaCS
minimizer nlopt_mma
align_molecules yes
timeout 60
write_coordinates yes
mol2_aa no
box_size 30.0 30.0 30.0
multimol molecules.list
step 1.0E-5
tol 1.0E-4
delta 1.0E-5
The file molecules.list (the file given to the multimol keyword) must also exist in the directory where MolShaCS is running. It is a plain-text file containing only the paths to the molecules to be compared, one per line. For example:
[important] $ more molecules.list
../mol2/mol1.mol2
../mol2/mol2.mol2
../mol2/mol3.mol2
../mol2/mol4.mol2
... [/important]
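Before launching a long run, it can help to confirm that every path in molecules.list points to an existing file. The bash function below is a hypothetical helper, not part of MolShaCS. It assumes one path per line, skips blank lines, and tolerates Windows line endings.

```shell
#!/usr/bin/env bash
# check_list (hypothetical helper): report entries of a molecules.list file
# that do not point to an existing file. Returns non-zero if any are missing.
check_list() {
    local list="${1:-molecules.list}" missing=0 path
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r path || [ -n "$path" ]; do
        path="${path%$'\r'}"            # strip a Windows (CRLF) carriage return
        if [ -z "$path" ]; then
            continue                    # ignore blank lines
        fi
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    [ "$missing" -eq 0 ]
}

# Usage, from the directory where MolShaCS will run:
# check_list molecules.list && echo "all listed mol2 files found"
```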
Downloads
We currently provide Windows and Linux builds, with and without the Qt GUI. The source code is also available.
Version | GUI | No GUI |
Windows | Download | Download |
Linux (x64) | Download | Download |
Source Code | Download | Download |
Sample Run
As an example of how to use MolShaCS, we will compare aldosterone to a set of FDA-approved drugs and look at the molecules at the top of the ranking. First of all, let’s get the molecules:
- Let’s go to ZINC and download the DrugBank list of approved drugs, with 1761 representative molecules. ZINC provides the molecules as MOL2 files through download scripts for Linux or Windows.
- Supposing you already have the mol2 files named ‘fda80.1.mol2’, ‘fda80.2.mol2’, ‘fda80.3.mol2’, etc., in a separate folder named ‘mol2’ (the command below refers to it as ../mol2, i.e., a folder next to the working directory), let’s generate a molecules.list file. If you use Linux, or Cygwin on Windows, this is easy (see below). The loop checks each expected file name in ../mol2 and adds it to the list only if the file exists.
[important] $ for i in $(seq 1 1761); do if [ -e ../mol2/fda80.$i.mol2 ]; then echo ../mol2/fda80.$i.mol2; fi; done > molecules.list [/important]
- Let’s get aldosterone from ZINC as well, using this link, and save it as aldo.mol2 (the name used in the input file).
- Now, let’s prepare an input file for MolShaCS with the following instructions and save it as ‘input.inp’:
refmol_mol2 aldo.mol2
output_prefix MolShaCS
minimizer nlopt_mma_sog
align_molecules yes
timeout 60
write_coordinates yes
write_coord_threshold 0.85
mol2_aa no
box_size 30.0
multimol molecules.list
step 1.0E-5
tol 1.0E-4
delta 1.0E-5
- Ok, we are ready to start the computation with the command MolShaCS input.inp. Note that the file vdw.param, distributed together with MolShaCS, must be in the folder where the computation takes place. Alternatively, set the environment variable MOLSHACS_DIR to the folder where the file is located.
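A quick pre-flight check for vdw.param can save a failed run. The bash function below is a hypothetical helper, not part of MolShaCS. It looks in the current directory first and then in $MOLSHACS_DIR. That search order is an assumption made for illustration and is not taken from the MolShaCS source. The install path in the usage line is also hypothetical.

```shell
#!/usr/bin/env bash
# find_vdw_param (hypothetical helper): print where vdw.param was found.
# Assumed search order: current directory, then $MOLSHACS_DIR.
find_vdw_param() {
    if [ -f ./vdw.param ]; then
        echo "./vdw.param"
    elif [ -n "${MOLSHACS_DIR:-}" ] && [ -f "$MOLSHACS_DIR/vdw.param" ]; then
        echo "$MOLSHACS_DIR/vdw.param"
    else
        echo "vdw.param not found: copy it here or set MOLSHACS_DIR" >&2
        return 1
    fi
}

# Usage (/opt/molshacs is a hypothetical install folder):
# export MOLSHACS_DIR=/opt/molshacs
# find_vdw_param && MolShaCS input.inp
```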
- After a couple of minutes the calculation finishes and two files are written: MolShaCS.log and MolShaCS.cc.dat. The latter contains the similarities computed for each of the provided molecules. We can rank the results with another bash command (below) to find the top-scoring molecules:
[important] $ sort -n -r -k 5 MolShaCS.cc.dat | more [/important]
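To keep only the best N hits, the same ranking can be cut with head. The bash function below is a hypothetical helper that sorts on column 5, the column used in the ranking command. It uses sort -g (general numeric, available in GNU and BSD sort) instead of -n, in case scores are written in exponent notation such as 8.5E-01. Which quantity column 5 holds is assumed from the command above, not checked against the file format.

```shell
#!/usr/bin/env bash
# top_hits (hypothetical helper): print the N highest-scoring lines of a
# MolShaCS .dat file, ranked on column 5 in descending order.
top_hits() {
    local dat="$1" n="${2:-10}"
    # -g parses values like 8.5E-01 correctly; -k 5,5 limits the key to column 5
    sort -g -r -k 5,5 "$dat" | head -n "$n"
}

# Usage:
# top_hits MolShaCS.cc.dat 20 > top20.dat
```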