MolShaCS is a C++ program for comparing small molecules by shape and polarity. It uses Gaussian descriptions of the molecular shape and of the charge distribution over the molecule to guide an overlay optimization and to compute similarity indexes.

Hodgkin-like similarity indexes are computed for each compared molecule. A typical computation outputs a log file, a “dat” file with the similarity indexes and, optionally, a mol2 file with the aligned molecules.
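For readers unfamiliar with the index, the standard Hodgkin form and the closed-form overlap of spherical Gaussians are written out below. This is a sketch of the textbook definitions only: the symbols p_i (amplitude), alpha_i (width) and R_i (atom centre), and the pairwise atom-centred form, are assumptions made for illustration. The parameters MolShaCS actually uses, and how it combines the shape and charge terms, are not stated on this page.

```latex
\begin{align}
  \rho_A(\mathbf{r}) &= \sum_{i \in A} p_i \exp\!\left(-\alpha_i \lvert \mathbf{r} - \mathbf{R}_i \rvert^2\right) \\
  Z_{AB} &= \int \rho_A(\mathbf{r})\,\rho_B(\mathbf{r})\,d\mathbf{r}
          = \sum_{i \in A} \sum_{j \in B} p_i\,p_j
            \left(\frac{\pi}{\alpha_i + \alpha_j}\right)^{3/2}
            \exp\!\left(-\frac{\alpha_i \alpha_j}{\alpha_i + \alpha_j}\,
            \lvert \mathbf{R}_i - \mathbf{R}_j \rvert^2\right) \\
  H_{AB} &= \frac{2\,Z_{AB}}{Z_{AA} + Z_{BB}}
\end{align}
```

With these definitions H_AB equals 1 when the two densities coincide. For strictly positive densities (shape) it lies between 0 and 1, while for signed densities (charge) it can range from -1 to 1.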

MolShaCS can be used through a Qt graphical user interface or through batch scripts. The details on how to use the program are given in the wiki pages.

MolShaCS does not deal with flexible molecules. All molecules are kept rigid during the overlay and molecular similarity computation. This is an important limitation of the program.


MolShaCS can be compiled and run with a Qt graphical user interface or (my preferred method) from the command line. For the command line, type:


MolShaCS Molshacs.inp



where Molshacs.inp is a text file with the following syntax:
refmol_mol2         mol1.mol2
output_prefix       MolShaCS
minimizer           nlopt_mma
align_molecules     yes
timeout             60
write_coordinates   yes
mol2_aa             no
box_size            30.0 30.0 30.0
multimol            molecules.list
step                1.0E-5
tol                 1.0E-4
delta               1.0E-5

A file named molecules.list must also exist in the directory where MolShaCS runs. It must be a plain text file containing only the paths of the molecules to be compared, one per line. For example:

[important] $ more molecules.list

... [/important]
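Since a run depends on several files being in place, a small pre-flight check can save a failed computation. The function below is a minimal sketch and is not part of MolShaCS: the name check_input is hypothetical, and it assumes only what the input syntax above shows, namely that each line is a keyword followed by a value and that refmol_mol2 and multimol name files.

```shell
# Pre-flight check for a MolShaCS input file (sketch; check_input is a hypothetical helper).
check_input() {
    inp=$1
    status=0
    # Each input line is "keyword value"; extract the two keywords that name files.
    ref=$(awk '$1 == "refmol_mol2" { print $2 }' "$inp")
    list=$(awk '$1 == "multimol" { print $2 }' "$inp")
    for f in "$ref" "$list"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f"
            status=1
        fi
    done
    # Every non-empty line of the list should be the path of an existing molecule file.
    if [ -f "$list" ]; then
        while IFS= read -r mol; do
            [ -z "$mol" ] && continue
            if [ ! -f "$mol" ]; then
                echo "missing molecule: $mol"
                status=1
            fi
        done < "$list"
    fi
    return $status
}
```

Calling `check_input Molshacs.inp` prints nothing and returns 0 when everything is in place. Otherwise it lists what is missing and returns 1.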


We currently provide Windows and Linux builds, with and without the Qt GUI. The source code is also provided.


Version        GUI        No GUI
Windows        Download   Download
Linux (x64)    Download   Download
Source Code    Download   Download


Sample Run

As an example of how to use MolShaCS, we will compare aldosterone to a set of FDA-approved molecules and look at the molecules at the top of the list. First of all, let’s get the molecules:

  • Let’s go to ZINC and download the DrugBank list of approved drugs, with 1761 representative molecules. The molecules are provided as MOL2 files through a download script for Linux or Windows.
  • Supposing you already have the mol2 files, named ‘fda80.1.mol2’, ‘fda80.2.mol2’, ‘fda80.3.mol2’, etc., in a separate folder named ‘mol2’, let’s generate a molecules.list file. If you use Linux, or Cygwin on Windows, this should be easy (see below). The command looks for each file in the folder mol2 and adds it to the list if it exists.
[important] $ for i in `seq 1 1761`; do if [ -e ../mol2/fda80.$i.mol2 ]; then echo ../mol2/fda80.$i.mol2 >> molecules.list ; fi; done [/important]
  • Let’s get aldosterone from ZINC again, using this link.
  • Now, let’s prepare an input file for MolShaCS. This file should have the following instructions (see below). Save the file as ‘input.inp’.
refmol_mol2           aldo.mol2
output_prefix         MolShaCS
minimizer             nlopt_mma_sog
align_molecules       yes
timeout               60
write_coordinates     yes
write_coord_threshold 0.85
mol2_aa               no
box_size              30.0
multimol              molecules.list
step                  1.0E-5
tol                   1.0E-4
delta                 1.0E-5
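One caveat about the list-building loop above: it appends with `>>`, so running it twice duplicates every entry in molecules.list. A variant that truncates the list first and uses a glob instead of a fixed count might look like the sketch below. The function name make_list is hypothetical, and the fda80.<n>.mol2 naming is taken from the example above.

```shell
# Rebuild a molecule list from the MOL2 files actually present (sketch).
# Assumes the naming used in this example: fda80.<n>.mol2 inside the given folder.
make_list() {
    dir=$1
    out=$2
    : > "$out"                      # truncate first, so re-running does not duplicate entries
    for f in "$dir"/fda80.*.mol2; do
        [ -e "$f" ] || continue     # skip the unexpanded pattern when nothing matches
        echo "$f" >> "$out"
    done
    wc -l < "$out"                  # report how many molecules were listed
}

# Example call for the layout described above:
#   make_list ../mol2 molecules.list
```

The glob lists files in lexical rather than numeric order (fda80.10.mol2 before fda80.2.mol2). This should not matter here, because the results are ranked by similarity afterwards.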
  1. OK, we are ready to start the computation with the command $ MolShaCS input.inp. Note that the file vdw.param, distributed together with MolShaCS, must be in the folder where the computation takes place; alternatively, set the environment variable MOLSHACS_DIR to the folder where the file is located.
  2. After a couple of minutes the calculation is done and two files are written: MolShaCS.log and the “dat” file. The latter holds the similarities computed for each of the provided molecules. We can rank the results with another bash command (below) to find the top-scoring molecules:
[important] $ more | sort -n -r -k 5 | more [/important]
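The two numbered steps can be wrapped in small helper functions, sketched below under stated assumptions. Both names (find_vdw_param, top_hits) are hypothetical. The order in which the two locations of vdw.param are checked is an assumption, since the text only says either location works. The results file is passed as an argument, and the ranking column (5) is taken from the sort command above.

```shell
# Locate vdw.param: current folder first, then $MOLSHACS_DIR (the order is an assumption).
find_vdw_param() {
    if [ -f ./vdw.param ]; then
        echo ./vdw.param
    elif [ -n "${MOLSHACS_DIR:-}" ] && [ -f "$MOLSHACS_DIR/vdw.param" ]; then
        echo "$MOLSHACS_DIR/vdw.param"
    else
        echo "vdw.param not found; copy it here or set MOLSHACS_DIR" >&2
        return 1
    fi
}

# Print the n best rows of a results file, ranked by column 5 (numeric, descending).
top_hits() {
    results=$1
    n=${2:-10}                      # default to the ten best rows
    sort -n -r -k 5,5 "$results" | head -n "$n"
}
```

A typical session would call `find_vdw_param` before starting MolShaCS, then `top_hits <results file> 20` once the run has finished. Note that `sort -n` does not parse exponent notation such as 1.0E-5, so if the indexes are ever written that way, GNU sort's `-g` option is the safer choice.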