Conformer tools

Conformer ensembles can be generated and handled with the conformer tools in Mᴏʀғᴇᴜs. Generation of new conformers from SMILES strings relies on interfaces to the RDKit and OpenBabel.

OpenBabel and RDKit are avaible for install via, e.g., the conda-forge:

conda install -c conda-forge rdkit
conda install -c conda-forge openbabel

Module

Below is an example where the RDKit is used to generate conformers for n-butane.

Example
>>> ce = ConformerEnsemble.from_rdkit("CCCC", optimize="MMFF94")
>>> ce
ConformerEnsemble(50 conformers)
>>> ce.prune_rmsd()
ConformerEnsemble(3 conformers)
>>> ce.sort()
>>> ce.get_relative_energies()
array([0.        , 0.78220907, 0.78220907])
>>> ce.write_xyz("conformers.xyz")

Here, conformers are first generated with the RDKit and optimized with the MMFF94 force field. Then a pruning based on RMSD brings the number of conformers down to 3, which are are sorted based on energy. For full information, use help(ConformerEnsemble) or consult the API: ConformerEnsemble.

Conformer generation

There are three different methods for generating conformers with Mᴏʀғᴇᴜs. The available arguments are listed for the standalone functions.

The conformers from OpenBabel are generally pruned with respect to RSMD while those from RDKit require post-generation pruning. Here is an example of using OpenBabel.

Conformers from OpenBabel
>>> ce = ConformerEnsemble.from_ob_ga("CCCC")
>>> ce
ConformerEnsemble(3 conformers)

CREST

Output from the CREST program can be parsed. Mᴏʀғᴇᴜs does not currently have any functionality for running CREST. The following files must be present in the output folder: crest_conformers.xyz, cre_members and crest.eneriges.

Conformer ensemble from CREST output
>>> ce = ce = ConformerEnsemble.from_crest("crest_output_folder")
>>> ce
ConformerEnsemble(8 conformers)
>>> ce.sort()
>>> ce.get_relative_energies()
array([0.   , 0.468, 0.829, 0.832, 0.834])
>>> ce.get_degeneracies()
array([ 4, 27,  8,  2,  1])
>>> ce.boltzmann_weights()
array([0.45642055, 0.20716615, 0.11264294, 0.11207402, 0.11169634])
Erel/kcal        Etot weight/tot  conformer     set   degen     origin
1    0.000    -2.40317    0.05283    0.21084       1       4     mtd3
2    0.000    -2.40317    0.05281                                mtd4
3    0.002    -2.40317    0.05262                                mtd3
4    0.003    -2.40316    0.05258                                mtd4
5    0.468    -2.40242    0.02400    0.64605       2      27     mtd2
6    0.468    -2.40242    0.02399                                mtd4
7    0.468    -2.40242    0.02398                                mtd6
8    0.468    -2.40242    0.02398                                mtd1
9    0.469    -2.40242    0.02398                                mtd1
10   0.469    -2.40242    0.02398                                mtd1
11   0.469    -2.40242    0.02397                                mtd2
12   0.469    -2.40242    0.02397                                mtd1
13   0.469    -2.40242    0.02397                                mtd3
14   0.469    -2.40242    0.02396                                mtd1
15   0.469    -2.40242    0.02395                                mtd1
16   0.469    -2.40242    0.02395                                mtd4
17   0.469    -2.40242    0.02394                                mtd4
18   0.469    -2.40242    0.02394                                mtd5
19   0.469    -2.40242    0.02394                                mtd3
20   0.470    -2.40242    0.02393                                mtd1
21   0.470    -2.40242    0.02393                                mtd1
22   0.470    -2.40242    0.02391                                mtd3
23   0.470    -2.40242    0.02391                                mtd5
24   0.470    -2.40242    0.02390                                mtd5
25   0.471    -2.40242    0.02389                                mtd2
26   0.471    -2.40242    0.02388                                mtd1
27   0.471    -2.40242    0.02387                                mtd3
28   0.472    -2.40242    0.02386                                mtd3
29   0.472    -2.40242    0.02385                                mtd5
30   0.473    -2.40242    0.02381                                mtd3
31   0.473    -2.40242    0.02381                                mtd6
32   0.829    -2.40185    0.01306    0.10429       3       8     mtd3
33   0.829    -2.40185    0.01305                                mtd2
34   0.829    -2.40185    0.01305                                mtd6
35   0.829    -2.40185    0.01305                                mtd2
36   0.829    -2.40185    0.01305                                mtd2
37   0.830    -2.40185    0.01304                                mtd5
38   0.832    -2.40184    0.01300                                mtd5
39   0.832    -2.40184    0.01299                                mtd5
40   0.832    -2.40184    0.01299    0.02588       4       2     mtd3
41   0.837    -2.40184    0.01289                                mtd3
42   0.834    -2.40184    0.01295    0.01295       5       1     mtd5

Boltzmann-weighted properties

A convenient way to calculate properties for the ensemble is to iterate over the ConformerEnsemble object:

Boltzmann-weighted SASA
>>> ce = ConformerEnsemble.from_rdkit("CCCO", optimize="MMFF94")
>>> ce.prune_rmsd()
>>> ce.sort()
ConformerEnsemble(3 conformers)
>>> for conformer in ce:
>>> ... sasa = SASA(ce.elements, conformer.coordinates)
>>> ... conformer.properties["sasa"] = sasa.area
>>> ce.get_properties()
{'sasa': array([221.26889622, 217.38905484, 216.53891818])}
>>> ce.boltzmann_weights()
array([0.56040173, 0.28260296, 0.15699531])
>>> ce.boltzmann_statistic("sasa")
219.4298571958332

The default of the function ConformerEnsemble.boltzmann_statistic is to calculate the Boltzmann average at 298.15 K, but this can be changed with temperature=<float> and statistic=<str>, where “var” or “std” are available. The temperature derivative of the Boltzmann average can also be calculated with the method ConformerEnsemble.boltzmann_average_dT

RMSD pruning

Conformers are usually pruned on root mean square deviation in terms of (heavy) atom coordinates to remove redundant structures which correspond to essentially the same conformation. In Mᴏʀғᴇᴜs, this is achieved with the ConformerEnsemble.prune_rmsd method. By default, the AllChem.AlignMolConformers function from RDKit is used to calculate the RMSD, but this can be changed with the keyword argument method=<str>. The following options are available. spyrmsd needs to be installed for that option to work.

method

symmetry

include_hs

obrms-batch

Always

Never

obrms-iter

Always

Never

openbabel

Optional

Optional

rdkit

Never

Never

spyrmsd

Optional

Optional

The distinguishing factors are whether symmetry and non-heavy atoms are considered when calculating the RMSD. For the method="openbabel" and method="spyrmsd", the keyword arguments symmetry=<bool>``and ``include_hs=<bool> are used to control the behavior. For the rest of the methods, these arguments will be ignored. Pruning out conformers that are the same by symmetry can lower the computational cost, but might also lead to errors in Boltzmann weighting as degeneracy is not taken into account.

Warning

include_hs and symmetry are ignored unless method is “openbabel” or “spyrmsd”.

Energy pruning

Conformers are often pruned based on energy. The Boltzman weight for conformers above 3 kcal/mol are expected to contribute in a neglible fashion to the properties at room temperature. Therefore, the default of the ConformerEnsemble.prune_energy is to prune out all conformers above this energy. This can be changed with the keyword argument threshold=<float>.

Energy pruning
>>> ce = ConformerEnsemble.from_rdkit("C1CCCCC1", optimize="MMFF94")
>>> ce.prune_rmsd(method="obrms-batch")
>>> ce.sort()
>>> ce.get_relative_energies()
array([0.       , 5.9297458])
>>> ce.prune_energy()
>>> ce.get_relative_energies()
array([0.])

Optimization and ranking

Mᴏʀғᴇᴜs has an interface to QCEngine that allows calculation of single-point energies and geometry optimizations for conformers. The program, model, keywords and local_options keyword arguments are all passed on to QCEngine and more information can be found in the documentation. Here is an example, optimizing a conformer ensemble with GFN-FF and doing single points with GFN2-xTB.

Optimization and ranking
>>> ce = ConformerEnsemble.from_rdkit("CCCC", optimize="MMFF94")
>>> ce.prune_rmsd()
>>> ce.sort()
>>> ce.get_relative_energies()
array([0.        , 0.78220907, 0.78220907])
# Optimize with GFN-FF
>>> model={"method": "GFN-FF"}
>>> ce.optimize_qc_engine(program="xtb", model=model, procedure="geometric")
>>> ce.get_relative_energies()
array([0.        , 0.45271867, 0.45271867])
# Do single points with GFN2-xTB
>>> model={"method": "GFN2-xTB"}
>>> ce.sp_qc_engine(program="xtb", model=model)
>>> ce.get_relative_energies()
array([0.        , 0.63087431, 0.6308743 ])

Note

Optimization of many molecules with many conformers through the QCEngine interface is not efficient. The CREST program is recommended in these cases.

Warning

Optimization requires the installation of either PyBerny or geomeTRIC. They can be installed with pip or in the case of geomeTRIC, also with conda.

pip install pyberny
pip install geometric
conda install -c conda-forge geometric

Enantiomeric conformers

Mᴏʀғᴇᴜs can handle degeneracy stemming from enatiomeric conformations to some extent. This can be exemplified fro diethyl ether, which has three types of conformers tt (1), tg (4) and gg (2), where t stands for trans, g for gauche and the number in parenthesis is the total degeneracy of that type. [1]

Degeneracies
>>> ce = ConformerEnsemble.from_rdkit("CCOCC", optimize="MMFF94")
>>> ce
ConformerEnsemble(50 conformers)
>>> ce.prune_rmsd()
ConformerEnsemble(7 conformers)
>>> ce.add_inverted()  # Invert conformers to add potentially missing enantiomers
ConformerEnsemble(14 conformers)
>>> ce.prune_rmsd()  # Prune out all duplicates added in previous step
ConformerEnsemble(7 conformers)
>>> ce.condense_enantiomeric()  # Condense enantiomers to single conformer
ConformerEnsemble(4 conformers)
>>> ce.sort()
>>> ce.get_degeneracies()
array([1, 2, 2, 2])
>>> ce.get_relative_energies()
array([0.        , 1.51922335, 1.51922335, 3.03207887])

We can see that Mᴏʀғᴇᴜs has indeed obtained three different types of conformers (as seen by the energies) with degeneracies 1, 4 (2 + 2) and 2.

Enantiomers

Another situation is when there are enantiomers not due to conformation but to configuration. In these cases, the methods above are not safe to use. If the stereochemistry is given in the SMILES, RDKit will generate only that enantiomer. However, if the sterochemistry is not given, both enantiomers will be generated. Mᴏʀғᴇᴜs can prune out one of the enantiomers in this case to save computational time if there is no interest in the actual stereochemistry. Here is an example for alanine.

Generating specific enantiomer
>>> ce = ConformerEnsemble.from_rdkit("C[C@@H](C(=O)O)N", optimize="MMFF94")
>>> ce.prune_rmsd()
>>> ce.sort()
>>> ce.get_cip_labels()
[('', 'S', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'S', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'S', '', '', '', '', '', '', '', '', '', '', '')]
Generating both enantiomers
>>> ce = ConformerEnsemble.from_rdkit("CC(N)C(O)=O", optimize="MMFF94")
>>> ce.prune_rmsd()
>>> ce.sort()
>>> ce.get_cip_labels()
[('', 'R', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'S', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'S', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'R', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'S', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'R', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'S', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'R', '', '', '', '', '', '', '', '', '', '', '')]
>>> ce.prune_enantiomers()
>>> ce.get_cip_labels()
[('', 'R', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'R', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'R', '', '', '', '', '', '', '', '', '', '', ''),
 ('', 'R', '', '', '', '', '', '', '', '', '', '', '')]

For more information, see the ConformerEnsemble.prune_enantiomers method.

Command line script

References