Python code - Molecular structure from .sdf file
Python code – Fetching molecular structures from an .sdf file with RDKit
Written by,
Dr. M Kanagasabapathy
Asst. Professor
Department of Chemistry
Rajapalayam Rajus’ College
Madurai Kamaraj University
Rajapalayam (TN) INDIA
SDF stands for structure-data file; the .sdf format is based on the MDL .mol (Molfile) format.
An .sdf file can hold many molecular structures in a single file, whereas a .mol file holds a single molecule. In an .sdf file, the 2D or 3D structures of the individual molecules are separated by a line containing $$$$ (four dollar signs), and the file is stored as plain ASCII text.
.sdf files are commonly used by chemical suppliers to distribute their compound catalogs.
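Because each record ends with a $$$$ line, an .sdf file can be cut into per-molecule records with plain Python before any chemistry toolkit is involved. The following is only a minimal sketch: the function name split_sdf_records and the two tiny records are made up for illustration, and their structure blocks are abbreviated to a title line.

```python
def split_sdf_records(text):
    """Split SDF text into one string per molecule, using the $$$$ delimiter line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # The delimiter closes the current record
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a final delimiter, if it has any content
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


# Two hypothetical records; real records also carry atom and bond blocks
sample_text = "mol-A\n> <MW>\n18.02\n\n$$$$\nmol-B\n> <MW>\n44.01\n\n$$$$\n"
records = split_sdf_records(sample_text)
print(len(records))                # 2
print(records[1].splitlines()[0])  # mol-B
```

For real work a toolkit parser is the safer choice, since it also validates the atom and bond blocks; this sketch only shows how the delimiter organises the file.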
Sample .sdf file
F0244-0040
-MTS- 05272009262D 0 0.00000 0.00000 0
40 45 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.3070 -0.7190 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6140 0.0480 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-3.9210 -0.6950 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-5.2280 0.0480 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-3.9330 -2.1820 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6260 -2.9620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3190 -2.2060 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-0.0120 -2.9620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2710 -2.2060 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2590 -0.6710 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.5420 0.0720 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.5540 -2.9380 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.5420 -4.4240 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2350 -5.2160 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.0480 -4.4600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6380 -4.4480 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-5.2400 -2.9380 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.7510 -2.9260 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-7.5300 -4.1850 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.0410 -4.1610 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.7600 -2.8540 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.0170 -1.5830 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-7.5060 -1.5950 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.7510 -0.3120 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-7.5060 1.0190 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.0170 1.0310 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.7480 -0.2760 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.7840 -5.4440 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.1010 -6.7270 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-7.5900 -6.7620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-6.7980 -5.4920 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6260 1.5590 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3430 2.3140 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.0360 1.5830 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.2470 2.3140 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3310 3.8010 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.6380 4.5680 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.9450 3.8250 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.9330 2.2900 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 2 0 0 0 0
4 6 1 0 0 0 0
6 7 1 0 0 0 0
7 8 1 0 0 0 0
8 2 1 0 0 0 0
8 9 1 0 0 0 0
9 10 2 0 0 0 0
10 11 1 0 0 0 0
11 12 1 0 0 0 0
10 13 1 0 0 0 0
13 14 2 0 0 0 0
14 15 1 0 0 0 0
15 16 2 0 0 0 0
16 9 1 0 0 0 0
7 17 2 0 0 0 0
6 18 2 0 0 0 0
18 19 1 0 0 0 0
19 20 2 0 0 0 0
20 21 1 0 0 0 0
21 22 2 0 0 0 0
22 23 1 0 0 0 0
23 24 2 0 0 0 0
24 19 1 0 0 0 0
24 25 1 0 0 0 0
25 26 2 0 0 0 0
26 27 1 0 0 0 0
27 28 2 0 0 0 0
28 23 1 0 0 0 0
21 29 1 0 0 0 0
29 30 2 0 0 0 0
30 31 1 0 0 0 0
31 32 2 0 0 0 0
32 20 1 0 0 0 0
3 33 1 0 0 0 0
33 34 2 0 0 0 0
34 35 1 0 0 0 0
35 36 1 0 0 0 0
34 37 1 0 0 0 0
37 38 2 0 0 0 0
38 39 1 0 0 0 0
39 40 2 0 0 0 0
40 33 1 0 0 0 0
M END
> <IDNUMBER> (F0244-0040)
F0244-0040
> <Chemical_Name> (F0244-0040)
5-[(anthracen-9-yl)methylidene]-1,3-bis(2-methoxyphenyl)-1,3-diazinane-2,4,6-trione
> <CAS> (F0244-0040)
313267-75-3
> <PriceCoeff> (F0244-0040)
1.00
> <MW> (F0244-0040)
528.55
> <Description> (F0244-0040)
solid
> <FSP3> (F0244-0040)
0.06
> <clogP> (F0244-0040)
7.3400
> <TPSA> (F0244-0040)
76.15
> <H-acceptors> (F0244-0040)
5
> <H-donors> (F0244-0040)
0
> <RotBonds> (F0244-0040)
5
> <HAC> (F0244-0040)
40
> <fromRegid> (F0244-0040)
CHEMBL1339581
> <Similarity> (F0244-0040)
0.8960
> <Target_Name> (F0244-0040)
Fructose-bisphosphate aldolase
> <Standard_Type> (F0244-0040)
IC50
> <Standard_Value> (F0244-0040)
2232.0000
> <Standard_Units> (F0244-0040)
nM
> <Target_Organism> (F0244-0040)
Mycobacterium tuberculosis
> <Target_Type> (F0244-0040)
SINGLE PROTEIN
> <Ro5_Compliant> (F0244-0040)
2
$$$$
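The lines after M END in the sample record are data items: a header line such as > <MW> (F0244-0040), followed by the value. A rough way to collect them into a dictionary with plain Python is sketched below. The function name parse_data_items is hypothetical, the excerpt is a shortened copy of the sample record, and the sketch assumes that no value line itself begins with ">".

```python
import re

# Matches a data header such as "> <MW> (F0244-0040)" and captures the field name
HEADER = re.compile(r"^>.*?<([^>]+)>")


def parse_data_items(record):
    """Return {field name: value} for the data items of one SDF record."""
    fields, name, value_lines = {}, None, []
    for line in record.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header closes the previous data item, if there was one
            if name is not None:
                fields[name] = "\n".join(value_lines).strip()
            name, value_lines = match.group(1), []
        elif line.strip() == "$$$$":
            break  # end of the record
        elif name is not None:
            value_lines.append(line)
    if name is not None:
        fields[name] = "\n".join(value_lines).strip()
    return fields


# Shortened excerpt of the data items from the sample record
excerpt = """M  END
> <IDNUMBER> (F0244-0040)
F0244-0040

> <MW> (F0244-0040)
528.55

> <Standard_Units> (F0244-0040)
nM

$$$$
"""
fields = parse_data_items(excerpt)
print(fields["MW"])  # 528.55
```

All values come back as strings, so numeric fields such as MW need an explicit float() conversion before they are used in calculations.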
Python code to save the structure of one molecule from the .sdf file as a .png image with RDKit.
# Code
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw

# Read every molecule from the file, skipping records RDKit could not parse
with Chem.SDMolSupplier('sample.sdf') as supplier:
    molecules = [mol for mol in supplier if mol is not None]

# Pick the third molecule (index 2), compute its 2D layout and save it as a PNG
structure = molecules[2]
AllChem.Compute2DCoords(structure)
Draw.MolToFile(structure, '1.png')
Output as .png
The sample.zip file contains a single sample.sdf file that holds multiple molecules.
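Since the code above reads sample.sdf directly, the archive has to be unpacked first. One possible way to do this with Python's standard zipfile module is sketched here. The helper name extract_sdf is hypothetical, and the demonstration builds a throwaway archive that merely stands in for the real sample.zip.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_sdf(zip_path, out_dir):
    """Extract every .sdf member of the archive into out_dir and return the paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".sdf")]
        for name in names:
            archive.extract(name, out_dir)
    return [out_dir / name for name in names]


# Demonstration with a throwaway archive standing in for sample.zip
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "sample.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("sample.sdf", "demo\n$$$$\n")

extracted = extract_sdf(demo_zip, work_dir)
print(extracted[0].name)  # sample.sdf
```

The returned path can then be converted with str() and passed to Chem.SDMolSupplier in place of the 'sample.sdf' literal.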