SMILES
The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
The term SMILES refers to a line notation for encoding molecular structures and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. The terms Canonical and Isomeric can lead to some confusion when applied to SMILES. The terms describe different attributes of SMILES strings and are not mutually exclusive.
Typically, a number of equally valid SMILES can be written for a molecule. For example, CCO, OCC and C(O)C all specify the structure of ethanol. Algorithms have been developed to ensure the same SMILES is generated for a molecule regardless of the order of atoms in the structure. This SMILES is unique for each structure, although dependent on the canonicalisation algorithm used to generate it, and is termed the Canonical SMILES. These algorithms first convert the SMILES to an internal representation of the molecular structure and do not simply manipulate strings as is sometimes thought. Various algorithms for generating Canonical SMILES have been developed, including those by Daylight Chemical Information Systems, OpenEye Scientific Software, MEDIT and Chemical Computing Group. A common application of Canonical SMILES is indexing and ensuring uniqueness of molecules in a database.
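To make the idea of a canonical SMILES concrete, here is a minimal brute-force sketch. It is not the Daylight, OpenEye, MEDIT or Chemical Computing Group algorithm. It assumes a tiny input subset (acyclic, single-bonded molecules written with organic-subset atoms and parentheses only), and all function names are hypothetical. It parses the string into a molecular graph first, as the paragraph above describes, then picks the shortest and then alphabetically smallest of every possible writing, so any input order of the same molecule yields the same output.

```python
from itertools import permutations, product

# Organic-subset symbols this toy parser accepts (assumption: no brackets, rings or bond symbols).
TWO_LETTER = ("Cl", "Br")
ONE_LETTER = set("BCNOPSFI")


def parse_tree(smiles):
    """Parse an acyclic, single-bonded SMILES into atom symbols and adjacency lists."""
    atoms, adj = [], []
    prev, stack, i = None, [], 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)  # remember the atom the branch hangs from
            i += 1
        elif ch == ")":
            prev = stack.pop()  # return to the branch point
            i += 1
        else:
            sym = smiles[i:i + 2] if smiles[i:i + 2] in TWO_LETTER else ch
            if sym not in TWO_LETTER and sym not in ONE_LETTER:
                raise ValueError(f"unsupported token {sym!r} in {smiles!r}")
            idx = len(atoms)
            atoms.append(sym)
            adj.append([])
            if prev is not None:  # bond to the previous atom on the current chain
                adj[prev].append(idx)
                adj[idx].append(prev)
            prev = idx
            i += len(sym)
    return atoms, adj


def writings(atoms, adj, atom, parent):
    """Every SMILES string for the subtree rooted at `atom`, entered from `parent`."""
    children = [n for n in adj[atom] if n != parent]
    if not children:
        return [atoms[atom]]
    results = []
    for order in permutations(children):
        options = [writings(atoms, adj, c, atom) for c in order]
        for combo in product(*options):
            # All but the last child become parenthesised branches.
            branches = "".join(f"({w})" for w in combo[:-1])
            results.append(atoms[atom] + branches + combo[-1])
    return results


def canonical(smiles):
    """Shortest, then lexicographically smallest, writing over every root and branch order."""
    atoms, adj = parse_tree(smiles)
    candidates = [w for root in range(len(atoms)) for w in writings(atoms, adj, root, None)]
    return min(candidates, key=lambda s: (len(s), s))


print(canonical("CCO"), canonical("OCC"), canonical("C(O)C"))  # CCO CCO CCO
```

Exhaustive enumeration grows factorially with branching, so it only suits toy molecules; production canonicalisers instead rank atoms using graph invariants and then write the string in that fixed order.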
EXAMPLES:
TERMS AND MEANINGS:
TERM | MEANING |
Rings | To write a cyclic (ring) structure, you "break" one of the ring bonds and write the structure as a chain, placing the same digit after each of the two atoms joined by the broken bond. Thus the SMILES for cyclohexane is C1CCCCC1. If a given atom is part of more than one ring, so that more than one bond has to be broken, a different digit is used for each broken bond to show which atoms are re-joined. By convention, atoms in aromatic rings are written in lowercase. Thus the SMILES for benzene is c1ccccc1 and that for pyridine is n1ccccc1. |
Charges and positions of atoms | Charge signs (+ and -) and digits giving the multiple of a charge or the position of an atom are the adjectives (and sometimes the adverbs) of SMILES grammar. An ionic valence is a classic application: for example, [Fe+2] is the ferrous or iron(II) ion, and charged atoms like this are written inside square brackets. Note that SMILES neither requires nor uses superscripts or subscripts. Numbers are not used to multiply atoms, except hydrogen atoms inside square brackets (as in [NH4+]); instead, the atomic symbol is repeated as many times as the atom appears. |
Bonds | Bonds are the verbs of the SMILES grammar. Single, double, triple and aromatic bonds are written as -, = , # and : respectively. To simplify things further, the - and : symbols may be omitted between adjacent atoms joined by single or aromatic bonds; this is one reason aromatic atoms are written in lowercase rather than UPPERCASE. Thus the SMILES for diatomic oxygen is O=O; that for carbon dioxide is O=C=O; for diatomic nitrogen, N#N; for hydrogen cyanide, C#N; for acetylene (ethyne), C#C; and for diazene, N=N (hydrazine, whose nitrogens share a single bond, is NN). |
Branches | Branches are the subordinating conjunctions of the SMILES grammar. A structure that branches from the main chain is enclosed in parentheses, and branches may be nested and stacked. A substituent such as a chlorine atom can also be written as a branch. Thus chloromethane (formerly called "methyl chloride") may be written C(Cl), although the equivalent CCl is shorter, and tetrachloromethane ("carbon tetrachloride") may be written C(Cl)(Cl)(Cl)(Cl) or ClC(Cl)(Cl)Cl. |
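The ring-closure rule in the Rings row can be checked with a small parser. This is a minimal sketch under stated assumptions: it handles only organic-subset atoms (upper- or lowercase), parentheses and single-digit ring labels, ignores bond symbols, and the names `ring_graph` and `ring_count` are hypothetical. Each digit is remembered on first sight and turned back into a bond on second sight, after which the digit is free to be reused.

```python
# Organic-subset atoms; lowercase forms mark aromatic ring atoms (assumed input subset).
ORGANIC = {"B", "C", "N", "O", "P", "S", "F", "I", "Cl", "Br",
           "b", "c", "n", "o", "p", "s"}


def ring_graph(smiles):
    """Return (atoms, bonds) with ring-closure digits turned back into bonds."""
    atoms, bonds = [], set()
    prev, stack, open_rings, i = None, [], {}, 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)
            i += 1
        elif ch == ")":
            prev = stack.pop()
            i += 1
        elif ch.isdigit():
            if ch in open_rings:
                # Second occurrence: re-join the bond that was "broken".
                bonds.add(frozenset((open_rings.pop(ch), prev)))
            else:
                # First occurrence: remember which atom the digit follows.
                open_rings[ch] = prev
            i += 1
        else:
            sym = smiles[i:i + 2] if smiles[i:i + 2] in ("Cl", "Br") else ch
            if sym not in ORGANIC:
                raise ValueError(f"unsupported token {sym!r} in {smiles!r}")
            idx = len(atoms)
            atoms.append(sym)
            if prev is not None:
                bonds.add(frozenset((prev, idx)))
            prev = idx
            i += len(sym)
    if open_rings:
        raise ValueError(f"unclosed ring label(s) {sorted(open_rings)} in {smiles!r}")
    return atoms, bonds


def ring_count(smiles):
    """Number of rings in a connected molecule: bonds - atoms + 1."""
    atoms, bonds = ring_graph(smiles)
    return len(bonds) - len(atoms) + 1


for s in ("C1CCCCC1", "c1ccccc1", "n1ccccc1", "C1CCC2CCCCC2C1"):
    print(s, ring_count(s))
```

The last example, decalin, needs two digits because two ring bonds are open at the same time; a string like C1CC1C1CC1 (bicyclopropyl) reuses the digit 1 once the first ring has closed.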
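The bracket notation in the Charges row can be read with a regular expression. This sketch assumes bracket atoms made only of an element symbol, an optional hydrogen count and an optional charge; isotopes, chirality and atom classes are left out, and the name `parse_bracket_atom` is hypothetical. It accepts both +2 and ++ as ways of writing a double charge.

```python
import re

# Symbol, optional H count, optional charge (+n, -n, or repeated signs); a simplified subset.
BRACKET_ATOM = re.compile(r"^\[([A-Z][a-z]?)(?:H(\d*))?(\+\d+|-\d+|\++|-+)?\]$")


def parse_bracket_atom(token):
    """Return (symbol, hydrogen_count, charge) for a simple bracket atom such as '[Fe+2]'."""
    m = BRACKET_ATOM.match(token)
    if m is None:
        raise ValueError(f"unsupported bracket atom {token!r}")
    symbol, h_text, charge_text = m.groups()
    # 'H' alone means one hydrogen; 'H4' means four; no H means none.
    if h_text is None:
        hydrogens = 0
    elif h_text == "":
        hydrogens = 1
    else:
        hydrogens = int(h_text)
    if not charge_text:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        digits = charge_text[1:]
        # '+2' carries an explicit magnitude; '++' counts the repeated signs.
        charge = sign * int(digits) if digits.isdigit() else sign * len(charge_text)
    return symbol, hydrogens, charge


print(parse_bracket_atom("[Fe+2]"))  # ('Fe', 0, 2)
print(parse_bracket_atom("[NH4+]"))  # ('N', 4, 1)
```

The hydrogen count is the one place a number multiplies an atom, matching the exception noted in the Charges row.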
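The bond symbols in the Bonds row also determine how many implicit hydrogens each atom carries. The sketch below assumes unbranched chains of C, N and O only, a fixed valence of 4, 3 and 2 respectively (a simplification of the SMILES implicit-hydrogen rule), and no aromatic bonds; `chain_bonds` and `implicit_hydrogens` are hypothetical names. It shows why N=N and NN describe different molecules.

```python
# Bond symbols from the Bonds row; an omitted symbol between two atoms means a single bond.
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Assumed valences for the implicit-hydrogen rule (lowest normal valence only).
VALENCE = {"C": 4, "N": 3, "O": 2}


def chain_bonds(smiles):
    """Split an unbranched chain such as 'O=C=O' into atoms and (i, j, order) bonds."""
    atoms, bonds, pending = [], [], 1
    for ch in smiles:
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch in VALENCE:
            atoms.append(ch)
            if len(atoms) > 1:
                bonds.append((len(atoms) - 2, len(atoms) - 1, pending))
            pending = 1  # reset to the default single bond
        else:
            raise ValueError(f"unsupported character {ch!r} in {smiles!r}")
    return atoms, bonds


def implicit_hydrogens(smiles):
    """Total hydrogens needed to bring every atom up to its assumed valence."""
    atoms, bonds = chain_bonds(smiles)
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return sum(VALENCE[sym] - u for sym, u in zip(atoms, used))


for s in ("O=C=O", "C#N", "C#C", "N=N", "NN"):
    print(s, implicit_hydrogens(s))
```

Diazene (N=N) comes out with two hydrogens and hydrazine (NN) with four, consistent with HN=NH and H2N-NH2.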
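The parenthesis rule in the Branches row amounts to a stack: an opening parenthesis remembers the current atom, and a closing one returns to it. This minimal sketch assumes organic-subset atoms and parentheses only, and `branch_degrees` is a hypothetical name. It reports how many neighbours each atom has, which shows that C(Cl) and CCl describe the same molecule and that nested branches attach where expected.

```python
def branch_degrees(smiles):
    """Number of heavy-atom neighbours for each atom; supports Cl/Br and nested branches."""
    degrees, prev, stack, i = [], None, [], 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)  # branch opens from the current atom
            i += 1
        elif ch == ")":
            prev = stack.pop()  # branch closes; resume at the branch point
            i += 1
        elif ch.isalpha():
            width = 2 if smiles[i:i + 2] in ("Cl", "Br") else 1
            degrees.append(0)
            idx = len(degrees) - 1
            if prev is not None:  # bond the new atom to the current chain atom
                degrees[prev] += 1
                degrees[idx] += 1
            prev = idx
            i += width
        else:
            raise ValueError(f"unsupported character {ch!r} in {smiles!r}")
    return degrees


print(branch_degrees("C(Cl)(Cl)(Cl)(Cl)"))  # [4, 1, 1, 1, 1]
print(branch_degrees("CC(C(C)C)C"))         # [1, 3, 3, 1, 1, 1]
```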