Structural Keys

Brian T. Luke, Ph.D.


As the name implies, a structural key is a string of values that describes the chemical composition and/or structural motifs present in the chosen substructure and in each molecule in the database. One form of this key is a boolean array, which is usually stored as an array of bits. A given bit is set to 1 (True) if a particular structural feature is present and 0 (False) if it is not. The i-th bit of this array can be assigned to represent any chosen structural feature of the molecule. Such features can include, but are not limited to:

  • At least n occurrences of a particular element, or a particular atom-type (e.g. a tetravalent carbon that is bound to three hydrogens and one nonhydrogen atom).
  • The presence of a particular functional group, such as an amide linkage or a carboxylic acid group.
  • The presence of a given structural element such as a substituted cyclohexane.

If the bit array of the substructure contains a 1 in a position where the bit array of a database molecule has a 0, then a required feature is missing from that molecule, and the search can move on to the next database entry.
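The screening test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the feature names, the molecule labels, and the helper functions make_key and may_contain are hypothetical, and each bit array is stored as a Python integer for convenience.

```python
# Hypothetical feature list: position i in the list is bit i of the key.
FEATURES = [
    "has_amide",
    "has_carboxylic_acid",
    "has_substituted_cyclohexane",
    "at_least_2_nitrogens",
]


def make_key(present):
    """Pack a set of feature names into an integer used as a bit array."""
    key = 0
    for i, name in enumerate(FEATURES):
        if name in present:
            key |= 1 << i
    return key


def may_contain(query_key, molecule_key):
    """Return False if the query has a 1 where the molecule has a 0."""
    return (query_key & ~molecule_key) == 0


# Made-up example data: the query substructure requires an amide linkage.
query = make_key({"has_amide"})
database = {
    "mol_a": make_key({"has_amide", "at_least_2_nitrogens"}),
    "mol_b": make_key({"has_carboxylic_acid"}),
}

# Keep only the molecules that pass the screen.
candidates = [name for name, key in database.items() if may_contain(query, key)]
```

Here only mol_a survives. Passing the screen shows that no required feature is missing; it does not by itself show that the substructure is present, so a surviving molecule would normally still go through the full substructure comparison.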

One important point to emphasize in the use of a structural key is that the meaning of each array element must be defined in advance. This has the disadvantage that the key can become extremely long and is inflexible. On the other hand, the structural key can be optimized for the class of compounds present in the database. For example, if none of the database compounds contains a transition metal, transition metals do not need to be considered when developing the structural key.
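One possible way to tailor a key to a database, sketched below under stated assumptions, is to find the bit positions that are set in at least one database molecule and drop all the others. The functions used_bits and compact and the example keys are hypothetical, and integers again stand in for bit arrays.

```python
def used_bits(molecule_keys):
    """OR all keys together: a bit is 1 if any molecule has that feature."""
    mask = 0
    for key in molecule_keys:
        mask |= key
    return mask


def compact(key, mask):
    """Drop the positions that are 0 in mask and pack the rest together."""
    out = 0
    pos = 0  # next free position in the shortened key
    i = 0    # position being examined in the original key
    while mask >> i:
        if (mask >> i) & 1:
            out |= ((key >> i) & 1) << pos
            pos += 1
        i += 1
    return out


# Made-up four-bit keys; bits 1 and 3 are never set in this database.
keys = [0b0101, 0b0100]
mask = used_bits(keys)
short_keys = [compact(k, mask) for k in keys]
```

In this sketch the four-bit keys shrink to two bits. A query that requires one of the dropped features, that is, one for which (query_key & ~mask) is nonzero, cannot match any molecule in this database and can be rejected before any key is compared.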
