SSP Group Meeting
Wednesday, March 1st, 11am-12pm
Division of Informatics, 80 South Bridge, Room E17a

Pattern Description of E. coli Promoters in Grammars: A Case Study in the Computational Linguistics of DNA

Siu-Wai Leung

DNA encodes genetic information in a sequence of nucleotides. Since the structure of DNA was elucidated in the early 1950s, DNA has been thought of as a kind of language, although very few DNA sequences were available for linguistic study until recent years. A huge number of DNA sequences have now been determined as a result of the Human Genome Project and the development of automated DNA sequencing methods. It is foreseeable that the analysis of DNA as a language, rather than the automation of DNA sequencing, will become a major rate-limiting step towards understanding genes. Formal research in DNA linguistics has mainly used Definite Clause Grammars (DCGs) in Prolog to represent the secondary structure (shape) of DNA and models of genetic networks in cells. We would like to have grammatical representations of the DNA sequence patterns related to gene expression. We used E. coli promoters, which are among the most studied DNA sequences in gene expression, as a test model to determine whether representing such sequences places any special requirements on computational linguistics.
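The abstract does not give the grammar itself, so the following is only a minimal sketch of what "a grammatical representation of a promoter" can look like. It assumes the textbook sigma-70 consensus elements (a -35 box TTGACA and a -10 box TATAAT) and an assumed spacer of 15 to 19 nucleotides between them, and it requires exact matches, which real promoters rarely satisfy. Instead of Prolog, it uses Python generators to imitate DCG-style parsing with backtracking: each rule yields every position where a parse of that rule can end. All names (`literal`, `spacer`, `seq_rule`, `promoter`, `find_promoters`) are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterator

# A rule takes (sequence, start position) and yields every possible end position.
Rule = Callable[[str, int], Iterator[int]]

# Assumed simplified sigma-70 consensus elements (exact matching only).
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
SPACER_MIN, SPACER_MAX = 15, 19  # assumed allowable spacer lengths
BASES = set("ACGT")


def literal(word: str) -> Rule:
    """Terminal rule: match `word` exactly at position i."""
    def parse(seq: str, i: int) -> Iterator[int]:
        if seq.startswith(word, i):
            yield i + len(word)
    return parse


def spacer(seq: str, i: int) -> Iterator[int]:
    """Nonterminal: any run of SPACER_MIN..SPACER_MAX nucleotides."""
    for n in range(SPACER_MIN, SPACER_MAX + 1):
        j = i + n
        if j <= len(seq) and set(seq[i:j]) <= BASES:
            yield j


def seq_rule(*rules: Rule) -> Rule:
    """Concatenation, like the body `a, b, c` of a DCG rule, with backtracking."""
    def parse(seq: str, i: int) -> Iterator[int]:
        def go(k: int, pos: int) -> Iterator[int]:
            if k == len(rules):
                yield pos
                return
            for nxt in rules[k](seq, pos):
                yield from go(k + 1, nxt)
        yield from go(0, i)
    return parse


# promoter --> minus35, spacer, minus10.
promoter = seq_rule(literal(MINUS35), spacer, literal(MINUS10))


def find_promoters(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of every parse of `promoter` in seq."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for end in promoter(seq, start):
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    example = "GC" + MINUS35 + "A" * 17 + MINUS10 + "GG"
    print(find_promoters(example))  # [(2, 31)]
```

Writing each rule as a generator of end positions keeps the grammar declarative, much as a DCG does, while still letting the spacer rule try several lengths. A more realistic model would replace the exact-match `literal` rules with scored or mismatch-tolerant matching of the boxes.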