The University of Edinburgh
Division of Informatics
Forrest Hill & 80 South Bridge

MSc Thesis #9820

Title: An Evaluation of the Merge-based Approach to Stochastic Context-Free Grammar Learning
Date: 1998
Abstract: Stolcke shows that the model merging approach to grammar learning achieves satisfactory results when dealing with simplified natural language data. The major aim of this project is to evaluate the suitability of this approach, as realized by the Bayesian Object-Oriented Grammar Induction and Estimation system, against a real English corpus, the Air Travel Information System-3 transcript. Although the grammar induction system we used is not efficient enough to carry out the experiments in the optimal way, the results of our experiments still show that, where real natural language is concerned, the merge-based approach suffers seriously from the sparse data problem. This project also develops a procedure for corpus manipulation and several criteria for assessing the grammars induced by a grammar learning system. These criteria are important because they can be reused in other projects that evaluate grammar induction systems.

