SSP Group Meeting
Tuesday, February 12th, 11am-12pm
Division of Informatics, 80 South Bridge, Room F13

Rough Set-based Feature Selection for Text Categorisation

Alexios Chouchoulas

The volume of electronically stored information grows exponentially as the state of the art progresses, so automated Information Filtering (IF), Information Retrieval (IR) and Text Categorisation (TC) systems are rapidly gaining prominence. However, many such systems sacrifice efficiency to boost effectiveness, and typically have to cope with sets of vectors of many tens of thousands of dimensions. Rough Set (RS) theory can be applied to reduce the dimensionality of the data used in IF/IR tasks, by providing a measure of the information content of datasets with respect to a given classification. This can aid IF/IR systems that rely on the acquisition of large numbers of term weights or other measures of relevance.
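
As a rough illustration of the idea in the abstract above, and not the speaker's actual implementation, the sketch below computes the standard rough set dependency degree (the fraction of objects whose feature values determine their class unambiguously) and uses it in a simple greedy forward selection. The function names, the toy documents and the binary term-presence encoding are all assumptions made for this example.

```python
from collections import defaultdict


def dependency(rows, features, decision):
    """Rough set dependency degree of `decision` on `features`.

    Rows that agree on every feature form an equivalence class; a class
    belongs to the positive region if all its rows share one decision value.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    consistent = sum(
        len(labels) for labels in classes.values() if len(set(labels)) == 1
    )
    return consistent / len(rows)


def greedy_reduct(rows, features, decision):
    """Greedily add the feature that most improves the dependency degree
    until it matches that of the full feature set (a heuristic, so the
    result is not guaranteed to be a minimal subset)."""
    target = dependency(rows, features, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        remaining = [f for f in features if f not in chosen]
        best = max(
            remaining,
            key=lambda f: dependency(rows, chosen + [f], decision),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy corpus: 1 means the term occurs in the document.
docs = [
    {"goal": 1, "match": 1, "stock": 0, "label": "sport"},
    {"goal": 1, "match": 0, "stock": 0, "label": "sport"},
    {"goal": 0, "match": 0, "stock": 1, "label": "finance"},
    {"goal": 0, "match": 1, "stock": 1, "label": "finance"},
]
terms = ["goal", "match", "stock"]

print(dependency(docs, ["match"], "label"))
print(greedy_reduct(docs, terms, "label"))
```

In this toy corpus the term "match" alone says nothing about the category, while "goal" alone separates the two categories, so the selection stops after one term. Real term vectors would first need discretising, since this sketch compares feature values for exact equality.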

This talk discusses the applicability of RS theory to the IF/IR/TC application domains, presents a few case studies of systems incorporating RS-based feature selection, and addresses some of the experimental work being carried out in the Approximative and Qualitative Reasoning (AQR) group.