Similarity and Categorisation Inextricably Bound Together: The UNSCRAMBLE Machine Learning Algorithm

Emilios Cambouropoulos
(Faculty of Music, University of Edinburgh)
Alan Smaill
(Department of Artificial Intelligence, University of Edinburgh)

Abstract

In this paper, a working formal definition of similarity is given according to which similarity a) is contextually defined, b) may be applied to any property ascribed to an entity (not only to perceptual properties such as visual appearance) and c) has an associated notion of corresponding categories. This definition inextricably binds together similarity and categorisation, so that changes in similarity ratings between entities result in category changes, and vice versa. In line with this definition, the Unscramble algorithm is presented which, given a set of objects and an initial set of properties, generates a range of plausible classifications for a given context. During this dynamically evolving process the initial set of properties is adjusted so that a 'satisfactory' description is generated (taking into account general cognitive principles such as economy and informativeness). There is no need to determine the number of classes in advance, nor is there a need to reach a strictly well-formed (e.g. non-overlapping) description. At every stage of the process both the extension and the intension of the emerging categories are explicitly defined. One example is presented that illustrates the capabilities and efficiency of the model. As the proposed algorithm has been developed mainly to accommodate problems of musical similarity and categorisation, it still needs to be tested on data sets from other domains and compared with other relevant machine learning algorithms.
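As an informal illustration of how a similarity criterion and a set of categories can determine one another, the following Python sketch may be useful. It is not the Unscramble algorithm: the abstract does not specify its details, so everything here is an assumption made for illustration. Objects are described by sets of properties, two objects count as similar when they share at least `threshold` properties, and categories are taken to be the maximal groups of mutually similar objects, which may overlap. The function names (`similar_pairs`, `maximal_cliques`, `classify`), the threshold parameter, the clique-based grouping and the toy property names are all hypothetical. The adjustment of the initial property set mentioned in the abstract is not modelled.

```python
from itertools import combinations


def similar_pairs(objects, threshold):
    """Pairs of objects sharing at least `threshold` properties (assumed criterion)."""
    return {
        frozenset((x, y))
        for x, y in combinations(sorted(objects), 2)
        if len(objects[x] & objects[y]) >= threshold
    }


def maximal_cliques(nodes, edges):
    """Enumerate maximal groups of mutually similar nodes (basic Bron-Kerbosch)."""
    adj = {n: {m for m in nodes if m != n and frozenset((n, m)) in edges}
           for n in nodes}
    cliques = []

    def expand(r, p, x):
        # r: current group, p: candidates to add, x: already explored nodes
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return cliques


def classify(objects, threshold):
    """Return (extension, intension) pairs for one similarity threshold.

    Extension: the members of a category.
    Intension: the properties shared by every member.
    """
    edges = similar_pairs(objects, threshold)
    result = []
    for clique in maximal_cliques(set(objects), edges):
        intension = set.intersection(*(objects[o] for o in clique))
        result.append((sorted(clique), sorted(intension)))
    return sorted(result)


# Hypothetical toy data: four objects described by three properties each.
objects = {
    "a": {"rising", "short", "loud"},
    "b": {"rising", "short", "soft"},
    "c": {"falling", "long", "soft"},
    "d": {"falling", "long", "loud"},
}

# Sweeping the threshold yields a range of classifications, some overlapping.
for t in (1, 2, 3):
    print(t, classify(objects, t))
```

In this sketch, changing the threshold changes which objects count as similar and therefore which categories emerge, and each category carries both its members and their shared properties. Stricter thresholds give smaller, more specific categories, while looser ones allow overlapping categories.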