Extracting information from tables embedded in unstructured data
sources requires making a long series of division and aggregation
decisions at the character, cell, column, line and, finally, table
level. These decisions should lead to a table that is:
- graphically coherent, i.e. the result is a grid-like representation of the kind that is intrinsic to the very definition of a table;
- and ontologically coherent, i.e. the resulting table presents information that has the characteristics its context prescribes for it.
If either of these two aspects of coherence fails to hold, then a mistake was made somewhere in the decision process, and it becomes necessary to locate it and try alternative arrangements that lead to a more coherent overall result. In this talk we will present the table understanding problem and describe different strategies for representing uncertainty across an intricate set of interdependent decisions, in a way that is theoretically sound but also computationally scalable.