Extracting information from tables embedded in unstructured
data sources requires working through a large set of division and
aggregation decisions at the character, cell, column, line and finally
table level. These decisions should lead to a table that is:
- graphically coherent, i.e. the result is a grid-like representation
  of the sort that is intrinsic to the very definition of a table; and
- ontologically coherent, i.e. the resulting table presents
  information that has the characteristics that its context
  prescribes for it.
If either aspect of coherence is not satisfied, then a mistake was
made somewhere in the decision process, and the task becomes locating
it and trying alternative arrangements that lead to a more coherent
overall result. In this talk we will present the table understanding
problem and describe different strategies for representing
uncertainty across an intricate set of interdependent decisions, in a
way that is theoretically sound but also computationally scalable.
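
As a rough illustration of the idea of trying alternative arrangements and keeping the most coherent one, the sketch below scores candidate line-to-cell divisions by how grid-like the result is. It is not the method presented in the talk: the function names (`grid_coherence`, `split_line`, `best_segmentation`), the delimiter-based splitting, and the scoring rule are all assumptions made for this example, and it covers graphic coherence only, not ontological coherence.

```python
from collections import Counter


def grid_coherence(rows):
    """Hypothetical score: fraction of rows whose cell count equals the
    most common cell count. A single-column result scores 0.0, because a
    division that never splits anything is not a useful grid."""
    if not rows:
        return 0.0
    width, freq = Counter(len(row) for row in rows).most_common(1)[0]
    if width < 2:
        return 0.0
    return freq / len(rows)


def split_line(line, delimiter):
    """One candidate division decision: cut a line into cells at a delimiter."""
    return [cell.strip() for cell in line.split(delimiter)]


def best_segmentation(lines, delimiters):
    """Try every candidate delimiter and keep the most grid-like result.

    Returns a (score, delimiter, rows) tuple. Ties go to the delimiter
    listed first. Raises ValueError if `delimiters` is empty.
    """
    candidates = []
    for delimiter in delimiters:
        rows = [split_line(line, delimiter) for line in lines]
        candidates.append((grid_coherence(rows), delimiter, rows))
    return max(candidates, key=lambda candidate: candidate[0])


if __name__ == "__main__":
    lines = [
        "name; age; city",
        "Ana; 31; Lisbon",
        "Bo; 27; Oslo, Norway",
    ]
    score, delimiter, rows = best_segmentation(lines, [",", ";"])
    print(score, repr(delimiter))
    for row in rows:
        print(row)
```

Here the comma is rejected because it splits only one line, while the semicolon yields three cells on every line. A fuller treatment would keep several scored alternatives at each level instead of committing to a single best choice.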