Italian Content Annotation Bank (I-CAB)

I-CAB is an annotated corpus consisting of 525 news stories taken from the local newspaper "L'Adige", for a total of around 180,000 words.

It is annotated with semantic information at several levels.

The annotation of I-CAB is being carried out in collaboration with CELCT. So far we have completed the first two levels of annotation, i.e. temporal expressions and entities.
As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we have followed a policy of reusing existing markup languages. In particular, we have adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. Since the ACE-LDC guidelines were originally developed for English, part of the effort consisted of adapting them to the specific morpho-syntactic features of Italian.

I-CAB is accessible through the I-CAB Web Browser, a dedicated web interface. A version of the Ontotext portal for I-CAB is also available.

I-CAB is freely available for research purposes upon acceptance of a license agreement. It has been used in a task at EVALITA.


Last modified: Tue Feb 2 2010