Italian Content Annotation Bank (I-CAB)
I-CAB is an annotated corpus consisting of 525 news stories taken from the local newspaper
"L'Adige", for a total of around 180,000 words.
It is annotated with semantic information at different levels:
- temporal expressions
- entities (i.e. persons, organizations, locations, and geo-political entities)
- relations between entities (e.g. the affiliation relation connecting a person to an organization)
The annotation of I-CAB is being carried out in collaboration with CELCT. So far we have
completed the first two levels of annotation, i.e. temporal expressions and entities.
As we intend I-CAB to become a
benchmark for various automatic Information Extraction tasks, we have
followed a policy of reusing already available markup languages. In
particular, we have adopted the annotation schemes developed for the ACE Entity
Detection and Time Expressions Recognition and Normalization tasks.
As the ACE-LDC guidelines were originally developed for English, part of the
effort consisted of adapting them to the specific morpho-syntactic
features of Italian.
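To make the three annotation levels concrete, the following is a minimal Python sketch of how temporal expressions, entities, and relations could be held in memory. It is an illustration only and does not reproduce the actual I-CAB file format: the class names, the `span_of` helper, the short type tags (`PER`, `ORG`, `LOC`, `GPE`), the `"affiliation"` label, and the example sentence are all assumptions introduced here, and the sentence is invented rather than taken from the corpus.

```python
from dataclasses import dataclass

# The four entity categories listed in the corpus description.
# The short tags are illustrative labels, not taken from the I-CAB files.
ENTITY_TYPES = {"PER", "ORG", "LOC", "GPE"}


@dataclass
class Span:
    start: int  # character offset of the first character (inclusive)
    end: int    # character offset just past the last character (exclusive)
    text: str   # surface string covered by the span


@dataclass
class Entity(Span):
    type: str = "PER"

    def __post_init__(self):
        # Reject labels outside the four categories above.
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type}")


@dataclass
class TemporalExpression(Span):
    value: str = ""  # normalized value, here an ISO 8601 calendar date


@dataclass
class Relation:
    type: str     # e.g. "affiliation" (hypothetical label)
    arg1: Entity
    arg2: Entity


def span_of(text, phrase):
    """Return (start, end, phrase) for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return start, start + len(phrase), phrase


# Invented example sentence, not taken from the corpus.
text = "Mario Rossi lavora all'Università di Trento dal 2 febbraio 2010."

person = Entity(*span_of(text, "Mario Rossi"), type="PER")
org = Entity(*span_of(text, "Università di Trento"), type="ORG")
date = TemporalExpression(*span_of(text, "2 febbraio 2010"), value="2010-02-02")
affiliation = Relation("affiliation", person, org)
```

Keeping character offsets alongside the surface string makes it easy to check an annotation against the source text, and the relation level simply points at two entity objects from the level below it.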
I-CAB is accessible through the I-CAB Web Browser, a dedicated web interface. A version of the
Ontotext portal for I-CAB is also available.
I-CAB (freely available for research purposes upon acceptance of a license agreement)
has been used in the following task at EVALITA:
Last modified: Tue Feb 2 2010