EntityPro
EntityPro is a system for the recognition of Italian Named Entities based on Support Vector Machines.
EntityPro has been built using YamCha, an open source text chunker that can be easily adapted to other NLP tasks. YamCha allows for handling both static and dynamic features, and for defining a number of parameters such as window-size and parsing-direction (forward/backward).
For each running word, EntityPro extracts a rich set of static linguistic features
within a window of one word on either side (i.e. for the current, previous and following word):
- the word itself, both unchanged and lower-case
- its Part of Speech, as produced by TagPro
- prefixes and suffixes (1, 2, 3, or 4 characters at the start/end of the word)
- orthographic information (e.g. capitalization and hyphenation)
- collocation bigrams (36,000 bigrams from Italian newspapers ranked by Mutual Information value)
- gazetteers of proper nouns
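The static features listed above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not EntityPro's actual code: the function names, the feature names, the `_` and `<PAD>` placeholders and the POS labels in the usage note are hypothetical, and the collocation bigram and gazetteer lookups are left out because they depend on resources not reproduced here.

```python
from typing import Dict, List, Tuple


def token_features(word: str) -> Dict[str, str]:
    """Static features of a single word (illustrative feature names)."""
    feats = {"word": word, "lower": word.lower()}
    for n in range(1, 5):
        # Prefixes and suffixes of 1 to 4 characters; "_" is an assumed
        # placeholder for words shorter than n characters.
        feats[f"pre{n}"] = word[:n] if len(word) >= n else "_"
        feats[f"suf{n}"] = word[-n:] if len(word) >= n else "_"
    # Two simple orthographic cues: capitalization and hyphenation.
    feats["is_cap"] = "Y" if word[:1].isupper() else "N"
    feats["has_hyphen"] = "Y" if "-" in word else "N"
    return feats


def window_features(tokens: List[Tuple[str, str]], i: int) -> Dict[str, str]:
    """Features for position i over the previous, current and following word.

    `tokens` is a list of (word, part_of_speech) pairs; the POS tags are
    assumed to come from an external tagger.
    """
    feats: Dict[str, str] = {}
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word, pos = tokens[j]
            local = token_features(word)
            local["pos"] = pos
        else:
            # Assumed padding symbol outside the sentence boundaries.
            local = {"word": "<PAD>"}
        for name, value in local.items():
            feats[f"{offset:+d}:{name}"] = value
    return feats
```

For example, with the hypothetical input `[("Il", "RS"), ("sindaco", "SS"), ("di", "E"), ("Trento", "SPN")]`, calling `window_features(sentence, 3)` describes "Trento" together with its left neighbour "di" and a padded right neighbour.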
The dynamic features, whose values are determined only during tagging, are the tags assigned to the three
words preceding the current word.
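A minimal sketch of how such dynamic features can be fed into a forward (left-to-right) tagging pass is given below. The names `tag_sequence`, `toy_classify` and `<BOS>`, as well as the B-/I-/O label scheme, are assumptions made for illustration; the toy rule merely stands in for the trained SVM classifier and is not EntityPro's model.

```python
from typing import Callable, Dict, List


def tag_sequence(static_feats: List[Dict[str, str]],
                 classify: Callable[[Dict[str, str]], str],
                 history: int = 3) -> List[str]:
    """Tag a sentence left to right, exposing the previous `history` tags."""
    tags: List[str] = []
    for i, feats in enumerate(static_feats):
        feats = dict(feats)  # copy so the caller's dictionaries stay unchanged
        for k in range(1, history + 1):
            # Dynamic feature: the tag already predicted k words back,
            # or an assumed "<BOS>" marker before the sentence start.
            feats[f"tag-{k}"] = tags[i - k] if i - k >= 0 else "<BOS>"
        tags.append(classify(feats))
    return tags


def toy_classify(feats: Dict[str, str]) -> str:
    """Hypothetical stand-in for a trained classifier."""
    if feats["is_cap"] == "Y":
        # Continue an entity if the previous word was tagged as part of one.
        return "I-PER" if feats["tag-1"] in ("B-PER", "I-PER") else "B-PER"
    return "O"


words = ["ieri", "Romano", "Prodi", "ha", "parlato"]
static = [{"word": w, "is_cap": "Y" if w[:1].isupper() else "N"} for w in words]
predicted_tags = tag_sequence(static, toy_classify)
```

Because each decision depends on earlier predictions, the tags must be produced sequentially in the chosen parsing direction; a backward pass would instead expose the tags of the following words.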
Fig.: EntityPro's architecture
EntityPro participated in the Named Entity Recognition (NER) Task at EVALITA 2007,
which consisted of recognizing four types of Named Entities: Geo-Political, Location, Organization and Person Entities.
With an overall F1 measure of 82.14 (evaluation based on exact match), it obtained the best score.
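To make the exact-match criterion concrete, the following sketch computes precision, recall and F1 over entity spans, counting a predicted entity as correct only when both its boundaries and its type coincide with a gold entity. It is an assumed, simplified formulation and not the official EVALITA scorer; the `(start, end, type)` representation, the type abbreviations and the example data are made up for illustration.

```python
from typing import Set, Tuple

# An entity is (first token index, last token index, entity type).
Entity = Tuple[int, int, str]


def exact_match_prf(gold: Set[Entity],
                    predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where only identical spans and types count."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: one type error and one spurious entity.
gold = {(0, 1, "PER"), (4, 4, "GPE"), (7, 8, "ORG")}
predicted = {(0, 1, "PER"), (4, 4, "LOC"), (7, 8, "ORG"), (10, 10, "PER")}
precision, recall, f1 = exact_match_prf(gold, predicted)
```

In this invented example two of the four predicted entities match, giving a precision of 2/4 and a recall of 2/3; the entity with the right span but the wrong type earns no partial credit.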
EntityPro is part of TextPro, a suite of modular NLP tools
developed at FBK-irst.
Maintainer: bentivofbk.eu
Last modified: Tue Aug 28 2007