OntoTDT is an unsupervised topic detection system. A topic is defined as a seminal event or activity, along with all directly related events and activities (for more details, see Topic Detection and Tracking 2004). A topic is expressed as a chronologically ordered list of "stories". A story is "on topic" whenever it discusses events and activities that are directly connected to that topic's seminal event. The goal of a topic detection system is to group together stories that discuss the same event. In the graphic at the bottom, the red circles represent stories that discuss one event, and the green diamonds represent stories that discuss another event.
For the purposes of the OntoText Project we developed OntoTDT, an unsupervised topic detection system based on incremental clustering in a Latent Semantic Space. The Latent Semantic Space is generated by analyzing the content of the full document collection, and both terms and texts are represented in it. Incremental clustering is then broken down into two phases: detecting when a story discusses a new event, and assigning stories that discuss previously seen events to the appropriate clusters. Since texts are represented in the Latent Semantic Space, both steps are approached by thresholding a similarity measure in this space. A new cluster is created if none of the existing clusters' centroids is sufficiently similar to the incoming text; otherwise the document is assigned to the cluster with the closest centroid. Among the functionalities of OntoTDT we highlight the following:
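The two-phase incremental clustering described above can be illustrated with a minimal sketch. It is not the OntoTDT implementation: the function names, the use of a truncated SVD to build the latent space, cosine similarity as the similarity measure, and running-mean centroids are all assumptions made for illustration, since the text only specifies "a similarity measure" and a threshold.

```python
import numpy as np

def build_latent_space(term_doc, k):
    """Project documents into a k-dimensional latent space.

    Assumption: the space is obtained by truncated SVD of a
    term-by-document matrix built from the full collection.
    Returns one row per document (shape: n_docs x k).
    """
    _, s, vt = np.linalg.svd(np.asarray(term_doc, dtype=float),
                             full_matrices=False)
    return vt[:k].T * s[:k]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def incremental_cluster(doc_vectors, threshold):
    """Assign stories to clusters one at a time, in the given order.

    A story opens a new cluster when no existing centroid reaches
    `threshold` (hypothetical parameter); otherwise it joins the
    cluster with the most similar centroid.
    """
    centroids, sizes, labels = [], [], []
    for v in doc_vectors:
        v = np.asarray(v, dtype=float)
        sims = [cosine(v, c) for c in centroids]
        best = int(np.argmax(sims)) if sims else -1
        if best < 0 or sims[best] < threshold:
            # Phase 1: no centroid is similar enough, so this is a new event
            centroids.append(v.copy())
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Phase 2: previously seen event, update its centroid (running mean)
            sizes[best] += 1
            centroids[best] += (v - centroids[best]) / sizes[best]
            labels.append(best)
    return labels
```

Stories should be passed in chronological order, since the result depends on the order of arrival. The threshold controls granularity: a higher value produces more, smaller clusters, and a suitable value would have to be tuned on the collection at hand.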
Last modified: Tue Aug 28 2007