Pedro Saleiro, Luís Rei, Arian Pasquali, Carlos Soares, Jorge Teixeira, Fábio Pinto, Mohammad Nozari, Catarina Félix, Pedro Strecht. (2013) “POPSTAR at RepLab 2013: Name ambiguity resolution on Twitter.” CLEF 2013 Evaluation Labs and Workshop – Online Working Notes. 23-26 September, Valencia, Spain.
Abstract: Filtering tweets relevant to a given entity is an important task for online reputation management systems, since it contributes to a reliable analysis of opinions and trends regarding that entity. In this paper we describe our participation in the Filtering Task of RepLab 2013. The goal of the competition is to classify a tweet as relevant or not relevant to a given entity. To address this task we studied a large set of features that can be generated to describe the relationship between an entity and a tweet. We explored different learning algorithms as well as different types of features: text, keyword similarity scores between entity metadata and tweets, the Freebase entity graph, and Wikipedia. The test set of the competition comprises more than 90,000 tweets about 61 entities from four distinct categories: automotive, banking, universities, and music. Results show that our approach achieves a Reliability of 0.72 and a Sensitivity of 0.45 on the test set, corresponding to an F-measure of 0.48 and an Accuracy of 0.908.
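Note on the reported scores: the harmonic mean of the reported Reliability (0.72) and Sensitivity (0.45) is about 0.55, not 0.48. One possible explanation, which is an assumption here and is not stated in the abstract, is that each score is computed per entity and then averaged over the 61 entities, in which case the mean of the per-entity F-measures need not equal the F-measure of the mean Reliability and mean Sensitivity. The minimal Python sketch below illustrates that effect. The per-entity values are hypothetical, chosen only so that their means equal 0.72 and 0.45; they are not taken from the paper.

```python
def f_measure(reliability, sensitivity):
    """Harmonic mean of reliability and sensitivity (0.0 if both are zero)."""
    total = reliability + sensitivity
    if total == 0:
        return 0.0
    return 2 * reliability * sensitivity / total


# Hypothetical (reliability, sensitivity) pairs for three entities.
# Illustrative values only; NOT results from the paper.
per_entity = [(0.90, 0.10), (0.60, 0.60), (0.66, 0.65)]
n = len(per_entity)

mean_r = sum(r for r, _ in per_entity) / n  # mean reliability (about 0.72)
mean_s = sum(s for _, s in per_entity) / n  # mean sensitivity (about 0.45)

# F-measure computed once, from the averaged scores.
f_of_means = f_measure(mean_r, mean_s)

# F-measure computed per entity, then averaged (the assumed scheme).
mean_of_f = sum(f_measure(r, s) for r, s in per_entity) / n

print(f"F of mean scores:     {f_of_means:.3f}")
print(f"Mean of per-entity F: {mean_of_f:.3f}")
```

Because the harmonic mean penalizes an imbalance between its two inputs, an entity with high Reliability but very low Sensitivity pulls the per-entity average down, so the averaged F-measure can fall below the F-measure of the averaged scores.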