G. Laboreiro, L. Sarmento and E. Oliveira
The 4th Track on Text Mining and Applications (TeMA 2011) in the 15th Portuguese Conference on Artificial Intelligence (EPIA), October 2011, Lisbon, Portugal (to be published)
In this paper we study the problem of identifying systems that automatically inject non-personal messages into micro-blogging message streams, thus potentially biasing the results of certain information extraction procedures, such as opinion mining and trend analysis. We also study several classes of features, namely features based on the time of posting, the client used to post, the presence of links, the user interaction and the writing style. This last class of features, which we introduce here for the first time, proves to be a top performer, achieving accuracy near 90%, on par with the best features previously used for this task.