Matko Bošnjak, Eduardo Oliveira, José Martins, Eduarda Mendes Rodrigues, Luís Sarmento, “TwitterEcho – A Distributed Focused Crawler to Support Open Research with Twitter Data”, World Wide Web Conference 2012, April 16–20, 2012, Lyon, France.
Modern social network analysis relies on vast quantities of data to infer new knowledge about human relations and communication. In this paper we describe TwitterEcho, an open-source Twitter crawler for supporting this kind of research, which is characterized by a modular distributed architecture. Our crawler enables researchers to continuously collect data from particular user communities, while respecting Twitter’s imposed limits. We present the core modules of the crawling server, some of which were specifically designed to focus the crawl on the Portuguese Twittosphere. Additional modules can be easily implemented, thus changing the focus to a different community. Our evaluation of the system shows high crawling performance and coverage.
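The interplay between a pluggable focus module and per-client request limits can be illustrated with a minimal sketch. It is not TwitterEcho's actual code: the profile fields (`id`, `lang`, `location`), the `portuguese_focus` heuristic, the `Client` class, and the budget values are all hypothetical names and assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical focus module: a function deciding whether a profile
# belongs to the target community. Swapping it changes the crawl focus.
FocusFilter = Callable[[Dict[str, str]], bool]


def portuguese_focus(profile: Dict[str, str]) -> bool:
    # Assumed heuristic (not taken from the paper): interface language
    # or a free-text location mentioning Portugal.
    return (profile.get("lang") == "pt"
            or "portugal" in profile.get("location", "").lower())


@dataclass
class Client:
    name: str
    budget: int  # requests left in the current rate-limit window (assumed)
    assigned: List[str] = field(default_factory=list)


def assign_users(profiles: List[Dict[str, str]],
                 clients: List[Client],
                 keep: FocusFilter) -> List[str]:
    """Hand in-focus users to clients round-robin, skipping clients
    whose budget is exhausted. Returns the ids that had to be deferred."""
    deferred: List[str] = []
    cursor = 0
    for profile in profiles:
        if not keep(profile):
            continue  # outside the focused community
        for _ in range(len(clients)):
            client = clients[cursor % len(clients)]
            cursor += 1
            if client.budget > 0:
                client.budget -= 1
                client.assigned.append(profile["id"])
                break
        else:
            # No client has budget left; retry in the next window.
            deferred.append(profile["id"])
    return deferred
```

In this sketch, retargeting the crawl to a different community amounts to passing a different `keep` function, while the deferred list keeps the server from asking any client to exceed its allowance.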