GitHub Data

GitHub has made its data available on Google BigQuery and has recently launched a data challenge, which sounded like a good opportunity to experiment with some data visualizations in NodeXL. It’s really easy to extract relational data from this dataset and load it into a NodeXL template! All it takes is a simple SQL query, a click on the Google BigQuery CSV download button, and a data import in NodeXL.

One of my queries returned a list of organisations and the number of projects each of them had on GitHub in each programming language. Looking at this data from a network perspective, we have a bipartite graph with two types of nodes, organisations and programming languages, where a link indicates that a given organisation has projects in the language it is connected to. Additional queries gave me the total number of projects per organisation and per language, which I used to map the size of the nodes in the graph visualization. The resulting graph is shown below.
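
To make the bipartite structure concrete, here is a minimal Python sketch that turns rows of a downloaded CSV into a typed node list and a weighted edge list. It assumes each row holds an organisation, a language and a project count; the column names `org`, `language` and `projects` and the sample rows are hypothetical, not the actual BigQuery output. Node totals are approximated here by summing edge weights, whereas the post used separate queries for the totals.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export: one row per (organisation, language) pair.
SAMPLE_CSV = """org,language,projects
acme,JavaScript,120
acme,Ruby,40
globex,JavaScript,75
globex,Python,30
"""


def build_bipartite(rows):
    """Return (edges, node_type, node_total) for an organisation-language graph."""
    edges = []                     # (organisation, language, number of projects)
    node_type = {}                 # node name -> "organisation" or "language"
    node_total = defaultdict(int)  # node name -> summed projects (a proxy for node size)
    for row in rows:
        org, lang, n = row["org"], row["language"], int(row["projects"])
        edges.append((org, lang, n))
        node_type[org] = "organisation"
        node_type[lang] = "language"
        node_total[org] += n
        node_total[lang] += n
    return edges, node_type, dict(node_total)


edges, node_type, node_total = build_bipartite(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(node_total)
```

Keeping a `node_type` attribute is what lets a tool colour the two node sets differently; the sketch assumes no organisation shares its name with a language.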

GitHub Programming Languages Network

The graph shows organisations (in blue) and programming languages (in orange) that have at least 250 projects. There is also a histogram in the top-left corner of the image that ranks the languages by popularity on GitHub: JavaScript is by far the most popular language!
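
The 250-project threshold and the popularity histogram can be sketched as below. This is an illustration under assumptions: the totals, the organisation names and the edge list are hypothetical, and the bars are plain text rather than the chart embedded in the image.

```python
# Hypothetical totals (number of projects per node) and organisation-language edges.
totals = {
    "JavaScript": 900, "Ruby": 610, "Python": 480, "Haskell": 120,
    "acme": 300, "globex": 90,
}
languages = {"JavaScript", "Ruby", "Python", "Haskell"}
edges = [("acme", "JavaScript"), ("acme", "Haskell"), ("globex", "Ruby")]

MIN_PROJECTS = 250  # threshold used for the graph in this post


def keep_popular(totals, min_projects=MIN_PROJECTS):
    """Keep only nodes with at least `min_projects` projects."""
    return {name: n for name, n in totals.items() if n >= min_projects}


def text_histogram(counts, width=30):
    """Render counts as text bars, largest first."""
    top = max(counts.values())
    rows = []
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * max(1, round(width * n / top))
        rows.append(f"{name:<12}{bar} {n}")
    return rows


kept = keep_popular(totals)
# Drop any edge that touches a node removed by the threshold.
kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
lang_counts = {name: n for name, n in kept.items() if name in languages}
for row in text_histogram(lang_counts):
    print(row)
```

Filtering nodes first and then discarding dangling edges keeps the graph consistent: no edge points to a node that is no longer drawn.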

About the data
The data was retrieved from Google BigQuery on May 7th, 2012 and visualised with NodeXL (version 1.0.1.209). The NodeXL data file can be downloaded here: github_lang_org.

Posted in None | Tagged , , | Leave a comment

#pl118 social network map

Lately, there has been a fair amount of discussion about the Portuguese law proposal PL 118, both on Twitter and in the blogosphere. I’ve used NodeXL (version 1.0.1.196) to retrieve connections among Twitter users who recently tweeted the #pl118 hashtag and to visualize the resulting implicit social network. The data was collected on January 26, 2012.

Twitter Network #pl118

A connection between two users in the above network represents either a retweet (RT) or a mention (@). The nodes are scaled by betweenness centrality and were clustered using the Clauset-Newman-Moore algorithm. The network is displayed using the Group-in-a-Box meta-layout and the Harel-Koren layout within each box.
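
For readers curious about the clustering step, the sketch below implements greedy agglomerative modularity clustering in the spirit of Clauset-Newman-Moore: start with every node in its own group and repeatedly merge the pair of linked groups whose merge raises modularity the most. It is a naive, small-graph version (it rescans all candidate merges each step instead of using CNM's heap structures), it treats the retweet/mention graph as undirected, and it is not NodeXL's implementation; the example edges are hypothetical.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Greedy agglomerative modularity clustering in the spirit of Clauset-Newman-Moore.

    Naive version: it rescans every pair of linked communities at each step
    instead of using CNM's heap-based bookkeeping, so it only suits small graphs.
    `edges` is a list of undirected (u, v) pairs.
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    community = {node: i for i, node in enumerate(degree)}  # start with singletons

    while True:
        # e[(a, b)]: fraction of edge ends linking communities a < b
        # a_tot[c]: fraction of all edge ends attached to community c
        e = defaultdict(float)
        a_tot = defaultdict(float)
        for u, v in edges:
            cu, cv = community[u], community[v]
            if cu != cv:
                e[(min(cu, cv), max(cu, cv))] += 1 / (2 * m)
        for node, d in degree.items():
            a_tot[community[node]] += d / (2 * m)

        # Modularity gain of merging a and b: dQ = 2 * (e_ab - a_a * a_b)
        best, best_dq = None, 0.0
        for (ca, cb), e_ab in e.items():
            dq = 2 * (e_ab - a_tot[ca] * a_tot[cb])
            if dq > best_dq:
                best, best_dq = (ca, cb), dq
        if best is None:  # no merge increases modularity, so stop
            break
        keep, drop = best
        for node, c in community.items():
            if c == drop:
                community[node] = keep

    groups = defaultdict(set)
    for node, c in community.items():
        groups[c].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical example: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(greedy_modularity(edges))
```

Each resulting group is what a Group-in-a-Box layout would then place in its own box before laying out the nodes inside it.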

Top 20 users by betweenness centrality:
@jonasnuts @ruiseabra @albertamf @celso @luis_grave
@jmcest @iphil @wonderm00n @joaomhenrique @fatgiant
@afn1982 @bitaites @agranado @matamouros @ncruz77
@retorta @cteresa @jneves @luisfcorreia @rcarmo
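
A ranking like the one above can be reproduced in outline with Brandes' algorithm for betweenness centrality. The sketch below assumes an unweighted, undirected graph and returns raw (unnormalised) scores; NodeXL's own figures may be computed or scaled differently, and the edge list in the usage lines is hypothetical.

```python
from collections import defaultdict, deque


def adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs; returns raw scores."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every path is counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


def top_n(scores, n):
    """Names of the n highest-scoring nodes, ties broken alphabetically."""
    return [v for v, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


# Hypothetical usage on a small retweet/mention graph.
scores = betweenness(adjacency([("user_a", "user_b"), ("user_b", "user_c")]))
print(top_n(scores, 2))
```

Betweenness rewards users who sit on many shortest paths between others, which is why it tends to surface "bridges" between clusters rather than simply the most active accounts.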

Euro crisis, EDP, Freemasonry, Football, US Politics: news network 1st week 2012, who stands out?

I decided to experiment with adding some context to my news network visualizations, in the form of news snippets. This week’s network includes connections among people co-mentioned in the Portuguese media at least 10 times between January 1 and 7, 2012.
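
The co-mention rule described above can be sketched as a pair count with a date window and a minimum count. This assumes that "co-mentioned" means two people named in the same news item; the article data is hypothetical, and Verbetes may define co-mentions differently.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def co_mention_edges(articles, start, end, min_count=10):
    """Count how often each pair of people appears in the same article published
    between `start` and `end` (inclusive); keep pairs seen at least `min_count` times."""
    counts = Counter()
    for published, people in articles:
        if start <= published <= end:
            # sorted(set(...)) gives each unordered pair a single canonical key
            for pair in combinations(sorted(set(people)), 2):
                counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical articles: (publication date, people mentioned).
articles = (
    [(date(2012, 1, 3), ["Person A", "Person B"])] * 12
    + [(date(2012, 1, 5), ["Person A", "Person B", "Person C"])] * 3
    + [(date(2011, 12, 30), ["Person A", "Person C"])] * 20  # outside the week
)
edges = co_mention_edges(articles, date(2012, 1, 1), date(2012, 1, 7))
print(edges)
```

The count doubles as an edge weight, so the same pass that applies the threshold also yields the values used to style edges.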

The news topics cover the euro crisis, the Chinese acquisition of EDP, Freemasonry connections in politics and the secret services, national and international football events, US presidential candidates, and much more. The network visualization highlights the key people in each of these topics.

Taking advantage of the deep zoom experience from zoom.it and the clarity offered by the Group-in-a-Box (GIB) layout, it is possible to read (in Portuguese) the overlaid news snippets alongside the clustered network visualization. This greatly improves the interpretation of the network connections! Check it out:

News network 01-07 Jan'2012

The network map was created with NodeXL (version 1.0.1.196) using data from Verbetes.

Twitter Social Network for the #feliztweetnatal trending hashtag

According to @TwitPortugal, #feliztweetnatal was a popular hashtag today. It is possible to identify the key people responsible for the trending hashtag through simple social network analysis.

We used the NodeXL Twitter search data spigot to obtain the social network associated with this hashtag. The data importer retrieves users who mentioned the hashtag and connections among those users, based both on implicit links, derived from retweets and mentions, and on explicit links representing a following relationship.
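
The two kinds of links can be illustrated with a small edge-extraction sketch: implicit links come from `RT @user` prefixes and `@user` mentions in tweet text, and explicit links from follow pairs. This is not how the NodeXL importer works internally; the input shapes, the handle pattern and the sample tweets are assumptions for illustration.

```python
import re

# Assumed handle pattern: up to 15 letters, digits or underscores.
MENTION = re.compile(r"@([A-Za-z0-9_]{1,15})")
RETWEET = re.compile(r"^RT @([A-Za-z0-9_]{1,15})")


def tweet_edges(tweets, follows):
    """Build typed edges among users who tweeted the hashtag.

    tweets:  list of (author, text) pairs returned by a hashtag search
    follows: set of (follower, followed) pairs
    Returns a set of (source, target, kind), kind in {"retweet", "mention", "follows"}.
    """
    authors = {author.lower() for author, _ in tweets}
    edges = set()
    for author, text in tweets:
        src = author.lower()
        rt = RETWEET.match(text)
        retweeted = rt.group(1).lower() if rt else None
        for name in MENTION.findall(text):
            dst = name.lower()
            if dst == src or dst not in authors:
                continue  # keep only links among users in the search result
            kind = "retweet" if dst == retweeted else "mention"
            edges.add((src, dst, kind))
    for follower, followed in follows:
        if follower.lower() in authors and followed.lower() in authors:
            edges.add((follower.lower(), followed.lower(), "follows"))
    return edges


# Hypothetical search results and follow relationships.
tweets = [
    ("alice", "RT @bob: Feliz Natal #feliztweetnatal"),
    ("bob", "Bom Natal @carol #feliztweetnatal"),
    ("carol", "#feliztweetnatal @dave"),
]
follows = {("carol", "alice"), ("alice", "eve")}
print(tweet_edges(tweets, follows))
```

Tagging each edge with its kind is what allows retweet/mention links and follow links to be coloured differently in the final map.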

The social network visualizations are shown below:

In the first visualization, node size is proportional to the number of followers. The top 5 users with the most followers are @PauloQuerido, @TwitPortugal, @maraajardiim, @andrebenjamim and @pesousa. In the second visualization, node size is proportional to betweenness centrality. In this case, the top 5 users are @twitportugal, @danielasofs, @joaohscosta, @PauloQuerido and @arianardoso, among whom is the initiator of the hashtag.
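
Sizing nodes "proportionally" to a metric amounts to a single scaling factor. The sketch below assumes the largest node is drawn at a hypothetical `max_size` and every other node shrinks linearly with its value; the follower counts are made up, and NodeXL's own size mapping may use a different range or offset.

```python
def proportional_sizes(values, max_size=100.0):
    """Scale node sizes so size / value is constant and the largest node gets max_size."""
    top = max(values.values())
    return {node: max_size * v / top if top else 0.0 for node, v in values.items()}


# Hypothetical follower counts.
followers = {"user_a": 1000, "user_b": 250, "user_c": 0}
print(proportional_sizes(followers))
```

One design choice to keep in mind: if `size` sets a node's diameter, its area grows with the square of the value, which visually exaggerates the largest accounts; mapping the metric to area instead means taking a square root first.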

In both cases, the blue edges represent retweets and/or mentions, and the gray edges represent a following relationship.

For more information about how to create Twitter networks with NodeXL have a look at this video by my colleague Marc Smith and see some examples in the NodeXL Graph Gallery.

Weekly news network: Duarte Lima’s arrest in the spotlight

This week’s news network (18-24 Nov’2011) puts Duarte Lima in the spotlight. The former Portuguese politician has been arrested on suspicion of fraud related to the BPN bank scandal. See also the SAPO Notícias infographics “O mundo visto daqui” (“The world seen from here”).

News network 18-24 Nov'2011

The network map was created with NodeXL (version 1.0.1.193) using data from Verbetes.
