GitHub has made its data available on Google BigQuery and has recently launched a data challenge, which sounded like a good opportunity to experiment with some data visualizations in NodeXL. It's really easy to extract relational data from this dataset and load it into a NodeXL template! All it takes is a simple SQL query, a click on the CSV download button in Google BigQuery, and a data import into NodeXL.
One of my queries returned a list of organisations and, for each programming language, the number of projects each organisation had on GitHub. Looking at this data from a network perspective, we have a bipartite graph with two types of nodes, organisations and programming languages, where a link between an organisation and a language indicates that the organisation has projects in that language. Additional queries gave me the total number of projects per organisation and per language, which I used to set the size of the nodes in the graph visualization. The resulting graph is shown below.
(Click on the image to view the full network with deep zoom – requires Silverlight.)
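For readers who would rather prepare the data in a script, here is a minimal Python sketch of the bipartite structure described above. It is not what I did for this post (I used separate BigQuery queries and the NodeXL import). The sample rows, the function names, and the CSV column headers are all made up for illustration, and the node totals are simply summed from the per-language rows rather than taken from separate queries.

```python
import csv
import io
from collections import defaultdict

# Hypothetical query result: (organisation, language, number of projects).
# These rows are invented sample data, not real GitHub figures.
rows = [
    ("org_a", "Python", 3),
    ("org_a", "JavaScript", 2),
    ("org_b", "Python", 1),
]


def build_bipartite(rows):
    """Return (edges, org_totals, lang_totals) for an organisation-language graph."""
    edges = []
    org_totals = defaultdict(int)   # candidate size for organisation nodes
    lang_totals = defaultdict(int)  # candidate size for language nodes
    for org, lang, count in rows:
        # Each row becomes one edge linking an organisation to a language.
        edges.append((org, lang, count))
        org_totals[org] += count
        lang_totals[lang] += count
    return edges, dict(org_totals), dict(lang_totals)


def edges_to_csv(edges):
    """Serialise the edge list as CSV text (column headers are illustrative)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Vertex 1", "Vertex 2", "Weight"])
    writer.writerows(edges)
    return buf.getvalue()


edges, org_totals, lang_totals = build_bipartite(rows)
print(edges_to_csv(edges))
```

Each edge connects one organisation to one language, so no two organisations and no two languages are ever linked directly, which is what makes the graph bipartite. The two totals dictionaries play the role of the node sizes.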