Generating network graphs with Python and pygexf

When your collected data consists of some form of connected elements, visualization can be a fruitful way to explore it. If you are familiar with Python, or programming in general, creating network graphs that can be visualized in Gephi is fairly straightforward. The snippet below shows how to traverse an arbitrary number of connected elements and generate a graph that can be loaded into Gephi:

def generate_graph(node, graph):

    # We start by adding the current node to the graph
    graph.addNode(id=str(node["id"]), label=str(node["label"]))

    # Loop over each child node (here, each Twitter reply)
    for reply in node["replies"]:
        # ... and go one level deeper if child nodes exist
        graph = generate_graph(reply, graph)

        # ... then connect the child node, which now exists, with the current parent node
        graph.addEdge(id=str(reply["id"]), source=str(reply["id"]), target=str(node["id"]))

    # When all child nodes have been added, or if there are none, return the graph
    return graph

The main element in the code is the line graph = generate_graph(reply, graph), as it adds the recursion to the algorithm, meaning that the function calls itself. The call passes along the graph currently being constructed and the node to be added. In this example all nodes already exist as objects, appended as a list of “replies” to their parent node. Each node is a dict whose “replies” element is a nested list of further node dicts. See below for the simplest example of a connected graph as dict/list elements.

{"id": 1, "label": "tweet 1", "replies": [{"id": 2, "label": "tweet 2", "replies": []}]}
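
To see the order in which the recursion visits nodes without installing pygexf, the traversal can be run against a small stand-in object. This is only a sketch: RecordingGraph is a hypothetical class that mimics the two method names used above (addNode and addEdge) and simply records the calls, and the "id", "label" and "replies" keys are assumed to be the ones the recursive function reads.

```python
class RecordingGraph:
    """Hypothetical stand-in for a pygexf graph: it only records calls."""

    def __init__(self):
        self.nodes = []  # list of (id, label) tuples
        self.edges = []  # list of (edge_id, source, target) tuples

    def addNode(self, id, label):
        self.nodes.append((id, label))

    def addEdge(self, id, source, target):
        self.edges.append((id, source, target))


def generate_graph(node, graph):
    # Add the current node, then recurse into each reply
    graph.addNode(id=str(node["id"]), label=str(node["label"]))
    for reply in node["replies"]:
        graph = generate_graph(reply, graph)
        # The reply exists in the graph now, so link it to its parent
        graph.addEdge(id=str(reply["id"]),
                      source=str(reply["id"]),
                      target=str(node["id"]))
    return graph


# A thread with one root tweet, two replies, and one reply to a reply
thread = {
    "id": 1, "label": "root", "replies": [
        {"id": 2, "label": "first reply", "replies": [
            {"id": 4, "label": "nested reply", "replies": []},
        ]},
        {"id": 3, "label": "second reply", "replies": []},
    ],
}

recorded = generate_graph(thread, RecordingGraph())
print(recorded.nodes)
print(recorded.edges)
```

The nodes are added depth-first (1, 2, 4, 3), and each edge is added only after the child node it starts from has been added.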

The graph object needs to be created first, and the data needs to be gathered. The data could also be fetched inside the recursive function as it runs, but this would affect both complexity and speed. I would advocate first collecting the data, then working with it. The code below illustrates how to set up the graph object initially and, later, how to write the complete graph to file.
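
As a rough illustration of collecting first and building afterwards, the sketch below turns a flat list of already collected tweets into the nested dict/list structure that the recursive function walks. The field names "id", "label" and "in_reply_to" and the helper build_thread are assumptions made for this example; they are not part of Phirehose or pygexf.

```python
def build_thread(tweets, root_id):
    """Nest flat tweet records under the tweet they reply to.

    `tweets` is a list of dicts with the assumed keys "id", "label"
    and "in_reply_to" (None for a tweet that is not a reply).
    """
    # Index every tweet by id and give each one an empty reply list
    by_id = {
        t["id"]: {"id": t["id"], "label": t["label"], "replies": []}
        for t in tweets
    }
    # Attach each reply to its parent, if the parent was collected
    for t in tweets:
        parent = by_id.get(t["in_reply_to"])
        if parent is not None:
            parent["replies"].append(by_id[t["id"]])
    return by_id[root_id]


collected = [
    {"id": 1, "label": "root", "in_reply_to": None},
    {"id": 2, "label": "first reply", "in_reply_to": 1},
    {"id": 3, "label": "second reply", "in_reply_to": 1},
    {"id": 4, "label": "nested reply", "in_reply_to": 2},
]

thread = build_thread(collected, root_id=1)
print(thread)
```

The nesting is done in two passes over data that is already in memory, so the recursive graph builder never has to wait on a network call.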

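from gexf import Gexf

def main(argv):
    gexf = Gexf("Name of Graph collection", "Info on graph collection")
    graph = gexf.addGraph("directed", "static", "Information about graph")

    # function to get data
    thread = get_single_thread()

    # generate graph
    graph = generate_graph(thread, graph)

    # we end by writing the graph to file
    # (the file name is a placeholder; "wb" assumes pygexf writes encoded bytes)
    with open("thread.gexf", "wb") as output_file:
        gexf.write(output_file)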
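
To give an idea of what ends up in the file that Gephi reads, a minimal GEXF document can also be built with the Python standard library alone. This is a sketch and not how pygexf works internally: the helper to_gexf is hypothetical, the element and attribute names follow the GEXF 1.2 draft format as I understand it, and pygexf's actual output may contain more detail.

```python
import xml.etree.ElementTree as ET


def to_gexf(nodes, edges):
    """Build a minimal GEXF tree from (id, label) and (id, source, target) tuples."""
    root = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})

    # One <node> element per node, carrying its id and label
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": label})

    # One <edge> element per edge, pointing from the reply to its parent
    edges_el = ET.SubElement(graph, "edges")
    for edge_id, source, target in edges:
        ET.SubElement(edges_el, "edge", {"id": edge_id, "source": source, "target": target})

    return ET.ElementTree(root)


tree = to_gexf(
    nodes=[("1", "root"), ("2", "reply")],
    edges=[("2", "2", "1")],
)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```

Calling tree.write("thread.gexf", encoding="utf-8", xml_declaration=True) would store the same document on disk; the file name here is only a placeholder.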

The following example is a graph generated to visualize reply-threads in Twitter data.

Replies on Twitter

Data for the graph above was captured using Phirehose, the graphs were generated using Python and the pygexf library, and the network was visualized using Gephi. Read more about social web mining in ‘Mining the Social Web’.
