When you have collected data that consists of connected elements, visualization can be a fruitful way to explore it. If you are familiar with Python, or programming in general, creating network graphs that can be visualized in Gephi is fairly straightforward. The snippet below shows how to traverse an arbitrary number of connected elements and generate a graph that can be loaded into Gephi:

def generate_graph(node, graph):

    # We start by adding the current node to the graph
    graph.addNode(id=str(node["id"]), label=str(node["label"]))

    # Loop over each child node (each reply to the current tweet)
    for reply in node["replies"]:
        # ... and go one level deeper, adding the child node and its own replies
        graph = generate_graph(reply, graph)

        # ... then connect the child node, which now exists, to the current parent node
        graph.addEdge(id=str(reply["id"]), source=str(reply["id"]), target=str(node["id"]))

    # When all child nodes have been added, or if there are none, return the graph
    return graph

The key line in the code is graph = generate_graph(reply, graph), which makes the algorithm recursive: the function calls itself. Each call passes along the graph under construction and the node to be added. In this example all nodes already exist as objects, stored as a list of “replies” on their parent node. Each node is a dict whose “replies” element is a list of further node dicts. See below for the simplest example of a connected graph as dict/list elements.

{"id": 1, "label": "root", "replies": [{"id": 2, "label": "reply", "replies": []}]}
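
To see the recursion at work without installing anything, the sketch below runs the same traversal against a small stand-in graph class. RecordingGraph is hypothetical and only mimics the addNode and addEdge calls used in the snippet above; it is not part of pygexf. The sample thread and its "label" values are likewise made up for illustration.

```python
class RecordingGraph:
    """Hypothetical stand-in for a pygexf graph: it only records what is added."""

    def __init__(self):
        self.nodes = []
        self.edges = []

    def addNode(self, id, label):
        self.nodes.append((id, label))

    def addEdge(self, id, source, target):
        self.edges.append((source, target))


def generate_graph(node, graph):
    # Add the current node, then recurse into each reply before linking it
    graph.addNode(id=str(node["id"]), label=str(node["label"]))
    for reply in node["replies"]:
        graph = generate_graph(reply, graph)
        graph.addEdge(id=str(reply["id"]), source=str(reply["id"]), target=str(node["id"]))
    return graph


# A tweet (1) with two replies (2 and 3); tweet 2 has a reply of its own (4)
thread = {"id": 1, "label": "tweet 1", "replies": [
    {"id": 2, "label": "tweet 2", "replies": [
        {"id": 4, "label": "tweet 4", "replies": []},
    ]},
    {"id": 3, "label": "tweet 3", "replies": []},
]}

graph = generate_graph(thread, RecordingGraph())
print([node_id for node_id, _ in graph.nodes])  # ['1', '2', '4', '3']
print(graph.edges)  # [('4', '2'), ('2', '1'), ('3', '1')]
```

Nodes are added depth-first, parent before child, and each edge points from a reply to the tweet it answers.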

The graph object needs to be created first, and the data needs to be gathered. The data could be fetched inside the recursive loop, but this would hurt both complexity and speed. I would advocate collecting the data first and then working with it. The code below illustrates how to create the graph object and, at the end, how to write the complete graph to file.

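from gexf import Gexf  # provided by the pygexf library

def main(argv):
    gexf = Gexf("Name of Graph collection", "Info on graph collection")
    graph = gexf.addGraph("directed", "static", "Information about graph")

    # function to get data (get_single_thread is not shown here)
    thread = get_single_thread()

    # generate graph
    graph = generate_graph(thread, graph)

    # we end by writing the graph to file ("replies.gexf" is a placeholder name;
    # binary mode is assumed because pygexf writes encoded XML under Python 3)
    with open("replies.gexf", "wb") as output_file:
        gexf.write(output_file)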
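
The get_single_thread() call is not shown in this post. As a rough sketch of the collect-first approach, the function below turns a flat list of already collected tweets into the nested dict/list structure that generate_graph expects. The name build_thread and the "text" and "in_reply_to" fields are assumptions made for illustration; real Twitter data uses its own field names.

```python
def build_thread(tweets, root_id):
    """Nest a flat list of tweet dicts under the tweet with id `root_id`."""
    # Index every tweet by id and give each one an empty list of replies
    by_id = {t["id"]: {"id": t["id"], "label": t["text"], "replies": []} for t in tweets}

    # Attach each reply to its parent, skipping tweets whose parent was not collected
    for t in tweets:
        parent = by_id.get(t["in_reply_to"])
        if parent is not None:
            parent["replies"].append(by_id[t["id"]])

    return by_id[root_id]


# Hypothetical sample data standing in for tweets gathered beforehand
collected = [
    {"id": 1, "text": "original tweet", "in_reply_to": None},
    {"id": 2, "text": "first reply", "in_reply_to": 1},
    {"id": 3, "text": "reply to the reply", "in_reply_to": 2},
]

thread = build_thread(collected, root_id=1)
print(thread["replies"][0]["replies"][0]["label"])  # reply to the reply
```

Building the tree in one pass over data that is already in memory keeps the recursive function free of network calls, which is the point of collecting first.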

The following example is a graph generated to visualize reply-threads in Twitter data.

Figure: Replies on Twitter

Data for the graph above was captured using Phirehose, the graph was generated using Python and the pygexf library, and the network was visualized using Gephi. Read more about social web mining in ‘Mining the Social Web’.