Making Graphical Models with PyDot

In [1]:
from IPython.display import Image

Graphical Models with PyDot

EDIT 6/26/13: Since I wrote this post, I've heard about a new project, daft, for making probabilistic graphical models. I hope to give it a shot soon.

In [2]:
Image('Smoothed_LDA.png')
Out[2]:

Source: Blei, Ng, and Jordan. (2003) "Latent Dirichlet Allocation"

Graphical models are a handy way to represent probabilistic models. They are especially common in Bayesian statistics, particularly for hierarchical models, where it's convenient to have a quick representation of the various dependencies. In graphical models, the nodes represent random variables, and the shaded ones are observed. The edges denote possible dependencies. The plates denote sequences of repeated variables. Last week I wanted to create a directed graphical model similar to the one above. This post will recreate this graph using Graphviz to render a DOT file created with PyDot.

This graph represents the total likelihood for a Latent Dirichlet Allocation model. Mathematically, this is

$$\begin{equation}\begin{split}p(w, \theta, \beta, z \mid \alpha, \eta) &= \prod_{k=1}^{K}p\left(\beta_{k} \mid \eta\right) \\ &\quad \times \prod_{m=1}^{M}p\left(\theta_{m} \mid \alpha\right)\left[\prod_{n=1}^{N}p\left(z_{m,n} \mid \theta_{m}\right)p\left(w_{m,n} \mid z_{m,n},\beta_{1:K}\right)\right]\end{split}\end{equation}$$
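One way to read the factorization is as a recipe for sampling: each factor is one draw, and each product is one loop (one plate). The sketch below follows that reading using only the standard library. It assumes symmetric scalar hyperparameters alpha and eta and a vocabulary of size V, none of which are spelled out in this post, and the function names are made up for illustration.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_lda(K, M, N, V, alpha, eta, seed=0):
    """Ancestral sampling that mirrors the factorization term by term.

    V (vocabulary size) and the symmetric scalar alpha/eta are assumptions.
    """
    rng = random.Random(seed)
    # K plate: beta_k ~ p(beta_k | eta)
    beta = [sample_dirichlet([eta] * V, rng) for _ in range(K)]
    theta, z, w = [], [], []
    for _ in range(M):
        # M plate: theta_m ~ p(theta_m | alpha)
        theta_m = sample_dirichlet([alpha] * K, rng)
        z_m, w_m = [], []
        for _ in range(N):
            # N plate: z_mn ~ p(z_mn | theta_m), w_mn ~ p(w_mn | z_mn, beta)
            z_mn = sample_categorical(theta_m, rng)
            w_mn = sample_categorical(beta[z_mn], rng)
            z_m.append(z_mn)
            w_m.append(w_mn)
        theta.append(theta_m)
        z.append(z_m)
        w.append(w_m)
    return beta, theta, z, w


beta, theta, z, w = sample_lda(K=3, M=2, N=5, V=7, alpha=0.5, eta=0.5)
print(w)  # word indices, one list per document
```

Only w would be observed in practice, which is why it is the shaded node in the graph.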

I already have Graphviz installed for use in Sphinx. It can be installed using apt-get. To use the HTML subscript tag, you need at least version 2.28, so you may have to install from elsewhere.

sudo apt-get install graphviz
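If you aren't sure which version you have, a check along these lines might help. This is only a sketch: it assumes that `dot -V` prints a line of the form "dot - graphviz version X.Y.Z", and the helper names are my own, not part of any library.

```python
import re
import shutil
import subprocess

MIN_VERSION = (2, 28)  # minimum version mentioned above for the <SUB> tag


def parse_graphviz_version(text):
    """Pull (major, minor) out of a 'dot -V' style banner, or return None."""
    match = re.search(r"version\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_graphviz_version():
    """Return the (major, minor) version of 'dot' on the PATH, or None."""
    dot_path = shutil.which("dot")
    if dot_path is None:
        return None
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    # Check both streams, since the banner may go to stderr.
    return parse_graphviz_version(result.stderr + result.stdout)


version = installed_graphviz_version()
if version is None:
    print("Could not find a Graphviz 'dot' executable")
elif version < MIN_VERSION:
    print("Graphviz %d.%d is too old for <SUB> labels" % version)
else:
    print("Graphviz %d.%d should be fine" % version)
```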

PyDot can be installed using setuptools.

sudo easy_install pydot

I started by reading the PyMC source since they use PyDot to generate their graphical models. I spent most of my time in the Graphviz Dot configuration documentation. Here's what I came up with.

In [3]:
import pydot

dot_object = pydot.Dot(graph_name="main_graph",rankdir="LR", labelloc='b', 
                       labeljust='r', ranksep=1)
dot_object.set_node_defaults(shape='circle', fixedsize='true',
                             height=.85, width=.85, fontsize=24)

This instantiates the main Graph. It will be created left-to-right (alternatively, you could use 'TB' for top-to-bottom to look like the above example). This Graph will have SubGraphs, or Clusters, for the plates which will inherit labelloc and labeljust. Then I set the defaults for the Nodes.

Now make the nodes for the main graph and attach them.

In [4]:
node_eta = pydot.Node(name='eta', texlbl=r'\eta', label='<&#951;>')
dot_object.add_node(node_eta)

node_alpha = pydot.Node(name='alpha', texlbl=r'\alpha', label="<&#945;>")
dot_object.add_node(node_alpha)

You'll notice that the nodes take three parameters: a name for internal use, plus texlbl and label, which contain $\TeX$ and HTML (denoted by <>), respectively. The texlbl will allow us to use dot2tex to make output for $\LaTeX$, and the HTML label will allow us to create images containing Greek letters from their HTML codes. Labels allow only a subset of HTML tags.
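If you want to double check which character a numeric HTML code stands for before putting it in a label, the standard library can decode it. This small sketch covers the four codes used in this post; the dictionary name is just for illustration.

```python
import html

# Numeric character references used in the node labels in this post
greek_codes = {
    "eta": "&#951;",
    "alpha": "&#945;",
    "beta": "&#946;",
    "theta": "&#952;",
}

# Decode each reference into the Unicode character it stands for
decoded = {name: html.unescape(code) for name, code in greek_codes.items()}

for name, char in decoded.items():
    print(name, greek_codes[name], char)
```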

The plates are called Clusters in DOT. A Cluster is a type of SubGraph that provides the border for the plate notation, and it is attached like any other SubGraph.

In [5]:
# K plate
plate_k = pydot.Cluster(graph_name='plate_k', label='K', fontsize=24)
node_beta = pydot.Node(name='beta', texlbl=r'\beta', label='<&#946;<SUB>k</SUB>>')
plate_k.add_node(node_beta)

# add plate k to graph
dot_object.add_subgraph(plate_k)

# M plate
plate_M = pydot.Cluster(graph_name='plate_M', label='M', fontsize=24)
node_theta = pydot.Node(name='theta', texlbl=r'\theta',
                        label='<&#952;<SUB>m</SUB>>')
plate_M.add_node(node_theta)

# N plate
plate_N = pydot.Cluster(graph_name='plate_N', label='N', fontsize=24)
node_z = pydot.Node(name='z', texlbl='z_{m,n}', label='<z<SUB>m,n</SUB>>')
plate_N.add_node(node_z)
node_w = pydot.Node(name='w', texlbl='w_{m,n}', label='<w<SUB>m,n</SUB>>', 
                    style='filled', fillcolor='lightgray')
plate_N.add_node(node_w)

plate_M.add_subgraph(plate_N)
dot_object.add_subgraph(plate_M)

This creates the K, M, and N plates and attaches them to the appropriate objects. The last step is to instantiate and add the edges.

In [6]:
# Add the edges
dot_object.add_edge(pydot.Edge(node_alpha, node_theta))
dot_object.add_edge(pydot.Edge(node_theta, node_z))
dot_object.add_edge(pydot.Edge(node_z, node_w))
dot_object.add_edge(pydot.Edge(node_w, node_beta, dir='back'))
dot_object.add_edge(pydot.Edge(node_beta, node_eta, dir='back'))

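You'll notice that the last two edges, between node_w and node_beta and between node_beta and node_eta, are given in reverse order with the attribute dir set to 'back'. Listing them this way maintains the left-to-right layout, while dir='back' draws the arrowheads so that the dependencies still make sense: eta points to beta, and beta points to w.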
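To convince yourself that the edges still encode the right dependencies, you can flip the 'back' edges and collect each node's parents. This is a plain-Python sketch that works on node names rather than PyDot objects; the edge tuples and the helper name are my own.

```python
# (tail, head, dir) triples mirroring the edges added above
edges = [
    ("alpha", "theta", None),
    ("theta", "z", None),
    ("z", "w", None),
    ("w", "beta", "back"),
    ("beta", "eta", "back"),
]


def parents_from_edges(edge_list):
    """Map each node to the nodes whose arrows point at it."""
    parents = {}
    for tail, head, direction in edge_list:
        # dir='back' draws the arrowhead at the tail, so the dependency
        # runs from the listed head to the listed tail.
        if direction == "back":
            tail, head = head, tail
        parents.setdefault(head, []).append(tail)
        parents.setdefault(tail, [])
    return parents


parents = parents_from_edges(edges)
for node in sorted(parents):
    print(node, "<-", parents[node])
```

Each node's parents match a conditioning set in the likelihood above: w depends on z and beta, z on theta, theta on alpha, and beta on eta.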

Optionally, you can save the graph to a file in raw DOT form. To do so, run

In [7]:
dot_object.write('graph.dotfile', format='raw', prog='dot')
Out[7]:
True

You also have the option to use other renderers such as neato, fdp, and sfdp. Unfortunately, each of the renderers has its own strengths and weaknesses, with dot being the most fully featured to my eyes. You can look at the saved file to get an idea of the DOT language syntax.
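For orientation, here is a hand-written sketch of what DOT source for this graph can look like. It is not the exact text PyDot writes to graph.dotfile: attribute order and quoting will differ, and I've left out the texlbl attributes and a few graph-level settings. The function name is hypothetical. Note that Graphviz only draws a border around a subgraph when its name starts with "cluster".

```python
def lda_dot_source():
    """Return hand-written DOT text for the LDA plate diagram."""
    lines = [
        "digraph main_graph {",
        "  rankdir=LR;",
        "  node [shape=circle, fixedsize=true, height=0.85, width=0.85, fontsize=24];",
        "  eta [label=<&#951;>];",
        "  alpha [label=<&#945;>];",
        "  subgraph cluster_plate_k {",
        "    label=K;",
        "    beta [label=<&#946;<SUB>k</SUB>>];",
        "  }",
        "  subgraph cluster_plate_M {",
        "    label=M;",
        "    theta [label=<&#952;<SUB>m</SUB>>];",
        "    subgraph cluster_plate_N {",
        "      label=N;",
        "      z [label=<z<SUB>m,n</SUB>>];",
        "      w [label=<w<SUB>m,n</SUB>>, style=filled, fillcolor=lightgray];",
        "    }",
        "  }",
        "  alpha -> theta;",
        "  theta -> z;",
        "  z -> w;",
        "  w -> beta [dir=back];",
        "  beta -> eta [dir=back];",
        "}",
    ]
    return "\n".join(lines) + "\n"


dot_source = lda_dot_source()
print(dot_source)
```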

Now write the graph to a PNG. You will need a version of Graphviz released after 14 October 2011 for this to work, because of the <SUB> HTML tags.

In [8]:
dot_object.write_png('lda_graph.png', prog='dot')
Out[8]:
True
In [9]:
from IPython.display import Image
Image('lda_graph.png')
Out[9]:

If you want to render the image to pdf, you can install dot2tex.

sudo apt-get install dot2tex

To go from DOT to $\TeX$ do

In [10]:
import dot2tex as d2t
texcode = d2t.dot2tex(dot_object.to_string(), format='tikz', crop=True)

And then write it to a file.

In [11]:
# The with block closes the file even if the write fails
with open("img.tex", "w") as fout:
    fout.write(texcode)

Then make a PDF and optionally convert it to a PNG.

pdflatex img.tex && pdftops -eps img.pdf && convert -density 300 img.eps img.png

I wasn't particularly happy with the layout and wanted a little more control over the padding and label placement, not to mention some easier $\TeX$ integration. I suppose you could edit the img.tex file for a bit more control, but I haven't seen any other really good open source solutions for more fine-grained control. I'd be happy to hear about any in the comments though.

One that I came across has you converting the DOT object to XML with dottoxml and using yEd. I tried this, but it isn't the solution I'm looking for, particularly because yEd doesn't have built-in support for $\LaTeX$ either.

I've thought about and implemented a couple of solutions using matplotlib. I'll put up my solution in a future post. But I'd be really interested to hear from some real matplotlib pros, particularly if anyone has any library code for padding between different nested objects in a plot like this.
