This guest post was contributed by Moritz Guenther:

When I first encountered the IPython notebook, I thought it was a solution looking for a problem. However, I have since been converted! The tipping point for me was this: I want to version control my papers, and I always had multiple directories for analysis code, plotting scripts, LaTeX files, figures and tables. That's just so unwieldy. Also, I found it cumbersome to email figures to individual collaborators all the time. The notebook can hold all this information in one place, and I can just give my co-authors a link to the GitHub repository once and they have access to the latest version all the time. Even if they do not use Python, they can still see all the current figures using nbviewer.ipython.org.

Now all papers I work on are written in an IPython notebook. So, the final step is to convert the notebook to a LaTeX file I can submit to a journal. That's what this simple converter code does.

This converter is not intended to replace nbconvert from the IPython project. Instead, it serves one very specific purpose: Turn a notebook into a LaTeX file that I can submit to the journal.

How to use it

As a script


You can use this file from the command line:

> python ipynb2article.py myanalysis.ipynb myanalysis.tex

In this case it's run with my set of design choices (see below).

As a Python module

Import it into Python and make a NotebookConverter object:

from ipynb2article import NotebookConverter, IgnoreConverter
converter = NotebookConverter()

Then, customize how each type of cell is converted by changing the converter:

converter.cellconverters['code'] = IgnoreConverter()

Finally, call:

converter.convert(infile, outfile, ...)

This method allows you to use only part of a notebook file (ignore the first n cells, or ignore everything until a cell has a specific string value, e.g. "The paper starts here"). It also lets you provide a text file that will be pasted before or after the converted notebook (you can put the \usepackage lines and similar stuff in those files so they don't clutter your notebook). However, I no longer use this option, because it means I would have multiple input files. If I put all those LaTeX headers into the notebook as well, I only have a single file.
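
As a rough illustration of the two modes of the start argument, the following self-contained sketch mimics the skipping logic on a hand-written list of cells. The helper skip_cells and the toy cell dictionaries are made up for this example; they are not part of ipynb2article.py, and the cells only imitate the fields the converter reads.

```python
def skip_cells(cells, start=0):
    """Return the cells that remain after applying ``start``.

    Hypothetical helper for illustration only; it imitates what
    ``NotebookConverter.convert`` does with its ``start`` argument.
    """
    if isinstance(start, str):
        # Skip everything up to and including the marker cell.
        for i, cell in enumerate(cells):
            content = cell.get('source', cell.get('input'))
            if content == [start]:
                return cells[i + 1:]
        raise ValueError('No cell with content {0!r} found.'.format(start))
    # Otherwise ``start`` is the number of leading cells to skip.
    return cells[start:]


# Toy cells that only carry the fields needed here.
cells = [
    {'cell_type': 'markdown', 'source': ['Notes to myself']},
    {'cell_type': 'markdown', 'source': ['The paper starts here']},
    {'cell_type': 'raw', 'source': ['\\documentclass{article}\n']},
]

by_number = skip_cells(cells, start=2)
by_marker = skip_cells(cells, start='The paper starts here')
```

Both calls leave only the last cell. Note that the marker has to match the whole cell content exactly, so a marker cell should contain nothing but the marker text.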

Design

The code is written around these design ideas:

  • Be able to ignore certain parts of the notebook (e.g. introductory comments in the first few cells).
  • Convert headings to section / subsection etc. I generally use "Heading 2" for section, "Heading 3" for subsections etc. In the notebook, just press "Ctrl+m 2" to format a cell as "Heading 2" or select with the mouse from the drop-down menu.
  • Copy text in "markdown" and "raw text" cells. To simplify, I just write real LaTeX code in those cells. All equations will be rendered correctly in the notebook file for me and my co-authors to see. When I want to highlight something I type LaTeX "\emph{}" or "\textbf{}", not the markdown equivalents. That does not look as nice in the notebook, but makes life so much easier. Also, markdown does not recognize "\cite", "\ref" and "\label". Again, it does not look as nice in markdown, but (1) I only need to know LaTeX and (2) it works flawlessly when converted.
  • No figure conversion. Instead, in the notebook itself I issue: fig.savefig('/path/to/my/article/XXX.eps') because ApJ requires me to submit eps figures as separate files anyway.
  • Just type figure captions into markdown cells.
  • No conversion of code cells. Who wants code in an ApJ paper?
  • Occasionally, I want to have the output of a computation (e.g. a table written with astropy in LaTeX format) in the article. Keep it simple. Output of all code cells that have a certain comment string (I use "# output->LaTeX") is copied verbatim to the LaTeX file.
  • Work with the Python standard library only. No external dependencies.
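
To make the marker idea from the list above concrete, here is a minimal sketch of how the output of a marked code cell could be picked up. The function marked_output and the toy cell are assumptions made for this example (the cell only imitates the 'input', 'outputs' and 'text' fields that the converter reads, and table_as_latex is a made-up variable name inside the cell's code); the real implementation is the MarkedCodeOutputConverter class in the code section.

```python
MARKER = '# output->LaTeX'


def marked_output(cell, marker=MARKER):
    """Return the text output of a code cell if one line equals ``marker``.

    Hypothetical helper for illustration only.
    """
    lines = [line.rstrip('\n') for line in cell['input']]
    if marker not in lines:
        return []
    text = []
    for out in cell['outputs']:
        # Only outputs that carry plain text are copied.
        text.extend(out.get('text', []))
    return text


# Toy code cell: the first line of the code is the marker comment.
cell = {
    'cell_type': 'code',
    'input': ['# output->LaTeX\n', 'print(table_as_latex)'],
    'outputs': [{'output_type': 'stream',
                 'text': ['\\begin{tabular}{cc}\n',
                          'a & b \\\\\n',
                          '\\end{tabular}\n']}],
}
# The same cell without the marker line.
unmarked = dict(cell, input=['print(table_as_latex)'])

latex_lines = marked_output(cell)
ignored = marked_output(unmarked)
```

Here latex_lines holds the three lines of the table, ready to be written to the LaTeX file, while ignored is empty because the second cell lacks the marker.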


To implement this I wrote a converter for each cell type. LiteralSourceConverter just takes the literal string value and puts it into the LaTeX file (it also adds a line break at the end of the cell); use it for markdown and raw text cells. MarkedCodeOutputConverter checks if a code cell has a specific string in it and, if so, copies the output of this cell. LatexHeadingConverter looks at the level of the heading and turns that into the corresponding LaTeX command (it also adds a label like "\label{sect:title}").
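
Since each converter is just a callable that takes a cell and returns a list of strings, writing your own should be straightforward. The sketch below shows one possible custom converter; VerbatimCodeConverter is a hypothetical class invented for this example (it is not part of ipynb2article.py), and the toy cell only imitates the 'input' field of a code cell.

```python
class VerbatimCodeConverter(object):
    """Hypothetical converter: wrap the code of a cell in a verbatim block."""

    def __call__(self, cell):
        code = list(cell['input'])  # copy, so the cell itself is not modified
        if not code:
            # Nothing to show for an empty code cell.
            return []
        if not code[-1].endswith('\n'):
            code[-1] += '\n'
        return ['\\begin{verbatim}\n'] + code + ['\\end{verbatim}\n']


# Toy code cell with two lines of code and no output.
cell = {'cell_type': 'code', 'input': ['x = 1\n', 'y = x + 1'], 'outputs': []}
lines = VerbatimCodeConverter()(cell)
```

Such an object could then be registered for a cell type with converter.cellconverters['code'] = VerbatimCodeConverter().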

Code

# Convert IPython notebook to LaTeX for ApJ or A&A
import json
import re

import sys
import getopt

def isstartmarker(cell, start):
 if 'source' in cell.keys():
  return cell['source'] == [start]
 elif 'input' in cell.keys():
  return cell['input'] == [start]
 else:
  raise ValueError('Type of cell not recognized.')

class IgnoreConverter(object):
 '''Use this converter for cell types that should be ignored'''
 def __call__(self, cell):
  return []

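class LiteralSourceConverter(object):
 '''This converter returns the literal ``source`` entry of a cell.'''
 def __call__(self, cell):
  # Copy the list so the notebook data is not modified in place
  text = list(cell['source'])
  # An empty cell has no lines; only add the line break if there is text
  if len(text) > 0:
   text[-1] += '\n'
  return text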

class MarkedCodeOutputConverter(object):
 '''Add output of code cells that have a specific string in the code cell'''
 def __init__(self, marker):
  '''Add output of code cells that have a specific string in the code cell

  Parameters
  ----------
  marker : string
   Convert the output of a code cell if and only if one line in the
   code matches ``marker``.
   I often use ``marker='# output->LaTeX'`` to mark cells whose
   output I want.
  '''
  self.marker = marker
 def __call__(self, cell):
  text = []
  if (self.marker in cell['input']) or (self.marker+'\n' in cell['input']):
   for out in cell['outputs']:
    if 'text' in out:
     text.extend(out['text'])
     
  if len(text) > 0:
   text[-1] +='\n'
  return text

class LatexHeadingConverter(object):
 '''Convert headings in notebook to appropriate level in LaTeX'''
 def __init__(self, latexlevels=['chapter','section','subsection', 'subsubsection', 'paragraph', 'subparagraph']):
  '''Convert headings in notebook to appropriate level in LaTeX

  Parameters
  ----------
  latexlevels : list of 6 strings
   Latex equivalents for 'Heading 1', 'Heading 2' etc.
  '''
  self.latexlevels = latexlevels
 def __call__(self, cell):
  # Just to be careful for multi-line headings
  title = ''.join(cell['source']) 
  line1 = '\\{0}{{{1}}}\n'.format(self.latexlevels[cell['level']-1],
          title)
  cleantitle = re.sub(r'\W+', '', title)
  line2 = '\\label{{sect:{0}}}\n'.format(cleantitle.lower())
  return ['\n','\n', line1, line2, '\n']

class NotebookConverter(object):
 cellconverters = {
      'code' : MarkedCodeOutputConverter('# output->LaTeX'),
      'heading': LatexHeadingConverter(),
      'markdown': LiteralSourceConverter(),
      'raw': LiteralSourceConverter()
      }
 def convert(self, infile, outfile, start = 0, file_before=None, file_after=None):
  '''Convert IPython notebook to LaTeX file

  Parameters
  ----------
  infile : string
   filename of IPython notebook
  outfile : string
   filename of Latex file to be written
  start : int or string
   If this is a number, skip that many cells starting from the top;
   if it is a string, skip cells until a cell has *exactly* the 
   content that ``start`` has.
  file_before : string
  file_after: string
   String with filename. These files are copied above and below
   the content from the ipynb file. Use this e.g. for templates
   that contain the LaTeX header info that does not appear in the 
   notebook.
  '''
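  with open(infile, 'r') as f:
   print('Parsing ' + infile)
   ipynb = json.load(f)

  # Notebook format 3 layout; format 4 keeps 'cells' at the top level
  cells = ipynb['worksheets'][0]['cells']
  try:
   string_types = basestring  # Python 2
  except NameError:
   string_types = str  # Python 3
  if isinstance(start, string_types):
   while len(cells) > 0 and not isstartmarker(cells[0], start):
    cells.pop(0)
   if len(cells) == 0:
    raise ValueError('No cell matches the start marker.')
   cells.pop(0) # pop the marker cell
  else:
   for i in range(start):
    cells.pop(0)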
  
  with open(outfile, 'w') as out:
   print('Writing ' + outfile)
   if file_before is not None:
    with open(file_before, 'r') as f:
     for line in f:
      out.write(line)
   for cell in cells:
    lines = self.cellconverters[cell['cell_type']](cell)
    for line in lines:
     out.write(line)
   if file_after is not None:
    with open(file_after, 'r') as f:
     for line in f:
      out.write(line)
   
  
if __name__ == '__main__':
 converter = NotebookConverter()
 try:
  opts, args = getopt.getopt(sys.argv[1:], "h", ["help"])
 except getopt.GetoptError:
  # Unknown option: show the usage information and exit with an error
  print(converter.convert.__doc__)
  sys.exit(2)

 for opt, arg in opts:
  if opt in ("-h", "--help"):
   print(converter.convert.__doc__)
   sys.exit()

 print(args)
 converter.convert(*args)


