Category Archives: University

Imitating PBRT-style literate programming in LaTeX

Today, I want to release another bit of code from my master's thesis. This time, however, it won't be C++ code; instead, I'm going to release some LaTeX code that I used to display source code fragments.

Specifically, the results look like this:


You can download the accompanying example PDF here.

This mirrors the style of PBRT, which uses literate programming to develop and explain an advanced raytracer. Unlike PBRT, however, it doesn't require you to write source code that adheres to literate programming principles. I used it to take parts of the source code of my master's thesis and explain them.

The following LaTeX code is used to create the fragment above:

You can either inline code fragments using a special listing environment, or you can reference code fragments in an external file.

Cross-referencing works as well, and forward page references are created just like in PBRT. However, I have not figured out how to create backward references.

You can find the LaTeX code on GitHub. I have not wrapped it into a LaTeX package because I don't consider it mature enough, but maybe someone finds it useful, and it eventually ends up in a package (that would be neat :)).

To close, I want to say that the book More Math into LaTeX has been very useful while creating these LaTeX macros. It is the first book I've come across that covers creating custom macros and commands understandably and in depth. It feels good to finally understand a bit more about how you can customize LaTeX to fit your needs :)


Old Projects and WML

First, I have created some new pages about old university projects. Among them are a condensed page about light propagation volumes (which also prompted me to update the project files to Visual Studio 2012), a page about my bachelor's thesis in mathematics (Discrete Elastic Rods), and a page about my master's thesis in computer science (Assisted Object Placement). I have not written about the latter two subjects before. Maybe I'll talk some more about them later and write full post-mortems.

This is the first of a number of posts related to my master's thesis, or rather code drops from its code base. I wrote about 60k LoC in the 6 months of my thesis, and there are a few bits that might be useful in the future.

The first thing I want to talk about is a very simple file format I came up with. Devising new text file formats is not something I have been keen on lately, especially since so many already exist. However, I found none that really fit my requirements:

  • minimal clutter (preferably indentation-based),
  • support for raw text inclusion, and
  • good C++ support.

JSON has too much clutter and doesn't support raw text. YAML, on the other hand, sounds like the perfect choice, even though it's not easy to find a good library for it. However, when it comes to raw text, you run into the issue that tab characters are never allowed as indentation. Moreover, I was not happy with the API choices and some bugs in the libraries I tried.

So I decided to develop a very simple text-based data storage format:

WML - Whitespace Markup Language

I'll just start with an example, namely my readme. You can find the full readme here.

As you can see, it is a whitespace-based format inspired by Python. Like in Python, indentation is used to convey structure. A WML file represents a map structure: every node has a name and possibly multiple children.
A node definition in a WML file takes one of three forms:

  • the node name followed by the names of multiple children (which won't have any children themselves),
  • the node name followed by one colon, signifying that a nested definition follows (similar to JSON), or
  • the node name followed by two colons, signifying a raw text block.

Two different kinds of strings are supported: single-quoted raw strings that do not interpret escape sequences, and double-quoted C strings that support escape sequences.
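To make these rules concrete, here is a small hypothetical WML file (the node names are invented for illustration; see the linked readme for the real examples):

```
settings:
	resolution 1024 768
	title "My test scene\n"
	author 'raw \n stays raw'
vertexShader::
	void main() {
		gl_Position = vec4( 0.0 );
	}
```

Here settings: opens a nested map, resolution has the two childless children 1024 and 768, the two string styles show escaped vs. raw text, and vertexShader:: captures everything indented below it as a raw text block.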

All in all, the grammar is very simple:

I've used this format for custom shaders as well as for my settings files and the declaration files of my test scenes:

I've uploaded the current code for WML to GitHub, and you can find the code here. The API supports an overloaded index operator to access children of a node and contains both a parser and an emitter for WML.


First, the current API isn't brilliant. It would be nice to separate the data model from the parser and emitter using templates and type traits to improve the abstraction. I might investigate different API styles in the future and see which one works best for some simple cases.

Second, it would be possible to reduce clutter even more and remove the need for single colons to denote nested maps.
An even simpler format could look like this:

This would yield the following JSON-equivalent:

This still separates raw text from normal data. A node that contains raw text can never contain other children this way. However, I cannot think of a good way to accomplish that without introducing a special character to end a raw text block.

That's it for now :)

Presentation (seminar) about convex analysis

Last year (yes, I'm really slow with writing things up "lately") I did a presentation about convex analysis as part of a seminar about inverse problems in imaging.

I want to write about it today.

The seminar was based on the book 'Handbook of Mathematical Methods in Imaging' (Springer Reference). It's very expensive and, in my opinion, pretty useless for learning any details. At least the chapter about duality and convex programming was so dense that you could not learn anything from it without consulting other books.

The presentation was supposed to give an overview of duality and convex analysis and serve as an introduction. Summarizing 42 very concise pages in 45 minutes is impossible, so I had to pick a few topics that convey the important concepts and go into detail there.

The book itself does not contain many proofs, which makes it hard to follow. As a mathematician, I like proofs. They usually help clear things up, and when they're good, they give valuable insight into the methods of a theory.

But proofs suck when you see them on PowerPoint slides. They also suck when you're the presenter.

So I gave a blackboard presentation, using a few slides only to show some pictures and sketches that I could not draw well enough on a blackboard.

To prepare, I used OneNote and my Bamboo tablet. I scribbled everything down over a weekend in one OneNote notebook and later extracted some sketches into another notebook, which I used instead of a PowerPoint presentation. I exported the original notes as a PDF and used them as a handout after the presentation.

The presentation went well. I had fun writing everything down on the blackboard and developing the subset of the theory I wanted to present, and the audience was content with it, too: it moved at just the right speed, like the lectures everybody was used to.

In retrospect, using OneNote and a tablet to create a blackboard presentation was the best decision. I don't want to think of the time I would have lost had I created everything with LaTeX or PowerPoint.

Long story short: here are my notes :)


Molecular Dynamics and CUDA---my interdisciplinary project

You have to do an interdisciplinary project (IDP) for your Master's degree at the Technische Universität München. I decided to do mine at the Chair for Scientific Computing.

My topic was the Efficient Implementation of Multi-Center Potential Models in Molecular Dynamics with CUDA. For this, I have parallelized the molecule interaction calculations in the molecular dynamics simulator MarDyn with CUDA, optimized the code, and added support for more advanced potential models on the GPU.

What is molecular dynamics?

Molecular dynamics deals with the simulation of atoms and molecules. It estimates the interactions between a large number of molecules and atoms using force fields, which are modeled with quantum mechanics equations.

Such simulations are important in many areas of biology, chemistry, and materials science. For example, you can examine the folding of proteins or the nanoscopic behavior of materials. The simulations are sped up using sophisticated approximations and algorithms.

Molecules only interact strongly with molecules that are nearby. One of the first approximations is to take only these strong interactions into account and ignore the weaker long-distance ones. This spatial locality leads to the linked-cell algorithm: it divides the simulation space into a grid of cells, and only interactions between molecules inside the same cell or in nearby cells are calculated.
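As an illustration (my own sketch, not MarDyn's actual code), mapping a molecule's position to its cell in such a uniform grid could look like this:

```cpp
#include <cassert>
#include <cmath>

// Map a position to its cell index in a uniform grid of cubic cells.
// Sketch only: assumes the domain starts at the origin; a real linked-cell
// implementation also handles halo cells and periodic boundaries.
struct CellGrid {
    double cellSize;  // should be >= the interaction cutoff radius
    int nx, ny, nz;   // number of cells per dimension

    int cellIndex(double x, double y, double z) const {
        int cx = static_cast<int>(std::floor(x / cellSize));
        int cy = static_cast<int>(std::floor(y / cellSize));
        int cz = static_cast<int>(std::floor(z / cellSize));
        return (cz * ny + cy) * nx + cx;  // linearize (cx, cy, cz)
    }
};
```

The force calculation then only has to loop over a cell and its 26 neighbors instead of over all molecules.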

Molecules are composed of atoms, which interact through different physical principles. This is approximated by giving a molecule different sites that use different potential models. The most common and simplest potential model is the Lennard-Jones potential.
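The Lennard-Jones 12-6 potential itself is easy to write down; here is a small sketch of it in C++ (function and parameter names are my own):

```cpp
#include <cassert>
#include <cmath>

// Lennard-Jones 12-6 potential between two sites at distance r:
//   V(r) = 4 * epsilon * ((sigma/r)^12 - (sigma/r)^6)
// epsilon is the depth of the potential well; sigma is the distance at
// which the potential crosses zero.
double lennardJones(double r, double epsilon, double sigma) {
    double s6 = std::pow(sigma / r, 6.0);
    return 4.0 * epsilon * (s6 * s6 - s6);
}
```

The potential is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) * sigma, which makes for an easy sanity check.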

What is MarDyn?

MarDyn is a molecular dynamics simulator that has been developed at the Technische Universität München. See Martin Buchholz's thesis for more information.

Its code is rather ugly and features some crazy design decisions. I have been told that the people who initially developed it were learning C++ at the same time, and it shows :(

Previous work

When I started, there was a 'working' OpenCL implementation of Lennard-Jones potentials for single-site molecules on the GPU, the result of somebody else's IDP. However, it was not very useful: the code was crammed into one function and one-letter variable names were used throughout.

The main aims of the project

The idea was to port the previous OpenCL implementation to CUDA and continue from there to add support for multiple sites with different potential models.

Chronology of my work

Porting the code to CUDA was a straightforward API change. However, I found the original code impossible to read, maintain, and optimize; the logic behind it was neither clear nor well explained.

Consequently, my supervisor and I decided that I would rewrite everything and optimize it from scratch. This took longer than expected, and while the resulting code made the parallelism and the logic behind it clear, code complexity became an issue. The code consisted of three helper functions and two kernel functions in two files, and when I integrated support for multiple sites and potential models, it became clear that the design would not endure further feature additions and lacked the flexibility for quick changes.

Treating the old implementation as a prototype, and knowing about many possible traps and fallacies, I set out to rewrite everything again. This time the focus was on modularity and separation of concerns instead of performance: code architecture was the most important thing. I scaffolded the new version around the old code, which was working correctly and already optimized, and embedded it into the new design step by step. Afterwards I optimized the code.

What I've done


  • worked with CUDA 1.x and 2.x,
  • used both the driver and the runtime API,
  • implemented a template-based, very modular code design,
  • tried lots of optimizations, and
  • learnt how to read PTX to find workarounds for compiler bugs.

Sadly, many optimizations didn't improve performance noticeably. I think the monolithic, all-in-one approach I used is the main issue.

My code uses the driver API because it makes it easy to dump all the CUDA calls and replay them later for debugging purposes.

If I could iterate over the code one more time, I'd try using many small kernels instead of just two rather large ones. Currently, the kernels sequentially calculate all the potential models needed for each site of a molecule pair. This makes it difficult to measure performance improvements and isolate bottlenecks, and I suspect it's slower than it has to be.

I'd also create a sandbox application to test and develop the kernels instead of using MarDyn as a starting point, because it is unnecessarily huge and not really needed to verify that the CUDA code works (or doesn't).

Final paper & code

You can find the final paper that contains a description of my code and the performance results here, and my code here (I cannot include MarDyn's code because its source is closed, but at least I can share my code).


I just want to share two more links:

  • HOOMD-blue is a molecular dynamics framework that also uses CUDA, and its code design looks like what I probably should have done :)
  • VMD is a molecular visualization application that looks promising, too


A Long Journey: Acceler8 and TBB

A month of Intel's Acceler8 competition has passed, and it has finally come to an end. It's been a long way from implementing the first sequential algorithm to having a fully-fledged parallel version.

I had never worked with Intel's Threading Building Blocks library before, and this was a nice opportunity to examine it, since it offers a better abstraction than OpenMP or pthreads.

The documentation is very good, and you quickly learn how to work with the library. Remarkably, I didn't have to use a low-level synchronization construct even once during development, and everything worked fine without any race conditions or similar issues. The parallel_* functions (e.g. parallel_for, parallel_reduce, and parallel_scan), together with icc's C++0x support (lambda functions), allowed for very concise code and little programming overhead.

The implementation builds on Kadane's algorithm for the two-dimensional case using prefix sums. One implementation that gets across the basic idea can be found here. Mine is similar, and I simply parallelized as much as possible.
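For reference, the basic idea can be written down sequentially like this (my own sketch, not the contest code): for every pair of rows, collapse the rows in between into a 1D array of column sums and run Kadane's 1D scan on it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Maximum-sum submatrix via Kadane's algorithm on running column sums.
// O(rows^2 * cols): for each (top, bottom) row pair, the columns are
// reduced to a 1D problem solved by the classic 1D Kadane scan.
int maxSubmatrixSum(const std::vector<std::vector<int>>& m) {
    int rows = m.size(), cols = m[0].size();
    int best = m[0][0];
    for (int top = 0; top < rows; ++top) {
        std::vector<int> colSum(cols, 0);
        for (int bottom = top; bottom < rows; ++bottom) {
            for (int c = 0; c < cols; ++c) colSum[c] += m[bottom][c];
            // 1D Kadane scan over colSum
            int cur = colSum[0];
            best = std::max(best, cur);
            for (int c = 1; c < cols; ++c) {
                cur = std::max(colSum[c], cur + colSum[c]);
                best = std::max(best, cur);
            }
        }
    }
    return best;
}
```

The parallelization below attacks exactly these loops: the (top, bottom) pairs form the 2D range, and the inner scan is the 1D part.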

As you can see, the two outer loops iterate over a two-dimensional range that is essentially the upper-right triangle of the whole possible domain. For this, I implemented a custom range that allows for better load balancing. A range in TBB defines an iteration range of any kind and supports a split operation that the task scheduler uses internally to distribute the range dynamically across multiple threads as it sees fit.
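A dependency-free sketch of such a triangular range (not my actual contest code; the real version models TBB's Range concept, whose splitting constructor takes a tbb::split tag):

```cpp
#include <cassert>

// Tag type standing in for tbb::split in this dependency-free sketch.
struct split_tag {};

// Splittable range over the row pairs (top, bottom) with top <= bottom < n,
// i.e. the upper triangle of an n x n index square. Splitting at the middle
// row would give badly unbalanced halves, because early rows contain far
// more pairs than late ones, so the cut is chosen to halve the pair count.
struct TriangleRange {
    int beginRow, endRow, n;  // covers rows [beginRow, endRow)

    TriangleRange(int b, int e, int n) : beginRow(b), endRow(e), n(n) {}

    // Number of (top, bottom) pairs covered: row r contributes n - r pairs.
    long long size() const {
        long long pairs = 0;
        for (int r = beginRow; r < endRow; ++r) pairs += n - r;
        return pairs;
    }

    bool is_divisible() const { return endRow - beginRow > 1; }

    // Splitting constructor (TBB calls this with tbb::split): steals the
    // back part of `other`, cutting where both parts cover roughly the
    // same number of pairs.
    TriangleRange(TriangleRange& other, split_tag)
        : beginRow(other.beginRow), endRow(other.endRow), n(other.n) {
        long long half = other.size() / 2, acc = 0;
        int cut = other.beginRow;
        while (acc < half) acc += n - cut++;
        other.endRow = cut;  // other keeps the front part [beginRow, cut)
        beginRow = cut;      // this range takes the back part [cut, endRow)
    }
};
```

With a naive midpoint split of a 100-row triangle, one half would get roughly three times the work of the other; the area-based cut keeps both halves within a few percent of each other.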

Last but not least, I came up with a way to parallelize the 1D part of Kadane's algorithm by splitting the column range into linear subranges and merging the subsolutions into one solution, i.e. a classical divide-and-conquer approach.

Because it's the most abstract yet most interesting part of the implementation, I'm going to go into more detail here. :-)

How can you find the maximum subarray of a 1D array, if you know the maximum subarray of the two "halves" (they don't have to be split evenly)?
Well, you don't, you need more information.

We calculate the following information for each chunk:

  • maximum subarray that starts at the beginning of the chunk
  • maximum subarray that ends at the end of the chunk
  • total sum
  • maximum subarray

It's easy to figure out how to merge these values for two neighboring chunks into the values of the combined chunk.
The maximum subarray that starts at the beginning of the merged chunk is either that value for the left chunk, or the left chunk's total sum plus that value of the right chunk.
You can figure out yourself how it works for the maximum subarray that ends at the end of the merged chunk :-)
The merged maximum subarray is the biggest of three candidates: the left chunk's maximum subarray, the right chunk's maximum subarray, and the left chunk's end-touching value plus the right chunk's start-touching value (i.e. a subarray crossing the boundary).

Using this idea you can use a simple parallel_reduce to parallelize Kadane's algorithm.
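A sequential sketch of that merge step (names are my own; the contest code used tbb::parallel_reduce, while this stand-in folds left to right to stay self-contained):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Per-chunk summary for the divide-and-conquer version of Kadane's
// algorithm, exactly the four values described above.
struct Chunk {
    long long bestPrefix;  // max sum of a subarray starting at the left edge
    long long bestSuffix;  // max sum of a subarray ending at the right edge
    long long total;       // sum of all elements
    long long best;        // max subarray sum anywhere in the chunk

    static Chunk single(long long x) { return {x, x, x, x}; }

    // Combine two *adjacent* chunks (left, then right).
    static Chunk merge(const Chunk& l, const Chunk& r) {
        Chunk m;
        m.total      = l.total + r.total;
        m.bestPrefix = std::max(l.bestPrefix, l.total + r.bestPrefix);
        m.bestSuffix = std::max(r.bestSuffix, r.total + l.bestSuffix);
        // best is entirely in one half, or it crosses the boundary.
        m.best = std::max({l.best, r.best, l.bestSuffix + r.bestPrefix});
        return m;
    }
};

// Sequential stand-in for the parallel_reduce: fold a whole array.
long long maxSubarray(const std::vector<long long>& v) {
    Chunk acc = Chunk::single(v[0]);
    for (size_t i = 1; i < v.size(); ++i)
        acc = Chunk::merge(acc, Chunk::single(v[i]));
    return acc.best;
}
```

The crucial property is that merge is associative, which is exactly what parallel_reduce needs to combine chunks in any grouping the scheduler picks.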

Of course there is some overhead, but, as always, for the right problem sizes this will be faster than the sequential algorithm.

Two more take-aways:

  • Always try to use language features like templates or lambda expressions to reduce duplicate code or make the code more concise.
  • Write unit tests. I have used googletest which is a very small but very capable library, and it has spared me a lot of debugging trouble.

Cheers :-)

Reading Nonfiction Books Quickly

I like to read books, in particular nonfiction books, and especially nonfiction books that I do not agree with.

Speed Reading

For this, I've decided to look into speed reading. It is an umbrella term for methods that increase your reading speed while keeping comprehension at a satisfactory level. A typical reading speed is 200-300 wpm (words per minute); speed reading promises speeds above 600 wpm.

I've searched around for a bit and read lots of blog posts and articles; here are my favorites: one is a good introduction to the main techniques of speed reading, includes some exercises, and is a short read; another describes the author's personal experience with fourhourworkweek's article.

There is also speed-reading software available. The idea is that your eye movement is the main hindrance to reading really fast, so this software displays the text in groups of several words at the center of the screen. This way you can read each group at once without having to move your eyes at all.
As crazy as it sounds, it works to a great degree. There is an online reader that works this way - give it a try! You can make it display the introduction text at 600 or 800 wpm and see how much you understand. You'll be surprised!
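The core mechanics of such a reader are easy to sketch (my own illustration, not any particular product's code): split the text into fixed-size word groups and show each group for a duration derived from the target wpm.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text into groups of wordsPerGroup words, as an RSVP-style reader
// would display them one after another at the center of the screen.
std::vector<std::string> wordGroups(const std::string& text, int wordsPerGroup) {
    std::istringstream in(text);
    std::vector<std::string> groups;
    std::string word, group;
    int count = 0;
    while (in >> word) {
        group += (count ? " " : "") + word;
        if (++count == wordsPerGroup) {
            groups.push_back(group);
            group.clear();
            count = 0;
        }
    }
    if (!group.empty()) groups.push_back(group);  // trailing partial group
    return groups;
}

// Display time per group in milliseconds at the given words per minute.
int msPerGroup(int wordsPerGroup, int wpm) {
    return wordsPerGroup * 60000 / wpm;
}
```

At 600 wpm with groups of three words, each group stays on screen for 300 ms, which is why the technique feels so different from normal line-by-line reading.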

'The Speed Reading Workbook' is what I've been using to practise. It's decently written, has plenty of exercises, and includes timing and evaluation sheets, which are really useful for measuring your progress.

A good summary of many concepts can be found in the PowerPoint Presentation 'Double your Reading Speed in 25 Minutes'.

About Reading in General is a good read and has many good suggestions about how to increase your reading productivity.

Own Experience

I've been exercising with the workbook now and then over the past weeks and I've become faster, but this is still an ongoing process for me, so I'll blog about it later (or rather update this post).

For what it's worth: my reading speed was 300 wpm and now it is around 550 wpm :-)

Presentation Advice

So... the summer term has passed and I'm wrapping things up. One of the final things I want to get out of my notebooks and into this blog are some notes I took for feedback during the Hauptseminar in my maths studies.

Everybody had to give a 60-minute presentation, and the audience was asked to take notes about the presentation style to give feedback later (in addition to paying attention to the presentations themselves).

There were 4 blackboards in the room and a projector.

Here are my notes:

  • 4 pages are the most you can do in 60 minutes (ie write down on the blackboard)
  • really think about your audience and what you want them to take away and focus on that in your presentation
  • practice your presentation a few times before your big day
  • if you use formal mathematical symbolism to express an idea, make sure all symbols and variables are properly defined/explained (for it to make sense!)
  • never tell people to "only ask questions at the end because it would disturb your presentation"!
  • focus on important concepts and ideas
  • motivate definitions and formulas - prefer an algorithmic view on things
  • try to use images and graphs to visualize ideas
  • if you can't avoid complex definitions and deductions, use a handout
  • make sure the handout doesn't contain superfluous information which will only confuse readers
  • don't just copy your handout onto the blackboard, if you hand it out at the beginning of the presentation (it's okay, of course, if you do it afterwards)
  • never clean all blackboards at the same time
  • avoid long breaks in your presentation
  • keep on talking all the time, even while cleaning the blackboard to keep people engaged
  • if the blackboards are in front of each other, always start writing on the one on the back
  • if you can use PowerPoint, try to use it for complex/boring formulas or for visualizations, which are impossible to draw by hand
  • on the other hand, make sure you don't put things into PowerPoint that would be better written out slowly on the blackboard
  • always listen to criticism politely, but don't take notes of it, while you're in front of your audience
  • don't degrade yourself by apologizing exaggeratedly for mishaps
  • put the sources of referenced material onto the slide the material is used in
  • don't waste time on complicated 'examples' that are too detailed to be of any use for the audience
  • don't let technicalities obscure examples - stick to the important parts of them
  • it's nice if the handout doesn't exactly match the presentation
  • always conclude your presentation with a summary of the main points of your talk
  • try to build intuition for your topic in your audience instead of explaining difficult proofs no one will remember or care for later anyway
  • make sure you have chosen the right priorities for your presentation and trim it down as much as possible
This was it :-)

Books on Numerical Analysis

Because I was writing an exam on numerical analysis, I had the pleasure of looking through lots and lots of books on the subject. Here is a list of my favorites so far:

  • Afternotes on Numerical Analysis and Afternotes Goes to Graduate School by G. W. Stewart
    Both books are very readable and introduce many concepts in a light way that builds an intuitive understanding of them.
    It's possible to read them cover to cover in a few days, and you can learn a lot very quickly.
    They are also quite amusing:

    "In the nineteenth century the Norwegian mathematician Niels Abel showed that no polynomial of degree five could be solved by a finite number of additions, multiplications, divisions, and root extractions. If we had a finite algorithm for finding eigenvalues of general matrices, we could apply it to companion matrices and make a fool out of Abel. Abel was no fool."

  • Numerical Linear Algebra by Trefethen
    Another very good book, which I haven't used much personally, though.
  • Numerical Methods by Kelley
    contains an okay introduction to CG and GMRES.
  • Numerische Mathematik I and Numerische Mathematik II
    are very good books, too. The first volume contains a good introduction to everything but ODEs and PDEs, and the second volume offers a very thorough overview of numerical algorithms for differential equations, including a nice introduction to the general theory of their solvability.
  • Numerical Analysis by Burden and Faires
    a very nice book that contains lots of visualizations and covers many topics
More to follow soon :-)

ANTLR Stupidity (Warning 209)

I've been playing around with a Java grammar for ANTLR that was supposed to work straight away, but it did not. It produced very strange warnings and errors that made it look like ANTLR only supports lexers with a lookahead of 1 character:

while STAR matches only '*' and STAREQ matches only '*='. This is a huge WTF, especially if you have worked with ANTLR before and never had issues with this. It also contradicts all the documentation you can find about ANTLR and its lexer rules.

I spent a considerable amount of time on Google trying to find a fix. First, I found lots of posts on the ANTLR mailing list [antlr-interest] from people who had the same issue, with no replies (really helpful, eh?). Some had run into it after replacing character ranges with Unicode ranges (or rather a huge list of Unicode characters), which probably caused the problem in my grammar, too. Others found that ANTLR suddenly behaved as if it only had a one-character lookahead, but only when the grammar contained more than 300 lexer rules.

After searching for a long time and almost giving up on the mini-project I wanted to use ANTLR for, I found this post (which matches my problem more or less, but with additional insight),
and someone even replied - someone being the maintainer of ANTLR's C runtime:

If you are sure that the messages are not correct and the lexer rules
are not ambiguous, then you probably need to increase the conversion
timeout:

-Xconversiontimeout 30000

if that does not work, then there is a conflict in your rules.


And that turned out to be the right advice and the remedy for my problems, and probably for the problems of lots of other people as well.
However, no warning or error message I encountered mentioned that ANTLR's internal processing had actually timed out, and there was no ambiguity in the grammar itself...
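For reference, with the ANTLR 3 command-line tool the flag is passed like this (a sketch; the jar name and grammar file are placeholders):

```
# Raise ANTLR 3's DFA conversion timeout to 30 seconds.
# antlr-3.x-complete.jar and MyGrammar.g are placeholders.
java -cp antlr-3.x-complete.jar org.antlr.Tool -Xconversiontimeout 30000 MyGrammar.g
```

In ANTLRWorks, the equivalent option can be set in the generation settings.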

This goes to show that any good tool like ANTLR can quickly degrade into a piece of crap and a major source of annoyance if its error and warning messages aren't clear and helpful.

On further investigation, you can trigger warnings that the conversion has timed out:

but not consistently. I guess this is a bug - either in ANTLR or in ANTLRWorks... 😐