## A Long Journey: Acceler8 and TBB

A month has passed with Intel's Acceler8 competition and it has finally come to an end. It's been a long way from implementing the first sequential algorithm to having a fully-fledged parallel version.

I have never worked with Intel's Threading Building Block library before and it was a nice opportunity to examine it, since it offered a better abstraction than OpenMP or pthreads.

The documentation is very good and you quickly learn how to work with the library. One of the caveats is that I didn't have to use a low-level synchronization construct once in the development and everything worked fine without any race conditions or similar . The parallel_* functions (eg parallel_for, parallel_reduce, and parallel_scan) together with icc's C++0x support (lambda functions) allowed for very concise code and little programming overhead.

The implementation builds on Kadane's algorithm for the two dimensional case using prefix sums. One implementation that gets across the basic idea can be found here. Mine is similar and I simply parallelized as much as possible.

As you can see the outer two loops iterate over a two-dimensional range that is pretty much an upper right triangle of the whole possible domain. For this I've implemented a custom range that allows for better load balancing. A range in TBB defines an iteration range of any kind and supports a split operation that is used internally by the task scheduler to distribute the range dynamically on multiple threads as it sees fit.

Last but not least I came up with a way to parallelize the 1D part of Kadane's algorithm that is being used by splitting the column range into linear subranges and merging the subsolutions into one solution, ie a classical divide and conquer approach.

Because it's the most abstract yet interesting part of our implementation, I'm going to go into more detail here. :-)

How can you find the maximum subarray of a 1D array, if you know the maximum subarray of the two "halves" (they don't have to be split evenly)? Well, you don't, you need more information.

We calculate the following information for each chunk:

• maximum subarray that starts at the beginning of the chunk
• maximum subarray that ends at the end of the chunk
• total sum
• maximum subarray

It's easy to figure out how to merge these values for two neighboring chunks into the values of the merged chunk. The merged maximum subarray that starts at the beginning of the merged chunk is either said value for the left chunk or the total sum of the left chunk + that value of the right chunk. You can figure out how it works for the maximum subarray that ends at the end of the merged chunk :-) The maximum subarray is just the biggest of all merged values or the left maximum subarray or right one.

Using this idea you can use a simple parallel_reduce to parallelize Kadane's algorithm.

Of course, there is some overhead but for the right problem sizes this will be faster than the sequential algorithm as always.

Two more take-aways:

• Always try to use language features like templates or lambda expressions to reduce duplicate code or make the code more concise.
• Write unit tests. I have used googletest which is a very small but very capable library, and it has spared me a lot of debugging trouble.

Cheers :-)

## Reading Nonfiction Books Quickly

I like to read books and I also like to read nonfiction books and in particular nonfiction books that I do not agree with.

For this I've decided to look into speed reading. It is an umbrella term for methods that increase your reading speed while keeping a satisfactory comprehension level. A usual reading speed is 200-300 wpm (words per minute). Speed reading advertises increasing your speed to > 600 wpm.

I've searched around for a bit and read lots of blog posts and articles and here are my favorites:

http://www.fourhourworkweek.com/blog/2009/07/30/speed-reading-and-accelerated-learning/ is a good introduction to the main techniques of speed-reading. It also includes some exercises and is a short read.

http://noahfleming.com/blog/speed-read-like-rain-man-75-increased-reading-speed-in-20-minutes describes the author's personal experience with fourhourworkweek's article.

There is also speed-reading software available. The idea is your eye movement is the main hindrance to reading really fast, so this software displays the text for you to read in groups of several words in the center of the screen. This way you can read everything at once without having to move your eyes at all. As crazy as it sounds, it works to a great degree.

http://www.spreeder.com/ is  an online reader that works this way. Give it a try! You can make it display the introduction text at 600 wpm or 800 wpm and see how you'll understand - you'll be surprised!

'The Speed Reading Workbook' is what I've been using to practise it. It's okay written and has plenty of exercises and includes timing and evaluation sheets, which are really useful for measuring your progress.

A good summary of many concepts can be found in the PowerPoint Presentation 'Double your Reading Speed in 25 Minutes'.

##### Own Expierence

I've been exercising with the workbook now and then for the past weeks and I've become faster but this is still an ongoing process for me, so I'm going to blog about it later (or rather update this post).

For what it's worth: my reading speed was 300 wpm and now it is around 550 wpm :-)

So.. the summer term has passed and I'm wrapping things up and one of the final things I want to get out of my notebooks into this blog are some notes I took for feedback during my Hauptseminar in my maths studies.

Everybody had to do a 60 minutes presentation and the audience was asked to take notes about the presentation style to give feedback later (in addition to paying attention to the presentations).

There were 4 blackboards in the room and a beamer.

Here are my notes:

• 4 pages are the most you can do in 60 minutes (ie write down on the blackboard)
• really think about your audience and what you want them to take away and focus on that in your presentation
• practice your presentation a few times before your big day
• if you use formal mathematical symbolism to express an idea, make sure all symbols and variables are properly defined/explained (for it to make sense!)
• never tell people to "only ask questions at the end because it would disturb your presentation"!
• focus on important concepts and ideas
• motivate definitions and formulas - prefer an algorithmic view on things
• try to use images and graphs to visualize ideas
• if you can't avoid complex definitions and deductions, use a handout
• make sure the handout doesn't contain superfluous information which will only confuse readers
• don't just copy your handout onto the blackboard, if you hand it out at the beginning of the presentation (it's okay, of course, if you do it afterwards)
• never clean all blackboards at the same time
• avoid long breaks in your presentation
• keep on talking all the time, even while cleaning the blackboard to keep people engaged
• if the blackboards are in front of each other, always start writing on the one on the back
• if you can use PowerPoint, try to use it for complex/boring formulas or for visualizations, which are impossible to draw by hand
• on the other hand, make sure, you don't put things into PowerPoint, that would be better written out slowly on the blackboard
• always listen to criticism politely, but don't take notes of it, while you're in front of your audience
• don't degrade yourself by apologizing exaggeratedly for mishaps
• put the sources of referenced material onto the slide the material is used in
• don't waste time on complicated 'examples' that are too detailed to be of any use for the audience
• don't let technicalities obscure examples - stick to the important parts of them
• it's nice if the handout doesn't exactly match the presentation
• always conclude your presentation with a summary of the main points of your talk
• try to build intuition for your topic in your audience instead of explaining difficult proofs no one will remember or care for later anyway
• make sure you have chosen the right priorities for your presentation and trim it down as much as possible

This was it :-)

Cheers,
Andreas

## T61 Extractor

I've been a vivid fan of thesixtyone.com - until they changed the design. The old site can still be found here: http://old.thesixtyone.com/.

Many of my most favorite songs are from this site and I have only been able to listen to them through the site. However, since the aforementioned design change, the site has really died down in my opinion and I was fearful ever since that it would simply go down some day and vanish - taking my songs and playlists with it.

As counter-measure I've written a java console application to extract playlists and songs from thesixtyone. It uses Selenium to remote-control a FireFox instance that uses the normal user-interface to play songs and read in playlists.

I've uploaded the code to launchpad: https://launchpad.net/t61extractor/trunk I've tested and used it myself and it runs alright.

This was a small weekend project (or rather two weekend project) that I did a few months ago but I only found time now to write about it.

The code should be mostly self-explanatory and it's not a lot of code either.

Cheers,
Andreas

## Books on Numerical Analysis

Due my writing an exam on numerical analysis I had the pleasure to look through lots and lots of books on numerical analysis, and here is a list of my favorite ones so far:

• Afternotes on Numerical AnalysisAfternotes Goes to Graduate School by G. W. Stewart Both books are very readable and introduce many of the concepts in a light way that builds an intuitive understanding for them. It's possible to read the books cover to cover in a few days and you can learn a lot very quickly. They are also quite amusing:

"In the nineteenth century the Norwegian mathematician Niels Abel showed that no polynomial of degree five could be solved by a finite number of additions, multiplications, divisions, and root extractions. If we had a finite algorithm for finding eigenvalues of general matrices, we could apply it to companion matrices and make a fool out of Abel. Abel was no fool."
• Numerical Linear Algebra by Trefethen Another very good book which I haven't used much personally, though

• Numerical Methods by Kelley contains an okay introduction to CG and GMRES.

• Numerische Mathematik INumerische Mathematik II are very good books, too. The first volume contains a good introduction to everything but ODEs and PDEs and the second volume consists of a very thorough overview of numerical algorithms for differential equations, including a nice introduction to the general theory of their solvability etc.

• Numerical Analysis by Burden and Faires a very nice book that contains lots of visualizations and covers many topics

More to follow soon :-)

Cheers,
Andreas

## ANTLR Stupidity (Warning 209)

I've been playing around with a Java grammar for ANTLR that was supposed to work straight-away but it did not, with very strange warnings and errors, that made it look like ANTLR only supports lexers with a lookahead of 1 character:

warning(209): ...: Multiple token rules can match input such as "'*'": STAR, STAREQ

while STAR matches only '' and STAREQ only matches '='. This is a huge w-t-f, especially if you have worked with ANTLR before and didn't have issues with this. This also contradicted all documentation you can find about ANTLR and its lexer rules.

I've spent a considerable amount of time with Google trying to find how to fix it. First I've found lots of posts on the ANTLR mailing list [antlr-interest] from people who had the same issue and no replies to them. People had issues with replacing character ranges with unicode ranges (or rather a huge list of unicode characters), which probably caused the problem in my grammar, too. Others found that ANTLR suddenly behaved as if it only had a one character lookahead, but only if more than 300 lexer rules were used in the grammar.

After searching for a long time and almost giving up on the mini-project I've wanted to use ANTLR for, I've found this post: http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html (which matches my problem more or less but with additional insight) and someone even replied (someone being the guy who maintains the C runtime of ANTLR): http://www.antlr.org/pipermail/antlr-interest/2009-September/035955.html

If you are sure that the messages are not correct and the lexer rules are not ambiguous, then you probably need to increase the conversion timeout:

-Xconversiontimeout 30000

if that does not work, then there is a conflict in your rules.

Jim

And that turns out to be the right advice and the remedy to my problems and the problems of lots of other people probably. However, no warning or error message I encountered mentioned that ANTLR's internal processing actually timed-out and there was no ambiguity in the grammar itself...

This comes to show that any good tool like ANTLR can quickly degrade to a piece of crap and a major source of annoyance if error and warning messages aren't clear and helpful.

On further investigation, you can trigger warnings that the conversion times out:

internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279):
could not even do k=1 for decision 121; reason: timed out (>1ms)

but not consistently. I guess this is a bug - either in ANTLR or in ANTLRWorks... :-|

## Limbo

Limbo is a game that has been released on Xbox Live Arcade some time ago, but I've only now found time to play it.

It is a very nice game - albeit a bit on the short side: it took me about 5 hours to finish it. But this is the only negative bit I can think of. Everything else about the game is very nice. It is very polished and it's a charm to play. You have to think about the puzzles for some time but the learning curve is okay and there were no unfair bits.

The game is black-and-white only (with shades of gray) and you play a boy that apparently got lost in a forest (this is how it starts) and you want to get out.<

The game mechanics are very simple: you can only move around, jump and hold on to things to drag them around (or press buttons, etc). In this regard the game is very similar to Another World (another very good game worth playing).

The puzzles are all phyiscs-based and because of this the aforementioned fairness is achieved: sometimes you just have to think a bit longer how to solve a puzzle but it is always logical.

The game is somewhat violent because it works using the "learning by dying" principle and if you don't enable the gore filter, you'll see the poor boy being halved, stabbed, squashed,... many times. However, since he is only a black shape, it's okay and won't put you off playing. The game uses many (and only) auto-saves to track your progress and you never feel punished for dying because you'll get another try straightaway without having to replay more than a few seconds.

"Learning by dying" works really well (and is also used in Another World): You don't have to worry about a "health bar" or rewind time or...

The difficulty of the game is just right and it gets harder as you progress through the world: you get to use more objects to solve the puzzles or will have to think of new ways to use physics to achieve your goals. However, one annoying bit is that also autosave points get moved further apart even when there is no need for it. This only happens in the last few "chapters" of the game though.

All in all it is very much a game worth playing and a very polished experience - kudos to the developers!

## Light Propagation Volumes

I've finally finished my lab course last week - thanks to my supervisor Matthäus G. Chajdas - you can read his blog here -, it wasn't your usual lab course with work sheets and boring homework, instead I've been allowed to implement a nice paper about a Global Illumination approximation algorithm called (Cascaded) Light Propagation Volumes. It's been developed by Crytek and you can find more information (including some presentations and videos) on their server. (Note: this is an implementation of the I3D paper, not the earlier SIGGRAPH one.)

The algorithm approximates global illumination by rendering the light into a reflective shadow map, injecting it into a volume (using a spherical harmonics representation) and propagates the light flux in this volume (hence the name of the algorithm) and taking into account occlusion as possible extension.

The whole algorithm is physically motivated but corners everywhere, of course, to be more efficient. The paper also contains a few errors and doesn't explain everything needed to implement it in great detail (like eg the solid angles of the side faces), so I've written two documents detailing the mistakes I've found and the additional calculations I've performed.

You can find the mistakes here (including suggested corrections) and the full annotations document here.

Finally I've also uploaded the whole prototype (including my code licensed under the FreeBSD license and the media files) here - it's 68 MB big (and it's been compressed with 7zip with a compression mode that might not be supported by WinZIP. The Sponza model is from Crytek, too. You can download the original model and textures here. The project uses DirectX 10.1 and by default it won't run in DirectX 10, because it uses a texture format that is deprecated in D3D10 but supported again 10.1 (BGRA). See the comment by FatGarfield for the location that needs to be changed for it work in DX10, too. (However red and blue will be swapped then.)

Update: I've updated the project to support Visual Studio 2012. You can find the full download here (68 MB).

I haven't implemented cascaded LPVs and I also use only one light/RSM and only inject its depth into the occlusion volume, but the results already look very nice in my opinion.

Stay tuned for more :-)

Cheers,
Andreas

## Panorama Stitching

I've finally come around to "clean-up" some old project I've had lying around for a few months and upload it. I'm talking about some Panorama Stitching code I wrote for our participation in Microsoft's Imagine Cup. I'm suppressing all memories of it since it was an epic fail, but at least I still have learnt quite a bit about computer vision and image processing - enough to know that it's incredibly hard to come up with stable algorithms and kudos to anyone working in the field.

Here is the code dump: PanoramaStitching.zip

It contains many small projects which usually use multiple pictures as inputs or multiple webcams (depending on code or chosen preprocessor macros).

The most advanced prototype is the SnapshotHomographyConfigurator, which allows you to determine homographies between multiple cameras at once by marking shared points between the images.

Another one which works okay is the PanoramaStitching project. It creates panoramas using spherical or cylindrical projections of the input images. However, it is very sensitive to translations of the viewpoint. It works quite well with optimal/artificial images:

(Note: the small misalignment on the right stems from moving the player position slightly. Usually you use a deghosting algorithm to remove such misalignments.)

I've used OpenCV for image processing and yaml for loading and storing settings (and also rapidxml). OpenCV's C++ wrapper is pretty awesome. It's not perfect but it makes life a lot easier.

Stay tuned for more code/project uploads soon :-)

PS: Here are the links to some papers which proved useful to me (I didn't implement most of them though, and some are implemented in OpenCV already):

## Rotation of Low Order Spherical Harmonics

I'm currently working at university on implementing Light Propagation Volumes. The paper makes extensive use of spherical harmonics while the implementation uses the first two bands.

Below is a visualization of the first 4 bands of the SH basis functions (created using Mayavi):

As you can see the first two bands are 4 functions, so 4 coefficients to store which conveniently fits into one RGBA texture.

One of the main transformations that is performed in the LPV paper is the rotation of the spherical harmonics representation of a clamped cosine lobe (that represents surface lighting) onto a normal vector direction. It took me a while to figure out, but actually it's quite easy, which is why I write about it :-)

The analytical presentation of the first four base functions is simple:

$S_0 \left( x, y, z \right ) = \frac{1}{2 \sqrt{\pi}}$ $S_1 \left( x, y, z \right ) = - \frac{\sqrt{3}}{2 \sqrt{\pi}} y$ $S_2 \left( x, y, z \right ) = \frac{\sqrt{3}}{2 \sqrt{\pi}} z$ $S_3 \left( x, y, z \right ) = - \frac{\sqrt{3}}{2 \sqrt{\pi}} x$

To evaluate lighting with SH for some direction $$v$$, you first determine the coefficients/weights of the SH basis functions and then sum them up.

$L = \sum_i s_i \, S_i \left( v \right )$

Let's assume we know the coefficients $$s^z_0, s^z_1, ...$$ of the clamped cosine lobe around the z axis, then we can determine the lighting in direction v for the cosine lobe around the normal n by transforming it into the space where the normal coincides with the z axis (ie rotate n onto the z axis):

$L = \sum_i s^z_i \, S_i \left( R_{n \to z} \, v \right )$

where $$R_{n \to z}$$ is a rotation matrix that rotates n onto z.

The idea is to expand $$S_i \left( R_{n \to z} \, v \right )$$ and rewrite it in terms of $$S_i \left ( v \right )$$.

Before doing this, let's first take a look at the coefficients of the clamped cosine lobe:

\begin{align*} s^z_0 &=\frac{ \sqrt{ \pi } }{ 2 }\\ s^z_1 &= 0\\ s^z_2 &= \sqrt\frac{ \pi }{3}\\ s^z_3 &= 0\\ \end{align*}

The y and x direction are 0 because the cosine lobe is centered isotropic around the z axis:

So let's look at the expanded version of this formula if $$r_1^T$$, $$r_2^T$$, $$r_3^T$$ are the row vectors of the matrix, $$v=\bigl(\begin{smallmatrix} x\\ y\\ z \end{smallmatrix}\bigr)$$ and $$R_{n \to z}=\left(\begin{smallmatrix} r_1^T\\ r_2^T\\ r_3^T \end{smallmatrix}\right )$$, then:

$L = \sum_i s^z_i \, S_i \left( R_{n \to z} \, v \right ) = \sum_i s^z_i \, S_i \left( \left(\begin{smallmatrix} r_1^T \, v\\ r_2^T \, v\\ r_3^T \, v\end{smallmatrix}\right ) \right )$ \begin{align*} L &= s^z_0 \, c_0\\ &+ s^z_1 \, (-c_1) \, r_2^T \, v \\ &+ s^z_2 \, c_1 \, r_3^T \, v\\ &+ s^z_3 \, (-c_1) \, r_1^T \, v \end{align*}

Since $$s^z_1 = 0$$ and $$s^z_3 = 0$$:

$L = s^z_0 \, c_0 + s^z_2 \, c_1 \, r_3^T \, v = s^z_0 \, c_0 + s^z_2 \, c_1 \, r_{31} \, x + s^z_2 \, c_1 \, r_{32} \, y + s^z_2 \, c_1 \, r_{33} \, z$

$L = s^z_0 \, S_0 \left ( v \right ) - s^z_2 \, r_{32} \, S_1 \left ( v \right )+ s^z_2 \, r_{33} \, S_2 \left ( v \right ) - s^z_2 \, r_{31} \, S_3 \left ( v \right )$

Now the question is: what is the third row of $$R_{n \to z}$$? If we look at the inverse matrix instead: $$R_{z \to n}$$, we can immediately see that its third column has to be n, because $R_{z \to n} \, \bigl(\begin{smallmatrix} 0\\ 0\\ 1 \end{smallmatrix}\bigr) = n$ by construction. Since rotations are orthogonal matrices, the inverse is the same as the transposed, so we can deduce that the third row of $$R_{n \to z}$$ is the same as the third column of $$R_{z \to n}$$, that is: n. Thus with $$n = \bigl(\begin{smallmatrix} n_x\\ n_y\\ n_z \end{smallmatrix}\bigr)$$ we get:

$L = s^z_0 \, S_0 \left ( v \right ) - s^z_2 \, n_y \, S_1 \left ( v \right )+ s^z_2 \, n_z \, S_2 \left ( v \right ) - s^z_2 \, n_x \, S_3 \left ( v \right )$

So the SH coefficients of the clamped cosine lobe along n are:

$s^n_0 = s^z_0 = \frac{ \sqrt{ \pi } }{ 2 } \\ s^n_1 = - s^z_2 \, n_y = -\sqrt{ \frac{ \pi }{3} } \, n_y \\ s^n_2 = s^z_2 \, n_z = \sqrt{\frac{ \pi }{3} } \, n_z \\ s^n_1 = - s^z_2 \, n_x = - \sqrt{\frac{ \pi }{3}} \, n_x$

This is it :-)

Cheers,
Andreas

PS: a few screenshots from the LPV project:

GPUPropCopy

noLPV_2

noLPV

LPV32P128C_2

LPV32P128C