Category Archives: Coding

A Long Journey: Acceler8 and TBB

After a month, Intel's Acceler8 competition has finally come to an end. It's been a long way from implementing the first sequential algorithm to having a fully-fledged parallel version.

I had never worked with Intel's Threading Building Blocks (TBB) library before, and this was a nice opportunity to examine it, since it offers a better abstraction than OpenMP or pthreads.

The documentation is very good and you quickly learn how to work with the library. One of the benefits is that I didn't have to use a low-level synchronization construct even once during development, and everything worked fine without any race conditions or similar problems. The parallel_* functions (e.g. parallel_for, parallel_reduce, and parallel_scan) together with icc's C++0x support (lambda functions) allowed for very concise code with little programming overhead.

The implementation builds on Kadane's algorithm for the two dimensional case using prefix sums. One implementation that gets across the basic idea can be found here. Mine is similar and I simply parallelized as much as possible.

As you can see the outer two loops iterate over a two-dimensional range that is pretty much an upper right triangle of the whole possible domain. For this I've implemented a custom range that allows for better load balancing. A range in TBB defines an iteration range of any kind and supports a split operation that is used internally by the task scheduler to distribute the range dynamically on multiple threads as it sees fit.

Last but not least, I came up with a way to parallelize the 1D part of Kadane's algorithm: split the column range into linear subranges and merge the subsolutions into one solution, i.e. a classical divide-and-conquer approach.

Because it's the most abstract yet interesting part of our implementation, I'm going to go into more detail here. :-)

How can you find the maximum subarray of a 1D array, if you know the maximum subarray of the two "halves" (they don't have to be split evenly)?
Well, you can't; you need more information.

We calculate the following information for each chunk:

  • maximum subarray that starts at the beginning of the chunk
  • maximum subarray that ends at the end of the chunk
  • total sum
  • maximum subarray

It's easy to figure out how to merge these values for two neighboring chunks into the values of the merged chunk.
The maximum subarray that starts at the beginning of the merged chunk is either that value for the left chunk, or the total sum of the left chunk plus that value for the right chunk.
You can figure out how it works for the maximum subarray that ends at the end of the merged chunk :-)
The maximum subarray of the merged chunk is the biggest of three candidates: the left chunk's maximum subarray, the right chunk's maximum subarray, or the left chunk's best subarray ending at its end plus the right chunk's best subarray starting at its beginning.

Using this idea you can use a simple parallel_reduce to parallelize Kadane's algorithm.
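The merge step described above can be sketched as follows. This is only a minimal illustration in Python, not the TBB code from the competition: the names `Chunk`, `leaf`, `merge` and `summarize` are made up for this example, and `functools.reduce` runs sequentially where the real implementation would hand the same associative merge to parallel_reduce.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Chunk:
    """Summary of one chunk (hypothetical helper type for this sketch)."""
    prefix: int  # best sum of a non-empty subarray starting at the chunk's first element
    suffix: int  # best sum of a non-empty subarray ending at the chunk's last element
    total: int   # sum of all elements in the chunk
    best: int    # best sum of any non-empty subarray inside the chunk


def leaf(x):
    """A single element is its own prefix, suffix, total and best."""
    return Chunk(x, x, x, x)


def merge(left, right):
    """Combine the summaries of two neighboring chunks (left comes first)."""
    return Chunk(
        prefix=max(left.prefix, left.total + right.prefix),
        suffix=max(right.suffix, right.total + left.suffix),
        total=left.total + right.total,
        best=max(left.best, right.best, left.suffix + right.prefix),
    )


def summarize(values):
    """Fold a non-empty sequence into one Chunk; merge is associative."""
    return reduce(merge, map(leaf, values))


def kadane(values):
    """Plain sequential Kadane, used here only as a reference."""
    best = current = values[0]
    for x in values[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


data = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
halves = merge(summarize(data[:4]), summarize(data[4:]))
print(halves.best, kadane(data))
```

Because `merge` is associative, the chunks can be split at any position and combined in any grouping, which is what lets a scheduler distribute the work freely.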

Of course, there is some overhead, but for large enough problem sizes this will be faster than the sequential algorithm.

Two more take-aways:

  • Always try to use language features like templates or lambda expressions to reduce duplicate code or make the code more concise.
  • Write unit tests. I have used googletest which is a very small but very capable library, and it has spared me a lot of debugging trouble.

Cheers :-)

T61 Extractor

I've been an avid fan of thesixtyone.com - until they changed the design. The old site can still be found here: http://old.thesixtyone.com/.

Many of my favorite songs are from this site and I have only been able to listen to them through the site. However, since the aforementioned design change, the site has really died down in my opinion, and I have been fearful ever since that it would simply go down some day and vanish - taking my songs and playlists with it.

As a countermeasure I've written a Java console application to extract playlists and songs from thesixtyone. It uses Selenium to remote-control a Firefox instance that uses the normal user interface to play songs and read in playlists.

I've uploaded the code to launchpad: https://launchpad.net/t61extractor/trunk
I've tested and used it myself and it runs alright.

This was a small weekend project (or rather a two-weekend project) that I did a few months ago, but I only found time now to write about it.

The code should be mostly self-explanatory and it's not a lot of code either.

Cheers,
Andreas

ANTLR Stupidity (Warning 209)

I've been playing around with a Java grammar for ANTLR that was supposed to work straight-away but it did not, with very strange warnings and errors, that made it look like ANTLR only supports lexers with a lookahead of 1 character:

warning(209): ...: Multiple token rules can match input such as "'*'": STAR, STAREQ

while STAR matches only '*' and STAREQ only matches '*='. This is a huge w-t-f, especially if you have worked with ANTLR before and didn't have issues with this. This also contradicted all documentation you can find about ANTLR and its lexer rules.

I spent a considerable amount of time with Google trying to find out how to fix it. First I found lots of posts on the ANTLR mailing list [antlr-interest] from people who had the same issue and no replies to them (really helpful, eh?). People had issues with replacing character ranges with unicode ranges (or rather a huge list of unicode characters), which probably caused the problem in my grammar, too. Others found that ANTLR suddenly behaved as if it only had a one-character lookahead, but only if more than 300 lexer rules were used in the grammar.

After searching for a long time and almost giving up on the mini-project I wanted to use ANTLR for, I found this post: http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html (which matches my problem more or less but with additional insight)
and someone even replied (someone being the guy who maintains the C runtime of ANTLR):
http://www.antlr.org/pipermail/antlr-interest/2009-September/035955.html

If you are sure that the messages are not correct and the lexer rules
are not ambiguous, then you probably need to increase the conversion
timeout:

-Xconversiontimeout 30000

if that does not work, then there is a conflict in your rules.

Jim

And that turns out to be the right advice and the remedy to my problems and the problems of lots of other people probably.
However, no warning or error message I encountered mentioned that ANTLR's internal processing actually timed-out and there was no ambiguity in the grammar itself...

This goes to show that any good tool like ANTLR can quickly degrade to a piece of crap and a major source of annoyance if error and warning messages aren't clear and helpful.

On further investigation, you can trigger warnings that the conversion times out:

internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279):
    could not even do k=1 for decision 121; reason: timed out (>1ms)

but not consistently. I guess this is a bug - either in ANTLR or in ANTLRWorks... :-|

Light Propagation Volumes

I finally finished my lab course last week. Thanks to my supervisor Matthäus G. Chajdas (you can read his blog here), it wasn't your usual lab course with work sheets and boring homework. Instead I was allowed to implement a nice paper about a global illumination approximation algorithm called (Cascaded) Light Propagation Volumes. It was developed by Crytek and you can find more information (including some presentations and videos) on their server. (Note: this is an implementation of the I3D paper, not the earlier SIGGRAPH one.)

Sponza scene (direct + indirect lighting w/ occlusion)

Sponza scene (only indirect lighting w/ occlusion)

Sponza scene (only direct lighting)

Sponza scene (boosted indirect lighting w/ occlusion)

Sponza scene (boosted indirect lighting w/o occlusion)

The algorithm approximates global illumination by rendering the light into a reflective shadow map, injecting it into a volume (using a spherical harmonics representation) and propagating the light flux in this volume (hence the name of the algorithm), taking occlusion into account as a possible extension.

The whole algorithm is physically motivated but cuts corners everywhere, of course, to be more efficient. The paper also contains a few errors and doesn't explain everything needed to implement it in great detail (e.g. the solid angles of the side faces), so I've written two documents detailing the mistakes I've found and the additional calculations I've performed.

You can find the mistakes here (including suggested corrections) and the full annotations document here.

Finally I've also uploaded the whole prototype (including my code licensed under the FreeBSD license and the media files) here - it's 68 MB big (and it's been compressed with 7zip with a compression mode that might not be supported by WinZIP). The Sponza model is from Crytek, too. You can download the original model and textures here.
The project uses DirectX 10.1 and by default it won't run in DirectX 10, because it uses a texture format that is deprecated in D3D10 but supported again in 10.1 (BGRA). See the comment by FatGarfield for the location that needs to be changed for it to work in DX10, too. (However, red and blue will be swapped then.)

I haven't implemented cascaded LPVs and I also use only one light/RSM and only inject its depth into the occlusion volume, but the results already look very nice in my opinion.

Stay tuned for more :-)
Cheers,
Andreas

Panorama Stitching

I've finally gotten around to cleaning up an old project I've had lying around for a few months and uploading it.
I'm talking about some Panorama Stitching code I wrote for our participation in Microsoft's Imagine Cup.
I'm suppressing all memories of it since it was an epic fail, but at least I still have learnt quite a bit about computer vision and image processing - enough to know that it's incredibly hard to come up with stable algorithms and kudos to anyone working in the field.

Here is the code dump: PanoramaStitching.zip

It contains many small projects which usually use multiple pictures as inputs or multiple webcams (depending on code or chosen preprocessor macros).

The most advanced prototype is the SnapshotHomographyConfigurator, which allows you to determine homographies between multiple cameras at once by marking shared points between the images.

Another one which works okay is the PanoramaStitching project. It creates panoramas using spherical or cylindrical projections of the input images. However, it is very sensitive to translations of the viewpoint. It works quite well with optimal/artificial images:

(Note: the small misalignment on the right stems from moving the player position slightly. Usually you use a deghosting algorithm to remove such misalignments.)

I've used OpenCV for image processing and yaml for loading and storing settings (and also rapidxml). OpenCV's C++ wrapper is pretty awesome. It's not perfect but it makes life a lot easier.

Stay tuned for more code/project uploads soon :-)

PS: Here are the links to some papers which proved useful to me (I didn't implement most of them though, and some are implemented in OpenCV already):

Rotation of Low Order Spherical Harmonics

I'm currently working at university on implementing Light Propagation Volumes. The paper makes extensive use of spherical harmonics while the implementation uses the first two bands.

Below is a visualization of the first 4 bands of the SH basis functions (created using Mayavi):

The first 4 bands of the spherical harmonic basis functions

As you can see, the first two bands consist of 4 functions, so there are 4 coefficients to store, which conveniently fit into one RGBA texture.

One of the main transformations that is performed in the LPV paper is the rotation of the spherical harmonics representation of a clamped cosine lobe (that represents surface lighting) onto a normal vector direction. It took me a while to figure out, but actually it's quite easy, which is why I write about it :-)

The analytical representation of the first four basis functions is simple:

S_0 \left( x, y, z \right ) = \frac{1}{2 \sqrt{\pi}}
S_1 \left( x, y, z \right ) = - \frac{\sqrt{3}}{2 \sqrt{\pi}} y
S_2 \left( x, y, z \right ) = \frac{\sqrt{3}}{2 \sqrt{\pi}} z
S_3 \left( x, y, z \right ) = - \frac{\sqrt{3}}{2 \sqrt{\pi}} x

To evaluate lighting with SH for some direction v, you first determine the coefficients/weights of the SH basis functions and then sum them up.

 L = \sum_i s_i \, S_i \left( v \right )

Let's assume we know the coefficients  s^z_0, s^z_1, ... of the clamped cosine lobe around the z axis, then we can determine the lighting in direction v for the cosine lobe around the normal n by transforming it into the space where the normal coincides with the z axis (ie rotate n onto the z axis):

 L = \sum_i s^z_i \, S_i \left( R_{n \to z} \, v \right )

where  R_{n \to z} is a rotation matrix that rotates n onto z.

The idea is to expand  S_i \left( R_{n \to z} \, v \right ) and rewrite it in terms of  S_i \left ( v \right ) .

Before doing this, let's first take a look at the coefficients of the clamped cosine lobe:

\begin{align*} 
s^z_0 &=\frac{ \sqrt{ \pi } }{ 2 }\\ 
s^z_1 &= 0\\ 
s^z_2 &= \sqrt\frac{ \pi }{3}\\ 
s^z_3 &= 0\\ 
\end{align*}

The coefficients for the y and x directions are 0 because the cosine lobe is rotationally symmetric around the z axis.

So let's look at the expanded version of this formula if  r_1^T ,  r_2^T ,  r_3^T are the row vectors of the matrix,
 v=\bigl(\begin{smallmatrix} 
x\\ 
y\\ 
z 
\end{smallmatrix}\bigr) and  R_{n \to z}=\left(\begin{smallmatrix} 
r_1^T\\ 
r_2^T\\ 
r_3^T 
\end{smallmatrix}\right ) , then:

 L = \sum_i s^z_i \, S_i \left( R_{n \to z} \, v \right ) = \sum_i s^z_i \, S_i \left( \left(\begin{smallmatrix} 
r_1^T \, v\\ 
r_2^T \, v\\ 
r_3^T \, v\end{smallmatrix}\right ) \right )
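
Writing  c_0 = \frac{1}{2 \sqrt{\pi}} and  c_1 = \frac{\sqrt{3}}{2 \sqrt{\pi}} for the constant factors of the basis functions, this expands to:

\begin{align*} L &= s^z_0 \, c_0\\ 
&+ s^z_1 \, (-c_1) \, r_2^T \, v \\ 
&+ s^z_2 \, c_1 \, r_3^T \, v\\ 
&+ s^z_3 \, (-c_1) \, r_1^T \, v 
\end{align*}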

Since  s^z_1 = 0 and  s^z_3 = 0 :

 L = s^z_0 \, c_0 + s^z_2 \, c_1 \, r_3^T \, v = s^z_0 \, c_0 + s^z_2 \, c_1 \, r_{31} \, x + s^z_2 \, c_1 \, r_{32} \, y + s^z_2 \, c_1 \, r_{33} \, z

  L = s^z_0 \, S_0 \left ( v \right ) - s^z_2 \, r_{32} \, S_1 \left ( v \right )+ s^z_2 \, r_{33} \, S_2 \left ( v \right ) - s^z_2 \, r_{31} \, S_3 \left ( v \right )

Now the question is: what is the third row of  R_{n \to z} ? If we look at the inverse matrix instead:  R_{z \to n} , we can immediately see that its third column has to be n, because  R_{z \to n} \, \bigl(\begin{smallmatrix} 
0\\ 
0\\ 
1 
\end{smallmatrix}\bigr) = n by construction. Since rotations are orthogonal matrices, the inverse is the same as the transpose, so we can deduce that the third row of  R_{n \to z} is the same as the third column of  R_{z \to n} , that is: n. Thus with  n = \bigl(\begin{smallmatrix} 
n_x\\ 
n_y\\ 
n_z 
\end{smallmatrix}\bigr) we get:

  L = s^z_0 \, S_0 \left ( v \right ) - s^z_2 \, n_y \, S_1 \left (  v \right )+ s^z_2 \, n_z \, S_2 \left ( v \right ) - s^z_2 \, n_x  \, S_3 \left ( v \right )

So the SH coefficients of the clamped cosine lobe along n are:

\begin{align*} 
s^n_0 &= s^z_0 = \frac{ \sqrt{ \pi } }{ 2 } \\ 
s^n_1 &= - s^z_2 \, n_y = -\sqrt{ \frac{ \pi }{3} } \, n_y \\ 
s^n_2 &= s^z_2 \, n_z = \sqrt{\frac{ \pi }{3} } \, n_z \\ 
s^n_3 &= - s^z_2 \, n_x = - \sqrt{\frac{ \pi }{3}} \, n_x 
\end{align*}
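
As a quick numerical sanity check of the result, here is a small Python sketch. It is not code from the LPV project; the function names `sh_basis`, `cosine_lobe_coefficients` and `evaluate` are made up for illustration. It builds the four coefficients for a unit normal n and evaluates the sum  \sum_i s^n_i \, S_i(v) . Multiplying out the constants gives  \frac{1}{4} + \frac{1}{2} \, (n \cdot v) , which the sketch uses as a cross-check.

```python
import math

# Constant factors of the first two SH bands (c_0 and c_1 in the derivation).
C0 = 1.0 / (2.0 * math.sqrt(math.pi))
C1 = math.sqrt(3.0) / (2.0 * math.sqrt(math.pi))


def sh_basis(v):
    """Evaluate S_0..S_3 for a direction v = (x, y, z)."""
    x, y, z = v
    return (C0, -C1 * y, C1 * z, -C1 * x)


def cosine_lobe_coefficients(n):
    """SH coefficients of a clamped cosine lobe around the unit normal n."""
    nx, ny, nz = n
    s0 = math.sqrt(math.pi) / 2.0   # s^z_0
    s2 = math.sqrt(math.pi / 3.0)   # s^z_2
    return (s0, -s2 * ny, s2 * nz, -s2 * nx)


def evaluate(coefficients, v):
    """L = sum_i s_i * S_i(v)."""
    return sum(s * b for s, b in zip(coefficients, sh_basis(v)))


n = (0.0, 0.6, 0.8)   # example unit normal
v = (1.0, 0.0, 0.0)   # example view direction
print(evaluate(cosine_lobe_coefficients(n), v))
print(evaluate(cosine_lobe_coefficients(n), n))
```

Note that with only two bands the reconstruction is a rough approximation of the clamped cosine: it does not drop to zero on the back side of the normal.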

This is it :-)

Cheers,
Andreas

PS: a few screenshots from the LPV project:

(Screenshots: GPUPropCopy 0616, noLPV, LPV32P128C, noLPV_2, LPV32P128C_2)

Extracting Information from StudiVZ

Some time ago somebody stole 1 million data records from StudiVZ, the German Facebook clone. I'm not exactly sure why people call the person a hacker who stole data, because it appears he simply wrote a tool that harvested the publicly available data from StudiVZ (which everyone with an account can view).

People on StudiVZ share all their data by default, contrary to Facebook, which values a person's private data a lot more. Thus, by simply opening each profile as a dummy user and processing the HTML from StudiVZ, one can extract a lot of information about random people who probably don't even know about it or don't care. So I'm not sure about the stealing part.

Apparently there are some captchas when you start browsing searches beyond a few pages. I guess that is where the hacking part comes in, because getting around a captcha probably constitutes hacking. Maybe?

Anyway, I think part of the media coverage is a bit ridiculous because anyone can write a simple harvester in an hour or two. It took me one and a half hours, so I think I'm on the safe side with this estimate, and I didn't really have a clue about this stuff before either.

Since I don't want to "hack", I've only written a very tame harvester. It connects to your personal StudiVZ account, and retrieves the name and profile ID (and thus profile URL) of all your friends in the "Meine Freunde" pages.

It could do a lot more with that like retrieving everybody's birthday or random pictures, but I'm too lazy to code that because you use the same pattern for extracting data over and over again and it stops being interesting quite fast.

You can download the project here. It is a one file C# project. I'm releasing it under GPL (whatever).

It's really easy to explain how it works:

  • It uses System.Net's HttpWebRequest and HttpWebResponse to get (and post) web pages.
  • StudiVZ (like every other portal) uses cookies, so I create a CookieContainer and use it in every http request.
  • There are a few hidden values that StudiVZ expects during login. I'm retrieving them from the main page using custom built regular expressions. I've found a handy AJAX tester for .NET regular expressions. It was really useful for building the expressions and debugging them. (BTW you can find all URLs I used in the comments.)
  • After login I use the same pattern: get page & parse using regex for everything.
  • Visual Studio has an awesome "HTML Visualizer" for strings. It displays the content of a string as HTML page, which is really nifty if you're doing anything related to HTML processing.
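
The pattern from the list above (a shared cookie container plus regex extraction of hidden form values) can be sketched like this. It is written in Python rather than the C# of the actual project, and everything specific in it is an assumption for illustration: the sample HTML, the field names `formkey` and `iv`, and the helper names `make_session` and `hidden_fields` are invented, and the regular expression assumes the attributes appear in the order type, name, value. No request is sent.

```python
import http.cookiejar
import re
import urllib.request

# Matches <input type="hidden" name="..." value="..."> and captures name and value.
# Assumes this attribute order; real pages may need a more tolerant expression.
HIDDEN_INPUT = re.compile(
    r'<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"',
    re.IGNORECASE,
)


def make_session():
    """Create an opener that stores and resends cookies on every request."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def hidden_fields(html):
    """Return all hidden form fields of a page as a name -> value dict."""
    return dict(HIDDEN_INPUT.findall(html))


# Hypothetical login form, standing in for the real main page.
sample_html = """
<form action="/Login" method="post">
  <input type="hidden" name="formkey" value="abc123" />
  <input type="hidden" name="iv" value="42" />
  <input type="text" name="email" />
</form>
"""

opener, jar = make_session()
fields = hidden_fields(sample_html)
print(fields)
```

In a real run, the page would be fetched through `opener.open(...)`, and the extracted fields would be posted back together with the login credentials through the same opener, so that the session cookies are kept.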

The code is quite ugly. Well, it's not production code and this is only meant as a proof of concept.

Also note that I have at most violated the AGB (terms and conditions) of StudiVZ and not committed any criminal acts, and I'm not planning to sell my friends' profile IDs or data either :-)

Maybe someone can extend the code and make it more useful. I guess it would be fun to automatically download all your pictures (including tags) and feed them into flickr or picasa... but someone else can do that.

Cheers,
Andreas

Sploidz Revisited (Unofficially)

I've already written about the semi-conductor project and how I've written some Flash animations/applications for it. Of course, I'm more interested in making fun stuff, so I decided to put my knowledge to good use and write a small game to see how difficult/awkward Flash actually is.

To sum it up, it is somewhat awkward, at least if you use the IDE itself (FlashDevelop is still as nice as ever), but you can quickly develop games nonetheless. In retrospect I prefer Torque Game Builder, though.

Before I continue talking about the development itself, let's take a look at the actual game. Sploidz was the first game I wrote using Torque Game Builder for Joshua Dallman, and since I still had all the assets in my subversion repository, it was an easy decision to try and port this game. If you want to play it, you can download it for free here.
I haven't ported everything: I've just rewritten the main characteristic features that make up Sploidz's code in ActionScript.

Without further ado, here is the game:

Click to open Sploidz in its own window

Because the art is still copyrighted and I haven't heard back from Joshua yet, I decided to create a free version that only uses "coder art" - in this case hand-drawn coder art :-)

Some have said that this version looks cuter, decide for yourself:

Click to open SploidzCC in its own window

Below you'll find a description of the development and at least one helpful trick and most importantly a link to the source code of the "copyright-free" version.

Because the original version is way too difficult to be really fun, I actually sat down one more time and added code to make the platform slower if you're in danger of losing (up to 3 times slower):

Click to open SploidzMoreFun in its own window

PowerPointLaTeX Update

People complained to me about the formula feature in my PowerPointLaTeX add-in. It used a somewhat experimental approach to editing formula objects: it added an editing text shape that contained the formula code, which was merged back into the formula as soon as you deselected it. So I decided to rewrite it to use a standard modal dialog for editing formula objects:

Updated Ribbon (above) and Formula Editor Dialog (below)

The editor isn't perfect (yet), but it certainly shouldn't add any bugs to the add-in, and it solves some inherent issues of the old approach.

Implementation Note

The idea was pretty straightforward, but the actual UI design was a PITA due to me not knowing the panel/flow/table layout concepts very well. The code still has some annoying quirks with auto-scroll, so I need to fix that later.

I almost rewrote the whole cache system, because I'm using a background thread for updating the preview (when the text changes, a 500 ms timer is started which triggers an update). The update accesses the cache system, which in turn accesses PowerPoint to return some data, and PowerPoint is busy because of the modal dialog: deadlock.

The solution to this is very simple but was not obvious to me at first (I actually began to rewrite the cache system with a feeling that there should be an easier solution):
The background thread needs an Invoke call to update the preview picture anyway, because the control has been created by a different thread (the main thread), so the code that gets the updated picture can simply be moved into the Invoke delegate function as well.

This solved all my problems and made 4 hours of previous work and thinking about a new cache system obsolete :-|

Download the new build at: http://code.google.com/p/powerpointtools/downloads/list

Cheers,
Andreas

Semi-Conductor Optimization (Uni Project)

I've written my last exam yesterday (except for two oral exams in September), so now I have got some spare time before I start working on my Bachelor Thesis tomorrow and I want to use it to wrap up a few things.

During this term I took part in a course that was both a (research) project/presentation/lecture thing, which was fun but also a lot of work.
I've already written about one mathematical aspect of it in my post about Analysis, Cauchy-Schwarz and Reciprocal Sums.

The project was about optimizing semi-conductor wiring placement. We wrote a small paper about our findings and the work it was based on - you can download it here.

We also created a self-running presentation that doesn't contain any Maths at all but makes heavy use of Flash animations (which were exported to .gif manually, which was a huge pain in the ass, which I will never do again if possible) to visualize all the concepts and algorithms.

You can download a PowerPoint 2007 (.pptx) version here or one that works with PowerPoint 2003 here.

For the students, I sat down and wrote a small Flash application to show the algorithms at work. It's not obvious how it works, so let me explain the major points:

  • On the right you have a number of panels that you can enlarge by clicking on the small button in the upper right of each panel.
  • The upper three panels show different views of the same dataset. They will all be updated as you run the algorithm step by step.
  • The fourth panel lets you change the number of wires and/or their activity.
  • The last panel shows the electrical field that is created by one active wire in the center. It was created using MATLAB. I've also uploaded the script here.

Abitag

Last but not least I've also uploaded the current version of all my .fla and .as files. You can download it here.

ActionScript is a nice language and you can quickly learn it using the available resources from Adobe.
While ActionScript 2.0 is arguably weird, ActionScript 3.0 is quite logical and its syntax is straightforward and consistent, too. You can't say that about the IDE (Flash CS4), which is braindead, but if you're only interested in writing ActionScript code, FlashDevelop is an excellent and free alternative.

This is it for now, maybe I'll play around with Flash some more another time.
Cheers,
Andreas