msgbartop
Just another weblog
msgbarbottom

19 Oct 09 Extracting Information from StudiVZ

Some time ago somebody stole 1 million data records from StudiVZ, the German Facebook clone. I’m not exactly sure why people call the person a hacker who stole data, because it appears he simply wrote a tool that harvested the publicly available data from StudiVZ (which everyone with an account can view).

People on StudiVZ share all their data by default—contrary to Facebook which values a person’s private data a lot more. Thus by simply opening each profile from a dummy user and processing the HTML data from StudiVZ one can extract a lot and some more information from random people who probably don’t even know about it or don’t care.. so I’m not sure about the stealing part.

Apparently there are some captcha’s when you start browsing searches beyond a few pages. I guess that is where the hacking part comes in, because getting around a captcha probably constitutes hacking—maybe?

Anyway I think part of the media coverage is a bit ridiculous because anyone can write a simple harvester in an hour or two. It took me one and half hours, so I think I’m on the safe side with this estimate and I didn’t really have a clue about this stuff before either.

Since I don’t want to “hack”, I’ve only written a very tame harvester. It connects to your personal StudiVZ account, and retrieves the name and profile ID (and thus profile URL) of all your friends in the “Meine Freunde” pages.

It could do a lot more with that like retrieving everybody’s birthday or random pictures, but I’m too lazy to code that because you use the same pattern for extracting data over and over again and it stops being interesting quite fast.

You can download the project here. It is a one file C# project. I’m releasing it under GPL (whatever).

It’s really easy to explain how it works:

  • It uses System.Net’s HttpWebRequest and HttpWebResponse to get (and post) web pages.
  • StudiVZ (like every other portal) uses cookies, so I create a CookieContainer and use it in every http request.
  • There are a few hidden values that StudiVZ expects during login. I’m retrieving them from the main page using custom built regular expressions. I’ve found a handy AJAX tester for .NET regular expressions. It was really useful for building the expressions and debugging them. (BTW you can find all URLs I used in the comments.)
  • After login I use the same pattern: get page & parse using regex for everything.
  • Visual Studio has an awesome “HTML Visualizer” for strings. It displays the content of a string as HTML page, which is really nifty if you’re doing anything related to HTML processing.

The code is quite ugly. Well, it’s not production code and this is only meant as a proof of concept.

Also note that I have at most violated the AGB of StudiVZ and not committed any criminal acts and I’m not planning to sell my friend’s profile IDs or data either :-)

Maybe someone can extend the code and make it more useful. I guess it would be fun to automatically download all your pictures (including tags) and feed them into flickr or picasa… but someone else can do that.

Cheers,
Andreas

Tags: , ,

17 Oct 09 My Bachelor Thesis

Screenshot of Utah

Screenshot of Utah

For the last two months I have been working on my bachelor thesis at the Chair of Computer Graphics and Visualization. It is about “Multi-Tile Terrain Rendering with OGL/Equalizer”´.
The chair has a very nice Direct3D 10 terrain rendering engine and they want to run it at the newly founded KAUST (King Abdullah University of Science and Technology´) in a massive CAVE environment. A CAVE is a room whose walls are actually screens.
The CAVE at KAUST even supports stereoscopic rendering. Thus in total 12 views can be rendered to.

My job was to port said terrain engine from Direct3D to OpenGL and afterwards to the Equalizer framework, which is an open-source framework for parallelizing OpenGL applications.

You can find/download an online version of my bachelor thesis here. I’ll upload the LaTeX at a later date and update this post.

I’ve spent the last months writing about all this, so I don’t feel like talking about the thesis itself anymore. Instead the remainder of this post will contain a post-mortem of it.
(more…)

Tags: , , , , , ,

05 Aug 09 PowerPointLaTeX Update

Because people complained to me about the formula feature in my PowerPointLaTeX add-in, which used a somewhat experimental approach to editing formula objects by adding an editing text shape that contained the formula code and that would be merged back into the formula as soon as you deselect it, I decided to rewrite it to use a standard modal dialog to edit formula objects:

PPTLaTeX_eqeditor

Updated Ribbon (above) and Formula Editor Dialog (below)

The editor isn’t perfect (yet), but it certainly shouldn’t add any bugs to the add-in and solve some natural issues the old approach created.

Implementation Note

The idea was pretty straight-forward but the actual UI design was a PITA due me not knowning the panel/flow/table layout concepts very well and the code still has some annoying quirks with auto-scroll, so I need to fix that later.

I almost rewrote the whole cache system, because I’m using a background thread for updating the preview (if the text is changed, a 500 msec timer is started which triggers an update) and the update accesses the cache system, which in turn accesses PowerPoint to return some data, which in turn is busy because of the modal dialog -> dead-lock.

The solution to this is very simple but was not obvious to me at first (I actually began to rewrite the cache system with a feeling that there should be an easier solution):
The background thread needs an Invoke call to update the preview picture because the control has been created by a different thread (the main thread) and the code to get an updated picture can simply be moved into Invoke delegate function.

This solved all my problems and made 4 hours of previous work and thinking about a new cache system obsolete :-|

Download the new build at: http://code.google.com/p/powerpointtools/downloads/list

Cheers,
Andreas

Tags: , , ,

02 Aug 09 Semi-Conductor Optimization (Uni Project)

I’ve written my last exam yesterday (except for two oral exams in September), so now I have got some spare time before I start working on my Bachelor Thesis tomorrow and I want to use it to wrap up a few things.

During this term I took part in a course that was both a (research) project/presentation/lecture thing, which was fun but also a lot of work.
I’ve already written about one mathematical aspect of it in my post about Analysis, Cauchy-Schwarz and Reciprocal Sums.

The project was about optimizing semi-conductor wiring placement. We wrote a small paper about our findings and the work it was based one – you can look download it here.

We also created a self-running presentation that doesn’t contain any Maths at all but makes heavy use of Flash animations (which were exported to .gif manually, which was a huge pain in the ass, which I will never do again if possible) to visualize all the concepts and algorithms.

You can download a PowerPoint 2007 (.pptx) version here or one that works with PowerPoint 2003 here.

For the Student’s I sat down and wrote a small Flash application to show the algorithms at work. It’s not obvious how it works, so let me explain the major points:

  • On the right you have a number of panels that you can enlarge by clicking on the small button in the upper right of each panel.
  • The upper three panels show different views of the same dataset. They will all be updated as you run the algorithm step by step.
  • The fourth panel lets you change the number of wires and/or their activity.
  • The last panel shows the electrical field that is created by one active wire in the center. It was created using MatLAB. I’ve also uploaded the script here.

Abitag

Last but not least I’ve also uploaded the current version of all my .fla and .as files. You can download it here.

ActionScript is a nice language and you can quickly learn it using the available resources from Adobe.
While ActionScript 2.0 is arguably weird, ActionScript 3.0 is quite logical and it’s syntax is straight-forward and consistent, too. You can’t say that about the IDE (Flash CS4), which is braindead, but if you’re only interested in writing ActionScript code, FlashDevelop is an excellent and free alternative.

This is it for now, maybe I’ll play around with Flash some more another time.
Cheers,
Andreas

Tags: , , , ,

18 Jul 09 From Vertex Positions to Packed Element Arrays

At my new workplace at university I’m currently porting an advanced terrain rendering engine from DirectX to OpenGL.
One of the performance optimizations the engine uses is that it draws the terrain tiles right from the index buffer without using a vertex buffer at all – that is it packs the vertex position into the 32-bit index and unpacks it in the vertex shader.

Why is this faster than rendering using a vertex buffer and no index buffer?

When using an index buffer the graphics card can make use of a cache of already transformed (vertex-shaded) vertices and when an index is reused, it can use the cached result instead of running the vertex shader again. Of course, this only works if there exists a certain temporal locality, but that is given.
If no index buffer is used, the vertex cache won’t be used, because the implicit index is different for each vertex.

Since I need to port the engine from DirectX to OpenGL, I did some research to see if it is possible to do the same in OpenGL.
It’s not really possible but you can achieve something quite similar in OpenGL 3.0.

Using the Input-Assembler Stage without Buffers (OpenGL)

This is meant as OpenGL analogon for Using the Input-Assembler Stage without Buffers (Direct3D 10).

I think the title is self-explanatory but for greater clarity let me rephrase it:
The aim is to render something without using vertex or index buffers, that is (in OpenGL speak) using neither vertex data nor an elements array to render something.
Instead the automatically supplied gl_VertexID attribute  (vertexId in DirectX) is used to determine the vertex the shader is currently processing.

The example in MSDN simply draws a triangle using vertexId:

VSOut VSmain(VSIn input)
{
VSOut output;

if (input.vertexId == 0)
output.pos = float4(0.0, 0.5, 0.5, 1.0);
else if (input.vertexId == 2)
output.pos = float4(0.5, -0.5, 0.5, 1.0);
else if (input.vertexId == 1)
output.pos = float4(-0.5, -0.5, 0.5, 1.0);

output.color = clamp(output.pos, 0, 1);

return output;
}

If you want to do the same thing in OpenGL, you have at least two problems:

  • OpenGL uses glVertex* as end marker for the specification of a single vertex and the array rendering commands (glDrawArrays, glDrawElements, etc.) all require an enabled vertex array.
  • gl_VertexID is only supplied if:
    • the vertex comes from a vertex array command that specifies a complete primitive (e.g. DrawArrays, DrawElements)
    • all enabled vertex arrays have non-zero buffer object bindings, and
    • the vertex does not come from a display list, even if the display list was compiled using DrawArrays / DrawElements with data sourced from buffer objects.

    (from GL_EXT_gpu_shader_4)

There is no way around these requirements, but what you can do is to create dummy vertex buffer with one element, bind it as vertex array and simply draw as many vertices as you want. If you don’t access gl_Vertex there is no way that uninitialized data can affect the shader and although behavior is generally undefined in OpenGL, if you render beyond the vertex buffer size, it has worked so far that I’ve test this on.

Drawing a circle without accessing gl_Vertex

Drawing a circle without accessing gl_Vertex

You can download the source code here.

Packing Vertex Positions into the Elements Array

The next step is to start packing and unpacking data in the gl_VertexID. For this an integer type and bit operations (shifting and masking at least) are required in the vertex shader, so it requires GLSL 1.30 at least.

The code is quite short from my proof of concept project, so I’m pasting the shader here:

#version 130
#extension GL_EXT_gpu_shader4 : enable

out vec4 color;

vec3 unpackVertex(int index) {
	return vec3( index & 0xFF, (index >> 8 ) & 0xFF, (index >> 16) & 0xFF ) * (2 / 255.0) - vec3(1.0);
}

void main()
{
	vec3 unpackedData = unpackVertex( gl_VertexID );
	gl_Position = vec4( unpackedData, 1.0 );
	color = vec4( (unpackedData + 1.0) / 2.0, 1.0 );
}

In main.cpp the equivalent can be found for setting up the elements array:

#define packFloat(v)	(int(((v) + 1.0) / 2 * 255) & 255)
#define packVertex(x,y,z) (packFloat( x ) + (packFloat( y ) << 8 ) + (packFloat( z ) << 16))

void display(void) {
[...]

unsigned indices[] = {
packVertex( 0.0, 0.0, -1.0 ), packVertex( 1.0, 0.0, -1.0 ), packVertex( 1.0, 1.0, -1.0 ),
packVertex( 0.0, 0.0, -1.0 ), packVertex( -1.0, 0.0, -1.0 ), packVertex( -1.0, -1.0, -1.0 )
};
glDrawElements( GL_TRIANGLES, sizeof( indices ) / sizeof( *indices ), GL_UNSIGNED_INT, indices );

[...]
}

For the code to actually make sense I should initialize an index buffer/elements array buffer in OpenGL and upload the indices into it but this was just for testing.

Drawing two triangles by packing their position into the index buffer

Drawing two triangles by packing their position into the index buffer

You can download the source code here.

PS: Wordpress is incredibly annoying – or rather this stupid SyntaxHighlighter Evolved plugin >_<

Tags: , ,

21 Jun 09 Extending Java and Javac

Today I want to write about something I’ve been working ages ago – specifically in March I wanted to see if I can extend a Java compiler to support LINQ´ expressions, too.

I probably spend more time on finding a good open-source compiler to experiment with than I later spent on trying things out, so let me share my preferred source with you: http://openjdk.java.net/ is a good address to start with.
More specifically http://openjdk.java.net/groups/compiler/ contains some valuable information about the way the compiler works.
A nice thing is that there is a branch that has added support for ANTLR which makes added language a tad bit easier since you get to change a grammar file instead of tweaking hand-written lexers and parsers. More info about it can be found at http://openjdk.java.net/projects/compiler-grammar/.
You can download the source code from http://hg.openjdk.java.net/ – don’t follow the link to http://hg.openjdk.java.net/compiler-grammar/compiler-grammar, that one will only allow you to download part of the branch´.

I didn’t come around to add support for LINQ in the end, but to get known to the compiler and the ANTLR grammer, I added support for the var keyword as known from C#, which allows for automatic type deduction and for anonymous objects (again using the C# syntax). Thus my changes allowed for the following to compile and execute correctly:

public class Test {
	public  static void main(String[] args) {
		// automatic type deduction
		var t = Math.atan(1);
		System.out.println( t );

		// anonymous type
		var i = new { Amount = 108, message = "hello" };
		System.out.println( i.Amount );
	}
}

(more…)

Tags: , , , , ,

26 May 09 \epsilon Induction

Induction over natural numbers is a standard tool in Mathematics. But what about doing induction over real numbers´? The structure is different and it’s not immediately clear what is meant, so let me clarify it.

Let P(v) be a proposition that we want to show for all  v \in \mathbb{R}^+_0 .

Then we can use the following “induction principle” to prove it:

 \exists \epsilon \in \mathbb{R}^+
 \text{a)} \forall u \in \left [0,\epsilon \right ]: P(u)
 \text{b)} \left( \forall u \in \left  [0, v - \epsilon \right ]: P(u) \right ) \Rightarrow P(v) ´

It’s easy to show that this is a correct way to prove it. I’ll use induction over natural numbers to prove it :-)

Let  I_k := \left [k \cdot \frac \epsilon 2, (k+1) \cdot \frac \epsilon 2 \right ] .
Then let’s show by induction over  k that  \text{(*)} \forall u \in I_k: P(u) :

Base:

From a) it follows that (*) already holds for I_0 and I_1.

Induction Step ( k \ge 2 ):

If (*) holds for  I_0, ..., I_{k-1}, then it obviously holds for all  u \in \left [ 0, k \cdot \frac \epsilon 2 \right ].

Fix  v \in I_k , then  v \le (k+1) \cdot \frac \epsilon 2 \Leftrightarrow v - \epsilon \le (k-2) \cdot \frac \epsilon 2 < k \cdot \frac \epsilon 2 \right . With b) it follows that P(v) holds.

That is,  P(v)  \forall v \in I_k .

#

It should be possible to show that you can generalize this to:

 \exists \epsilon \in \mathbb{R}^+
 \text{a)} \forall u \in \left [0,\epsilon \right ]: P(u)
 \text{b)} \left( \forall u \in \left  [\psi(v), \phi( v ) \right ]: P(u) \right ) \Rightarrow P(v)
with  \phi( v ):  \mathbb{R}^+ \to \mathbb{R}^+_0 ,  \psi( v ): \mathbb{R}^+ \to \mathbb{R}^+_0 and  \psi(v) \le \phi(v) < v .

Tags: ,

07 May 09 Analysis, Cauchy-Schwarz and Reciprocal Sums

Konkrete Analysis

First I want to share some solutions for the exercises of a Maths book. The book is called Konkrete Analysis by Folkmar Bornemann and is written in German as are my solutions for some of the exercises. (I think I cover about 70% of all exercises in the book and pretty much every easy one :-) )

I don’t claim that my solutions are correct and there are probably quite a few (uncorrected) mistakes, but right now I haven’t been able to find any other openly available solutions. Although the book claims to be easily readable without attending the lectures of Professor Bornemann, I doubt that it is possible to do so successfully without being able to do the exercises and while doing so being able to get help or look at possible solutions to find new ideas.

I did all the exercises as way to prepare myself for the exam (it was very successful alas probably not in the most time efficient way), so it also contains solutions for old exams (but those are appended at the end and probably not interesting at all).

You can download the PDF konkrete-analysis-solutions here.

Cauchy-Schwarz and a Problem

The book is quite useful though as was the lecture “Analysis for Computer Scientists” which was based on the book (or vice-versa) and it is probably the single one lecture that has taught most since I started attending Computer Science at university.

One of the many topics (we covered a lot which was one of the nice things) was inequalities and the useful things you can do with them. Especially the Cauchy-Schwarz inequality turns out to be quite useful while pretty basic:

 <a,b> \leq ||a|| \cdot ||b|| with  <a,b> = || a || \cdot || b || when  \exists \lambda: a = \lambda b

Interestingly enough this can already yield results that are quite difficult to obtain otherwise. For example we can easily prove lower or upper bounds that is the existence of a maximum or minimum by applying it.

First we obtain a lower or upper bound (for the right or the left side respectively) and then we can use the equality case to prove the existence of the a minium or maximum.

For example, let’s take a look at this problem:

We have  x \in \mathbb{R}^n, 0 < x, \sum_{i=1}^n {x_i} = 1 and want to minimize  \sum_{i=1}^n { \frac 1 x } .
We simply set  a_i := \sqrt{x_i}, b_i := \sqrt{ \frac 1 x_i } .

Then we get:

 \sum_{i=1}^n { a_i b_i } = \sum_{i=1}^n { \sqrt{ x_i } \frac 1 {\sqrt{ x_i }} } = \sum_{i=1}^n 1 = n
 \leq \sqrt{ \sum_{i=1}^n \sqrt{x_i}^2 } \sqrt{ \sum_{i=1}^n \frac 1 {\sqrt{x_i}^2} } = \sqrt { \sum_{i=1}^n x_i } \sqrt{\sum_{i=1}^n \frac 1 x_i } = 1 \cdot \sqrt{\sum_{i=1}^n \frac 1 x_i }

That is:  \sum_{i=1}^n \frac 1 x_i \geq n^2 .

Now we know an lower bound for the reciprocal sum.
To determine the minimal x vector, we remember to start with:

 a = \lamda b \Leftrightarrow \sqrt x_i = \lambda \frac 1 \x_i \Leftrightarrow x_i = \lambda

We use that with the constraint:  1 = \sum_{i=1}^n x_i = \sum_{i=1}^n \lambda = n \cdot \lambda ,
resulting in:  x_i = \lambda = \frac 1 n .

Generalization

The nice thing is you can go and generalize this finding to more interesting minimization problems as there also exist more advanced versions of the Cauchy-Schwarz inequality´ adding degrees of freedom to play around with´.

Thus it is possible to solve the following class of problems using a slightly more advanced Cauchy-Schwarz inequality:

 x,q \in \mathbb{R}^n; 0 < x, q
0 < \alpha, \gamma < \infty; \beta > 0
\large\sum_{i=1}^n {x_i}^\gamma = \beta
\large min \sum_{i=1}^n \frac {q_i} {x_i^\alpha}

by utilizing

\large \sum_{i=1}^n { a_i b_i } \le \sqrt[p]{ \sum_{i=1}^n a_i^p} \sqrt[p']{ \sum_{i=1}^n b_i^p'} with  \frac 1 p + \frac 1 {p'} = 1

Equality requires the same linear dependence condition as the normal Cauchy-Schwarz inequation.

I’ve only deduced the solution for  \gamma = 1 but it shouldn’t be difficult to adapt it once you get the idea how it works (it’s fairly straight forward).

You can find the whole deduction in the PDF “Minimum of a Generalized Reciprocal Sum”.

For completeness’ sake here are the results for an optimal solution  x^{*} :

\large x_i^{*} = \beta \cdot \frac { \sqrt[1 + \alpha]{q_i} }{\sum_{k=1}^n \sqrt[1 + \alpha]{q_k}}

\large \frac { \left (\sum_{i=1}^n \sqrt[1 + \alpha]{q_i} \right )^{1 + \alpha}} {\beta^\alpha} = \sum_{i=1}^n \frac {q_i} {x_i^{*}^\alpha}

Epilog

Coming up with the solution was actually a lot of fun and quite interesting because I didn’t know it was possible to deduce it this way nor did I know how to do it before. If you read the PDF, I think it’s quite nice how you can play around with the formulas and come up with new things from the Cauchy-Schwarz inequality.
If you are only interested in  x ^ {*} , you can shortcut everything by immediately skipping to the equality case once  p and  p' have been determined and solve for that.
I mainly wrote it all out because I wanted to make sure, that it actually was correct.

If it wasn’t for my Analysis lecture last term, I probably wouldn’t have found this.

Cheers,
Andreas

Tags: , , , ,

05 May 09 The Reverse Pigeonhole Principle

Everybody (I hope) knows about the pigeonhole principle.

In short it states that:

If  f: N \to M and  n:= \left | N \right | \leq \left | M \right | =: m , then there exists at least one  m_0 with  \left | f^{-1}\left ( m_0 \right ) \right | \geq \left \lceil \frac {n} {m} \right \rceil  .

For example, if we have n + 1 pigeons in only n pigeonholes, there is (at least) one pigeonhole that has at least two pigeons inside (thus the name).

One thing I haven’t seen mentioned anywhere yet (at least not with this name) is the reverse pigeonhole principle:

If  f: N \to M and  n  := \left | N \right | \leq \left | M \right |  =: m , then there exists at least one  m_0 \in M with  \left | f^{-1}\left ( m_0 \right ) \right | \leq \left \lfloor \frac {n} {m} \right \rfloor .

Again an example: if we have n + 1 pigeons (it’s valid up to 2n – 1) and n pigeonholes, then there exists (at least one) pigeonhole with at most one pigeon in it.

The proof for the reverse principle is simple:

Assume that for all  m_0 \in M :  \left | f^{-1}\left ( m_0 \right ) \right | > \left \lfloor \frac {n} {m} \right \rfloor, that is each such set contains at least  \left \lfloor \frac {n} {m} \right \rfloor + 1 elements, then from   \frac n m - 1 < \left \lfloor \frac {n} {m} \right \rfloor it follows that:

 n = \sum_{m_0 \in M} { \left | f^{-1} \left ( m_0 \right ) \right | \geq m \left ( \left \lfloor \frac {n} {m} \right \rfloor + 1 \right ) > m \cdot \frac n m = n  } .

This clearly is a contradiction, so the original assumption must have been wrong, which proves the reverse pigeonhole principle.

I haven’t seen this written down anywhere before and it’s useful to keep it in my mind on its own because it shortens some arguments, so I hope this might help someone :-)

On other news my LaTeX script is severly broken and I don’t know why :-|
For some reason it sometimes merges two LaTeX pieces into one because it forgets to echo the last ” after the title (which is impossible, of course..).
Cheers,
Andreas

Tags: ,

31 Jan 09 Seminar about Motion Retargeting

creatureanimationwbTwo weeks ago I had to give a presentation about Motion Retargeting, which I want to share with you now.
I created it due to me attending a seminar about the latest developments in Computer Graphics at university and my presentation was about the Siggraph ‘08 paper “Real-time Motion Retargeting to Highly Varied User-Created Morphologies” from Chris Hecker et al.
You can check it out on Chris Hecker’s homepage – his website also contains a bunch of other really cool articles and presentations from various conferences, so it certainly is worth taking a look at it.

I’ve also sifted through quite a lot of IK papers and lectures for my presentation to understand the later part about the IK solver in Spore and I’ve found a few links that are a nice read:

Ive created a huge PowerPoint presentation for my seminar  :-) It includes a few videos (thanks again to Chris Hecker for uploading them and replying to my emails incredibly fast ´) and two awesome IK solvers that I’ve implemented with VBA macros´ to show how CCD and Particle IK solvers work.

You can find a zip with all the videos and high resolution images here (includes both a PPT 2003 file and a 2007 file). I’ve also uploaded a small version without videos, macros as PPT 2003 file here, if you don’t feel like downloading the 23 MB .zip file.

Here’s a YouTube video of the two IK solvers:

http://www.vimeo.com/3021779

I’ve exported the code into an additional IK Playground presentation which contains just one slide and the two IK solvers with the setup you see in the video above. You can find the PPT 2003 version of it here and the 2007 one here.

I’ve zipped up the macros here if you want to use them in your own slides. I’ve also written a handy VBA form that allows one to edit everything more easily (the macros are hardly documented though, but if anyone really wants to use them and runs into problems – you can always drop me a line or two in a comment :) )

The IK Solver Tool Form

The IK Solver Tool Form

BTW I’m not sure you know about it´, but Blender contains an awesome video editor – the UI needs some time to get used to, but the online documentation has improved a lot and with it, it works like a charm. Blender also supports some pretty professional filters, so it’s going to be my video editing tool of choice from now on.

Cheers,
Andreas

Tags: , , , , , , ,