Al et WML

First, I have created some new pages regarding old university projects. Among them is a condensed page about light propagation volumes, which also made me update the project files to Visual Studio 2012, a page about my bachelor thesis in mathematics (Discrete Elastic Rods) and a page about my master thesis in computer science (Assisted Object Placement). I have not written about the latter two subjects before. Maybe I’ll talk some more about them later and write a full post-mortem on them.

This is the first of a number of posts that will be related to my master thesis, or rather code drops from its code base. I have written about 60k LoC in the 6 months of my master thesis and there are a few bits that might be useful in the future.

The first one that I want to talk about is a very simple file format I came up with. Devising new text file formats is not something that I have been very keen about lately. Especially not as some many already exist. However, I have found none which has really fit my requirements:

  • minimal clutter (preferably indentation-based),
  • support for raw text inclusion, and
  • good C++ support.

JSON has too much clutter and doesn’t support raw text. YAML, on the other hand, sounds like the perfect choice, even though it’s not that easy to find a good library for it. However, when it comes to raw text, you run into the issue that tab characters are never allowed as indentation. Moreover, I was not very happy with the API choices and some bugs in the libraries that I tried to use.

So I decided to develop a very simple text-based data storage format:

WML – Whitespace Markup Language

I’ll just start with an example, ie my readme. You can find the full readme here.

As you can see, it is a whitespace-based format inspired by Python. Like in Python, indentation is used to convey structure. A WML file represents a map structure: every node has a name and possibly multiple children.
A node definition in a WML file either contains the node name first and then multiple children names (which won’t have any children themselves), the node name followed by one colon to signify that a nested definition follows (similar to JSON), or the node name followed by two colons to signify a raw text block. Two different kinds of strings are supported: single-quoted raw strings that do not interpret escape sequences, and double-quoted C strings that support escape sequences.

All in all, the grammar is very simple:

I’ve used this format for custom shaders as well as for my settings files and the declaration files of my test scenes:

I’ve uploaded the current code for WML to GitHub, and you can find the code here. The API supports an overloaded index operator to access children of a node and contains both a parser and an emitter for WML.

Afterthoughts

First, the current API isn’t brilliant. It would be nice to separate the data model from the parser and emitter by using templates and type traits to improve abstraction. I think I might go and investigate different API types in the future and see which one works best for some simple cases.

Second, it would be possible to reduce clutter even more and remove the need for single colons to denote nested maps.
A even simpler format could look like this:

This would yield the following JSON-equivalent:

This still separates raw text from normal data. A node that contains raw text can never contain other children this way. However, I cannot think of a good way to accomplish that without introducing a special character to end a raw text block.

That’s it for now :)

Leave a Reply