ANTLR Stupidity (Warning 209)

I’ve been playing around with a Java grammar for ANTLR that was supposed to work straight away, but it did not. Instead it produced very strange warnings and errors that made it look as if ANTLR only supports lexers with a lookahead of one character:

warning(209): ...: Multiple token rules can match input such as "'*'": STAR, STAREQ

while STAR only matches ‘*’ and STAREQ only matches ‘*=’. This is a huge w-t-f, especially if you have worked with ANTLR before without running into this. It also contradicts all the documentation you can find about ANTLR and its lexer rules.
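To see why the warning is bogus, here is a minimal hand-written sketch in Java of a lexer applying the usual longest-match rule to these two tokens. It is not ANTLR-generated code and not how ANTLR builds its lookahead DFAs; the class StarLexer and the OTHER token type are hypothetical. The only point is that a second character of lookahead is enough to tell ‘*’ from ‘*=’.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical hand-written lexer (not ANTLR-generated) showing that STAR ('*')
 * and STAREQ ('*=') are not ambiguous once the lexer may look one character
 * beyond the '*'. It applies the usual longest-match rule.
 */
public class StarLexer {

    // Token type names mirror the rule names from the warning message.
    static final String STAR = "STAR";
    static final String STAREQ = "STAREQ";
    static final String OTHER = "OTHER"; // hypothetical catch-all type

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '*') {
                // The second character of lookahead decides between the two rules.
                if (i + 1 < input.length() && input.charAt(i + 1) == '=') {
                    tokens.add(STAREQ);
                    i += 2;
                } else {
                    tokens.add(STAR);
                    i += 1;
                }
            } else {
                // Every other character is lumped together; irrelevant here.
                tokens.add(OTHER);
                i += 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a*=b*c"));
    }
}
```

Running main prints [OTHER, STAREQ, OTHER, STAR, OTHER]: with two characters of lookahead there is nothing for the two rules to fight over.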

I spent a considerable amount of time with Google trying to find out how to fix it. First I found lots of posts on the ANTLR mailing list [antlr-interest] from people who had the same issue, with no replies. Some ran into it after replacing character ranges with Unicode ranges (or rather a huge list of Unicode characters), which probably caused the problem in my grammar, too. Others found that ANTLR suddenly behaved as if it only had a one-character lookahead, but only once the grammar contained more than 300 lexer rules.

After searching for a long time and almost giving up on the mini-project I wanted to use ANTLR for, I found this post: http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html (which matches my problem more or less, with some additional insight), and someone had even replied (that someone being the maintainer of the ANTLR C runtime): http://www.antlr.org/pipermail/antlr-interest/2009-September/035955.html

If you are sure that the messages are not correct and the lexer rules are not ambiguous, then you probably need to increase the conversion timeout: -Xconversiontimeout 30000. If that does not work, then there is a conflict in your rules.
Jim

And that turned out to be the right advice, and the remedy for my problem and probably for those of lots of other people. However, none of the warnings or error messages I encountered mentioned that ANTLR’s internal processing had actually timed out, or that there was no ambiguity in the grammar itself…

This goes to show that any good tool like ANTLR can quickly degrade into a piece of crap and a major source of annoyance if its error and warning messages aren’t clear and helpful.
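Applying Jim’s suggestion just means passing the flag to the ANTLR 3 tool. Below is a minimal Java sketch that assembles the command line, assuming the grammar is processed through the org.antlr.Tool entry point of a complete ANTLR 3 jar. The jar name antlr-3.5.2-complete.jar and the grammar file Java.g are hypothetical placeholders, and 30000 ms is simply the value from the reply.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: build the command line that runs the ANTLR 3 tool with a larger
 * conversion timeout. Jar and grammar file names are hypothetical placeholders.
 */
public class RunAntlr {

    static List<String> antlrCommand(String antlrJar, String grammarFile, int timeoutMs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(antlrJar);
        cmd.add("org.antlr.Tool");            // ANTLR 3 command-line entry point
        cmd.add("-Xconversiontimeout");       // conversion timeout, in milliseconds
        cmd.add(Integer.toString(timeoutMs));
        cmd.add(grammarFile);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = antlrCommand("antlr-3.5.2-complete.jar", "Java.g", 30000);
        System.out.println(String.join(" ", cmd));
        // Uncomment to actually run ANTLR (requires the jar and grammar to exist):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // System.exit(p.waitFor());
    }
}
```

The same flag can of course be typed directly on the command line; wrapping it like this only helps if the grammar is regenerated from a build script.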

On further investigation, I found that you can trigger messages saying that the conversion timed out:

internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279):
    could not even do k=1 for decision 121; reason: timed out (>1ms)

but not consistently. I guess this is a bug - either in ANTLR or in ANTLRWorks… :-|
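Since the timeout only shows up as this internal error (and not every time), one workaround is to scan the tool’s output for it yourself, so a timeout is not mistaken for a genuine ambiguity. A rough Java sketch, assuming the message keeps exactly the format shown above; the class TimeoutScanner and its regular expression are hypothetical and were written against this single message only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: find "timed out" internal errors in ANTLR's console output.
 * The pattern only covers the message format quoted above and may need
 * adjusting for other ANTLR versions.
 */
public class TimeoutScanner {

    // Matches e.g. "for decision 121; reason: timed out (>1ms)"
    private static final Pattern TIMEOUT = Pattern.compile(
            "for decision (\\d+); reason: timed out \\(>\\d+ms\\)");

    /** Returns the numbers of all decisions whose conversion timed out. */
    static List<Integer> timedOutDecisions(String antlrOutput) {
        List<Integer> decisions = new ArrayList<>();
        Matcher m = TIMEOUT.matcher(antlrOutput);
        while (m.find()) {
            decisions.add(Integer.parseInt(m.group(1)));
        }
        return decisions;
    }

    public static void main(String[] args) {
        String log = "internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279):\n"
                + "    could not even do k=1 for decision 121; reason: timed out (>1ms)\n";
        System.out.println(timedOutDecisions(log));
    }
}
```

For the log above this prints [121]. An empty list does not prove there was no timeout, given how inconsistently the message appears.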