ANTLR Stupidity (Warning 209)

I've been playing around with a Java grammar for ANTLR that was supposed to work straight-away but it did not, with very strange warnings and errors, that made it look like ANTLR only supports lexers with a lookahead of 1 character:

warning(209): ...: Multiple token rules can match input such as "'*'": STAR, STAREQ

while STAR matches only '*' and STAREQ only matches '*='. This is a huge w-t-f, especially if you have worked with ANTLR before and didn't have issues with this. This also contradicted all documentation you can find about ANTLR and its lexer rules.

I've spent a considerable amount of time with Google trying to find how to fix it. First I've found lots of posts on the ANTLR mailing list [antlr-interest] from people who had the same issue and no replies to them (really helpful, eh?). People had issues with replacing character ranges with unicode ranges (or rather a huge list of unicode characters), which probably caused the problem in my grammar, too. Others found that ANTLR suddenly behaved as if it only had a one character lookahead, but only if more than 300 lexer rules were used in the grammar.

After searching for a long time and almost giving up on the mini-project I've wanted to use ANTLR for, I've found this post: http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html (which matches my problem more or less but with additional insight)
and someone even replied (someone being the guy who maintains the C runtime of ANTLR):
http://www.antlr.org/pipermail/antlr-interest/2009-September/035955.html

If you are sure that the messages are not correct and the lexer rules
are not ambiguous, then you probably need to increase the conversion
timeout:

-Xconversiontimeout 30000

if that does not work, then there is a conflict in your rules.

Jim

And that turns out to be the right advice and the remedy to my problems and the problems of lots of other people probably.
However, no warning or error message I encountered mentioned that ANTLR's internal processing actually timed-out and there was no ambiguity in the grammar itself...

This comes to show that any good tool like ANTLR can quickly degrade to a piece of crap and a major source of annoyance, if error and warning messages aren't clear and helpful.

On further investigation, you can trigger warnings that the conversion times out:

internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279):
    could not even do k=1 for decision 121; reason: timed out (>1ms)

but not consistently. I guess this is a bug - either in ANTLR or in ANTLRWorks... :-|

  1. Hi. The grammar clearly states you should use that option; at least it used to. :) Did you look inside the grammar?
    Ter

    • Hello :o
      Yeah, you're right, in the middle of the comment section it says:

      * NOTE: If you try to compile this file from command line and Antlr gives an exception
      * like error message while compiling, add option
      * -Xconversiontimeout 100000
      * to the command line.
      * If it still doesn't work or the compilation process
      * takes too long, try to comment out the following two lines:
      * | {isValidSurrogateIdentifierStart((char)input.LT(1), (char)input.LT(2))}?=>('\ud800'..'\udbff') ('\udc00'..'\udfff')
      * | {isValidSurrogateIdentifierPart((char)input.LT(1), (char)input.LT(2))}?=>('\ud800'..'\udbff') ('\udc00'..'\udfff')

      I haven't seen the comment yesterday (it's right in the middle of the huge comment section at the top of the grammar and doesn't jump into the eye - that's only to my defense). However, I'm not sure, if that comment helps, when you have never heard about the fix before - it also only says to try it in the context of compiling the grammar from the command-line (again to my defense).

      I would suggest to move this comment up to the top - or, if it's an easy change, to make ANTLR print a warning if the conversion timeout is hit and suggest trying the command-line option.

      Otherwise kudos for creating such an useful tool :-)

      PS: Yesterday I've also found an unused lexer fragment rule in the grammar: SurrogateIdentifer at the bottom of the file.

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>