<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BlackHC's Adventures in the Dev World &#187; ANTLR</title>
	<atom:link href="http://blog.blackhc.net/tag/antlr/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.blackhc.net</link>
	<description>Just another weblog</description>
	<lastBuildDate>Wed, 16 Nov 2011 23:12:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>ANTLR Stupidity (Warning 209)</title>
		<link>http://blog.blackhc.net/2010/09/antlr-stupidity/</link>
		<comments>http://blog.blackhc.net/2010/09/antlr-stupidity/#comments</comments>
		<pubDate>Mon, 06 Sep 2010 11:16:48 +0000</pubDate>
		<dc:creator>BlackHC</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Personal Rantings]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[University]]></category>
		<category><![CDATA[ANTLR]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[conversiontimeout]]></category>
		<category><![CDATA[fix]]></category>
		<category><![CDATA[warning 209]]></category>

		<guid isPermaLink="false">http://blog.blackhc.net/?p=829</guid>
		<description><![CDATA[<a href="http://blog.blackhc.net/2010/09/antlr-stupidity/" title="ANTLR Stupidity (Warning 209)"></a>I've been playing around with a Java grammar for ANTLR that was supposed to work straight away, but it didn't: it produced very strange warnings and errors that made it look as if ANTLR only supported lexers with a lookahead of 1 &#8230;<p class="read-more"><a href="http://blog.blackhc.net/2010/09/antlr-stupidity/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://blog.blackhc.net/2010/09/antlr-stupidity/" title="ANTLR Stupidity (Warning 209)"></a><p>I've been playing around with a Java grammar for ANTLR that was supposed to work straight away, but it didn't: it produced very strange warnings and errors that made it look as if ANTLR only supported lexers with a lookahead of one character:</p>
<pre>warning(209): ...: Multiple token rules can match input such as "'*'": STAR, STAREQ</pre>
<p>even though STAR matches only '*' and STAREQ matches only '*='. This is a huge WTF, especially if you have worked with ANTLR before without running into this. It also contradicts all the documentation you can find about ANTLR and its lexer rules.</p>
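<p>To see why this warning is bogus, here's a minimal hand-written sketch of the longest-match rule that lexers normally follow. It is not ANTLR's generated code, and the class name and the extra EQ token are made up for illustration. The lexer peeks at the character after a '*', so it can always tell STAR and STAREQ apart. Two characters of lookahead are therefore enough, and the two rules are not ambiguous.</p>
<pre class="brush: java; title: ; notranslate">
// Hypothetical hand-written lexer for '*', '*=' and '=' (not ANTLR output).
final class MaximalMunchLexer {
    /** Returns the space-separated token names for the given input. */
    static String tokenize(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i != input.length()) {
            char c = input.charAt(i);
            if (c == ' ') {
                i += 1;                        // skip whitespace
            } else if (input.startsWith("*=", i)) {
                out.append("STAREQ ");         // longest match: '*' followed by '='
                i += 2;
            } else if (c == '*') {
                out.append("STAR ");           // a '*' that is not followed by '='
                i += 1;
            } else if (c == '=') {
                out.append("EQ ");
                i += 1;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("**= * ="));   // prints "STAR STAREQ STAR EQ"
    }
}
</pre>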
<p>I spent a considerable amount of time on Google trying to find out how to fix it. First, I found lots of posts on the ANTLR mailing list [antlr-interest] from people <a href="http://www.mail-archive.com/il-antlr-interest@googlegroups.com/msg04183.html" target="_blank">who had the same issue and got no replies (really helpful, eh?)</a>. <a href="http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html" target="_blank">People had issues with replacing character ranges with Unicode ranges (or rather a huge list of Unicode characters)</a>, which probably caused the problem in my grammar, too. <a href="http://groups.google.com.pe/group/il-antlr-interest/browse_thread/thread/2a126c02758d6693" target="_blank">Others found that ANTLR suddenly behaved as if it only had a one-character lookahead, but only if more than 300 lexer rules were used in the grammar</a>.</p>
<p>After searching for a long time and almost giving up on the mini-project I wanted to use ANTLR for, I found this post: <a href="http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html" target="_blank">http://www.antlr.org/pipermail/antlr-interest/2009-September/035954.html</a> (which matches my problem more or less, but with additional insight)<br />
and someone even replied (that someone being the guy who maintains ANTLR's C runtime):<br />
<a href="http://www.antlr.org/pipermail/antlr-interest/2009-September/035955.html" target="_blank">http://www.antlr.org/pipermail/antlr-interest/2009-September/035955.html</a></p>
<blockquote><p>If you are sure that the messages are not correct and the lexer rules<br />
are not ambiguous, then you probably need to increase the conversion<br />
timeout:</p>
<p>-Xconversiontimeout 30000</p>
<p>if that does not work, then there is a conflict in your rules.</p>
<p>Jim</p></blockquote>
<p>And that turned out to be the right advice and the remedy for my problems, and probably for those of lots of other people, too.<br />
However, none of the warnings or error messages I encountered mentioned that ANTLR's internal processing had actually timed out, or that there was no ambiguity in the grammar itself...</p>
<p>This goes to show that even a good tool like ANTLR can quickly degrade into a piece of crap and a major source of annoyance if its error and warning messages aren't clear and helpful.</p>
<p>On further investigation, I found that you can get ANTLR to report that the conversion timed out:</p>
<pre>internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279):
    could not even do k=1 for decision 121; reason: timed out (&gt;1ms)</pre>
<p>but not consistently. I guess this is a bug - either in ANTLR or in ANTLRWorks... <img src='http://blog.blackhc.net/wp-includes/images/smilies/icon_neutral.gif' alt=':-|' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.blackhc.net/2010/09/antlr-stupidity/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Extending Java and Javac</title>
		<link>http://blog.blackhc.net/2009/06/extending-java-and-javac/</link>
		<comments>http://blog.blackhc.net/2009/06/extending-java-and-javac/#comments</comments>
		<pubDate>Sun, 21 Jun 2009 00:07:19 +0000</pubDate>
		<dc:creator>BlackHC</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[University]]></category>
		<category><![CDATA[ANTLR]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Compiler]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[JavaC]]></category>
		<category><![CDATA[OpenJDK]]></category>

		<guid isPermaLink="false">http://blog.blackhc.net/?p=488</guid>
		<description><![CDATA[<a href="http://blog.blackhc.net/2009/06/extending-java-and-javac/" title="Extending Java and Javac"></a>Today I want to write about something I worked on ages ago: back in March, I wanted to see if I could extend a Java compiler to support LINQ expressions, too. I probably spent more time finding a &#8230;<p class="read-more"><a href="http://blog.blackhc.net/2009/06/extending-java-and-javac/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://blog.blackhc.net/2009/06/extending-java-and-javac/" title="Extending Java and Javac"></a><p>Today I want to write about something I worked on ages ago: back in March, I wanted to see if I could extend a Java compiler to support <a href="http://en.wikipedia.org/wiki/Language_Integrated_Query" target="_blank" >LINQ</a><a class="annotation" title="Language Integrated Query" href="javascript:;"><strong>&#180;</strong></a> expressions, too.</p>
<p>I probably spent more time finding a good open-source compiler to experiment with than I later spent trying things out, so let me share my preferred source with you: <a href="http://openjdk.java.net/">http://openjdk.java.net/</a> is a good place to start.<br />
More specifically, <a href="http://openjdk.java.net/groups/compiler/">http://openjdk.java.net/groups/compiler/</a> contains some valuable information about how the compiler works.<br />
A nice thing is that there is a branch that adds support for <a href="http://www.antlr.org/">ANTLR</a>, which makes adding language features a tad easier, since you get to change a grammar file instead of tweaking hand-written lexers and parsers. More info about it can be found at <a href="http://openjdk.java.net/projects/compiler-grammar/">http://openjdk.java.net/projects/compiler-grammar/</a>.<br />
You can download the source code from <a href="http://hg.openjdk.java.net/">http://hg.openjdk.java.net/</a>. Don't follow the link to http://hg.openjdk.java.net/compiler-grammar/compiler-grammar, though; that one only lets you download part of the branch<a class="annotation" title="and nothing interesting either, which was very frustrating at the beginning" href="javascript:;"><strong>&#180;</strong></a>. </p>
<p>I didn't get around to adding support for LINQ in the end, but to get to know the compiler and the ANTLR grammar, I added support for the <strong>var</strong> keyword known from C#, which allows for automatic type deduction, and for <strong>anonymous objects</strong> (again using the C# syntax). With my changes, the following compiles and executes correctly:</p>
<pre class="brush: java; title: ; notranslate">public class Test {
	public static void main(String[] args) {
		// automatic type deduction
		var t = Math.atan(1);
		System.out.println( t );

		// anonymous type
		var i = new { Amount = 108, message = &quot;hello&quot; };
		System.out.println( i.Amount );
	}
}</pre>
<p><span id="more-488"></span></p>
<h3>Automatic Type Deduction</h3>
<p>Let's take a look at how I added support for the <strong>var</strong> keyword, which requires an initializer in the variable declaration and deduces the variable's type from it automatically.</p>
<p>On its own, this is not a feature I'd recommend using in general, because it obscures a variable's type<a class="annotation" title="obviously..." href="javascript:;"><strong>&#180;</strong></a> and makes the code harder to read and understand. In conjunction with LINQ and anonymous types, however, it is very useful, because you don't want to know the type of the query<a class="annotation" title="and you actually can't if the result is based on an anonymous type" href="javascript:;"><strong>&#180;</strong></a>.</p>
<p>The nice thing about ANTLR is that it's... <em>nice</em>.<br />
I'm going to post the relevant rule from my repository, before and after, to show how easy it was to add support for the <strong>var</strong> keyword in the grammar file:</p>
<p>It originally looked like this:</p>
<pre class="brush: java; title: ; notranslate">localVariableDeclaration returns [com.sun.tools.javac.util.List&lt;JCStatement&gt; list]
        @init {
            [...]
        }
        @after {
            [...]
        }
    :   variableModifiers type
            {
                mods = $variableModifiers.tree;
                type = $type.tree;
            }
        va1=variableDeclarator
            {
                JCExpression ntype = pu.makeTypeArray(type,$va1.i, $va1.dimPosition, $va1.endPosition);
                JCStatement ntype1 = T.at($va1.pos).VarDef(mods, $va1.name, ntype, $va1.tree);
                //pu.storeEnd(ntype1, $va1.stop);
                ptree = ntype1;
                listBuffer.append(ntype1);
            }
        (cm=',' va2=variableDeclarator
            {
                JCExpression ntype = pu.makeTypeArray(type,$va2.i, $va2.dimPosition, $va2.endPosition);
                JCStatement ntype1 = T.at($va2.pos).VarDef(mods, $va2.name, ntype, $va2.tree);
                pu.storeEnd(ptree, $cm);
                ptree = ntype1;
                listBuffer.append(ntype1);
            }
        )*
    ;
</pre>
<p>I changed it to:</p>
<pre class="brush: java; highlight: [12,13,14,15,16,17,21,22,23,31,32,33]; title: ; notranslate">
localVariableDeclaration returns [com.sun.tools.javac.util.List&lt;JCStatement&gt; list]
        @init {
            [...]
        }
        @after {
            [...]
        }
    :   variableModifiers
            {
                mods = $variableModifiers.tree;
            }
        (VAR |
         type
        	{
        		type = $type.tree;
        	}
        )

        va1=variableDeclarator
            {
                JCExpression ntype = null;
                if( type != null )
	                ntype = pu.makeTypeArray(type,$va1.i, $va1.dimPosition, $va1.endPosition);
                JCStatement ntype1 = T.at($va1.pos).VarDef(mods, $va1.name, ntype, $va1.tree);
                //pu.storeEnd(ntype1, $va1.stop);
                ptree = ntype1;
                listBuffer.append(ntype1);
            }
        (cm=',' va2=variableDeclarator
            {
                JCExpression ntype = null;
                if( type != null )
	                ntype = pu.makeTypeArray(type,$va2.i, $va2.dimPosition, $va2.endPosition);
                JCStatement ntype1 = T.at($va2.pos).VarDef(mods, $va2.name, ntype, $va2.tree);
                pu.storeEnd(ptree, $cm);
                ptree = ntype1;
                listBuffer.append(ntype1);
            }
        )*
    ;
</pre>
<p>The change makes a local variable declaration accept <strong>(VAR | type)</strong> instead of just <strong>type</strong>, and further down in the grammar the VAR token is added:</p>
<pre class="brush: java; light: true; title: ; notranslate">VAR	:	'var';</pre>
<p>A lookahead rule also needs to be adapted<a class="annotation" title="trial and error ftw..." href="javascript:;"><strong>&#180;</strong></a>:</p>
<pre class="brush: java; light: true; title: ; notranslate">localVariableHeader
    :   variableModifiers (type|VAR) IDENTIFIER ('['']')* ('='|','|';')
    ;</pre>
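<p>To illustrate what this lookahead rule checks, here's a toy re-implementation in plain Java. The class and method names are made up, and the grammar is simplified: tokens are plain strings, variableModifiers is reduced to <strong>final</strong>, and a type is a single identifier. The <strong>allowVar</strong> flag models the grammar with the new VAR token, either with or without the (type|VAR) alternative in localVariableHeader. Without that alternative, <strong>var x = 1;</strong> is no longer recognized as a declaration header, which is presumably why the rule had to be adapted.</p>
<pre class="brush: java; title: ; notranslate">
// Toy version of the ANTLR lookahead rule
//   localVariableHeader : variableModifiers (type|VAR) IDENTIFIER ('['']')* ('='|','|';') ;
// Simplifications (assumptions, not the real grammar): tokens are plain
// strings, the only modifier is "final", and a type is a single identifier.
final class LocalVarHeader {
    /** Token at position i, or "" once the input is exhausted. */
    private static String at(String[] tokens, int i) {
        return i == tokens.length ? "" : tokens[i];
    }

    private static boolean isIdentifier(String s) {
        if (s.isEmpty() || s.equals("var") || s.equals("final")) {
            return false;                       // keywords are not identifiers
        }
        if (!Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (char c : s.toCharArray()) {
            if (!Character.isJavaIdentifierPart(c)) {
                return false;
            }
        }
        return true;                            // primitive names like "int" pass, too
    }

    /** allowVar: whether the (type|VAR) alternative is part of the rule. */
    static boolean matches(String[] tokens, boolean allowVar) {
        int i = 0;
        while (at(tokens, i).equals("final")) { // variableModifiers
            i++;
        }
        String head = at(tokens, i);            // (type|VAR)
        if (head.equals("var")) {
            if (!allowVar) {
                return false;                   // VAR is a keyword token, not a type
            }
        } else if (!isIdentifier(head)) {
            return false;
        }
        i++;
        if (!isIdentifier(at(tokens, i))) {     // IDENTIFIER
            return false;
        }
        i++;
        while (at(tokens, i).equals("[")) {     // ('['']')*
            if (!at(tokens, i + 1).equals("]")) {
                return false;
            }
            i += 2;
        }
        String next = at(tokens, i);            // ('='|','|';')
        return next.equals("=") || next.equals(",") || next.equals(";");
    }
}
</pre>
<p>For the tokens of <strong>var x = 1;</strong>, matches returns true with allowVar set and false without it. An expression statement like <strong>x = 1;</strong> is rejected either way.</p>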
<p>The grammar doesn't enforce that a <strong>var</strong> variable has an initializer; that check is done later in the compiler code itself. It would be easy to add a flag to the variableDeclarator rule, but that would require even more changes to the grammar file.<br />
Now we only need to run ANTLR to regenerate the parser and lexer from the grammar, and we're done with this part.</p>
<p>The main change is in visitVarDef in MemberEnter (which completes the Enter stage; see <a href="http://openjdk.java.net/groups/compiler/doc/compilation-overview/index.html">http://openjdk.java.net/groups/compiler/doc/compilation-overview/index.html</a> for more info):</p>
<pre class="brush: java; highlight: [9,10,11,12,13,14,15,16,17,18,19,20,21,22]; title: ; notranslate">
public void visitVarDef(JCVariableDecl tree) {
        Env&lt;AttrContext&gt; localEnv = env;
        if ((tree.mods.flags &amp; STATIC) != 0 ||
            (env.info.scope.owner.flags() &amp; INTERFACE) != 0) {
            localEnv = env.dup(tree, env.info.dup());
            localEnv.info.staticLevel++;
        }
        // old: attr.attribType(tree.vartype, localEnv);
        // BlackHC: deduce the type from the initializer if the declaration uses var (no declared type)
        if( tree.vartype == null ) {
            if( tree.init != null ) {
                tree.vartype = make.Type(attr.attribExpr(tree.init, localEnv));
                tree.vartype.type = tree.init.type;
            }
            else {
                log.error(tree.pos, &quot;initializer.required.for.implicit.type&quot;);
                return;
            }
        }
        else {
            attr.attribType(tree.vartype, localEnv);
        }

        Scope enclScope = enter.enterScope(env);
        VarSymbol v =
            new VarSymbol(0, tree.name, tree.vartype.type, enclScope.owner);
        v.flags_field = chk.checkFlags(tree.pos(), tree.mods.flags, v, tree);
        tree.sym = v;
        if (tree.init != null) {
            v.flags_field |= HASINIT;
            if ((v.flags_field &amp; FINAL) != 0 &amp;&amp; tree.init.getTag() != JCTree.NEWCLASS) {
                Env&lt;AttrContext&gt; initEnv = getInitEnv(tree, env);
                initEnv.info.enclVar = v;
                v.setLazyConstValue(initEnv(tree, initEnv), log, attr, tree.init);
            }
        }
        if (chk.checkUnique(tree.pos(), v, enclScope)) {
            chk.checkTransparentVar(tree.pos(), v, enclScope);
            enclScope.enter(v);
        }
        annotateLater(tree.mods.annotations, localEnv, v);
        v.pos = tree.pos;
    }
</pre>
<p>The code simply attributes the initializer expression early (i.e. computes its type) and uses that type as the variable's type.<br />
Because of this, Attr's visitVarDef needs to be adapted so that it doesn't attribute the initializer a second time and recreate its type later<a class="annotation" title="this is actually a hack, but since it's prototype code, it's hopefully not that big an issue" href="javascript:;"><strong>&#180;</strong></a>:</p>
<pre class="brush: java; first-line: 735; title: ; notranslate">
                    // BlackHC: this if condition is a hack to keep anonymous objects, etc. from breaking &gt;_&lt;
                    if( tree.init.type != tree.vartype.type )
                        attribExpr(tree.init, initEnv, v.type);
</pre>
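<p>Without javac's infrastructure, the changes to MemberEnter and Attr boil down to the toy model below. All names are hypothetical stand-ins: Expr and VarDecl play the roles of JCExpression and JCVariableDecl, types are plain strings, and a missing initializer throws an exception instead of calling log.error. The attribution counter shows that a <strong>var</strong> initializer is attributed only once, while an explicitly typed one is still attributed in the Attr step.</p>
<pre class="brush: java; title: ; notranslate">
// Toy model of the two compiler changes (hypothetical names, not javac's API):
// enter() deduces a missing declared type from the initializer, and attr()
// skips re-attributing an initializer whose type already equals the variable's.
final class VarTypeDeduction {
    static final String MISSING_INIT =
        "initializer required for implicitly-typed variables";

    /** A toy expression whose type is known once it has been attributed. */
    static final class Expr {
        final String resultType;   // what attribution will compute
        String type;               // null until attributed
        int attribCount;           // how often attribution ran

        Expr(String resultType) {
            this.resultType = resultType;
        }

        String attrib() {
            attribCount++;
            type = resultType;
            return type;
        }
    }

    /** declaredType == null plays the role of 'var'. */
    static final class VarDecl {
        final String name;
        String declaredType;
        final Expr init;

        VarDecl(String name, String declaredType, Expr init) {
            this.name = name;
            this.declaredType = declaredType;
            this.init = init;
        }
    }

    /** Enter step: returns the variable's type; throws if var has no initializer. */
    static String enter(VarDecl d) {
        if (d.declaredType == null) {
            if (d.init == null) {
                throw new IllegalStateException(MISSING_INIT);
            }
            d.declaredType = d.init.attrib();   // deduce the type early
        }
        return d.declaredType;
    }

    /** Attr step, to be called after enter(); mirrors the hack's type comparison. */
    static void attr(VarDecl d) {
        if (d.init != null) {
            if (!d.declaredType.equals(d.init.type)) {
                d.init.attrib();                // javac would also check assignability here
            }
        }
    }
}
</pre>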
<p>Now only one more entry needs to be added to res/compiler.properties, the text of the error message shown when the initializer is missing, and we're done:</p>
<pre class="brush: plain; first-line: 475; title: ; notranslate">
compiler.err.initializer.required.for.implicit.type=\
    initializer required for implicitly-typed variables
</pre>
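<p>compiler.properties uses the standard .properties format. In that format, a value continues onto the next line only if the current line ends with a backslash, and the existing multi-line entries in the file are written that way. Here's a small self-contained check using java.util.Properties (the class name is made up). It shows what happens to the entry above with and without that backslash:</p>
<pre class="brush: java; title: ; notranslate">
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Shows how the .properties format treats a value that starts on the next line.
final class PropertiesContinuation {
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);   // not expected for a StringReader
        }
        return p;
    }

    public static void main(String[] args) {
        String key = "compiler.err.initializer.required.for.implicit.type";
        // Trailing backslash: the indented line continues the value.
        Properties with = parse(key + "=\\\n    initializer required for implicitly-typed variables\n");
        // No backslash: the entry ends at '=' and the next line is a separate entry.
        Properties without = parse(key + "=\n    initializer required for implicitly-typed variables\n");
        System.out.println("[" + with.getProperty(key) + "]");            // [initializer required for implicitly-typed variables]
        System.out.println("[" + without.getProperty(key) + "]");         // []
        System.out.println("[" + without.getProperty("initializer") + "]"); // [required for implicitly-typed variables]
    }
}
</pre>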
<p>I also added a line to com.sun.tools.javac.main.Main's compile function to display a custom string to make sure that the correct compiler is run, but that's just cosmetic.</p>
<p>With this, a new keyword has been added to the Java compiler by changing only a few lines. The compiler itself is not that straightforward to understand if you're not used to its design, but it's still amazing how easy this was.<br />
It took me about 15 hours at most to implement this feature. I spent 80% of that time looking through the code and grammar and figuring out how best to add the keyword and implement it.</p>
<p>If you want to test your compiler, you need a command line like the following (on Windows)<a class="annotation" title="or you can configure Eclipse accordingly.." href="javascript:;"><strong>&#180;</strong></a>:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
java -cp MyJavaC\bin;antlrworks-1.2.3.jar com.sun.tools.javac.Main Test\Test.java
</pre>
<h3>Anonymous Objects</h3>
<p>This was an even simpler feature to implement, and it did not require any further changes to the compiler code at all, only to the grammar.<br />
So far, <strong>var</strong> is only allowed for local variables, but that's just because we only changed the localVariableDeclaration rule.<br />
Adding anonymous objects (i.e. <strong>new { fieldName = initializer [, ...] }</strong>) is straightforward once you have automatic type deduction. If you think about it, it's obvious that such an expression is nothing but a rewrite of <strong>new Object() { public final var fieldName = initializer; [...] }</strong>.<br />
ANTLR shows its strength here:</p>
<pre class="brush: java; highlight: [17,18,19,20,21,22,23,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58]; title: ; notranslate">
creator returns [JCExpression tree]
        @init {
                [...]
        }
    :   'new' nonWildcardTypeArguments cr1=classOrInterfaceType cl1=classCreatorRest
            {
                [...]
            }
    |   'new' cr2=classOrInterfaceType cl2=classCreatorRest
            {
                createdName = $cr2.tree;
                args = $cl2.list;
                body = $cl2.tree;
                $tree = T.at(pos).NewClass(null, typeArgs, createdName, args, body);
                pu.storeEnd($tree, $cl2.stop);
            }
    // BlackHC: add C# anonymous types
    |	'new' '{' typebody=anonymousTypeBody b2='}'
		    {
		    	createdName = T.at(pos).Ident(names.fromString(&quot;Object&quot;));
		    	$tree = T.at(pos).NewClass(null, typeArgs, createdName, args, $typebody.tree);
		    	pu.storeEnd($tree, $b2);
		    }
    |   arrayCreator
            {
                $tree = $arrayCreator.tree;
            }
    ;

anonymousTypeBody returns [JCClassDecl tree]
	    @init {
            ListBuffer&lt;JCTree&gt; defs = new ListBuffer&lt;JCTree&gt;();
            JCTree ptree = null;
            String dc = ((AntlrJavacToken) $start).docComment;
        }
		@after {
 			JCModifiers mods = T.at(Position.NOPOS).Modifiers(0);
            $tree = T.at(((AntlrJavacToken) $start).getStartIndex()).AnonymousClassDef(mods, defs.toList());
            if (ptree != null) {
               	pu.storeEnd(ptree, $stop);
           	}
        }
    :	(va1=variableDeclarator
            {
                JCVariableDecl tree = T.at($va1.pos).VarDef(T.at(Position.NOPOS).Modifiers(Flags.PUBLIC | Flags.FINAL), $va1.name, null, $va1.tree);
                pu.attach(tree, dc);
                ptree = tree;
                defs.append(tree);
             }
        (cm=',' va2=variableDeclarator
            {
                JCVariableDecl tree = T.at($va2.pos).VarDef(T.at(Position.NOPOS).Modifiers(Flags.PUBLIC | Flags.FINAL), $va2.name, null, $va2.tree);
                pu.storeEnd(ptree, $cm);
                ptree = tree;
                pu.attach(tree, dc);
                defs.append(tree);
            }
        )*)?
	;
</pre>
<p>That's all that's needed. The code is mostly copy'n'pasted from other rules (classOrInterfaceType and classCreatorRest), so it wasn't really that difficult. ANTLR's long compile times were the only obstacle while writing it.</p>
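<p>For comparison, here's the rewrite of <strong>new { Amount = 108, message = "hello" }</strong> written out by hand in standard Java. It is a sketch with a made-up class name, and the field types int and String are filled in manually. Reading a field directly off the creation expression compiles, because that expression's static type is the anonymous class. Once the object is stored in a variable, though, plain Java has no name for that class, and that is exactly the gap <strong>var</strong> fills.</p>
<pre class="brush: java; title: ; notranslate">
// Standard-Java version of: var i = new { Amount = 108, message = "hello" };
final class AnonymousObjectByHand {
    static int amount() {
        return new Object() {
            public final int Amount = 108;
            public final String message = "hello";
        }.Amount;   // compiles: the static type here is the anonymous class
    }

    public static void main(String[] args) {
        System.out.println(amount());   // prints 108

        Object i = new Object() {
            public final int Amount = 108;
            public final String message = "hello";
        };
        // System.out.println(i.Amount);  // would not compile: i is just an Object
        System.out.println(i.getClass().getSuperclass().getName());   // prints java.lang.Object
    }
}
</pre>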
<h3>Try it</h3>
<p>I've uploaded my current sources (ready to compile and run); you can download the zip <a href="http://blog.blackhc.net/wp-content/uploads/2009/06/javaext_blog.zip">here</a>.<br />
Just execute compileAndRun.bat, and both my javac and the test should be compiled and run.</p>
<h3>What Else?</h3>
<p>That's it for today. Let me leave you with a few final thoughts, though:</p>
<ul>
<li>Hacking away on compiler code and grammar files is a lot of fun<a class="annotation" title="as soon as things start to run" href="javascript:;"><strong>&#180;</strong></a></li>
<li>It's not feasible for real projects, because you don't want to start questioning the correctness of the compiler you're using (I had that with QuakeC, and it wasn't fun at all), and chasing compiler bugs is terrible in general when you'd rather spend your time working on the project's actual code</li>
<li>I have started working on a preprocessor that would read in Java code with the extended syntax and emit normal Java 1.6 code.<br />
If you want to do something like this, you can probably find a grammar for your language on ANTLR's homepage; for Java, one is available at <a href="http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g" target="_blank">http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g</a>.</li>
<li>This approach is more difficult, though, because you suddenly lose the nice functionality that gives you an expression's type for free (which is non-trivial to code on your own once you consider imports, local classes, etc.).</li>
<li>Still, it's the best thing to do if you want to extend the language, and it's worth considering if you have a project whose code could greatly benefit from some additional language features, even ones that could easily be emulated using normal code<a class="annotation" title="it's just that emulating them takes a lot of code that barely changes, and you want to avoid copy&amp;paste lots of times" href="javascript:;"><strong>&#180;</strong></a>.</li>
<li>ANTLR is nice for quick prototyping, even though javac's hand-written parser and lexer are generally faster.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.blackhc.net/2009/06/extending-java-and-javac/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

