9 September 2009

Things wot I dun wrong, or how I learned to love compiler development

The Delve typechecker is nearly done, a patch should be pushed by the end of the week. It infers interfaces of objects instead of their concrete types, and so it acts like a strongly typed Smalltalk or Ruby, which was the plan.

Anyone who has looked at the Delve repository might have been struck by the absence of automatic unit testing available. This won't be a problem for the typechecker - I can think up loads of expressions which should typecheck, and loads which shouldn't - then I simply have to look at the results. The rest of the compiler/VM is not so simple.

For instance, how is one to automatically check that tail-call optimization is working properly? We can look at the internals of the virtual machine, but this is a bad idea - I initially debugged the virtual machine with unit tests written in the target VM language - which can take a hundred lines to do something that takes a few in the source language Delve. Since the machine is very fluid, every time I get a better idea then the machine changes. Having hundreds of lines of test code which have to change when the machine changes is too much work, especially considering I'm the only person currently hacking on Delve :)

To solve this problem, I'm gonna add unit tests for tail recursion, coroutines using continuations, mutually recursive routines which I can be sure allocate a large, fixed quantity of memory every recursion. Then I'd like to use POSIX resource management to see if the limit is exceeded.

However, this is POSIX only, and will also require statistical analysis to ensure it works correctly across many systems.

I need a better plan... :)

Also I had made a bit of a silly mistake. See, the way the Delve in the public repository works is:
String is parsed into Core
Core is VARAPPED into VarCore - a version of core without nested expressions - temporary variables overcome this.
VarCore is RESOLVED into LocalCore - like var core, but local variables are associated with a stack index - since the VM records a stack of local scopes, it's better to pre-compute which local scope is associated with each variable.
LocalCore is then compiled into VM code.

However, I realized that the RESOLUTION into local core should occur before type checking, and also since VARAPPING a legal program should result in another legal program, that type checking should occur before VARAPP.

Soo the process is now

String is parsed into Core
Core is RESOLVED into LocalCore
LocalCore is type-checked.
LocalCore is VARAPPED into VarCore
VarCore is compiled to DCode.

This is requiring much of the compiler to be changed to support different data-types - an advantage of the small codebase (only now 6.2KSLOC) is that this should only really take an hour or three. Of course, this is poor software engineering - several solutions spring to mind:

I was tempted to define an abstract form of core, AbstractCore, which would have kind * -> *. So it would be AbstractCore md, where md is associated MetaData. Compilation would proceed by transforming the metadata.

This has the disadvantage of errors like nested expression occuring in the syntax tree after the varapp phase not being caught at compile time - I would have to check they would be caught at run time.

Or I could associate each element in the tree with an integer, and just have an IntMap which maps the integer to associated MetaData. I think this would be easier to manage - since the metadata would be separated from the Tree.

This still suffers from that disadvantage.

However, if the various syntax trees are queried through a type-class, then they could be used independently. Also, this could return various elements in a type-safe manner, for instance, before the VarApp phase, there could be a type class instance for function application, which with multiple parameters would associate the application with arbitrary, possibly nested expressions. The instance used in the VarApp phase would associate the application with variables only.

This bit of randomness has turned into an idea! I think I will try this. Firstly, I'm just gonna get it working again, though, in the old and simple style - changing it to support new abstract types is going to a be a little bit more work than simply renaming a few things, which is all I have to do now.

No comments:

Post a Comment