30 July 2010

Grammar

Refining Obelisk's Grammar yet more. Now there is no need for the type terminator #

So the function from last post would now be written.

// An accumulating implementation of the factorial function. Tail recursive.
(Int -> Int)
def factorial x
{
   (fact 1 x)
}
where
{
   (Int -> Int -> Int)
   def fact acc i
   {
      if (i > 0)
         // Recurse
         {(fact (i * acc) (i - 1))}
         // We've recursed enough.  Return.
         {acc}
   }
}

After so much Haskell Programming, I'm liking the syntax without significant whitespace a lot.

27 July 2010

Obelisk syntax

I decided to change Obelisk's syntax.

For example, to define a function, you used to do (still the case on the online repository):

// An accumulating implementation of the factorial function. Tail recursive.
((Int -> Int) #
def factorial x
   ((fact 1 x))
   where
   ((Int -> Int -> Int #
   def fact acc i
      ((if (i > 0)
         ((fact (i * acc) (i - 1)))
         (acc)))))) 



That was okay because the syntax is unambiguous and was not affected by whitespace.

However, I thought it would make it clearer whether what you were looking at is a function application or a code block if it were to use curly braces.

This was especially a problem with where clauses! Multiple closing parenthesis are very ugly.

So the above function will now be written as:

// An accumulating implementation of the factorial function. Tail recursive.
(Int -> Int) #
def factorial x
{
   (fact 1 x)
}
where
{
   (Int -> Int -> Int) #
   def fact acc i
   {
      if (i > 0)
         // Recurse
         {(fact (i * acc) (i - 1))}
         // We've recursed enough.  Return.
         {acc}
   }
}



It looks much tidier, and still doesn't care about whitespace, which is great.

Edit: I'm still not so happy with it. The if statement looks ugly. Perhaps ruby style do and end would look nicer.

Edit2: Fixed it, removed need for functional application parenthesis around if expression.

Language.c

Just came across this really cool haskell library for manipulating C

Language-C

snm

...The Simple Nice-Looking Manual Generator!

I became annoyed maintaining Obelisk's long report in xhtml, so I wrote a program to make writing documentation easy.

snm github
snm on hackage

snm allows you to write clean, web-friendly reports, user guides and manuals without having to edit fickle html.

snm allows you to structure your document in a modular fashion.

snm document sections are written in yaml and are easy to write and understand.

snm is a generator of small, valid xhtml files.

Read the snm manual online!

23 July 2010

Code generation and testing

Lexing, parsing and type-checking have this in common: they can succeed or they can fail.

It is very easy to write tests to confirm the correctness of these stages. Much more difficult is to write tests for code-generation.

Even if a formal method is given which proves the correctness of a code-generation pass, it must still be tested for implementation errors.

Hence, THE PLAN:

Every code generation phase takes one form of intermediate language and produces the next.
To test Obelisk, a series of sample programs will be written, which under the semantics will produce a specific output for a specific input.

Then an interpreter must be written for every intermediate language. Each interpreter must be tested on all sample programs.

If an interpreter produces incorrect output when running a sample program, the compiler stage associated with that interpreter has a bug!

Writing several interpreters sounds like a lot of work, just for testing, but it's going to be easier than having to read through several thousand lines of code-generator every time erroneous behaviour is spotted.

21 July 2010

Learning basic assembler.

I've been learning assembler, so I can write a code generator for Obelisk. I'm at page 98 of (the PDF version) of Dr Paul Carter's PC Assembly Language.

So far, I've learned how to call C functions. I need to do this because the Obelisk runtime is written in C!

Anyway, it aint much, but here's a 'hello world', written for NASM, the netwide assembler..

hello.asm:




; Hello world program!

; The data segment contains data that is always in memory (unlike the stack)
segment .data
format db 'Hello world!', 10, 0 ; The format string for printf

; The text segment contains the code the machine will execute
segment .text
   global main ; Main must be global for gcc to find it!
   extern printf ; We must declare that printf is defined in another object file.

main: push dword format ; Push the argument to printf on to the stack.
      call printf ; Call printf
      pop eax ; Remove the argument from the stack



This will need to be linked to libc because it uses printf. I'm using gcc because it's simple.
So, in the shell, assemble and link and run with:




$ nasm -f elf hello.asm && gcc hello.o -o hello && ./hello
> Hello world!



On my computer, the object file hell.o is only 608 bytes long

20 July 2010

Garbage collection

Obelisk has a simple copying garbage collector now :)

18 July 2010

Obelisk

I've started a new programming language project: Obelisk.

Check out what I have so far at

Obelisk github repository

That is the place to go if you want to understand the syntax and semantics of Obelisk.

The rest of this post is to outline the reasons why it's going to work! Some may consider it a bit of a rant, but I assure you that I am not a crackpot!

Firstly, about my last programming language project, Delve. It petered out for a number of reasons:
I didn't have a proper lexer.
I wrote my own (buggy) parser.
I was trying to write a type-checker for a language where new methods can be added to an object, or an object can change its class, at any time.
I wrote the runtime in Haskell, which processed high-level bytecodes. It was very slow.
My approach was a rather chaotic, and I didn't produce very much documentation (thank you beer).

On the plus side, I did learn a lot about programming language design, and the process definitely improved my programming skills .

However, Obelisk is different!

Obelisk is a strongly typed, systems programming language. Obelisk is theoretically based on the lambda calculus, but has a class based object system to create its data types.

Well, it will be and have all these things; and it will work out this time! I know this because:

I have a solid lexer, based on Text.Parsec.Token

The parser is generated by happy, which also has the benefit of showing that the grammar is unambiguous. This is A Good Thing.

Obelisk's semantics are much simpler, so I already have a type-checker for non-polymorphic non-higher order (lower order?) functions.

I am writing the run-time in C, and initial tests indicate it should be quite fast. And because the semantics of Obelisk are again quite simple, I am able to adapt much material in The Dragon Book to suit its needs.

Also my approach is better:

I have divided development into iterations. An iteration is a phase of development where a large proportion of the code-base must be changed in order to support new features. For instance, I am not going to worry about classes and objects, until I have a full working system for compiling simple-functions. The latter will take place in iteration #1, the former in iteration #2.

I also have ~1500 words of documentation so far, in a html file. This doesn't sound like a lot, but since I'm only concerned with iteration #1 objectives (simple functions) there isn't so much needed. Still, it documents all scoping and type rules, and gives examples.

So aye, you might say that I'm rather a chufty at this juncture. That's about all for now. Look at the repository for more information!