22 September 2010

PROGRAMMER KEYBOARD

EDIT: fixed a bug! Results should be good now :) Damn, why did I forget to run the tests :p


Computer keyboards are fucking stupid, especially for programmers.

They fuck everyone too: QWERTY is so we don't jam the machinery. Even dvorak keyboards have staggered keys so the MECHANICAL arms can reach them.

However, as programmers, we use all sorts of symbols not in the ken of mere mortals (like *). And for most, you have to press shift just to get at them.

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The solution: design a programmer keyboard!

First off, we need some numbers so it can be known which keys are most often pressed. To do this I wrote a key frequency program*. I then used it to find the key frequencies of the source files of a number of different programs, in various languages. These are:

linux - C
adagide - ada
axiom - c#
django - python
ghc - haskell
go - go
irrlicht - c++
limewire - java
phpbb3 - php
prototype and jquery - javascript
rails - ruby
regexkit - objective c
racket - racket

To simplify things, I only considered characters ASCII characters between horizontal tab (ASCII 9) and '~' (ASCII 126). These are usually the ones I type, at least.

My hypothesis is that many symbols are going to be more popular than letters. This would prove a need for these symbols to have their OWN FUCKING KEYS instead of being the hangers on and miscreants of the keyboard world.

Here are the results of my analysis. The character is on the left, the fraction of the whole is on the right. They are presented in descending order of popularity.

' ': 0.12601734868442382
'e': 5.672758827442585e-2
't': 4.506425247731305e-2
'i': 3.6282361484752744e-2
'r': 3.5587647133639044e-2
'\t': 3.52634742130502e-2
'\n': 3.45691915152049e-2
'_': 3.438222642926237e-2
's': 3.201379627014596e-2
'n': 3.0954960014858875e-2
'a': 3.058217872499613e-2
'o': 2.6776405034843204e-2
'c': 2.3632039961301494e-2
'd': 2.329124049378265e-2
'l': 1.9319481645750994e-2
'u': 1.761928496254444e-2
'0': 1.737397825253189e-2
'p': 1.6228575776688497e-2
'f': 1.5556196850182672e-2
',': 1.4572080038859849e-2
'*': 1.3460275453084754e-2
'm': 1.2857509828221805e-2
')': 1.2389877996691128e-2
'(': 1.2383996720909098e-2
';': 1.140731013627353e-2
'h': 1.0710680981789305e-2
'-': 1.0232090684508581e-2
'x': 1.0086694597923173e-2
'g': 9.400323788643899e-3
'b': 8.972889667733516e-3
'E': 8.898045202243342e-3
'=': 8.357994322825149e-3
'v': 7.983041606459754e-3
'S': 7.407318795659219e-3
'T': 7.301129459599258e-3
'R': 7.1601262346295165e-3
'/': 7.1380145327522146e-3
'A': 7.117130673860958e-3
'I': 7.099489478546994e-3
'C': 6.66979575805779e-3
'>': 6.316573098911656e-3
'.': 5.726058267571846e-3
'1': 5.685534184968124e-3
'N': 5.413383379318554e-3
'D': 5.24449377399052e-3
'P': 5.0353985619458206e-3
'L': 5.008180717747133e-3
'O': 5.000104327173332e-3
'k': 4.9537792457673745e-3
'w': 4.742584988103445e-3
'y': 4.725471515230428e-3
'2': 4.638324931590688e-3
'M': 4.5006262229531635e-3
'"': 3.800355652024797e-3
'F': 3.764420517430011e-3
'U': 3.1530152471186336e-3
'{': 3.055959062530431e-3
'}': 3.0551233923309254e-3
'B': 3.0211399095564015e-3
'3': 2.9180155749053346e-3
'G': 2.7770136659516554e-3
'4': 2.5617397584948654e-3
'8': 2.3449629606950287e-3
'#': 2.2478633475767726e-3
'&': 2.2210126718594423e-3
'6': 2.162497333669207e-3
':': 1.9414777000847955e-3
'H': 1.8788563917804559e-3
'q': 1.8209240487057808e-3
'<': 1.6977541573950867e-3
'5': 1.6778586265665519e-3
'V': 1.665461755260506e-3
'\\': 1.5621847627462226e-3
'X': 1.4939677541453535e-3
'[': 1.4782940028443977e-3
']': 1.4776491549739134e-3
'+': 1.4496245929290891e-3
'9': 1.3384093915275372e-3
'7': 1.3184046313658388e-3
'W': 1.2953598741004282e-3
'K': 1.2184137309589008e-3
'z': 1.2113717290099994e-3
'\r': 1.143696919026762e-3
'\'': 1.0093106228178275e-3
'Y': 9.986745810030226e-4
'|': 9.358519222411643e-4
'%': 8.897926760797742e-4
'!': 7.546115061692628e-4
'j': 6.665502913662852e-4
'Q': 4.7705713856824653e-4
'Z': 3.924557299928266e-4
'@': 3.550282331834911e-4
'?': 1.816549611314985e-4
'$': 1.7383124564164268e-4
'J': 1.5937875724642036e-4
'~': 1.2080895849508404e-4
'^': 3.6191757726917574e-5
'`': 3.1301442039734356e-5
'\f': 3.8032864197953453e-7
'\ESC': 5.264064248851689e-9
Total characters: 759869145

You suck \ESC. No one loves you!

Note that the total does not reflect the number of characters I ignored ala the last paragraph.

It's interesting to observe that _ is a rather popular character for programmers indeed! Also it is interesting that * and , are both more popular than m.

By far the most interesting thing to note is that '(' is more popular than ')'. Don't they come in pairs?

The next step is to design a keyboard layout which makes it easy for people to type the more popular characters. I think I will do this by experimentation - how easy it it to type various phrases? Some research is available on the topic, but a lot of things are easy to get, eg:

Most people are right handed.
Your index fingers are stronger than your pinkies.

Hence, you're gonna want the most used keys in the middle-right of the keyboard.

The last step is to actually make the bastarding thing.

I'm not exactly sure on the best way to actually create a keyboard. With enough time, I could design a program which would run on a microcontroller, interface with the keyboard matrix and communicate with the pc over PS/2.

However, that sounds complicated. Perhaps I shall just move the keys around (but how to put enter and tab in the middle?), and see if I can make it work under Xorg and the terminal.

* The program is split into two parts.

frequency.hs:

-- Copyright John Morrice 2010.
-- Distributed under the terms of the GNU General Public License v 3

{-# LANGUAGE BangPatterns #-}

module Main
   (freq_test
   ,file_test
   ,main) where

-- | GHC haskell key frequency program

import Control.Monad

import qualified Data.ByteString.Lazy.Char8 as Z

import qualified Data.ByteString.Char8 as B

import Data.Array.MArray
import Data.Array.IO

import Data.List

import System.IO

import System.Posix.Process

import System.Environment

import GHC.Int

main :: IO ()
main = do
   as <- getArgs
   if length as /= 1
      then
         error "Usage: frequency DIR"
      else do
         let dir = head as
         prgs <- find_prgs dir program_exts 
         contents <- fmap Z.concat $ mapM strict_read prgs
         (freq, total) <- frequency contents
         format_frequency freq
         putStrLn $ "Total characters: " ++ show total

strict_read :: FilePath -> IO Z.ByteString
strict_read fp = do
   contents <- B.readFile fp
   return $ Z.fromChunks [contents]

format_frequency :: IOArray Char Double -> IO ()
format_frequency ma = do
   as <- getAssocs ma
   let sorted = reverse $ sortBy (\(_,fr1) (_,fr2) -> fr1 `compare` fr2) as
   mapM_ (\ (ch,fr) -> putStrLn $ (show ch) ++ ": " ++ show fr) sorted

freq_test :: IO Bool
freq_test = do
   d <- freq_data >>= getAssocs
   roccs <- new_occs
   right <- mapArray fromIntegral roccs
   writeArray right 'e' 0.5
   writeArray right 'd' (3/8)
   writeArray right 'a' (1/8)
   rightl <- getAssocs right
   return $ d == rightl

freq_data :: IO (IOArray Char Double)
freq_data =
   fmap fst $ frequency freq_input

freq_input :: Z.ByteString
freq_input = Z.pack "eeeeddda"

file_test :: IO Bool
file_test = do
   files <- fmap sort $ find_prgs "test_files" ["c", "rb"]
   putStrLn "Found:"
   print files
   let expected = sort (actual "test_files/")
   putStrLn "Expected"
   print expected
   return $ files == expected
   where
   actual dir = map (dir ++) ["a.c", "b.c", "c.rb", "d.rb"]

-- | Determine the frequency of characters in a bytestring as a fraction of the whole
frequency :: Z.ByteString -> IO (IOArray Char Double, Int64)
frequency by = do
   fr <- occur by >>= mapArray (\o -> fromIntegral o / fromIntegral total)
   return (fr, total)
   where
   total = Z.length by

new_occs :: IO (IOArray Char Int64)
new_occs =
   newListArray (min_char, max_char) (repeat 0)

min_char :: Char
min_char = '\9'

max_char :: Char
max_char = '\126'

occur :: Z.ByteString -> IO (IOArray Char Int64)
occur bz = do
   occs <- new_occs
   zfoldrM increase occs bz
   where
   increase :: Char -> IOArray Char Int64 -> IO ()
   increase ch oc = 
      if ch >= min_char && ch <= max_char
      then do
         count <- readArray oc ch
         let n = count + 1
         n `seq` writeArray oc ch n
      else
         return ()

zfoldrM :: (Char -> IOArray Char Int64 -> IO ()) -> IOArray Char Int64 -> Z.ByteString -> IO (IOArray Char Int64) 
zfoldrM f ar bz =
   if Z.null bz
      then return ar
      else do
         f (Z.head bz) ar
         zfoldrM f ar (Z.tail bz)

program_exts :: [String]
program_exts = 
   ["bas", -- basic
   "java", -- java
   "hs", -- haskell 
   "rb", -- ruby
   "py", -- python
   "php", -- php
   "m", -- objective c
   "c", "h", -- c
   "cs", -- c#
   "js", -- javascript
   "ads", "adb", -- ada
   "cxx", "cpp", "c++", -- c++
   "scm", -- scheme, racket
   "pas", -- pascal, delphi
   "go" -- go
   ]

-- | A regular expression which matches files with the given extension
extension_regex :: String -> String
extension_regex = (".+\\." ++)

-- | Find program files in the given directory matching the given extensions
find_prgs :: String -> [String] -> IO [String]
find_prgs dir exts = do
   writeFile "result" ""
   mapM_ (find_prg dir) exts
   fmap lines $ readFile "result"

--- | Find program files in the given directory matching the given extension
find_prg :: String -> String -> IO () 
find_prg dir ext = do
   pid <- forkProcess $
      executeFile "./finder.sh" False [dir, "-regex", extension_regex ext] Nothing
   getProcessStatus True True pid
   return ()


-- Source ends with this line


finder.sh

# Source begins with this line
#! /bin/bash

find $* >> result
# Source ends with this line

No comments:

Post a Comment