EDIT: fixed a bug! Results should be good now :) Damn, why did I forget to run the tests :p
Computer keyboards are fucking stupid, especially for programmers.
They fuck everyone too: QWERTY is so we don't jam the machinery. Even dvorak keyboards have staggered keys so the MECHANICAL arms can reach them.
However, as programmers, we use all sorts of symbols not in the ken of mere mortals (like *). And for most, you have to press shift just to get at them.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The solution: design a programmer keyboard!
First off, we need some numbers so it can be known which keys are most often pressed. To do this I wrote a key frequency program*. I then used it to find the key frequencies of the source files of a number of different programs, in various languages. These are:
linux - C
adagide - ada
axiom - c#
django - python
ghc - haskell
go - go
irrlicht - c++
limewire - java
phpbb3 - php
prototype and jquery - javascript
rails - ruby
regexkit - objective c
racket - racket
To simplify things, I only considered characters ASCII characters between horizontal tab (ASCII 9) and '~' (ASCII 126). These are usually the ones I type, at least.
My hypothesis is that many symbols are going to be more popular than letters. This would prove a need for these symbols to have their OWN FUCKING KEYS instead of being the hangers on and miscreants of the keyboard world.
Here are the results of my analysis. The character is on the left, the fraction of the whole is on the right. They are presented in descending order of popularity.
' ': 0.12601734868442382
'e': 5.672758827442585e-2
't': 4.506425247731305e-2
'i': 3.6282361484752744e-2
'r': 3.5587647133639044e-2
'\t': 3.52634742130502e-2
'\n': 3.45691915152049e-2
'_': 3.438222642926237e-2
's': 3.201379627014596e-2
'n': 3.0954960014858875e-2
'a': 3.058217872499613e-2
'o': 2.6776405034843204e-2
'c': 2.3632039961301494e-2
'd': 2.329124049378265e-2
'l': 1.9319481645750994e-2
'u': 1.761928496254444e-2
'0': 1.737397825253189e-2
'p': 1.6228575776688497e-2
'f': 1.5556196850182672e-2
',': 1.4572080038859849e-2
'*': 1.3460275453084754e-2
'm': 1.2857509828221805e-2
')': 1.2389877996691128e-2
'(': 1.2383996720909098e-2
';': 1.140731013627353e-2
'h': 1.0710680981789305e-2
'-': 1.0232090684508581e-2
'x': 1.0086694597923173e-2
'g': 9.400323788643899e-3
'b': 8.972889667733516e-3
'E': 8.898045202243342e-3
'=': 8.357994322825149e-3
'v': 7.983041606459754e-3
'S': 7.407318795659219e-3
'T': 7.301129459599258e-3
'R': 7.1601262346295165e-3
'/': 7.1380145327522146e-3
'A': 7.117130673860958e-3
'I': 7.099489478546994e-3
'C': 6.66979575805779e-3
'>': 6.316573098911656e-3
'.': 5.726058267571846e-3
'1': 5.685534184968124e-3
'N': 5.413383379318554e-3
'D': 5.24449377399052e-3
'P': 5.0353985619458206e-3
'L': 5.008180717747133e-3
'O': 5.000104327173332e-3
'k': 4.9537792457673745e-3
'w': 4.742584988103445e-3
'y': 4.725471515230428e-3
'2': 4.638324931590688e-3
'M': 4.5006262229531635e-3
'"': 3.800355652024797e-3
'F': 3.764420517430011e-3
'U': 3.1530152471186336e-3
'{': 3.055959062530431e-3
'}': 3.0551233923309254e-3
'B': 3.0211399095564015e-3
'3': 2.9180155749053346e-3
'G': 2.7770136659516554e-3
'4': 2.5617397584948654e-3
'8': 2.3449629606950287e-3
'#': 2.2478633475767726e-3
'&': 2.2210126718594423e-3
'6': 2.162497333669207e-3
':': 1.9414777000847955e-3
'H': 1.8788563917804559e-3
'q': 1.8209240487057808e-3
'<': 1.6977541573950867e-3
'5': 1.6778586265665519e-3
'V': 1.665461755260506e-3
'\\': 1.5621847627462226e-3
'X': 1.4939677541453535e-3
'[': 1.4782940028443977e-3
']': 1.4776491549739134e-3
'+': 1.4496245929290891e-3
'9': 1.3384093915275372e-3
'7': 1.3184046313658388e-3
'W': 1.2953598741004282e-3
'K': 1.2184137309589008e-3
'z': 1.2113717290099994e-3
'\r': 1.143696919026762e-3
'\'': 1.0093106228178275e-3
'Y': 9.986745810030226e-4
'|': 9.358519222411643e-4
'%': 8.897926760797742e-4
'!': 7.546115061692628e-4
'j': 6.665502913662852e-4
'Q': 4.7705713856824653e-4
'Z': 3.924557299928266e-4
'@': 3.550282331834911e-4
'?': 1.816549611314985e-4
'$': 1.7383124564164268e-4
'J': 1.5937875724642036e-4
'~': 1.2080895849508404e-4
'^': 3.6191757726917574e-5
'`': 3.1301442039734356e-5
'\f': 3.8032864197953453e-7
'\ESC': 5.264064248851689e-9
Total characters: 759869145
You suck \ESC. No one loves you!
Note that the total does not reflect the number of characters I ignored ala the last paragraph.
It's interesting to observe that _ is a rather popular character for programmers indeed! Also it is interesting that * and , are both more popular than m.
By far the most interesting thing to note is that '(' is more popular than ')'. Don't they come in pairs?
The next step is to design a keyboard layout which makes it easy for people to type the more popular characters. I think I will do this by experimentation - how easy it it to type various phrases? Some research is available on the topic, but a lot of things are easy to get, eg:
Most people are right handed.
Your index fingers are stronger than your pinkies.
Hence, you're gonna want the most used keys in the middle-right of the keyboard.
The last step is to actually make the bastarding thing.
I'm not exactly sure on the best way to actually create a keyboard. With enough time, I could design a program which would run on a microcontroller, interface with the keyboard matrix and communicate with the pc over PS/2.
However, that sounds complicated. Perhaps I shall just move the keys around (but how to put enter and tab in the middle?), and see if I can make it work under Xorg and the terminal.
* The program is split into two parts.
frequency.hs:
-- Copyright John Morrice 2010.
-- Distributed under the terms of the GNU General Public License v 3
{-# LANGUAGE BangPatterns #-}
module Main
(freq_test
,file_test
,main) where
-- | GHC haskell key frequency program
import Control.Monad
import qualified Data.ByteString.Lazy.Char8 as Z
import qualified Data.ByteString.Char8 as B
import Data.Array.MArray
import Data.Array.IO
import Data.List
import System.IO
import System.Posix.Process
import System.Environment
import GHC.Int
main :: IO ()
main = do
as <- getArgs
if length as /= 1
then
error "Usage: frequency DIR"
else do
let dir = head as
prgs <- find_prgs dir program_exts
contents <- fmap Z.concat $ mapM strict_read prgs
(freq, total) <- frequency contents
format_frequency freq
putStrLn $ "Total characters: " ++ show total
strict_read :: FilePath -> IO Z.ByteString
strict_read fp = do
contents <- B.readFile fp
return $ Z.fromChunks [contents]
format_frequency :: IOArray Char Double -> IO ()
format_frequency ma = do
as <- getAssocs ma
let sorted = reverse $ sortBy (\(_,fr1) (_,fr2) -> fr1 `compare` fr2) as
mapM_ (\ (ch,fr) -> putStrLn $ (show ch) ++ ": " ++ show fr) sorted
freq_test :: IO Bool
freq_test = do
d <- freq_data >>= getAssocs
roccs <- new_occs
right <- mapArray fromIntegral roccs
writeArray right 'e' 0.5
writeArray right 'd' (3/8)
writeArray right 'a' (1/8)
rightl <- getAssocs right
return $ d == rightl
freq_data :: IO (IOArray Char Double)
freq_data =
fmap fst $ frequency freq_input
freq_input :: Z.ByteString
freq_input = Z.pack "eeeeddda"
file_test :: IO Bool
file_test = do
files <- fmap sort $ find_prgs "test_files" ["c", "rb"]
putStrLn "Found:"
print files
let expected = sort (actual "test_files/")
putStrLn "Expected"
print expected
return $ files == expected
where
actual dir = map (dir ++) ["a.c", "b.c", "c.rb", "d.rb"]
-- | Determine the frequency of characters in a bytestring as a fraction of the whole
frequency :: Z.ByteString -> IO (IOArray Char Double, Int64)
frequency by = do
fr <- occur by >>= mapArray (\o -> fromIntegral o / fromIntegral total)
return (fr, total)
where
total = Z.length by
new_occs :: IO (IOArray Char Int64)
new_occs =
newListArray (min_char, max_char) (repeat 0)
min_char :: Char
min_char = '\9'
max_char :: Char
max_char = '\126'
occur :: Z.ByteString -> IO (IOArray Char Int64)
occur bz = do
occs <- new_occs
zfoldrM increase occs bz
where
increase :: Char -> IOArray Char Int64 -> IO ()
increase ch oc =
if ch >= min_char && ch <= max_char
then do
count <- readArray oc ch
let n = count + 1
n `seq` writeArray oc ch n
else
return ()
zfoldrM :: (Char -> IOArray Char Int64 -> IO ()) -> IOArray Char Int64 -> Z.ByteString -> IO (IOArray Char Int64)
zfoldrM f ar bz =
if Z.null bz
then return ar
else do
f (Z.head bz) ar
zfoldrM f ar (Z.tail bz)
program_exts :: [String]
program_exts =
["bas", -- basic
"java", -- java
"hs", -- haskell
"rb", -- ruby
"py", -- python
"php", -- php
"m", -- objective c
"c", "h", -- c
"cs", -- c#
"js", -- javascript
"ads", "adb", -- ada
"cxx", "cpp", "c++", -- c++
"scm", -- scheme, racket
"pas", -- pascal, delphi
"go" -- go
]
-- | A regular expression which matches files with the given extension
extension_regex :: String -> String
extension_regex = (".+\\." ++)
-- | Find program files in the given directory matching the given extensions
find_prgs :: String -> [String] -> IO [String]
find_prgs dir exts = do
writeFile "result" ""
mapM_ (find_prg dir) exts
fmap lines $ readFile "result"
--- | Find program files in the given directory matching the given extension
find_prg :: String -> String -> IO ()
find_prg dir ext = do
pid <- forkProcess $
executeFile "./finder.sh" False [dir, "-regex", extension_regex ext] Nothing
getProcessStatus True True pid
return ()
-- Source ends with this line
finder.sh
# Source begins with this line
#! /bin/bash
find $* >> result
# Source ends with this line
22 September 2010
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment