01 September 2010

Trying out jsMath

\left(\, \sum_{k=1}^n a_k b_k \right)^2 \le
\left(\, \sum_{k=1}^n a_k^2 \right) \left(\, \sum_{k=1}^n b_k^2 \right)

Well, that was pretty easy.

I just followed the instructions here on downloading and installing.

Next, I went looked in the tests directory and copied some lines out the test file:
 
 

Next I went to the html view of the template for my blog and I put those lines at the start of the document body and pointed the paths at my install of jsMath (in a web accessible directory). Next, I put the following lines at the very bottom of the document body:
 

According to the install instructions there are variations on this that should work, but I couldn't get them to work satisfactorily.

In later posts maybe I'll play with some of the plugins.

30 August 2010

REVISED: More fun pointless code metrics

My apologies to everyone who looked at the numbers for GCC before. I had run cloc from the wrong directory and it was counting all the lines of code in all the source trees on my computer. Oops! The GCC numbers listed at the bottom are corrected. While I'm at it, let's look at GHC vs. GCC. Here is GHC:
    1177 text files.
    1159 unique files.                                          
     247 files ignored.

http://cloc.sourceforge.net v 1.51  T=56.0 s (16.6 files/s, 6818.4 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Haskell                         424          38213          64078         125830
C                               139          10199          13583          46412
XML                              33           3839            539          29416
C/C++ Header                    170           3171           5236           8509
HTML                             41            708             82           6761
Perl                             31           1578           1526           6284
Bourne Shell                     26            410            794           3983
Pascal                            2            753            356           2711
m4                                2            293            171           1959
yacc                              4            261              0           1465
make                             42            216            397            606
Bourne Again Shell                5             75            138            393
Lisp                              1             65             76            291
Assembly                          1             34             30            125
Teamcenter def                    3             20              0             54
CSS                               2              8              0             36
D                                 1             10             29             23
C Shell                           2             24             41             20
--------------------------------------------------------------------------------
SUM:                            929          59877          87076         234878
--------------------------------------------------------------------------------
And GCC:
   58499 text files.
   57735 unique files.                                          
  116986 files ignored.

http://cloc.sourceforge.net v 1.51  T=2794.0 s (19.4 files/s, 2776.8 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                             15704         350409         365314        1790292
Java                           6324         169097         641639         680923
C/C++ Header                   9712         135787         126549         659612
Ada                            4518         227132         307713         659411
C++                           12898          98958         127224         428592
Bourne Shell                    126          56208          48639         317192
HTML                            440          30509           5304         139342
Fortran 90                     2978          12630          22272          71604
m4                              163           6547           1879          57799
Assembly                        195           7543           9400          45902
XML                              56           3621            214          29679
make                            115           3193            941          20374
Teamcenter def                   76           2903            379          18601
Expect                          219           4214           7719          16402
Fortran 77                      381            917           2776          10119
Objective C                     275           1922           1092           7021
Perl                             24            760           1157           4121
XSLT                             20            563            436           2805
Bourne Again Shell               12            372            599           1675
CSS                               8            332            143           1427
awk                              10            221            373           1370
Python                            4            301            142           1311
yacc                              2            107            109            987
Pascal                            4            218            200            985
C#                                9            230            506            879
MUMPS                             4            121              0            521
Tcl/Tk                            1             72            112            393
ASP.Net                           7             37              0            203
lex                               1             36             27            150
NAnt scripts                      2             17              0            148
MSBuild scripts                   1              1              0            140
Javascript                        2             20             81            122
Haskell                          35             15              0            109
Lisp                              1              4             21             59
MATLAB                            3             13              0             52
DTD                               3             28             70             26
Fortran 95                        2             10              7             21
DOS Batch                         3              0              0              7
--------------------------------------------------------------------------------
SUM:                          54338        1115068        1673037        4970376
--------------------------------------------------------------------------------
Keep in mind, the GCC tree includes lots of languages because it's a compiler for lots of things. And it has a lot of tests.

29 August 2010

Fun, but pointless, code metrics

I'm not really sure what motivated this, but I just used cloc to count the lines of code in both the darcs source and the git source.  Here are the numbers. The git source tree:
    1951 text files.
    1836 unique files.                                          
     848 files ignored.

http://cloc.sourceforge.net v 1.51  T=15.0 s (72.3 files/s, 20377.3 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                               267          15517          13469         100133
Bourne Shell                    589          15127           5508          84826
Perl                             40           3798           3441          23825
Tcl/Tk                           39           1453            375           9762
C/C++ Header                     99           1977           3557           8301
make                             12            413            434           2673
Bourne Again Shell                1            144            110           2165
Lisp                              2            231            170           1779
Python                           13            465            442           1384
ASP.Net                           8            141              0            931
m4                                2             87             21            858
CSS                               2            154             24            710
Javascript                        2            113            319            477
Assembly                          1             26            100             98
XSLT                              7             15             29             77
DOS Batch                         1              0              0              1
--------------------------------------------------------------------------------
SUM:                           1085          39661          27999         238000
--------------------------------------------------------------------------------
The darcs source tree:
     561 text files.
     549 unique files.                                          
      57 files ignored.

http://cloc.sourceforge.net v 1.51  T=189.0 s (2.7 files/s, 298.0 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Haskell                         169           4361           7374          27760
Bourne Shell                    300           2071           2869           8333
C                                 7            325            153           1494
HTML                              5             41              4            316
C/C++ Header                     12             92             83            308
Bourne Again Shell                3             51             95            180
Perl                              2             43             36            130
CSS                               1             21              3             79
make                              1             12              6             53
Lisp                              1              5              6             23
--------------------------------------------------------------------------------
SUM:                            501           7022          10629          38676
--------------------------------------------------------------------------------
Take those categories with a grain of salt. For example, the darcs source does not have any lisp files. It is interesting that git has 200k more lines than darcs. I'm not sure what that says. C is far more verbose than Haskell? Although, that's not really fair because they also have an order of magnitude more shell code. If you're just comparing C to Haskell it's a factor of about 4.

Syntaxhighlighter on blogspot?

I'm testing out SyntexHighlighter as a way to highlight Haskell snippets on my blog. Install instructions can be found here.

I had to create my own "brush" file to do the highlighting. You can find it here and covered under a GPL license as it is a derivative of a GPL'd example.

The regexs it use are a bit too simple to get good results, but it's quicker, easier, and nicer than the results I was getting by doing it by hand. Below is a demo for curious folks. I've highlighted lines 1 and 33 just to show off that feature.

import Prelude hiding (log)
import Control.Monad ( (<$>) )
import Control.Monad.Trans ( liftIO, when, baz, Bar(..) ) -- some made up names

{- | Our very important FOO compilation option -}
#ifdef FOO
foo = 1
#endif

{- {- Nested comments aren't quite right -} nope... -}

-- Sample Hello, World program

main :: IO ()
main = do
  (x:xs) <- getArgs
  let y = tail xs
  putStrLn "hello, world!"
  return ()

a :: Char
a = 'a'

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

foo :: Num a => a -> a
foo 1 = 4
foo n = n+1

-- | baz combines map and Maybe.
baz :: (a -> Maybe b) -> [a] -> [Maybe b]
baz _ [] = [Nothing]
baz f (x:xs) = f x : baz f xs

integers = [1,2,3,4,5 ..]

pi :: Maybe Double
pi = Just 3.14159

01 August 2010

Takusen Tutorial, Part 1: Hello, Takusen

With the recent release of Takusen 0.8.6, several people asked for a tutorial.  Hopefully I can help people get up to speed.

If you haven't heard, Takusen is an industrial strength database library written in Haskell.  Some of the reasons I like Takusen over its competitors:
  • Good performance
  • Supports the iteratee style so that you can stream your results from the database
  • BSD license
  • It has a test suite
The BSD license is nice because it means we can use it at Galois, and we do.  A project I worked on recently used Takusen to communicate with the database.  I found Takusen easy to debug with and it worked quite well.

If you like to start by reading lots of background information you might like to read these articles, but I will not assume you have read them in this tutorial:
In this tutorial you will learn:
  1. How to install Takusen with the right backend(s) for your database.
  2. How to write a "Hello, Takusen!" query.
Let's Get Started!
If you already have your database setup, just skip to section 2 below.

1. Getting things Setup!
First off, I will describe the environment I used to write this tutorial.  I like to develop using virtual machines whenever I can.  This allows me to start from a clean environment without interfering with any of my other projects.  I chose Virtual Box as my emulation software.  I installed Debian Squeeze, from here.

During the install I requested a web server, ssh server, and SQL database.  My normal account name is 'dagit', as you will see below.

After the install finished, I installed the following packages:
  • sudo
  • pkg-config
  • ghc6
  • ghc6-prof
  • cabal-install
  • libz-dev
  • postgresql-server-dev-8.4
  • sqlite3
  • libsqlite3-dev
  • unixodbc-dev
If you are only going to use a specific database then you can safely leave out the other database packages above.  For example, you only need unixodbc-dev if you are going to install the ODBC backend, see below.  This command should install everything above:
$ apt-get install sudo pkg-config ghc6 ghc6-prof cabal-install libz-dev postgresql-server-dev-8.4 sqlite3 libsqlite3-dev unixodbc-dev
Next, I ran:
$ cabal install QuickCheck --constraint="==1.*"
At this point you should have all the dependencies of Takusen.  The next step is to install the backends we might want to use.  For example:
$ cabal install Takusen -fpostgres -fsqlite -fodbc
The command above will give us the postgres, sqlite, and odbc backends.  If you don't need a backend, just omit it during the cabal-install command above.
At this point I installed a few other things such as emacs and darcs to make my development experience friendlier.

Configuring postgres is beyond the scope of this tutorial, but for reference here are the commands I used to get started with a "hellotakusen" database.
$ sudo su postgres # Switching to postgres user
$ createuser dagit # same as your unix account name
Shall the new role be a superuser? (y/n) y
$ createdb dagit # create dagit's default db
I made myself a superuser for convenience, after all this is just a dev machine.   Now switch back to your normal user account.  In my case, that account is 'dagit'.  Now as 'dagit', I run:
$ createdb hellotakusen
This gives us a demo database separate from the default user database.  Let's double check our database:
$ psql -d hellotakusen
psql (8.4.4)
Type "help" for help.


hellotakusen=#
Okay, looks good.  Now let's try a simple query:
hellotakusen=# select 'Hello, Takusen!';

?column?
----------------
Hello, Takusen!

(1 row)
2. Time do a query with Takusen!
Now we're ready to do the same thing, but from Haskell.  Let's start with GHCi.  I'll run through the minimum set of commands to get this query working, then I'll explain what we're doing at each step, and why:
$ ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :m + Database.PostgreSQL.Enumerator
Prelude Database.PostgreSQL.Enumerator> let connection = connect [CAdbname "hellotakusen"]
Loading package syb-0.1.0.2 ... linking ... done.
Loading package base-3.0.3.2 ... linking ... done.
Loading package mtl-1.1.0.2 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
Loading package time-1.1.4 ... linking ... done.
Loading package Takusen-0.8.6 ... linking ... done.
Now we have a connection structure to our Postgres database.  Now we switch to the interface just one level above the database specific one to define an iteratee to fetch our results.
Prelude Database.PostgreSQL.Enumerator> :m + Database.Enumerator
Prelude Database.PostgreSQL.Enumerator Database.Enumerator> let { iter :: Monad m => String -> IterAct m (Maybe String); iter msg accum = result' (Just msg) }
We will use iter with doQuery, to fetch the result of our query,  and immediately print it like this:
Prelude Database.PostgreSQL.Enumerator Database.Enumerator> :m + Control.Monad.Trans
Prelude Database.PostgreSQL.Enumerator Database.Enumerator Control.Monad.Trans> withSession connection (doQuery (sql "select 'Hello, Takusen!'") iter Nothing >>= \(Just r) -> liftIO (putStrLn r))

Hello, Takusen!
Now it's time to start explaining!

We have to give a type signature to iter or else we'll get an incomprehensible error message involving functional dependencies:
Prelude Database.PostgreSQL.Enumerator Database.Enumerator Control.Monad.Trans> withSession connection (doQuery (sql "select 'Hello, Takusen!'") iter Nothing >>= \(Just r) -> liftIO (putStrLn r))

:1:24:
    Overlapping instances for Database.Enumerator.QueryIteratee
                                (DBM mark Session)
                                Database.PostgreSQL.Enumerator.Query
                                (a -> t -> m (IterResult (Maybe a)))
                                (Maybe String)
                                Database.PostgreSQL.Enumerator.ColumnBuffer
      arising from a use of `doQuery' at :1:24-75
    Matching instances:
      instance [overlap ok] (Database.Enumerator.QueryIteratee
                               m q i' seed b,
                             DBType a q b) =>
                            Database.Enumerator.QueryIteratee m q (a -> i') seed b
        -- Defined in Database.Enumerator
      instance [overlap ok] (DBType a q b, MonadIO m) =>
                            Database.Enumerator.QueryIteratee
                              m q (a -> seed -> m (IterResult seed)) seed b
        -- Defined in Database.Enumerator
    (The choice depends on the instantiation of `mark, a, t, m'
     To pick the first instance above, use -XIncoherentInstances
     when compiling the other instance declarations)
    In the first argument of `(>>=)', namely
        `doQuery (sql "select 'Hello, Takusen!'") iter Nothing'
    In the second argument of `withSession', namely
        `(doQuery (sql "select 'Hello, Takusen!'") iter Nothing
        >>=
          \ (Just r) -> liftIO (putStrLn r))'
    In the expression:
        withSession
          connection
          (doQuery (sql "select 'Hello, Takusen!'") iter Nothing
         >>=
           \ (Just r) -> liftIO (putStrLn r))
The problem itself is fairly simple.  If we don't give the explicit signature, then the inferred type of iter is:
iter :: (Monad m) => a -> t -> m (IterResult (Maybe a))

Compared with our supplied type:
iter :: Monad m => String -> IterAct m (Maybe String)

You should be asking yourself, what is the difference between "t -> m (IterResult (Maybe a))" and "IterAct m (Maybe String)".  Checking with the Takusen haddock, we see this definition:
type IterAct m seedType = seedType -> m (IterResult seedType)
Let's expand the IterAct type so we can more clearly compare the inferred type to the correct type:
iter :: Monad m => String -> Maybe String -> m (IterResult (Maybe String))
So there is the problem.  If we don't write the explicit type signature there was nothing in our definition of iter to help ghci infer that the second parameter is a Maybe.  You'll also noticed that in the type synonym, IterAct that it is a function from a seed type to an action containing an IterResult.

The seed here works just like the seed in a left fold.  The type of foldl is:
foldl :: (a -> b -> a) -> a -> [b] -> a
The second parameter is the seed type of foldl and it is of type a, here.  If you expand out a left fold, you will get an expression like this:
foldl f z [a,b,c,d] = f (f (f (f z a) b) c) d

When we pass iter to doQuery it uses iter much like foldl uses f.  In the call to foldl, I called the seed z, and it is passed as the first parameter to f.  First f will combine z and a.  Then the result will be passed to f along with b.  The result keeps getting fed to f in this manner till we hit the end of the list.  So the first parameter to f is an accumulator because f can pass to itself the result of the previous call of f.  The first time f is called it will receive the seed value as the accumulator.

When we defined iter we used the same exact convention, but you didn't see the seed in the definition because the function result' hides this detail for convenience.  You can see it in the type when we expand out the type synonym IterAct.

Where is the seed?  The seed was the last parameter we passed to doQuery.  In the example above, it was Nothing, and iter ignored it.  As this tutorial series expands, I will show you how to use the seed value.

Time to recap.

One simple rule is: Always try to give an explicit type signature to your iteratee function, here we called it iter.  If you get it wrong, the error message will be more forgiving than the error message you get from using it with doQuery.

Another important lesson we learned is that our iteratees work with doQuery in exactly the same way as functions passed to foldl.  Specifically, they take an accumulator that has the same type as the seed we pass to doQuery.  The iteratee is responsible for combining the accumulator and a 'current value' to produce a result.  In a future tutorial in this series, I will show you how to accumulate a list of results in the iteratee.

10 June 2010

Delimited Continuations and version control: an update

Last time I presented the idea that version control and delimited continuations are related.  I left off with a question how how to make Darcs fit this model.  I think I understand now what I was missing.

I forgot to think about Darcs operations in terms of the intermediate operations that get performed.  In Darcs, everything is based on commuting patches, even merging.  Therefore, to see how Darcs fits into this model it's important to think about commuting patches in terms of delimited continuations.

Specifically, I now believe that commuting two patches introduces marks that can be shifted to later.

I have several ideas for the next steps of this.  One is to start modeling toy versions of svn and darcs in Haskell via delimited continuations.  After that, I would like to figure out the correspondence between the delimited continuations that I've identified and their data structure reification as zippers.  I hope to have more about that later.

Judging by a paper written by Oleg, there should be a natural way to convert the delimited continuation representation into a zipper.  Investigating this model might shed light on the Darcs patch model, or even lead to a more concise formalism.

06 June 2010

Delimited Continuations and Version Control

Delimited continuations give us a way to create markers that we can jump back to. We can construct the future of the computation, work with the computation so far, or abort the current continuation and go down a new path (create a new future).

These primitives have a natural correspondence with version control systems that snapshot the world like
SVN.  Focusing on SVN for a moment:
  • The current continuation is the transformation of repository state that you're working on.  It's the diff you're creating.
  • For commits, each revision created by commit is a marker, so we model this with reset.
  • Checking out an older revision, or reverting changes, corresponds to a shift.  We discard the current continuation and move back to a marker created by a specific reset.
  • Updating consists of having the client copy learn the current state of the continuation on the server and applying it to the local copy.
  • Starting a branch corresponds to a reset.
  • Merging two branches is a bit trickier I suspect.  I haven't worked out all the details sufficiently to convince myself I have it right, but here is how I think this case works.  The merge first shifts to each of the markers and then combines those two continuations into one future.  The part that seems weird to me about this, is that I haven't really seen examples of delimited continuations were the continuation of two different markers (prompts) were combined.
Now, if you accept the above it gives us some intuition to build on.  Although my correspondence is terribly informal at the moment, if we took some time to make it formal by working out enough details to model it in, say, Haskell, then we'd have a nice formal backing for how SVN works.  I think the above model applies equally well to git, but I'm not confident with git's model.

One insight the above gives, is that the way merging works is not described by the continuations in general.  It's up to the exact combining function to determine the merge.  We know that an automatic merge can fail in practice due to things like conflicts between the changes in the branches.  So, in the SVN implementation considerable work has gone into implementing logic for creating the proper continuation.

Now, in general when a merge, or update, is performed human intervention may be required.  This typically happens when the changes are in conflict.  What this means for our model is that in general the creation of the continuation requires knowledge outside of our model.  What does that correspond to?  Well, it's essentially saying that calculating the correct continuation to resolve the merge is non-deterministic!

In other words, this continuation view of version control gives us a rigorous way to talk about our intuition.  Of course we can easily tell that merging is going to require human intervention without needing to study delimited continuations, but this framework of reasoning now gives us a more mathematical way to say it.

The next question is:   How does the delimited continuation model of vcs apply to Darcs?

So far I'm not sure.  I suspect, without working through the details, that in Darcs, every time you record a patch multiple resets happen, instead of just one.  The model really breaks down for Darcs because you can seemingly visit points in "history" that did not exist when the patches were created, but they are valid repository states.

For example, imagine a repository that only exists locally.  You create a sequence of patches.  Now, take a patch in the middle that can be commuted to the end of the patch sequence.  Doing so has not created a new repository state; so far this is fine with the above model as no new markers need to be created.  Suppose we remove the patch from the end of the patch sequence.  This is exactly how darcs unpull works.  The funny thing is, we've now created a state that never existed previously.  So in the delimited continuation model, what marker did we just shift to?

I don't know the answer yet, but I think it's an interesting question.  I suspect there are multiple "correct" answers, but that only some answers will yield elegant and robust models here.