Takusen Tutorial, Part 1: Hello, Takusen
With the recent release of Takusen 0.8.6, several people asked for a tutorial. Hopefully I can help people get up to speed.
If you haven’t heard, Takusen is an industrial-strength database library written in Haskell. Some of the reasons I like Takusen over its competitors:
- Good performance
- Supports the iteratee style so that you can stream your results from the database
- BSD license
- It has a test suite
The BSD license is nice because it means we can use it at Galois, and we do. A project I worked on recently used Takusen to communicate with the database. I found Takusen easy to debug with and it worked quite well.
If you like to start by reading lots of background information you might like to read these articles, but I will not assume you have read them in this tutorial:
In this tutorial you will learn:
- How to install Takusen with the right backend(s) for your database.
- How to write a “Hello, Takusen!” query.
Let’s Get Started!
If you already have your database set up, just skip to section 2 below.
1. Getting things set up!
First off, I will describe the environment I used to write this tutorial. I like to develop using virtual machines whenever I can. This allows me to start from a clean environment without interfering with any of my other projects. I chose VirtualBox as my virtualization software. I installed Debian Squeeze, from here.
During the install I requested a web server, ssh server, and SQL database. My normal account name is ‘dagit’, as you will see below.
After the install finished, I installed the following packages:
- sudo
- pkg-config
- ghc6
- ghc6-prof
- cabal-install
- libz-dev
- postgresql-server-dev-8.4
- sqlite3
- libsqlite3-dev
- unixodbc-dev
If you are only going to use a specific database, you can safely leave out the other database packages above. For example, you only need unixodbc-dev if you are going to install the ODBC backend (see below). This command, run as root since sudo itself is on the list, should install everything above:
$ apt-get install sudo pkg-config ghc6 ghc6-prof cabal-install libz-dev postgresql-server-dev-8.4 sqlite3 libsqlite3-dev unixodbc-dev
Next, I ran:
$ cabal install QuickCheck --constraint="==1.*"
At this point you should have all the dependencies of Takusen. The next step is to install the backends we might want to use. For example:
$ cabal install Takusen -fpostgres -fsqlite -fodbc
The command above gives us the postgres, sqlite, and odbc backends. If you don’t need a backend, just omit its flag from the cabal install command above. At this point I also installed a few other things, such as emacs and darcs, to make my development experience friendlier.
Configuring postgres is beyond the scope of this tutorial, but for reference here are the commands I used to get started with a “hellotakusen” database:
$ sudo su postgres   # switch to the postgres user
$ createuser dagit   # same as your unix account name
Shall the new role be a superuser? (y/n) y
$ createdb dagit     # create dagit's default db
I made myself a superuser for convenience; after all, this is just a dev machine. Now switch back to your normal user account. In my case, that account is ‘dagit’. Now, as ‘dagit’, I run:
$ createdb hellotakusen
This gives us a demo database separate from the default user database. Let’s double-check our database:
$ psql -d hellotakusen
psql (8.4.4)
Type "help" for help.
hellotakusen=#
Okay, looks good. Now let’s try a simple query:
hellotakusen=# select 'Hello, Takusen!';
    ?column?
----------------
 Hello, Takusen!
(1 row)
2. Time to do a query with Takusen!
Now we’re ready to do the same thing, but from Haskell. Let’s start with GHCi. I’ll run through the minimum set of commands to get this query working, then I’ll explain what we’re doing at each step, and why:
$ ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :m + Database.PostgreSQL.Enumerator
Prelude Database.PostgreSQL.Enumerator> let connection = connect [CAdbname "hellotakusen"]
Loading package syb-0.1.0.2 ... linking ... done.
Loading package base-3.0.3.2 ... linking ... done.
Loading package mtl-1.1.0.2 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
Loading package time-1.1.4 ... linking ... done.
Loading package Takusen-0.8.6 ... linking ... done.
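Now connection holds a description of how to reach our Postgres database; the connection itself is opened later, by withSession. Next we switch to Database.Enumerator, the database-independent interface one level above the PostgreSQL-specific module, to define an iteratee that will receive our results.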
Prelude Database.PostgreSQL.Enumerator> :m + Database.Enumerator
Prelude Database.PostgreSQL.Enumerator Database.Enumerator> let { iter :: Monad m => String -> IterAct m (Maybe String); iter msg accum = result' (Just msg) }
We will use iter with doQuery to fetch the result of our query and immediately print it. The printing happens inside the session, so we also bring in liftIO from Control.Monad.Trans:
Prelude Database.PostgreSQL.Enumerator Database.Enumerator> :m + Control.Monad.Trans
Prelude Database.PostgreSQL.Enumerator Database.Enumerator Control.Monad.Trans> withSession connection (doQuery (sql "select 'Hello, Takusen!'") iter Nothing >>= \(Just r) -> liftIO (putStrLn r))
Hello, Takusen!
Now it’s time to start explaining!
We have to give iter a type signature, or else we get an incomprehensible error message about overlapping instances. Here is the same query run with iter defined without its signature (let iter msg accum = result' (Just msg)):
Prelude Database.PostgreSQL.Enumerator Database.Enumerator Control.Monad.Trans> withSession connection (doQuery (sql "select 'Hello, Takusen!'") iter Nothing >>= \(Just r) -> liftIO (putStrLn r))
<interactive>:1:24:
    Overlapping instances for Database.Enumerator.QueryIteratee
                                (DBM mark Session)
                                Database.PostgreSQL.Enumerator.Query
                                (a -> t -> m (IterResult (Maybe a)))
                                (Maybe String)
                                Database.PostgreSQL.Enumerator.ColumnBuffer
      arising from a use of `doQuery' at <interactive>:1:24-75
    Matching instances:
      instance [overlap ok] (Database.Enumerator.QueryIteratee m q i' seed b,
                             DBType a q b) =>
                            Database.Enumerator.QueryIteratee m q (a -> i') seed b
        -- Defined in Database.Enumerator
      instance [overlap ok] (DBType a q b, MonadIO m) =>
                            Database.Enumerator.QueryIteratee
                              m q (a -> seed -> m (IterResult seed)) seed b
        -- Defined in Database.Enumerator
    (The choice depends on the instantiation of `mark, a, t, m'
     To pick the first instance above, use -XIncoherentInstances
     when compiling the other instance declarations)
    In the first argument of `(>>=)', namely
        `doQuery (sql "select 'Hello, Takusen!'") iter Nothing'
    In the second argument of `withSession', namely
        `(doQuery (sql "select 'Hello, Takusen!'") iter Nothing
          >>= \ (Just r) -> liftIO (putStrLn r))'
    In the expression:
        withSession
          connection
          (doQuery (sql "select 'Hello, Takusen!'") iter Nothing
           >>= \ (Just r) -> liftIO (putStrLn r))
The problem itself is fairly simple. If we don’t give the explicit signature, then the inferred type of iter is:
iter :: (Monad m) => a -> t -> m (IterResult (Maybe a))
Compared with our supplied type:
iter :: Monad m => String -> IterAct m (Maybe String)
You should be asking yourself: what is the difference between “t -> m (IterResult (Maybe a))” and “IterAct m (Maybe String)”? Checking the Takusen haddock, we see this definition:
type IterAct m seedType = seedType -> m (IterResult seedType)
Let’s expand the IterAct type so we can more clearly compare the inferred type to the correct type:
iter :: Monad m => String -> Maybe String -> m (IterResult (Maybe String))
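So there is the problem. Without the explicit type signature, nothing in our definition of iter tells GHCi that the second parameter is a Maybe String, the same type as the seed in the result. The inferred type leaves it as an unrelated t, and that is what the error message is complaining about: with t unknown, GHC cannot tell whether iter matches the instance for a one-column iteratee (a -> seed -> m (IterResult seed)) or the general instance for an iteratee that takes further columns (a -> i'). You’ll also notice that the type synonym IterAct is a function from a seed type to an action containing an IterResult.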
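To check the expansion without installing Takusen, here is a small self-contained model. The synonym IterAct mirrors the definition quoted from the haddock above; defining IterResult as Either seed seed and the body of result' are assumptions based on our reading of Takusen's source, so treat them as stand-ins rather than the library's code. The name iterExpanded is hypothetical: it restates iter with IterAct expanded by hand, and it type checks only because both signatures denote the same type.

```haskell
-- Stand-ins modelled on Takusen's Database.Enumerator (assumed, not imported).
-- In this model an IterResult carries the new seed: Left means "stop",
-- Right means "keep going".
type IterResult seed = Either seed seed
type IterAct m seed = seed -> m (IterResult seed)

-- Stand-in for result': force the new accumulator and ask to continue.
result' :: Monad m => IterAct m a
result' x = return (Right $! x)

-- The iteratee from the GHCi session, written with the IterAct synonym.
iter :: Monad m => String -> IterAct m (Maybe String)
iter msg _accum = result' (Just msg)

-- The same function with IterAct expanded by hand (hypothetical name).
-- This definition compiles only because the two signatures are equal.
iterExpanded :: Monad m => String -> Maybe String -> m (IterResult (Maybe String))
iterExpanded = iter
```

Loaded into GHCi, :t iterExpanded shows the expanded signature, and in the Maybe monad iterExpanded "Hello, Takusen!" Nothing evaluates to Just (Right (Just "Hello, Takusen!")).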
The seed here works just like the seed in a left fold. The type of foldl is:
foldl :: (a -> b -> a) -> a -> [b] -> a
The second parameter of foldl is the seed; here it has type a. If you expand a left fold, you get an expression like this:
foldl f z [a,b,c,d] = f (f (f (f z a) b) c) d
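The expansion can be made visible in plain Haskell. The combining function showCall (a hypothetical name) renders each call as text instead of computing a number, so evaluating the fold produces the nested expression itself.

```haskell
-- Render one call of the combining function as text: acc is the
-- accumulator built so far, x is the current list element.
showCall :: String -> String -> String
showCall acc x = "(f " ++ acc ++ " " ++ x ++ ")"

-- foldl starts from the seed "z" and walks the list left to right.
expansion :: String
expansion = foldl showCall "z" ["a", "b", "c", "d"]
```

In GHCi, expansion evaluates to "(f (f (f (f z a) b) c) d)", the same nesting as the equation above plus one outer pair of parentheses.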
When we pass iter to doQuery, doQuery uses iter much like foldl uses f. In the call to foldl above, the seed is called z, and it is passed as the first parameter to f. First f combines z and a. Then that result is passed to f along with b. The result keeps getting fed back to f in this manner until we hit the end of the list. So the first parameter of f is an accumulator: each call receives the result of the previous call. The first time f is called, it receives the seed value as the accumulator.
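You can watch the accumulator progress with scanl from the Prelude, which behaves like foldl but also returns every intermediate accumulator. The sketch uses (+) and a seed of 0 purely for illustration.

```haskell
-- Every accumulator value foldl passes along, starting with the seed.
accumulators :: [Int]
accumulators = scanl (+) 0 [1, 2, 3, 4]

-- The value foldl itself returns: the last accumulator.
finalValue :: Int
finalValue = foldl (+) 0 [1, 2, 3, 4]
```

Here accumulators evaluates to [0,1,3,6,10]: the seed 0 is what the first call receives, and finalValue, 10, equals last accumulators.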
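When we defined iter we followed the same convention, with two differences: iter takes the current value first and the accumulator last, and it returns its new accumulator inside a monadic action rather than as a bare value. The accumulator does appear in the definition, as the parameter accum, but iter ignores it; result' (Just msg) hands Just msg back as the new accumulator, wrapped in an IterResult. You can see the accumulator’s type, Maybe String, when we expand the type synonym IterAct.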
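The relationship between a foldl step and an iteratee can be written down directly. The helper fromFoldlStep below is hypothetical (it is not part of Takusen): it flips a foldl-style step so the current value comes first and the accumulator last, and wraps the new accumulator in the stand-in IterResult, assumed here to mirror Takusen's synonyms with Right meaning "keep going".

```haskell
-- Stand-ins modelled on Takusen's synonyms (assumed, not imported).
type IterResult seed = Either seed seed
type IterAct m seed = seed -> m (IterResult seed)

-- Hypothetical adapter: a foldl-style step (accumulator first) becomes an
-- iteratee-style function (current value first, accumulator last) that
-- always asks to continue.
fromFoldlStep :: Monad m => (acc -> row -> acc) -> row -> IterAct m acc
fromFoldlStep step row acc = return (Right (step acc row))

-- The tutorial's iter, seen as a foldl step that ignores the accumulator
-- and keeps only the latest value.
iter :: Monad m => String -> IterAct m (Maybe String)
iter = fromFoldlStep (\_accum msg -> Just msg)
```

For instance, fromFoldlStep (+) 5 10 in the Maybe monad gives Just (Right 15): the row 5 is combined with the accumulator 10.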
Where is the seed? The seed was the last parameter we passed to doQuery. In the example above, it was Nothing, and iter ignored it. As this tutorial series expands, I will show you how to use the seed value.
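To see exactly where the seed enters, here is an in-memory model of what doQuery does with an iteratee. The name foldRows is hypothetical; it walks a plain list instead of a database cursor, and it assumes that a Left result stops early and a Right result continues, which is our understanding of Takusen's convention rather than something this tutorial has shown.

```haskell
-- Stand-ins modelled on Takusen's synonyms (assumed, not imported).
type IterResult seed = Either seed seed
type IterAct m seed = seed -> m (IterResult seed)

-- Hypothetical model of doQuery: feed each row to the iteratee, threading
-- the accumulator along, starting from the seed. Left stops early and
-- Right continues; with no rows at all, the seed itself is returned.
foldRows :: Monad m => (row -> IterAct m seed) -> seed -> [row] -> m seed
foldRows _ acc [] = return acc
foldRows it acc (row : rest) = do
  res <- it row acc
  case res of
    Left done -> return done
    Right acc' -> foldRows it acc' rest

-- The tutorial's iteratee: ignore the accumulator, keep the latest value.
iter :: Monad m => String -> IterAct m (Maybe String)
iter msg _accum = return (Right (Just msg))

-- The single row produced by select 'Hello, Takusen!'.
helloRows :: [String]
helloRows = ["Hello, Takusen!"]
```

In GHCi, foldRows iter Nothing helloRows :: Maybe (Maybe String) gives Just (Just "Hello, Takusen!"): the seed Nothing is handed to the first (and only) call of iter, which discards it. With an empty row list the model returns the seed unchanged, Just Nothing, which suggests that the \(Just r) pattern in the GHCi session relies on the query returning at least one row.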
Time to recap.
One simple rule: always give an explicit type signature to your iteratee function (here, iter). If the signature doesn’t match your definition, the resulting error message is much easier to understand than the one you get from passing an unannotated iteratee to doQuery.
Another important lesson is that our iteratees work with doQuery in much the same way as functions passed to foldl. Specifically, they take an accumulator that has the same type as the seed we pass to doQuery. The iteratee is responsible for combining the accumulator and a ‘current value’ to produce a new accumulator. In a future tutorial in this series, I will show you how to accumulate a list of results in the iteratee.
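As a small preview of that idea, still inside a hypothetical in-memory model (the synonyms and foldRows below are stand-ins, not Takusen's API, and Left/Right are assumed to mean stop/continue), here is an iteratee that actually uses its accumulator: each call conses the current row onto the list built so far, and the list is reversed at the end to restore query order. The names collect and collectAll are illustrative only.

```haskell
-- Stand-ins modelled on Takusen's synonyms (assumed, not imported).
type IterResult seed = Either seed seed
type IterAct m seed = seed -> m (IterResult seed)

-- Hypothetical model of doQuery: thread the accumulator through the rows,
-- stopping early on Left and continuing on Right.
foldRows :: Monad m => (row -> IterAct m seed) -> seed -> [row] -> m seed
foldRows _ acc [] = return acc
foldRows it acc (row : rest) = do
  res <- it row acc
  case res of
    Left done -> return done
    Right acc' -> foldRows it acc' rest

-- Combine the current value with the accumulator: prepend the row and ask
-- to continue. Prepending is cheap, so the list comes out reversed.
collect :: Monad m => String -> IterAct m [String]
collect row acc = return (Right (row : acc))

-- Start from the empty-list seed, then restore the original row order.
collectAll :: Monad m => [String] -> m [String]
collectAll rows = reverse <$> foldRows collect [] rows
```

Here collectAll ["a", "b", "c"] :: Maybe [String] evaluates to Just ["a","b","c"].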