Archive for the 'Haskell' Category

Phantom Types, Existentials and Controlling Unification — Part 1

Monday, November 10th, 2008

A phantom type is a type that has no value associated with it, such as the following:

data P phantom = P Int

Above, the type “phantom'’ has no value associated with it on the right-hand side of the equal sign. This means that whenever we construct a value of type P we also need to give a type for phantom, but because it has no value associated with it to constrain its type the type system can make it unify with anything.

For example these are all valid:

P 5 :: P String
P 5 :: P [Int]
P 5 :: P (IO ())

The reason we care about phantom types is that they allow us to embed extra bits of information in our types. In this regard, you can think of phantom types as a tagging system for types. This allows you to, for example, encode a simpler type system within Haskell’s types. You could use this when making an evaluator for a language in Haskell.

Now, there is a well known problem with the unification above. The problem is that P can be made to unify with all kinds of things. So people often use smart constructors to control the unification. For example, P would be declared in a module and the data constructor would not be exported from that module and instead you’d export functions like this:

mkIntP :: Int -> P Int
mkIntP n = P n


mkStringP :: String -> P String
mkStringP s = P (length s)

Now you’ve controlled how the unification of the phantom type by not allowing users of your data type to choose how it unifies.

Suppose we don’t know the full extent of the types that the user wants the phantom to unify with. Which is to say, there are an unbounded number of types for which the phantom can unify but you want to give the user of your code a way to control the unification.

Let’s talk about existentials for a moment. We can give existential types by using a language extension that allows us to explicitly give a “forall” in the type. Now the oddity of giving an existential with a universal quantification is well documented but I won’t discuss it here. You might create a Seal type like this:

data Seal a = forall x. Seal (a x)

Now when we put a value inside a Seal we forget everything we know about the type x. The only thing we remember about x is that it exists. This means that when we open up the seal:

f :: Seal a -> ()
f (Seal a) = ()

We have to invent a new type for x. Here the type system is smart and creates a new fresh name, let’s call it an eigenvariable, for x inside the pattern match of f.

This eigenvariable for x is distinct. The only type it is equal to is itself. This is because when we put x in the Seal we agreed to forget everything we ever knew about it–except that it exists.

We could try this:

g :: Seal a -> a x
g (Seal a) = a

Here the type system is very smart and complains. The error message is a bit confusing, but what it is trying to tell us is that we cannot safely let the eigenvariable for x escape to a higher scope. Letting this happen has implications I won’t go into.

Now, remember what I was saying about letting phantom types unify and wanting to control the unification? Well, Seal gives us a way to let the user put whatever types they want in the phantom type and it gives the user a way to control how that type unifies!


mkSealP :: a -> Seal P
mkSealP a = Seal (P a)

Now the user of our code can make a P with an arbitrary phantom type that we didn’t have to anticipate with a smart constructor and the user gains back control over how the phantom type of P will unify. With the current bit of code it won’t unify with much :)

Now we’ve moved the problem from unifying with too much to not unifying with anything. Next time I’ll discuss some strategies for recovering information about x so that you can do something meaningful with that type.

Darcs Hacking Sprint — Summary from Portland Team

Tuesday, October 28th, 2008

The weekend of 24-25 October 2008 was an International darcs hacking sprint! The sprint was a lot of fun and we’ll be having more. The sprint provides a very productive atmosphere for hacking.

We had a team in Brighton with posts from Day 1 and Day 2. We also had team in Paris but I don’t have a link for them.

Here are just some of the highlights from the Portland Team:

  • Adding language pragmas in all files:
    • makes the code cleaner when it’s time to drop ghc6.6 support
    • all required language extensions are now known
    • makes it easier to check for Haskell’ compatibility
  • Removed OldFastPackedStrings
  • Replaced FastPackedStrings api in favor of Data.ByteString api
    • lots of small optimizations, less pack/unpack, more standard
      ByteString code
    • removed a fair bit of C code, new code compiles to same or
      faster assembly (Don checked)
  • cabalization:
    • no autoconf or make needed
    • cabal install tested and working on linux / osx, windows testing
      soon to follow
    • builds out of the box w/ 6.8 and 6.10
    • configure is much faster
  • module graph (depends on cabalization)
  • Duncan improved zlib
    • soon to be available on hackage
    • allows us to replace our own implementation of zlib bindings
      with the main stream one
    • will make building on windows easier
    • can use lazy bytestrings

Here are some random pictures from the Portland Sprint.

Packages we should consider:
Packages we should consider

The TODO list we made on the first day:
The TODO list we made on the first day

Checking on Team Brighton:
Checking on Team Brighton

Duncan and Jason looking at the projector:
Duncan and Jason looking at the projector

Ah, beautiful Portland in fall:
Ah, beautiful Portland in fall

Darcs 2 Real-World Push Performance Evaluation

Thursday, August 21st, 2008

Thanks to Eric Kow, Duncan Coutts and Ian Lynagh we have some great timing data for using darcs2 and darcs1 to push patches over ssh.

Eric wrote a script to test three different scenarios of using darcs to push patches:

  1. Scenario l1r1:
  2. This is a local darcs1 client talking to a remote darcs1 executable.

  3. Scenario l1r2:
  4. This is a local darcs1 client talking to a remote darcs2 executable.

  5. Scenario l2r2:
  6. This is a local darcs2 client talking to a remote darcs2 executable.

Next, Duncan and Ian provided us with access to 131 real-world repositories hosted at http://code.haskell.org. We ran the script to push patches to each repository, this gave us a ton of times. Then in Excel we crunched these numbers to see that not only is scenario l2r2 no worse than the other two, it’s actually faster on the time consuming cases!

The one caveat we found is that the minimum start-up time for the first two scenarios is 1 second and in the last scenario it’s 2 seconds. I’m confident we can shave off this 1 second difference in the future.

This is a histogram that shows you how the push times distribute, click on it for a large image. Along the bottom we have how many seconds the push took, and along the vertical axis we have the number of data points in that range. At a glance you can see that most repositories take just a few seconds to push. We can also see that darcs2 is slower on small pushes by about one second. Darcs2 in this chart corresponds to l2r2 and darcs1 corresponds to l1r1.

On a side note, we also tested converting all the repositories to darcs2 repository format and that worked great as well. Converting all the repositories at once takes less than 20 minutes on my laptop without a single error. There were a few warnings, but that’s to be expected as potentially exponential merges are fixed in the new darcs2 format, but darcs emits a warning when fixing them.

For anyone that wants to see the raw numbers click here. The link does work, but not all web browsers are showing the numbers. Opera and FF3 work on some platforms and not others.

Simple Unit Testing in Haskell

Friday, September 1st, 2006

Recently I started using QuickCheck but things were a bit hard to get working so I’ll help write down what I’ve learned now that it’s working nicely.

I wanted to store all my tests in one file, say, Tests.hs and only mention them once in the entire code base. So, once a test is defined I want everything else to be automated, I don’t want to have to list it for inclusion in a harness or any of that junk.

Prerequisites

My setup requires Template Haskell (TH) and a Haskell parser which means you’ll need GHC. You’ll need quickcheck and a desire to test :-) I don’t assume any knowledge of TH, quickcheck, cabal or parsing haskell but I don’t really explain them either. If you get lost by my lack of details shoot me an email or post a comment and I’ll be happy to clarify.

Setup

Template Haskell has a restriction on top level splices that says you cannot use a function in a splice if the function was defined in the same module as the splice. I already have a file in my project called “Utility.hs” where I store my general purpose and misc. functions so this is where I place things to be used in top level splices. This will make more sense when we get to mkCheck.

When you define a test for QuickCheck the name always begins with “prop_”. Here is an example test:

-- from Tests.hs
prop_lrotate1 xs = lrotate (length xs) xs == xs
  where types = xs :: [Int]

This test says, whenver you rotate a list by its length you better get the original list back (obviously we are assuming a finite list). This test, also known as a property in quickcheck terminology, goes into Tests.hs.

In Utility.hs I’ve defined some functions to read over Tests.hs and pull out the names of any properties. I decided to use a Haskell98 parser just to be safe, but you could use regular expressions here.

-- From Utility.hs
{- | looks in Tests.hs for functions like prop_foo and returns
  the list.  Requires that Tests.hs be valid Haskell98. -}
tests :: [String]
tests = unsafePerformIO $
  do h <- openFile "src/Tests.hs" ReadMode
     s <- hGetContents h
     case parseModule s of
       (ParseOk (HsModule _ _ _ _ ds)) -> return (map declName (filter isProp ds))
       (ParseFailed loc s’)            -> error (s’ ++ ” ” ++ show loc)

{- | checks if function binding name starts with @prop_@ indicating
 that it is a quickcheck property -}
isProp :: HsDecl -> Bool
isProp d@(HsFunBind _) = “prop_” `isPrefixOf` (declName d)
isProp _ = False

{- | takes an HsDecl and returns the name of the declaration -}
declName :: HsDecl -> String
declName (HsFunBind (HsMatch _ (HsIdent name) _ _ _:_)) = name
declName _                                              = undefined

Why do I need the unsafePerformIO? Well, that’s to get around a little problem I was having with top level splices. Perhaps if I were a little bit better with Template Haskell I wouldn’t need it. In this case it’s perfectly fine because we won’t be changing Tests.hs while we’re compiling the testsuite so the list of tests will not change while we’re using this function. Now that we have a list of test names we can build an AST to execute the tests. This is where Template Haskell comes in.

-- From Utility.hs
mkCheck name = [| putStr (name ++ ": ")
               >> quickCheck $(varE (mkName name)) |]

mkChecks []        = undefined -- if we don't have any tests, then the test suite is undefined right?
mkChecks [name]    = mkCheck name
mkChecks (name:ns) = [| $(mkCheck name) >> $(mkChecks ns) |]

You can get fancier if you like, but I simply output the name of the test right before the status. That way when a test fails it’s easy to see which one.

Next we create a very simple module, Unit.hs, to run the tests for us.

{-# OPTIONS_GHC -fno-warn-unused-imports -no-recomp -fth #-}
module Unit where

import Utility -- our TH functions
import Tests -- our test cases

runTests :: IO ()
runTests = $(mkChecks tests)

The GHC options will take a bit of explaining. I’ll get back to why -no-recomp is there when I talk about cabal. The -fth is for template haskell, you’ll need that in Utility.hs also. If you compile with -Wall -Werror then -fno-warn-unused-imports keeps GHC from complaining about importing Tests. You need the import because we splice the test cases in but the unused imports check doesn’t know about what we’re doing with TH.

Alright, at this point all you need to do build and run your tests is:

ghc --make Unit.hs -main-is Unit.runTests -o unit

Followed by

unit

(or use unit.exe if you’re on windows)

Cabal

I went a step further and made things work with Cabal.

To do this, go into Setup.hs and make these changes:

import Distribution.Simple
import System.Cmd
import System.Exit

main = defaultMainWithHooks (defaultUserHooks { runTests = quickCheck } )
  where
  quickCheck _ _ _ _ = do ec <- system $ "ghc --make -odir dist/build -hidir dist/build -idist/build:src src/Unit.hs -main-is Unit.runTests -o unit"
                          case ec of
                            ExitSuccess -> System “unit”
                            _           -> return ec

Here I’m assuming you keep your source code in the “src” directory relative to the .cabal file. Now after you build, you should be able to test with, “runghc Setup.hs test”. I mentioned I’d talk more about that -no-recomp option in Unit.hs. I noticed that whever I compiled my program then compiled Unit.hs everything went smoothly but when I compiled Unit.hs in the normal flow of compiling my program that I would get errors about undefined symbols when I typed “runghc Setup.hs test”. To force ghc to always rebuild Unit.hs you just need to add -no-recomp. Another option would be system “touch src/Unit.hs” right before the ghc line in quickCheck above.

Note: The setup described here matches mine as close as possible without some extra details specific to my project (I have a rule in my cabal file for building a dll. I didn’t show it here, but I’d be happy to send it to you if you need such a thing :). So it’s possible I’ve left out something simple like an import somewhere. So if you try these steps and get stuck let me know.

Improving your tools with open source

Wednesday, August 9th, 2006

I’ve been using Haskell to work with SpreadSheetML. There is a nice XML library called HaXml which comes with a tool to convert DTDs into Haskell code (gives you code for reading, writing and manipulating XML in your schema).

When I hand coded a spreadsheet using the definitions provided by HaXml I found it to be too verbose. Fortunately, dtd2haskell is covered by the GPL. I grabbed the source code and after just a few hours of hacking I had fixed one bug and I added named records to most of the generated datatypes (there is one case that my dtd doesn’t use which I didn’t bother to extend). Now that I have named records I can add constructor functions for all the elements so that optional and default parameters are assumed, meaning you’ll only need to specify the attributes/children that you care about. This should reduce my 1200 line hand coded spreadsheet to probably about 10-15 lines of code.

Next I’ll probably also add functionality so that dtd2haskell generates instance declarations for the “Data” type class ala “Scrap Your Boilerplate
(this was a suggestion from someone on Haskell-Cafe).
It’s so nice that I’m able to customize my tool to help me.

The moral of the story seem so be that open source tools enable you to help yourself. In this case, I didn’t have to wait for the original authors to add features for me. I was able to step into the source, add what I need and then go back to hacking the important features.