## 18 January 2012

### Keeping the "science" in Software Development

If you already feel that math is a science you can safely skip the first section.  What follows are my musings on what it means to be "correct", especially in software.  I pose a challenge at the end for practitioners and researchers.

#### Math is science too

According to Wikipedia, the scientific method is:

> Scientific method refers to a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be termed scientific, a method of inquiry must be based on gathering empirical and measurable evidence subject to specific principles of reasoning.
At first glance it may appear that mathematics doesn't follow the scientific method above.  The mathematics you're most likely familiar with is a collection of definitions and the facts (lemmas and theorems) about those definitions that result from applying logical deduction.  You might notice that if you reason that way, then textbook applications of Newton's three laws do not follow the scientific method either.

You might argue that in this case the application of Newton's laws is now mathematics and no longer science.  I would agree that it is mathematics, and I would argue that it is science as well.  Let's see if I can convince you.

By the time you see the definitions and corresponding theorems in a math text they have been reduced to facts.  The trial and error, as well as the correction and integration of previous knowledge, have already happened.  The same is true of a textbook that shows you how to apply Newton's three laws.  I recommend taking a moment to familiarize yourself with Lockhart's Lament.

Math only seems to not be a science because of our relationship to learning it.  Most of us are taught math as a collection of facts.  If you've taken a proof-based course you may have had the experience of searching for proofs.  There is a lot of trial and error involved.  Mathematics as a field may have already integrated previous knowledge, but if you are taking a course, chances are you have not.  So you employ trial and error, as well as hypothesis refinement, to build the logical deductions that establish what your intuition, or pen-and-paper examples, say is true.

The set of assumptions that you choose can make or break a result.  Even though mathematical results are timeless, they still bend to the sway of assumptions.  This is exactly what happens to empirical scientists too.  They think they know something, or have the right model, but later it turns out they reasoned from a flawed assumption and suddenly new data transform their field.

#### Evidence vs Proof
If you've studied proofs in different logical settings you may have noticed that what constitutes a proof depends on what you seek to gain from the proof.  See for example classical logic versus constructive logic.

In a court case or a forensics study, people look for evidence or witnesses to give data points in support of a particular position within a larger case (or claim).

Think about that for a minute.  A claim is made, then people try to establish it or refute it.  Oftentimes the assertion need only be accepted or rejected by a "reasonable" skeptic who is willing to accept partial proof.  Such a skeptic might be a jury, or peers reviewing a publication for acceptance in a journal.  Because we used partial proof, new information could turn up later that reverses our understanding.

At the other extreme we have mathematical proofs, which are composed of logical statements following from assumptions.  Proofs are timeless.  If we get our logic right and we're clear about our assumptions, then the result will stand forever.  I say "forever", but in reality proofs can be found to be incomplete, insufficiently rigorous, or built on the wrong logic at a later date, sometimes years after the arguments were accepted.

It seems that the difference here isn't in whether the resulting "facts" are immutable but rather the strength with which we intend to make the claim.  In the forensics case, we are making a claim to the best of our incomplete knowledge about what happened.  In the case of mathematical proof, we are making a claim that should be true forever.

In my opinion, the difference between "evidence" and "proof" exists to quantify the strength of our assertion that the claim or conjecture is true.  Claims can roughly be placed on a continuum from weak to strong.  Some claims, such as "a human authored this blog post", are weak in nature and need little evidence.  Other claims, such as "the author of this blog post can see through walls", are hard to believe, as there is good evidence that humans can't see through walls.  To prove the second claim, I would hope that you demand really strong evidence, perhaps in the form of a demonstration, a scientific model for how I can do it, and validation from peer review.

Given the continuum for strength of evidence described above, I consider proofs to be a form of evidence.  Proofs, in the mathematical sense, are the strongest form of evidence.  Claims made by politicians would sit at the extreme other end of my continuum.

The strength of the evidence for a claim has practical applications too.  I wouldn't engineer a shuttle to the moon based on arguments built by politicians.  I would base my engineering on mathematical proofs, as getting to the moon requires a high degree of correctness.

So then, what level of evidence is right if I need to be correct?

#### Evidence Based Correctness
Now we're really at the heart of the matter.  Sometimes it just doesn't matter if we're correct or not, for example when arguing with a spouse.  Other times it matters deeply, like when we want to engineer a shuttle to the moon.

While mathematics is obsessed with being unequivocally true, other branches of science have been trying to deepen their understanding while accepting a greater risk of being wrong.  This results in a different level of evidence required for progress.

In other words, you have to figure out how important it is to you (or your application) that you know you're correct, versus it merely being reasonable that you are correct.  I like continuums, and this is no exception.  I believe that you can build a continuum of how strongly you believe things.  This continuum is a natural dual to the continuum for strength of evidence.

In other words, the bolder your claim, the stronger the evidence you need.  Unfortunately, no one has a magical formula for knowing when evidence is strong enough.  If we did, science would become more precise, court cases would become more cut and dried, and mathematicians could relax and quantify results in terms of proof strength.

Now wait a second.  Quantify results in terms of proof strength?  That sounds like statistical reasoning, doesn't it?  Statisticians are able to cobble together data points and talk about the strength of that evidence in terms of how well it supports a particular hypothesis.
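As a toy illustration of that statistical style of reasoning (my example, not one from the post): an exact binomial test cobbles a pile of individual data points into a single number that measures how strongly they contradict a hypothesis.

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided p-value: the probability of seeing at least `successes`
    hits in `trials` attempts if each hit occurs with probability p."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

# 62 heads in 100 flips of a supposedly fair coin: the small p-value is
# a quantified statement of how strong this evidence of bias is.
print(binomial_p_value(62, 100))
```

A jury-style "reasonable skeptic" threshold then becomes explicit: choose a significance level, and evidence below it is deemed strong enough.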

In software development, testing is probably the cheapest way to gather some evidence that your program is correct.  Testing is almost never enough to constitute a proof of correctness, though.  On the other end of the spectrum we have theorem proving, which is known to be costly in terms of human effort.
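To make "testing as cheap evidence" concrete, here's a minimal sketch (the sort routine and harness are hypothetical stand-ins): each passing random trial is one more data point in favor of correctness, yet no number of them amounts to a proof.

```python
import random

def my_sort(xs):
    """The implementation under test: a simple insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def random_test(trials: int = 1000) -> bool:
    """Compare against Python's built-in sorted() on random inputs.
    Passing gathers evidence of correctness; it proves nothing."""
    for _ in range(trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        if my_sort(xs) != sorted(xs):
            return False
    return True

print(random_test())
```

The oracle here (`sorted()`) is the luxury case; in practice you'd check cheaper properties, such as the output being ordered and a permutation of the input.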

My point is this: you don't have to force yourself into a staunch mathematical camp to have "correctness" in your software development.  Nor do you have to give up on correctness in order to fit into a budget.  It's not a dichotomy where you are forced to give up on either truth or practicality.  Testing is not "wrong" and proofs are not "required".

What is required is a level of evidence strong enough to support the level of belief we need for our chosen level of correctness.  Currently we have to carefully evaluate where on the continuum of correctness our result rests and then pick a commensurate level of evidence.  You probably do this already, intuitively, to some degree.  My challenge to you is to do it explicitly and consciously.

Furthermore, can we take the precedent set by the statisticians and find a way to quantify strength of evidence?  Can testing and theorem proving be unified on a continuum by this measure?  Is there a way to look at a successful run of a test suite, that may or may not include theorem proving, and quantify the strength of that evidence that our program is correct?
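One hedged first step toward such a measure, under the strong assumption that test inputs are drawn independently from the program's real usage distribution: if all n random tests pass, classical confidence bounds limit how large the true failure rate can plausibly be. A sketch:

```python
def failure_rate_bound(passed: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-input failure probability after
    `passed` independent random tests all succeed.

    If the true failure rate were p, the chance of passing all n tests
    would be (1 - p)**n; we solve for the largest p still consistent
    with an all-pass outcome at the given confidence level:
        (1 - p)**n >= 1 - confidence  =>  p <= 1 - (1 - confidence)**(1/n)
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / passed)

# After 1000 passing tests, a failure rate of roughly 0.3% is still
# consistent with the data at 95% confidence (the classic "rule of three").
print(failure_rate_bound(1000))
```

This quantifies ordinary testing; stitching theorem proving onto the same scale (a proof as evidence of "strength one" over its assumptions) is exactly the open question above.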

#### 1 comment:

1. You might be interested in work on "self-testing", e.g. http://scholar.google.com/scholar?&q=%22self-testing%22+%22self-correcting%22