Archive for April, 2009

As you build a relationship with various languages, it seems like each one has its own little points of attractiveness that you miss having around when forced into another context.  For instance, Ruby (and in many ways Perl) has a bunch of really succinct, elegant operators that contribute to the satisfaction of using the language.  Honestly, Ruby has this delight thing going for it that is difficult to pinpoint or quantify, but absolutely has a presence sorely pined after when working with other languages.

Since operators tend to be easy to overload in many languages, I try to bring the more helpful ones along from one language to another.  F# has the nice property of not only supporting operator overloading, but operator definition as well.  This is great for when you want to grab an operator from another language and add it to the mix.  Since F# doesn’t implement many of my favorite Ruby operators, we’ll put together some implementations here.

A Quick Word

F# has a very straight forward syntax for declaring both infix (binary) and prefix (unary) operators, which can be stand alone definitions or static members of types.  Operator definitions are distinguished by parenthesis around the operator identifier.  Here we have a stand alone definition of a robust division operator that returns a tuple of the result and the remainder.

    let (/%) left right =
        left / right, left % right

This operator is assumed to be infix because it has two parameters.  If we had just a single parameter, it would be considered prefix.  We could place this same definition on a type as a static member as well.

    type System.Int32 with
        static member (/%) left right =
            left / right, left % right 

EDIT:  It turns out that operator type extensions are currently unsupported as of the Sept 2008 CTP.

In either case the operator can then be used like any other:

    > 50 /% 15;;
    val it : int * int = (3, 5)

F# lets you create operators out of any* sequence of the following characters: !$%&*+-./<=>?@^|~: so knock yourself out.

*There are some special rules, for example any operator that starts with anything in the set of { !, ~, ? } must be unary.

Sequences, Lists, Arrays, etc

Many of the operators defined below operate on collections.  I’m choosing to make these List operators since I work mostly with Lists and the results of the operation print nicely for blog purposes (as opposed to sequences that only print 4 deep).  In practice, I would implement these on Sequence so that any collection type is supported, but these examples are just for demonstration.  Further, any of these can easily be added to the any of the standard collection types directly (list, array, etc) by using the type extension syntax above and replacing calls to the List module with calls to the appropriate function on Seq, Array, or whatever.

Also, the F# compiler is protective of well known operators and will emit warnings if you change (extend) their definitions.  Primarily you’ll see things like: warning FS0086: The name ‘(&)’ should not be used as a member name. If defining a static member for use from other .NET languages then use the name ‘op_Amp’ instead. To avoid these clashes I’ll throw a dot at the beginning of my new operators.

The Ruby Collection Operators

The operators defined for Ruby collections are intuitive, concise, and handy for many common activities.  Need to identify who hasn’t responded to an invite or which nodes haven’t completed an operation?  Use the difference operator.  Need to identify elements common to several collections?  Use the intersect.  None of these are ground shaking new concepts, but the terse, clear representation is great for readability and utility.

The difference and intersect operators make use of a temporary HashSet to allow constant time lookup of values in the right hand collection.  This yields O(m+n) time complexity instead of the O(m*n) you’d see in a brute force linear lookup.

Concatenation (+)

The concatenation operator is very straight forward and produces a composition of both collections allowing duplicates.

    let (.+) left right =
        List.append left right

    > [1;2;3;4] .+ [3;4;5;6];;
    val it : int list = [1; 2; 3; 4; 3; 4; 5; 6]

EDIT:  There is also a (@) operator that achieves this behavior.

Difference (-)

As expected, the difference operator produces a collection of items contained in the left argument that do not appear in the right argument.  Again, this is nice for singling out people who haven’t responded to an invite or operations that haven’t completed.  The HashSet is chosen because of its constant time lookup.  We avoid in this case the standard Set since it is based on a binary tree and thus yields logarithmic time.

    let (.-) left right =
        let mem = HashSet<_>(right)
        left |> List.filter (fun n -> not (mem.Contains n))

    > [1;2;3;4] .- [3;4;5;6];;
    val it : int list = [1; 2]

Intersect (&)

The intersect operator makes use of the HashSet similar to the difference operator and produces a collection of items common to both the left and right arguments.  Useful for “you might also like this”-style suggestion algorithms and trend analysis.

    let (.&) left right =
        let mem = HashSet<_>(right)
        left |> Seq.filter (fun n -> mem.Contains n)

    > [1;2;3;4] .& [3;4;5;6];;
    val it : int list = [3; 4]

Union (|)

The union operator produces a collection of items similar to concatenation, but with uniqueness of constituent items.

    let (.|) left right =
        Seq.append left right |> Seq.distinct |> List.of_seq

    > [1;2;3;4] .| [3;4;5;6];;
    val it : int list = [1; 2; 3; 4; 5; 6]

Repetition (*)

In Ruby and Perl the repetition operator applies not only to collections, but strings as well.  For our case here we implement the collection version.  One useful case is in generating input sequences for automated tests.

    let (.*) n list =
        let res = List.empty
        [1 .. n] |> List.fold_left (fun acc i -> List.append acc list) res

    > 3 .* [1;2;3];;
    val it : int list = [1; 2; 3; 1; 2; 3; 1; 2; 3]    

Regular Expression Binding Operator (=~) (Perl/Ruby)

For simplifying the representation of regular expression matching, Perl and Ruby make use of a Binding operator that tests the string on the left hand side against a pattern on the right.  If the string is a match for the pattern, the expression returns true.

    let (=~) s pattern =

    > "Mississippi" =~ "M(i(ss|pp)*)+";;
    val it : bool = true  

A Functional Binding Operator (=~~)

One Perl/Ruby facility not included in the .net Regex implementation is the population of $~ variables after a match operation.  If the matched pattern had groups that need to be referred to, the $1, $2, etc variables can be used after the =~ match to access the data.  In our non-Perl/Ruby case, we don’t get the magic $~ variables so the =~ isn’t as useful since we don’t have a resultant Match object with our data.  To deal with this we’ll make a new operator =~~.

    let (=~~) s pattern =
        let m = Regex.Match(s,pattern)
        if m.Success then Some m
        else None

    > "Mississippi" =~~ "M(i(ss|pp)*)+";;
    val it : Match option = Some Mississippi {Captures = seq [...];
                                Groups = seq [...; i; pp];
                                Index = 0;
                                Length = 11;
                                Success = true;
                                Value = "Mississippi";}  

The idea here is to wrap the Match object in an option so we can use the operator in a pattern matching context rather then a simple conditional.

    let testMatch s =
        match s =~~ "^M(i(ss|pp)*)+" with
        | Some m -> printf "%s\n" (m.ToString())
        | None -> printf "No Match"    

This avoids the need to rely on implicit $~ variables and feels much more like a functional idiom.

The Null Coalescing Operator (??) (C#)

For dealing with many common situations where a potentially null value needs to be replaced with a dependable backup, C# uses the null coalescing operator.  EDIT: the F# compiler ignores leading . and $ characters not only for determining the precedence of operators, but also for identifying if they are prefix or infix.  For clarity, I’ll avoid use of ?? altogether in favor of $$.

    let ($$) left right =
        match left with
        | null -> right
        | _ -> left   

This tests the left hand side for null and if a non-null target was given that value is used.  Otherwise, the right hand argument is used instead.  This is very similar to the Ruby idiom of testing a value with ||= to initialize with a known value in the case of nil.

A Functional Null Coalescing Operator

The standard null coalesce is useful when dealing with .net objects from libraries external to F#.  When working with F# types however, null doesn’t come up anywhere near as often as option types since it is preferable to allow the type system to handle null (None) values.   Again, to feel more at home in a functional context, we’ll create a new operator $$$.

    let ($$$) left right =
        match left with
        | Some _ -> left
        | _ -> right  

This then allows you to test a value that may potentially be None and substitute a known value should the case arise.

    let userPrefs = getUserSpecifiedPrefs() $$$ getDefaultPrefs()   

Nice.  So yeah, these are my favorites.  I’d love to hear feedback on improvements that can be made or interesting operators from your favorite language that I’ve ommited.



Read Full Post »

Remember those Real Men of Genius Bud Light commercials on the radio? You know, the ones that salute unsung heroes like “Mr Hawaiian Shirt Pattern Designer,” “Mr Beach Metal Detector Guy,” and “Mr Male Fur Coat Wearer” with David Bickler singing rock ballad style in the background?

If they did one of these for Mads Torgersen, it would have to be:

“Mr Creates Recursive Lambda Expressions in C# via Y-Combinator Generated Fixed Point Functions Guy.”

And it wouldn’t be mocking anything because seriously Mads, you are my freakin’ hero.

I don’t come from a deep academic CS background rich with experience in the Lisp and ML families, so the functional paradigm has been something I’ve slowly grown into. Right now I’m working my way through Expert F# (by Don Syme [F# creator] and others) and F# for Scientists while trying to reinforce my understanding of each topic by devising comparable implementations of each example (where possible) in C#, ruby, or some other familiar imperative curly braced language.

For instance, right now I’m looking at the subjects of precomputation and memoization. You can think of memoization as an optimization technique that involves caching the results of each call to a given function to avoid the hit of continual recalculation. In other words, a function remembers its own results for previously applied inputs.  So if there is some expensive computation needed to arrive at an answer, you hang on to the result (and the input) for future reference so if someone calls the function again with the same input you can turn around and give the result without all the hairy computation.  This is primarily applicable to recursive functions which happen to be very common in functional programming.

For example, consider Fibonacci:

    let rec fib n =
      match n with
      | 1 | 2 -> 1
      | _     -> fib(n-1) + fib(n-2)

The Fibonacci implementation is very straight forward and really simple with recursion, thus its become sort of the functional hello world.  Something you’ll notice is that running this for large numbers is heinously slow.  The reason for this is that through the recursion the same calls are occurring over and over again.  Look how much redundancy you get even for fib 7.


If we wanted to quantify how many calls are going on for any given n we could wire up our fib function like so:

    let fib n =
        let calls = new Dictionary<_,_>()
        let rec trackedFib n =
            if calls.ContainsKey(n) then calls.[n] <- calls.[n] + 1 
            else calls.Add(n,1)
            match n with
            | 1 | 2 -> 1
            | _     -> trackedFib(n-1) + trackedFib(n-2)
        (trackedFib n, calls)

All this is doing is embedding a dictionary that keeps track of how many times fib is called with each input.  The function returns a tuple of the actual calculation result as well as the dictionary that tracked each call.  So lets see what this yields for an input of 10.

    let (res, calls) = fib 10

And if we examine this in F# interactive we get:

    > calls;;
    val it : Dictionary
    = dict
        [(10, 1); (9, 1); (8, 2); (7, 3); (6, 5); (5, 8); (4, 13); (3, 21);
         (2, 34); (1, 21)]

Hey wait a minute.  Look at that output again.

        [(10, 1); (9, 1); (8, 2); (7, 3); (6, 5); (5, 8); (4, 13); (3, 21);
         (2, 34); (1, 21)]

No THAT isn’t what I expected, but yeah it makes sense that the number of calls would follow the Fibonacci sequence given we are recursively counting it.  The weird value for the 1 case occurs because if we hit 2, we terminate without proceeding to 1.  (look at the tree for fib 7 again)  Had each 2 continued to a 1 we would have 34 more 1’s which would give the expected 55 total calls.

Ok, so the standard recursive implementation is really slow because of all the repeated calls and will get slower and slower as input gets bigger.  So if we update the fib implementation to remember the values of each call, we can short circuit most of the recurring chains.

    let mem_fib n =
        let cache = new Dictionary<_,_>()
        let rec cachedFib n =
            match n with
            | 1 | 2 -> 1
            | _     ->  if cache.ContainsKey(n) then cache.[n]
                        else let result = cachedFib(n-1) + cachedFib(n-2)
        cachedFib n 

Notice how the dictionary is now caching the results of the call instead of the number of times called.  We can test this and see we get the right answer.

    > mem_fib 10;;
    val it : int = 55

Good.  And to demonstrate that its way faster you can wire the memoized version with call tracking:

    let fast_fib n =
        let calls = new Dictionary<_,_>()
        let cache = new Dictionary<_,_>()
        let rec cachedTrackedFib n =
            if calls.ContainsKey(n) then calls.[n] <- calls.[n] + 1 
            else calls.Add(n,1)
            match n with
            | 1 | 2 ->  1
            | _     ->  if cache.ContainsKey(n) then cache.[n]
                        else let result = cachedTrackedFib(n-1)
                                          + cachedTrackedFib(n-2)
                             cache.[n] <- result
        (cachedTrackedFib n, calls)

Run for fib 10:

    let (mem_res, mem_calls) = fast_fib 10

And see that the call graph is far smaller:

    > mem_calls;;
    val it : Dictionary
    = dict
        [(10, 1); (9, 1); (8, 2); (7, 2); (6, 2); (5, 2); (4, 2); (3, 2);
         (2, 2); (1, 1)]

Nice.  We went from 109 calls down to 17.

My imperative roots are beginning to show in that last example, so lets make the result a little more functional.  First of all the standard generic memoize function looks like this:

    let memoize f =
        let cache = new Dictionary<_,_>()
        fun n ->
            if cache.ContainsKey(n) then cache.[n]
            else let result = f n

and then we can add a generic tracking function:

    let calls = new Dictionary<int,int>()
    let trackCalls f =
        fun n ->
            if calls.ContainsKey(n) then calls.[n] <- calls[n] + 1
            else calls.Add(n,1)
            f n

and finally the fib impl:

    let rec memoizedFib =
        ( fun n ->            
            match n with
            | 1 | 2 -> 1
            | _     -> memoizedFib(n-1) + memoizedFib(n-2) )
        |> memoize |> trackCalls

Much better. And we reverify:

    > memoizedFib 10;;
    val it : int = 55

    > calls;;
    val it : Dictionary
    = dict
        [(10, 1); (9, 1); (8, 2); (7, 2); (6, 2); (5, 2); (4, 2); (3, 2);
         (2, 2); (1, 1)]

Looks like it works to me.  Ok, so how do we implement this in C#?  Well, my first inclination was to write it up as a standard lambda expression with the same look and feel.

    Func<int, int> fib = n => n <= 2 ? 1
                              : fib(n - 1) + fib(n - 2);

But this comes back with an error reading: Use of unassigned local variable ‘fib’. What?  Use of unassigned local variable?   I thought that I was declaring fib with that expression, right?  Well yes and no.  C# lambda expression syntax is really just syntactic sugar for a boatload of expression tree code that implements the anonymous method.  That code needs to reference fib to call it to perform the recursion, but fib hasn’t been fully defined yet since the expression tree itself is what defines fib.  That implies that at the end of the day, the current C# implementation as of 3.0 does not support pure recursive lambda expressions.  Crap.

It turns out there is, of course a workaround where you can define your lambda ahead of time and refer back to it to get a pseudorecursive result.

    Func<int, int> fib = null;
    fib = n => n <= 2 ? 1
          : fib(n - 1) + fib(n - 2);

This is good enough for many applications but isn’t pure and requires two expressions to implement, which is problematic for use in fluent interfaces like linq.

So I started searching around to find out what the deal with this was.  That’s when I stumbled over an ancient post (2007) by Mads Torgersen.  This is a gem.  Its freakin’ awesome.  I mean seriously, go read it.  Right now.

Here’s the synopsis

  • C# can’t implement a recursive definition of factorial as a lambda
  • A fixed point for a function f is a value x for which f(x) = x
  • Its hard to find a fixed point for a function, but finding fixed points for functions of functions is a solved problem
  • If we define our factorial as a function of a function, its fixed point would be the implementation of factorial itself (which is what we want)
  • We can define omega as a function of a function that self applies.  This is recursive, but endless since it doesn’t describe input or a base case.
  • We can leverage the knowledge of our Lambda Calculus forefathers and use the Y-Combinator to create a self applying function that also applies another input function.  In this context, we get a fixed point generator.
  • Apply our factorial function to the fixed point finder and we get a true recursive lambda in C#.

It is granted that the C# generics syntax is really, really noisy and distracting, but it helps to make you appreciate Hindley-Milner languages all the more.  Mads’ walkthrough is suprisingly clear for such potentially opaque subject matter (to newbies like me at least).  And well, its just really, really interesting.

If you didn’t actually read it, really, go do so.

Mads you are a giant among men.  Here’s to you, Mr Creates Recursive Lambda Expressions in C# via Y-Combinator Generated Fixed Point Functions Guy.

Read Full Post »