Tea Drinking Development No sooner does one's right, already works.

X

This is X.

Paying It F#rward

This article is an entry in the 2019 F# Advent Calendar in English. Thanks to Sergey Tihon for organizing it, as always!

The F# community, while small compared to those of many other languages, is known to be exceedingly open and helpful to beginners. I experienced this myself when I started learning the language nearly six years ago, and people (especially the F# MVPs) were always quick and eager to answer my newbie questions on Twitter, or wrote extensive answers on Stack Overflow or Code Review. Having had this experience early on has caused me and others to try and make the start easier for those who come after us where we can, be it on the aforementioned platforms or for example in the mentorship program run by the F# Software Foundation.

Lighthouse

Another platform with the specific aim of providing mentoring for learning programming languages (F# is one of over 50 language “tracks” at this point, from assembly and Bash to Haskell and Prolog) that I’ve been involved with over the past eight months is Exercism.

The concept of Exercism has two pillars - test driven development, and dedicated feedback from people with experience in the specific language you are trying to learn, whom the site calls “mentors”.

You download an exercise via the command line client - for F#, that gives you a .NET Core project with two files, one of which contains unit tests, and the other is the file in which you implement your solutions. The tests start out mostly skipped; the idea is to implement enough of the solution to get one test to pass, and then un-skip the next test and make that pass in turn. There’s a lot of similarity here to the well-known concept of “katas”, except that writing the tests is not part of the exercise, as they already exist.

When all the tests pass, you can submit the solution through the client, and can then request feedback from a mentor. One of the mentors will then pick the solution from the queue and make suggestions for improvement; you can then update your solution and submit a new iteration, and then either get more suggestions for another round, or the mentor will “approve” your solution, which you can then publish on your profile and move on to the next exercise.

If you are going through the language track in “mentored mode”, and the exercise was one of the “core” exercises, there will also be a new core exercise and probably several “side” exercises unlocked. (The other mode is “practice mode”, where all exercises are available immediately, and you can tackle them in any order; you will still be able to request mentor feedback in this mode.)

The Art Of Reason

My task as a mentor is to help the students take their solution from working code to an idiomatic, readable and maintainable implementation by suggesting improvements. These can be more cosmetic things, such as clearer naming or more readable formatting, changes around the usage of various language constructs, types or core library functions, or sometimes a completely new approach if a solution is very imperative or, in the other extreme, goes overboard with functional programming gimmicks.

The idea there of course is not to just give people finished code and say “this is what you should have done”. Instead, I will point out things that could be improved, explain why I think that is, and often mention specific language concepts (say, Active Patterns) or functions in the core library to consider. There is also no point in students just writing what I tell them to get me to approve a solution - the goal is always to help people get a better understanding of the language and build habits for writing better F# code. I have been reading Kit Eason’s “Stylish F#” recently, and this early paragraph sums up my idea of mentoring perfectly:

In that spirit, I aim to let students keep as many of their own ideas as possible, even if the result is not the exact code I would have written. A handful of the simpler exercises don’t leave a lot of room for variation, and there’s exactly one good solution, sometimes down to the letter. For many, the idea will always be more or less the same, but details of the implementation can vary, and for the more complex exercises, pretty much every solution will be unique to that student, from the initial attempt through the entire trail of revisions.

Keen Of Knowledge

The students on the F# track come from all kinds of different programming backgrounds (where I can tell). Of course, a lot of them have experience with C#, but many don’t know .NET at all, and every now and then, there’s someone who’s well-versed in functional programming in languages such as Haskell or Lisp.

Consequently, students’ approaches and attitude can vary greatly. Often people will have a hunch that their solution is not optimal, but lack the knowledge about certain language features (such as function composition, or destructuring tuples instead of using fst and snd) or functionality available in the core library (like the functions in the Option module to avoid pattern matching on options). Some will naturally reach for mutable values or data structures, because it’s what they know (that doesn’t happen all that much, though). And those that already know functional programming will try to work around F#’s “inadequacies” regarding certain higher functional concepts (and have me stumped when they explain to me what they’re trying to achieve - with a Lisp example).

Related to that, every student will have their own specific target regarding how “good” they want the code to be. Many are content with achieving the minimum solution that will make me approve the code, while others will also implement additional suggestions or ask about was to improve things they think may not be great (as mentioned above). And occasionally, a student will just tinker on their own for a few days and and submit a number of iterations with all the ideas they tried. This is all of course dependent on the individual student’s general programming proficiency - more experienced programmers will more easily delve deeper when learning a new language.

Different students also behave very differently in terms of communication. Many never write anything in the comments area unless they’re really stuck and have to ask to get unblocked. Those typically seem to be people who are generally more experienced with programming and can work just with the hints I give them. Less experienced programmers of course ask more questions, because they may need more guidance to understand certain concepts. Others are just very interested about the background of various things, how things are implemented under the covers, or are concerned about performance.

Time, What Is Time

Depending on the student’s needs and understanding, and the specific exercise, the time I need to invest also varies widely. For some simple exercises, I usually don’t even run the tests anymore - I know what a good, working solution looks like, and I know how to get there from various other states, and many for many things I have the answers or suggestions in a text file I’ve built up over time (because I don’t want to write the same sentence suggesting Option.defaultValue 200 times; it also helps with being consistent in my answers).

For more complicated exercises, or when students ask very specific questions, the effort will be greater, sometimes by a lot - I try every improvement I want to suggest for a student’s code myself to make sure it actually works and makes sense, and occasionally I’ll even dive into the FSharp.Core source code for some background information (and everyone who’s seen that knows it can be a confusing place). In rare cases, a single reply will take me half an hour or more to work out, and while most solutions go through no more than five iterations (and many are approved sooner), every now and then one will reach ten or more.

The Learning

Now, all that can be a significant time investment at times, but it’s definitely worth it. One aspect is simply giving back to the F# community as mentioned and giving others an easier start. Another one is that teaching something is an important way to get better at it oneself, for a few reasons:

  • To explain not only how to do something, but also the reasoning behind it, I need to understand more than I would just for doing it.
  • If I know how to do something, I will usually always do it that way and have little reason to experiment. Seeing students trying to get the same thing to work without my body of knowledge often lets me see different ideas and occasionally even teaches me things about F# I did not know.
  • Seeing many different solutions for the same problems and thinking through ways to improve them trains my code reading, analytic and reasoning skills. I solved all of the core exercises and a number of the additional ones before I started mentoring in April, but many of the solutions from back then look quite different from how I would implement them now.

One bigger thing I need to mention here specifically: The exercise “Tree Building” is about refactoring working code to a more idiomatic solution, and as the instructions also mentioned it was “slow”, I tried to check solutions for performance as well, but did not really have a way to do that. When I mentored Scott Hutchinson’s solution, we got talking about that, and Scott suggested using BenchmarkDotNet, which I had no experience with.

With some sample code from Scott I set up benchmarking for the exercise for myself to be able to really judge students’ submissions in that regard, comparing them with the original code, the fastest solution I could find (which incidentally was Scott’s at the time), and mine (just out of interest). Then, to actually be able to tell students what they could to do improve the performance of their solutions, I needed to actually have made it faster myself - incidentally my own solution at that point was still fairly slow (and in particular allocated a lot of memory), so I didn’t have to look far. As I knew Scotts code was fast, I looked at that for inspiration - and with a few changes “accidentally” made mine not only faster than it was before, but even slightly faster than his (which probably mostly comes down to slightly lower memory allocations).

Now, while that helped me for mentoring, students still wouldn’t have a way to know how much faster or slower any changes made their code, which of course wasn’t ideal. I suggested officially adding benchmarking to the exercise and make the performance aspect an official consideration, an idea that Erik Schierboom, the maintainer of the F# track (among many other things at Exercism), was very excited about, and I eventually sent a pull request to implement the suggestion.

Final Thoughts

If you’re just starting to learn F# (or any of the 50 other languages), or if you want to improve your fluency in the language, have a look at Exercism. There are over 100 F# exercises covering pretty much any language feature (short of type providers), so you can keep yourself (and us) busy for a while.

And if you think you don’t need the practice, or at least not the mentoring, consider becoming a mentor; we’ve cleaned up the queue quite well in recent weeks, but we could do with one or two more active mentors who can spare an hour two or three times a week, to even out the wait times for students a bit more.

Feeding On F#

This article is an entry in the 2016 F# Advent Calendar in English.

Vorgeplänkel

First, I need to apologize for being so late with my contribution. I had actually booked a slot on December 20th, but my originally chosen topic turned out to be a very bad fit for a written article (I'd like to do that as a video at some point, though), so I had to start over with something else, which also proved very difficult to write for some reason, but here we go....

Chances are that you've come to this article from the 2016 F# Advent Calendar overview on Sergey Tihon's blog, where the full schedule is presented in a table showing the already written articles as well as the dates and authors for the remaining ones, so you get a full overview of the whole event.

One thing that this doesn't facilitate (as inherent in the medium) is keeping track of which articles you have or haven't read. What would be nice for that is an RSS feed, but we don't have that. We have all the information that we need, though, to build one ourselves - using F#, of course.

This will be a rather low-tech solution - we run a script that generates the XML file for the RSS feed, and then we put that somewhere on the Internet. That means there is some manual work involved once a day or so, but that way we only need somewhere to put a static file (I'm going to use GitHub Pages). This manual step can be eliminated if there's a place available to run a small webservice that checks the overview a few times a day and regenerates the feed on the server as needed.

Reading

We will use the HTML type provider from FSharp.Data to deal with the HTML. So we add the FSharp.Data package with Paket, reference the DLL in a new .fsx script file and create our Page type from the type provider.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
#r @"..\packages\FSharp.Data\lib\net40\FSharp.Data.dll"

open FSharp.Data

[<Literal>]
let OverviewUrl = @"https://sergeytihon.wordpress.com/2016/10/23/f-advent-calendar-in-english-2016/"

type Page = HtmlProvider<OverviewUrl>

Note that this does not constrain us to processing that one specific page. As long as the structure is what we expect, we can still pass in a different page later.

If we look at the HTML for a single post, that is a single row in the table, we find this:

1: 
2: 
3: 
4: 
5: 
<tr>
  <td>Dec 15 (Thu)</td>
  <td><a href="https://twitter.com/ReedCopsey/status/790203670637912064">Reed Copsey, Jr.</a></td>
  <td><a href="http://reedcopsey.com/2016/12/15/christmas-trees-in-wpf-2016-edition/">Christmas Trees in WPF, 2016 Edition </a></td>
</tr>

What we need from this is the date, the author's name, the post title and the post URL, all of which are easy enough to obtain. We'll keep them in this small F# record:

1: 
type Entry = { Date : string; Author : string; Title : string; Url : string }

However, before we start implementing this, we'll have a look at a special case from 2015:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
<tr>
  <td> Dec 05 (Saturday)</td>
  <td> <a href="https://twitter.com/ScottWlaschin/status/658388060032401412">Scott Wlaschin</a></td>
  <td>
    <a href="http://fsharpforfunandprofit.com/posts/13-ways-of-looking-at-a-turtle/">Thirteen ways of looking at a turtle</a><br />
    <a href="http://fsharpforfunandprofit.com/posts/13-ways-of-looking-at-a-turtle-2/">Thirteen ways of looking at a turtle (part 2)</a><br />
    <a href="http://fsharpforfunandprofit.com/posts/13-ways-of-looking-at-a-turtle-3/">Thirteen ways of looking at a turtle &#8211; addendum</a>
  </td>
</tr>

Scott had so much content (as usual, one might say) that a single post would have gotten far, far too long, so he published several installments, all of which are listed for his one calendar entry. The proper way to handle that in our feed is probably creating an entry for each installment, so we need to be able to extract all the links and then treat them as if they were separate rows.

There are also rows in the table that we want to ignore - the header row - entry rows without actual links to articles, either because they're simply in the future, and the articles don't exist yet, or because they contain a "coming soon" notice, like the one for this post at the time of writing

If we look at a row, it's always a <tr> node containing three <td> nodes, the first of which is the date as simple text, the second the author's name in a link to the tweet or message announcing their participation, and the third contains the article links as <a> tags, if there are any.

A delay notice in the third column starts with a <span> tag to change the text color:

1: 
2: 
3: 
4: 
5: 
<tr>
  <td>Dec 20 (Tue)</td>
  <td><a href="https://twitter.com/TeaDrivenDev/status/790270967356555268">Тэ дрэвэт утвикλэрэн</a></td>
  <td><span style="color:#ff6600;">slightly delayed, coming <a href="https://twitter.com/TeaDrivenDev/status/811208426562945029">soon</a>&#8230;</span></td>
</tr>

We'll read the content of the cells with Active Patterns, because F# programmers love Active Patterns (I know I do).

And they actually make things nice and clear in this case, as we'll see.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
46: 
47: 
48: 
49: 
50: 
51: 
52: 
53: 
54: 
55: 
56: 
57: 
// HtmlNode.InnerText() returns the visible text of a node, ignoring any "meta" information like
// links, so we can use this to extract both the date and the author's name.
let (|TextValue|) (n : HtmlNode) = n.InnerText().Trim()

// If the <td> node's first descendant is a <span>, this is a delay notice, so there are no
// articles here. Otherwise we'll assume there will be 1..n links to articles in this cell,
// given in the normal HTML way in <a> tags, from which we'll need the "href" attribute and the
// InnerText respectively.
let (|Posts|) (n : HtmlNode) =
    match n.Descendants() |> Seq.tryHead with
    | Some node when node.Name() = "span" -> Seq.empty
    | _ ->
        n.Descendants("a")
        |> Seq.map (fun link -> link.Attribute("href").Value(), link.InnerText())

// This creates an Entry record with the date and author from the first two columns for each
// article found in the third column. All the non-article table rows are implicitly ignored here
// because there are no link/title tuples returned for them.
let parseRow (row : HtmlNode) =
    match row.Descendants("td") |> Seq.toList with
    | [ TextValue date; TextValue name; Posts posts ] ->
        posts
        |> Seq.map (fun (url, title) ->
            { Date = date.Substring(0, 6); Author = name; Title = title; Url = url })
    | _ -> Seq.empty

// A few small helper functions to make processing the HTML more readable
let descendantsNamed name (node : HtmlNode) =
    node |> HtmlNode.descendants false (fun n -> n.HasName name)

let firstDescendantNamed name node =
    descendantsNamed name node |> Seq.head

let innerText (node : HtmlNode) = node.InnerText()

// In order to be able to easily use this script for processing the calendar overview pages for
// different years, we also need to extract the year.
let getYear body =
    let title = body |> firstDescendantNamed "article" |> firstDescendantNamed "h1" |> innerText

    let xmatch = System.Text.RegularExpressions.Regex.Match(title, @"\d{4}$")
    xmatch.Captures.[0].Value |> int

// This creates the Entry records for the whole HTML page. Note that we use the Page type we've
// created  above, but load the HTML file that we specify here, so we can just as well use this to
// parse the page for 2015's advent calendar.
let getEntries body =
    body
    |> firstDescendantNamed "table"
    |> firstDescendantNamed "tbody"
    |> descendantsNamed "tr"
    |> Seq.collect parseRow

let getValues (pageUrl : string) =
    let body = Page.Load(pageUrl).Html.Body()

    getYear body, getEntries body

Let's look what that gets us.

1: 
2: 
3: 
4: 
let year, entries =
    let year, entries = getValues OverviewUrl

    year, entries |> Seq.toList

Building

Now that we have all the information, we can reassemble it into the RSS feed.

The basic XML for an RSS feed looks like this:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
<rss version="2.0">
  <channel>
    <title><!-- The title to show in RSS readers --></title>
    <link><!-- The address of the site the feed is for, e.g. the home page of a news website --></link>
    <description><!-- A potentially longer description --></description>
    <pubDate><!-- The original publication date --></pubDate>
    <lastBuildDate><!-- The date the feed was last modified --></lastBuildDate>
    <language><!-- The content language, of course --></language>
  </channel>
</rss>

Inside the channel tag, there will be the individual items, each looking like this:

1: 
2: 
3: 
4: 
5: 
6: 
<item>
  <title><!-- The title to be displayed in readers --></title>
  <link><!-- The link to the actual content --></link>
  <guid><!-- A globally unique ID, apparently needs to be a URL --></guid>
  <pubDate><!-- The publication date --></pubDate>
</item>

We could use format strings here, but as I don't feel like dealing with multiline strings, it's going to be XLINQ.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
46: 
47: 
48: 
49: 
50: 
51: 
52: 
53: 
54: 
55: 
56: 
57: 
58: 
59: 
60: 
61: 
62: 
63: 
64: 
65: 
66: 
67: 
#r "System.Xml.Linq"

open System
open System.Linq
open System.Xml.Linq


let xn name = XName.Get name

// As we will get the unencoded characters for apostrophes etc. from the HTML, we'll use this small
// function to make sure we have our text properly encoded for use in XML.
let xmlEncode s =
    let node = Xml.XmlDocument().CreateElement "root"
    node.InnerText <- s
    node.InnerXml

// As there are sometimes posts on January 1st of the new year, we need to make sure they get the
// correct publication date as well.
let getEntryYear baseYear (date : string) =
    match date.Split(' ') |> Array.head with
    | "Jan" -> baseYear + 1
    | _ -> baseYear

// Dates in RSS must be in RFC 1123 format
let encodeDate (date : DateTime) = date.ToString "r"

let entryXml baseYear entry =
    XElement(xn "item",
        XElement(xn "title", sprintf "%s | %s | %s" entry.Date (xmlEncode entry.Author) (xmlEncode entry.Title)),
        XElement(xn "link", entry.Url),
        XElement(xn "guid", entry.Url),
        XElement(xn "pubDate", sprintf "%s %i" entry.Date (getEntryYear baseYear entry.Date) |> DateTime.Parse |> encodeDate))


let emptyFeed year =
    let xe =
        let date = encodeDate DateTime.Now

        let title = sprintf "F# Advent Calendar %i" year

        XElement(xn "rss",
            XAttribute(xn "version", "2.0"),
            XElement(xn "channel",
                XElement(xn "title", title),
                XElement(xn "link", OverviewUrl),
                XElement(xn "description",  title),
                XElement(xn "pubDate", date),
                XElement(xn "lastBuildDate", date),
                XElement(xn "language", "en")))

    xe

let addEntries (feed : XElement) (entries : XElement seq) =
    entries
    |> Seq.toArray
    |> feed.Descendants(xn "channel").First().Add

    feed

let createFeed url =
    let year, entries = getValues url

    year,
    entries
    |> Seq.rev // put the latest entry first in the feed
    |> Seq.map (entryXml year)
    |> addEntries (emptyFeed year)

Writing

That was all pretty straightforward. Now that we can create the feed XML, we only need to write the actual XML file to disk.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
open System.IO

let repositoryDir = @"D:\Development\Projects\Active\TeaDrivenDev.github.io"

let rssFile = @"public\fsadvent%i.rss"

let writeFeed fileName feed =
    File.WriteAllText(fileName, string feed)

OverviewUrl
|> createFeed
|> (fun (year, feed) -> writeFeed (Path.Combine(repositoryDir, sprintf (Printf.StringFormat<_> rssFile) year)) feed)

Now that we have created the RSS file(s), the only thing left is publishing it. Depending on where we do that, it might require different steps, but for the typical things like FTP upload or publishing via Git, FAKE makes that easy anyway. I'm putting the feed on my GitHub Pages site, so we need to commit it to Git. We use Paket to add FakeLib to our project, then reference it in our script, and add the calls to stage and commit the RSS file(s). We could also push to GitHub from here, but I prefer to do that by hand.

1: 
2: 
3: 
4: 
5: 
6: 
#r @"..\packages\FAKE\tools\FakeLib.dll"

open Fake.Git.CommandHelper

gitCommand repositoryDir "add *.rss"
gitCommand repositoryDir "commit -m \"Updated F# advent calendar RSS feed\""

Not Dea.... Done Yet

That's it, essentially. For the (in this case short remaining) duration of the F# Advent Calendar, we run this once a day or so and publish the updated feed to GitHub.

This script can process the overview pages for 2015 and 2016; the one for 2014 is a little different and would need special treatment.

There is one very annoying problem that we are going to have with the feed specifically if (as with me) Feedly is being used as the RSS aggregator. When subscribing to a feed, Feedly only pulls the first 10 items (which in our case are the latest ones, since we've reversed the order). If we're adding feed with over 50 entries at once now, we'd lose most of the articles. There is a workaround, though, but it requires control over publishing the feed: Updating a feed in Feedly appears to have no limit on the number of items. That means if we initially create the feed with only 10 items, publish it and add it to Feedly, and only then generate the rest of the items and publish again, Feedly will correctly pick up all the entries. So for the first time creating the feed, we need a slightly modified version of the createFeed function:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
let createFeed' url =
    let year, entries = getValues url

    entries
    |> Seq.truncate 10
    |> Seq.rev
    |> Seq.map (entryXml year)
    |> addEntries (emptyFeed year)

For all subsequent generation runs, we switch back to the original function without the Seq.truncate call.

There's still one issue with that solution that I haven't been able to fix, though: Feedly will only give those first 10 entries their actual pubDate values from the RSS file, and date all others as they arrive. That means that in the case of 2016, we'll get correctly dated entries until about the beginning of December, and after that, there will be 40+ articles with the date when we first publish the full feed. But as we have the date in the title for each entry, and the specific dates generally shouldn't have any relevance for the articles' content, that's probably something we can live with.

Possible improvements

  • Make this a webservice as mentioned at the beginning, to get rid of the manual updating
  • Get the actual publication dates and descriptions/content for the individual posts. That would require following the link to each post, getting the RSS feed for the respective blog, finding the entry for the post and more or less copying the XML from there.
  • Probably other things I'm currently not thinking of

End

The 2016 feed is at https://teadrivendev.github.io/public/fsadvent2016.rss; I will update it for the few remaining days of the 2016 calendar. So in case you find this useful (and don't use Feedly), you can just use that.

The raw script code from this article can be found in this gist.

namespace Microsoft.FSharp.Data
Multiple items
type LiteralAttribute =
  inherit Attribute
  new : unit -> LiteralAttribute

Full name: Microsoft.FSharp.Core.LiteralAttribute

--------------------
new : unit -> LiteralAttribute
Multiple items
val string : value:'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = System.String

Full name: Microsoft.FSharp.Core.string
module Seq

from Microsoft.FSharp.Collections
union case Option.Some: Value: 'T -> Option<'T>
val empty<'T> : seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.empty
val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map
val toList : source:seq<'T> -> 'T list

Full name: Microsoft.FSharp.Collections.Seq.toList
val head : source:seq<'T> -> 'T

Full name: Microsoft.FSharp.Collections.Seq.head
namespace System
namespace System.Text
namespace System.Text.RegularExpressions
Multiple items
type Regex =
  new : pattern:string -> Regex + 1 overload
  member GetGroupNames : unit -> string[]
  member GetGroupNumbers : unit -> int[]
  member GroupNameFromNumber : i:int -> string
  member GroupNumberFromName : name:string -> int
  member IsMatch : input:string -> bool + 1 overload
  member Match : input:string -> Match + 2 overloads
  member Matches : input:string -> MatchCollection + 1 overload
  member Options : RegexOptions
  member Replace : input:string * replacement:string -> string + 5 overloads
  ...

Full name: System.Text.RegularExpressions.Regex

--------------------
System.Text.RegularExpressions.Regex(pattern: string) : unit
System.Text.RegularExpressions.Regex(pattern: string, options: System.Text.RegularExpressions.RegexOptions) : unit
System.Text.RegularExpressions.Regex.Match(input: string, pattern: string) : System.Text.RegularExpressions.Match
System.Text.RegularExpressions.Regex.Match(input: string, pattern: string, options: System.Text.RegularExpressions.RegexOptions) : System.Text.RegularExpressions.Match
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
val collect : mapping:('T -> #seq<'U>) -> source:seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.collect
namespace System.Linq
namespace System.Xml
namespace Microsoft.FSharp.Linq
module Array

from Microsoft.FSharp.Collections
val sprintf : format:Printf.StringFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.sprintf
Multiple items
val seq : sequence:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Core.Operators.seq

--------------------
type seq<'T> = System.Collections.Generic.IEnumerable<'T>

Full name: Microsoft.FSharp.Collections.seq<_>
val toArray : source:seq<'T> -> 'T []

Full name: Microsoft.FSharp.Collections.Seq.toArray
namespace System.IO
module Printf

from Microsoft.FSharp.Core
type StringFormat<'T,'Result> = Format<'T,unit,string,'Result>

Full name: Microsoft.FSharp.Core.PrintfModule.StringFormat<_,_>
val truncate : count:int -> source:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.truncate

Pattern Matching (not only) on Record Fields

Quintuple Ways Of Matching Tuples

Everyone who has used F# pattern matching knows how powerful it is, but interestingly it is often overlooked that all the different aspects are always equally applicable.

Destructuring tuples is something every F# programmer is probably well versed in in all its forms:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
// simple destructuring assignment
let getName (nameTuple : string * string) =
    let firstName, lastName = nameTuple // firstName and lastName now contain the two parts of the tuple
    sprintf "%s %s" firstName lastName

// destructuring directly in a method signature
let printName (firstName, lastName) =
    // again, firstName and lastName now contain the two parts of the tuple that was passed into the function
    printfn "Name: %s, %s" lastName firstName

// destructuring in a lambda
let printNameLambda = fun (firstName, lastName) -> printfn "Name: %s, %s" lastName firstName

// destructuring in a match expression
let getTitle nameTuple =
    match nameTuple with
    | "Don", "Syme" -> "Creator of F#"
    | firstName, lastName -> sprintf "Someone else named %s %s" firstName lastName

// destructuring in a for loop
// (This one may be a little less well known; I learned that from Dave Fancher's "The Book of F#".)
let printNames (names : (string * string) seq) =
    for firstName, lastName in names do
        printfn "Person named %s %s" firstName lastName

....you get the picture. Did I forget something important?

Now, in all those cases (there may be rare exceptions that I'm not presently aware of), you can also use the other forms of pattern matching, such as Active Patterns.

Yes, you can use Active Patterns in function signatures.

You can also use a type test in a for loop:

1: 
for :? string as x in Seq.empty<obj> do printfn "%s" x

(The compiler will notify you that there might be unmatched cases that would then be ignored - so this works like LINQ's .OfType<'T>().)

Record Breaking Pattern Matching

Something that seems to be a little less well known is pattern matching on the fields of record types, both in the form of destructuring assignments and actual comparison matches, but that works just as well in all those cases. The syntax for it is the same as if you were constructing a record of the respective type, except that it is not required to use all the fields of the type.

Let's say we have turned the above name tuple into a a proper type:

1: 
type Person = { FirstName : string; LastName : string; Age : int }

We can use this now in all the ways we used the tuple above:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
// simple destructuring assignment
let getNameFromRecord person =
    let { FirstName = firstName; LastName = lastName } = person
    sprintf "%s %s" firstName lastName
    
// destructuring directly in a method signature
let printNameFromRecord { FirstName = firstName; LastName = lastName } =
    // again, firstName and lastName now contain the two parts of the tuple that was passed into the function
    printfn "Name: %s, %s" lastName firstName

// destructuring in a lambda
let printNameLambdaFromRecord =
    fun { FirstName = firstName; LastName = lastName } ->
        printfn "Name: %s, %s" lastName firstName

// destructuring in a match expression
let getTitleFromRecord person =
    match person with
    | { FirstName = "Don"; LastName = "Syme"} -> "Creator of F#"
    | { FirstName = firstName; LastName = lastName; Age = age } ->
        sprintf "Someone else named %s %s, aged %i" firstName lastName age

// destructuring in a for loop
let printNamesFromRecord persons =
    for { FirstName = firstName; LastName = lastName } in persons do
        printfn "Person named %s %s" firstName lastName

Now, while this is pretty neat, I don't find myself using it a lot, which is in part due to the fact that when for example matching the argument of a lambda this way, you lose the reference to the record itself.

'as' Is It

When I wrote this on Twitter, Shane Charles had the right answer: You can always additionally bind the original value to a name with the as keyword. Always.

The most common use of this concept in pattern matching are probably type tests:

1: 
2: 
3: 
4: 
5: 
try
    ()
with
| :? System.OutOfMemoryException as ex -> printfn "Out of memory: %s" ex.Message
| ex -> printfn "Something else happened: %s" ex.Message

But this really works pretty much everywhere (even though it might not always make sense to use it):

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
// destructuring directly in a method signature
let printNameFromRecordTee ({ FirstName = firstName; LastName = lastName } as person) =
    // again, firstName and lastName now contain the two parts of the tuple that was passed into the function
    printfn "Name: %s, %s" lastName firstName
    person

// destructuring in a lambda
let printNameLambdaFromRecordTee =
    fun ({ FirstName = firstName; LastName = lastName } as person) ->
        printfn "Name: %s, %s" lastName firstName
        person

// destructuring in a match expression
let getTitleFromRecordWithAge person =
    match person with
    | { FirstName = "Don"; LastName = "Syme"} -> "Creator of F#"
    | { FirstName = firstName; LastName = lastName } as p ->
        sprintf "Someone else named %s %s, aged %i" firstName lastName p.Age

// destructuring in a for loop
let printSymesFromREcord persons =
    for { FirstName = firstName; LastName = "Syme" } as syme in persons do
        printfn "Syme named %s, aged %i" firstName syme.Age

That last one again gives us a compiler warning about incomplete matches, but anyone not from the Syme family will simply be ignored.

By the way, you can of course also use as with Active Patterns:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
let (|FizzBuzz|_|) i = 
    match i % 3, i % 5 with
    | 0, 0 -> Some "FizzBuzz"
    | 0, _ -> Some "Fizz"
    | _, 0 -> Some "Buzz"
    | _ -> None

let fizzBuzz = function
    | FizzBuzz s as x -> s, x
    | _ as x -> "", x

While this is a bit nonsensical, it demonstrates the point: The fizzBuzz function has type int -> string * int, and while the signature does not actually give a reference to the integer argument, we can still obtain it from the match expression.

Did you know about pattern matching on record fields? If you didn't, you too (along with Reid Evans and John Azariah) are now one of today's 10.000!

(via XKCD)


Written on my HP ZBook with Ionide and Visual Studio Code

Multiple items
val string : value:'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = System.String

Full name: Microsoft.FSharp.Core.string
val sprintf : format:Printf.StringFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.sprintf
val printfn : format:Printf.TextWriterFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
Multiple items
val seq : sequence:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Core.Operators.seq

--------------------
type seq<'T> = System.Collections.Generic.IEnumerable<'T>

Full name: Microsoft.FSharp.Collections.seq<_>
module Seq

from Microsoft.FSharp.Collections
val empty<'T> : seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.empty
type obj = System.Object

Full name: Microsoft.FSharp.Core.obj
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
namespace System
Multiple items
type OutOfMemoryException =
  inherit SystemException
  new : unit -> OutOfMemoryException + 2 overloads

Full name: System.OutOfMemoryException

--------------------
System.OutOfMemoryException() : unit
System.OutOfMemoryException(message: string) : unit
System.OutOfMemoryException(message: string, innerException: exn) : unit
union case Option.Some: Value: 'T -> Option<'T>
union case Option.None: Option<'T>

Making Busy Progress in F#

This post is one of the December 11th entries in the English language 2015 F# Advent Calendar.

Apologies

  1. This is way, way, way longer than I indended it to be. Sorry.
  2. I started overhauling the blog style a while back, but obviously never completed it, which is why the code blocks look really strange, and also, and more annoyingly, the compiler tooltips currently don't work. I do need to fix that.

Introduction

As much as we like our software to be fast, and although computers get ever more powerful and paralleloriz0red, still not everything can happen instantly. There are long-running calculations, file I/O, network operations, Thread.Sleep and other things that require the user to wait for their completion before they can use the result in any way.

It is considered common courtesy these days to not just let the application stall until an operation is completed and expect patience and understanding from the user; instead we'd like to let them know that something is happening and there is still hope that this "happening" will be over in a finite amount of time. And if we have any way of telling at which point of the process we are, we'll also want to pass that information on to the user, so they can plan the rest of their day accordingly.

Busify!

The overall premise for what we're doing here is that a) we have a desktop application, say in WPF, and b) the "result" of our long-running operation is a state change in the application. The latter isn't very functional, because seen from our operation, it is a side effect, but that is how .NET desktop applications usually work.

First we define a basic type to describe our concept of "busy":

1: 
2: 
3: 
4: 
5: 
6: 
7: 
type IBusy =
    abstract member IsBusy : bool with get, set
    
    // Setting this to `None` means we don't know what the actual progress is; we're just "busy" somehow.
    // UI controls like progress bars often have an "indeterminate" state for things like this.
    abstract member ProgressPercentage : float option with get, set
    abstract member BusyMessage : string with get, set

The implementation of IBusy will likely be something like a ViewModel, which will then take care of displaying a progress bar, a text control with the busy message and maybe some kind of overlay to lock down the application until our operation is complete - if we're loading new data for example, it probably makes no sense to let the user work in the application in the mean time. When we're done, we set IsBusy to false, and the application can be used again.

Using this might look like this:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
busy.IsBusy <- true
busy.BusyMessage <- "Processing things"
busy.ProgressPercentage <- None

dataThings
|> Seq.filter (fun thing -> thing.Value.GetHashCode() <> 12)
|> Seq.map (processDataThing >> string)
|> Seq.toArray
|> asSnd "C:\DataThingResults\result.txt"
|> File.WriteAllLines 

busy.IsBusy <- false

This is of course the most naive way possible, and it requires doing everything by hand correctly every time we want to do something "busy".

That will get annoying pretty quickly, so we'll put the whole IBusy handling in a function and then pass it the operation we want to execute:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
let doBusy (busy : IBusy) busyMessage operation =
    busy.BusyMessage <- busyMessage
    busy.IsBusy <- true
    busy.ProgressPercentage <- None

    operation()

    busy.IsBusy <- false

That's better - it takes all the manual work from us. We only need to say "do operation while telling the user you're busy, and when the operation is through, signal the user you're done". This is nice and declarative and really the minimum work we can have with this.

But.... it doesn't work. Well, it does, but not as we'd like it to. I said above that we want to "not just let the application stall until an operation is completed and expect patience and understanding from the user" - but that is what will happen. Why? Because we're just running everything on the current thread, which in a desktop application will usually be the UI thread.

If we're in WPF, chances are the thread will be blocked by operation before it has even managed to properly show the "busy" indication. And even if the user sees that, the whole application will now be frozen; the Windows task manager might even say "not responding", until at some point our operation completes and unblocks the thread. That is not very friendly.

Thread Carefully

There are a number of ways to solve this in .NET, from "manual" threading control and BackgroundWorker to Tasks and C#'s async/await - and in F# we have the nicest of them all: asynchronous workflows. They all work a bit differently, and not with all of them, what we want to do is straightforward to achieve - with manual threading control via the Thread class for example, it would be quite an ordeal (but then, pretty much everything is).

One thing that is different with F# async workflows from, say, BackgroundWorker or async/await is that we have to explicitly switch to a background thread, because it doesn't happen automatically. The upside is that we have that control, unlike the other implementations.

Let's look at a doBusyBackground implementation of our function:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
let doBusyBackground (busy : IBusy) busyMessage operation =
    busy.BusyMessage <- busyMessage
    busy.IsBusy <- true
    busy.ProgressPercentage <- None

    async {
        let context = System.Threading.SynchronizationContext.Current
        do! Async.SwitchToThreadPool()
        
        do! async { operation() }
        
        do! Async.SwitchToContext context
        
        busy.IsBusy <- false
    }
    |> Async.StartImmediate

As I said, we have control over which thread we're running on, and we have to exert it; that's why we first save the current synchronization context and restore it later on.

Note that even though our operation function may have no asynchronous aspects in itself, we have to call it in an asynchronous workflow to have it executed on the thread we switched to on the Async class.

But look at how easy that was! We just wrap something in an async { } block (some people claim it's a monad, but I refuse to accept that), and we can "await" it on a non-blocking fashion, which is what the do! accomplishes. It needs to be said that in this case operation has the type unit -> unit, that means it is a completely self-contained effectful operation and returns no useful value. We will later see that we can just as well return values of any type from asynchronous workflows.

Async.StartImmediate is of type Async<unit> -> unit, that means it takes an asynchronous operation and runs it in a "fire-and-forget" fashion while itself returning immediately, but without a useful return value.

This current solution is a bit inflexible, though. Our operation will always run on the ThreadPool all the time, which may not be what we want, for example when we need to update UI elements at certain points. Of course we could capture the UI context when we construct the function, but that would start getting a bit convoluted.

Let's split out moving an operation to the ThreadPool from our doBusy function (on the side we will also introduce a default busy message that should be correct in a lot of cases and that we can replace when we really need to):

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
// This is essentially a glorified `string option`, but naming the cases makes it clearer what this is about.
// And thanks to F#'s syntax it's almost free anyway.
type BusyMessage = DefaultMessage | Message of string

let doBusyAsync (busy : IBusy) busyMessage operation =
    busy.BusyMessage <-
        match busyMessage with
        | Message message -> message
        | DefaultMessage -> "Updating information"
    busy.IsBusy <- true
    busy.ProgressPercentage <- None

    async {
        do! operation
        busy.IsBusy <- false
    }
    |> Async.StartImmediate

let onThreadPool operation =
    async {
        let context = System.Threading.SynchronizationContext.Current
        do! Async.SwitchToThreadPool()

        let! result = operation

        do! Async.SwitchToContext context

        return result
    }

If we look at the type of the operation argument of doBusyAsync, it has changed from unit -> unit in doBusyBackground to Async<unit>, that means it has to come "pre-wrapped" in an async { } block. That is necessary so we can decide what should go on the ThreadPool and what we want to run on the UI thread (or implement any other threading requirements we may have).

Another thing to note is that we call operation without parentheses now - because it is not a normal F# function anymore, but an asynchronous workflow. "Awaiting" that with do! or let! (which have the same effect, with the difference that let! binds the result to a value we can then keep using, while do!.... doesn't) results it its execution, yielding the 'T value of the Async<'T>. To be exact, nothing inside the async { } blocks actually gets executed until Async.StartImmediate (or one of a number of other Async functions like Async.RunSynchronously or Async.StartWithContinuations) is called. Everything up to that is just setting up the computation.

Our new onThreadPool function is a little more generic than running on the ThreadPool in doBusyBackground was: operation is not a function of type unit -> unit, but also an Async<'a>, which means it returns a value of type 'a (which can still be unit, of course). That means we can not only run "fire-and-forget" style code on the ThreadPool, but we can actually get results back. That sounds like a useful property for our code to have.

It is especially useful when we consider something that should start to become clear when looking at our existing functions that use asynchronous workflows: They are highly composable. We can nest them and chain them, await them without hassle using let! and then use the result as a normal F# value. Did I mention we can make any piece of code "asynchronous" by putting async { } around it?

If we put together what we have now, we can run an operation on a background thread while our application is "busy" as follows:

1: 
2: 
3: 
async { operationFunction() } 
|> onThreadPool
|> doBusyAsync busy DefaultMessage

When operationFunction completes, the "wrapper" asynchronous workflow created by onThreadPool returns the unit value to doBusyAsync, which does nothing with it, but sets busy.IsBusy to false and thus signals the application is idle again.

Progress!

Now, as you may have noticed, all of our doBusy implementations so far set busy.ProgressPercentage to None, because we have no way of way of knowing our actual progress - at least not in doBusy. The actual operation that we run may very well have that knowledge. How do we make use of it?

Let's take our nonsensical "long-running operation" from the start of the post and turn it into a function.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
let processDataThings dataThings =
    dataThings
    |> List.filter (fun thing -> thing.Value.GetHashCode() <> 12)
    |> List.map (processDataThing >> string)
    |> List.toArray
    |> asSnd "C:\DataThingResults\result.txt"
    |> File.WriteAllLines 

We know the total number of items in the list, and we can find a point in our processing where each individual item is processed, so in addition to the processing, we can also report a progress percentage.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
let processDataThingsAndReportPercentage reportProgressPercentage dataThings =
    let totalNumber = List.length dataThings |> float

    let reportProgress i =
        reportProgressPercentage (float i / totalNumber)

    dataThings
    |> List.filter (fun thing -> thing.Value.GetHashCode() <> 12)
    |> List.mapi (fun i thing ->
        reportProgress i
        
        thing |> processDataThing |> string)
    |> List.toArray
    |> asSnd "C:\DataThingResults\result.txt"
    |> File.WriteAllLines 

In the simplest case, the implementation of reportProgressPercentage will just be something like fun percentage -> busy.ProgressPercentage <- Some percentage.

Well, that probably kind of works, but rather looks like handicraft work (German: Gebastel, read: "gabustel"). Especially that List.mapi usage (that we need to be able to tell how many items we have already processed) can't even really be called a workaround anymore. It's just clueless gebastel.

Another issue is that this way we would report our progress for every single item we process - if there are a lot of items (and if there weren't, we wouldn't need a computer to process them), that would mean using significant resources just to update the progress display with a frequency the human eye cannot possibly perceive - probably many thousands of times per second.

The Mailman Cometh

Before we look at a better solution, let's consider something I mentioned at the very beginning: parrale.... pallellara.... running something on multiple cores instead of just one.

Using the FSharp.Collections.ParallelSeq library, which provides an F# friendly API for Parallel LINQ, we can write the above as follows:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
let processDataThingsParallel reportProgress dataThings =
    dataThings
    |> PSeq.filter (fun thing -> thing.Value.GetHashCode() <> 12)
    |> PSeq.map (fun thing ->
        reportProgress 1
        
        thing |> processDataThing |> string)
    |> PSeq.toArray
    |> asSnd "C:\DataThingResults\result.txt"
    |> File.WriteAllLines

Depending on the number of cores available and how well the work to be done is parallelizable, that can provide a massive performance boost (obviously, because that is kind of the point of having multiple cores in the first place). What happens here is that our input data will be split up into several parts, and the parts will be processed in parallel threads, each on its own core. Again depending on what our data and the actual processing look like, there is no way to tell how fast which thread is processing items, so until PSeq.map returns, we never actually know our progress. The only way to get at that would be counting processed items over all threads - and the way we do that is using Interlocked.Increment.

Of course it isn't, but you believed that for a few clock cycles, right?

What we will really be using is a nice facility that F# offers us called MailboxProcessor, F#'s own lightweight implementation of the Actor model. In a very small nutshell, an "actor" is a.... thing that has a queue of messages (its own private one that nobody else can access), processes any incoming messages sequentially (that means it doesn't care about any new messages until it's done with the currently processed one) and generally can only be communicated with through messages sent to its queue.

("MailboxProcessor" is a somewhat odd and unclear name; it is a common habit among F# programmers to alias the type to Agent.)

A very nice property is that we do not have to care about it threading-wise - as long as we only send it messages and don't immediately expect an answer, it is detached from the threads that we have to handle and care about. It achieves that by also using asynchronous workflows and non-blocking waiting for incoming messages using let!, as we've seen before. The functionality of a MailboxProcessor is usually implemented using a tail recursive function that calls itself with the computed state from processing the current message to then wait for the next message.

What we will let our Agent/Actor do is "collect" the progress updates from all the different threads our parallel sequence uses (that we don't even know about ourselves) and update our IBusy in a sensible interval, let's say every half second - that's more than fast enough for a user to get a good understanding of the actual progress.

What we need to tell the MailboxProcessor's processing function for that is the total nuber of items to process - so we can calculate a percentage - and the update interval.

When receiving a message, it will add the number of items reported to the current count, then check if the time passed since last settingbusy.ProgressPercentage is greater than the update interval, and if so, update the progress value. (We will assume for this that the implementation of IBusy will take care of updating the value that's actually bound to the UI on the appropriate thread; otherwise we would need to do that in the MailboxProcessor, which we could, but don't really want to do.)

For what we want to do, our processing logic doesn't need to know about the actual MailboxProcessor, so we will just give it a function of type int -> unit to call, which the MailboxProcessor already provides with its Post() method.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
let getReportProgress updateInterval numberOfItems (busy : IBusy) =
    let calculatePercentage currentValue = float currentValue / float numberOfItems * 100.
    
    let progress = MailboxProcessor<int>.Start(fun queue ->
        let rec loop lastCount (lastPulseTime : DateTimeOffset) =
            async {
                let! increment = queue.Receive()

                let newCount = lastCount + increment

                let pulseNow, newLastPulseTime =
                    let now = DateTimeOffset.Now
                    
                    if now - lastPulseTime > updateInterval
                    then (true, now)
                    else (false, lastPulseTime)

                if pulseNow then
                    newCount
                    |> calculatePercentage
                    |> (fun progressValue -> busy.ProgressPercentage <- Some progressValue)

                return! loop newCount newLastPulseTime
            }

        loop 0 (DateTimeOffset.Now - (updateInterval + (TimeSpan.FromTicks 1L))))
        
    progress.Post

It's Actually Really Simple

Now again let's use all the things we've built together:

1: 
2: 
3: 
4: 
5: 
let reportProgress = getReportProgress (TimeSpan.FromMilliseconds 500.) (Seq.length dataThings) busy

async { processDataThingsParallel reportProgress dataThings }
|> onThreadPool
|> doBusyAsync busy (Message "Processing thingses")

I think that ended up looking pretty tidy; didn't it?

processDataThingsParallel will be run on the ThreadPool and process its items using Parallel LINQ, while we have a "busy" display and get progress updates all the time. And when it's done, the "busy" display will disappear, and the application can be used again.

Remember that all our doBusy implementations set busy.ProgressPercentage to None, so by default, we should have an "indeterminate progress" display, and in case our processing logic actually reports progress, it will then be set to Some p, and we'll have a value to show on screen.

One issue I realized while writing this is that handling reportProgress like this isn't ideal, because it's easy to think "hey, that's a function to use for reporting progress, so now that I have that, I'll use it whenever I have progress to report", but in reality, it is parameterized for exactly this one usage (especially because we have already given it the total number of items), and it has the current count as a piece of internal state that can't be reset. That means it is necessary semantically to create a new reportProgress function every time we want an operation to report its progress, but there is nothing that forces us to do so. So that part of our API leaves room for improvement.

A small note on the side - I always went with the concept of the "busy" display locking the application from user interaction; that of course doesn't have to be. It could just as well simply be a small progress bar in a tool bar or status bar that just shows up and disappears as needed. In that case we'd probably have two different implementations of IBusy - one that actually prevents the user from doing things, and one that doesn't.

Possible Enhancements

Although this has taken some time now, it is actually a rather simple solution (but probably sufficient much of the time) based on something I use in practice. This concept could be expanded upon in various ways, including things like

  • Cancellation of "busy" operations
  • "Batched" busy operations that run in a sequence and each report their own progress, with an additional overall progress value and display
  • Concurrent, but independent busy operations that begin and end each in their own time, with a busy/progress display present as long as at least one busy operation is running

Acknowledgements

Shoutouts to Reid Evans, whose idea the doBusy concept was and who even turned it into a busy { } computation expression (and who incidentally is my F# advent calendar date-mate today), and to Tomas Petricek, who validated the idea of the onThreadPool function.


Written on my HP ZBook with Ionide and Visual Studio Code

namespace System
namespace System.IO
namespace Microsoft.FSharp.Collections
type bool = System.Boolean

Full name: Microsoft.FSharp.Core.bool
val set : elements:seq<'T> -> Set<'T> (requires comparison)

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.set
Multiple items
val float : value:'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

--------------------
type float = System.Double

Full name: Microsoft.FSharp.Core.float

--------------------
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>
type 'T option = Option<'T>

Full name: Microsoft.FSharp.Core.option<_>
Multiple items
val string : value:'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = System.String

Full name: Microsoft.FSharp.Core.string
union case Option.None: Option<'T>
type obj = System.Object

Full name: Microsoft.FSharp.Core.obj
module Seq

from Microsoft.FSharp.Collections
val empty<'T> : seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.empty
val filter : predicate:('T -> bool) -> source:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.filter
val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map
val toArray : source:seq<'T> -> 'T []

Full name: Microsoft.FSharp.Collections.Seq.toArray
val async : AsyncBuilder

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.async
namespace System.Threading
Multiple items
type SynchronizationContext =
  new : unit -> SynchronizationContext
  member CreateCopy : unit -> SynchronizationContext
  member IsWaitNotificationRequired : unit -> bool
  member OperationCompleted : unit -> unit
  member OperationStarted : unit -> unit
  member Post : d:SendOrPostCallback * state:obj -> unit
  member Send : d:SendOrPostCallback * state:obj -> unit
  member Wait : waitHandles:nativeint[] * waitAll:bool * millisecondsTimeout:int -> int
  static member Current : SynchronizationContext
  static member SetSynchronizationContext : syncContext:SynchronizationContext -> unit

Full name: System.Threading.SynchronizationContext

--------------------
System.Threading.SynchronizationContext() : unit
property System.Threading.SynchronizationContext.Current: System.Threading.SynchronizationContext
Multiple items
type Async
static member AsBeginEnd : computation:('Arg -> Async<'T>) -> ('Arg * AsyncCallback * obj -> IAsyncResult) * (IAsyncResult -> 'T) * (IAsyncResult -> unit)
static member AwaitEvent : event:IEvent<'Del,'T> * ?cancelAction:(unit -> unit) -> Async<'T> (requires delegate and 'Del :> Delegate)
static member AwaitIAsyncResult : iar:IAsyncResult * ?millisecondsTimeout:int -> Async<bool>
static member AwaitTask : task:Task<'T> -> Async<'T>
static member AwaitWaitHandle : waitHandle:WaitHandle * ?millisecondsTimeout:int -> Async<bool>
static member CancelDefaultToken : unit -> unit
static member Catch : computation:Async<'T> -> Async<Choice<'T,exn>>
static member FromBeginEnd : beginAction:(AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromBeginEnd : arg:'Arg1 * beginAction:('Arg1 * AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromBeginEnd : arg1:'Arg1 * arg2:'Arg2 * beginAction:('Arg1 * 'Arg2 * AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromBeginEnd : arg1:'Arg1 * arg2:'Arg2 * arg3:'Arg3 * beginAction:('Arg1 * 'Arg2 * 'Arg3 * AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
static member FromContinuations : callback:(('T -> unit) * (exn -> unit) * (OperationCanceledException -> unit) -> unit) -> Async<'T>
static member Ignore : computation:Async<'T> -> Async<unit>
static member OnCancel : interruption:(unit -> unit) -> Async<IDisposable>
static member Parallel : computations:seq<Async<'T>> -> Async<'T []>
static member RunSynchronously : computation:Async<'T> * ?timeout:int * ?cancellationToken:CancellationToken -> 'T
static member Sleep : millisecondsDueTime:int -> Async<unit>
static member Start : computation:Async<unit> * ?cancellationToken:CancellationToken -> unit
static member StartAsTask : computation:Async<'T> * ?taskCreationOptions:TaskCreationOptions * ?cancellationToken:CancellationToken -> Task<'T>
static member StartChild : computation:Async<'T> * ?millisecondsTimeout:int -> Async<Async<'T>>
static member StartChildAsTask : computation:Async<'T> * ?taskCreationOptions:TaskCreationOptions -> Async<Task<'T>>
static member StartImmediate : computation:Async<unit> * ?cancellationToken:CancellationToken -> unit
static member StartWithContinuations : computation:Async<'T> * continuation:('T -> unit) * exceptionContinuation:(exn -> unit) * cancellationContinuation:(OperationCanceledException -> unit) * ?cancellationToken:CancellationToken -> unit
static member SwitchToContext : syncContext:SynchronizationContext -> Async<unit>
static member SwitchToNewThread : unit -> Async<unit>
static member SwitchToThreadPool : unit -> Async<unit>
static member TryCancelled : computation:Async<'T> * compensation:(OperationCanceledException -> unit) -> Async<'T>
static member CancellationToken : Async<CancellationToken>
static member DefaultCancellationToken : CancellationToken

Full name: Microsoft.FSharp.Control.Async

--------------------
type Async<'T>

Full name: Microsoft.FSharp.Control.Async<_>
static member Async.SwitchToThreadPool : unit -> Async<unit>
static member Async.SwitchToContext : syncContext:System.Threading.SynchronizationContext -> Async<unit>
static member Async.StartImmediate : computation:Async<unit> * ?cancellationToken:System.Threading.CancellationToken -> unit
Multiple items
module List

from Microsoft.FSharp.Collections

--------------------
type List<'T> =
  | ( [] )
  | ( :: ) of Head: 'T * Tail: 'T list
  interface IEnumerable
  interface IEnumerable<'T>
  member Head : 'T
  member IsEmpty : bool
  member Item : index:int -> 'T with get
  member Length : int
  member Tail : 'T list
  static member Cons : head:'T * tail:'T list -> 'T list
  static member Empty : 'T list

Full name: Microsoft.FSharp.Collections.List<_>
val filter : predicate:('T -> bool) -> list:'T list -> 'T list

Full name: Microsoft.FSharp.Collections.List.filter
val map : mapping:('T -> 'U) -> list:'T list -> 'U list

Full name: Microsoft.FSharp.Collections.List.map
val toArray : list:'T list -> 'T []

Full name: Microsoft.FSharp.Collections.List.toArray
val length : list:'T list -> int

Full name: Microsoft.FSharp.Collections.List.length
val mapi : mapping:(int -> 'T -> 'U) -> list:'T list -> 'U list

Full name: Microsoft.FSharp.Collections.List.mapi
Multiple items
type MailboxProcessor<'Msg> =
  interface IDisposable
  new : body:(MailboxProcessor<'Msg> -> Async<unit>) * ?cancellationToken:CancellationToken -> MailboxProcessor<'Msg>
  member Post : message:'Msg -> unit
  member PostAndAsyncReply : buildMessage:(AsyncReplyChannel<'Reply> -> 'Msg) * ?timeout:int -> Async<'Reply>
  member PostAndReply : buildMessage:(AsyncReplyChannel<'Reply> -> 'Msg) * ?timeout:int -> 'Reply
  member PostAndTryAsyncReply : buildMessage:(AsyncReplyChannel<'Reply> -> 'Msg) * ?timeout:int -> Async<'Reply option>
  member Receive : ?timeout:int -> Async<'Msg>
  member Scan : scanner:('Msg -> Async<'T> option) * ?timeout:int -> Async<'T>
  member Start : unit -> unit
  member TryPostAndReply : buildMessage:(AsyncReplyChannel<'Reply> -> 'Msg) * ?timeout:int -> 'Reply option
  ...

Full name: Microsoft.FSharp.Control.MailboxProcessor<_>

--------------------
new : body:(MailboxProcessor<'Msg> -> Async<unit>) * ?cancellationToken:System.Threading.CancellationToken -> MailboxProcessor<'Msg>
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
union case Option.Some: Value: 'T -> Option<'T>
val length : source:seq<'T> -> int

Full name: Microsoft.FSharp.Collections.Seq.length