Visualizing population using WorldBank
In this walkthrough, we look at a larger example of data analysis that uses FnuPlot to visualize the results. We use the WorldBank as the data source and plot how the total population in the countries of the world changed between 1990 and 2005. The example is inspired by the Asynchronous and data-driven programming chapter from Real-World Functional Programming.
Downloading data from the WorldBank
In the chapter used as the inspiration, the data download is done using the LINQ to XML library and the XDocument type. Here, we use F# type providers instead. We use the XML type provider to parse the data and the WorldBank type provider to find the ID of the required indicator.
First, we connect to the WorldBank to get the ID and write a function that sends an HTTP request to download one page of data for the specified indicator:
    #r "FSharp.Data.dll"
    open FSharp.Data

    // Get indicator code from WorldBank
    let wb = WorldBankData.GetDataContext()
    let indCode = wb.Countries.World.Indicators.``Population, total``.IndicatorCode
    let root = "http://api.worldbank.org/countries/indicators/"
    let key = "hq8byg8k7t2fxc6hp7jmbx26"

    /// Asynchronously downloads the population
    /// data for the specified year & page
    let asyncGetPage year page =
      let range = sprintf "%d:%d" year year
      Http.AsyncRequestString
        ( root + indCode,
          query=[ "api_key", key; "per_page", "100";
                  "date", range; "page", string page ] )
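To make the role of the query parameters concrete, here is a minimal sketch of how such a list of key/value pairs might be turned into a request URL. The `buildUrl` helper is hypothetical and written only for illustration, `SP.POP.TOTL` and `YOUR_KEY` are placeholder values, and the encoding that FSharp.Data applies internally may differ from the one shown here.

```fsharp
open System

/// Hypothetical helper (not part of FSharp.Data): appends
/// URL-encoded query parameters to a base URL.
let buildUrl (baseUrl: string) (query: (string * string) list) =
  let encoded =
    query
    |> List.map (fun (k, v) -> Uri.EscapeDataString(k) + "=" + Uri.EscapeDataString(v))
    |> String.concat "&"
  baseUrl + "?" + encoded

// The indicator code and the key below are placeholders for this sketch.
let url =
  buildUrl "http://api.worldbank.org/countries/indicators/SP.POP.TOTL"
    [ ("api_key", "YOUR_KEY")
      ("per_page", "100")
      ("date", sprintf "%d:%d" 2000 2000)
      ("page", string 1) ]

printfn "%s" url
```

Requesting a single year as a range with the same start and end mirrors the `range` value built in asyncGetPage.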
Now, we want to save a sample response to a local file and use the XML type provider to parse the response. The easiest way to do this is to use File.WriteAllText to save a sample result, say for asyncGetPage 2000 1, to a local file. Here, we use worldbank.xml in the current folder, so we can load the type provider as follows:
    #r "System.Xml.Linq.dll"
    type WB = XmlProvider<"worldbank.xml">
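The step of saving a sample response can be sketched as follows. To keep the snippet self-contained, `fetchSamplePage` is a hypothetical stand-in that returns a small hand-written XML string. In the walkthrough you would run `asyncGetPage 2000 1` in its place. The XML shown is illustrative only and is not a copy of a real WorldBank response.

```fsharp
open System.IO

// Hypothetical stand-in for `asyncGetPage 2000 1`: returns a tiny,
// hand-written document instead of contacting the WorldBank.
let fetchSamplePage () = async {
  return """<data page="1" pages="1">
  <data><country id="AA">Alpha</country><date>2000</date><value>10</value></data>
  <data><country id="BB">Beta</country><date>2000</date><value>20</value></data>
</data>""" }

// Run the asynchronous download and store the text in the current folder.
let samplePath = Path.Combine(Directory.GetCurrentDirectory(), "worldbank.xml")
let sample = fetchSamplePage () |> Async.RunSynchronously
File.WriteAllText(samplePath, sample)
```

The sample only needs to be saved once. After that, the type provider reads the file at compile time to infer the types.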
The generated type WB lets us parse the responses returned by asyncGetPage. To download all data, we first need to request the first page, so that we know how many pages there are in total. Then we can download the rest of the pages in parallel. This is done in the following async function:
    let asyncGetPopulation year = async {
      // Download the first page & get total no. of pages
      let! first = asyncGetPage year 1
      let parsed = WB.Parse(first)
      // Download the remaining pages in parallel
      let! rest =
        [ for pg in 2 .. parsed.Pages -> async {
            let! response = asyncGetPage year pg
            return WB.Parse(response) } ]
        |> Async.Parallel
      // Return all pages
      return Seq.append [parsed] rest }
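The "first page, then the rest in parallel" pattern can be tried without network access. The sketch below assumes a hypothetical in-memory service (`pages` and `asyncGetFakePage`) in place of the WorldBank and the XML type provider. Only the control flow matches the function above.

```fsharp
// Hypothetical in-memory "service" holding three pages of numbers.
let pages = [| [1; 2]; [3; 4]; [5] |]

/// Stand-in for asyncGetPage: returns the total page count
/// together with the items on the requested page (1-based).
let asyncGetFakePage page = async {
  return pages.Length, pages.[page - 1] }

/// Fetch page 1 to learn the page count, then fetch the rest in parallel.
let asyncGetAll = async {
  let! (total, first) = asyncGetFakePage 1
  let! rest =
    [ for pg in 2 .. total -> async {
        let! (_, items) = asyncGetFakePage pg
        return items } ]
    |> Async.Parallel
  // Async.Parallel returns results in the order of its inputs
  return Seq.append [first] rest |> Seq.concat |> List.ofSeq }

let all = asyncGetAll |> Async.RunSynchronously
printfn "%A" all
```

When the service has only one page, the range `2 .. total` is empty and no further requests are made.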
The return type of asyncGetPopulation is Async<seq<WB.Data>>, which means that it asynchronously returns a collection of <data> elements from the XML document. Now, we use Async.Parallel again to download data for 3 different years:
    let allData =
      [ for y in [ 1990; 2000; 2005 ] ->
          asyncGetPopulation y ]
      |> Async.Parallel
      |> Async.RunSynchronously
      |> Seq.concat
The operation creates a list of asynchronous operations, composes them so that they are performed in parallel, runs them and then concatenates the results, so that we get just a single collection of <data> nodes.
Processing downloaded data
Before we can do the visualization, we need to find the countries for which we have all the data. To do that, we build two lookup tables. One maps a country code to a country name; the other maps a country code and a year to a value.
    // Lookup table mapping IDs to country names
    let idToName =
      [ for d in allData do
          for item in d.Datas do
            yield item.Country.Id, item.Country.Value ]
      |> dict

    // Lookup table mapping ID and Year to a value
    let idAndYearToValue =
      [ for d in allData do
          for item in d.Datas do
            if item.Value.IsSome then
              yield (item.Country.Id, item.Date), item.Value.Value ]
      |> dict
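To see what the two dictionaries contain, here is a small self-contained sketch that builds the same kind of lookup tables from an in-memory list. The `observations` list and its numbers are made up for illustration, and a plain tuple stands in for the items produced by the type provider.

```fsharp
// Hypothetical observations: (country id, country name, year, optional value)
let observations =
  [ ("AA", "Alpha", 1990, Some 600L)
    ("AA", "Alpha", 2000, Some 700L)
    ("BB", "Beta", 1990, None)
    ("BB", "Beta", 2000, Some 50L) ]

// Lookup table mapping IDs to names
let idToName =
  [ for (id, name, _, _) in observations -> (id, name) ]
  |> dict

// Lookup table mapping (ID, year) to a value; missing values are skipped
let idAndYearToValue =
  [ for (id, _, year, value) in observations do
      match value with
      | Some v -> yield (id, year), v
      | None -> () ]
  |> dict

printfn "%s" idToName.["BB"]
printfn "%d" idAndYearToValue.[("AA", 2000)]
printfn "%b" (idAndYearToValue.ContainsKey(("BB", 1990)))
```

Because entries without a value never reach the second dictionary, a missing key later signals that a country has no data for that year.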
Now that we have the two dictionaries, we can create collections of numbers in a format that can be passed to gnuplot. First of all, we want to display data only for some of the countries. We choose countries and regions that have data for all 3 years and have a population over 500 million in each of them. This will give us large countries and aggregated regions that WorldBank monitors as a whole.
Next, we need to pick the names of the regions/countries (for axis labels) and a list of collections with numerical values for each year:
    /// Returns true if data is available for all 3 monitored years
    /// and the country or region has over 500 million
    let isVisibleCountry id =
      [1990; 2000; 2005] |> Seq.forall (fun y ->
        match idAndYearToValue.TryGetValue( (id, y) ) with
        | true, v -> v > int64 5e8
        | _ -> false )

    /// Names of visible countries/regions for axis labels
    let names =
      [ for KeyValue(id, name) in idToName do
          if isVisibleCountry id then yield name ]

    /// List of tuples consisting of a year and data for the year
    let stats =
      [ for y in [1990; 2000; 2005] do
          let data =
            [ for KeyValue(id, name) in idToName do
                if isVisibleCountry id then
                  yield float (idAndYearToValue.[id, y]) ]
          yield y, data ]
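The filtering rule can be checked on a tiny data set. In the sketch below, the `values` dictionary, the three IDs and all numbers are hypothetical. The predicate has the same shape as isVisibleCountry: a region passes only if every monitored year has a value and each value exceeds the threshold.

```fsharp
open System.Collections.Generic

let years = [1990; 2000; 2005]
let threshold = int64 5e8

// Hypothetical values keyed by (id, year).
// "BB" has no entry for 2005 and "CC" is below the threshold.
let values : IDictionary<string * int, int64> =
  dict [ ("AA", 1990), int64 6e8
         ("AA", 2000), int64 7e8
         ("AA", 2005), int64 8e8
         ("BB", 1990), int64 6e8
         ("BB", 2000), int64 7e8
         ("CC", 1990), int64 1e6
         ("CC", 2000), int64 1e6
         ("CC", 2005), int64 1e6 ]

/// True when all years are present and each value is above the threshold
let isVisible id =
  years |> Seq.forall (fun y ->
    match values.TryGetValue((id, y)) with
    | true, v -> v > threshold
    | _ -> false)

let visible = [ "AA"; "BB"; "CC" ] |> List.filter isVisible
printfn "%A" visible
```

Using TryGetValue in a pattern match handles the "key missing" and "value too small" cases in one place, so no separate ContainsKey check is needed.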
Visualizing data with gnuplot
To create a plot using gnuplot, we first need to reference the FnuPlot.dll library, open the required namespace and create a GnuPlot instance. Here, we pass the path to the gnuplot executable as an argument. If it is available directly in your path, you can just call new GnuPlot() without parameters:
    #r "FnuPlot.dll"
    open FnuPlot
    open System.Drawing

    let gp = new GnuPlot(path)
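The walkthrough does not show how the `path` value is defined. One possible approach, assumed here rather than taken from the original, is to probe a few likely install locations and fall back to the bare command name. The candidate paths below are hypothetical and need adjusting for your machine.

```fsharp
open System.IO

// Hypothetical install locations; adjust for your machine.
let candidates =
  [ @"C:\Program Files\gnuplot\bin\gnuplot.exe"
    "/usr/bin/gnuplot"
    "/usr/local/bin/gnuplot" ]

// The first candidate that exists on disk, if any
let pathOpt = candidates |> Seq.tryFind (fun p -> File.Exists p)

// Otherwise use the bare command name, which relies on the system PATH
let path = defaultArg pathOpt "gnuplot"

printfn "Using gnuplot at: %s" path
```

Keeping the lookup result as an option and resolving it with defaultArg makes the fallback explicit in one line.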
To make the chart nicer, we configure a number of options first. We set the output type to X11 (to show the chart in a window) and choose a font. Then we also set the range of the y axis and the style of the histogram bars (filled). Finally, we add titles for the x axis and specify that they should be rotated (from the bottom to the top):
    gp.Set
      ( output = Output(X11, font = "arial"),
        style = Style(Solid),
        range = RangeY.[ 5e8 .. 75e8 ],
        titles = Titles(x = names, xrotate = -90) )
We want to create a histogram using custom colors, so we'll zip the data with the following list of colors:
    let colors = [ Color.OliveDrab; Color.SteelBlue; Color.Goldenrod ]
Now we have all we need to create the chart. We use gp.Plot to display the plot. As an argument, we give it a collection of series created using Series.Histogram. When creating a series, we specify the lineColor parameter to get the required color (one color for each year):
    gp.Plot
      [ for (y, values), clr in Seq.zip stats colors ->
          Series.Histogram
            ( data = values, title = string y, lineColor = clr) ]
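The pairing of years with colors relies on Seq.zip, which can be tried on its own. In this sketch the `stats` values are made up and plain strings stand in for the Color values, so that it runs without FnuPlot or System.Drawing.

```fsharp
// Hypothetical per-year data; strings stand in for Color values
let stats =
  [ (1990, [1.0; 2.0])
    (2000, [1.5; 2.5])
    (2005, [2.0; 3.0]) ]
let colors = [ "OliveDrab"; "SteelBlue"; "Goldenrod" ]

// Pair each (year, values) tuple with the color at the same position
let labelled =
  [ for (y, values), clr in Seq.zip stats colors ->
      sprintf "%d (%s): %d values" y clr (List.length values) ]

labelled |> List.iter (printfn "%s")
```

Seq.zip stops at the end of the shorter input, so a year without a matching color would be dropped rather than cause an error. Keeping the two lists the same length avoids that.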