F# Data Toolbox


F# Data Toolbox: SAS dataset type provider

The SAS dataset (sas7bdat) type provider allows exploratory programming with SAS files and provides native access to SAS datasets. No SAS software or OLE DB providers required.

Opening a SAS dataset file

If you are using F# Interactive, you first need to reference the SAS type provider assembly. Assuming you obtain the package from NuGet and the assembly is in packages, this would look as follows:

1: 
2: 
3: 
#I @"packages/FSharp.Data.Toolbox.Sas.0.3/lib/net40"
#r "FSharp.Data.Toolbox.Sas.dll"
open FSharp.Data.Toolbox.Sas

Open SAS dataset by passing file name to SasFileTypeProvider type

The library gives you a parameterized type provider SasFileTypeProvider that takes the SAS data file as an argument:

1: 
2: 
3: 
[<Literal>] 
let sasPath = @"../../tests/FSharp.Data.Toolbox.Sas.Tests/files/acadindx.sas7bdat"
let sasFile = new SasFileTypeProvider<sasPath>()

After openning the dataset, you can call methods to access SAS metadata and the data itself.

Accessing metadata

The following examples show how to access meta-information about SAS dataset.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
let datasetName = sasFile.Header.DataSet.Trim()
let architecture = sasFile.Header.Bits
let rowCount = sasFile.MetaData.RowCount

// Get a list of dataset columns
let cols = sasFile.MetaData.Columns
printfn "Number of columns: %d" (cols |> Seq.length)
printfn "Names of columns: %s"  (cols |> Seq.map (fun col -> col.Name) |> String.concat ", ")

Accessing data in a strongly-typed fashion

Good for exploratory programming. IntelliSense access to column names.

1: 
2: 
3: 
4: 
// read sixth row of data
let row = sasFile.Observations |> Seq.skip 5 |> Seq.head
printfn "Column 'id' value: %A" row.id
printfn "Column 'reading' value: %A" row.reading

The following examples show a couple of calculations that you can write using the standard F# library functions over the data obtained using the type provider:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
// sum first 10 'reading' variable values
sasFile.Observations
|> Seq.take 10
|> Seq.sumBy ( fun obs -> obs.reading )

// calculate mean
let readingMean = 
  sasFile.Observations
  |> Seq.averageBy (fun obs -> obs.reading )

// standard deviation 
let readingStdDev =
  let sum =
    sasFile.Observations
    |> Seq.map (fun obs -> (obs.reading - readingMean) ** 2.0)
    |> Seq.sum
  sqrt (sum / Seq.length sasFile.Observations)

// min
sasFile.Observations
|> Seq.map (fun obs -> obs.reading)
|> Seq.min

// ...and max
sasFile.Observations
|> Seq.map (fun obs -> obs.reading)
|> Seq.max

Accessing data with F# Query Expressions

'query { expression } ' syntax can be used to access SAS dataset

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
// multiply 'reading' by 'writing' and sum
query {
    for obs in sasFile.Observations do
    sumBy (obs.reading * obs.writing)
}

// ..is equivalent to:
sasFile.Observations
|> Seq.map (fun obs -> obs.reading * obs.writing)
|> Seq.sum

You can use other constructs available inside F# query expressions to filter the data or perform aggregations:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
// filter data
query {
    for obs in sasFile.Observations do
    where (obs.female = Number 1. )
    select obs.female
    }

// aggregate 
query {
    for obs in sasFile.Observations do
    where (obs.female <> Number 1. )
    count
    }

query {
    for obs in sasFile.Observations do
    where (obs.female <> Number 1. )
    sumBy obs.writing
    }

The following is a slightly more interesting example which joins data from two data sets:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
// join two datasets
[<Literal>] 
let crimePath = @"../../tests/FSharp.Data.Toolbox.Sas.Tests/files/crime.sas7bdat" 
let crimeFile = new SasFileTypeProvider<crimePath>()

[<Literal>] 
let statesPath = @"../../tests/FSharp.Data.Toolbox.Sas.Tests/files/states.sas7bdat" 
let statesFile = new SasFileTypeProvider<statesPath>()

let trim x = 
  let (Character s) = x 
  s.Trim()

query {
  for crime in crimeFile.Observations do
  join state in statesFile.Observations 
      on (trim crime.State = trim state.State)
  select (crime.murder_rate, state.State)
}

Accessing data in a generic way

Can be used for bulk data processing or converting SAS files to text files.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
let valueToText value = 
  match value with
  | Number n -> n.ToString()
  | Character s -> s.Trim()
  | Time t -> t.ToString("HH:mm:ss")
  | Date d -> d.ToString("yyyy-MM-dd")
  | DateAndTime dt -> dt.ToString("O")
  | Empty -> ""

sasFile.Rows
  |> Seq.take 100
  |> Seq.iter (fun row ->
      let line =
          row
          |> Seq.map valueToText
          |> String.concat "," 
      printfn "%s" line )

Displaying data in a grid

We can display the data in a grid.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
46: 
47: 
48: 
49: 
50: 
51: 
52: 
53: 
54: 
55: 
56: 
57: 
58: 
59: 
60: 
61: 
open System.Windows.Forms

// Create a window with a grid
let frm = 
    new Form(TopMost = true, Visible = true, 
        Text = "F# Type Provider for SAS: " + System.IO.Path.GetFileName sasFile.FileName, 
        Width = 600, Height = 600)
let grid = new DataGridView(Dock = DockStyle.Fill, ReadOnly = true)
let btn = new Button(Text = "Read next page", Dock = DockStyle.Bottom)
let status = new StatusBar(ShowPanels = true, Dock = DockStyle.Bottom)
let pageStatus = new StatusBarPanel(Text = "Page")
let recordStatus = new StatusBarPanel(Text = "Records", Width = 300 )
status.Panels.Add pageStatus
status.Panels.Add recordStatus
frm.Controls.Add grid
frm.Controls.Add btn
frm.Controls.Add status

let pageSize = 100

let read page = 
    sasFile.Observations 
    |> Seq.skip (pageSize*(page - 1))
    |> Seq.truncate pageSize

// Add columns
let columns = sasFile.MetaData.Columns
grid.ColumnCount <- columns.Length
for i = 0 to columns.Length - 1 do
    grid.Columns.[i].HeaderText <- columns.[i].Name

// Display data
let show page =
    let data = read page
    grid.Rows.Clear()
    pageStatus.Text <- sprintf "Page %i" page
    recordStatus.Text <- sprintf "Records %i-%i of %i" 
        <| (page-1)*pageSize + 1 
        <| min (page*pageSize) sasFile.MetaData.RowCount 
        <| sasFile.MetaData.RowCount
    for row in data do
        let values = [| for value in row -> valueToText value |]

        let gridRow = new DataGridViewRow()
        gridRow.CreateCells grid

        for col = 0 to columns.Length - 1 do
            gridRow.Cells.[col].Value <- values.[col]

        grid.Rows.Add gridRow |> ignore
    
let mutable page = 1
btn.Click.Add(fun _ -> 
    if page*pageSize < sasFile.MetaData.RowCount then
        page <- page + 1
    else 
        page <- 1
    show page 
    )

show page

SAS dataset viewer

Multiple items
namespace FSharp

--------------------
namespace Microsoft.FSharp
Multiple items
namespace FSharp.Data

--------------------
namespace Microsoft.FSharp.Data
namespace FSharp.Data.Toolbox
namespace FSharp.Data.Toolbox.Sas
Multiple items
type LiteralAttribute =
  inherit Attribute
  new : unit -> LiteralAttribute

Full name: Microsoft.FSharp.Core.LiteralAttribute

--------------------
new : unit -> LiteralAttribute
val sasPath : string

Full name: SasProvider.sasPath
val sasFile : SasFileTypeProvider<...>

Full name: SasProvider.sasFile
type SasFileTypeProvider

Full name: FSharp.Data.Toolbox.Sas.SasFileTypeProvider
val datasetName : string

Full name: SasProvider.datasetName
property SasFile.Header: Header
Header.DataSet: string
System.String.Trim() : string
System.String.Trim([<System.ParamArray>] trimChars: char []) : string
val architecture : Bits

Full name: SasProvider.architecture
Header.Bits: Bits
val rowCount : int

Full name: SasProvider.rowCount
property SasFile.MetaData: MetaData
MetaData.RowCount: int
val cols : Column array

Full name: SasProvider.cols
MetaData.Columns: Column array
val printfn : format:Printf.TextWriterFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
module Seq

from Microsoft.FSharp.Collections
val length : source:seq<'T> -> int

Full name: Microsoft.FSharp.Collections.Seq.length
val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map
val col : Column
Column.Name: string
module String

from Microsoft.FSharp.Core
val concat : sep:string -> strings:seq<string> -> string

Full name: Microsoft.FSharp.Core.String.concat
val row : SasFileTypeProvider<...>.Observation

Full name: SasProvider.row
property SasFileTypeProvider<...>.Observations: System.Collections.Generic.IEnumerable<SasFileTypeProvider<...>.Observation>
val skip : count:int -> source:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.skip
val head : source:seq<'T> -> 'T

Full name: Microsoft.FSharp.Collections.Seq.head
property SasFileTypeProvider<...>.Observation.id: Value
property SasFileTypeProvider<...>.Observation.reading: Value
val take : count:int -> source:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.take
val sumBy : projection:('T -> 'U) -> source:seq<'T> -> 'U (requires member ( + ) and member get_Zero)

Full name: Microsoft.FSharp.Collections.Seq.sumBy
val obs : SasFileTypeProvider<...>.Observation
val readingMean : Value

Full name: SasProvider.readingMean
val averageBy : projection:('T -> 'U) -> source:seq<'T> -> 'U (requires member ( + ) and member DivideByInt and member get_Zero)

Full name: Microsoft.FSharp.Collections.Seq.averageBy
val readingStdDev : Value

Full name: SasProvider.readingStdDev
val sum : Value
val sum : source:seq<'T> -> 'T (requires member ( + ) and member get_Zero)

Full name: Microsoft.FSharp.Collections.Seq.sum
val sqrt : value:'T -> 'U (requires member Sqrt)

Full name: Microsoft.FSharp.Core.Operators.sqrt
val min : source:seq<'T> -> 'T (requires comparison)

Full name: Microsoft.FSharp.Collections.Seq.min
val max : source:seq<'T> -> 'T (requires comparison)

Full name: Microsoft.FSharp.Collections.Seq.max
val query : Linq.QueryBuilder

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.query
custom operation: sumBy ('Value)

Calls Linq.QueryBuilder.SumBy
property SasFileTypeProvider<...>.Observation.writing: Value
custom operation: where (bool)

Calls Linq.QueryBuilder.Where
property SasFileTypeProvider<...>.Observation.female: Value
union case Value.Number: float -> Value
custom operation: select ('Result)

Calls Linq.QueryBuilder.Select
custom operation: count

Calls Linq.QueryBuilder.Count
val crimePath : string

Full name: SasProvider.crimePath
val crimeFile : SasFileTypeProvider<...>

Full name: SasProvider.crimeFile
val statesPath : string

Full name: SasProvider.statesPath
val statesFile : obj

Full name: SasProvider.statesFile
val trim : x:Value -> string

Full name: SasProvider.trim
val x : Value
union case Value.Character: string -> Value
val s : string
val crime : SasFileTypeProvider<...>.Observation
custom operation: join var in collection on (outerKey = innerKey). Note that parentheses are required after 'on'

Calls Linq.QueryBuilder.Join
val state : obj
val valueToText : value:Value -> string

Full name: SasProvider.valueToText
val value : Value
val n : float
System.Double.ToString() : string
System.Double.ToString(provider: System.IFormatProvider) : string
System.Double.ToString(format: string) : string
System.Double.ToString(format: string, provider: System.IFormatProvider) : string
union case Value.Time: System.DateTime -> Value
val t : System.DateTime
System.DateTime.ToString() : string
System.DateTime.ToString(provider: System.IFormatProvider) : string
System.DateTime.ToString(format: string) : string
System.DateTime.ToString(format: string, provider: System.IFormatProvider) : string
union case Value.Date: System.DateTime -> Value
val d : System.DateTime
union case Value.DateAndTime: System.DateTime -> Value
val dt : System.DateTime
union case Value.Empty: Value
property SasFile.Rows: seq<seq<Value>>
val iter : action:('T -> unit) -> source:seq<'T> -> unit

Full name: Microsoft.FSharp.Collections.Seq.iter
val row : seq<Value>
val line : string
namespace System
namespace System.Windows
namespace System.Windows.Forms
val frm : Form

Full name: SasProvider.frm
Multiple items
type Form =
  inherit ContainerControl
  new : unit -> Form
  member AcceptButton : IButtonControl with get, set
  member Activate : unit -> unit
  member ActiveMdiChild : Form
  member AddOwnedForm : ownedForm:Form -> unit
  member AllowTransparency : bool with get, set
  member AutoScale : bool with get, set
  member AutoScaleBaseSize : Size with get, set
  member AutoScroll : bool with get, set
  member AutoSize : bool with get, set
  ...
  nested type ControlCollection

Full name: System.Windows.Forms.Form

--------------------
Form() : unit
union case ColumnType.Text: ColumnType
namespace System.IO
type Path =
  static val DirectorySeparatorChar : char
  static val AltDirectorySeparatorChar : char
  static val VolumeSeparatorChar : char
  static val InvalidPathChars : char[]
  static val PathSeparator : char
  static member ChangeExtension : path:string * extension:string -> string
  static member Combine : [<ParamArray>] paths:string[] -> string + 3 overloads
  static member GetDirectoryName : path:string -> string
  static member GetExtension : path:string -> string
  static member GetFileName : path:string -> string
  ...

Full name: System.IO.Path
System.IO.Path.GetFileName(path: string) : string
property SasFile.FileName: string
val grid : DataGridView

Full name: SasProvider.grid
Multiple items
type DataGridView =
  inherit Control
  new : unit -> DataGridView
  member AdjustColumnHeaderBorderStyle : dataGridViewAdvancedBorderStyleInput:DataGridViewAdvancedBorderStyle * dataGridViewAdvancedBorderStylePlaceholder:DataGridViewAdvancedBorderStyle * isFirstDisplayedColumn:bool * isLastVisibleColumn:bool -> DataGridViewAdvancedBorderStyle
  member AdjustedTopLeftHeaderBorderStyle : DataGridViewAdvancedBorderStyle
  member AdvancedCellBorderStyle : DataGridViewAdvancedBorderStyle
  member AdvancedColumnHeadersBorderStyle : DataGridViewAdvancedBorderStyle
  member AdvancedRowHeadersBorderStyle : DataGridViewAdvancedBorderStyle
  member AllowUserToAddRows : bool with get, set
  member AllowUserToDeleteRows : bool with get, set
  member AllowUserToOrderColumns : bool with get, set
  member AllowUserToResizeColumns : bool with get, set
  ...
  nested type DataGridViewControlCollection
  nested type HitTestInfo

Full name: System.Windows.Forms.DataGridView

--------------------
DataGridView() : unit
type DockStyle =
  | None = 0
  | Top = 1
  | Bottom = 2
  | Left = 3
  | Right = 4
  | Fill = 5

Full name: System.Windows.Forms.DockStyle
field DockStyle.Fill = 5
val btn : Button

Full name: SasProvider.btn
Multiple items
type Button =
  inherit ButtonBase
  new : unit -> Button
  member AutoSizeMode : AutoSizeMode with get, set
  member DialogResult : DialogResult with get, set
  member NotifyDefault : value:bool -> unit
  member PerformClick : unit -> unit
  member ToString : unit -> string
  event DoubleClick : EventHandler
  event MouseDoubleClick : MouseEventHandler

Full name: System.Windows.Forms.Button

--------------------
Button() : unit
field DockStyle.Bottom = 2
val status : StatusBar

Full name: SasProvider.status
Multiple items
type StatusBar =
  inherit Control
  new : unit -> StatusBar
  member BackColor : Color with get, set
  member BackgroundImage : Image with get, set
  member BackgroundImageLayout : ImageLayout with get, set
  member Dock : DockStyle with get, set
  member Font : Font with get, set
  member ForeColor : Color with get, set
  member ImeMode : ImeMode with get, set
  member Panels : StatusBarPanelCollection
  member ShowPanels : bool with get, set
  ...
  nested type StatusBarPanelCollection

Full name: System.Windows.Forms.StatusBar

--------------------
StatusBar() : unit
val pageStatus : StatusBarPanel

Full name: SasProvider.pageStatus
Multiple items
type StatusBarPanel =
  inherit Component
  new : unit -> StatusBarPanel
  member Alignment : HorizontalAlignment with get, set
  member AutoSize : StatusBarPanelAutoSize with get, set
  member BeginInit : unit -> unit
  member BorderStyle : StatusBarPanelBorderStyle with get, set
  member EndInit : unit -> unit
  member Icon : Icon with get, set
  member MinWidth : int with get, set
  member Name : string with get, set
  member Parent : StatusBar
  ...

Full name: System.Windows.Forms.StatusBarPanel

--------------------
StatusBarPanel() : unit
val recordStatus : StatusBarPanel

Full name: SasProvider.recordStatus
property StatusBar.Panels: StatusBar.StatusBarPanelCollection
StatusBar.StatusBarPanelCollection.Add(value: StatusBarPanel) : int
StatusBar.StatusBarPanelCollection.Add(text: string) : StatusBarPanel
property Control.Controls: Control.ControlCollection
Control.ControlCollection.Add(value: Control) : unit
val pageSize : int

Full name: SasProvider.pageSize
val read : page:int -> seq<SasFileTypeProvider<...>.Observation>

Full name: SasProvider.read
val page : int
val truncate : count:int -> source:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.truncate
val columns : Column array

Full name: SasProvider.columns
property DataGridView.ColumnCount: int
property System.Array.Length: int
val i : int
property DataGridView.Columns: DataGridViewColumnCollection
val show : page:int -> unit

Full name: SasProvider.show
val data : seq<SasFileTypeProvider<...>.Observation>
property DataGridView.Rows: DataGridViewRowCollection
DataGridViewRowCollection.Clear() : unit
property StatusBarPanel.Text: string
val sprintf : format:Printf.StringFormat<'T> -> 'T

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.sprintf
val min : e1:'T -> e2:'T -> 'T (requires comparison)

Full name: Microsoft.FSharp.Core.Operators.min
val row : SasFileTypeProvider<...>.Observation
val values : string []
val gridRow : DataGridViewRow
Multiple items
type DataGridViewRow =
  inherit DataGridViewBand
  new : unit -> DataGridViewRow
  member AccessibilityObject : AccessibleObject
  member AdjustRowHeaderBorderStyle : dataGridViewAdvancedBorderStyleInput:DataGridViewAdvancedBorderStyle * dataGridViewAdvancedBorderStylePlaceholder:DataGridViewAdvancedBorderStyle * singleVerticalBorderAdded:bool * singleHorizontalBorderAdded:bool * isFirstDisplayedRow:bool * isLastVisibleRow:bool -> DataGridViewAdvancedBorderStyle
  member Cells : DataGridViewCellCollection
  member Clone : unit -> obj
  member ContextMenuStrip : ContextMenuStrip with get, set
  member CreateCells : dataGridView:DataGridView -> unit + 1 overload
  member DataBoundItem : obj
  member DefaultCellStyle : DataGridViewCellStyle with get, set
  member Displayed : bool
  ...

Full name: System.Windows.Forms.DataGridViewRow

--------------------
DataGridViewRow() : unit
DataGridViewRow.CreateCells(dataGridView: DataGridView) : unit
DataGridViewRow.CreateCells(dataGridView: DataGridView, [<System.ParamArray>] values: obj []) : unit
val col : int
property DataGridViewRow.Cells: DataGridViewCellCollection
type Value =
  | Number of float
  | DateAndTime of DateTime
  | Date of DateTime
  | Time of DateTime
  | Character of string
  | Empty
  static member DivideByInt : value:Value * n:int -> Value
  static member Pow : value:Value * n:'a0 -> Value (requires member op_Explicit)
  static member Sqrt : value:Value -> Value
  static member Zero : Value
  static member ( + ) : x:'a0 * value:Value -> Value (requires member op_Explicit)
  static member ( + ) : value:Value * x:'a0 -> Value (requires member op_Explicit)
  static member ( + ) : value1:Value * value2:Value -> Value
  static member ( / ) : x:'a0 * value:Value -> Value (requires member op_Explicit)
  static member ( / ) : value:Value * x:'a0 -> Value (requires member op_Explicit)
  static member ( / ) : value1:Value * value2:Value -> Value
  static member ( * ) : x:'a0 * value:Value -> Value (requires member op_Explicit and member op_Explicit)
  static member ( * ) : value:Value * x:'a0 -> Value (requires member op_Explicit and member op_Explicit)
  static member ( * ) : value1:Value * value2:Value -> Value
  static member ( - ) : x:'a0 * value:Value -> Value (requires member op_Explicit)
  static member ( - ) : value:Value * x:'a0 -> Value (requires member op_Explicit and member ( ~- ))
  static member ( - ) : value1:Value * value2:Value -> Value

Full name: FSharp.Data.Toolbox.Sas.Core.Value
DataGridViewRowCollection.Add() : int
DataGridViewRowCollection.Add(count: int) : int
DataGridViewRowCollection.Add(dataGridViewRow: DataGridViewRow) : int
DataGridViewRowCollection.Add([<System.ParamArray>] values: obj []) : int
val ignore : value:'T -> unit

Full name: Microsoft.FSharp.Core.Operators.ignore
val mutable page : int

Full name: SasProvider.page
event Control.Click: IEvent<System.EventHandler,System.EventArgs>
member System.IObservable.Add : callback:('T -> unit) -> unit
Fork me on GitHub