Header menu logo FSharp.Data

BinderScriptNotebook

CSV Parser

The F# CSV Type Provider is built on top of an efficient CSV parser written in F#. There's also a simple API that can be used to access values dynamically.

When working with well-defined CSV documents, it is easier to use the type provider, but in a more dynamic scenario or when writing quick and simple scripts, the parser might be a simpler option.

Loading CSV documents

To load a sample CSV document, we first need to reference the FSharp.Data package.

open FSharp.Data

The FSharp.Data namespace contains the CsvFile type that provides two static methods for loading data. The CsvFile.Parse method can be used if we have the data in a string value. The CsvFile.Load method allows reading the data from a file or from a web resource (and there's also an asynchronous CsvFile.AsyncLoad version). The following sample calls CsvFile.Load with a URL that points to a live CSV file on the Yahoo finance web site:

// Download the stock prices
let msft =
    CsvFile
        .Load(__SOURCE_DIRECTORY__ + "/../data/MSFT.csv")
        .Cache()

// Print the prices in the HLOC format
for row in msft.Rows |> Seq.truncate 10 do
    printfn "HLOC: (%s, %s, %s)" (row.GetColumn "High") (row.GetColumn "Low") (row.GetColumn "Date")
HLOC: (76.55, 75.86, 9-Oct-17)
HLOC: (76.03, 75.54, 6-Oct-17)
HLOC: (76.12, 74.96, 5-Oct-17)
HLOC: (74.72, 73.71, 4-Oct-17)
HLOC: (74.88, 74.20, 3-Oct-17)
HLOC: (75.01, 74.30, 2-Oct-17)
HLOC: (74.54, 73.88, 29-Sep-17)
HLOC: (73.97, 73.31, 28-Sep-17)
HLOC: (74.17, 73.17, 27-Sep-17)
HLOC: (73.81, 72.99, 26-Sep-17)
val msft: Runtime.CsvFile<CsvRow>
val it: unit = ()

Note that unlike CsvProvider, CsvFile works in streaming mode for performance reasons, which means that CsvFile.Rows can only be iterated once. If you need to iterate multiple times, use the CsvFile.Cache method, but please note that this will increase memory usage and should not be used in large datasets.

Using CSV extensions

Now we look at a number of extensions that become available after opening the CsvExtensions namespace. Once opened, we can write:

Methods that may need to parse a numeric value or date (such as AsFloat and AsDateTime) receive an optional culture parameter.

The following example shows how to process the sample previous CSV sample using these extensions:

open FSharp.Data.CsvExtensions

for row in msft.Rows |> Seq.truncate 10 do
    printfn "HLOC: (%f, %M, %O)" (row.["High"].AsFloat()) (row?Low.AsDecimal()) (row?Date.AsDateTime())
HLOC: (76.550000, 75.86, 10/9/2017 12:00:00 AM)
HLOC: (76.030000, 75.54, 10/6/2017 12:00:00 AM)
HLOC: (76.120000, 74.96, 10/5/2017 12:00:00 AM)
HLOC: (74.720000, 73.71, 10/4/2017 12:00:00 AM)
HLOC: (74.880000, 74.20, 10/3/2017 12:00:00 AM)
HLOC: (75.010000, 74.30, 10/2/2017 12:00:00 AM)
HLOC: (74.540000, 73.88, 9/29/2017 12:00:00 AM)
HLOC: (73.970000, 73.31, 9/28/2017 12:00:00 AM)
HLOC: (74.170000, 73.17, 9/27/2017 12:00:00 AM)
HLOC: (73.810000, 72.99, 9/26/2017 12:00:00 AM)
val it: unit = ()

Transforming CSV files

In addition to reading, CsvFile also has support for transforming CSV files. The operations available are CsvFile.Filter, Take, TakeWhile, Skip, SkipWhile, and Truncate. After transforming you can save the results by using one of the overloads of the Save method. You can choose different separator and quote characters when saving.

// Saving the first 10 stock prices where the closing price is higher than the opening price in TSV format:
msft
    .Filter(fun row -> row?Close.AsFloat() > row?Open.AsFloat())
    .Truncate(10)
    .SaveToString('\t')
val it: string =
  "Date	Open	High	Low	Close	Volume
9-Oct-17	75.97	76.55	75.86	76.29	11386502
6-Oct-17	75.67	76.03	75.54	76.00	13959814
5-Oct-17	75.22	76.12	74.96	75.97	21195261
4-Oct-17	74.09	74.72	73.71	74.69	13317681
29-Sep-17	73.94	74.54	73.88	74.49	17079114
28-Sep-17	73.54	73.97	73.31	73.87	10883787
27-Sep-17	73.55	74.17	73.17	73.85	19375099
22-Sep-17	73.99	74.51	73.85	74.41	14111365
19-Sep-17	75.21	75.71	75.01	75.44	16093344
15-Sep-17	74.83	75.39	74.07	75.31	38578441
"

Related articles

Multiple items
namespace FSharp

--------------------
namespace Microsoft.FSharp
Multiple items
namespace FSharp.Data

--------------------
namespace Microsoft.FSharp.Data
val msft: Runtime.CsvFile<CsvRow>
type CsvFile = inherit CsvFile<CsvRow> member GetColumnIndex: columnName: string -> int member TryGetColumnIndex: columnName: string -> int option static member AsyncLoad: uri: string * [<Optional>] ?separators: string * [<Optional>] ?quote: char * [<Optional>] ?hasHeaders: bool * [<Optional>] ?ignoreErrors: bool * [<Optional>] ?skipRows: int * [<Optional>] ?encoding: Encoding -> Async<CsvFile> static member Load: stream: Stream * [<Optional>] ?separators: string * [<Optional>] ?quote: char * [<Optional>] ?hasHeaders: bool * [<Optional>] ?ignoreErrors: bool * [<Optional>] ?skipRows: int -> CsvFile + 2 overloads static member Parse: text: string * [<Optional>] ?separators: string * [<Optional>] ?quote: char * [<Optional>] ?hasHeaders: bool * [<Optional>] ?ignoreErrors: bool * [<Optional>] ?skipRows: int -> CsvFile
<summary> Represents a CSV file. The lines are read on demand from <c>reader</c>. Columns are delimited by one of the chars passed by <c>separators</c> (defaults to just <c>,</c>), and to escape the separator chars, the <c>quote</c> character will be used (defaults to <c>"</c>). If <c>hasHeaders</c> is true (the default), the first line read by <c>reader</c> will not be considered part of data. If <c>ignoreErrors</c> is true (the default is false), rows with a different number of columns from the header row (or the first row if headers are not present) will be ignored. The first <c>skipRows</c> lines will be skipped. </summary>
val row: CsvRow
property Runtime.CsvFile.Rows: CsvRow seq with get
<summary> The rows with data </summary>
module Seq from Microsoft.FSharp.Collections
val truncate: count: int -> source: 'T seq -> 'T seq
val printfn: format: Printf.TextWriterFormat<'T> -> 'T
member CsvRow.GetColumn: columnName: string -> string
member CsvRow.GetColumn: index: int -> string
module CsvExtensions from FSharp.Data
<summary> Provides the dynamic operator for getting column values by name from CSV rows </summary>

Type something to start searching.