The F# CSV Type Provider is built on top of an efficient CSV parser written in F#. There's also a simple API that can be used to access values dynamically.
When working with well-defined CSV documents, it is easier to use the type provider, but in a more dynamic scenario or when writing quick and simple scripts, the parser might be a simpler option.
To load a sample CSV document, we first need to reference the FSharp.Data.dll library (when using F# Interactive) or add a reference to it in a project.
The FSharp.Data namespace contains the CsvFile type, which provides two static methods for loading data. The Parse method can be used if we have the data in a string value. The Load method reads the data from a file or from a web resource (and there's also an asynchronous AsyncLoad version). The following sample calls Load with the path of a local MSFT.csv file of stock prices; a URL pointing to a live CSV file, such as one on the Yahoo finance web site, can be passed in the same way:
    open FSharp.Data

    // Load the stock prices and cache the rows so they can be iterated more than once
    let msft = CsvFile.Load(__SOURCE_DIRECTORY__ + "/../data/MSFT.csv").Cache()

    // Print the high, low and date columns of each row
    for row in msft.Rows do
        printfn "HLOC: (%s, %s, %s)" (row.GetColumn "High") (row.GetColumn "Low") (row.GetColumn "Date")
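For data that is already in memory, a minimal sketch of the Parse route might look like the following. The inline CSV text is made up for illustration, and the NuGet reference directive assumes a recent F# Interactive (dotnet fsi) with package support; neither comes from the original sample.

```fsharp
// Assumption: F# Interactive with NuGet support (dotnet fsi)
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical inline data, invented for this sketch
let csvText =
    "Date,Open,Close\n2012-01-27,29.45,29.23\n2012-01-26,29.61,29.50"

// Parse reads CSV content directly from a string
let prices = CsvFile.Parse(csvText)

// Headers is an option because a file may have no header row
let columnNames =
    match prices.Headers with
    | Some headers -> List.ofArray headers
    | None -> []

// Walk Rows a single time and keep only the values we need
let closes = [ for row in prices.Rows -> row.GetColumn "Close" ]

printfn "Columns: %A" columnNames
printfn "Closing prices: %A" closes
```

Collecting the values into a list during the first pass avoids having to enumerate Rows again later.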
Note that unlike CsvProvider, CsvFile works in streaming mode for performance reasons, which means that Rows can only be iterated once. If you need to iterate multiple times, use the Cache method, but note that this increases memory usage and should not be used with large datasets.
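As a rough illustration of the caching behaviour described above, the sketch below iterates a cached file twice. The inline data is hypothetical, and the NuGet reference directive assumes dotnet fsi.

```fsharp
// Assumption: F# Interactive with NuGet support (dotnet fsi)
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical inline data, invented for this sketch
let csvText = "Date,Close\n2012-01-27,29.23\n2012-01-26,29.50"

// Cache() returns a file whose rows are kept in memory,
// so Rows can be enumerated more than once
let cached = CsvFile.Parse(csvText).Cache()

// First pass: count the rows
let rowCount = Seq.length cached.Rows

// Second pass: read a column from every row
let dates = [ for row in cached.Rows -> row.GetColumn "Date" ]

printfn "%d rows, dates: %A" rowCount dates
```

For small inputs like this one the extra memory is negligible; for large files a single streaming pass is the safer choice.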
Now we look at a number of extensions that become available after opening the FSharp.Data.CsvExtensions namespace. Once opened, we can write:
- row?column uses the dynamic operator to obtain the value of the column named column; alternatively, you can use the indexer row.[column].
- value.AsBoolean() returns the value as a boolean if it is either true or false.
- value.AsInteger() returns the value as an integer if it is numeric and can be converted to an integer; value.AsInteger64(), value.AsDecimal() and value.AsFloat() behave similarly.
- value.AsDateTime() returns the value as a DateTime value using either the ISO 8601 format, or the \/Date(...)\/ JSON format containing the number of milliseconds since 1/1/1970.
- value.AsDateTimeOffset() parses the string as a DateTimeOffset value using either the ISO 8601 format, or the \/Date(...[+/-]offset)\/ JSON format containing the number of milliseconds since 1/1/1970, followed by [+/-] and the 4-digit offset, for example \/Date(1231456+1000)\/.
- value.AsTimeSpan() parses the string as a TimeSpan value.
- value.AsGuid() returns the value as a Guid.
Methods that may need to parse a numeric value or date (such as AsFloat and AsDateTime) accept an optional culture parameter.
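The conversions listed above can be tried on plain strings, since a column value is just a string. The following is a small sketch under a few assumptions: the sample values are invented, the NuGet directive assumes dotnet fsi, and the culture example assumes the runtime ships de-DE culture data (it may not when .NET runs in invariant globalization mode).

```fsharp
// Assumption: F# Interactive with NuGet support (dotnet fsi)
#r "nuget: FSharp.Data"
open System
open System.Globalization
open FSharp.Data
open FSharp.Data.CsvExtensions

// Conversions that use the default (invariant) culture
let flag = "true".AsBoolean()
let count = "42".AsInteger()
let day = "2012-01-27".AsDateTime()
let span = "01:30:00".AsTimeSpan()
let guid = "6f9619ff-8b86-d011-b42d-00c04fc964ff".AsGuid()

// Passing a culture: German formatting uses a comma as the decimal separator
// (assumes de-DE culture data is available on this machine)
let german = CultureInfo.GetCultureInfo "de-DE"
let high = "29,5".AsFloat(german)

printfn "%b %d %O %O %O %f" flag count day span guid high
```

These conversions are unsafe in the sense that a value which cannot be parsed raises an exception, so they fit quick scripts better than validation of untrusted input.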
The following example shows how to process the previous CSV sample using these extensions:
    open FSharp.Data.CsvExtensions

    for row in msft.Rows do
        printfn "HLOC: (%f, %M, %O)" (row.["High"].AsFloat()) (row?Low.AsDecimal()) (row?Date.AsDateTime())
In addition to reading, CsvFile also supports transforming CSV files. The operations available are Filter, Take, TakeWhile, Skip, SkipWhile, and Truncate. After transforming, you can save the results by using one of the overloads of the Save method. You can choose different separator and quote characters when saving.
    // Save the first 10 stock prices where the closing price is higher than the opening price, in TSV format
    msft.Filter(fun row -> row?Close.AsFloat() > row?Open.AsFloat())
        .Truncate(10)
        .SaveToString('\t')
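To see the transformation pipeline end to end without the MSFT.csv file, here is a self-contained sketch on three hypothetical rows. The data and the NuGet directive (which assumes dotnet fsi) are assumptions added for illustration.

```fsharp
// Assumption: F# Interactive with NuGet support (dotnet fsi)
#r "nuget: FSharp.Data"
open FSharp.Data
open FSharp.Data.CsvExtensions

// Hypothetical inline data: one losing day followed by two winning days
let csvText =
    "Date,Open,Close\n2012-01-27,29.45,29.23\n2012-01-26,29.61,29.70\n2012-01-25,29.07,29.56"

// Keep rows that closed above the open, then keep at most one of them
let gains =
    CsvFile.Parse(csvText)
        .Filter(fun row -> row?Close.AsFloat() > row?Open.AsFloat())
        .Truncate(1)

// Serialize with a tab separator instead of the comma used in the input
let tsv = gains.SaveToString('\t')

printfn "%s" tsv
```

Because the transformations are applied to a streaming file, the result is consumed once here, by SaveToString.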