Type Inference and Missing Values
This page describes the type inference rules used by the FSharp.Data type providers (CSV, JSON, XML and HTML). Understanding these rules helps you know what F# types to expect for each property, and how to handle missing, null, or optional values at runtime.
Overview
All FSharp.Data type providers infer types from a sample document (or a list of samples) at compile time (design time). The generated F# types reflect the structure of the sample. At runtime, any document with a compatible structure can be read — but the generated types are fixed by the sample.
A key principle: the sample should be representative. If a property is present in the
sample but absent from runtime data, it can raise a KeyNotFoundException. Conversely,
if runtime data contains new properties not in the sample, they are not accessible via the
generated type (though they may still be reachable through the underlying JsonValue,
XElement, etc.).
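As a minimal sketch of the last point, the snippet below reads a key that the sample never mentioned by dropping down to the underlying JsonValue. It assumes an F# script run with dotnet fsi (hence the #r directive); the Person type, the nickname key and the sample documents are hypothetical and chosen only for illustration.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// The sample only contains "name", so Name is the only generated property
type Person = JsonProvider<""" { "name": "Alice" } """>

// The runtime document carries an extra "nickname" key
let person = Person.Parse(""" { "name": "Bob", "nickname": "Bobby" } """)

printfn "Name: %s" person.Name

// "nickname" has no generated accessor, so inspect the raw JsonValue instead
let nickname =
    match person.JsonValue with
    | JsonValue.Record props ->
        props
        |> Array.tryPick (fun (key, value) ->
            match key, value with
            | "nickname", JsonValue.String s -> Some s
            | _ -> None)
    | _ -> None

match nickname with
| Some n -> printfn "Nickname: %s" n
| None -> printfn "No nickname"
```

Pattern matching on the JsonValue cases keeps the lookup explicit about which shapes it accepts; anything other than a string-valued nickname yields None.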
Numeric Type Inference
When inferring numeric types, the providers prefer the most precise type that can represent all values. The preference order (most preferred first) is:
- int: 32-bit signed integer
- int64: 64-bit signed integer
- decimal: exact decimal arithmetic (preferred for financial/monetary values)
- float: 64-bit floating point (used when decimal cannot represent the value, or when missing values appear in a CSV column that would otherwise be decimal)
If values in a column or array mix two types, the provider automatically promotes to the
wider type. For example, a JSON array [1, 2, 3.14] will produce decimal values.
open FSharp.Data
// int is inferred when all values are integers
type IntsOnly = JsonProvider<""" [1, 2, 3] """>
// decimal is inferred when any value has a fractional part
type WithDecimal = JsonProvider<""" [1, 2, 3.14] """>
Boolean Inference (CSV)
In CSV files, columns whose values are exclusively drawn from the set
0, 1, Yes, No, True, False (case-insensitive) are inferred as bool.
Any other values in the column cause it to be treated as a string.
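The rule above can be sketched with a small inline CSV sample. This assumes an F# script run with dotnet fsi; the Flags type and its column names are hypothetical. The type annotations only compile if the columns are inferred as described.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "Active" contains only Yes/No, "Note" contains arbitrary text
type Flags = CsvProvider<"Name,Active,Note\nAlice,Yes,ok\nBob,No,maybe">

let rows = Flags.GetSample().Rows |> Seq.toList

for row in rows do
    // Active is expected to be a bool, Note a plain string
    let active : bool = row.Active
    let note : string = row.Note
    printfn "%s active=%b note=%s" row.Name active note
```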
Date and Time Inference
The providers recognise date and time strings in standard ISO 8601 formats:
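| Inferred Type | When Used | Example Value |
|---|---|---|
| DateTime | Date + time strings (default) | |
| DateTimeOffset | Date + time + timezone offset (always) | "2023-06-15T12:00:00+02:00" |
| DateTimeOffset | Any date + time string when PreferDateTimeOffset=true | |
| DateOnly | Date-only strings when PreferDateOnly=true | "2023-06-15" |
| TimeOnly | Time-only strings when PreferDateOnly=true | |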
By default (PreferDateOnly = false), date-only strings such as "2023-06-15" are
inferred as DateTime for backward compatibility. Set PreferDateOnly = true on
.NET 6 and later to infer them as DateOnly instead.
Set PreferDateTimeOffset = true to infer all date-time values (that would otherwise be
DateTime) as DateTimeOffset instead. Values that already carry an explicit timezone
offset (e.g. "2023-06-15T12:00:00+02:00") are always inferred as DateTimeOffset
regardless of this flag. PreferDateTimeOffset and PreferDateOnly are independent:
DateOnly values stay as DateOnly even when PreferDateTimeOffset=true.
If a column mixes DateOnly and DateTime values, they are unified to DateTime.
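A short sketch of the default behaviour, using only the defaults described above (no PreferDateOnly or PreferDateTimeOffset). It assumes an F# script run with dotnet fsi; the Event type and its property names are hypothetical. The annotations fail to compile if the inferred types differ.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// "created" is a date-only string, "updated" carries an explicit offset
type Event =
    JsonProvider<""" { "created": "2023-06-15",
                       "updated": "2023-06-15T12:00:00+02:00" } """>

let ev = Event.GetSample()

// Default inference: date-only string becomes DateTime
let created : DateTime = ev.Created

// An explicit timezone offset yields DateTimeOffset
let updated : DateTimeOffset = ev.Updated

printfn "Created on day %d of month %d" created.Day created.Month
printfn "Updated with offset %O" updated.Offset
```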
Missing Values and Optionals
This is the most important topic for understanding how the providers behave at runtime. The rules differ slightly across providers.
JSON Provider
In JSON, a property can be absent from an object, or its value can be null (null literal).
Both cases are handled the same way by the JSON type provider:
- If a property is missing in some samples, it is inferred as option<T>.
- If a property has a null value in some samples, it is inferred as option<T>.
This means None represents either a missing key or a null value at runtime.
// 'age' is missing from the second record → inferred as option<int>
type People =
    JsonProvider<"""
      [ { "name":"Alice", "age":30 },
        { "name":"Bob" } ] """>
for person in People.GetSamples() do
    printf "%s" person.Name
    match person.Age with
    | Some age -> printfn " (age %d)" age
    | None -> printfn " (age unknown)"
Important runtime note: If a property is present and non-null in all samples, it will be inferred as a non-optional type. If such a property is then absent or null in runtime data, accessing it will throw a runtime exception. Use multiple samples (or SampleIsList=true) to ensure optional properties are correctly modelled.
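One way to follow this advice is sketched here: the sample is a list of two objects, one without age, so the accessor becomes optional and runtime documents that omit the key or set it to null are read safely. This assumes an F# script run with dotnet fsi; the Person type and the documents are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Each array element is treated as one sample object; "age" is absent in one of them
type Person =
    JsonProvider<""" [ { "name":"Alice", "age":30 },
                       { "name":"Bob" } ] """, SampleIsList=true>

// Runtime documents: one omits "age", one sets it to null, one provides it
let carol = Person.Parse(""" { "name":"Carol" } """)
let dave = Person.Parse(""" { "name":"Dave", "age":null } """)
let erin = Person.Parse(""" { "name":"Erin", "age":41 } """)

for person in [ carol; dave; erin ] do
    match person.Age with
    | Some age -> printfn "%s is %d" person.Name age
    | None -> printfn "%s: age unknown" person.Name
```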
Null values in JSON
A JSON null value that appears as the value of a typed property is treated as None.
A null value in a heterogeneous context (e.g. an array of numbers and nulls) is
represented via the option mechanism on the generated accessor.
CSV Provider
CSV files do not have a native null/missing concept. Instead, certain string values are
treated as missing. By default, the following strings (case-insensitive) are recognised
as missing: NaN, NA, N/A, #N/A, :, -, TBA, TBD (and empty string "").
You can override this list with the MissingValues static parameter.
When a column has at least one missing value, the inferred type changes as follows:
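| Base type | With missing values (default) | With PreferOptionals=true |
|---|---|---|
| int | Nullable<int> | int option |
| int64 | Nullable<int64> | int64 option |
| decimal | float (Double.NaN) | decimal option |
| float | float (Double.NaN) | float option |
| string | string (empty string) | string option |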
The key differences between the default and PreferOptionals=true:
- In the default mode, integers use Nullable<T> and decimals are widened to float with Double.NaN.
- With PreferOptionals=true, all types use T option and you never get Double.NaN or Nullable<T>.
- Strings are never made into string option by default (empty string represents missing); use PreferOptionals=true to get string option.
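The difference between the two modes can be sketched on a decimal column with one missing cell. This assumes an F# script run with dotnet fsi; the sample data and the type names DefaultCsv and OptionCsv are hypothetical. The list annotations only compile if the column types match the rules above.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// "NA" is one of the default missing-value markers
[<Literal>]
let SampleCsv = "Id,Score\n1,4.5\n2,NA"

type DefaultCsv = CsvProvider<SampleCsv>
type OptionCsv = CsvProvider<SampleCsv, PreferOptionals=true>

// Default mode: the decimal column is widened to float, missing reads as NaN
let defaultScores : float list =
    [ for row in DefaultCsv.GetSample().Rows -> row.Score ]

for score in defaultScores do
    if Double.IsNaN score then printfn "missing" else printfn "%g" score

// PreferOptionals=true: the column stays decimal, missing reads as None
let optionScores : decimal option list =
    [ for row in OptionCsv.GetSample().Rows -> row.Score ]

for score in optionScores do
    match score with
    | Some s -> printfn "%M" s
    | None -> printfn "missing"
```

The option form forces every caller to handle the missing case, whereas NaN silently propagates through arithmetic, which is usually the deciding factor between the two modes.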
Design-time safety: If your sample file contains no missing values in a column, but you know
that production data may have missing values, set AssumeMissingValues=true to force the provider
to treat all columns as nullable/optional.
// With AssumeMissingValues=true, all columns become nullable/optional
// even if the sample has no missing values
type SafeCsv = CsvProvider<"A,B\n1,2\n3,4", AssumeMissingValues=true>
// With PreferOptionals=true, all columns use 'option' instead of Nullable or NaN
type OptionalsCsv = CsvProvider<"A,B\n1,2\n3,4", PreferOptionals=true>
XML Provider
In XML, values can be missing at the attribute or element level:
- If an attribute is present in some sample elements but absent in others, it is inferred as option<T>.
- If a child element is present in some samples but not all, it is inferred as optional.
- If an attribute or element is never present in the sample, it cannot be accessed via the generated type at all (use XElement.Attribute(...) dynamically in that case).
// 'born' attribute missing from one author → option<int>
type Authors =
    XmlProvider<"""
        <authors>
            <author name="Karl Popper" born="1902" />
            <author name="Thomas Kuhn" />
        </authors>
    """>
let sample = Authors.GetSample()
for author in sample.Authors do
    printf "%s" author.Name
    match author.Born with
    | Some year -> printfn " (born %d)" year
    | None -> printfn ""
Note: If an attribute or element is absent from all sample data but present at runtime, it cannot be accessed through the generated type. You must include at least one occurrence (possibly with a dummy value) in the sample to have the provider generate an optional property.
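When changing the sample is not an option, the dynamic route can be sketched as follows: the generated element exposes its underlying XElement, and the attribute is looked up by name. This assumes an F# script run with dotnet fsi; the died attribute and the documents are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open System.Xml.Linq
open FSharp.Data

// The sample never mentions a "died" attribute, so no property is generated for it
type Authors =
    XmlProvider<"""
        <authors>
            <author name="Karl Popper" />
            <author name="Thomas Kuhn" />
        </authors>
    """>

let doc =
    Authors.Parse("""<authors><author name="Paul Feyerabend" died="1994" /></authors>""")

// Fall back to the raw XElement; Attribute returns null when the attribute is absent
let diedValues =
    [ for author in doc.Authors ->
        let attr = author.XElement.Attribute(XName.Get "died")
        author.Name, (if isNull attr then None else Some attr.Value) ]

for name, died in diedValues do
    match died with
    | Some year -> printfn "%s (died %s)" name year
    | None -> printfn "%s" name
```

The value comes back as an untyped string, so any conversion to int or a date is the caller's responsibility.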
Heterogeneous Types
Sometimes a property can hold values of different types. The JSON type provider handles this by generating a type with multiple optional accessors — one per observed type.
// Value can be int or string → generates .Number and .String accessors
type HetValues = JsonProvider<""" [{"value":94}, {"value":"hello"}] """>
for item in HetValues.GetSamples() do
    match item.Value.Number, item.Value.String with
    | Some n, _ -> printfn "Number: %d" n
    | _, Some s -> printfn "String: %s" s
    | _ -> ()
Design-Time vs Runtime Behaviour
The type providers perform inference at compile time using the sample document. At runtime, the actual data is parsed against the inferred schema. This has a few important implications:
- Properties that are required at design-time may be missing at runtime. If a property is always present and non-null in your sample, the provider generates a non-optional accessor. If runtime data omits that property, a KeyNotFoundException is thrown when you access it.
- New properties in runtime data are ignored. If runtime JSON has extra keys that are not in the sample, those keys are simply not accessible via the generated type.
- The sample should cover the full range of variability. Include examples of all optional properties and heterogeneous value types in your sample. Use SampleIsList=true for JSON/XML when the root is an array of samples.
- Runtime errors are lazy. The providers do not validate the entire document on load. A missing or mistyped field only causes an error when that specific property is accessed.
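The first and last points can be sketched together: parsing a document that lacks a required property succeeds, and the failure only surfaces when that property is read. This assumes an F# script run with dotnet fsi; the Person type and the documents are hypothetical, and the handler deliberately catches any exception rather than relying on a specific exception type.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// "age" is present and non-null in the sample, so Age is a plain int
type Person = JsonProvider<""" { "name":"Alice", "age":30 } """>

// Parsing succeeds even though "age" is missing from this document
let bob = Person.Parse(""" { "name":"Bob" } """)

// Properties that are present can still be read normally
printfn "Name: %s" bob.Name

// The error appears only when the missing property is accessed
let ageResult =
    try
        Ok bob.Age
    with ex ->
        Error ex.Message

match ageResult with
| Ok age -> printfn "Age: %d" age
| Error message -> printfn "Reading Age failed: %s" message
```

Wrapping individual accesses like this is rarely pleasant at scale; making the property optional through a more representative sample is usually the cleaner fix.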
Summary of Inference-Control Parameters
The following static parameters let you override the default inference behaviour:
| Parameter | Providers | Effect |
|---|---|---|
| PreferOptionals | CSV, JSON, XML | Use T option for missing values instead of Nullable<T> or Double.NaN |
| AssumeMissingValues | CSV | Treat every column as nullable/optional even if the sample has no missing values |
| MissingValues | CSV | Comma-separated list of strings to recognise as missing (replaces defaults) |
| InferRows | CSV | Number of rows to use for type inference (default 1000; 0 = all rows) |
| SampleIsList | JSON, XML | Treat the top-level array as a list of sample objects, not a single sample |
| PreferDateOnly | CSV, JSON, XML | Infer date-only strings as DateOnly (.NET 6 and later) |
| PreferDateTimeOffset | CSV, JSON, XML | Infer all date-time values as DateTimeOffset |
| InferenceMode | JSON, XML | Enable inline schema annotations (typeof<type> syntax) |
| Schema | CSV | Override column names and/or types directly |
For full details on each parameter, see the individual provider documentation: CSV · JSON · XML · HTML