XML Type Provider

This article demonstrates how to use the XML Type Provider to access XML documents in a statically typed way. We first look at how the structure is inferred and then demonstrate the provider by parsing an RSS feed.

The XML Type Provider provides statically typed access to XML documents. It takes a sample document as input (or a document containing a root XML node with multiple child nodes that are used as samples). The generated type can then be used to read files with the same structure.

If the loaded file does not match the structure of the sample, a runtime error may occur, but only when you explicitly access an element that is incompatible with the original sample (for example, one that is no longer present).

Starting from version 3.0.0 there is also the option of using a schema (XSD) instead of relying on samples.

Introducing the provider

The type provider is located in the FSharp.Data.dll assembly. Assuming the assembly is located in the ../../bin directory, we can load it in F# Interactive as follows (note that we also need a reference to System.Xml.Linq, because the provider uses the XDocument type internally):

#r "../../bin/FSharp.Data.dll"
#r "System.Xml.Linq.dll"

open FSharp.Data

Inferring type from sample

The XmlProvider<...> takes a static parameter of type string as its first argument. The parameter can be either a sample XML string or a sample file (relative to the current folder, or online and accessible via http or https). Because an XML document looks quite different from a file name or a URL, this dual use is unlikely to lead to ambiguities.

The following sample generates a type that can read simple XML documents with a root node containing two attributes:

type Author = XmlProvider<"""<author name="Paul Feyerabend" born="1924" />""">
let sample = Author.Parse("""<author name="Karl Popper" born="1902" />""")

printfn "%s (%d)" sample.Name sample.Born

The type provider generates a type Author that has properties corresponding to the attributes of the root element of the XML document. The types of the properties are inferred based on the values in the sample document. In this case, the Name property has type string and Born has type int.

XML is quite a flexible format, so we could represent the same document differently. Instead of using attributes, we could use nested nodes (<name> and <born> nested under <author>) that directly contain the values:

type AuthorAlt = XmlProvider<"<author><name>Karl Popper</name><born>1902</born></author>">
let doc = "<author><name>Paul Feyerabend</name><born>1924</born></author>"
let sampleAlt = AuthorAlt.Parse(doc)

printfn "%s (%d)" sampleAlt.Name sampleAlt.Born
Paul Feyerabend (1924)
type AuthorAlt = XmlProvider<...>
val doc: string =
  "<author><name>Paul Feyerabend</name><born>1924</born></author>"
val sampleAlt: XmlProvider<...>.Author =
  <author>
  <name>Paul Feyerabend</name>
  <born>1924</born>
</author>
val it: unit = ()

The generated type provides exactly the same API for reading documents following this convention. (Note that you cannot use AuthorAlt to parse samples that use the first style: the implementations of the types differ, they just provide the same public API.)

The provider turns a node into a simply typed property only when the node contains just a primitive value and has no children or attributes.

Types for more complex structure

Now let's look at a number of examples that have more interesting structure. First of all, what if a node contains some value, but also has some attributes?

type Detailed = XmlProvider<"""<author><name full="true">Karl Popper</name></author>""">

let info =
    Detailed.Parse("""<author><name full="false">Thomas Kuhn</name></author>""")

printfn "%s (full=%b)" info.Name.Value info.Name.Full
Thomas Kuhn (full=false)
type Detailed = XmlProvider<...>
val info: XmlProvider<...>.Author =
  <author>
  <name full="false">Thomas Kuhn</name>
</author>
val it: unit = ()

If the node cannot be represented as a simple type (like string) then the provider builds a new type with multiple properties. Here, it generates a property Full (based on the name of the attribute) and infers its type to be boolean. Then it adds a property with a (special) name Value that returns the content of the element.

Types for multiple simple elements

Another interesting case is when there are multiple nodes that contain just a primitive value. The following example shows what happens when the root node contains multiple <value> nodes (note that we call GetSample instead of Parse here, so the same text that was used for inference is also used as the runtime value).

type Test = XmlProvider<"<root><value>1</value><value>3</value></root>">

for v in Test.GetSample().Values do
    printfn "%d" v

The type provider generates a property Values that returns an array with the values - as the <value> nodes do not contain any attributes or children, they are turned into int values and so the Values property returns just int[]!
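
As a small illustration of the array-typed property, the following sketch parses a document with a different number of <value> nodes than the sample and sums them. The type name Numbers and the input document are made up for this example, and the #r "nuget: FSharp.Data" line assumes a recent dotnet fsi that can load NuGet packages; adjust the reference to match your setup.

```fsharp
#r "nuget: FSharp.Data"
#r "System.Xml.Linq.dll"

open FSharp.Data

// Same sample shape as Test above; Numbers is a hypothetical name.
type Numbers = XmlProvider<"<root><value>1</value><value>3</value></root>">

// The runtime document may contain any number of <value> nodes.
let parsed =
    Numbers.Parse("<root><value>10</value><value>20</value><value>30</value></root>")

// Values is a plain int[], so the usual Array functions apply.
let total = Array.sum parsed.Values
printfn "%d values, total = %d" parsed.Values.Length total
```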

Type inference hints / inline schemas

Starting with version 4.2.10 of this package, it's possible to enable basic type annotations directly in the sample used by the provider, to complete or to override type inference. (Only basic types are supported. See the reference documentation of the provider for the full list.)

This feature is disabled by default and has to be explicitly enabled with the InferenceMode static parameter.

Let's consider an example where this can be useful:

type AmbiguousEntity =
    XmlProvider<Sample="""
        <Entity Code="000" Length="0"/>
        <Entity Code="123" Length="42"/>
        <Entity Code="4E5" Length="1.83"/>
        """, SampleIsList=true>

let code = (AmbiguousEntity.GetSamples()[1]).Code
let length = (AmbiguousEntity.GetSamples()[1]).Length
type AmbiguousEntity = XmlProvider<...>
val code: float = 123.0
val length: decimal = 42M

In the previous example, Code is inferred as a float, even though it looks more like it should be a string. (4E5 is interpreted as an exponential float notation instead of a string)

Now let's enable inline schemas:

open FSharp.Data.Runtime.StructuralInference

type AmbiguousEntity2 =
    XmlProvider<Sample="""
        <Entity Code="typeof{string}" Length="typeof{float{metre}}"/>
        <Entity Code="123" Length="42"/>
        <Entity Code="4E5" Length="1.83"/>
        """, SampleIsList=true, InferenceMode=InferenceMode.ValuesAndInlineSchemasOverrides>

let code2 = (AmbiguousEntity2.GetSamples()[1]).Code
let length2 = (AmbiguousEntity2.GetSamples()[1]).Length
type AmbiguousEntity2 = XmlProvider<...>
val code2: string = "123"
val length2: float<UnitSystems.SI.UnitNames.metre> = 42.0

With the ValuesAndInlineSchemasOverrides inference mode, the typeof{string} inline schema takes priority over the type inferred from other values. Code is now a string, as we wanted it to be!

Note that an alternative way to obtain the same result would have been to replace all the Code values in the samples with unambiguous string values. (But this can be very cumbersome, especially with big samples.)

If we had used the ValuesAndInlineSchemasHints inference mode instead, our inline schema would have had the same precedence as the types inferred from other values, and Code would have been inferred as a choice between either a number or a string, exactly as if we had added another sample with an unambiguous string value for Code.

Units of measure

Inline schemas also enable support for units of measure.

In the previous example, the Length property is now inferred as a float with the metre unit of measure (from the default SI units).

Warning: units of measure are discarded when merged with types without a unit or with a different unit. As mentioned previously, with the ValuesAndInlineSchemasHints inference mode, inline schema types are merged with other inferred types with the same precedence. Since types inferred from values never have units, types inferred from inline schemas will lose their unit if the sample contains other values.

Processing philosophers

In this section we look at an example that demonstrates how the type provider works on a simple document that lists authors that write about a specific topic. The sample document data/Writers.xml looks as follows:

<authors topic="Philosophy of Science">
  <author name="Paul Feyerabend" born="1924" />
  <author name="Thomas Kuhn" />
</authors>

At runtime, we use the generated type to parse the following string (which has the same structure as the sample document with the exception that one of the author nodes also contains a died attribute):

let authors =
    """
  <authors topic="Philosophy of Mathematics">
    <author name="Bertrand Russell" />
    <author name="Ludwig Wittgenstein" born="1889" />
    <author name="Alfred North Whitehead" died="1947" />
  </authors> """

When initializing the XmlProvider, we can pass it a file name or a web URL. The Load and AsyncLoad methods allow reading the data from a file or from a web resource. The Parse method takes the data as a string, so we can now print the information as follows:

[<Literal>]
let ResolutionFolder = __SOURCE_DIRECTORY__

type Authors = XmlProvider<"../data/Writers.xml", ResolutionFolder=ResolutionFolder>
let topic = Authors.Parse(authors)

printfn "%s" topic.Topic

for author in topic.Authors do
    printf " - %s" author.Name
    author.Born |> Option.iter (printf " (%d)")
    printfn ""
Philosophy of Mathematics
 - Bertrand Russell
 - Ludwig Wittgenstein (1889)
 - Alfred North Whitehead
[<Literal>]
val ResolutionFolder: string = "D:\a\FSharp.Data\FSharp.Data\docs\library"
type Authors = XmlProvider<...>
val topic: XmlProvider<...>.Authors =
  <authors topic="Philosophy of Mathematics">
    <author name="Bertrand Russell" />
    <author name="Ludwig Wittgenstein" born="1889" />
    <author name="Alfred North Whitehead" died="1947" />
  </authors>
val it: unit = ()

The value topic has a property Topic (of type string) which returns the value of the attribute with the same name. It also has a property Authors that returns an array with all the authors. The Born property is missing for some authors, so it becomes option<int> and we need to print it using Option.iter.

The died attribute was not present in the sample used for the inference, so we cannot obtain it in a statically typed way (although it can still be obtained dynamically using author.XElement.Attribute(XName.Get("died"))).
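
The dynamic lookup mentioned above can be wrapped in a small helper that returns an option instead of a possibly null XAttribute. This is only a sketch: tryAttribute is a hypothetical helper name, and the stand-alone XElement below is used so that the snippet runs without the provider.

```fsharp
#r "System.Xml.Linq.dll"

open System.Xml.Linq

/// Returns the value of the named attribute, or None when it is absent.
let tryAttribute (name: string) (element: XElement) : string option =
    element.Attribute(XName.Get(name))
    |> Option.ofObj
    |> Option.map (fun attr -> attr.Value)

// Stand-alone check on a plain XElement; with the provider you would
// pass author.XElement instead.
let whitehead =
    XElement.Parse("""<author name="Alfred North Whitehead" died="1947" />""")

match tryAttribute "died" whitehead with
| Some year -> printfn "died in %s" year
| None -> printfn "no died attribute"
```

With the types from this section, the helper could be applied as topic.Authors |> Array.map (fun a -> a.Name, tryAttribute "died" a.XElement). The value stays a string because no type was inferred for the attribute.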

Global inference mode

In the examples shown earlier, an element was never (recursively) contained in an element of the same name (for example <author> never contained another <author>). However, when we work with documents such as XHTML files, this can often be the case. Consider for example, the following sample (a simplified version of data/HtmlBody.xml):

<div id="root">
  <span>Main text</span>
  <div id="first">
    <div>Second text</div>
  </div>
</div>

Here, a <div> element can contain other <div> elements and it is quite clear that they should all have the same type - we want to be able to write a recursive function that processes <div> elements. To make this possible, you need to set an optional parameter Global to true:

type Html = XmlProvider<"../data/HtmlBody.xml", Global=true, ResolutionFolder=ResolutionFolder>
let html = Html.GetSample()

When the Global parameter is true, the type provider unifies all elements of the same name. This means that all <div> elements have the same type (with a union of all attributes and all possible children nodes that appear in the sample document).

The type is located under a type Html, so we can write a printDiv function that takes Html.Div and acts as follows:

/// Prints the content of a <div> element
let rec printDiv (div: Html.Div) =
    div.Spans |> Seq.iter (printfn "%s")
    div.Divs |> Seq.iter printDiv

    if div.Spans.Length = 0 && div.Divs.Length = 0 then
        div.Value |> Option.iter (printfn "%s")

// Print the root <div> element with all children
printDiv html
Main text
First text
Another text
Second text
val printDiv: div: XmlProvider<...>.Div -> unit
val it: unit = ()

The function first prints all text included as <span> (the element never has any attributes in our sample, so it is inferred as string), then it recursively prints the content of all <div> elements. If the element does not contain nested elements, then we print the Value (inner text).

Loading Directly from a File or URL

In many cases we might want to define the schema using a local sample file, but then directly load the data from disk or from a URL either synchronously (with Load) or asynchronously (with AsyncLoad).

For this example I am using the US Census data set from https://api.census.gov/data.xml, a sample of which I have used here for ../data/Census.xml. This sample is greatly reduced from the live data, so that it contains only the elements and attributes relevant to us:

<census-api
    xmlns="http://thedataweb.rm.census.gov/api/discovery/"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
    <dct:dataset>
        <dct:title>2006-2010 American Community Survey 5-Year Estimates</dct:title>
        <dcat:distribution
            dcat:accessURL="https://api.census.gov/data/2010/acs5">
        </dcat:distribution>
    </dct:dataset>
    <dct:dataset>
        <dct:title>2006-2010 American Community Survey 5-Year Estimates</dct:title>
        <dcat:distribution
            dcat:accessURL="https://api.census.gov/data/2010/acs5">
        </dcat:distribution>
    </dct:dataset>
</census-api>

When doing this for your scenario, be careful to ensure that enough data is given for the provider to infer the schema correctly. For example, the first level <dct:dataset> element must be included at least twice for the provider to infer the Datasets array rather than a single Dataset object.
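
The effect of repeating an element in the sample can be seen on a much smaller document. The following sketch uses two made-up samples (OneScore and ManyScores are hypothetical names, and the NuGet reference assumes a recent dotnet fsi); the type annotations make the compiler confirm the inferred shapes.

```fsharp
#r "nuget: FSharp.Data"
#r "System.Xml.Linq.dll"

open FSharp.Data

// One <score> in the sample: the provider exposes a single value.
type OneScore = XmlProvider<"<root><score>7</score></root>">

// Two <score> elements in the sample: the provider exposes an array.
type ManyScores = XmlProvider<"<root><score>7</score><score>9</score></root>">

// These annotations only compile if the inferred types are as described.
let single: int = OneScore.GetSample().Score
let several: int[] = ManyScores.GetSample().Scores

printfn "single = %d, several = %A" single several
```

The Census sample above follows the same rule, which is why it repeats the <dct:dataset> element.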

type Census = XmlProvider<"../data/Census.xml", ResolutionFolder=ResolutionFolder>

let data = Census.Load("https://api.census.gov/data.xml")

let apiLinks =
    data.Datasets
    |> Array.map (fun ds -> ds.Title, ds.Distribution.AccessUrl)
    |> Array.truncate 10
type Census = XmlProvider<...>
val data: XmlProvider<...>.CensusApi =
  <census-api xmlns="http://thedataweb.rm.census.gov/api/discovery/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:dct="http://purl.org/dc/terms/" xmlns:pod="https://project-open-data.cio.gov/v1.1/schema/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:org="http://www.w3.org/ns/org#" xmlns:vcard="http://www.w3.org/2006/vcard/ns#">
    <dct:dataset vintage="1994" geographyLink="http://api.census.gov/data/1994/cps/basic/jun/geography.xml" variablesLink="http://api.census.gov/data/1994/cps/basic/jun/variables...
val apiLinks: (string * string) array =
  [|("Jun 1994 Current Population Survey: Basic Monthly",
     "http://api.census.gov/data/1994/cps/basic/jun");
    ("1986 County Business Patterns: Business Patterns",
     "http://api.census.gov/data/1986/cbp");
    ("1994 County Business Patterns - Zip Code Business Patterns: T"+[17 chars],
     "http://api.census.gov/data/1994/zbp");
    ("1987 County Business Patterns: Business Patterns",
     "http://api.census.gov/data/1987/cbp");
    ("1995 County Business Patterns: Business Patterns",
     "http://api.census.gov/data/1995/cbp");
    ("1988 County Business Patterns: Business Patterns",
     "http://api.census.gov/data/1988/cbp");
    ("1989 County Business Patterns: Business Patterns",
     "http://api.census.gov/data/1989/cbp");
    ("1995 County Business Patterns - Zip Code Business Patterns: T"+[17 chars],
     "http://api.census.gov/data/1995/zbp");
    ("Mar 1994 Current Population Survey: Basic Monthly",
     "http://api.census.gov/data/1994/cps/basic/mar");
    ("1990 County Business Patterns: Business Patterns",
     "http://api.census.gov/data/1990/cbp")|]

The US Census data is an interesting dataset: this top-level API returns hundreds of other datasets, each with its own API. Here we use the Census data to get a list of titles and URLs for the lower-level APIs.

Bringing in Some Async Action

Let's go one step further and assume here a slightly contrived but certainly plausible example where we cache the Census URLs and refresh once in a while. Perhaps we want to load this in the background and then post each link over (for example) a message queue.

This is where AsyncLoad comes into play:

let enqueue (title, apiUrl) =
    // do the real message enqueueing here instead of just printing
    printfn "%s -> %s" title apiUrl

// helper task which gets scheduled on some background thread somewhere...
let cacheJanitor () =
    async {
        let! reloadData = Census.AsyncLoad("https://api.census.gov/data.xml")

        reloadData.Datasets
        |> Array.map (fun ds -> ds.Title, ds.Distribution.AccessUrl)
        |> Array.iter enqueue
    }
val enqueue: title: string * apiUrl: string -> unit
val cacheJanitor: unit -> Async<unit>
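
One possible way to run such a helper "once in a while" is a recursive async loop started with a cancellation token. The sketch below is not part of FSharp.Data: runEvery and demoJanitor are hypothetical names, and demoJanitor stands in for cacheJanitor so that the snippet runs without network access.

```fsharp
open System
open System.Threading

/// Runs `job`, waits for `interval`, and repeats until the workflow is cancelled.
let rec runEvery (interval: TimeSpan) (job: unit -> Async<unit>) : Async<unit> =
    async {
        do! job ()
        do! Async.Sleep(int interval.TotalMilliseconds)
        return! runEvery interval job
    }

// Hypothetical stand-in for cacheJanitor that only counts its runs.
let runCount = ref 0

let demoJanitor () =
    async { runCount.Value <- runCount.Value + 1 }

// Start the loop in the background and keep a handle for stopping it.
let cancellation = new CancellationTokenSource()
Async.Start(runEvery (TimeSpan.FromMilliseconds 50.0) demoJanitor, cancellation.Token)

Thread.Sleep(500)       // let a few refreshes happen
cancellation.Cancel()   // stop the background loop
printfn "janitor ran %d time(s)" runCount.Value
```

In the scenario described above you would pass cacheJanitor and a much longer interval, for example runEvery (TimeSpan.FromHours 1.0) cacheJanitor.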

Reading RSS feeds

To conclude this introduction with a more interesting example, let's look at how to parse an RSS feed. As discussed earlier, we can use relative paths or web addresses when calling the type provider:

type Rss = XmlProvider<"https://tomasp.net/rss.xml">

This code builds a type Rss that represents RSS feeds (with the features that are used on https://tomasp.net). The type Rss provides static methods Parse, Load and AsyncLoad to construct it - here, we just want to reuse the same URI of the schema, so we use the GetSample static method:

let blog = Rss.GetSample()

Printing the title of the RSS feed together with a list of recent posts is now quite easy - you can simply type blog followed by . and see what the autocompletion offers. The code looks like this:

// Title is a property returning string
printfn "%s" blog.Channel.Title

// Get all item nodes and print title with link
for item in blog.Channel.Items do
    printfn " - %s (%s)" item.Title item.Link
Tomas Petricek - Languages and tools, open-source, philosophy of science and F# coding
 - What can routers at Centre Pompidou teach us about software evolution? (http://tomasp.net/blog/2023/pompidou/)
 - Where programs live? Vague spaces and software systems (http://tomasp.net/blog/2023/vague-spaces/)
 - The Timeless Way of Programming (http://tomasp.net/blog/2022/timeless-way/)
 - No-code, no thought? Substrates for simple programming for all (http://tomasp.net/blog/2022/no-code-substrates/)
 - Pop-up from Hell: On the growing opacity of web programs (http://tomasp.net/blog/2021/popup-from-hell/)
 - Software designers, not engineers: An interview from alternative universe (http://tomasp.net/blog/2021/software-designers/)
 - Is deep learning a new kind of programming? Operationalistic look at programming (http://tomasp.net/blog/2020/learning-and-programming/)
 - Creating interactive You Draw bar chart with Compost (http://tomasp.net/blog/2020/youdraw-compost-visualization/)
 - Data exploration calculus: Capturing the essence of exploratory data scripting (http://tomasp.net/blog/2020/data-exploration-calculus/)
 - On architecture, urban planning and software construction (http://tomasp.net/blog/2020/cities-and-programming/)
 - What to teach as the first programming language and why (http://tomasp.net/blog/2019/first-language/)
 - What should a Software Engineering course look like? (http://tomasp.net/blog/2019/software-engineering/)
 - Write your own Excel in 100 lines of F# (http://tomasp.net/blog/2018/write-your-own-excel/)
 - Programming as interaction: A new perspective for programming language research (http://tomasp.net/blog/2018/programming-interaction/)
 - Would aliens understand lambda calculus? (http://tomasp.net/blog/2018/alien-lambda-calculus/)
 - The design side of programming language design (http://tomasp.net/blog/2017/design-side-of-pl/)
 - Getting started with The Gamma just got easier (http://tomasp.net/blog/2017/thegamma-getting-started/)
 - Papers we Scrutinize: How to critically read papers (http://tomasp.net/blog/2017/papers-we-scrutinize/)
 - The mythology of programming language ideas (http://tomasp.net/blog/2017/programming-mythology/)
 - Towards open and transparent data-driven storytelling: Notes from my Alan Turing Institute talk (http://tomasp.net/blog/2017/thegamma-talk/)
val it: unit = ()

Transforming XML

In this example we create XML in addition to consuming it. Consider the problem of flattening a data set. Let's say you have XML data that looks like this:

[<Literal>]
let customersXmlSample =
    """
  <Customers>
    <Customer name="ACME">
      <Order Number="A012345">
        <OrderLine Item="widget" Quantity="1"/>
      </Order>
      <Order Number="A012346">
        <OrderLine Item="trinket" Quantity="2"/>
      </Order>
    </Customer>
    <Customer name="Southwind">
      <Order Number="A012347">
        <OrderLine Item="skyhook" Quantity="3"/>
        <OrderLine Item="gizmo" Quantity="4"/>
      </Order>
    </Customer>
  </Customers>"""

and you want to transform it into something like this:

[<Literal>]
let orderLinesXmlSample =
    """
  <OrderLines>
    <OrderLine Customer="ACME" Order="A012345" Item="widget" Quantity="1"/>
    <OrderLine Customer="ACME" Order="A012346" Item="trinket" Quantity="2"/>
    <OrderLine Customer="Southwind" Order="A012347" Item="skyhook" Quantity="3"/>
    <OrderLine Customer="Southwind" Order="A012347" Item="gizmo" Quantity="4"/>
  </OrderLines>"""

We'll create types from both the input and output samples and use the constructors on the types generated by the XmlProvider:

type InputXml = XmlProvider<customersXmlSample>
type OutputXml = XmlProvider<orderLinesXmlSample>

let orderLines =
    OutputXml.OrderLines
        [| for customer in InputXml.GetSample().Customers do
               for order in customer.Orders do
                   for line in order.OrderLines do
                       yield OutputXml.OrderLine(customer.Name, order.Number, line.Item, line.Quantity) |]
type InputXml = XmlProvider<...>
type OutputXml = XmlProvider<...>
val orderLines: XmlProvider<...>.OrderLines =
  <OrderLines>
  <OrderLine Customer="ACME" Order="A012345" Item="widget" Quantity="1" />
  <OrderLine Customer="ACME" Order="A012346" Item="trinket" Quantity="2" />
  <OrderLine Customer="Southwind" Order="A012347" Item="skyhook" Quantity="3" />
  <OrderLine Customer="Southwind" Order="A012347" Item="gizmo" Quantity="4" />
</OrderLines>
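
Once a document has been built with the generated constructors, it can be turned back into text through the underlying XElement. The following is a minimal sketch with a reduced, made-up output sample (Lines is a hypothetical type name, and the NuGet reference assumes a recent dotnet fsi); it assumes that constructor arguments follow the attribute order of the sample, as in the example above.

```fsharp
#r "nuget: FSharp.Data"
#r "System.Xml.Linq.dll"

open FSharp.Data

// Reduced, hypothetical output sample with only two attributes per line.
type Lines =
    XmlProvider<"""
  <OrderLines>
    <OrderLine Item="widget" Quantity="2"/>
    <OrderLine Item="gizmo" Quantity="5"/>
  </OrderLines>""">

// Build a new document with a single order line.
let built = Lines.OrderLines [| Lines.OrderLine("skyhook", 3) |]

// Every provided value wraps an XElement, so standard LINQ to XML applies.
let xmlText = built.XElement.ToString()
printfn "%s" xmlText

// built.XElement.Save("orderlines.xml") would write the same markup to a file.
```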

Using a schema (XSD)

The Schema parameter can be used (instead of Sample) to specify an XML schema. The value of the parameter can be either the name of a schema file or plain text like in the following example:

type Person =
    XmlProvider<Schema="""
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="surname" type="xs:string"/>
          <xs:element name="birthDate" type="xs:date"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>""">

let turing =
    Person.Parse
        """
  <person>
    <surname>Turing</surname>
    <birthDate>1912-06-23</birthDate>
  </person>
  """

printfn "%s was born in %d" turing.Surname turing.BirthDate.Year

The properties of the provided type are derived from the schema instead of being inferred from samples.

Usually a schema is not specified as plain text but stored in a file like data/po.xsd, and the URI of the file is set in the Schema parameter:

type PurchaseOrder = XmlProvider<Schema="../data/po.xsd">

When the file includes other schema files, the ResolutionFolder parameter can help locate them. The URI may also refer to online resources:

type RssXsd = XmlProvider<Schema="https://www.w3schools.com/xml/note.xsd">

The schema is expected to define a root element (a global element with complex type). In case of multiple root elements:

type TwoRoots =
    XmlProvider<Schema="""
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="root1">
      <xs:complexType>
        <xs:attribute name="foo" type="xs:string" use="required" />
        <xs:attribute name="fow" type="xs:int" />
      </xs:complexType>
    </xs:element>
    <xs:element name="root2">
      <xs:complexType>
        <xs:attribute name="bar" type="xs:string" use="required" />
        <xs:attribute name="baz" type="xs:date" use="required" />
      </xs:complexType>
    </xs:element>
  </xs:schema>
""">

the provided type has an optional property for each alternative:

let e1 = TwoRoots.Parse "<root1 foo='aa' fow='2' />"

match e1.Root1, e1.Root2 with
| Some x, None -> printfn "Foo = %s and Fow = %A" x.Foo x.Fow
| _ -> failwith "Unexpected"

let e2 = TwoRoots.Parse "<root2 bar='aa' baz='2017-12-22' />"

match e2.Root1, e2.Root2 with
| None, Some x -> printfn "Bar = %s and Baz = %O" x.Bar x.Baz
| _ -> failwith "Unexpected"
Foo = aa and Fow = Some 2
Bar = aa and Baz = 12/22/2017 12:00:00 AM
val e1: XmlProvider<...>.Choice = <root1 foo="aa" fow="2" />
val e2: XmlProvider<...>.Choice = <root2 bar="aa" baz="2017-12-22" />
val it: unit = ()

Common XSD constructs: sequence and choice

A sequence is the most common way of structuring elements in a schema. The following XSD defines foo as a sequence made of one or more bar elements (minOccurs defaults to 1) followed by a single baz element.

type FooSequence =
    XmlProvider<Schema="""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified" attributeFormDefault="unqualified">
        <xs:element name="foo">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="bar" type="xs:int" maxOccurs="unbounded" />
              <xs:element name="baz" type="xs:date" minOccurs="1" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
    </xs:schema>""">

Here a valid XML element is parsed as an instance of the provided type, with two properties corresponding to the bar and baz elements, where the former is an array in order to hold multiple elements:

let fooSequence =
    FooSequence.Parse
        """
<foo>
    <bar>42</bar>
    <bar>43</bar>
    <baz>1957-08-13</baz>
</foo>"""

printfn "%d" fooSequence.Bars.[0] // 42
printfn "%d" fooSequence.Bars.[1] // 43
printfn "%d" fooSequence.Baz.Year // 1957

Instead of a sequence we may have a choice:

type FooChoice =
    XmlProvider<Schema="""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified" attributeFormDefault="unqualified">
        <xs:element name="foo">
          <xs:complexType>
            <xs:choice>
              <xs:element name="bar" type="xs:int" maxOccurs="unbounded" />
              <xs:element name="baz" type="xs:date" minOccurs="1" />
            </xs:choice>
          </xs:complexType>
        </xs:element>
    </xs:schema>""">

Although a choice is akin to a union type in F#, the provided type still has properties for bar and baz directly available on the foo object; in fact, the properties representing alternatives in a choice are simply made optional (notice that for arrays this is not even necessary, because an array can be empty). This decision is due to technical limitations (discriminated unions are not supported in type providers), but it is also preferred because it improves discoverability: IntelliSense can show both alternatives. Some precision is lost, but precise modeling is not the main goal.

let fooChoice =
    FooChoice.Parse
        """
<foo>
  <baz>1957-08-13</baz>
</foo>"""

printfn "%d items" fooChoice.Bars.Length // 0 items

match fooChoice.Baz with
| Some date -> printfn "%d" date.Year // 1957
| None -> ()
0 items
1957
val fooChoice: XmlProvider<...>.Foo = <foo>
  <baz>1957-08-13</baz>
</foo>
val it: unit = ()
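
For completeness, here is a sketch of the other branch of the choice, where the document contains only bar elements. The schema is repeated under the hypothetical name FooChoiceDemo so that the snippet is self-contained; the NuGet reference assumes a recent dotnet fsi.

```fsharp
#r "nuget: FSharp.Data"
#r "System.Xml.Linq.dll"

open FSharp.Data

// Same schema as FooChoice above, repeated so the sketch is self-contained.
type FooChoiceDemo =
    XmlProvider<Schema="""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified" attributeFormDefault="unqualified">
        <xs:element name="foo">
          <xs:complexType>
            <xs:choice>
              <xs:element name="bar" type="xs:int" maxOccurs="unbounded" />
              <xs:element name="baz" type="xs:date" minOccurs="1" />
            </xs:choice>
          </xs:complexType>
        </xs:element>
    </xs:schema>""">

// A document that takes the bar alternative of the choice.
let barsOnly = FooChoiceDemo.Parse "<foo><bar>42</bar><bar>43</bar></foo>"

printfn "%d items" barsOnly.Bars.Length

// The baz alternative is absent, so the optional property should be None.
match barsOnly.Baz with
| Some date -> printfn "baz = %d" date.Year
| None -> printfn "no baz element"
```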

Another XSD construct for modeling the content of an element is all, which is used less often; it is like a sequence where the order of elements does not matter. The corresponding provided type is in fact essentially the same as for a sequence.

Advanced schema constructs

XML Schema provides various extensibility mechanisms. The following example is a terse summary mixing substitution groups with abstract recursive definitions.

type Prop =
    XmlProvider<Schema="""
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified" attributeFormDefault="unqualified">
        <xs:element name="Formula" abstract="true"/>
        <xs:element name="Prop" type="xs:string" substitutionGroup="Formula"/>
        <xs:element name="And" substitutionGroup="Formula">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="Formula" minOccurs="2" maxOccurs="2"/>
              </xs:sequence>
          </xs:complexType>
        </xs:element>
    </xs:schema>""">

let formula =
    Prop.Parse
        """
    <And>
        <Prop>p1</Prop>
        <And>
            <Prop>p2</Prop>
            <Prop>p3</Prop>
        </And>
    </And>
    """

printfn "%s" formula.Props.[0] // p1
printfn "%s" formula.Ands.[0].Props.[0] // p2
printfn "%s" formula.Ands.[0].Props.[1] // p3
p1
p2
p3
type Prop = XmlProvider<...>
val formula: XmlProvider<...>.And =
  <And>
        <Prop>p1</Prop>
        <And>
            <Prop>p2</Prop>
            <Prop>p3</Prop>
        </And>
    </And>
val it: unit = ()

Substitution groups are like choices, and the type provider produces an optional property for each alternative.

Validation

The GetSchema method on the generated type returns an instance of System.Xml.Schema.XmlSchemaSet that can be used to validate documents:

open System.Xml.Schema
let schema = Person.GetSchema()
turing.XElement.Document.Validate(schema, validationEventHandler = null)

The Validate method accepts a callback to handle validation issues; passing null will turn validation errors into exceptions. There are overloads to allow other effects (for example setting default values by enabling the population of the XML tree with the post-schema-validation infoset; for details see the documentation).
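
To collect validation problems instead of throwing, a non-null handler can be passed. The sketch below uses only System.Xml.Linq and System.Xml.Schema, loading the person schema by hand so that it runs without the provider; personXsd, schemaSet and validationMessages are hypothetical names introduced for this example.

```fsharp
#r "System.Xml.Linq.dll"

open System.IO
open System.Xml
open System.Xml.Linq
open System.Xml.Schema

// The person schema from earlier in the article, loaded by hand for this sketch.
let personXsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="surname" type="xs:string"/>
        <xs:element name="birthDate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

let schemaSet = XmlSchemaSet()
schemaSet.Add("", XmlReader.Create(new StringReader(personXsd))) |> ignore

/// Validates a document and returns all messages instead of throwing.
let validationMessages (doc: XDocument) : string list =
    let messages = ResizeArray<string>()
    let handler = ValidationEventHandler(fun _ args -> messages.Add(args.Message))
    doc.Validate(schemaSet, handler)
    List.ofSeq messages

let good =
    XDocument.Parse("<person><surname>Turing</surname><birthDate>1912-06-23</birthDate></person>")

// birthDate is not a valid xs:date here.
let bad =
    XDocument.Parse("<person><surname>Turing</surname><birthDate>soon</birthDate></person>")

printfn "valid document: %d message(s)" (validationMessages good).Length
validationMessages bad |> List.iter (printfn "problem: %s")
```

With the provider, Person.GetSchema() would take the place of schemaSet and turing.XElement.Document the place of the hand-parsed documents.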

Remarks on using a schema

The XML Type Provider supports most XSD features. However, the XML Schema specification is rich and complex, and it also provides a fair degree of openness that may be difficult to handle in data binding tools. In FSharp.Data, when providing typed views on elements becomes too challenging (take wildcards, for example), the underlying XElement is still available.

An important design decision is to focus on elements and not on complex types; while the latter may be valuable in schema design, our goal is simply to obtain an easy and safe way to access XML data. In other words, the provided types are not intended for domain modeling (it's one of the very few cases where optional properties are preferred to sum types). Hence, we do not provide types corresponding to complex types in a schema but only types corresponding to elements (of course the underlying complex types still affect the shape of the provided types, but this happens only implicitly). Focusing on element shapes lets us generate a type that should be essentially the same as one inferred from a significant set of valid samples. This allows a smooth transition (replacing Sample with Schema) when a schema becomes available.

Note that inline schemas (values of the form typeof{...}) are not supported inside XSD documents.

Related articles

Multiple items
namespace FSharp

--------------------
namespace Microsoft.FSharp
Multiple items
namespace FSharp.Data

--------------------
namespace Microsoft.FSharp.Data
type Author = XmlProvider<...>
type XmlProvider

Typed representation of an XML file.

Static parameters:

Sample: Location of an XML sample file or a string containing a sample XML document.
SampleIsList: If true, the children of the root in the sample document represent individual samples for the inference.
Global: If true, the inference unifies all XML elements with the same name.
Culture: The culture used for parsing numbers and dates. Defaults to the invariant culture.
Encoding: The encoding used to read the sample. You can specify either the character set name or the codepage number. Defaults to UTF8 for files, and to ISO-8859-1 for HTTP requests, unless charset is specified in the Content-Type response header.
ResolutionFolder: A directory that is used when resolving relative file references (at design time and in hosted execution).
EmbeddedResource: When specified, the type provider first attempts to load the sample from the specified resource (e.g. 'MyCompany.MyAssembly, resource_name.xml'). This is useful when exposing types generated by the type provider.
InferTypesFromValues: Deprecated; use InferenceMode instead. If true, turns on additional type inference from values (e.g. string values such as "123" are inferred as ints, and values constrained to 0 and 1 as booleans). The XmlProvider also infers string values as JSON.
Schema: Location of a schema file or a string containing XSD.
InferenceMode: Possible values:
  NoInference: Inference is disabled. All values are inferred as the most basic type permitted for the value (usually string).
  ValuesOnly: Types of values are inferred from the Sample. Inline schema support is disabled. This is the default.
  ValuesAndInlineSchemasHints: Types of values are inferred from both values and inline schemas.
  ValuesAndInlineSchemasOverrides: Same as ValuesAndInlineSchemasHints, but value-inferred types are ignored when an inline schema is present.

Inline schemas are special string values that can define a type and/or unit of measure. Supported syntax: typeof<type> or typeof{type} or typeof<type<measure>> or typeof{type{measure}}. Valid measures are the default SI units, and valid types are int, int64, bool, float, decimal, date, datetimeoffset, timespan, guid and string. Note that inline schemas are not used from XSD documents.