` elements. If the element does not contain nested elements,
then we print the `Value` (inner text).
## Loading Directly from a File or URL
In many cases we might want to define schema using a local sample file, but then directly
load the data from disk or from a URL either synchronously (with `Load`) or asynchronously
(with `AsyncLoad`).
For this example I am using the US Census data set from `https://api.census.gov/data.xml`, a sample of
which is stored in `../data/Census.xml`. This sample is greatly reduced from the live data, so
that it contains only the elements and attributes relevant to us:
[lang=xml]
2006-2010 American Community Survey 5-Year Estimates
2006-2010 American Community Survey 5-Year Estimates
When doing this for your scenario, be careful to ensure that enough data is given for the provider
to infer the schema correctly. For example, the first-level dataset element must be included at
least twice for the provider to infer the `Datasets` array rather than a single `Dataset` object.
*)
type Census = XmlProvider<"../data/Census.xml", ResolutionFolder=ResolutionFolder>

let data = Census.Load("https://api.census.gov/data.xml")

let apiLinks =
    data.Datasets
    |> Array.map (fun ds -> ds.Title, ds.Distribution.AccessUrl)
    |> Array.truncate 10
(* output:
type Census = XmlProvider<...>
val data: XmlProvider<...>.CensusApi = ...*)

// stand-in for real message enqueueing: print the title and API URL
let enqueue (title, apiUrl) =
    printfn "%s -> %s" title apiUrl
// helper task which gets scheduled on some background thread somewhere...
let cacheJanitor () =
    async {
        let! reloadData = Census.AsyncLoad("https://api.census.gov/data.xml")

        reloadData.Datasets
        |> Array.map (fun ds -> ds.Title, ds.Distribution.AccessUrl)
        |> Array.iter enqueue
    }
(* output:
val enqueue: title: string * apiUrl: string -> unit
val cacheJanitor: unit -> Async<unit>*)
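(**
How a workflow like `cacheJanitor` is started is left open above. As a minimal sketch, assuming a
hypothetical `fakeAsyncLoad` stand-in for `Census.AsyncLoad` (so that it runs without network access
or the type provider) and a hypothetical in-memory queue named `pending`, the workflow could be run
to completion like this:

```fsharp
open System.Collections.Concurrent

// Hypothetical stand-in for Census.AsyncLoad: yields (title, url) pairs after a short delay.
let fakeAsyncLoad () =
    async {
        do! Async.Sleep 10
        return [| ("Dataset A", "https://example.org/a")
                  ("Dataset B", "https://example.org/b") |]
    }

// Hypothetical in-memory queue standing in for a real message queue.
let pending = ConcurrentQueue<string * string>()

let janitorSketch () =
    async {
        let! datasets = fakeAsyncLoad ()
        datasets |> Array.iter (fun item -> pending.Enqueue item)
    }

// Block the calling thread until the workflow has finished.
janitorSketch () |> Async.RunSynchronously
printfn "%d items queued" pending.Count
```

`Async.RunSynchronously` waits for the result; `Async.Start` would instead schedule the same workflow
on the thread pool and return immediately, which is closer to the background-task scenario.
*)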
(**
## Reading RSS feeds
To conclude this introduction with a more interesting example, let's look at how to parse an
RSS feed. As discussed earlier, we can use relative paths or web addresses when calling
the type provider:
*)
type Rss = XmlProvider<"https://tomasp.net/rss.xml">
(**
This code builds a type `Rss` that represents RSS feeds (with the features that are used
on `https://tomasp.net`). The type `Rss` provides static methods `Parse`, `Load` and `AsyncLoad`
to construct it - here, we just want to reuse the same URI of the schema, so we
use the `GetSample` static method:
*)
let blog = Rss.GetSample()
(**
Printing the title of the RSS feed together with a list of recent posts is now quite
easy - you can simply type `blog` followed by `.` and see what the autocompletion
offers. The code looks like this:
*)
// Title is a property returning string
printfn "%s" blog.Channel.Title
// Get all item nodes and print title with link
for item in blog.Channel.Items do
    printfn " - %s (%s)" item.Title item.Link
(* output:
Tomas Petricek - Languages and tools, open-source, philosophy of science and F# coding
- What can routers at Centre Pompidou teach us about software evolution? (http://tomasp.net/blog/2023/pompidou/)
- Where programs live? Vague spaces and software systems (http://tomasp.net/blog/2023/vague-spaces/)
- The Timeless Way of Programming (http://tomasp.net/blog/2022/timeless-way/)
- No-code, no thought? Substrates for simple programming for all (http://tomasp.net/blog/2022/no-code-substrates/)
- Pop-up from Hell: On the growing opacity of web programs (http://tomasp.net/blog/2021/popup-from-hell/)
- Software designers, not engineers: An interview from alternative universe (http://tomasp.net/blog/2021/software-designers/)
- Is deep learning a new kind of programming? Operationalistic look at programming (http://tomasp.net/blog/2020/learning-and-programming/)
- Creating interactive You Draw bar chart with Compost (http://tomasp.net/blog/2020/youdraw-compost-visualization/)
- Data exploration calculus: Capturing the essence of exploratory data scripting (http://tomasp.net/blog/2020/data-exploration-calculus/)
- On architecture, urban planning and software construction (http://tomasp.net/blog/2020/cities-and-programming/)
- What to teach as the first programming language and why (http://tomasp.net/blog/2019/first-language/)
- What should a Software Engineering course look like? (http://tomasp.net/blog/2019/software-engineering/)
- Write your own Excel in 100 lines of F# (http://tomasp.net/blog/2018/write-your-own-excel/)
- Programming as interaction: A new perspective for programming language research (http://tomasp.net/blog/2018/programming-interaction/)
- Would aliens understand lambda calculus? (http://tomasp.net/blog/2018/alien-lambda-calculus/)
- The design side of programming language design (http://tomasp.net/blog/2017/design-side-of-pl/)
- Getting started with The Gamma just got easier (http://tomasp.net/blog/2017/thegamma-getting-started/)
- Papers we Scrutinize: How to critically read papers (http://tomasp.net/blog/2017/papers-we-scrutinize/)
- The mythology of programming language ideas (http://tomasp.net/blog/2017/programming-mythology/)
- Towards open and transparent data-driven storytelling: Notes from my Alan Turing Institute talk (http://tomasp.net/blog/2017/thegamma-talk/)
val it: unit = ()*)
(**
## Transforming XML
In this example we will create XML in addition to consuming it.
Consider the problem of flattening a data set. Let's say you have XML data that looks like this:
*)
[<Literal>]
let customersXmlSample =
    """
"""
(**
and you want to transform it into something like this:
*)
[<Literal>]
let orderLinesXmlSample =
    """
"""
(**
We'll create types from both the input and output samples and use the constructors on the types generated by the XmlProvider:
*)
type InputXml = XmlProvider<customersXmlSample>
type OutputXml = XmlProvider<orderLinesXmlSample>

let orderLines =
    OutputXml.OrderLines
        [| for customer in InputXml.GetSample().Customers do
               for order in customer.Orders do
                   for line in order.OrderLines do
                       yield OutputXml.OrderLine(customer.Name, order.Number, line.Item, line.Quantity) |]
(* output:
type InputXml = XmlProvider<...>
type OutputXml = XmlProvider<...>
val orderLines: XmlProvider<...>.OrderLines =
*)
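(**
The nested comprehension above does the flattening: every order line is paired with the customer and
order it belongs to. The same shape can be sketched without the type provider, assuming hypothetical
record types (`Line`, `Order`, `Customer`, `FlatLine`) and made-up values in place of the provided types:

```fsharp
// Hypothetical records mirroring the nesting customer -> order -> order line.
type Line = { Item: string; Quantity: int }
type Order = { Number: string; Lines: Line list }
type Customer = { Name: string; Orders: Order list }

// Hypothetical flat output row, one per order line.
type FlatLine =
    { CustomerName: string
      OrderNumber: string
      ItemName: string
      Count: int }

// Made-up sample data: one customer with two orders and three lines in total.
let order1 = { Number = "N1"; Lines = [ { Item = "x"; Quantity = 1 } ] }
let order2 = { Number = "N2"; Lines = [ { Item = "y"; Quantity = 2 }; { Item = "z"; Quantity = 3 } ] }
let customers = [ { Name = "C1"; Orders = [ order1; order2 ] } ]

// One output row per (customer, order, line) combination.
let flat =
    [| for customer in customers do
           for order in customer.Orders do
               for line in order.Lines do
                   yield
                       { CustomerName = customer.Name
                         OrderNumber = order.Number
                         ItemName = line.Item
                         Count = line.Quantity } |]

printfn "%d flattened lines" flat.Length
```

With the XML provider, the records are replaced by the provided types and the record expression by
the generated constructor.
*)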
(**
## Using a schema (XSD)
The `Schema` parameter can be used (instead of `Sample`) to specify an XML schema.
The value of the parameter can be either the name of a schema file or plain text
like in the following example:
*)
type Person =
    XmlProvider<Schema="""
    """>

let turing =
    Person.Parse
        """
Turing
1912-06-23
"""
printfn "%s was born in %d" turing.Surname turing.BirthDate.Year
(**
The properties of the provided type are derived from the schema instead of being inferred from samples.
Usually a schema is not specified as plain text but stored in a file like
[`data/po.xsd`](../data/po.xsd) and the URI is set in the `Schema` parameter:
*)
type PurchaseOrder = XmlProvider<Schema="../data/po.xsd">
(**
When the file includes other schema files, the `ResolutionFolder` parameter can help locate them.
The URI may also refer to online resources:
*)
type RssXsd = XmlProvider
(**
The schema is expected to define a root element (a global element with complex type).
In case of multiple root elements:
*)
type TwoRoots =
    XmlProvider<Schema="""
    """>
(**
the provided type has an optional property for each alternative:
*)
let e1 = TwoRoots.Parse ""
match e1.Root1, e1.Root2 with
| Some x, None -> printfn "Foo = %s and Fow = %A" x.Foo x.Fow
| _ -> failwith "Unexpected"
let e2 = TwoRoots.Parse ""
match e2.Root1, e2.Root2 with
| None, Some x -> printfn "Bar = %s and Baz = %O" x.Bar x.Baz
| _ -> failwith "Unexpected"
(* output:
Foo = aa and Fow = Some 2
Bar = aa and Baz = 12/22/2017 12:00:00 AM
val e1: XmlProvider<...>.Choice =
val e2: XmlProvider<...>.Choice =
val it: unit = ()*)
(**
### Common XSD constructs: sequence and choice
A `sequence` is the most common way of structuring elements in a schema.
The following XSD defines `foo` as a sequence made of an arbitrary number
of `bar` elements followed by a single `baz` element.
*)
type FooSequence =
    XmlProvider<Schema="""
    """>
(**
Here a valid XML element is parsed as an instance of the provided type, with two properties corresponding to the `bar` and `baz` elements, where the former is an array in order to hold multiple elements:
*)
let fooSequence =
    FooSequence.Parse
        """
42
43
1957-08-13
"""
printfn "%d" fooSequence.Bars.[0] // 42
printfn "%d" fooSequence.Bars.[1] // 43
printfn "%d" fooSequence.Baz.Year // 1957
(**
Instead of a sequence we may have a `choice`:
*)
type FooChoice =
    XmlProvider<Schema="""
    """>
(**
Although a choice is akin to a union type in F#, the provided type still has
properties for `bar` and `baz` directly available on the `foo` object; in fact
the properties representing alternatives in a choice are simply made optional
(notice that for arrays this is not even necessary because an array can be empty).
This decision is due to technical limitations (discriminated unions are not supported
in type providers) but is also preferred because it improves discoverability:
IntelliSense can show both alternatives. Some precision is lost, but precise modeling is not the main goal.
*)
let fooChoice =
    FooChoice.Parse
        """
1957-08-13
"""
printfn "%d items" fooChoice.Bars.Length // 0 items
match fooChoice.Baz with
| Some date -> printfn "%d" date.Year // 1957
| None -> ()
(* output:
0 items
1957
val fooChoice: XmlProvider<...>.Foo =
1957-08-13
val it: unit = ()*)
(**
Another XSD construct to model the content of an element is `all`, which is used less often and
behaves like a sequence in which the order of elements does not matter. The corresponding provided type
is essentially the same as for a sequence.
### Advanced schema constructs
XML Schema provides various extensibility mechanisms. The following example
is a terse summary mixing substitution groups with abstract recursive definitions.
*)
type Prop =
    XmlProvider<Schema="""
    """>

let formula =
    Prop.Parse
        """
p1
p2
p3
"""
printfn "%s" formula.Props.[0] // p1
printfn "%s" formula.Ands.[0].Props.[0] // p2
printfn "%s" formula.Ands.[0].Props.[1] // p3
(* output:
p1
p2
p3
type Prop = XmlProvider<...>
val formula: XmlProvider<...>.And =
p1
p2
p3
val it: unit = ()*)
(**
Substitution groups are like choices, and the type provider produces an optional
property for each alternative.
### Validation
The `GetSchema` method on the generated type returns an instance
of `System.Xml.Schema.XmlSchemaSet` that can be used to validate documents:
*)
open System.Xml.Schema
let schema = Person.GetSchema()
turing.XElement.Document.Validate(schema, validationEventHandler = null)
(**
The `Validate` method accepts a callback to handle validation issues;
passing `null` will turn validation errors into exceptions.
There are overloads to allow other effects (for example setting default values
by enabling the population of the XML tree with the post-schema-validation infoset;
for details see the [documentation](https://docs.microsoft.com/en-us/dotnet/api/system.xml.schema.extensions.validate?view=netframework-4.7.2)).
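
The callback form can be sketched as follows. This is a minimal example, assuming a hypothetical
one-element schema (`note`) and a standalone `XDocument` rather than a provided type; the handler
collects messages instead of letting the first problem raise an exception:

```fsharp
open System.Collections.Generic
open System.IO
open System.Xml
open System.Xml.Linq
open System.Xml.Schema

// Hypothetical schema: a single <note> element holding an integer.
let xsd =
    """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="note" type="xs:int"/>
       </xs:schema>"""

let schemas = XmlSchemaSet()
schemas.Add("", XmlReader.Create(new StringReader(xsd))) |> ignore

// Collect validation messages in a list.
let problems = List<string>()
let handler = ValidationEventHandler(fun _ args -> problems.Add args.Message)

// The content is not an integer, so validation should report a problem.
let doc = XDocument.Parse "<note>not a number</note>"
doc.Validate(schemas, handler)

printfn "%d validation problem(s)" problems.Count
for message in problems do
    printfn " - %s" message
```

With a provided type, the `XmlSchemaSet` would come from `GetSchema()` instead of being built by hand.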
### Remarks on using a schema
The XML Type Provider supports most XSD features.
However, the [XML Schema](https://www.w3.org/XML/Schema) specification is rich and complex and also provides a
fair degree of [openness](http://docstore.mik.ua/orelly/xml/schema/ch13_02.htm),
which may be [difficult to handle](https://link.springer.com/chapter/10.1007/978-3-540-76786-2_6) in
data binding tools. In FSharp.Data, when providing typed views on elements becomes too challenging
(take for example [wildcards](https://www.w3.org/TR/xmlschema11-1/#Wildcards)), the underlying `XElement`
is still available.
An important design decision is to focus on elements and not on complex types; while the latter
may be valuable in schema design, our goal is simply to obtain an easy and safe way to access XML data.
In other words, the provided types are not intended for domain modeling (it's one of the very few cases
where optional properties are preferred to sum types).
Hence, we do not provide types corresponding to complex types in a schema but only types corresponding
to elements (of course the underlying complex types still affect the shape of the provided types,
but this happens only implicitly).
Focusing on element shapes lets us generate a type that should be essentially the same as one
inferred from a significant set of valid samples. This allows a smooth transition (replacing `Sample` with `Schema`)
when a schema becomes available.
Note that inline schemas (values of the form `typeof{...}`) are not supported inside XSD documents.
## Related articles
* [Using JSON provider in a library](JsonProvider.html#jsonlib) also applies to the XML type provider
* API Reference: [XmlProvider](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-xmlprovider.html) type provider
* API Reference: [XElementExtensions](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-xelementextensions.html)
*)