FsLexYacc


Overview

fsyacc.exe is a LALR parser generator. It follows a similar specification to the OCamlYacc parser generator (especially when used with the ml compatibility switch)

Getting Started

Install the FsLexYacc nuget package.

Sample input

Parser generators typically produce numbers represented by values in an F# Union Type. For example:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
type Expr = 
 | Val of string 
 | Int of int
 | Float of float
 | Decr of Sxpr


type Stmt = 
 | Assign of string * Sxpr
 | While of Expr * Stmt
 | Seq of Stmt list
 | IfThen of Expr * Stmt
 | IfThenElse of Expr * Stmt * Stmt
 | Print of Expr


type Prog = Prog of Stmt list

Given that, a typical parser specification is as follows:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
%{
open Ast
%}

%start start
%token <string> ID
%token <System.Int32> INT
%token <System.Double> FLOAT
%token DECR LPAREN RPAREN WHILE DO END BEGIN IF THEN ELSE PRINT SEMI ASSIGN EOF
%type < Ast.Prog > start


%%


start: Prog {  $1 }


Prog: StmtList { Prog(List.rev($1)) }


Expr: ID { Val($1) }
    | INT {  Int($1)  }
    | FLOAT {  Float($1)  }
    | DECR LPAREN Expr RPAREN {  Decr($3)  }


Stmt: ID ASSIGN Expr { Assign($1,$3) }
    | WHILE Expr DO Stmt { While($2,$4) }
    | BEGIN StmtList END { Seq(List.rev($2)) }
    | IF Expr THEN Stmt { IfThen($2,$4) }
    | IF Expr THEN Stmt ELSE Stmt { IfThenElse($2,$4,$6) }
    | PRINT Expr { Print($2) }


StmtList: Stmt { [$1] }
       | StmtList SEMI Stmt { $3 :: $1  }

The above generates a datatype for tokens and a function for each start production. Parsers are typically combined with a lexer generated using FsLex.

MSBuild support

The nuget package includes MSBuild support for FsLex and FsYacc. You must add a FsLexYacc.targets reference to your project file manually like this (adjust the nuget package number if needed):

1: 
<Import Project="..\packages\FsLexYacc.6.0.3\bin\FsLexYacc.targets" />

You must also add FsLex andd FsYacc entries like this:

1: 
2: 
3: 
4: 
5: 
6: 
<FsYacc Include="..\LexAndYaccMiniProject\Parser.fsy">
  <OtherFlags>--module Parser</OtherFlags>
</FsYacc>
<FsLex Include="..\LexAndYaccMiniProject\Lexer.fsl">
  <OtherFlags>--unicode</OtherFlags>
</FsLex>

If you want to see verbose output from FsYacc you need to add -v in OtherFlags section like this:

1: 
2: 
3: 
<FsYacc Include="..\LexAndYaccMiniProject\Parser.fsy">
  <OtherFlags>--module Parser -v</OtherFlags>
</FsYacc>

Command line options

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
fsyacc <filename> fsyacc <filename>

    -o <string> Name the output file.

    -v: Produce a listing file.

    --module <string> Define the F# module name to host the generated parser.

    --internal: Generate an internal module

    --open <string> Add the given module to the list of those to open in both the generated signature and implementation.

    --light: (ignored)

    --light-off: Add #light "off" to the top of the generated file

    --ml-compatibility: Support the use of the global state from the 'Parsing' module in FSharp.PowerPack.dll.

    --tokens: Simply tokenize the specification file itself.

    --lexlib <string> Specify the namespace for the implementation of the parser table interperter (default Microsoft.FSharp.Text.Parsing)

    --parslib <string> Specify the namespace for the implementation of the parser table interperter (default Microsoft.FSharp.Text.Parsing)

    --codepage <int> Assume input lexer specification file is encoded with the given codepage.

    --help: display this list of options

    -help: display this list of options

Managing and using position markers

Each action in an fsyacc parser has access to a parseState value through which you can access position information.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
type IParseState =
    abstract InputStartPosition: int -> Position
    abstract InputEndPosition: int -> Position
    abstract InputRange: int -> Position * Position
    abstract ParserLocalStore: IDictionary<string,obj>
    abstract ResultRange  : Position * Position
    abstract RaiseError<'b> : unit -> 'b

Input relate to the indexes of the items on the right hand side of the current production, the Result relates to the entire range covered by the production. You shouldn't use GetData directly - these is called automatically by $1, $2 etc. You can call RaiseError if you like.

You must set the initial position when you create the lexbuf:

1: 
2: 
3: 
4: 
5: 
let setInitialPos (lexbuf:lexbuf) filename =
     lexbuf.EndPos <- { pos_bol = 0;
                        pos_fname=filename;
                        pos_cnum=0;
                        pos_lnum=1 }

You must also update the position recorded in the lex buffer each time you process what you consider to be a new line:

1: 
2: 
let newline (lexbuf:lexbuf) =
    lexbuf.EndPos <- lexbuf.EndPos.AsNewLinePos()

Likewise if your language includes the ability to mark source code locations, see custom essay (e.g. the #line directive in OCaml and F#) then you must similarly adjust the lexbuf.EndPos according to the information you grok from your input.

Notes on OCaml Compatibility

OCamlYacc accepts the following:

1: 
%type < context -> context > toplevel

For FsYacc you just add parentheses:

1: 
%type < (context -> context)) > toplevel
type Expr =
  | Val of string
  | Int of int
  | Float of float
  | Decr of obj

Full name: fsyacc.Expr
union case Expr.Val: string -> Expr
Multiple items
val string : value:'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = System.String

Full name: Microsoft.FSharp.Core.string
union case Expr.Int: int -> Expr
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
union case Expr.Float: float -> Expr
Multiple items
val float : value:'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

--------------------
type float = System.Double

Full name: Microsoft.FSharp.Core.float

--------------------
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>
union case Expr.Decr: obj -> Expr
type Stmt =
  | Assign of string * obj
  | While of Expr * Stmt
  | Seq of Stmt list
  | IfThen of Expr * Stmt
  | IfThenElse of Expr * Stmt * Stmt
  | Print of Expr

Full name: fsyacc.Stmt
union case Stmt.Assign: string * obj -> Stmt
union case Stmt.While: Expr * Stmt -> Stmt
Multiple items
union case Stmt.Seq: Stmt list -> Stmt

--------------------
module Seq

from Microsoft.FSharp.Collections
type 'T list = List<'T>

Full name: Microsoft.FSharp.Collections.list<_>
union case Stmt.IfThen: Expr * Stmt -> Stmt
union case Stmt.IfThenElse: Expr * Stmt * Stmt -> Stmt
union case Stmt.Print: Expr -> Stmt
Multiple items
union case Prog.Prog: Stmt list -> Prog

--------------------
type Prog = | Prog of Stmt list

Full name: fsyacc.Prog
namespace System
type Int32 =
  struct
    member CompareTo : value:obj -> int + 1 overload
    member Equals : obj:obj -> bool + 1 overload
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> TypeCode
    member ToString : unit -> string + 3 overloads
    static val MaxValue : int
    static val MinValue : int
    static member Parse : s:string -> int + 3 overloads
    static member TryParse : s:string * result:int -> bool + 1 overload
  end

Full name: System.Int32
type Double =
  struct
    member CompareTo : value:obj -> int + 1 overload
    member Equals : obj:obj -> bool + 1 overload
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> TypeCode
    member ToString : unit -> string + 3 overloads
    static val MinValue : float
    static val MaxValue : float
    static val Epsilon : float
    static val NegativeInfinity : float
    static val PositiveInfinity : float
    ...
  end

Full name: System.Double
Multiple items
module List

from Microsoft.FSharp.Collections

--------------------
type List<'T> =
  | ( [] )
  | ( :: ) of Head: 'T * Tail: 'T list
  interface IEnumerable
  interface IEnumerable<'T>
  member GetSlice : startIndex:int option * endIndex:int option -> 'T list
  member Head : 'T
  member IsEmpty : bool
  member Item : index:int -> 'T with get
  member Length : int
  member Tail : 'T list
  static member Cons : head:'T * tail:'T list -> 'T list
  static member Empty : 'T list

Full name: Microsoft.FSharp.Collections.List<_>
val rev : list:'T list -> 'T list

Full name: Microsoft.FSharp.Collections.List.rev
namespace Microsoft.FSharp
namespace Microsoft
type obj = System.Object

Full name: Microsoft.FSharp.Core.obj
type unit = Unit

Full name: Microsoft.FSharp.Core.unit
Fork me on GitHub