Header menu logo FsLexYacc

FsYacc Overview

fsyacc is a LALR parser generator. It follows a similar specification to the OCamlYacc parser generator (especially when used with the ml compatibility switch)

Getting Started

Build the tool by cloning this project and running /build.sh or build.cmd

Add a reference in your project to FsLexYacc package via Nuget or paket.

You can run the parser generator directly: dotnet fsyacc.dll inputFile -o outputFile

Sample input

Parser generators typically produce numbers represented by values in an F# Union Type. For example:

type Expr = 
 | Val of string 
 | Int of int
 | Float of float
 | Decr of Expr


type Stmt = 
 | Assign of string * Expr
 | While of Expr * Stmt
 | Seq of Stmt list
 | IfThen of Expr * Stmt
 | IfThenElse of Expr * Stmt * Stmt
 | Print of Expr


type Prog = Prog of Stmt list

Given that, a typical parser specification is as follows:

%{
open Ast
%}

%start start
%token <string> ID
%token <System.Int32> INT
%token <System.Double> FLOAT
%token DECR LPAREN RPAREN WHILE DO END BEGIN IF THEN ELSE PRINT SEMI ASSIGN EOF
%type < Ast.Prog > start


%%


start: Prog {  $1 }


Prog: StmtList { Prog(List.rev($1)) }


Expr: ID { Val($1) }
    | INT {  Int($1)  }
    | FLOAT {  Float($1)  }
    | DECR LPAREN Expr RPAREN {  Decr($3)  }


Stmt: ID ASSIGN Expr { Assign($1,$3) }
    | WHILE Expr DO Stmt { While($2,$4) }
    | BEGIN StmtList END { Seq(List.rev($2)) }
    | IF Expr THEN Stmt { IfThen($2,$4) }
    | IF Expr THEN Stmt ELSE Stmt { IfThenElse($2,$4,$6) }
    | PRINT Expr { Print($2) }


StmtList: Stmt { [$1] }
       | StmtList SEMI Stmt { $3 :: $1  }

The above generates a datatype for tokens and a function for each start production. Parsers are typically combined with a lexer generated using FsLex.

MSBuild support

The nuget package includes MSBuild support for FsLex and FsYacc. New MSBuild targets are added automatically by the nuget package.
But you must manually add FsLex andd FsYacc entries inside of an ItemGroup to your .fsproj file like this:

<FsYacc Include="..\LexAndYaccMiniProject\Parser.fsy">
  <OtherFlags>--module Parser</OtherFlags>
</FsYacc>
<FsLex Include="..\LexAndYaccMiniProject\Lexer.fsl">
  <OtherFlags>--unicode</OtherFlags>
</FsLex>

If you want to see verbose output from FsYacc you need to add -v in the OtherFlags section like this:

<FsYacc Include="..\LexAndYaccMiniProject\Parser.fsy">
  <OtherFlags>--module Parser -v</OtherFlags>
</FsYacc>

Command line options

fsyacc <filename> fsyacc <filename>

    -o <string> Name the output file.

    -v: Produce a listing file.

    --module <string> Define the F# module name to host the generated parser.

    --internal: Generate an internal module

    --open <string> Add the given module to the list of those to open in both the generated signature and implementation.

    --light: (ignored)

    --light-off: Add #light "off" to the top of the generated file

    --ml-compatibility: Support the use of the global state from the 'Parsing' module in FSharp.PowerPack.dll.

    --tokens: Simply tokenize the specification file itself.

    --lexlib <string> Specify the namespace for the implementation of the parser table interperter (default FSharp.Text.Parsing)

    --parslib <string> Specify the namespace for the implementation of the parser table interperter (default FSharp.Text.Parsing)

    --codepage <int> Assume input lexer specification file is encoded with the given codepage.

    --help: display this list of options

    -help: display this list of options

Managing and using position markers

Each action in an fsyacc parser has access to a parseState value through which you can access position information.

type IParseState =
    abstract InputStartPosition: int -> Position
    abstract InputEndPosition: int -> Position
    abstract InputRange: int -> Position * Position
    abstract ParserLocalStore: IDictionary<string,obj>
    abstract ResultRange  : Position * Position
    abstract RaiseError<'b> : unit -> 'b

Input relate to the indexes of the items on the right hand side of the current production, the Result relates to the entire range covered by the production. You shouldn't use GetData directly - these is called automatically by $1, $2 etc. You can call RaiseError if you like.

You must set the initial position when you create the lexbuf:

let setInitialPos (lexbuf:LexBuffer<_>) filename =
     lexbuf.EndPos <- { pos_bol = 0;
                        pos_fname=filename;
                        pos_cnum=0;
                        pos_lnum=1 }

You must also update the position recorded in the lex buffer each time you process what you consider to be a new line:

let newline (lexbuf:lexbuf) =
    lexbuf.EndPos <- lexbuf.EndPos.AsNewLinePos()

Likewise if your language includes the ability to mark source code locations, see custom essay (e.g. the #line directive in OCaml and F#) then you must similarly adjust the lexbuf.EndPos according to the information you grok from your input.

Notes on OCaml Compatibility

OCamlYacc accepts the following:

%type < context -> context > toplevel

For FsYacc you just add parentheses:

%type < (context -> context) > toplevel
type Expr = | Val of string | Int of int | Float of float | Decr of Expr
Multiple items
val string: value: 'T -> string

--------------------
type string = System.String
Multiple items
val int: value: 'T -> int (requires member op_Explicit)

--------------------
type int = int32

--------------------
type int<'Measure> = int
Multiple items
val float: value: 'T -> float (requires member op_Explicit)

--------------------
type float = System.Double

--------------------
type float<'Measure> = float
type Stmt = | Assign of string * Expr | While of Expr * Stmt | Seq of Stmt list | IfThen of Expr * Stmt | IfThenElse of Expr * Stmt * Stmt | Print of Expr
module Seq from Microsoft.FSharp.Collections
type 'T list = List<'T>
Multiple items
union case Prog.Prog: Stmt list -> Prog

--------------------
type Prog = | Prog of Stmt list
type Prog = | Prog of Stmt list
namespace System
[<Struct>] type Int32 = member CompareTo: value: int -> int + 1 overload member Equals: obj: int -> bool + 1 overload member GetHashCode: unit -> int member GetTypeCode: unit -> TypeCode member ToString: unit -> string + 3 overloads member TryFormat: destination: Span<char> * charsWritten: byref<int> * ?format: ReadOnlySpan<char> * ?provider: IFormatProvider -> bool static member Parse: s: ReadOnlySpan<char> * ?style: NumberStyles * ?provider: IFormatProvider -> int + 4 overloads static member TryParse: s: ReadOnlySpan<char> * style: NumberStyles * provider: IFormatProvider * result: byref<int> -> bool + 3 overloads static val MaxValue: int static val MinValue: int
<summary>Represents a 32-bit signed integer.</summary>
[<Struct>] type Double = member CompareTo: value: float -> int + 1 overload member Equals: obj: float -> bool + 1 overload member GetHashCode: unit -> int member GetTypeCode: unit -> TypeCode member ToString: unit -> string + 3 overloads member TryFormat: destination: Span<char> * charsWritten: byref<int> * ?format: ReadOnlySpan<char> * ?provider: IFormatProvider -> bool static member (<) : left: float * right: float -> bool static member (<=) : left: float * right: float -> bool static member (<>) : left: float * right: float -> bool static member (=) : left: float * right: float -> bool ...
<summary>Represents a double-precision floating-point number.</summary>
Multiple items
module List from Microsoft.FSharp.Collections

--------------------
type List<'T> = | op_Nil | op_ColonColon of Head: 'T * Tail: 'T list interface IReadOnlyList<'T> interface IReadOnlyCollection<'T> interface IEnumerable interface IEnumerable<'T> member GetReverseIndex: rank: int * offset: int -> int member GetSlice: startIndex: int option * endIndex: int option -> 'T list static member Cons: head: 'T * tail: 'T list -> 'T list member Head: 'T member IsEmpty: bool member Item: index: int -> 'T with get ...
val rev: list: 'T list -> 'T list
union case Expr.Val: string -> Expr
union case Expr.Int: int -> Expr
union case Expr.Float: float -> Expr
union case Expr.Decr: Expr -> Expr
union case Stmt.Assign: string * Expr -> Stmt
union case Stmt.While: Expr * Stmt -> Stmt
Multiple items
union case Stmt.Seq: Stmt list -> Stmt

--------------------
module Seq from Microsoft.FSharp.Collections
union case Stmt.IfThen: Expr * Stmt -> Stmt
union case Stmt.IfThenElse: Expr * Stmt * Stmt -> Stmt
union case Stmt.Print: Expr -> Stmt
namespace Microsoft.FSharp
type obj = System.Object
type unit = Unit

Type something to start searching.