Micheline

Micheline is a data format comparable to JSON, XML, S-expressions, and YAML. Its main purpose is to serve as the concrete syntax for the Michelson language.

Syntax

The structure of a Micheline node is very simple: a node can only be one of the five following constructs:

  1. An integer in decimal notation.

  2. A character string delimited by the double quotation character ".

  3. A byte sequence in hexadecimal notation prefixed by 0x.

  4. The application of a primitive to a whitespace-delimited list of nodes and annotations.

  5. A sequence of nodes delimited by curly braces ({ and }) and separated by semi-colons (;).

Primitives

Micheline primitives are character strings starting by an alphabetic or underscore (_) characters and containing only alphanumeric or underscore (_) characters. Additional restrictions on primitives can also be put to achieve the equivalent of JSON and XML schemas; for example each protocol specifies a fixed list of Michelson primitives. This restriction impacts the way Micheline nodes are serialized to binary format (see Binary serialization).

Annotations

Micheline annotations start by one of the special characters @, :, $, &, %, !, and ? followed by any number of alphanumeric, underscore (_), period (.), percent sign (%), and at sign (@) characters.

Character strings

The following usual escape sequences are allowed inside strings: \n, \\, \". Unescaped line-breaks (both \n and \r) cannot appear in Micheline strings. Since Micheline strings are delimited by double quotes, the double quote character cannot appear unescaped inside a string.

Primitive applications

A primitive application is a primitive followed by zero, one, or several arguments (either nodes or annotations):

prim arg1 arg2

When a primitive application with more than 0 arguments is the argument of another primitive application, it must be wrapped with parentheses. This wrapping is also allowed for applications without arguments.

prim (prim1 arg11 arg12) (prim2 arg21 arg22)

Sequences

Successive nodes can be grouped as a single sequence node using curly braces as delimiters and semicolon as separators.

{ expr1 ; expr2 ; expr3 ; expr4 }

A sequence can be passed as argument to a primitive.

prim arg1 arg2 { arg3_expr1 ; arg3_expr2 }

Primitive applications right inside a sequence cannot be wrapped.

{ (prim arg1 arg2) } # is not ok

Indentation

To remove ambiguities for human readers, the Micheline parser enforces some indentation rules.

  • For sequences:

    • All nodes in a sequence must be aligned on the same column.

    • An exception is made when consecutive nodes fit on the same line, as long as the first of them is correctly aligned.

    • All nodes in a sequence must be indented to the right of the opening curly brace by at least one column.

    • The closing curly brace cannot be on the left of the opening one.

  • For primitive applications:

    • All arguments in an application must be aligned on the same column.

    • An exception is made when consecutive arguments fit on the same line, as long as the first of them is correctly aligned.

    • All arguments in a sequence must be indented to the right of the primitive name by at least one column.

Comments

A hash sign (#) anywhere outside of a string literal will make the rest of the line (and itself) completely ignored, as in the following example.

{ PUSH nat 1 ; # pushes 1
  PUSH nat 2 ; # pushes 2
  ADD }        # computes 2 + 1

Comments that span on multiple lines or that stop before the end of the line can also be written, using C-like delimiters (/* ... */).

File encoding

Micheline files must be encoded in UTF-8. Non-ASCII characters can only appear in comments and strings.

BNF grammar

Formally, Micheline nodes have the following BNF grammar:

<natural> ::= [0-9]+
<char> ::= \" | \r | \n | \t | \b | \\ | [^"\]
<hexadecimal> ::= [0-9a-fA-F][0-9a-fA-F]
<primitive> ::= [a-zA-Z_][a-zA-Z_0-9]*
<annotation> ::= [@:$&%!?][_0-9a-zA-Z\.%@]*

<node> ::= <integer> | <string> | <bytes> | <primitive application> | <sequence>
<integer> ::= <natural> | - <natural>
<string> ::= " <char>* "
<bytes> ::= 0x<hexadecimal>*
<primitive application> ::= <primitive> <argument>*
<argument> ::= <integer> | <string> | <bytes> | <primitive> | ( <primitive application> ) | <sequence> | <annotation>
<sequence> ::= { nodes }
<nodes> ::= | <node> | <node> ; | <node> ; <nodes>

Serialization

Micheline nodes can be converted to JSON and binary formats. The JSON conversion is useful for example to interact with a Tezos node using RPCs. The binary encoding is used to store Micheline nodes in blocks.

Both forms of serialization are achieved using The data_encoding library.

Locations

Locations are used in error messages to indicate which part of a Micheline file is faulty. Micheline file locations are composed of two points; a starting point and a stopping point. Each point is a pair of a line number (counted from 1) and a column number (counted from 0).

Locations only make sense when we have access to the source file and they are lost when serializing Micheline nodes to either JSON or binary. During translation, locations are replaced by a canonical representation of locations. A canonical location is a single integer; the root of the node has a canonical location of zero, and each node adds one in the order of infix traversal.

A mapping between file locations and canonical locations is produced during the serialization so that errors reported by tools working on the serialized form can be printed in a meaningful way to users.

Conversion to JSON

The JSON conversion is very straightforward:

  • integers are encoded as JSON objects with a single "int" field whose value is a JSON string;

  • strings are encoded as JSON objects with a single "string" field whose value is either a JSON string or an array of bytes if the string to encode is not valid in JSON;

  • byte sequences are encoded as JSON objects with a single “bytes” field whose value is a JSON string containing the hexadecimal representation of the byte sequence without the 0x prefix.

  • sequences are encoded as JSON arrays;

  • primitive applications are encoded as JSON objects with three fields named "prim", "args", and "annots" containing respectively the primitive name, the JSON array of argument nodes, and the JSON array of annotations. Both "args" and "annots" are optional and default to the empty array.

Binary serialization

The binary serialization of Micheline is more complex because it contains some space optimisations for the very common cases of primitive applications with few arguments. Moreover, the binary encoding depends on the way primitives are encoded.

The documentation for the binary encoding specialized to Michelson primitives of the Alpha protocol can be generated by the command octez-codec describe alpha.script.expr binary schema.

Usage of the OCaml Micheline library

In the Octez codebase, Micheline nodes are handled by a small library named lib_micheline. This library is used in the Octez client (responsible for parsing the Michelson files, expanding macros, and encoding the result to JSON) and in the Tezos protocol (responsible for decoding from JSON, type checking, and serializing to binary Michelson scripts and values).

The library is composed of three modules: Micheline, Micheline_parser, and Micheline_printer. Only the Micheline module is needed for the protocol so its signature is the only one exported in the protocol environment.

The Micheline module defines the type of Micheline nodes. This type is parameterised by the type of locations 'l and the type of primitives 'p.

type annot = string list

type ('l, 'p) node =
  | Int of 'l * Z.t
  | String of 'l * string
  | Bytes of 'l * Bytes.t
  | Prim of 'l * 'p * ('l, 'p) node list * annot
  | Seq of 'l * ('l, 'p) node list

Since printing and parsing are provided by the two other modules of the library, the only way to produce or consume Micheline nodes from this module is through the encodings. To encode a Micheline node, we first canonicalize its locations using the extract_locations function that takes a ('l, 'p) node as argument and returns both a 'p canonical and an association list between canonical_locations and 'l locations. The type 'p canonical is an abstract version of (canonical_location, 'p) node that ensures the invariant on the numbering of canonical locations presented in the Locations section and canonical_location is an alias of int.

If we know an encoding for the 'p primitives, we can derive an encoding for 'p canonical using the canonical_encoding function and then produce JSON and binary outputs using the functions from The data_encoding library.

In the other direction, from a 'p canonical produced by an encoding, either we know how to map its canonical locations to file locations and we can use the inject_locations function to do so or we want to keep the locations canonical and use the root function to obtain a plain node.

The Micheline_printer module exports two pretty-printing functions based on the Format standard OCaml module: print_expr and print_expr_unwrapped. The difference is that print_expr wraps its output in parentheses if it is a primitive application with at least one argument. The input of these printing functions can be produced from a 'p canonical by the printable function.

The Micheline_parser module can be used to produce Micheline nodes from strings and files. It is done in two stages: tokenization (also known as lexing) and parsing. Tokenization is done by the tokenize function that uses the uutf OCaml library to decode UTF-8 characters. For the parsing phase, two functions parse_expression and parse_toplevel are exported; the former produces a single Micheline node while the second parses a list of expressions separated by semicolons. They both take an optional boolean parameter named check; setting it to false disables indentation checking.

Other tools and resources

The following links are not part of the Octez OCaml code base but are reimplementations of parts of lib_micheline in other tools and languages: