Octez Metrics

The Octez node can produce metrics and serve them in the OpenMetrics format, an emerging standard for exposing metrics data that is used especially in cloud-based systems.

Supported Open Metrics

The Octez node supports the following metrics. Each metric is characterized by its name, its type as defined in the OpenMetrics specification, a user-friendly description, and a list of labels (which can be used to aggregate or query metrics).

For more information, see the OpenMetrics specification: https://openmetrics.io/
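To make the four attributes concrete, here is a minimal Python sketch that reads a small piece of exposition text and recovers the name, type, description and labels of each metric. The sample text is hypothetical: its values and the chain_id are made up for illustration and were not captured from a real node. The helper name parse_metrics is also an assumption of this sketch, and the parser handles only the simple line shapes shown here, not the full specification.

```python
import re

# Hypothetical exposition text; values and chain_id are illustrative only.
SAMPLE = """\
# HELP octez_p2p_connections_active Number of active connections
# TYPE octez_p2p_connections_active gauge
octez_p2p_connections_active 12
# HELP octez_validator_chain_head_level Level of the current node's head
# TYPE octez_validator_chain_head_level gauge
octez_validator_chain_head_level{chain_id="NetXdQprcVkpaWU"} 2500000
"""

# A sample line is: name, optional {labels}, whitespace, value.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {name: {"type": ..., "help": ..., "samples": [(labels, value)]}}."""
    metrics = {}

    def entry(name):
        return metrics.setdefault(name, {"type": None, "help": None, "samples": []})

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP "):
            name, _, help_text = line[len("# HELP "):].partition(" ")
            entry(name)["help"] = help_text
        elif line.startswith("# TYPE "):
            name, _, kind = line[len("# TYPE "):].partition(" ")
            entry(name)["type"] = kind
        elif line.startswith("#"):
            continue  # other comment lines are ignored in this sketch
        else:
            match = SAMPLE_LINE.match(line)
            if match:
                name, raw_labels, value = match.groups()
                labels = dict(LABEL.findall(raw_labels or ""))
                entry(name)["samples"].append((labels, float(value)))
    return metrics
```

Calling parse_metrics(SAMPLE) gives one entry per metric name. Note that this sketch stores every sample under its literal name, so the component series of a Summary would appear as separate entries rather than being grouped.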

Name

Type

Description

Labels

ocaml_gc_allocated_bytes

Counter

Total number of bytes allocated since the program was started.

ocaml_gc_compactions

Counter

Number of heap compactions since the program was started.

ocaml_gc_heap_words

Gauge

Total size of the major heap, in words.

ocaml_gc_major_collections

Counter

Number of major collection cycles completed since the program was started.

ocaml_gc_major_words

Counter

Number of words allocated in the major heap since the program was started.

ocaml_gc_minor_collections

Counter

Number of minor collection cycles completed since the program was started.

ocaml_gc_top_heap_words

Counter

Maximum size reached by the major heap, in words.

octez_distributed_db_requester_table_length

Gauge

Number of entries (to grab from the network) present in the requester table

requester_kind;entry_type

octez_mempool_pending_applied

Gauge

Mempool pending applied operations count

octez_mempool_pending_branch_delayed

Gauge

Mempool pending branch delayed operations count

octez_mempool_pending_branch_refused

Gauge

Mempool pending branch refused operations count

octez_mempool_pending_outdated

Gauge

Mempool pending outdated operations count

octez_mempool_pending_prechecked

Gauge

Mempool pending prechecked operations count

octez_mempool_pending_refused

Gauge

Mempool pending refused operations count

octez_mempool_pending_unprocessed

Gauge

Mempool pending unprocessed operations count

octez_mempool_worker_completion_count

Counter

Number of requests completed by the block validator worker

octez_mempool_worker_error_count

Counter

Number of errors encountered by the block validator worker

octez_mempool_worker_request_count

Counter

Number of requests received by the block validator worker

octez_p2p_connections_active

Gauge

Number of active connections

octez_p2p_connections_incoming

Gauge

Number of incoming connections

octez_p2p_connections_outgoing

Gauge

Number of outgoing connections

octez_p2p_connections_private

Gauge

Number of private connections

octez_p2p_peers_accepted

Gauge

Number of accepted connections

octez_p2p_peers_disconnected

Gauge

Number of disconnected peers

octez_p2p_peers_running

Gauge

Number of running peers

octez_p2p_points_accepted

Gauge

Number of accepted points

octez_p2p_points_disconnected

Gauge

Number of disconnected points

octez_p2p_points_greylisted

Gauge

Number of greylisted points

octez_p2p_points_running

Gauge

Number of running points

octez_p2p_points_trusted

Gauge

Number of trusted points

octez_rpc_calls

Summary

RPC endpoint call counts and sum of execution times.

endpoint;method

octez_store_alternate_heads_count

Gauge

Current number of alternate heads known

octez_store_caboose_level

Gauge

Current caboose level

octez_store_checkpoint_level

Gauge

Current checkpoint level

octez_store_invalid_blocks

Gauge

Number of blocks known to be invalid stored on disk

octez_store_last_merge_time

Gauge

Time, in seconds, for the completion of the last store merge

octez_store_last_written_block_size

Gauge

Size, in bytes, of the last block written in store

octez_store_savepoint_level

Gauge

Current savepoint level

octez_validator_block_already_commited_blocks_count

Counter

Number of requests to validate a block already handled

octez_validator_block_last_finished_request_completion_timestamp

Gauge

Timestamp at which the latest request handled by the worker was completed

octez_validator_block_last_finished_request_push_timestamp

Gauge

Reception timestamp of the latest request handled by the worker

octez_validator_block_last_finished_request_treatment_timestamp

Gauge

Timestamp at which the worker started processing of the latest request it handled

octez_validator_block_operations_per_pass

Gauge

Number of operations per pass for the last validated block

pass_id

octez_validator_block_outdated_blocks_count

Counter

Number of requests to validate a block older than the node’s checkpoint

octez_validator_block_preapplication_errors_count

Counter

Number of refused application simulations of blocks

octez_validator_block_preapplied_blocks_count

Counter

Number of successful application simulations of blocks

octez_validator_block_precheck_failed_count

Counter

Number of block validation requests where the prechecking of a block failed

octez_validator_block_validated_blocks_count

Counter

Number of requests to validate a valid block

octez_validator_block_validation_errors_after_precheck_count

Counter

Number of requests to validate an invalid but precheckable block

octez_validator_block_validation_errors_count

Counter

Number of requests to validate an invalid block

octez_validator_block_worker_completion_count

Counter

Number of requests completed by the block validator worker

octez_validator_block_worker_error_count

Counter

Number of errors encountered by the block validator worker

octez_validator_block_worker_request_count

Counter

Number of requests received by the block validator worker

octez_validator_chain_branch_switch_count

Counter

Number of times the chain_validator switched branch

chain_id

octez_validator_chain_head_consumed_gas

Gauge

Gas consumed in the current node’s head

chain_id

octez_validator_chain_head_cycle

Gauge

Cycle of the current node’s head

chain_id

octez_validator_chain_head_increment_count

Counter

Number of times the chain_validator incremented its head for a direct successor

chain_id

octez_validator_chain_head_level

Gauge

Level of the current node’s head

chain_id

octez_validator_chain_head_round

Gauge

Round of the current node’s head

chain_id

octez_validator_chain_ignored_head_count

Counter

Number of requests where the chain validator ignored a new valid block with a lower fitness than its current head

chain_id

octez_validator_chain_is_bootstrapped

Gauge

Returns 1 if the node has bootstrapped, 0 otherwise.

chain_id

octez_validator_chain_last_finished_request_completion_timestamp

Gauge

Timestamp at which the latest request handled by the worker was completed

chain_id

octez_validator_chain_last_finished_request_push_timestamp

Gauge

Reception timestamp of the latest request handled by the worker

chain_id

octez_validator_chain_last_finished_request_treatment_timestamp

Gauge

Timestamp at which the worker started processing of the latest request it handled

chain_id

octez_validator_chain_synchronisation_status

Gauge

Returns 0 if the node is unsynchronised, 1 if the node is synchronised, 2 if the node is stuck.

chain_id

octez_validator_chain_worker_completion_count

Counter

Number of requests completed by the block validator worker

chain_id

octez_validator_chain_worker_error_count

Counter

Number of errors encountered by the block validator worker

chain_id

octez_validator_chain_worker_request_count

Counter

Number of requests received by the block validator worker

chain_id

octez_validator_peer_connections

Counter

Number of times we connected to a peer.

octez_validator_peer_invalid_block

Counter

Number of times we received an invalid block from a peer.

octez_validator_peer_invalid_locator

Counter

Number of times we received an invalid locator from a peer.

octez_validator_peer_new_branch_completed

Counter

Number of times we successfully completed a new branch request from a peer.

octez_validator_peer_new_head_completed

Counter

Number of times we successfully completed a new head request from a peer.

octez_validator_peer_on_no_request_count

Counter

Number of times we did not hear new messages from a peer since the last timeout.

octez_validator_peer_operations_fetching_canceled_new_branch

Counter

Number of times we canceled the fetching of operations on a new branch request for a peer.

octez_validator_peer_operations_fetching_canceled_new_known_valid_head

Counter

Number of times we canceled the fetching of operations on a new head request for a peer.

octez_validator_peer_operations_fetching_canceled_new_unknown_head

Counter

Number of times we canceled the fetching of operations on a new head request or an unknown head for a peer.

octez_validator_peer_system_error

Counter

Number of times a request triggered a system error from a peer.

octez_validator_peer_too_short_locator

Counter

Number of times we received a too-short locator from a peer.

octez_validator_peer_unavailable_protocol

Counter

Number of times we received an unknown protocol from a peer.

octez_validator_peer_unknown_ancestor

Counter

Number of times we received a locator with an unknown ancestor from a peer.

octez_validator_peer_unknown_error

Counter

Number of times an unknown error happened for a peer.

octez_version

Gauge

Node version

version;chain_name;distributed_db_version;p2p_version;commit_hash;commit_date

process_cpu_seconds_total

Counter

Total user and system CPU time spent in seconds.

process_start_time_seconds

Counter

Start time of the process since unix epoch in seconds.

prometheus_logs_messages_total

Counter

Total number of messages logged

level;src

Usage

To instruct the Octez node to produce metrics, pass the option --metrics-addr=<ADDR>:<PORT>. The port given on the command line is the port on which the integrated OpenMetrics server listens (9932 by default). The address defaults to localhost. When the option is not supplied at all, no metrics are produced. Ex.:

tezos-node run --metrics-addr=:9091

To query the OpenMetrics server, send an HTTP request to the /metrics path on the address and port given to --metrics-addr.

Ex.:

curl http://<node_addr>:9091/metrics
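The same request can be made from a script. The following is a minimal Python sketch using only the standard library, assuming the node was started with --metrics-addr=:9091 as in the example above. The function names metrics_url, fetch_metrics and count_samples are hypothetical helpers introduced for illustration, not part of Octez.

```python
from urllib.request import urlopen


def metrics_url(host="localhost", port=9091):
    """Build the URL of the /metrics endpoint (9091 matches the example above)."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(url, timeout=5.0):
    """Return the raw exposition text served at the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def count_samples(text):
    """Count sample lines, i.e. lines that are neither empty nor comments."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )
```

Against a running node, count_samples(fetch_metrics(metrics_url())) would report how many individual samples the node currently exposes. The call raises an exception from urllib if the node is not reachable or metrics are not enabled.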

Collecting metrics

Different third-party tools can be used to query the Octez node and collect metrics from it. Let us illustrate this with the example of a Prometheus server.

Update the Prometheus configuration file (typically, prometheus.yml) to add a “scrape job”, which is how Prometheus is made aware of a new data source, using appropriate values:

  • job_name: Use a name that is unique among the scrape jobs. All metrics collected through this job will automatically have a ‘job’ label with this value added to them

  • targets: The address (host and port) of the Octez node's metrics server.

- job_name: 'tezos-metrics'
  scheme: http
  static_configs:
    - targets: ['localhost:9091']
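Once Prometheus scrapes the node, the collected values can be read back through the Prometheus HTTP API. The sketch below is a minimal Python example of an instant query. It assumes a Prometheus server listening on its default address http://localhost:9090, and the function names are hypothetical helpers introduced here for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(expr, prometheus="http://localhost:9090"):
    """Build the URL of an instant query (the server address is an assumption)."""
    return f"{prometheus}/api/v1/query?{urlencode({'query': expr})}"


def extract_values(payload):
    """Turn an instant-vector response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each result carries its label set and a [timestamp, "value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def query(expr, prometheus="http://localhost:9090"):
    """Run an instant query against a running Prometheus server."""
    with urlopen(instant_query_url(expr, prometheus), timeout=5.0) as response:
        return extract_values(json.load(response))
```

For example, query('octez_validator_chain_head_level{job="tezos-metrics"}') would return the head level of each chain scraped through the job configured above, provided Prometheus is running and has already collected at least one sample.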

Monitoring the node with metrics

Once the node is correctly set up to export metrics and those are collected by a Prometheus server, you can graphically monitor your node with a Grafana dashboard.

Dashboards suited for Octez can be built with the Grafazos tool. Grafazos provides several ready-to-use dashboards for Octez on the Grafazos packages page, as plain JSON files. Their sources are also available as jsonnet files, which can be adjusted to build customized dashboards if needed:

  • octez-basic: A basic dashboard with all the node metrics

  • octez-full: A full dashboard with the logs and hardware data. This dashboard should be used with Netdata (for supporting hardware data) and Promtail (for exporting the logs).