Runtime metrics and statsd

The KPHP server collects various runtime metrics and pushes them to a statsd service. This feature is configured with the following options: --statsd-host, --statsd-port, and --disable-statsd.

The master process collects and aggregates metrics and pushes them every second. By default, it tries to connect to port 8125 or 14880; once the KPHP server has connected to either port, it stops trying. If the connection is lost, the master process tries to reconnect.
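The port fallback described above can be sketched in Python. This is a hypothetical illustration, not KPHP's own code: it assumes a TCP connection (the text only says "connect"), and the helper names connect_first and tcp_connect are made up for this example. The connect function is injectable so the ordering logic can be checked without a real statsd server.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

# Default ports, tried in the order described above (assumption: TCP).
DEFAULT_STATSD_PORTS = (8125, 14880)


def tcp_connect(host: str, port: int, timeout: float = 1.0) -> Optional[socket.socket]:
    """Try to open a TCP connection; return the socket, or None on failure."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None


def connect_first(
    host: str,
    ports: Sequence[int] = DEFAULT_STATSD_PORTS,
    connect: Callable[[str, int], Optional[object]] = tcp_connect,
) -> Tuple[Optional[int], Optional[object]]:
    """Return (port, connection) for the first port that accepts, or (None, None)."""
    for port in ports:
        conn = connect(host, port)
        if conn is not None:
            return port, conn  # connected: stop trying further ports
    return None, None
```

To model reconnection, a caller would simply call connect_first again after the connection drops.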

Here is the list of all available metrics:

1. General stats

  • kphp_server.kphp_version — the version of the PHP code, specified by the --error-tag/-E option;
  • kphp_server.uptime — seconds elapsed since the KPHP server started;
  • kphp_server.cpu_utime — master + workers CPU time in user mode (see field 14 of /proc/[pid]/stat);
  • kphp_server.cpu_stime — master + workers CPU time in kernel mode (see field 15 of /proc/[pid]/stat);
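To cross-check cpu_utime and cpu_stime by hand, you can read fields 14 (utime) and 15 (stime) of /proc/[pid]/stat for the master and each worker. The sketch below is an assumption about how such a sum could be computed, not a description of how KPHP computes it; the function names are hypothetical, and the caller must supply the master and worker PIDs. The kernel reports these values in clock ticks, so they are divided by SC_CLK_TCK to get seconds.

```python
import os
from typing import Iterable, Tuple


def parse_cpu_times(stat_line: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from one /proc/[pid]/stat line.

    Field 2 (comm) is in parentheses and may contain spaces, so everything up
    to the last ')' is skipped. The remaining tokens start at field 3
    (state), so 1-based field N is at index N - 3.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    utime = int(rest[14 - 3])  # field 14: user-mode time
    stime = int(rest[15 - 3])  # field 15: kernel-mode time
    return utime, stime


def total_cpu_seconds(pids: Iterable[int]) -> Tuple[float, float]:
    """Sum user and kernel CPU time, in seconds, over the given processes (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    user = kernel = 0
    for pid in pids:
        with open(f"/proc/{pid}/stat") as f:
            u, s = parse_cpu_times(f.read())
        user += u
        kernel += s
    return user / ticks_per_second, kernel / ticks_per_second
```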

2. Worker stats

  • kphp_server.workers_total_started — total number of started workers;
  • kphp_server.workers_total_dead — total number of dead workers;
  • kphp_server.workers_total_strange_dead — total number of workers that died for abnormal reasons (SIGSEGV, etc.);
  • kphp_server.workers_total_hung — total number of hung workers;
  • kphp_server.workers_total_killed — total number of hung workers killed by the master process;
  • kphp_server.workers_total_terminated — total number of workers terminated by the master process;
  • kphp_server.workers_total_failed — total number of failures during worker termination;
  • kphp_server.workers_current_total — number of available workers;
  • kphp_server.workers_current_working — number of currently working workers;
  • kphp_server.workers_current_working_but_waiting — number of working workers that are currently waiting for a network response;
  • kphp_server.workers_current_ready_for_accept — number of workers ready to accept a new TCP connection;
  • kphp_server.workers_running_avg_1m — average number of working workers over the last minute;
  • kphp_server.workers_running_max_1m — maximum number of working workers over the last minute;
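Because the master process collects metrics every second, a one-minute average and maximum can be derived from the last 60 per-second samples. The class below is a hypothetical sketch of such a sliding window; KPHP's actual implementation may differ.

```python
from collections import deque


class OneMinuteWindow:
    """Keep the last `size` per-second samples and report their average and maximum."""

    def __init__(self, size: int = 60):
        self.samples = deque(maxlen=size)  # the oldest sample is dropped automatically

    def add(self, working_workers: int) -> None:
        self.samples.append(working_workers)

    def average(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def maximum(self) -> int:
        return max(self.samples, default=0)
```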

3. Request stats

  • kphp_server.requests_total_incoming_queries — total number of incoming queries;
  • kphp_server.requests_total_outgoing_queries — total number of outgoing (to databases) queries;
  • kphp_server.requests_script_time_total — total time (in seconds) spent executing PHP code;
  • kphp_server.requests_script_time_percentile_50 — PHP code time per request, 50th percentile;
  • kphp_server.requests_script_time_percentile_95 — PHP code time per request, 95th percentile;
  • kphp_server.requests_script_time_percentile_99 — PHP code time per request, 99th percentile;
  • kphp_server.requests_net_time_total — total time (in seconds) spent waiting for the network (databases);
  • kphp_server.requests_net_time_percentile_50 — network wait time per request, 50th percentile;
  • kphp_server.requests_net_time_percentile_95 — network wait time per request, 95th percentile;
  • kphp_server.requests_net_time_percentile_99 — network wait time per request, 99th percentile;
  • kphp_server.requests_working_time_percentile_50 — full request time, 50th percentile;
  • kphp_server.requests_working_time_percentile_95 — full request time, 95th percentile;
  • kphp_server.requests_working_time_percentile_99 — full request time, 99th percentile;
  • kphp_server.requests_incoming_queries_per_second — incoming queries per second (QPS);
  • kphp_server.requests_outgoing_queries_per_second — outgoing queries per second (to databases);
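The percentile metrics summarize the distribution of per-request times. As an illustration, here is the nearest-rank method for computing p50/p95/p99 from a list of samples; this method is an assumption chosen for clarity, and KPHP may use a different estimator.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]
```

For example, percentile([0.1, 0.2, 0.3, 5.0], 95) returns 5.0 while the 50th percentile is 0.2, which is why the 95th and 99th percentiles expose a few slow requests that the median hides.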

4. Terminated request stats

  • kphp_server.terminated_requests_timeout — total number of terminations due to server timeout;
  • kphp_server.terminated_requests_http_connection_close — total number of terminations due to closed HTTP connection;
  • kphp_server.terminated_requests_rpc_connection_close — total number of terminations due to closed RPC connection;
  • kphp_server.terminated_requests_memory_limit_exceeded — total number of terminations due to exceeding the memory limit;
  • kphp_server.terminated_requests_exception — total number of terminations due to uncaught exceptions;
  • kphp_server.terminated_requests_stack_overflow — total number of terminations due to stack overflow;
  • kphp_server.terminated_requests_php_assert — total number of terminations due to a failed assertion in the KPHP runtime or server code;
  • kphp_server.terminated_requests_net_event_error — total number of terminations due to network errors;
  • kphp_server.terminated_requests_post_data_loading_error — total number of terminations due to a failure to receive the POST body;
  • kphp_server.terminated_requests_unclassified — total number of terminations due to unclassified reasons;
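The terminated_requests_* metrics, like the other "total number" metrics, are cumulative, so spikes are easier to see as per-interval increases. A minimal sketch, assuming the totals are counted from server start and therefore drop back toward zero after a restart; the function name is hypothetical:

```python
from typing import List, Optional


def counter_deltas(samples: List[int]) -> List[Optional[int]]:
    """Turn cumulative counter samples into per-interval increases.

    A drop in the value is treated as a counter reset (for example, a server
    restart), so the new value is taken as the increase since the reset.
    The first sample has no predecessor, so its delta is None.
    """
    deltas: List[Optional[int]] = [None] if samples else []
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas
```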

5. Memory stats

  • kphp_server.memory_script_usage_max — maximum request memory usage (since start);
  • kphp_server.memory_script_usage_percentile_50 — request memory usage, 50th percentile;
  • kphp_server.memory_script_usage_percentile_95 — request memory usage, 95th percentile;
  • kphp_server.memory_script_usage_percentile_99 — request memory usage, 99th percentile;
  • kphp_server.memory_script_real_usage_max — maximum request allocator memory usage;
  • kphp_server.memory_script_real_usage_percentile_50 — request allocator memory usage, 50th percentile;
  • kphp_server.memory_script_real_usage_percentile_95 — request allocator memory usage, 95th percentile;
  • kphp_server.memory_script_real_usage_percentile_99 — request allocator memory usage, 99th percentile;
  • kphp_server.memory_vms_max — maximum VMS (virtual memory size) of a single worker;
  • kphp_server.memory_rss_max — maximum RSS (resident set size) of a single worker;
  • kphp_server.memory_shared_max — maximum shared memory usage;
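For a quick manual check of memory_rss_max or memory_vms_max, you can read VmRSS or VmSize from /proc/[pid]/status of each worker and take the maximum. This is a hypothetical sketch, not KPHP's code; the worker PIDs must be supplied by the caller, and the values are in kB as reported by the kernel.

```python
from typing import Iterable


def parse_status_kb(status_text: str, key: str) -> int:
    """Return a 'Key:   123 kB' field from /proc/[pid]/status text, in kB."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name == key:
            return int(value.split()[0])
    raise KeyError(key)


def max_worker_memory_kb(worker_pids: Iterable[int], key: str = "VmRSS") -> int:
    """Largest value of `key` (e.g. VmRSS or VmSize) across the given workers (Linux only)."""
    peak = 0
    for pid in worker_pids:
        with open(f"/proc/{pid}/status") as f:
            peak = max(peak, parse_status_kb(f.read(), key))
    return peak
```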

6. Instance cache memory

  • kphp_server.instance_cache_memory_limit — memory limit;
  • kphp_server.instance_cache_memory_used — memory usage;
  • kphp_server.instance_cache_memory_used_max — peak memory usage;
  • kphp_server.instance_cache_memory_real_used — allocator memory usage;
  • kphp_server.instance_cache_memory_real_used_max — allocator peak memory usage;
  • kphp_server.instance_cache_memory_defragmentation_calls — number of allocator defragmentation calls;
  • kphp_server.instance_cache_memory_huge_memory_pieces — number of huge memory pieces in the allocator;
  • kphp_server.instance_cache_memory_small_memory_pieces — number of small memory pieces in the allocator;
  • kphp_server.instance_cache_memory_buffer_swaps_ok — number of successful allocator buffer swaps;
  • kphp_server.instance_cache_memory_buffer_swaps_fail — number of unsuccessful allocator buffer swaps (because workers were still using the buffer);

7. Instance cache elements

  • kphp_server.instance_cache_elements_stored — total number of elements stored to shared memory;
  • kphp_server.instance_cache_elements_stored_with_delay — total number of elements stored with delay (due to allocator lock);
  • kphp_server.instance_cache_elements_storing_skipped_due_recent_update — total number of store operations skipped because another worker had stored the element recently;
  • kphp_server.instance_cache_elements_storing_delayed_due_mutex — total number of store operations delayed because the allocator was locked;
  • kphp_server.instance_cache_elements_fetched — total number of fetched elements;
  • kphp_server.instance_cache_elements_missed — total number of missed (not found) elements;
  • kphp_server.instance_cache_elements_missed_earlier — total number of missed in advance elements;
  • kphp_server.instance_cache_elements_expired — total number of expired elements;
  • kphp_server.instance_cache_elements_created — total number of created elements;
  • kphp_server.instance_cache_elements_destroyed — total number of destroyed elements;
  • kphp_server.instance_cache_elements_cached — total number of elements in cache;
  • kphp_server.instance_cache_elements_logically_expired_and_ignored — total number of logically expired elements that were ignored on fetch;
  • kphp_server.instance_cache_elements_logically_expired_but_fetched — total number of logically expired elements that were fetched anyway;
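A common derived metric is the instance cache hit ratio over an interval, computed from two snapshots of the cumulative fetched and missed counters. This sketch assumes that "fetched" counts successful lookups and "missed" counts lookups that found nothing; the function name is hypothetical.

```python
from typing import Mapping

FETCHED = "kphp_server.instance_cache_elements_fetched"
MISSED = "kphp_server.instance_cache_elements_missed"


def interval_hit_ratio(before: Mapping[str, int], after: Mapping[str, int]) -> float:
    """Hit ratio between two snapshots of the cumulative fetched/missed counters.

    Returns 0.0 when there were no lookups in the interval.
    """
    fetched = after[FETCHED] - before[FETCHED]
    missed = after[MISSED] - before[MISSED]
    lookups = fetched + missed
    return fetched / lookups if lookups else 0.0
```

A persistently low ratio can hint at keys that are never reused, which is one symptom of the non-constant keys mentioned at the end of this section.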

All these metrics are intended to be monitored with Grafana.

Plotting kphp_version and uptime shows you when the server was restarted.
Checking worker stats and memory stats helps you determine whether a server has spare resources or is close to its capacity limit.
Analyzing request stats shows you how long users wait for a response.
Observing terminated request stats helps you diagnose problems in PHP code.
If your code uses shared memory, the instance cache stats help you ensure that the PHP code uses it correctly, without needless reallocations or non-constant keys.
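Restarts can also be detected automatically by looking for points where uptime decreases. A minimal sketch, assuming time-ordered (timestamp, uptime) samples exported from your monitoring system; the function name is hypothetical:

```python
from typing import List, Sequence, Tuple


def restart_points(samples: Sequence[Tuple[float, float]]) -> List[float]:
    """Return the timestamps at which uptime went down, i.e. the server restarted.

    `samples` is a time-ordered sequence of (timestamp, uptime_seconds) pairs.
    """
    restarts = []
    for (_, prev_uptime), (ts, uptime) in zip(samples, samples[1:]):
        if uptime < prev_uptime:
            restarts.append(ts)
    return restarts
```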