
Service monitoring

The Orchestrator supports metric collection through Zabbix and Prometheus for its services.

The monitoring subsystem is configured in the Monitoring section of the configuration file:

"Monitoring": {
  "Provider": "Zabbix", // Allowed values: "", "Zabbix", "Prometheus" ("" means disabled)
  "Port": 10052,
  "IdleDurationMillis": 2000,
  "DotNetMonitoringEnabled": true,
  "NpgsqlMonitoringEnabled": true,
  "KestrelMonitoringEnabled": true
}

Monitoring configuration examples for the WebApi, ArcSight, and States services are given at the end of this page.

  1. Any available port above 1024 can be used.
  2. The DotNetMonitoringEnabled parameter enables .NET monitoring.
  3. The KestrelMonitoringEnabled parameter enables monitoring of the Kestrel server.
  4. The NpgsqlMonitoringEnabled parameter enables Npgsql monitoring.
  5. The IdleDurationMillis parameter limits how often metrics are refreshed. It is a period in milliseconds (2 seconds by default) during which some metrics are not updated and their previous values are returned. Only some services use it.
Note: The monitoring subsystem does not initiate data transfer. It waits for an external data collection service (the provider) to connect and responds to its requests.
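The constraints listed above (the allowed Provider values and a port above 1024) can be checked before a service is started. The following is a minimal sketch, not part of the Orchestrator: the function name validate_monitoring is hypothetical, and it assumes the Monitoring section has already been loaded into a Python dict (the configuration file contains // comments, which the standard json module does not accept).

```python
ALLOWED_PROVIDERS = {"", "Zabbix", "Prometheus"}  # "" means monitoring is disabled


def validate_monitoring(section: dict) -> list[str]:
    """Return a list of problems found in a Monitoring section (empty if none)."""
    problems = []
    provider = section.get("Provider", "")
    if provider not in ALLOWED_PROVIDERS:
        problems.append(f"unknown Provider: {provider!r}")
    if provider:  # the port only matters when monitoring is enabled
        port = section.get("Port")
        is_int = isinstance(port, int) and not isinstance(port, bool)
        if not is_int or not (1024 < port <= 65535):
            problems.append(f"Port must be an integer in 1025..65535, got {port!r}")
    return problems


if __name__ == "__main__":
    print(validate_monitoring({"Provider": "Zabbix", "Port": 10052}))
    print(validate_monitoring({"Provider": "Prometheus", "Port": 80}))
```

The sketch only covers Provider and Port, because these are the only parameters whose valid range is stated above.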

Setting up providers

  1. Zabbix installation and configuration guide: https://wiki.astralinux.ru/pages/viewpage.action?pageId=38699775

  2. Prometheus installation and configuration guides: https://astra.ru/upload/parser_jira/certs/prot_SE17_RDY-6487.pdf https://www.dmosk.ru/instruktions.php?object=prometheus-linux

Connecting to a provider

When the Zabbix provider is enabled, the service exposes the following endpoint:

tcp://<service host IP>:<monitoring port>

Specify this endpoint when configuring the Zabbix agent on the Zabbix server.

When the Prometheus provider is enabled, the service exposes the following endpoint:

http://<service host IP>:<monitoring port>/metrics

Specify this URL when configuring the Prometheus server. For a quick check, you can also open this endpoint in a browser.
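As an alternative to the browser check, the endpoint addresses described above can be built and queried from a script. This is a minimal sketch using only the Python standard library. The helper names monitoring_endpoint and fetch_metrics are hypothetical, and the host and port in the example are placeholders rather than values from a real installation.

```python
from urllib.request import urlopen


def monitoring_endpoint(provider: str, host: str, port: int) -> str:
    """Build the endpoint address a service exposes for the given provider."""
    if provider == "Zabbix":
        return f"tcp://{host}:{port}"
    if provider == "Prometheus":
        return f"http://{host}:{port}/metrics"
    raise ValueError(f"monitoring is disabled or provider is unknown: {provider!r}")


def fetch_metrics(host: str, port: int, timeout: float = 5.0) -> str:
    """Download the metrics text from a service with the Prometheus provider."""
    url = monitoring_endpoint("Prometheus", host, port)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder address: replace with the service host IP and monitoring port.
    print(monitoring_endpoint("Prometheus", "192.0.2.10", 10052))
```

Calling fetch_metrics only works against a running service with "Provider": "Prometheus". With the Zabbix provider the port is a plain TCP endpoint, not an HTTP one.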

Common metrics

Collection of .NET, Kestrel, and Npgsql metrics can be enabled for every service. For some services (for example, RDP2 or MachineInfo), enabling Npgsql metrics has no effect, because these services do not connect to a PostgreSQL database themselves.

.NET metrics

# HELP dotnet_collection_count_total_0 GC collection count for gen0
# TYPE dotnet_collection_count_total_0 counter
dotnet_collection_count_total_0
# HELP dotnet_collection_count_total_1 GC collection count for gen1
# TYPE dotnet_collection_count_total_1 counter
dotnet_collection_count_total_1
# HELP dotnet_collection_count_total_2 GC collection count for gen2
# TYPE dotnet_collection_count_total_2 counter
dotnet_collection_count_total_2
# HELP dotnet_total_memory_bytes Total known allocated memory
# TYPE dotnet_total_memory_bytes gauge
dotnet_total_memory_bytes
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes
# HELP process_working_set_bytes Process working set
# TYPE process_working_set_bytes gauge
process_working_set_bytes
# HELP process_private_memory_bytes Process private memory size
# TYPE process_private_memory_bytes gauge
process_private_memory_bytes
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total
# HELP process_open_handles Number of open handles
# TYPE process_open_handles gauge
process_open_handles
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds
# HELP process_num_threads Total number of threads
# TYPE process_num_threads gauge
process_num_threads

Kestrel server metrics

# HELP kestrel_active_connections Number of connections that are currently active on the server.
# TYPE kestrel_active_connections gauge
kestrel_active_connections 1
# HELP kestrel_connection_duration The duration of connections on the server.
# TYPE kestrel_connection_duration gauge
kestrel_connection_duration 0
# HELP kestrel_rejected_connections Number of connections rejected by the server. Connections are rejected when the currently active count exceeds the value configured with MaxConcurrentConnections.
# TYPE kestrel_rejected_connections counter
kestrel_rejected_connections 0
# HELP kestrel_queued_connections Number of connections that are currently queued and are waiting to start.
# TYPE kestrel_queued_connections gauge
kestrel_queued_connections -1
# HELP kestrel_queued_requests Number of HTTP requests on multiplexed connections (HTTP/2 and HTTP/3) that are currently queued and are waiting to start.
# TYPE kestrel_queued_requests gauge
kestrel_queued_requests 0
# HELP kestrel_upgraded_connections Number of HTTP connections that are currently upgraded (WebSockets). The number only tracks HTTP/1.1 connections.
# TYPE kestrel_upgraded_connections gauge
kestrel_upgraded_connections 0
# HELP kestrel_tls_handshake_duration The duration of TLS handshakes on the server.
# TYPE kestrel_tls_handshake_duration gauge
kestrel_tls_handshake_duration 0
# HELP kestrel_active_tls_handshakes Number of TLS handshakes that are currently in progress on the server.
# TYPE kestrel_active_tls_handshakes gauge
kestrel_active_tls_handshakes 0
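Output in this text format (comment lines starting with #, followed by sample lines of the form "name value") is straightforward to read from a script. The sketch below is a simplified reader written for illustration and is not a complete Prometheus parser: it assumes the value is the last space-separated token on the line, so it does not handle the optional timestamps that the Prometheus format allows. The function name parse_metrics is hypothetical.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample name (including any {label="..."} part) to its value."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, raw_value = line.rpartition(" ")
        if not name:
            continue  # a bare metric name without a value
        values[name] = float(raw_value)
    return values


if __name__ == "__main__":
    sample = (
        "# HELP kestrel_active_connections Number of connections that are currently active on the server.\n"
        "# TYPE kestrel_active_connections gauge\n"
        "kestrel_active_connections 1\n"
        "# TYPE kestrel_queued_connections gauge\n"
        "kestrel_queued_connections -1\n"
    )
    print(parse_metrics(sample))
```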

Npgsql metrics

Metrics for client connections to a PostgreSQL database are parameterized. The parameter is the database name, for example ltoolslogs or ltools. The naming format of parameterized metrics (item keys in Zabbix terms) differs between Prometheus and Zabbix:

  • the Prometheus metric db_client_connections_usage{DbName="ltoolslogs"} appears in Zabbix as db_client_connections_usage[ltoolslogs];
  • the Prometheus metric db_client_commands_bytes_read{DbName="ltools"} appears in Zabbix as db_client_commands_bytes_read[ltools].

# HELP db_client_commands_executing The number of currently executing database commands.
# TYPE db_client_commands_executing gauge
db_client_commands_executing{DbName="<DbName>"} -1
# HELP db_client_commands_failed The number of database commands which have failed.
# TYPE db_client_commands_failed gauge
db_client_commands_failed{DbName="<DbName>"} 0
# HELP db_client_commands_duration The duration of database commands, in seconds.
# TYPE db_client_commands_duration gauge
db_client_commands_duration{DbName="<DbName>"} 0
# HELP db_client_commands_bytes_written The number of bytes written.
# TYPE db_client_commands_bytes_written counter
db_client_commands_bytes_written{DbName="<DbName>"} 3631
# HELP db_client_commands_bytes_read The number of bytes read.
# TYPE db_client_commands_bytes_read counter
db_client_commands_bytes_read{DbName="<DbName>"} 8192
# HELP db_client_connections_pending_requests The number of pending requests for an open connection, cumulative for the entire pool.
# TYPE db_client_connections_pending_requests gauge
db_client_connections_pending_requests{DbName="<DbName>"} 0
# HELP db_client_connections_timeouts The number of connection timeouts that have occurred trying to obtain a connection from the pool.
# TYPE db_client_connections_timeouts counter
db_client_connections_timeouts{DbName="<DbName>"} 0
# HELP db_client_connections_create_time The time it took to create a new connection.
# TYPE db_client_connections_create_time gauge
db_client_connections_create_time{DbName="<DbName>"} 0
# HELP db_client_connections_usage The number of connections that are currently in state described by the state attribute.
# TYPE db_client_connections_usage gauge
db_client_connections_usage{DbName="<DbName>"} 0
# HELP db_client_connections_max The maximum number of open connections allowed.
# TYPE db_client_connections_max gauge
db_client_connections_max{DbName="<DbName>"} 20
# HELP db_client_commands_prepared_ratio The ratio of prepared command executions.
# TYPE db_client_commands_prepared_ratio gauge
db_client_commands_prepared_ratio{DbName="<DbName>"} 0
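The naming difference between the two providers for DbName-parameterized metrics can be expressed as a small conversion, which may help when building Zabbix items from a Prometheus metrics listing. This is a sketch based only on the two examples given in this section. The function name to_zabbix_key is hypothetical, and returning non-parameterized names unchanged is an assumption.

```python
import re

# Matches a Prometheus sample name with a single DbName label,
# e.g. db_client_connections_usage{DbName="ltoolslogs"}
_DB_METRIC = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{DbName="(?P<db>[^"]*)"\}$')


def to_zabbix_key(prometheus_name: str) -> str:
    """Convert name{DbName="db"} to the Zabbix item key name[db]."""
    match = _DB_METRIC.match(prometheus_name)
    if match is None:
        return prometheus_name  # assumption: other names are used as-is
    return f"{match['name']}[{match['db']}]"


if __name__ == "__main__":
    print(to_zabbix_key('db_client_connections_usage{DbName="ltoolslogs"}'))
    print(to_zabbix_key('db_client_commands_bytes_read{DbName="ltools"}'))
```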

MachineInfo metrics

# HELP primo_mi_requests Number of requests
# TYPE primo_mi_requests counter
primo_mi_requests

Agent metrics

# HELP primo_ag_running_robots Number of running robots
# TYPE primo_ag_running_robots gauge
primo_ag_running_robots
# HELP primo_ag_successful_starts Number of successful starts
# TYPE primo_ag_successful_starts counter
primo_ag_successful_starts
# HELP primo_ag_failed_starts Number of failed starts
# TYPE primo_ag_failed_starts counter
primo_ag_failed_starts

RDP2 metrics

# HELP primo_rdp2_active_sessions Number of active RDP sessions
# TYPE primo_rdp2_active_sessions gauge
primo_rdp2_active_sessions 4
# HELP primo_rdp2_successful_connections Number of successful connections
# TYPE primo_rdp2_successful_connections counter
primo_rdp2_successful_connections 4
# HELP primo_rdp2_failed_connections Number of failed connections
# TYPE primo_rdp2_failed_connections counter
primo_rdp2_failed_connections 0
# HELP primo_rdp2_disconnections Number of disconnections
# TYPE primo_rdp2_disconnections counter
primo_rdp2_disconnections 0
# HELP primo_rdp2_streams Number of active streams
# TYPE primo_rdp2_streams gauge
primo_rdp2_streams 0
# HELP primo_rdp2_viewers Number of active viewers
# TYPE primo_rdp2_viewers gauge
primo_rdp2_viewers 0
# HELP primo_rdp2_managers Number of active managers
# TYPE primo_rdp2_managers gauge
primo_rdp2_managers 0

RobotLogs metrics

# HELP primo_rl_messages Number of robot messages
# TYPE primo_rl_messages counter
primo_rl_messages 0
# HELP primo_rl_orch_messages Number of orchestrator messages
# TYPE primo_rl_orch_messages counter
primo_rl_orch_messages 10012
# HELP primo_rl_custom_messages Number of robot custom messages
# TYPE primo_rl_custom_messages counter
primo_rl_custom_messages 0
# HELP primo_rl_attended_messages Number of attended robot messages
# TYPE primo_rl_attended_messages counter
primo_rl_attended_messages 0
# HELP primo_rl_attended_custom_messages Number of attended robot custom messages
# TYPE primo_rl_attended_custom_messages counter
primo_rl_attended_custom_messages 0
# HELP primo_rl_screen_file_requests Number of screen file requests
# TYPE primo_rl_screen_file_requests counter
primo_rl_screen_file_requests 0
# HELP primo_rl_screen_file_thumb_requests Number of screen file thumb requests
# TYPE primo_rl_screen_file_thumb_requests counter
primo_rl_screen_file_thumb_requests 0
# HELP primo_rl_queue_lost_messages Number of queue lost messages
# TYPE primo_rl_queue_lost_messages counter
primo_rl_queue_lost_messages 0
# HELP primo_rl_queue_length The queue length
# TYPE primo_rl_queue_length gauge
primo_rl_queue_length 0

When Npgsql monitoring is enabled, metrics for the ltoolslogs database are available.

Notifications metrics

# HELP primo_ntf_successfully_sent_emails The number of successfully sent emails.
# TYPE primo_ntf_successfully_sent_emails counter
primo_ntf_successfully_sent_emails 0
# HELP primo_ntf_unsuccessfully_sent_emails The number of unsuccessfully sent emails.
# TYPE primo_ntf_unsuccessfully_sent_emails counter
primo_ntf_unsuccessfully_sent_emails 0
# HELP primo_ntf_successfully_rendered_html_templates The number of successfully rendered HTML templates.
# TYPE primo_ntf_successfully_rendered_html_templates counter
primo_ntf_successfully_rendered_html_templates 0
# HELP primo_ntf_unsuccessfully_rendered_html_templates The number of unsuccessfully rendered HTML templates.
# TYPE primo_ntf_unsuccessfully_rendered_html_templates counter
primo_ntf_unsuccessfully_rendered_html_templates 0
# HELP primo_ntf_successfully_rendered_xlsx_attachments The number of successfully rendered Xlsx attachments.
# TYPE primo_ntf_successfully_rendered_xlsx_attachments counter
primo_ntf_successfully_rendered_xlsx_attachments 0
# HELP primo_ntf_unsuccessfully_rendered_xlsx_attachments The number of unsuccessfully rendered Xlsx attachments.
# TYPE primo_ntf_unsuccessfully_rendered_xlsx_attachments counter
primo_ntf_unsuccessfully_rendered_xlsx_attachments 0

LogEventsWebhook metrics

# HELP primo_lew_login_successes The number of login successes
# TYPE primo_lew_login_successes counter
primo_lew_login_successes 0
# HELP primo_lew_login_failures The number of login failures
# TYPE primo_lew_login_failures counter
primo_lew_login_failures 0
# HELP primo_lew_event_successes The number of event processing successes
# TYPE primo_lew_event_successes counter
primo_lew_event_successes 0
# HELP primo_lew_event_failures The number of event processing failures
# TYPE primo_lew_event_failures counter
primo_lew_event_failures 0

Analytic metrics

# HELP primo_analytic_received_orch_events The number of received orch events.
# TYPE primo_analytic_received_orch_events counter
primo_analytic_received_orch_events 0
# HELP primo_analytic_processed_orch_events The number of processed orch events.
# TYPE primo_analytic_processed_orch_events counter
primo_analytic_processed_orch_events 0
# HELP primo_analytic_failed_orch_events The number of failed orch events.
# TYPE primo_analytic_failed_orch_events counter
primo_analytic_failed_orch_events 0
# HELP primo_analytic_successful_refreshes The number of successful refreshes.
# TYPE primo_analytic_successful_refreshes counter
primo_analytic_successful_refreshes{TableName="mv_RobotsUsage"} 0
primo_analytic_successful_refreshes{TableName="mv_WorkersUsage"} 0
# HELP primo_analytic_failed_refreshes The number of failed refreshes.
# TYPE primo_analytic_failed_refreshes counter
primo_analytic_failed_refreshes{TableName="mv_RobotsUsage"} 0
primo_analytic_failed_refreshes{TableName="mv_WorkersUsage"} 0

When Npgsql monitoring is enabled, metrics for the ltoolsanalytic database are available.

ArcSight metrics

# HELP primo_arcsight_written_orch_events The number of written orch events.
# TYPE primo_arcsight_written_orch_events counter
primo_arcsight_written_orch_events 0
# HELP primo_arcsight_converted_orch_events The number of converted orch events.
# TYPE primo_arcsight_converted_orch_events counter
primo_arcsight_converted_orch_events 0

States metrics

# HELP primo_states_processed_events The number of processed events.
# TYPE primo_states_processed_events counter
primo_states_processed_events 0
# HELP primo_states_failed_events The number of failed events.
# TYPE primo_states_failed_events counter
primo_states_failed_events 0

WebApi metrics

# HELP primo_orch_robots Number of robots
# TYPE primo_orch_robots gauge
primo_orch_robots 6
# HELP primo_orch_deployed_robots Number of deployed robots
# TYPE primo_orch_deployed_robots gauge
primo_orch_deployed_robots 6
# HELP primo_orch_running_robots Number of running robots
# TYPE primo_orch_running_robots gauge
primo_orch_running_robots 0
# HELP primo_orch_workers Number of workers
# TYPE primo_orch_workers gauge
primo_orch_workers 7
# HELP primo_orch_workers_with_robots Number of workers with robots
# TYPE primo_orch_workers_with_robots gauge
primo_orch_workers_with_robots 4
# HELP primo_orch_projects Number of projects
# TYPE primo_orch_projects gauge
primo_orch_projects 3
# HELP primo_orch_running_projects Number of running projects
# TYPE primo_orch_running_projects gauge
primo_orch_running_projects 0
# HELP primo_orch_versioned_projects Number of versioned projects
# TYPE primo_orch_versioned_projects gauge
primo_orch_versioned_projects 0
# HELP primo_orch_assets Number of assets
# TYPE primo_orch_assets gauge
primo_orch_assets 0
# HELP primo_orch_templates Number of templates
# TYPE primo_orch_templates gauge
primo_orch_templates 0
# HELP primo_orch_assignments Number of assignments
# TYPE primo_orch_assignments gauge
primo_orch_assignments 0
# HELP primo_orch_complete_assignments Number of complete assignments
# TYPE primo_orch_complete_assignments gauge
primo_orch_complete_assignments 0
# HELP primo_orch_new_assignments Number of new assignments
# TYPE primo_orch_new_assignments gauge
primo_orch_new_assignments 0
# HELP primo_orch_paused_assignments Number of paused assignments
# TYPE primo_orch_paused_assignments gauge
primo_orch_paused_assignments 0
# HELP primo_orch_running_assignments Number of running assignments
# TYPE primo_orch_running_assignments gauge
primo_orch_running_assignments 0
# HELP primo_orch_free_studio_licenses Number of studio licenses
# TYPE primo_orch_free_studio_licenses gauge
primo_orch_free_studio_licenses 100
# HELP primo_orch_busy_studio_licenses Number of busy studio licenses
# TYPE primo_orch_busy_studio_licenses gauge
primo_orch_busy_studio_licenses 0
# HELP primo_orch_robot_enterprise_licenses Number of robot enterprise licenses
# TYPE primo_orch_robot_enterprise_licenses gauge
primo_orch_robot_enterprise_licenses 100
# HELP primo_orch_robot_standard_licenses Number of robot standard licenses
# TYPE primo_orch_robot_standard_licenses gauge
primo_orch_robot_standard_licenses 0
# HELP primo_orch_robot_desktop_licenses Number of robot desktop licenses
# TYPE primo_orch_robot_desktop_licenses gauge
primo_orch_robot_desktop_licenses 100
# HELP primo_orch_robot_busy_enterprise_licenses Number of robot busy enterprise licenses
# TYPE primo_orch_robot_busy_enterprise_licenses gauge
primo_orch_robot_busy_enterprise_licenses 0
# HELP primo_orch_robot_busy_standard_licenses Number of robot busy standard licenses
# TYPE primo_orch_robot_busy_standard_licenses gauge
primo_orch_robot_busy_standard_licenses 0
# HELP primo_orch_robot_busy_desktop_licenses Number of robot busy desktop licenses
# TYPE primo_orch_robot_busy_desktop_licenses gauge
primo_orch_robot_busy_desktop_licenses 0
# HELP primo_orch_attended_robot_busy_enterprise_licenses Number of attended robot busy enterprise licenses
# TYPE primo_orch_attended_robot_busy_enterprise_licenses gauge
primo_orch_attended_robot_busy_enterprise_licenses 0
# HELP primo_orch_attended_robot_busy_standard_licenses Number of attended robot busy standard licenses
# TYPE primo_orch_attended_robot_busy_standard_licenses gauge
primo_orch_attended_robot_busy_standard_licenses 0
# HELP primo_orch_attended_robot_busy_desktop_licenses Number of attended robot busy desktop licenses
# TYPE primo_orch_attended_robot_busy_desktop_licenses gauge
primo_orch_attended_robot_busy_desktop_licenses 0

When Npgsql monitoring is enabled, metrics for the ltools, ltoolsidentity, ltoolslicense, and ltoolsltwrepo databases are available. WebApi metric updates use the IdleDurationMillis parameter, which cannot be less than 1 second (1000 ms).
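One way to picture the effect of IdleDurationMillis is a cached value that is recomputed only after the idle period has elapsed, with the previous value returned in between. The sketch below illustrates that idea only and is not the Orchestrator's implementation. The class name IdleCachedValue is hypothetical, and raising values below 1000 ms up to 1000 ms is an assumption about how the 1-second minimum might be enforced.

```python
import time
from typing import Any, Callable


class IdleCachedValue:
    """Recompute a value at most once per idle period; otherwise reuse it."""

    def __init__(
        self,
        compute: Callable[[], Any],
        idle_duration_millis: int = 2000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        # Assumption: values below 1 second are raised to 1 second.
        self._idle_seconds = max(idle_duration_millis, 1000) / 1000.0
        self._clock = clock
        self._value: Any = None
        self._refreshed_at: float | None = None

    def get(self) -> Any:
        now = self._clock()
        expired = (
            self._refreshed_at is None
            or now - self._refreshed_at >= self._idle_seconds
        )
        if expired:
            self._value = self._compute()
            self._refreshed_at = now
        return self._value


if __name__ == "__main__":
    counter = {"calls": 0}

    def count_robots() -> int:
        counter["calls"] += 1  # stands in for an expensive query
        return 6

    robots = IdleCachedValue(count_robots, idle_duration_millis=2000)
    robots.get()
    robots.get()  # served from the cache within the idle period
    print(counter["calls"])
```

The clock is injectable so that the behavior can be exercised in a test without waiting for real time to pass.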

Configuring service monitoring

To enable monitoring of Orchestrator components, add the Monitoring section to the service configuration:

WebApi service

Available since Orchestrator version 1.25.5.

"Monitoring": {
  "Provider": "Zabbix", // Allowed values: "", "Zabbix", "Prometheus" ("" means disabled)
  "Port": 10063,
  "IdleDurationMillis": 2000,
  "DotNetMonitoringEnabled": true,
  "NpgsqlMonitoringEnabled": true,
  "KestrelMonitoringEnabled": true
}

ArcSight service

Available since Orchestrator version 1.25.5.

"Monitoring": {
  "Provider": "Zabbix",
  "Port": 10062,
  "DotNetMonitoringEnabled": true,
  "NpgsqlMonitoringEnabled": true,
  "KestrelMonitoringEnabled": true
}

States service

Available since Orchestrator version 1.25.5.

"Monitoring": {
  "Provider": "Zabbix",
  "Port": 10061,
  "DotNetMonitoringEnabled": true,
  "NpgsqlMonitoringEnabled": true,
  "KestrelMonitoringEnabled": true
}
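The three examples above use a different monitoring port for each service. If several services run on the same host, their monitoring ports should not collide, and a short script can check this. This is a sketch under the assumption that all listed services share one host. The function name find_port_conflicts is hypothetical.

```python
from collections import defaultdict


def find_port_conflicts(ports_by_service: dict[str, int]) -> dict[int, list[str]]:
    """Return each port used by more than one service, with the service names."""
    services_by_port = defaultdict(list)
    for service, port in ports_by_service.items():
        services_by_port[port].append(service)
    return {
        port: sorted(services)
        for port, services in services_by_port.items()
        if len(services) > 1
    }


if __name__ == "__main__":
    # Ports taken from the configuration examples above.
    print(find_port_conflicts({"WebApi": 10063, "ArcSight": 10062, "States": 10061}))
```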