Service monitoring
The Orchestrator supports metric collection via Zabbix and Prometheus for its various services.
The monitoring subsystem is configured in the Monitoring section of the configuration file:
"Monitoring": {
"Provider": "Zabbix", // Allowed values: "", "Zabbix", "Prometheus" ("" means disabled)
"Port": 10052,
"IdleDurationMillis": 2000,
"DotNetMonitoringEnabled": true,
"NpgsqlMonitoringEnabled": true,
"KestrelMonitoringEnabled": true
}
See the examples of monitoring configuration for the WebApi, ArcSight, and States services.
- Any available port above 1024 can be used.
- The DotNetMonitoringEnabled parameter enables .NET monitoring.
- The KestrelMonitoringEnabled parameter enables monitoring of the Kestrel server.
- The NpgsqlMonitoringEnabled parameter enables Npgsql monitoring.
- The IdleDurationMillis parameter limits how often metrics are refreshed. It is a period in milliseconds (2 seconds by default) during which some metrics are not recalculated and the previous values are returned. It is used only by some services.
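The effect of IdleDurationMillis described above can be pictured with a small Python sketch. This is not the Orchestrator's implementation; the class name IdleCachedMetric, the compute callback, and the injectable clock are hypothetical and serve only to show a value being reused until the idle window has elapsed.

```python
import time
from typing import Callable, Optional


class IdleCachedMetric:
    """Hypothetical helper: recomputes a metric at most once per idle window."""

    def __init__(
        self,
        compute: Callable[[], float],
        idle_duration_millis: int = 2000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._idle_seconds = idle_duration_millis / 1000.0
        self._clock = clock
        self._last_value: Optional[float] = None
        self._last_update: Optional[float] = None

    def read(self) -> float:
        now = self._clock()
        # Recompute on the first read or once the idle window has passed;
        # otherwise return the previously computed value.
        if self._last_update is None or now - self._last_update >= self._idle_seconds:
            self._last_value = self._compute()
            self._last_update = now
        return self._last_value
```

With the default of 2000 ms, two reads that arrive one second apart return the same value, and the expensive computation runs only once.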
The monitoring subsystem does not initiate data transfer. It waits for a connection from an external data collection service (the provider) and responds to its requests.
Provider setup
- Zabbix installation and configuration guide: https://wiki.astralinux.ru/pages/viewpage.action?pageId=38699775
- Prometheus installation and configuration guides: https://astra.ru/upload/parser_jira/certs/prot_SE17_RDY-6487.pdf and https://www.dmosk.ru/instruktions.php?object=prometheus-linux
Connecting to the provider
When the Zabbix provider is enabled, the service exposes the following endpoint:
tcp://<service host IP>:<monitoring port>
Specify this endpoint when configuring the Zabbix agent on the Zabbix server.
When the Prometheus provider is enabled, the service exposes the following endpoint:
http://<service host IP>:<monitoring port>/metrics
Specify this URL when configuring the Prometheus server; for a quick check, you can also open this endpoint in a browser.
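The two endpoint formats above can be derived from the Provider and Port settings. The Python sketch below shows that mapping; the function name monitoring_endpoint and the host address used in the example are hypothetical and not part of the product.

```python
from typing import Optional


def monitoring_endpoint(provider: str, host: str, port: int) -> Optional[str]:
    """Builds the endpoint a collector should use, or None if monitoring is disabled."""
    if provider == "":
        # An empty Provider value means monitoring is turned off.
        return None
    if provider == "Zabbix":
        return f"tcp://{host}:{port}"
    if provider == "Prometheus":
        return f"http://{host}:{port}/metrics"
    raise ValueError(f"Unknown provider: {provider!r}")


# Example with a made-up host address:
print(monitoring_endpoint("Prometheus", "192.0.2.10", 10052))
```

The Prometheus URL is the value to put into the scrape target on the Prometheus server; the tcp:// form is what the Zabbix side connects to.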
Common metrics
Collection of .NET, Kestrel, and Npgsql metrics can be enabled for every service. For some services (for example, RDP2 or MachineInfo), enabling Npgsql metrics is pointless, because these services do not connect to the PostgreSQL database themselves.
.NET metrics
# HELP dotnet_collection_count_total_0 GC collection count for gen0
# TYPE dotnet_collection_count_total_0 counter
dotnet_collection_count_total_0
# HELP dotnet_collection_count_total_1 GC collection count for gen1
# TYPE dotnet_collection_count_total_1 counter
dotnet_collection_count_total_1
# HELP dotnet_collection_count_total_2 GC collection count for gen2
# TYPE dotnet_collection_count_total_2 counter
dotnet_collection_count_total_2
# HELP dotnet_total_memory_bytes Total known allocated memory
# TYPE dotnet_total_memory_bytes gauge
dotnet_total_memory_bytes
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes
# HELP process_working_set_bytes Process working set
# TYPE process_working_set_bytes gauge
process_working_set_bytes
# HELP process_private_memory_bytes Process private memory size
# TYPE process_private_memory_bytes gauge
process_private_memory_bytes
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total
# HELP process_open_handles Number of open handles
# TYPE process_open_handles gauge
process_open_handles
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds
# HELP process_num_threads Total number of threads
# TYPE process_num_threads gauge
process_num_threads
Kestrel server metrics
# HELP kestrel_active_connections Number of connections that are currently active on the server.
# TYPE kestrel_active_connections gauge
kestrel_active_connections 1
# HELP kestrel_connection_duration The duration of connections on the server.
# TYPE kestrel_connection_duration gauge
kestrel_connection_duration 0
# HELP kestrel_rejected_connections Number of connections rejected by the server. Connections are rejected when the currently active count exceeds the value configured with MaxConcurrentConnections.
# TYPE kestrel_rejected_connections counter
kestrel_rejected_connections 0
# HELP kestrel_queued_connections Number of connections that are currently queued and are waiting to start.
# TYPE kestrel_queued_connections gauge
kestrel_queued_connections -1
# HELP kestrel_queued_requests Number of HTTP requests on multiplexed connections (HTTP/2 and HTTP/3) that are currently queued and are waiting to start.
# TYPE kestrel_queued_requests gauge
kestrel_queued_requests 0
# HELP kestrel_upgraded_connections Number of HTTP connections that are currently upgraded (WebSockets). The number only tracks HTTP/1.1 connections.
# TYPE kestrel_upgraded_connections gauge
kestrel_upgraded_connections 0
# HELP kestrel_tls_handshake_duration The duration of TLS handshakes on the server.
# TYPE kestrel_tls_handshake_duration gauge
kestrel_tls_handshake_duration 0
# HELP kestrel_active_tls_handshakes Number of TLS handshakes that are currently in progress on the server.
# TYPE kestrel_active_tls_handshakes gauge
kestrel_active_tls_handshakes 0
Npgsql metrics
Metrics for client connections to the PostgreSQL database are parameterized; the parameter is the database name, for example ltoolslogs or ltools. The naming format of parameterized metrics (item keys in Zabbix) differs between Prometheus and Zabbix:
- the Prometheus metric db_client_connections_usage{DbName="ltoolslogs"} appears in Zabbix as db_client_connections_usage[ltoolslogs];
- the Prometheus metric db_client_commands_bytes_read{DbName="ltools"} appears in Zabbix as db_client_commands_bytes_read[ltools].
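The naming difference can be expressed as a small conversion, shown here as a Python sketch. The function name prometheus_to_zabbix_key is hypothetical. The sketch covers only the single-label form shown in the examples above, and it assumes that a metric without a label keeps the same name in Zabbix.

```python
import re

# Matches a metric name with an optional single label, e.g. name{DbName="value"}.
_METRIC = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[A-Za-z_][A-Za-z0-9_]*="([^"]*)"\})?$'
)


def prometheus_to_zabbix_key(metric: str) -> str:
    """Converts name{Label="value"} into the Zabbix item key name[value]."""
    match = _METRIC.match(metric.strip())
    if match is None:
        raise ValueError(f"Unsupported metric format: {metric!r}")
    name, value = match.groups()
    # Assumption: metrics without a label use the bare name as the item key.
    return name if value is None else f"{name}[{value}]"


print(prometheus_to_zabbix_key('db_client_connections_usage{DbName="ltoolslogs"}'))
```

Metrics with several labels are rejected rather than guessed at, because this page does not show how they would be keyed in Zabbix.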
# HELP db_client_commands_executing The number of currently executing database commands.
# TYPE db_client_commands_executing gauge
db_client_commands_executing{DbName="<DbName>"} -1
# HELP db_client_commands_failed The number of database commands which have failed.
# TYPE db_client_commands_failed gauge
db_client_commands_failed{DbName="<DbName>"} 0
# HELP db_client_commands_duration The duration of database commands, in seconds.
# TYPE db_client_commands_duration gauge
db_client_commands_duration{DbName="<DbName>"} 0
# HELP db_client_commands_bytes_written The number of bytes written.
# TYPE db_client_commands_bytes_written counter
db_client_commands_bytes_written{DbName="<DbName>"} 3631
# HELP db_client_commands_bytes_read The number of bytes read.
# TYPE db_client_commands_bytes_read counter
db_client_commands_bytes_read{DbName="<DbName>"} 8192
# HELP db_client_connections_pending_requests The number of pending requests for an open connection, cumulative for the entire pool.
# TYPE db_client_connections_pending_requests gauge
db_client_connections_pending_requests{DbName="<DbName>"} 0
# HELP db_client_connections_timeouts The number of connection timeouts that have occurred trying to obtain a connection from the pool.
# TYPE db_client_connections_timeouts counter
db_client_connections_timeouts{DbName="<DbName>"} 0
# HELP db_client_connections_create_time The time it took to create a new connection.
# TYPE db_client_connections_create_time gauge
db_client_connections_create_time{DbName="<DbName>"} 0
# HELP db_client_connections_usage The number of connections that are currently in state described by the state attribute.
# TYPE db_client_connections_usage gauge
db_client_connections_usage{DbName="<DbName>"} 0
# HELP db_client_connections_max The maximum number of open connections allowed.
# TYPE db_client_connections_max gauge
db_client_connections_max{DbName="<DbName>"} 20
# HELP db_client_commands_prepared_ratio The ratio of prepared command executions.
# TYPE db_client_commands_prepared_ratio gauge
db_client_commands_prepared_ratio{DbName="<DbName>"} 0
MachineInfo metrics
# HELP primo_mi_requests Number of requests
# TYPE primo_mi_requests counter
primo_mi_requests
Agent metrics
# HELP primo_ag_running_robots Number of running robots
# TYPE primo_ag_running_robots gauge
primo_ag_running_robots
# HELP primo_ag_successful_starts Number of successful starts
# TYPE primo_ag_successful_starts counter
primo_ag_successful_starts
# HELP primo_ag_failed_starts Number of failed starts
# TYPE primo_ag_failed_starts counter
primo_ag_failed_starts
RDP2 metrics
# HELP primo_rdp2_active_sessions Number of active RDP sessions
# TYPE primo_rdp2_active_sessions gauge
primo_rdp2_active_sessions 4
# HELP primo_rdp2_successful_connections Number of successful connections
# TYPE primo_rdp2_successful_connections counter
primo_rdp2_successful_connections 4
# HELP primo_rdp2_failed_connections Number of failed connections
# TYPE primo_rdp2_failed_connections counter
primo_rdp2_failed_connections 0
# HELP primo_rdp2_disconnections Number of disconnections
# TYPE primo_rdp2_disconnections counter
primo_rdp2_disconnections 0
# HELP primo_rdp2_streams Number of active streams
# TYPE primo_rdp2_streams gauge
primo_rdp2_streams 0
# HELP primo_rdp2_viewers Number of active viewers
# TYPE primo_rdp2_viewers gauge
primo_rdp2_viewers 0
# HELP primo_rdp2_managers Number of active managers
# TYPE primo_rdp2_managers gauge
primo_rdp2_managers 0
RobotLogs metrics
# HELP primo_rl_messages Number of robot messages
# TYPE primo_rl_messages counter
primo_rl_messages 0
# HELP primo_rl_orch_messages Number of orchestrator messages
# TYPE primo_rl_orch_messages counter
primo_rl_orch_messages 10012
# HELP primo_rl_custom_messages Number of robot custom messages
# TYPE primo_rl_custom_messages counter
primo_rl_custom_messages 0
# HELP primo_rl_attended_messages Number of attended robot messages
# TYPE primo_rl_attended_messages counter
primo_rl_attended_messages 0
# HELP primo_rl_attended_custom_messages Number of attended robot custom messages
# TYPE primo_rl_attended_custom_messages counter
primo_rl_attended_custom_messages 0
# HELP primo_rl_screen_file_requests Number of screen file requests
# TYPE primo_rl_screen_file_requests counter
primo_rl_screen_file_requests 0
# HELP primo_rl_screen_file_thumb_requests Number of screen file thumb requests
# TYPE primo_rl_screen_file_thumb_requests counter
primo_rl_screen_file_thumb_requests 0
# HELP primo_rl_queue_lost_messages Number of queue lost messages
# TYPE primo_rl_queue_lost_messages counter
primo_rl_queue_lost_messages 0
# HELP primo_rl_queue_length The queue length
# TYPE primo_rl_queue_length gauge
primo_rl_queue_length 0
When Npgsql monitoring is enabled, metrics for the ltoolslog database are available.
Notifications metrics
# HELP primo_ntf_successfully_sent_emails The number of successfully sent emails.
# TYPE primo_ntf_successfully_sent_emails counter
primo_ntf_successfully_sent_emails 0
# HELP primo_ntf_unsuccessfully_sent_emails The number of unsuccessfully sent emails.
# TYPE primo_ntf_unsuccessfully_sent_emails counter
primo_ntf_unsuccessfully_sent_emails 0
# HELP primo_ntf_successfully_rendered_html_templates The number of successfully rendered HTML templates.
# TYPE primo_ntf_successfully_rendered_html_templates counter
primo_ntf_successfully_rendered_html_templates 0
# HELP primo_ntf_unsuccessfully_rendered_html_templates The number of unsuccessfully rendered HTML templates.
# TYPE primo_ntf_unsuccessfully_rendered_html_templates counter
primo_ntf_unsuccessfully_rendered_html_templates 0
# HELP primo_ntf_successfully_rendered_xlsx_attachments The number of successfully rendered Xlsx attachments.
# TYPE primo_ntf_successfully_rendered_xlsx_attachments counter
primo_ntf_successfully_rendered_xlsx_attachments 0
# HELP primo_ntf_unsuccessfully_rendered_xlsx_attachments The number of unsuccessfully rendered Xlsx attachments.
# TYPE primo_ntf_unsuccessfully_rendered_xlsx_attachments counter
primo_ntf_unsuccessfully_rendered_xlsx_attachments 0
LogEventsWebhook metrics
# HELP primo_lew_login_successes The number of login successes
# TYPE primo_lew_login_successes counter
primo_lew_login_successes 0
# HELP primo_lew_login_failures The number of login failures
# TYPE primo_lew_login_failures counter
primo_lew_login_failures 0
# HELP primo_lew_event_successes The number of event processing successes
# TYPE primo_lew_event_successes counter
primo_lew_event_successes 0
# HELP primo_lew_event_failures The number of event processing failures
# TYPE primo_lew_event_failures counter
primo_lew_event_failures 0
Analytic metrics
# HELP primo_analytic_received_orch_events The number of received orch events.
# TYPE primo_analytic_received_orch_events counter
primo_analytic_received_orch_events 0
# HELP primo_analytic_processed_orch_events The number of processed orch events.
# TYPE primo_analytic_processed_orch_events counter
primo_analytic_processed_orch_events 0
# HELP primo_analytic_failed_orch_events The number of failed orch events.
# TYPE primo_analytic_failed_orch_events counter
primo_analytic_failed_orch_events 0
# HELP primo_analytic_successful_refreshes The number of successful refreshes.
# TYPE primo_analytic_successful_refreshes counter
primo_analytic_successful_refreshes{TableName="mv_RobotsUsage"} 0
primo_analytic_successful_refreshes{TableName="mv_WorkersUsage"} 0
# HELP primo_analytic_failed_refreshes The number of failed refreshes.
# TYPE primo_analytic_failed_refreshes counter
primo_analytic_failed_refreshes{TableName="mv_RobotsUsage"} 0
primo_analytic_failed_refreshes{TableName="mv_WorkersUsage"} 0
When Npgsql monitoring is enabled, metrics for the ltoolsanalytic database are available.
ArcSight metrics
# HELP primo_arcsight_written_orch_events The number of written orch events.
# TYPE primo_arcsight_written_orch_events counter
primo_arcsight_written_orch_events 0
# HELP primo_arcsight_converted_orch_events The number of converted orch events.
# TYPE primo_arcsight_converted_orch_events counter
primo_arcsight_converted_orch_events 0
States metrics
# HELP primo_states_processed_events The number of processed events.
# TYPE primo_states_processed_events counter
primo_states_processed_events 0
# HELP primo_states_failed_events The number of failed events.
# TYPE primo_states_failed_events counter
primo_states_failed_events 0
WebApi metrics
# HELP primo_orch_robots Number of robots
# TYPE primo_orch_robots gauge
primo_orch_robots 6
# HELP primo_orch_deployed_robots Number of deployed robots
# TYPE primo_orch_deployed_robots gauge
primo_orch_deployed_robots 6
# HELP primo_orch_running_robots Number of running robots
# TYPE primo_orch_running_robots gauge
primo_orch_running_robots 0
# HELP primo_orch_workers Number of workers
# TYPE primo_orch_workers gauge
primo_orch_workers 7
# HELP primo_orch_workers_with_robots Number of workers with robots
# TYPE primo_orch_workers_with_robots gauge
primo_orch_workers_with_robots 4
# HELP primo_orch_projects Number of projects
# TYPE primo_orch_projects gauge
primo_orch_projects 3
# HELP primo_orch_running_projects Number of running projects
# TYPE primo_orch_running_projects gauge
primo_orch_running_projects 0
# HELP primo_orch_versioned_projects Number of versioned projects
# TYPE primo_orch_versioned_projects gauge
primo_orch_versioned_projects 0
# HELP primo_orch_assets Number of assets
# TYPE primo_orch_assets gauge
primo_orch_assets 0
# HELP primo_orch_templates Number of templates
# TYPE primo_orch_templates gauge
primo_orch_templates 0
# HELP primo_orch_assignments Number of assignments
# TYPE primo_orch_assignments gauge
primo_orch_assignments 0
# HELP primo_orch_complete_assignments Number of complete assignments
# TYPE primo_orch_complete_assignments gauge
primo_orch_complete_assignments 0
# HELP primo_orch_new_assignments Number of new assignments
# TYPE primo_orch_new_assignments gauge
primo_orch_new_assignments 0
# HELP primo_orch_paused_assignments Number of paused assignments
# TYPE primo_orch_paused_assignments gauge
primo_orch_paused_assignments 0
# HELP primo_orch_running_assignments Number of running assignments
# TYPE primo_orch_running_assignments gauge
primo_orch_running_assignments 0
# HELP primo_orch_free_studio_licenses Number of studio licenses
# TYPE primo_orch_free_studio_licenses gauge
primo_orch_free_studio_licenses 100
# HELP primo_orch_busy_studio_licenses Number of busy studio licenses
# TYPE primo_orch_busy_studio_licenses gauge
primo_orch_busy_studio_licenses 0
# HELP primo_orch_robot_enterprise_licenses Number of robot enterprise licenses
# TYPE primo_orch_robot_enterprise_licenses gauge
primo_orch_robot_enterprise_licenses 100
# HELP primo_orch_robot_standard_licenses Number of robot standard licenses
# TYPE primo_orch_robot_standard_licenses gauge
primo_orch_robot_standard_licenses 0
# HELP primo_orch_robot_desktop_licenses Number of robot desktop licenses
# TYPE primo_orch_robot_desktop_licenses gauge
primo_orch_robot_desktop_licenses 100
# HELP primo_orch_robot_busy_enterprise_licenses Number of robot busy enterprise licenses
# TYPE primo_orch_robot_busy_enterprise_licenses gauge
primo_orch_robot_busy_enterprise_licenses 0
# HELP primo_orch_robot_busy_standard_licenses Number of robot busy standard licenses
# TYPE primo_orch_robot_busy_standard_licenses gauge
primo_orch_robot_busy_standard_licenses 0
# HELP primo_orch_robot_busy_desktop_licenses Number of robot busy desktop licenses
# TYPE primo_orch_robot_busy_desktop_licenses gauge
primo_orch_robot_busy_desktop_licenses 0
# HELP primo_orch_attended_robot_busy_enterprise_licenses Number of attended robot busy enterprise licenses
# TYPE primo_orch_attended_robot_busy_enterprise_licenses gauge
primo_orch_attended_robot_busy_enterprise_licenses 0
# HELP primo_orch_attended_robot_busy_standard_licenses Number of attended robot busy standard licenses
# TYPE primo_orch_attended_robot_busy_standard_licenses gauge
primo_orch_attended_robot_busy_standard_licenses 0
# HELP primo_orch_attended_robot_busy_desktop_licenses Number of attended robot busy desktop licenses
# TYPE primo_orch_attended_robot_busy_desktop_licenses gauge
primo_orch_attended_robot_busy_desktop_licenses 0
When Npgsql monitoring is enabled, metrics for the ltools, ltoolsidentity, ltoolslicense, and ltoolsltwrepo databases are available. WebApi metric updates use the IdleDurationMillis parameter, which cannot be less than 1 second (1000 ms).
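The constraints stated on this page (the allowed Provider values, a port above 1024, and the one-second lower bound on IdleDurationMillis for WebApi) can be checked before a service is started. The Python sketch below is an illustration only; check_monitoring_section and its min_idle_millis argument are hypothetical names, and the upper port bound of 65535 is the general TCP limit rather than something this page specifies.

```python
from typing import Any, Dict, List

ALLOWED_PROVIDERS = {"", "Zabbix", "Prometheus"}


def check_monitoring_section(section: Dict[str, Any], min_idle_millis: int = 0) -> List[str]:
    """Returns a list of problems found in a parsed Monitoring section."""
    problems: List[str] = []
    provider = section.get("Provider", "")
    if provider not in ALLOWED_PROVIDERS:
        problems.append(f"unknown Provider: {provider!r}")
    port = section.get("Port")
    # The port only matters when monitoring is enabled.
    if provider != "" and not (isinstance(port, int) and 1024 < port <= 65535):
        problems.append(f"Port must be an integer in 1025..65535, got {port!r}")
    # 2000 ms is the default stated for IdleDurationMillis.
    idle = section.get("IdleDurationMillis", 2000)
    if idle < min_idle_millis:
        problems.append(f"IdleDurationMillis must be at least {min_idle_millis}, got {idle}")
    return problems


webapi = {"Provider": "Zabbix", "Port": 10063, "IdleDurationMillis": 2000}
print(check_monitoring_section(webapi, min_idle_millis=1000))
```

For WebApi, pass min_idle_millis=1000; for services that do not use the parameter, the default of 0 skips that check.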
Configuring service monitoring
To enable monitoring of Orchestrator components, add the Monitoring section to the service configuration:
WebApi service monitoring configuration
Available in Orchestrator 1.25.5 and later
"Monitoring": {
"Provider": "Zabbix", // Allowed values: "", "Zabbix", "Prometheus" ("" means disabled)
"Port": 10063,
"IdleDurationMillis": 2000,
"DotNetMonitoringEnabled": true,
"NpgsqlMonitoringEnabled": true,
"KestrelMonitoringEnabled": true
}
ArcSight service
Available in Orchestrator 1.25.5 and later
"Monitoring": {
"Provider": "Zabbix",
"Port": 10062,
"DotNetMonitoringEnabled": true,
"NpgsqlMonitoringEnabled": true,
"KestrelMonitoringEnabled": true
}
States service
Available in Orchestrator 1.25.5 and later
"Monitoring": {
"Provider": "Zabbix",
"Port": 10061,
"DotNetMonitoringEnabled": true,
"NpgsqlMonitoringEnabled": true,
"KestrelMonitoringEnabled": true
}