Check the Health of Your Aternity Deployment (System Health)

The System Health dashboard displays the overall topology and health of your Aternity deployment. As an Administrator of Aternity who has an understanding of the implementation topology, use this dashboard regularly to check the events in your Aternity deployment, or whenever Aternity is slow, to troubleshoot the health of the entire deployment.

When there is a problem in your Aternity deployment, an icon appears in the top bar to indicate the problem. The icon takes you to the System Health dashboard for further investigation.

For example, if you suspect an issue with the Messaging Broker configuration, check under Status in this dashboard to see how the Messaging Broker affects other components, such as the Vertica Writer.

The System Health dashboard

The dashboard includes the names of the components of the Aternity deployment, as described in the following diagram and table.

Component Description
Monitored Devices with the Agent for End User Devices

A monitored end user device is a desktop, laptop, smartphone or tablet which reports monitoring data to SteelCentral Aternity™.

You can monitor the performance of Windows devices (laptops, desktops, tablets), Apple Mac devices (desktops, laptops), mobile devices (iOS, Android) and virtual sessions (VDI and virtual applications).

Aternity gathers data about the performance of applications and devices using its Agent for End User Devices which runs in the background.

Aternity Management Server

The Aternity Management Server acts as Aternity's central server, which manages and integrates all the components. When users access Aternity to view the dashboards or configure it, they access this server via a browser.

Aternity Aggregation Server

An Aggregation Server gathers (aggregates) the data directly from the Agent for End User Devices on monitored devices, and passes it on to the Management Server.

Aternity Data Warehouse Server

The Data Warehouse Server stores the raw data gathered from the Aggregation Servers, and aggregates (summarizes) it for the Oracle Database Server and the Aternity Vertica Database Server.

It constantly aggregates and re-summarizes data in the main database in the background, replacing older, more detailed data with summary data as it ages. Therefore older data typically has limited drill-down capabilities.

Aternity Dashboard Server

The Dashboard Server displays Aternity's intuitive dashboards using Tableau as its engine. It presents the data from the Aternity Vertica Database Server.

Aternity Oracle Database Server

The Aternity Oracle Database Server is an Oracle database which hosts the Aternity system settings, data model and performance data, after the Data Warehouse Server has summarized (aggregated) it.

Aternity Vertica Database Server

The Aternity Vertica Database Server stores the performance data from the past 31 days in the Vertica format, which is most efficient for displaying in Aternity dashboards. It receives its data from the Aternity Docker Components Server.

Analytics Server

The Analytics Server (or RCA server) calculates an activity's performance baselines, its score and status, and detects the occurrence of incidents. It is on the same computer as the Management Server.

Aternity Docker Components Server

The Aternity Docker Components Server is one of the Aternity on-premise components. It hosts all the other Aternity Docker containers which add functionality to Aternity. Most components are mandatory, but you can choose to add or omit some of those Docker containers from your deployment.

Note

To save hardware resources, you can deploy several components on the same computer, if this complies with the hardware requirements of your deployment size. For example, the Aggregation Server and Data Warehouse Server can reside together with the Aternity Management Server. You can run each component on a standalone computer, or together (except for the Dashboard Gateway and Dashboard Server, which always run together). For more information, see the Installation Guide.
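
The data flow described in the component table can be modeled as a small directed graph, which may help when reasoning about which components a failure could affect downstream. The following Python sketch is illustrative only: `DATA_FLOW` and `downstream` are hypothetical names, the edges are taken from the component descriptions on this page, and nothing here is an Aternity API.

```python
from collections import deque

# Data-flow edges as described in the component table on this page.
# Illustrative model only; this is not an official Aternity topology API.
DATA_FLOW = {
    "Agent for End User Devices": ["Aggregation Server"],
    "Aggregation Server": ["Management Server", "Data Warehouse Server"],
    "Data Warehouse Server": ["Oracle Database Server", "Vertica Database Server"],
    "Docker Components Server": ["Vertica Database Server"],
    "Vertica Database Server": ["Dashboard Server"],
}


def downstream(component):
    """Return every component that receives data, directly or indirectly, from `component`."""
    seen = []
    queue = deque(DATA_FLOW.get(component, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(DATA_FLOW.get(node, []))
    return seen


# Components that could be affected if the Data Warehouse Server stops passing data on.
print(downstream("Data Warehouse Server"))
```

A breadth-first walk is used so that directly connected components are listed before more distant ones.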

The following components are internal to the Aternity Docker Components Server:

Component Description
Aternity Agent Management

The Agent Management component is responsible for displaying and operating the Agents Administration page in the Aternity console. It allows Aternity admins to start and stop Agents, analyze status, upload logs, and more. To access Agents Administration, log in to Aternity and select the Gear Icon > Agents > Agents Administration.

Vertica Writer

The Vertica Writer component is responsible for aggregating, indexing and summarizing the analytic data that arrives from various Aternity servers and writing it into the Vertica Database Server.

Vertica Scheduler

The Vertica Scheduler is responsible for creating the time-sensitive rollup aggregations in the Vertica Database Server. As data gets older, hourly and daily aggregations are created, storing the raw data in more compact structures. When you use the Aternity dashboards, Aternity automatically routes you to the relevant aggregation, depending on the selected time range. The Vertica Scheduler runs periodic tasks, such as hourly and daily aggregation, installed app snapshot calculation, and statistics computation.
Aternity Data Source for Portal

Configure the SteelCentral Portal™ to connect to your Aternity Data Source to view Aternity data in the Portal alongside data from other products in the SteelCentral Suite.

SDA Server (Service Desk Alerts) (Optional)

A service desk alert (SDA) defines email or ServiceNow alerts on top of Aternity health events.

A service desk alert (SDA) indicates that the same health event occurred several times on the same device within a certain time. Aternity sends SDAs to draw attention to devices which suffer repeated application errors, system crashes or hardware issues. For example, you can receive an SDA whenever a device suffers from the same crash more than twice a week.

Aternity REST API Server (Optional)

The Aternity REST API Server is a component in Aternity on-premise which allows authorized users to send REST API queries to directly extract and analyze Aternity's data without Aternity's dashboards. You can combine the data with other data sources if needed, or transform it as required, then view it in Microsoft Excel, Power BI, or your own data application.

DPS (Installed Software)

The DPS is the data processing component. The DPS (Installed Software) is responsible for parsing and aggregating the Installed Software measurements, enabling analysis tasks, such as “who does not have the latest version installed” or “who already installed the latest OS patch”.

DPS (Device Resources)

The DPS is the data processing component. The DPS (Device Resources) is responsible for parsing and aggregating the device resource measurements, such as CPU, memory, disk usage and Wi-Fi measurements. This data is later stored in the Vertica Database Server for use in the Analyze dashboards and REST APIs.
Aternity Raw Data Docker Component (Cassandra)

The Raw Data Component houses the Cassandra Database and stores the detailed information and measurements for monitored devices for a maximum of 7 days. You view this data in the Troubleshoot Device and in the Installed Software dashboards.

Aternity Messaging Broker Docker Component (Kafka)

The Messaging Broker component is built on top of the Kafka infrastructure and serves as the messaging system between various Aternity components responsible for collecting, analyzing, aggregating and storing the collected data.

Note

Some components are an integral part of the Aternity Docker Components Server and are installed together with the Docker machine; others are optional, like the Aternity REST API Server and the SDA Server (Service Desk Alerts). For more information, see the Installation Guide.
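
The retention figures stated in the component descriptions (raw data kept in the Raw Data Component for a maximum of 7 days, performance data from the past 31 days in the Vertica Database Server) can be summarized as a small lookup. This is a minimal Python sketch for illustration; `RETENTION_DAYS` and `stores_covering` are hypothetical names, not part of any Aternity interface.

```python
# Retention windows in days, taken from the component descriptions on this page.
RETENTION_DAYS = {
    "Raw Data Component (Cassandra)": 7,
    "Vertica Database Server": 31,
}


def stores_covering(age_days):
    """Return the stores whose stated retention window still covers data of this age."""
    return [name for name, days in RETENTION_DAYS.items() if age_days <= days]


# Data that is 10 days old is past the raw-data window but still within the Vertica window.
print(stores_covering(10))
```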

Procedure

  1. Open a browser and sign in to Aternity.
  2. Select the Gear Icon > System Health.
    Accessing the System Health dashboard
  3. View the status of the Aternity components in the Server Topology section.
    View which components are running and which have failed
    Field Description
    Component

    Displays the logical name of the Aternity component running on the machine.

    Note

    The Aggregation Servers and Data Warehouse Servers are not displayed in the dashboard if you installed them on the same machine as the Management Server.

    Version

    Displays the component version, mainly used for troubleshooting and post-installation verifications.

    For example, the Aggregation Server version should be aligned with the Management Server version.

    Machine Name

    Displays the server hostname or FQDN. We recommend you assign hostnames which indicate the name of the Aternity solution component.

    IP

    Displays the IP address (always IPv4) of the computer where the component is running.

    Uptime

    Displays the time since the component started.

    Status

    This is a graphical representation of the Status column:

    • Green represents the Status - Running.

    • Yellow represents the Status - Running with Warnings.

    • Red represents the Status - Failed/Not reporting.

    • Gray represents the Status - Unknown/Starting.

    Server Status Tooltip

    The tooltip options appear when you hover over the Status on each row.

    • Status: Green means there are no errors; red means there are errors.
    • Errors: Lists all errors when the status is red.
    • Warnings: Lists all warnings when the status is yellow.
    • OK: Displays the validation rules that passed the tests. Appears when the status is green.
    • Version: Displays the component version.
    • Last Update: Displays the last status update time.
    • Time zone: Displays the component's time zone (not shown for the Raw Data Component, Messaging Broker, Vertica Database Server, and Oracle Database Server).
    • Reporting Agents: Displays the number of reporting Agents (shown only for the Management Server and Aggregation Server).
    • Download log: Downloads the log of all errors. This link is not shown for third-party components: Raw Data Component, Messaging Broker, Oracle Database Server.
      View status of the system health event
  4. Limit the view in the Server Topology section. For example, to display only the malfunctioning machines, select Status > Running with Warnings.
    Zoom in on components or statuses in the Server Topology section

    By default, the Components list shows the Management Server, Aggregation Servers and Data Warehouse Server first, then all databases, and the Docker Components at the end of the list. You can change this view by sorting alphabetically by Component, or by Uptime or Status.

    Note

    After a fresh installation or update to version 11.0, the dashboard will display all version 11.0 components, and all unknown fields will appear as null or empty.

  5. View the major or critical events on any Aternity component and troubleshoot your system.

    This dashboard displays all the defined events which occurred during the selected timeframe. If the severity is Cleared, you may take notice, but you do not need to take action. If the severity is Major or Critical, start your troubleshooting by checking the component involved and then the machine.

    View the health events of the Aternity solution which occurred during the dashboard's timeframe
    Field Description
    Time

    Displays the time the health event occurred.

    Component

    Displays the component name.

    Type

    Displays the type of the event. There are three types of events:

    • Change in Status: Occurs when a component changes its status, for example from Running to Failing.

    • Change in Aternity Status: Indicates the status of the entire Aternity deployment; in other words, an aggregated status of all components. For example, if more than one component's status changed from Running to Failing, the status of the Aternity deployment changed from Running to Failing. In this case, an alert also appears at the top of the screen, with text describing which components caused the Aternity deployment status to change.

    • Error: Indicates all legacy events that were in the system prior to version 11.0.

      Note

      When an event is triggered, the system sends an alert (if this option is enabled). The system administrator can enable or disable system alerts for each event type. For more information, contact your system administrator.

    Severity

    Displays the severity of the event. The options are:

    • Cleared: the problem does not occur any more.

    • Major: some of Aternity's functionality is temporarily unavailable.

    • Critical: severe Aternity system failure.

    • Info: the status changed to Running or to Starting.

    Message

    Displays the error message of the event.

    Reason

    Displays the reason for the error. For example, Aggregation server is failing. Reason: No connection to Kafka.

    To display only the cleared or only the major events, or only some of the event types, use the drop-down menu in the upper right corner of the section.

    For example, if a critical event reports that the Messaging Broker is failing due to configuration issues, look at the later events in the Event Viewer to check whether this critical event cleared. Also check whether the events for other components affected by the Messaging Broker failure have cleared.

    Aternity also generates a critical event if any of the servers loses communication with the Management Server. In this case, check the cause immediately and remedy it.

  6. In the Performance Counters section, check whether the Management Server, Aggregation Servers, or Data Warehouse Server are running low on resources, in terms of CPU usage, Java virtual machine garbage collection, or heap size.

    By default, this section displays the information for the Management Server, all Aggregation Servers, and the Data Warehouse Server during the dashboard's timeframe. To view the performance of only one machine, select its name in the Server Topology section.

    View the machine resource usage
    Field Description
    CPU Utilization (%)

    Displays the total percentage CPU usage of the machine within the timeframe of the dashboard. Check if there was a high CPU usage for longer periods of time and try to correlate that with the memory usage and the events which occurred (see the Event Viewer section).

    JVM GC time per minute

    Displays the number of seconds per minute that the Java Virtual Machine (JVM) spent collecting garbage, during the timeframe of the dashboard. If this parameter frequently has high values, consider tuning the garbage collection parameters or increasing the computer's RAM.

    Heap size

    Displays the heap size during the timeframe of the dashboard. If you see a constant increase in the Java heap memory, it could be a sign that garbage collection is not being performed properly, or, if it recurs, it could indicate a memory leak, which might lead to an Out Of Memory Error. Consider increasing the heap size, or perform memory leak detection to find out whether you have a memory leak and determine its cause.

    Note

    This section shows the performance of only the Management Server, Data Warehouse Server and Aggregation Servers. If you select a different component, the Performance Counters section is empty and a message states that performance data is not available for that component.
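
The reasoning behind the JVM GC time and heap size fields can be approximated on a series of samples. The following is a minimal Python sketch, assuming you have noted the values shown in the Performance Counters section as plain numbers. The function names, the sample values, and both thresholds are hypothetical and are not Aternity settings.

```python
def gc_share(gc_seconds_per_minute):
    """Fraction of each minute that the JVM spent collecting garbage."""
    return gc_seconds_per_minute / 60.0


def heap_slope(samples):
    """Least-squares slope of heap size over time, in MB per minute.

    `samples` is a list of (minute, heap_mb) pairs with at least
    two distinct minute values.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    num = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


# Hypothetical thresholds; tune them for your own deployment.
GC_SHARE_LIMIT = 0.10       # more than 10% of each minute spent in GC
HEAP_SLOPE_LIMIT_MB = 1.0   # heap growing by more than 1 MB per minute


def needs_attention(gc_seconds_per_minute, heap_samples):
    """Flag a server whose GC time is high or whose heap keeps growing."""
    return (
        gc_share(gc_seconds_per_minute) > GC_SHARE_LIMIT
        or heap_slope(heap_samples) > HEAP_SLOPE_LIMIT_MB
    )


# Made-up samples: the heap grows steadily from 1000 MB to 1300 MB in 30 minutes.
growing = [(0, 1000), (10, 1100), (20, 1200), (30, 1300)]
print(heap_slope(growing))          # slope in MB per minute
print(needs_attention(2, growing))  # low GC time, but the heap keeps growing
```

A fitted slope is used rather than a comparison of the first and last samples, so that a single spike does not dominate the result.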

  7. Limit the view in the dashboard to only one machine, to see its performance and health events.
    Limit the view to one machine
  8. To download a component's logs for further investigation, hover over the Status in the Server Topology section and select Download Log Links.

    This downloads a set of log files containing system information like configurations, server statistics, discarded data, events, errors, incident analysis, and so on. You cannot access logs from this screen for the Oracle Database Server, Vertica Database Server, Raw Data Component, Messaging Broker and Aternity Dashboard Server.

    Inspecting log files of Aternity components
  9. Display the data for a longer period of time to see if a critical event repeats, or zoom in to check exactly when the CPU usage was at its highest.
    Select the timeframe of your dashboard
    Field Description
    Timeframe

    You can change the start time of the data displayed in this dashboard in the Timeframe menu.

    This dashboard displays raw data in real time, refreshing every time you access it or whenever you manually refresh the browser page.

    You can access data in this dashboard going back up to 30 days (the retention period).
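
Checking whether a critical event repeats within a timeframe amounts to counting events per component inside a look-back window. The following Python sketch shows the idea on made-up records. The dictionary keys mirror the Time, Component and Severity fields of the health events table, but the record format, the `repeated_critical` function, and the sample data are assumptions for illustration, not an Aternity export format.

```python
from collections import Counter
from datetime import datetime, timedelta


def repeated_critical(events, now, days=30, min_count=2):
    """Return components with at least `min_count` Critical events in the window.

    `events` is a list of dicts with the hypothetical keys "time",
    "component" and "severity".
    """
    start = now - timedelta(days=days)
    counts = Counter(
        e["component"]
        for e in events
        if e["severity"] == "Critical" and start <= e["time"] <= now
    )
    return {name: n for name, n in counts.items() if n >= min_count}


# Made-up events for illustration only.
now = datetime(2024, 1, 31, 12, 0)
events = [
    {"time": now - timedelta(days=1), "component": "Messaging Broker", "severity": "Critical"},
    {"time": now - timedelta(days=5), "component": "Messaging Broker", "severity": "Critical"},
    {"time": now - timedelta(hours=12), "component": "Messaging Broker", "severity": "Cleared"},
    {"time": now - timedelta(days=2), "component": "Aggregation Server", "severity": "Critical"},
]
print(repeated_critical(events, now))          # full 30-day window
print(repeated_critical(events, now, days=3))  # narrower 3-day window
```

The default window of 30 days matches the retention stated for this dashboard; a shorter window corresponds to zooming in on the timeframe.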