Check the Health of Your Aternity Deployment (System Health)

The System Health dashboard displays the overall topology and health of your Aternity system. As an Administrator of Aternity with an advanced understanding of the platform architecture and implementation, use this dashboard regularly to check the events in your Aternity system, or to troubleshoot the health of the entire deployment whenever the system is slow.

For example, if you believe that an Aggregation Server might be overloaded, check this dashboard for key performance indicators of the various system components.

The System Health dashboard

The dashboard includes the names of the components of the Aternity solution, as described in the following diagram.

Aternity solution information flow
Component Description
Aternity Agent

The Aternity Agent monitors end user experience by measuring device and application performance. It is a small background utility which runs on each monitored device, and reports its data to the Aternity Aggregation Server.

Aggregation Servers

An Aggregation Server gathers (aggregates) the data directly from Aternity Agents on monitored devices, and passes it on to the Management Server.

Data Server

The Data Server component is an internal module of the Aternity Management Server. It integrates contextual data, like device details, user names, error messages, and so on, into the measurements which came from the Aggregation Servers, and then passes the result on to the Aternity Analytics Server.

Analytics Server or RCA Server

The Analytics Server (or RCA Server) calculates an activity's performance baselines, its score and status, and detects the occurrence of incidents. It is on the same computer as the Management Server.

Data Warehouse Server

The Data Warehouse Server stores the raw data gathered from the Aggregation Servers, and aggregates (summarizes) it for the Database Server.

Aternity Database Server

The Aternity Database Server is an Oracle database which houses the Aternity system settings and the performance data from the past 1-2 years, aggregated by the Data Warehouse Server.

It constantly aggregates and re-summarizes data in the main database in the background, replacing older, more detailed data with summary data as it ages. Therefore older data typically has limited drill-down capabilities.

Dashboard Server

The Dashboard Server displays Aternity's intuitive dashboards using Tableau as its engine. It presents the raw data (from the Data Warehouse Server) and the older aggregated data (from the Database Server). Larger on-premise deployments require one or more additional Aternity Dashboard Server Workers to display dashboards more efficiently.

Aternity Management Server

The Aternity Management Server acts as the system's central server, which manages and integrates all the system components. Users access this server via a browser to configure the system and view the dashboards.

Note

Some components can be installed together on the same computer. For more information, see the Installation Guide.

Procedure

  1. Open a browser and log in to Aternity.
  2. Select the Gear Icon > System Health.
    Accessing the system health dashboard
  3. View the Aternity system configuration in the Server Topology section.
    View which servers run and which are disconnected
    Machine

    Displays the server hostname. This can be any of the Aternity solution components in the figure above. We recommend you assign hostnames which indicate the name of the Aternity solution component.

    IP

    Displays the server IP address.

    Uptime

    Displays the time since the service started.

    Last Update

    Displays the last time the server reported its state to the Aternity solution (for example, that it is running, its uptime, and so on).

    Status

    Displays the status of the component. The options are:

    • Stopping (yellow) - the server is in the process of stopping

    • Disconnected (orange) - no network connection to the Aternity solution, regardless of the cause (a network problem, or the machine is down)

    • Running (green)

    Component

    Displays the name of the Aternity solution component running on the machine.

    To save hardware resources, you can deploy several components on the same computer. You can run each component on a standalone computer, or run several together (the Dashboard Gateway and the Dashboard Server always run together). The dashboard displays the components as:

    • Database is the Database Server

    • DS is the Data Server

    • DW is the Data Warehouse Server

    • EPM is the Aggregation Server

    • Gateway is the Dashboard Gateway

    • Management is the Management Server

    • RCA is the Analytics Server

    • Tableau is the Dashboard Server

    Connecting

    Displays the number of Agents which are currently in the process of connecting to the Aggregation Server but have not yet connected. When an Agent succeeds in connecting, its device becomes a Reporting Device.

    Reporting Devices

    Displays the number of monitored devices running a locally installed Agent, reporting to the Aggregation Server. This applies to physical desktops, virtual application servers, virtual desktops hosted in hypervisors (where each VM has its own Agent), and monitored apps on a mobile device.

    Reporting Endpoints

    Displays the number of reporting devices plus the number of front line terminals running virtual applications (from Citrix servers or terminal servers), where the front line terminals do not have an Agent installed. These deployments use a single Agent running on the virtual application server.

    Server Status

    This is a graphical representation of the Status column:

    • Green indicates that the component's Status is Running.

    • Yellow indicates that the component's Status is Stopping.

    • Orange indicates that the component's Status is Disconnected.
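As a minimal sketch of how the Reporting Devices and Reporting Endpoints counts relate, the following Python snippet adds the front line terminals without their own Agent to the devices that run a local Agent. The class, its field names, and the sample numbers are hypothetical and only illustrate the arithmetic; they are not part of Aternity.

```python
from dataclasses import dataclass


@dataclass
class AggregationServerCounts:
    """Hypothetical per-server counts, named after the dashboard columns."""

    connecting: int           # Agents still in the process of connecting
    reporting_devices: int    # devices running a locally installed Agent
    agentless_terminals: int  # front line terminals served by one Agent on the virtual application server


def reporting_endpoints(counts: AggregationServerCounts) -> int:
    """Reporting Endpoints = Reporting Devices + front line terminals without their own Agent."""
    return counts.reporting_devices + counts.agentless_terminals


# Made-up example numbers for one Aggregation Server.
epm = AggregationServerCounts(connecting=12, reporting_devices=4800, agentless_terminals=650)
print(reporting_endpoints(epm))
```

Agents counted under Connecting are deliberately left out of the total, because a device only becomes a Reporting Device after its Agent has connected.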

  4. Limit the view in the Server Topology section. For example, to display only the disconnected machines, select Status > Disconnected. To display only a certain type of server, select the name of the component in the Component menu.
    Zoom in on components or statuses in the Server Topology section

    For example, if an Aggregation Server is disconnected, the Agents connected to that machine must now send their data to another Aggregation Server from their predefined server list. This could lead to an overload of those Aggregation Servers which take on the tasks of the disconnected ones. In this case, you can limit your display to the disconnected Aggregation Servers, to see at a glance which machines are affected. You can check if any of the remaining Aggregation Servers are close to or have reached their maximum number of connected devices.

    Tip

    To view the maximum number of devices which can connect to a single Aggregation Server, select the Gear Icon > Settings > Advanced Settings > epmservices > epmAddresses > connectionLimit. It is defined per machine (hostname) and depends on the machine's resources.
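The capacity check described in this step can be sketched in Python as follows. This is an illustration only: the hostnames, the counts, and the 90% warning threshold are assumptions, and the snippet expects you to copy each server's connected device count and its connectionLimit by hand, since no Aternity API is used here.

```python
def servers_near_limit(connected, limits, threshold=0.9):
    """Return (hostname, usage ratio) pairs at or above the threshold, busiest first.

    connected -- hostname -> number of connected devices (assumed to be read from the dashboard)
    limits    -- hostname -> connectionLimit configured for that hostname
    threshold -- assumed warning level, as a fraction of the limit
    """
    flagged = []
    for host, count in connected.items():
        ratio = count / limits[host]
        if ratio >= threshold:
            flagged.append((host, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up values for three remaining Aggregation Servers.
connected_devices = {"epm-01": 9500, "epm-02": 6100, "epm-03": 10000}
connection_limit = {"epm-01": 10000, "epm-02": 10000, "epm-03": 10000}

near_limit = servers_near_limit(connected_devices, connection_limit)
for host, ratio in near_limit:
    print(f"{host}: {ratio:.0%} of connectionLimit")
```

Because connectionLimit is defined per hostname, the limits are kept in their own mapping rather than as a single global value.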

  5. View the major or critical events on any Aternity component to troubleshoot your system.

    This dashboard displays all the defined events which occurred during the selected timeframe. If the severity is Cleared, you may take note of it, but you do not need to take action. If the severity is Major or Critical, start your troubleshooting by checking the component involved and then the machine.

    View the health events of the Aternity solution which occurred during the dashboard's timeframe
    Field Description
    Time

    Displays the time the health event occurred.

    Machine Name

    Displays the server hostname where the event occurred.

    Severity

    Displays the severity of the event. The options are:

    • Cleared: the problem no longer occurs.

    • Major: some of the system functionality is temporarily unavailable.

    • Critical: severe Aternity system failure.

    Event Type

    Displays the type of the event:

    • Communications alarm indicates communication problems between the system components, or with the LDAP, the gateway, the database, and so on.

    • Quality of service alarm occurs when a problem affects the quality of performance or of the data displayed by the dashboards.

      For example, when an Aggregation Server is down, the Agents redirect to another Aggregation Server, increasing its load and decreasing its performance. Alternatively, if the Dashboard Server does not manage to perform extracts, the displayed data is incomplete.

    • Processing error alarm occurs when a component experiences an internal error. For example, the Data Warehouse Server experienced an internal error and is about to stop.

    • Environmental alarm occurs when a system component cannot function because of a hardware problem, like not enough disk space, or because of corrupted data.

    • Operational violation occurs when an event prevents the successful operation of the component. For example, the component reaches its maximum number of connected devices.

    • Time domain violation occurs if the time of any of the system components is not in sync with Management Server time.

    • Other is a catch-all for other types of errors. For example, a server was not properly configured, or the Aternity system is starting up, or it cannot send emails with the alerts to the predefined email list, and so on.

    Message

    Displays the error message of the event.

    Module

    Displays the name of the solution component (not hostname) involved in the event. This can be the whole system, or just one component.

    Event severity color coding

    Displays the severity of the event. The options are green (cleared), yellow (major), or orange (critical).

    In the drop-down menu in the upper right corner of the section, you can choose to display only the cleared or only the major events, or display all events to see if a critical event cleared afterwards. For example, if a critical event reported that data collection stopped due to low disk space on the Data Warehouse Server, look at the later events in the Event Viewer to check if this critical event cleared and if the period of missing data was significant. If you have many such events, consider increasing the Data Warehouse Server disk space.

    The system also generates a critical event if any of the servers loses communication with the Management Server. Check the cause immediately and remedy it.

    Note

    Restarting the Management Server restarts the whole Aternity solution.
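The review of critical events described in this step (did the event clear, and how long did the problem last) can be sketched in Python as follows. It is only an illustration under stated assumptions: the events are entered by hand as dictionaries with hypothetical keys, a Cleared event is matched to a Critical one when the machine and module are the same, and 30 minutes is an arbitrary cut-off for a "significant" gap. Aternity itself may relate events differently.

```python
from datetime import datetime, timedelta

# Assumption: a gap of 30 minutes or more counts as significant.
SIGNIFICANT_GAP = timedelta(minutes=30)


def review_critical_events(events):
    """Pair each Critical event with the next Cleared event on the same machine and module."""
    ordered = sorted(events, key=lambda e: e["time"])
    report = []
    for index, event in enumerate(ordered):
        if event["severity"] != "Critical":
            continue
        cleared_at = next(
            (
                later["time"]
                for later in ordered[index + 1:]
                if later["severity"] == "Cleared"
                and later["machine"] == event["machine"]
                and later["module"] == event["module"]
            ),
            None,
        )
        gap = cleared_at - event["time"] if cleared_at is not None else None
        report.append(
            {
                "machine": event["machine"],
                "raised": event["time"],
                "gap": gap,  # None means the event has not cleared yet
                "needs_attention": gap is None or gap >= SIGNIFICANT_GAP,
            }
        )
    return report


# Made-up events for illustration.
events = [
    {"time": datetime(2024, 1, 10, 8, 0), "machine": "dw-01", "module": "DW", "severity": "Critical"},
    {"time": datetime(2024, 1, 10, 8, 10), "machine": "dw-01", "module": "DW", "severity": "Cleared"},
    {"time": datetime(2024, 1, 10, 9, 30), "machine": "epm-02", "module": "EPM", "severity": "Critical"},
]

report = review_critical_events(events)
for entry in report:
    print(entry["machine"], entry["gap"], entry["needs_attention"])
```

An event that never clears is always flagged, which mirrors the advice to check the cause of an outstanding critical event immediately.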

  6. In the Performance Counters section, check if any computer is running low on resources by viewing its CPU usage, Java virtual machine garbage collection, or heap size.

    By default, this section displays the information for all the computers in the solution during the dashboard's timeframe. To view the performance of only one machine, select its name in the Server Topology section.

    View the machine resource usage
    Field Description
    CPU Utilization (%)

    Displays the total percentage CPU usage of the machine within the timeframe of the dashboard. Check if CPU usage was high for long periods, and try to correlate that with the memory usage and the events which occurred in the system (see the Event Viewer section).

    JVM GC time per minute

    Displays the number of seconds per minute which the Java Virtual Machine (JVM) spent collecting garbage, during the timeframe of the dashboard. If this parameter frequently has high values, consider tuning the garbage collection parameters or increasing the computer's RAM.

    Heap size

    Displays the heap size during the timeframe of the dashboard. If you see a constant increase in the Java heap memory, it could be a sign that garbage collection is not performed properly or, if it recurs, an indication of a memory leak, which might lead to an Out Of Memory Error. Consider increasing the heap size, or perform memory leak detection to find out if you have a memory leak and determine its cause.

    Note

    This section does NOT show the performance of the Dashboard Server PC or the Database Server PC.
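The guidance for JVM GC time per minute and Heap size can be sketched in Python as follows. This is an illustration under assumptions: the samples are hypothetical values read by hand from the Performance Counters graphs, the heap unit (MB) is assumed, and a least-squares slope is just one simple way to quantify a "constant increase"; it is not how Aternity evaluates these counters.

```python
def gc_share(gc_seconds_per_minute):
    """Fraction of each minute the JVM spent collecting garbage."""
    return gc_seconds_per_minute / 60.0


def heap_slope_per_hour(samples):
    """Least-squares slope of heap size against time.

    samples -- list of (hours since start, heap size) pairs
    A clearly positive slope over a long timeframe suggests a steadily growing heap.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    numerator = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


# Made-up readings: heap size in MB taken once per hour.
heap_samples = [(0, 2000), (1, 2100), (2, 2200), (3, 2300)]
slope = heap_slope_per_hour(heap_samples)
print(f"heap grows by about {slope:.0f} MB per hour")

# Made-up reading: 6 seconds of garbage collection in one minute.
print(f"GC share: {gc_share(6):.0%}")
```

A rising slope on its own does not prove a memory leak; compare it over a longer timeframe and against the events in the Event Viewer section before changing the heap size.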

  7. Limit the view in the dashboard to only one machine to see its performance and health events.
    Limit the view to one machine
  8. To download a component's logs for further investigation, hover over the Connecting or Reporting fields in the Server Topology section and select Download Log to Client.

    This downloads a set of log files containing system information like configurations, server statistics, discarded data, events, errors, incident analysis, and so on. You cannot access logs from this screen for the Database Server and Aternity Dashboard Server.

    Inspecting log files of Aternity components
  9. Display the data for a longer period of time to see if a critical event repeats or if the heap size has been trending upward for a long time, or zoom in to check exactly when the CPU usage was at its highest.
    Select the timeframe of your dashboard
    Field Description
    Timeframe

    Choose the start time of the data displayed in this dashboard.

    This dashboard displays raw data in real time, refreshing every time you access it or whenever you manually refresh the browser page.

    You can access data in this dashboard going back up to 30 days (the retention period).