Troubleshoot an Application to Find a Common Thread (Commonalities Analysis for Application)

The Commonalities Analysis dashboard (for an application) performs automatic and intelligent troubleshooting on any application, by finding the common elements of a seemingly random problem. It checks through hundreds of possible culprits (like location, time of day, laptop model, and many more), and displays only the attributes with the highest concentration of poor performers.

Tip

There are two types of Commonalities Analysis dashboards, one for troubleshooting the overall performance of an application, and one for investigating a single user activity within an application. This one is for troubleshooting the overall performance of an application.

For example, if you receive random reports of slow performance from some users of a specific application, you would usually cycle through endless possible attributes, looking for the common thread of the problem so you can troubleshoot it. Perhaps they all use a six-inch tablet, or perhaps it only happens when people connect to a particular server, or between 5 and 7 AM on weekdays, or only on computers with more than four CPU cores, and so on. Commonalities Analysis checks all of this automatically in seconds, and lists the most obvious contenders in a simple, intuitive view, so that you can troubleshoot further.

Note

This dashboard is most effective at isolating a single thread which is common to all occurrences of a problem. However, if your issue is a complex combination of narrow criteria, for example if it only happens in Munich on Lenovo laptops with 8-core CPUs during peak hours, you may find that looking at any one of these elements is not enough to show a significant change in performance. To investigate complex patterns of data correlations, use the Analyze dashboards.

The Commonalities Analysis dashboard (for an application)

Choose the performance measurement you want to inspect with the Sort By menu in the top right corner. For example, Sort By > Crash Rate. Then look for the results whose measurement (horizontal bar) is significantly worse than its average value (vertical bar).

Isolate a measurement, like Crash Rate, which is significantly worse than others with a given attribute

In this example, the crash rate is much worse in a single location, so you can investigate recent changes in that location which may be the cause of these crashes.

Tip

Automatic correlation still requires a human to manually declutter any senseless or obvious suggestions. Discount any attributes which are dependent on each other.

For example, if your company mass-purchased the same configuration of devices for the office in Hyderabad, the same model laptop with 4GB RAM and 500GB HDD, and this dashboard shows a strong correlation between all those attributes in that location, it is hardly a surprise. See step 4 below to remove any unwanted attributes.

After you find a likely culprit, you can select it to view the users most impacted by this slow performance, and (if relevant) drill down to see more information about a specific user's device.

Procedure

  1. Open a browser and sign in to Aternity.
  2. Access the Commonalities Analysis dashboard for applications.
  3. Select the measurement which best expresses the symptoms of the problem in the Sort By menu in the top-right corner.

    There is an implicit hierarchy for the items in this menu, as one item often includes others within its calculation, so you can progressively narrow down the scope of the problem if you are not sure where to start.

    Narrow down the type of problem using the Sort By menu

    For example, if you are investigating a problem in an application, you can start by checking the overall UXI score. If hovering over the UXI shows that application crashes are the main reason for the poor overall value, you can zoom in on crashes, hangs and errors by selecting Health Index, then narrow it down further with Crash Rate to see just crash patterns, or check Hang Percent to see when the application is not responding. Alternatively, if you are investigating a drop in performance, select Performance Index, and then check the Wait Percent times. In addition, if your application is managed, you also have information on activity performance, like the Activity Score for acute problems, or the Activity Response time for chronic poor performance.

    Select the display of the last column and sort attributes according to this score or index

    When you select an item from Sort By, it displays that measurement in the rightmost column and sorts the attributes by that measurement. The vertical bar represents the average of this measurement during the dashboard's timeframe, so you can see if this is a mild or severe departure from the norm.

    Field Description
    Activity Response

    (For managed applications only) View the attributes associated with this level of performance, expressed as the average response time of all activities in this application.

    An activity response is the time, in seconds, taken for an application to complete an activity.

    Use the actual response times (not scores) to check the performance of chronic (long term) problems. You cannot rely on measurements based on recent baselines, as responses which have been chronically slow for some time skew the baselines, making those times look normal.

    Activity Score

    (For managed applications only) View the attributes associated with this level of performance, expressed as the overall activity score.

    The activity score is a single value (0-100) which summarizes the statuses of all activity response times.

    Use the score to measure short term (acute) recent or sudden changes from regular performance (baselined or manually predefined). For example, if an email usually opens in 1.5s (the baseline response time), Aternity derives a minor baseline (a small departure from the baseline) and a major baseline (a significant departure). If performance is suddenly (acutely) much slower, like 5s, it falls beyond the major baseline, and therefore has a red status with a low score.
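
    For illustration only, here is a minimal sketch of that kind of classification, using made-up threshold values (the real minor and major baselines are derived by Aternity from observed response times, not fixed numbers like these):

        # Hypothetical thresholds around a 1.5s baseline response time (illustrative only)
        MINOR_BASELINE = 2.5   # assumed: a small departure from the 1.5s baseline
        MAJOR_BASELINE = 4.0   # assumed: a significant departure from the baseline

        def activity_status(response_time_s):
            """Classify one activity response against the (assumed) baselines."""
            if response_time_s <= MINOR_BASELINE:
                return "normal"   # contributes to a high activity score
            if response_time_s <= MAJOR_BASELINE:
                return "minor"    # yellow status, reduced score
            return "major"        # red status, low score

        print(activity_status(5.0))   # a sudden 5s response falls beyond the major baseline -> "major"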

    Crashes

    View the attributes associated with this absolute number of crashes during the dashboard timeframe.

    Aternity registers a Windows app crash when the Event Log issues event ID 1000 (a process or DLL ends unexpectedly), event ID 1001 (.NET process ends unexpectedly), event ID 1002 (a user stops a Not Responding process), or event ID 1026 (.NET runtime error).

    Using an absolute measurement can be helpful, for example, to calculate the cost of the problem, because you can quickly multiply the number of units by the cost per unit to arrive at a rough $$$ estimate. But to detect larger patterns of performance, we recommend opting for a ratio or index measurement.
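
    For instance, a quick back-of-the-envelope calculation along those lines (the crash count and per-crash cost here are made-up figures):

        crashes = 250              # absolute number of crashes in the dashboard timeframe
        cost_per_crash = 12.50     # assumed: estimated cost of a single crash (lost work, support time)
        print(crashes * cost_per_crash)   # rough cost estimate: 3125.0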

    Crash Rate

    View the attributes associated with this crash rate (not the actual absolute number of Crashes).

    The crash rate of an application is the average number of crashes which occurred in that application during an hour of active usage. It is calculated as the total number of crashes divided by the total usage time in hours.
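
    As a simple illustration of that calculation (example numbers only):

        total_crashes = 18
        total_usage_hours = 240.0              # total active usage time, in hours
        crash_rate = total_crashes / total_usage_hours
        print(round(crash_rate, 3))            # 0.075 crashes per hour of active usage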

    Hang Percent

    View the attributes associated with the percentage of hang time out of the total usage time, to spot patterns in Windows applications which freeze.

    Hang time measures the time when an application is listed as Not responding in the Windows Task Manager while it is in the foreground (in use). This measurement is used to calculate the wait time of an application, and the overall UXI.
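
    A minimal sketch of the ratio behind this measurement (example values only):

        hang_time_minutes = 6.0        # time the application spent Not Responding in the foreground
        usage_time_minutes = 480.0     # total foreground usage time
        hang_percent = 100.0 * hang_time_minutes / usage_time_minutes
        print(round(hang_percent, 2))  # 1.25 (% of usage time spent hanging)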

    Health Index

    View the attributes associated with this health index value, to investigate reports of both hangs and crashes (or HTTP errors) in your application.

    The health index is a value (0-5) which reflects how much an application hangs, crashes or (for web applications) experiences web errors. If users experience frequent or severe crashes in the application, this index is lower.

    Number of Page Load Errors

    (For web applications only) View the attributes associated with this actual number of web page errors to investigate specific cases of availability problems in your web application.

    Web errors occur when an application receives an error in response to its HTTP request for a page load, like HTTP 40x client errors (such as Error 404 Page Not Found, or unauthorized access messages) and 50x server errors, affecting the whole page (not a missing element like an image).

    Using an absolute measurement can be helpful, for example, to calculate the cost of the problem, because you can quickly multiply the number of units by the cost per unit to arrive at a rough $$$ estimate. But to detect larger patterns of performance, we recommend opting for a ratio or index measurement.

    Page Error Rate

    (For web applications only) View the attributes associated with this web page error rate to investigate patterns of unavailable pages in your web application.

    The web page error rate is the percentage of web page loads which fail with an HTTP 40x or 50x error, out of all web page loads. This is one of the elements used when calculating the UXI.
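
    For example (made-up numbers), the rate is simply the failed page loads as a percentage of all page loads:

        page_loads = 12000
        failed_loads = 84                       # page loads which returned HTTP 40x or 50x for the whole page
        page_error_rate = 100.0 * failed_loads / page_loads
        print(round(page_error_rate, 2))        # 0.7 (% of page loads which failed)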

    Page Load Time

    (For web applications only) View the attributes associated with this average page load time in seconds, to investigate specific instances of slow performance.

    The web page load time is the time required for a web page to load and finish rendering in a browser, from sending a URL request to when the page's events finish loading and it has a status of Completed. This measurement does NOT include the time to load additional page elements which occur after the main page has loaded, such as iframes which embed separate web pages, AJAX calls made after the page is complete, or in-page bookmarks (# in the URL). It does include AJAX calls that the page makes before it is complete.

    Using an absolute measurement can be helpful, for example, to calculate the cost of the problem, because you can quickly multiply the number of units by the cost per unit to arrive at a rough $$$ estimate. But to detect larger patterns of performance, we recommend opting for a ratio or index measurement.

    Page Loads

    (For web applications only) View the attributes associated with this total number of web page loads requested, to understand the usage of this application, and the load on the web server which services these requests.

    Performance Index

    View the attributes associated with this performance index, to investigate reports of slow performance while the application is busy and keeping users waiting.

    The performance index is a value (0-5) which measures an application's responsiveness. If users must wait frequently or for long periods for an application to respond, its performance index is lower. It is calculated from the usage time and wait time.

    Remote Display Latency

    View the attributes associated with this remote display latency, to isolate performance issues for virtual applications, or applications which run in virtual desktop sessions.

    The remote display latency is the average time taken for the round trip of a network data packet between the front line user's device and the virtual server.

    Total Activities

    View the attributes associated with this number of activities performed. This is similar to Usage, but focuses specifically on the number of times anyone performed any defined user activity in this application.

    Usage

    View the attributes associated with this usage time, to determine the popularity of this application.

    The usage time of an application is the total time it is running, in the foreground, and being used. This includes the wait time, the time a user spends waiting for the application to respond. For web applications, the usage time is when both the browser window and the application's tab are in the foreground.

    UXI

    View the attributes associated with this UXI, to determine the overall experience (performance, health and usage) of this application.

    The User Experience Index (UXI) is a value (0-5) which measures the overall performance and health of applications, based on the number of crashes per hour out of the total usage time, the percentage hang time out of the total usage time, and the percentage wait time out of the total usage time. For web applications, it also uses the percentage of web page errors out of all page loads, and the average page load time.

    Wait Percent

    View the attributes associated with the percentage of wait time out of the total usage time.

    The wait time of a Windows application is defined as the time users spend waiting for the application to respond when it is actively running and in use (part of the usage time).

  4. To manually remove clutter, hide attributes which are dependent on each other, or correlations based on only a few examples, using the Attributes and Total Activities drop-down menus at the top of this section of the window.

    For example, if the Tokyo office uses Japanese Windows, and this dashboard shows a strong correlation between slow Japanese Windows and the Tokyo office, you cannot yet say if it is due to the OS or the location, or some other issue, since any problem in that location would obviously be associated with that operating system. So you must find other distinguishing attributes, like looking for that OS in other locations, or focusing on entirely different attributes to determine the culprit.

    Clear all attributes except the operating system where it had at least 10 hours of use
    Field Description
    Attributes (drop-down menu)

    Deselect the attributes to hide in the Attributes column, and select Apply.

    For example, if you definitely know that the data center is not the determining factor of the problem, you can remove all data centers listed in the Attributes column by deselecting this item from the menu.

    For a full list of monitored attributes, see View Data Monitored by Aternity.

    Show

    Select to remove Attributes whose usage time is below this value. This helps remove false correlations which are only present because of coincidence in a few cases.

    For example, if the dashboard reports that at several hours during the day, people who happened to sign in briefly experienced slow performance, you may decide that this is not the area you wish to focus on for this investigation.
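
    A rough sketch of the kind of filtering this performs (the attribute values and the 10-hour cut-off are illustrative, echoing the example above):

        MIN_USAGE_HOURS = 10   # value chosen in the Show menu

        attributes = [
            {"value": "Windows 10 (Japanese)", "usage_hours": 340.0},
            {"value": "Windows 11 (English)",  "usage_hours": 2.5},   # only a few brief sessions
        ]

        # Keep only attributes with enough usage time to be meaningful for this investigation
        kept = [a for a in attributes if a["usage_hours"] >= MIN_USAGE_HOURS]
        print([a["value"] for a in kept])   # ['Windows 10 (Japanese)']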

  5. To view the impacted users of a single common attribute (like all those using this application on Windows 10), select that row from the Attributes section, and view the Users with Worst Performance section.
    Tip

    You can display the data so that the worst or best performance appears at the top, by selecting View > Worst or View > Best from the top bar. The differences between good and poor user or device performance could give you clues about how to improve it.

    View specific devices behind this attribute and value

    At a glance, you may be able to spot additional common attribute values in the columns of the Users section, which can help isolate more common themes of poor performance.

    Field Description
    Username

    Displays the username signed in to the device's operating system.

    Device Name

    Displays the hostname of the monitored device. View it in the Windows Control Panel > System > Computer Name, or on Apple Macs in System Preferences > Sharing > Computer Name.

    Device Type

    Displays the type of device reporting performance to Aternity.

    Memory

    Displays the size of physical RAM of the device.

    Department

    Displays the name of the department to which the user or the device belongs.

    (Windows) The Agent sends LDAP queries to Active Directory (AD) to find information from the connected domain controller, then extracts the user's Properties > Department field.

    (Mobile) Mobile apps can set this manually in the Aternity Mobile SDK.
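
    This is not the Agent's own code, but as a rough illustration of that kind of directory lookup, a query like the following (using the third-party ldap3 Python library, with placeholder server, account, and user names) returns the standard AD department attribute of a user object:

        from ldap3 import ALL, Connection, Server   # third-party library: pip install ldap3

        # Placeholder domain controller and credentials; real values come from the connected domain
        server = Server("dc01.example.com", get_info=ALL)
        conn = Connection(server, user="EXAMPLE\\svc_lookup", password="********", auto_bind=True)

        # Look up the signed-in user's entry and read its "department" attribute
        conn.search(
            search_base="dc=example,dc=com",
            search_filter="(sAMAccountName=jsmith)",
            attributes=["department"],
        )
        print(conn.entries[0].department)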

    Usage Time

    The usage time of an application is the total time it is running, in the foreground, and being used. This includes the wait time, the time a user spends waiting for the application to respond. For web applications, the usage time is when both the browser window and the application's tab are in the foreground.

    UXI

    The User Experience Index (UXI) is a value (0-5) which measures the overall performance and health of applications, based on the number of crashes per hour out of the total usage time, the percentage hang time out of the total usage time, and the percentage wait time out of the total usage time. For web applications, it also uses the percentage of web page errors out of all page loads, and the average page load time.

    Activity Score

    (For managed applications only) Displays the overall activity score for this application, calculated by condensing all the activity statuses into a single value. Use this for acute (recent) problems in performance.

    <Final column>

    Displays the measurement you chose in the Sort By menu in the top right of the dashboard.

  6. To troubleshoot further on a single attribute, drill down to the Analyze Applications dashboard.

    The system automatically picks the application, the relevant measurement group, and the associated breakdowns, to allow you to start your investigations from the correct place.

    Drill down to analyze further information on this attribute
  7. To view more information on a single device, hover over its measurements, and drill down to one of the dashboards offered there.
    View more information on a single device by hovering over its measurements
  8. You can limit the display of the dashboard using the menus at the top of the window.
    Limit the scope of data displayed in this dashboard
    Field Description
    Timeframe

    You can change the start time of the data displayed, using the Timeframe menu in the top right corner of the dashboard.

    You can access data in this dashboard going back up to 14 days (the data retention period). The dashboard's data refreshes every 10 minutes.

    View

    Select from this menu to display the best or the worst performances in the lower pane of the dashboard.

    Sort By

    Select to determine the contents of the rightmost column.