Troubleshoot a Single Activity (Troubleshoot Activity)

The Troubleshoot Activity dashboard displays the recent trend of an activity's response time, and lets you correlate performance with any of the many available attributes.

For example, if you received several calls about delays in sending email in Microsoft Outlook, use this dashboard to find the common threads which characterize the problem. Check multiple attributes: see if the network traffic volume is higher than normal, or whether the problem is associated with a certain type of device, operating system, department, location, product version, server, or even a specific VDI hypervisor. You can also check the Trends section to see how the activity's performance changed over a period of time, to determine when the problem started.

The Troubleshoot Activity dashboard

The dashboard breaks down each activity's response time into three parts for all client-server applications:

The response time of an activity is split into the client time (dark blue) plus the combined time (union) of the server time (light blue) and the network time (blue).

Server, network and client time
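To make the union relationship concrete, here is a minimal Python sketch (not Aternity's actual algorithm; the interval timings are invented for illustration) showing how a response time decomposes into client time plus the union of server and network intervals, counting overlapping stretches only once:

```python
def union_duration(intervals):
    """Total length covered by a set of (start, end) intervals,
    counting overlapping stretches only once."""
    total, current_end = 0.0, float("-inf")
    for start, end in sorted(intervals):
        if start > current_end:
            # Interval starts after the covered region: count it fully.
            total += end - start
        else:
            # Interval overlaps the covered region: count only the new part.
            total += max(0.0, end - current_end)
        current_end = max(current_end, end)
    return total

# Hypothetical activity: client works 0-0.2s, the request crosses the
# network 0.2-0.5s, the server processes 0.5-1.1s, the response crosses
# the network 1.1-1.4s, and the client renders 1.4-1.5s.
client_time = 0.2 + 0.1
server_network = union_duration([(0.2, 0.5), (0.5, 1.1), (1.1, 1.4)])
response_time = client_time + server_network
print(response_time)  # 1.5
```

Because network and server intervals can overlap when there are parallel conversations with the server, the union is at most (and often less than) the simple sum of the two parts.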

Procedure

  1. Step 1 Open a browser and log in to Aternity.
  2. Step 2 To access this dashboard, drill down from one of the following dashboards:
  3. Step 3 Check if the problem is critical in the Activity section. View the overall activity status, the number of activities, and the average response time during the timeframe of the dashboard.

    If the activity status is major and it affects a large number of users, it needs your immediate attention.

    Check the activity status and average response time
    Field Description
    Application

    Displays the name of the monitored application.

    Activity

    Displays the name of the monitored activity within the application.

    Activity Status

The status of an activity is based on its response time compared to the recently expected (baselined) response time. The statuses, in increasing severity, are Normal, Minor, Major, or Critical.

    Server Time (light blue)

Server time is the time the server requires to process data on the server side as part of an activity. It starts when the last message of the client's request arrives at the server, and ends when the server sends out the first message of its response. See server time.

    Network Time (blue)

    Network time is the total time (union) for all messages between the client and server to cross the network in either direction while performing an activity. This does NOT include the time used for processing the request on the server (server time), but may overlap if there are parallel conversations with the server. See network time.

    Client Time (dark blue)

    Client time is the time used by the device itself as part of an activity to process data before sending its first message request to the server and after the last message response arrives back from the server. See client time.

    Volume

    (Managed applications only) Displays the number of activities performed by people with this combination of attributes, hence adding weight to the impact of this problem. If the same user performs the same activity twice, it counts as two.

    Score (Status)

    The activity score is a value between zero and 100 (with a status and color) which condenses many activity statuses into a single value, and is calculated with an Apdex-inspired formula.
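Aternity's exact scoring formula is not given here, but since the score is described as Apdex-inspired, the classic Apdex calculation (scaled to 0-100, with a hypothetical threshold) gives a feel for how many statuses condense into one value:

```python
def apdex_score(response_times, threshold):
    """Classic Apdex: responses <= T are 'satisfied', <= 4T are
    'tolerating' (half credit), slower are 'frustrated' (no credit).
    Scaled to 0-100 to match the dashboard's score range.
    Illustrative only; not Aternity's published formula."""
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times
                     if threshold < t <= 4 * threshold)
    return 100 * (satisfied + tolerating / 2) / len(response_times)

# Four sample response times in seconds against a 1.5s target:
print(apdex_score([0.8, 1.2, 3.0, 9.0], threshold=1.5))  # 62.5
```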

  4. Step 4 Find a common thread by checking if the performance problems are connected to a specific department, location, device type, or operating system. Use the drop-down menu in the sections below the Activity area to display any of the available attributes.
    Select the attributes to troubleshoot the activity
    Note

Some activities may have a long response time, but their status is green. Use the Score to measure short-term (acute) sudden changes in performance, as it relies on recent baseline measurements. The score clearly reflects a recent change because the response time would differ significantly from the established baseline. For example, if a mail usually opens in 1.5s (the baseline response time), the system derives a minor threshold (a small departure from the baseline) and a major threshold (a significant departure). If performance suddenly (acutely) becomes much slower, like 5s, it would exceed the major threshold, and therefore have a red status with a low score.

Use the actual response times (not scores) to check the performance of chronic (long-term) problems. You cannot rely on measurements based on the recent baselines, because chronically slow responses skew the baselines and make those times look normal. Continuing the example, if opening a mail has taken 5s for several weeks, the system adjusts its baseline to 5s, so 5s now looks normal, and therefore has a green status with a good score, which is misleading.

    Field Description
    Servers

    Check if the problem occurs for all users who connect to specific servers. For example, sending a mail might be slow for only one Microsoft Exchange server.

    Business Locations

    Check if the problem impacts all end users working from a specific location (also collated by Cities, States, Countries, and Regions). You can also view this information on a map in the Geographies - Map section.

    For example, if performance is slow only for users in the office in North London (a business location), check the networking infrastructure of that specific site. But if the problem affects all the offices in London (under Cities), you can check the wider infrastructure which is common to all those locations, like a data center.

    Device Types

    Check if the problem only affects end users working on specific types of devices, like only those accessing the application on a tablet.

    Operating Systems

    Displays the full name and exact version number of the operating system (OS), but does not include the service pack number, so you can check if an issue appears only on certain operating systems.

    Data Center Locations

    (Virtual deployments only) Monitor the application's performance by:

• Data Center Locations in Aternity lists the locations of any virtual application servers (like Citrix XenApp) and VDI hypervisors (like VMware vSphere) which run the application. If the application is deployed both locally and virtually, one of the locations displays as Local.

    • Virtual App Servers displays the name of each virtual application server (like Citrix XenApp) running this application.

    For each item, it also displays the number of users, the usage time and wait time, the UXI, and (for managed applications only) the activity score.

    Hypervisors

(VDI deployments only) Displays the hypervisor name if your application is running in a virtual desktop environment, like VMware vSphere. You can check whether the drop in performance in some virtual machines (VMs) is concentrated around a specific hypervisor.

    Departments

    Check if the drop in performance is centered around a specific department, which can point to a configuration which is unique to that group of users.

    Regions

You can optionally define a region in Aternity to group together several locations under a single label, like the geographical region of EMEA, North America, or even Southern Europe, South-Western US, or any other grouping you choose.

    Countries

    Displays the country of the current location of the device.

    States

    Displays the geographical state of the current location of the devices (or area, if state is not applicable).

    Cities

    Displays the city of the current location of the device.

    Versions

    For a desktop application, this shows the version on the end-user's computer (not the server version).

    Geographies (Map)

    Displays the country/state/city of the current location of the device as a map.
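The note above about acute versus chronic slowness can be illustrated with a toy status classifier. The threshold factors below are invented stand-ins; Aternity derives its actual thresholds from learned baselines:

```python
def activity_status(response_time, baseline,
                    minor_factor=1.5, major_factor=2.5):
    """Classify a response time against thresholds derived from a
    baseline. The factors are hypothetical, for illustration only."""
    if response_time <= baseline * minor_factor:
        return "Normal"
    if response_time <= baseline * major_factor:
        return "Minor"
    return "Major"

# With a 1.5s baseline, a sudden (acute) 5s response lands beyond Major:
print(activity_status(5.0, baseline=1.5))  # Major
# If slowness persists, the baseline drifts up to 5s and the same
# response now looks Normal -- the misleading chronic case:
print(activity_status(5.0, baseline=5.0))  # Normal
```

This is why the documentation recommends using actual response times, not scores, when investigating chronic problems.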

  5. Step 5 Check the Trends section for any recent changes over the dashboard's timeframe in response times (the upper graph) or activity statuses (the middle graph).

Try to correlate a slowdown in performance (an increase in the response time) with an increase in network traffic volume (the lower graph). If an application uses resources (x% CPU or RAM) or sends x MB of network traffic during an activity, that alone does not prove the activity caused it. Because they happen at the same time, they are correlated (see Correlation vs. Causation). However, you can be reasonably confident that these device measurements occurred because of the activity.

    Check the evolution of the response time over the timeframe of the dashboard

    For example, if many users send emails with large attachments, this might slow down the performance (increase the response time) of the Outlook activities, as illustrated in the picture above.

    In the response time graph, check if the increase in the response time was due to a significant increase in server time (light blue), network time (blue) or client time (dark blue). For example, if you find that at certain times every day, the network time increases significantly, you can troubleshoot why there is a network slowdown at those times, and whether the problem is limited to a single location by viewing the Business Locations section of the dashboard.

Use the status graph (the middle graph) to view the recent changes in activity statuses over the timeframe, and correlate a change in statuses with a delay in the server, network or client time.

    Note

    To see the exact client, network and server time, hover over the response graph and read the values in the pop-up window.
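The correlation that the graphs let you eyeball can also be computed directly. This sketch calculates a Pearson correlation coefficient between response times and traffic volumes; the hourly sample values are invented for illustration:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series:
    +1 means they rise and fall together, -1 means they move oppositely."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical hourly samples: response time (s) vs. network traffic (MB).
response = [1.4, 1.5, 2.9, 3.1, 1.6]
traffic = [12, 14, 55, 60, 15]
print(pearson(response, traffic))  # close to 1: strongly correlated
```

A coefficient close to 1 supports (but, per Correlation vs. Causation, does not prove) the hypothesis that heavy traffic and slow responses go together.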

  6. Step 6 For virtual deployments, check the remote display latency to verify if this delay significantly worsens the end-user experience.

    In virtual environments, check if the cause for the delay is due to the client time, server time, network time, latency time, or a combination of those parameters.

    Remote display latency adds to existing delays

For applications which run in a virtual environment, the dashboard displays an additional Latency graph on the right side of the screen, showing the delay times so you can correlate them with the other response times and with the traffic. Select a point on the latency graph to see the corresponding dates and response times on the other graphs.

    Find a pattern between the remote display delays and other trends
  7. Step 7 Troubleshoot the activity response time of a single business location to determine the possible cause of a slow response time.
    Isolate the information concerning a single location

Select the location you want to troubleshoot in the Business Locations section. For example, if the London location has a status of Major (orange), select it so that all the other sections of the dashboard display only data for that location. By selecting different fields in the drop-down lists of the other sections, you can immediately see which features the devices with slow response times have in common:

    View the commonalities of the devices with long response time

    By selecting the location with a long response time (SanFran Building B) and then different attributes in the left side area of the dashboard, you can see in this example that the problematic devices are desktops, which run Microsoft Windows XP 32 bit and belong to the sales department. You can also see at a glance that the long response time is influenced by the intense network traffic.

Drill down to the Commonalities Analysis dashboard to see more details, such as the hour of the day when the response time is longest, the device hardware information (like memory size or number of CPU cores), or a breakdown of the data per user and device.

    Drill down to receive further information
    Field Description
    Time

    The time of the measurement (date, hours, minutes). For example, Nov 27, 2015 8:00 PM.

    Application

    Displays the name of the monitored application.

    Activity

    Displays the name of the monitored activity within the application.

    Normal

Normal refers to the status of an activity when its performance is good, since its activity response time is within the defined baseline performance of this activity. It is usually colored green.

    Minor

When an activity has a status of Minor (colored yellow), it indicates that the activity response time is slightly slower (a minor departure or deviation) than the defined baseline performance (minor activity threshold) of this activity.

    Major

When an activity has a status of Major (colored orange), it indicates that the activity response time requires attention, as it was significantly slower (a major departure or deviation) than the expected baseline performance (major activity threshold) of this activity.

    Critical

An activity has a status of Critical (colored red) when it is reported to Aternity as unavailable.

  8. Step 8 Change the Timeframe of the dashboard in the top right corner to a longer time interval to see when the problems started, or zoom in on short time intervals to help you troubleshoot the issues.

    Choose to see all the statuses of the activities to compare the slow response times to the normal ones, or focus only on the response times which exceed the thresholds by choosing Exclude Normal in the Status drop-down list.

    Select the timeframe and activity statuses to display
    Field Description
    Timeframe

    Choose the start time of the data displayed in this dashboard.

You can access data in this dashboard going back up to seven days (retention). The dashboard's data refreshes every five minutes.

    Status

    Choose whether to display only the activities which are exceeding SLA thresholds. Select one of the following:

    • Exclude Normal displays only activities which exceed their SLA thresholds.

    • All does not exclude data.