Correlate an Activity with the Device's Resources (Activity Resource Analysis)

The Activity Resource Analysis dashboard shows how an activity directly affects the hardware and system resources of a device running Windows. For example, you can watch the effect of opening an email on the device's CPU usage. This level of detail helps you form better theories about the possible causes of a problem, so you can perform a deeper root cause analysis (RCA) of the issue.

The Activity Resource Analysis dashboard

You can isolate a single activity reported from this (non-mobile) device at a specific time, and view all the device-related events which occurred while the activity took place.

For example, if the CPU usage of a device spikes above 80% during three occurrences of an activity, you can investigate this correlation, to determine the reason why the activity might be causing such behavior.

Step-by-step in Activity Resource Analysis to view the device measurements which occurred during an activity

When you first open the dashboard, it initially displays only two of its sections: Applications and Device Details. Select an application name to view its activities and processes, then select an activity name to view the status of each occurrence in a graph with its SLA thresholds and baselines. Finally, select a single activity to view the exact device measurements which occurred at the time of the activity.

Procedure

  1. Open a browser and sign in to Aternity.
  2. You can only access the Activity Resource Analysis dashboard by drilling down from a dashboard which displays a list of devices.
    The starting point of the Activity Resource Analysis dashboard has just two sections

    The dashboard opens with only two sections:

    Field Description
    Applications

    For each monitored application, view every instance of every activity performed from this device during the dashboard's timeframe. It displays each occurrence as a circle with its status.

    Tip

    This could display too many circles, as it displays all the activities in a single row. Try viewing each activity separately, by selecting the application name (see below).

    Device Details

    Displays the device-related events which occurred during the performance of an activity, for example the CPU usage or the amount of data sent to the network. For descriptions of each item in the list, see below.

    Each row displays the activity score for this set of activities, and in some cases, it also shows the maximum and average measurements for that row. Each circle represents an activity which was performed at a certain time on this device. The horizontal axis is time, where the start and end times are the dashboard's timeframe.

    Activities which occurred at a specific time
    Field Description

    Green activity

    A green activity has a normal status: its response time is as expected, meaning it is faster than the minor baseline for this activity.

    Yellow activity

    A yellow activity has a minor status when its response time is slower than expected, since it passed the minor baseline for this activity.

    Orange activity

    An orange activity has a major status when its response time is significantly slower than expected, since it passed the major baseline for this activity.

    Red activity

    A red activity has a critical status, when the activity failed to respond.

    In this dashboard, there are also events with no statuses:

    Blue / purple activity

    If an event or activity does not have any baseline, like a read event of the hard disk, it cannot have a status, and therefore it displays a shade of purple or blue.
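The status colors above can be read as a simple decision rule. The following is a minimal sketch of that rule, not Aternity's implementation: the function name and its parameters are hypothetical, a response time of `None` stands for an activity which failed to respond, and a response time exactly equal to a baseline is assumed not to have passed it.

```python
from typing import Optional


def activity_status(response_time: Optional[float],
                    minor_baseline: Optional[float],
                    major_baseline: Optional[float]) -> str:
    """Return the color of one activity occurrence (hypothetical helper).

    response_time is None when the activity failed to respond.
    Both baselines are None for events which have no baseline.
    """
    if minor_baseline is None or major_baseline is None:
        return "blue"      # no baseline, so the event cannot have a status
    if response_time is None:
        return "red"       # critical: the activity failed to respond
    if response_time > major_baseline:
        return "orange"    # major: passed the major baseline
    if response_time > minor_baseline:
        return "yellow"    # minor: passed the minor baseline
    return "green"         # normal: faster than the minor baseline


# Hypothetical baselines of 2 and 4 seconds for one activity.
for rt in (1.0, 3.0, 5.0, None):
    print(rt, activity_status(rt, minor_baseline=2.0, major_baseline=4.0))
```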

  3. Select an application name from the Applications section.

    The dashboard expands to reveal two new sections: Activities and Processes (for desktop applications only).

    Select an application name to display each of its activities and processes
    Field Description
    Activities

    Displays the list of activities for the selected application, and shows each occurrence of the activity during the timeframe as a circle with its status.

    Processes (desktop applications only)

    Displays the device processes and their measurements at the time of an activity which are presumably a direct consequence of the demands of the application.

    If an application uses resources (x% CPU or RAM) or sends x MB of network traffic during an activity, that does not by itself mean the activity caused it. The two happen at the same time, so they are correlated (see Correlation vs. Causation). However, because these measurements belong to the application's own process, you can be reasonably confident that they occurred because of the activity.

    As an example of when measurements do not indicate causation: if a device runs a presentation and a graphics application at the same time, the presentation may use only 20% CPU during an activity, but because other apps are working at the same time, the overall CPU usage may be as high as 90%.
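The presentation example can be worked through as simple arithmetic. This is only an illustrative sketch: the function name is hypothetical, and it assumes that the process reading and the device reading are percentages on the same scale, which may not hold for every row in the dashboard.

```python
def cpu_from_other_processes(process_cpu_pct: float, device_cpu_pct: float) -> float:
    """Return the part of the device's CPU usage not explained by the process."""
    for pct in (process_cpu_pct, device_cpu_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    # Clamp at zero in case the two readings were sampled at slightly different moments.
    return max(float(device_cpu_pct) - float(process_cpu_pct), 0.0)


# The presentation uses 20% CPU while the whole device is at 90%.
other = cpu_from_other_processes(process_cpu_pct=20, device_cpu_pct=90)
print(f"{other}% of the CPU load comes from other processes")
```

Here most of the load comes from other processes, so the high device CPU is correlated with the activity without being caused by it.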

  4. Select an activity name to view its graph on the right-hand side, where the same activity occurrences are spaced out more clearly and slower responses appear above faster ones.
    View the graph of the occurrences of an activity

    Configure the graph to display the activity's various baselines and SLA thresholds using the Reference Lines menu in the top bar, to understand the reason each activity has its status. If it falls below a baseline, it has one status, while above that baseline it takes on a different status.

    View the activities with different baselines and thresholds by selecting from the Reference Lines menu
    Field Description
    Just Measurements

    Select this to display the activities without any baselines or SLA thresholds.

    With Thresholds

    Select this to display the activities with their SLA thresholds.

    With Baselines

    Select this to display the activities with their baselines.

    With Thresholds and Baselines

    Select this to display the activities with their baselines and SLA thresholds.

    Tip

    You can view any of the device measurements in more detail by selecting one or more of their names to view their graphs on the right-hand side.

    Field Description

    Minor baseline (yellow dotted line)

    The Minor baseline threshold is a response time for a specific activity which is slower than expected. Anything slower than this threshold changes the activity's status to minor, because it is a minor departure from the expected performance time. By default, the minor threshold is set at the 90th percentile of response times for each activity in each location, so only the slowest 10% exceed it. If the activity is faster than this time, its status is normal.

    Major baseline (red dotted line)

    The Major baseline threshold is a response time for a specific activity which is significantly slower than expected. Anything slower than this threshold changes the activity's status to major, which is a call to action, because it is a major departure from the expected performance time. By default, the major threshold is set at the 97th percentile of all response times for each activity in each location, so only the slowest 3% exceed it.

    Internal SLA threshold (yellow solid line)

    The Internal threshold is the response time (in seconds) of an activity which would be your early warning, showing you are at risk of exceeding your official service level agreement (SLA) with your customers. Any response longer than this threshold is colored yellow in the SLA dashboard, as it warns you risk breaking the SLA commitment for this activity. A customer service representative can configure this threshold in the activity's monitor.

    External SLA threshold (red solid line)

    The External threshold represents the maximum response time (in seconds) of an activity as defined in your official service level agreement (SLA) with your customers. Any response longer than this threshold is colored red in the SLA dashboard, since it breaks the official SLA commitment for this activity. You can configure the thresholds in Managed Applications > Managed Activities > Edit.
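To make the default percentile baselines concrete, the sketch below derives a minor and a major baseline from a list of response times. It is an illustration under assumptions: the nearest-rank percentile method and both function names are choices made for this example, and the product may calculate its baselines differently.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


def default_baselines(response_times):
    """Return (minor, major) baselines at the 90th and 97th percentiles."""
    return (nearest_rank_percentile(response_times, 90),
            nearest_rank_percentile(response_times, 97))


# Hypothetical response times in milliseconds for one activity in one location.
samples_ms = list(range(1, 101))
minor, major = default_baselines(samples_ms)
print(minor, major)
```

With these samples, responses slower than the first value would be minor and responses slower than the second would be major.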

  5. Select a single occurrence of an activity to highlight it and view the device measurements which occurred during that activity's performance.
    Correlate an activity with measurements for the whole device in the Device Details section

    The Device Details section highlights the measurements observed on the device at the time of the activity. These metrics are at the level of the entire device, so the activity may or may not be their direct cause.
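Conceptually, highlighting works like a time-window filter: keep only the device measurements whose timestamps fall inside the activity's start and end times. The sketch below shows that idea with made-up data; the function name, the sample values and the tuple layout are all hypothetical and do not reflect how Aternity stores its data.

```python
from datetime import datetime, timedelta


def measurements_during(activity_start, activity_end, measurements):
    """Return the (timestamp, value) samples recorded while the activity ran."""
    return [(ts, value) for ts, value in measurements
            if activity_start <= ts <= activity_end]


# Hypothetical activity lasting 12 seconds, with device CPU samples around it.
start = datetime(2024, 1, 15, 9, 0, 0)
end = start + timedelta(seconds=12)
cpu_samples = [
    (start - timedelta(seconds=5), 35.0),   # before the activity
    (start + timedelta(seconds=3), 82.0),   # during the activity
    (start + timedelta(seconds=9), 91.0),   # during the activity
    (end + timedelta(seconds=5), 40.0),     # after the activity
]

highlighted = measurements_during(start, end, cpu_samples)
peak = max(value for _, value in highlighted)
print(len(highlighted), peak)
```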

    Field Description
    Boot

    Displays the total boot time of the device.

    Device Health Event

    Displays any occurrence of a device health event during the dashboard's timeframe.

    Device CPU

    Displays the percentage CPU utilization of the core with the greatest usage at a given time. For example, if the device has four CPU cores, where one is at 80%, one is at 60% and the others are idle, it will display a value of 80%.

    Max CPU Core Utilization

    Displays the usage of the individual CPU core with the highest percentage usage at a given time. Look for 100% for a length of time (a flat line), indicating a process is stuck and hogging that core's resources. For example, if the device has four CPU cores, where one is at 100% usage and the others are idle, it will display a value of 100%.

    Disk IO Read

    Displays the rate at which the device reads from the hard disk in MB per second at any given time during the activity.

    For example, if a virus scanner slows performance by issuing many disk read requests, reschedule the scan to off-peak times. Alternatively, if the read rate falls to almost zero, the hard disk may be failing, or its connection to the computer may be unreliable.

    Disk IO Write

    Displays the rate at which the device writes to the hard disk in MB per second at any given time during the activity.

    For example, a movie editor can perform large disk writes, slowing down the device's performance. Alternatively, if the write rate falls to almost zero, the hard disk may be failing, or its connection to the computer may be unreliable.

    Disk Queue Length

    Displays the number of waiting I/O requests to read or write to the hard disk or a logical disk at a given time during the activity.

    A consistent queue for the disk indicates a bottleneck in hard disk access, which significantly impacts system performance. The cause can be excess system demands on the disk or a hardware problem with the disk. To check whether the problem is hardware, see if the speed (rate of reads and writes to the disk) is low.

    Network IO Read

    Displays the data downloads of this device in MB per second at any given time during the activity.

    For example, if its throughput or usage of bandwidth is low, and the user complains of slow network connections, consider checking the NIC hardware.

    Network IO Write

    Displays the data uploads from this device in MB per second at any given time during the activity.

    For example, if its throughput or usage of bandwidth is low, and the user complains of slow network connections, consider checking the NIC hardware.

    Device Physical Memory

    (Windows, Macs, mobile) Displays the percentage usage of the device's physical RAM memory at a given time during the activity.

    Device Virtual Memory

    (Windows only) Displays the current usage of a device's virtual memory as a percentage of the device's total virtual memory (physical RAM plus hard disk allocation for memory page faults) at a given time during the activity.

    High usage of virtual memory slows performance significantly, because the hard disk is 1000 times slower than physical memory (RAM). To resolve this, add more RAM to the device.

    Network Speed - WiFi

    (Macs and in Windows from Agent 9.2) Displays the potential speed (bandwidth) of the WiFi connection at that moment, in megabits per second (Mbps). Lower WiFi bandwidth can be due to poor signal strength or overlapping channels, which slows the network time. In Windows, see the potential speed in the Control Panel > Network and Sharing > Adapter Settings > Status of the WiFi connection. In Macs, view it in About This Mac > System Report > Network > Wi-Fi.

    WiFi network speed on Windows and Macs
    Signal Strength - WiFi

    (Windows Agent 9.2 or later, Macs and mobile devices) Displays the percent strength of the WiFi signal which the device receives, which can impact communication speed. For more details, hover your mouse over the graph in the dashboard to see the name of the WiFi network connection (SSID), the wireless network card MAC address (BSSID), and the WiFi channel.

    View the details of the wireless network connection in Aternity
    WiFi Channel

    (From Agent 9.2 or Agent for Mac 2.3) Displays the channel number which your device uses to connect to the WiFi router. Use this to ensure channels do not overlap one another in the same physical space. Your network performance significantly drops if a nearby WiFi router uses an overlapping channel with the same network speed.
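As a rough aid for reading the WiFi Channel value, the sketch below checks whether two channels overlap. It is a simplification under stated assumptions: it covers only the 2.4 GHz band (channels 1 to 13) with channels about 20 MHz wide and centers 5 MHz apart, so two channels overlap when they are fewer than five channel numbers apart. The function name is hypothetical, and the 5 GHz band is not handled.

```python
def channels_overlap_2_4ghz(channel_a: int, channel_b: int) -> bool:
    """Return True if two 2.4 GHz WiFi channels overlap (assumes 20 MHz width)."""
    for channel in (channel_a, channel_b):
        if not 1 <= channel <= 13:
            raise ValueError("this sketch only covers 2.4 GHz channels 1 to 13")
    # Channel centers are 5 MHz apart, so channels closer than five numbers
    # share part of their spectrum. The same channel counts as overlapping.
    return abs(channel_a - channel_b) < 5


print(channels_overlap_2_4ghz(1, 6))   # the commonly used 1 / 6 / 11 plan
print(channels_overlap_2_4ghz(1, 3))
```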

    For a stronger correlation, view the highlighted circles in the Processes section (for desktop applications only) to see the device measurements which are associated directly with this application (process) and can therefore be tightly coupled with the activity.

    Correlate an activity with measurements for just this application in the Processes section
    Field Description
    CPU

    View the percentage CPU utilization of this Windows process while it performs an activity, measured as a percentage of the total power of all CPU cores available.

    Compare this with the Device CPU readings to understand whether this application is the cause of any spike in CPU readings.

    Physical Memory

    View the amount of working set memory in GB for this Windows process while it performs an activity.

    If the activity always coincides with a spike in memory consumption, this is probably the cause of slow performance.

    Virtual Memory

    View the amount of reserved memory (commit size) in GB for this Windows process, while it performs an activity.

    If the activity always coincides with a spike in memory consumption, this is probably the cause of slow performance.
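One way to test whether an activity "always coincides" with a memory spike is to count how many of its occurrences had a peak reading above some limit. The sketch below does this with invented numbers; the function name and the spike threshold are hypothetical, since the dashboard does not define what counts as a spike.

```python
def spike_coincidence(peaks_during_activity, spike_threshold):
    """Return the fraction of occurrences whose peak reading exceeded the threshold."""
    if not peaks_during_activity:
        raise ValueError("at least one occurrence is required")
    hits = sum(1 for peak in peaks_during_activity if peak > spike_threshold)
    return hits / len(peaks_during_activity)


# Hypothetical peak working-set readings in GB, one per activity occurrence.
peaks_gb = [1.8, 1.0, 2.0, 1.2]
ratio = spike_coincidence(peaks_gb, spike_threshold=1.5)
print(f"{ratio:.0%} of occurrences coincided with a spike")
```

A ratio close to 1.0 supports the theory that memory consumption is related to the slow performance, while a low ratio suggests looking elsewhere.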

  6. To view more details of one of the measurements, hover your mouse pointer over the circle for a standard activity, or for a Skype for Business/Lync activity.

    For an occurrence of a standard activity or device measurement, you can view the following details.

    View the details of a single occurrence of an activity
    Field Description
    Activity

    Displays the name of the monitored activity within the application as it appears in the dashboards.

    Application (only in dashboards with multiple applications)

    Displays the name of the monitored application, as it appears throughout the system. You can customize it when you add it as a managed application.

    Recorded At

    Displays the time of the occurrence of this activity or device measurement.

    Activity Response

    (For managed applications only) Displays the response time of the activity. The response times of activities are split into client time (light blue), and the combination or union of the backend time (dark blue) and the network time (blue).

    Activity response time splits into network, server and client time

    Use the actual response times (not scores) to check the performance of chronic (long-term) problems. You cannot rely on measurements based on recent baselines, because responses which have been chronically slow for some time skew the baselines and make those slow times look normal.

    Client Time

    Displays the client time for this activity. Client time is the time used by the device itself as part of an activity to process data before sending its first message request to the server and after the last message response arrives back from the server.

    Infra Time

    Displays the infra time for this activity. Infra time is the total time spent outside the client. It starts with the first request to the server and ends when the final response arrives at the client.

    Latency

    (Virtual sessions only) Displays the remote display latency.

    Status

    Displays the status of this activity, if it has baselines defined.

    If there are no baselines, it displays in one of the shades of blue.
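The relationship between the response time, client time and infra time in the details above can be sketched as follows. This assumes that client time and infra time do not overlap and together make up the whole response time, which matches the definitions given here; the function name and the sample values are hypothetical.

```python
def response_breakdown(client_time: float, infra_time: float) -> dict:
    """Split a response time (in seconds) into client and infra shares."""
    total = client_time + infra_time
    if client_time < 0 or infra_time < 0 or total == 0:
        raise ValueError("times must be non-negative and not both zero")
    return {
        "response_time": total,
        "client_share": client_time / total,
        "infra_share": infra_time / total,
    }


# Hypothetical occurrence: 1 second on the device, 3 seconds outside it.
breakdown = response_breakdown(client_time=1.0, infra_time=3.0)
print(breakdown)
```

A large infra share points to the network or backend, while a large client share points to the device itself.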

    You can drill down to troubleshoot further on this activity by accessing:

    For an occurrence of a Skype activity you can view the following details.

    Skype for Business/Lync activity details
    Field Description
    Application (only in dashboards with multiple applications)

    Displays the name of the monitored application, as it appears throughout the system. You can customize it when you add it as a managed application.

    Recorded At

    Displays the time of the occurrence of this activity or device measurement.

    Combined MOS

    The combined MOS score (and status) for a device is the lower value of the inbound MOS and outbound MOS scores in a call.

    Inbound MOS

    The inbound MOS (or inbound listening MOS) for someone in a call is the MOS score of the incoming audio or video, showing if you clearly hear others in the call over background noise or a poor connection (inbound network MOS). The inbound MOS of a listener is the same as the outbound MOS of the speaker.

    Outbound MOS

    The outbound MOS for someone in a call is the MOS score of your outgoing audio or video, showing if others clearly hear you in the call over background noise or a slow network (inbound network MOS).

    Call

    Displays the way a user created a Skype for Business or Lync call:

    • Incoming are the people who answered a Skype or Lync call.

    • Outgoing are the people who dialed a Skype or Lync call.

    For example, if you have a call center and expect most calls to be incoming, you can confirm this expectation by monitoring the dominant call direction.

    Call Type

    There are two types of calls in Skype for Business or Lync: Audio only or Audio/Video.

    Call Mode

    There are two modes of calls in Skype for Business or Lync: Direct between two devices, or Conference, where more than two devices connect to a bridge to participate in a call. Each connection to a call appears in the dashboards as a separate entry.

    End Call Reason

    Displays the quality and performance of calls which ended in different ways:

    • Ended Successfully: Calls which started and ended normally, with no unexpected disconnections.

    • Disconnected with Error: A call is dropped if Skype for Business or Lync ended the call unexpectedly, without the user manually ending the call. Aternity reports the failure and its reason.

    • Failed Calls: A call fails if Skype for Business or Lync could not successfully establish a connection and start. Aternity reports the failure and its reason as the SIP code and SIP string.

    Callee Device

    The device of the callee (a Microsoft term) is the type of device used by the other participant in a Skype or Lync call:

    • PC indicates the other participant used Skype for Business or Lync running on a Windows desktop or laptop.

    • Conference Bridge indicates that this user was in a conference call, where every participant connects via the bridge. Hence the callee is the conference bridge.

    • iPhone indicates the other participant used the mobile iOS version of Skype for Business or Lync on an iPhone.

    • iPad indicates the other participant used the tablet iOS version of Skype for Business or Lync on an iPad.

    • Android indicates the other participant used the Android version of Skype for Business or Lync on an Android tablet or phone.

    • Mac indicates the other participant used the Mac version of Skype for Business or Lync on a Mac desktop or laptop.

    • Other can refer to a gateway or mediation server.

    Capture Device Name

    A capture device is a microphone, either built-in or standalone, used for collecting audio input to a Skype / Lync call.

    Capture Device Driver Ver.

    Displays the name and full version of the driver which supports the capture device in a Skype call. A capture device is a microphone, either built-in or standalone, used for collecting audio input to a Skype / Lync call.

    Render Device Name

    The render device is a participant's speaker or headphones which outputs the audio of a Skype / Lync call.

    Render Device Driver Ver.

    Displays the full version and manufacturer of the driver which supports the audio output (render) device.

    Audio Inbound Jitter

    Displays the differences (variance) in the delay of incoming audio packets from the other caller, or (in conference calls) from the Skype server to a caller, measured in milliseconds.

    Wide differences in delay (above 30 ms) mean that some packets are much slower than others, so when they arrive at the other end, the order of the packets is jumbled, which creates a choppy or distorted sound. This is usually caused by network congestion, but you can counter it with a large enough buffer to re-order the jumbled packets.

    Audio Outbound Jitter

    Displays the differences (variance) in the delay of outgoing audio packets reaching the other caller, or (in conference calls) from a caller to the Skype server, measured in milliseconds.

    Wide differences in delay (above 30 ms) mean that some packets are much slower than others, so when they arrive at the other end, the order of the packets is jumbled, which creates a choppy or distorted sound. This is usually caused by network congestion, but you can counter it with a large enough buffer to re-order the jumbled packets.

    Audio Inbound Packet Loss

    Displays the percentage of audio network packets in a Skype call which were lost in transit before reaching the participant. Any value above 5% affects audio quality significantly.

    Audio Outbound Packet Loss

    Displays the percentage of audio network packets in a Skype call which were lost in transit on their way to the other caller, or (in conference calls) from a participant to the Skype server. Any value above 5% affects audio quality significantly.

    Audio Outbound Round Trip Time

    Displays the time for an audio packet on a Skype call to reach the destination and come back again to the caller.

    Audio Inbound Codec Name

    Displays the name of the codec which Skype used to understand the incoming compressed sound.

    Skype dynamically chooses the best codec to compress the audio signal, based on the bandwidth available and ensuring the recipient can decompress the audio on the other side.

    Audio Outbound Codec Name

    Displays the name of the codec which Skype used to compress the outgoing sound.

    Skype dynamically chooses the best codec to compress the audio signal, based on the bandwidth available and ensuring the recipient can decompress the audio on the other side.

    Audio Forward Error Correction Used

    Displays True if Skype dynamically switched on forward error correction (FEC) in a call, to combat packet loss. FEC sends extra packets containing redundant information, to help it complete the audio stream on the other end, hence it uses more bandwidth.
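The numeric rules in the call details above (the combined MOS as the lower of the two MOS scores, the 30 ms jitter limit and the 5% packet loss limit) can be combined into a small check. This is a sketch only: the class, field and function names are hypothetical and do not correspond to any Aternity or Skype API.

```python
from dataclasses import dataclass

JITTER_LIMIT_MS = 30.0        # wide differences in delay start above 30 ms
PACKET_LOSS_LIMIT_PCT = 5.0   # loss above 5% affects audio quality significantly


@dataclass
class CallSample:
    """Hypothetical container for the measurements of one call."""
    inbound_mos: float
    outbound_mos: float
    inbound_jitter_ms: float
    outbound_jitter_ms: float
    inbound_packet_loss_pct: float
    outbound_packet_loss_pct: float


def combined_mos(call: CallSample) -> float:
    """The combined MOS is the lower of the inbound and outbound MOS."""
    return min(call.inbound_mos, call.outbound_mos)


def quality_warnings(call: CallSample) -> list:
    """List the measurements which exceed the limits described above."""
    warnings = []
    if call.inbound_jitter_ms > JITTER_LIMIT_MS:
        warnings.append("inbound jitter above 30 ms")
    if call.outbound_jitter_ms > JITTER_LIMIT_MS:
        warnings.append("outbound jitter above 30 ms")
    if call.inbound_packet_loss_pct > PACKET_LOSS_LIMIT_PCT:
        warnings.append("inbound packet loss above 5%")
    if call.outbound_packet_loss_pct > PACKET_LOSS_LIMIT_PCT:
        warnings.append("outbound packet loss above 5%")
    return warnings


# Invented sample values for one call.
call = CallSample(4.1, 3.2, 12.0, 45.0, 1.0, 6.5)
print(combined_mos(call), quality_warnings(call))
```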

  7. You can limit the display of this dashboard using the menus at the top of the window.
    Select the data to display in the dashboard
    Field Description
    Time Zone Selection

    Select the time zone to view the times associated with the data in this dashboard:

    • Default displays the time zone set in most Aternity dashboards.

    • Yours displays the time zone of your computer where you are viewing the dashboard.

    • Device displays the time zone of the monitored device which is the focus of this dashboard's content.

    Timeframe

    You can change the start time of the data displayed in this dashboard in the Timeframe menu in the top right corner of the dashboard.

    You can access data in this dashboard (retention) going back up to seven days.

    This dashboard displays raw data in real time, refreshing every time you access it or whenever you manually refresh the browser page.

    Reference Lines

    Configures the thresholds to display in the graphs on the right side of the dashboard (see the step above).

    Username / Hostname

    Displays the information for a user who performs a certain activity on one or multiple devices (except for mobile devices), or displays the information for a device which has one or multiple users perform a specific activity during the period of time selected in the Timeframe. For example, if you have a user who reads his Outlook mail on his laptop and on his desktop, you can see the data for both devices, or you can limit the display to one device. If you have a device (hostname) which has several users running the same application, you can choose to display the data regarding all the users (usernames), or for one user only.

Example

To troubleshoot a user complaining of slow performance reading emails:

  1. Use the Device Inventory dashboard to view the details of the user's device.

  2. Drill down to the Activity Resource Analysis dashboard to check if the slowdown is due to Microsoft Outlook or other applications.

  3. Use the default timeframe initially (48 hours) to view all the applications which have run on the device during that time. Check for yellow or orange activities. For example, you may find that Microsoft Outlook turned to major (orange) and then to minor (yellow), and around the same time, another application, BranchPortal, also became major.

  4. Use the custom option of the Timeframe drop-down menu to focus on the problematic times. The shortest interval you can choose is one hour.

  5. Select Microsoft Outlook in the Applications section to display its Activities and Processes.

  6. Look for any orange or yellow activities. If Open Inbox has an activity with a major status, select Open Inbox in the Activities section and view its graph on the right side of the window to see the response times and the thresholds which give them their minor or major status.

  7. Select the orange circle on the Open Inbox activity row to highlight its elements in the Processes section, showing that the CPU was overloaded at that time (for example, an average load of 70% and a maximum load of 90%).

  8. Select CPU in the Processes section to view the graph on the right side of the dashboard. Verify the high CPU at that time, and consider theories which may explain this slowdown (like Outlook accessing large emails while a virus scanner checks each email).

  9. Hover over the status circle to see the detailed information of that particular occurrence of the activity, including the client time, server time and total response time (see above).

  10. Look at the Device Details section to see if there was high traffic on the network, or a high usage of memory at the same time as the poor performance.

  11. Perform the same steps on any other application showing a slowdown at the same time. Check other factors like high network traffic or heavy usage of the device memory caused by other applications which could influence Outlook's performance.