SarCheck® Automated Analysis of Linux data

(English text version 6.02.11)


This is an analysis of the data contained in the file 20100505. The data was collected on 05/05/2010, from 00:00:00 to 09:50:01, from system 'lindev'. Operating system was 2.6.18-194.el5. 4 processors were present. 173 gigabytes of memory were present.

Data collected by the ps -elf command on 05/05/2010 from 00:00:00 to 09:40:00, and stored in the file /opt/sarcheck/ps/20100505, will also be analyzed.

The default GRAPHDIR was changed with the -gd switch to ./.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. No significant memory bottleneck was seen but one is likely to develop if the workload increases significantly. I/O bottlenecks were seen at peak times. A change has been recommended to at least one tunable parameter. Recommendations can be found in the Recommendations Section of this report.

Some of the defaults used by SarCheck's rules have been overridden using the sarcheck_parms file /opt/sarcheck/etc/sarcheck_parms. See the Custom Settings section of the report for more information.

Based on a switch setting or parms file entry, the analysis will treat this system as a server and not a desktop.

RECOMMENDATIONS SECTION

All recommendations apply to 'lindev', running operating system 2.6.18-194.el5. The recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days and implement only the recommendations that occur regularly.

Change the virtual memory parameter 'swappiness' from 60 to 70. This controls the system's likelihood of swapping out memory pages which have not been recently used. Larger values make the kernel more willing to swap out unused pages, freeing memory and reducing the time required to load new pages into memory.

To change the value of swappiness immediately as described, use the following command:

    sysctl -w vm.swappiness="70"

If you are able to make this change and it improves performance, you can make it permanent by adding the equivalent setting to the /etc/sysctl.conf file.
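
For reference, a minimal sketch of making the setting persistent, assuming a standard /etc/sysctl.conf; the file takes 'parameter = value' lines rather than full sysctl commands:

    # Append the tunable to /etc/sysctl.conf so it survives a reboot
    echo "vm.swappiness = 70" >> /etc/sysctl.conf

    # Re-read /etc/sysctl.conf and apply the values immediately
    sysctl -p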

Try to balance the disk I/O load across time or among other disk devices. The load on at least one disk was clearly excessive at peak times. By distributing the load over a greater time period or across other disk devices, the load will not be as likely to cause performance degradation.

RESOURCE ANALYSIS SECTION

Average system-wide CPU utilization was only 10.0 percent. This indicates that spare CPU capacity exists. If any performance problems were seen during the monitoring period, they were not caused by a lack of CPU power. Overall CPU utilization peaked at 42.96 percent from 01:20:01 to 01:30:01. A CPU upgrade is not recommended because the current CPUs had significant unused capacity. Average CPU utilization of niced processes was 3.3 percent. If the niced processes have had their priorities reduced enough, they will not cause performance degradation of processes running at their normal priorities. On a busy system, niced processes may take a very long time to complete. If the niced processes are actually running at a higher priority than other processes, they can still cause performance degradation.
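
As a hedged aside on niced workloads, the nice and renice commands are the usual way to lower the scheduling priority of batch work; the script path and PID below are placeholders:

    # Start a hypothetical batch job at the lowest scheduling priority
    nice -n 19 /usr/local/bin/nightly_batch.sh

    # Lower the priority of an already-running process (12345 is a placeholder PID)
    renice 19 -p 12345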

Average time spent waiting for I/O was 0.99 percent. I/O wait time peaked at 12.54 percent from 06:40:01 to 06:50:01. Traditional thresholds for this data show that values in excess of 7 - 15 percent are high enough to suggest an I/O bottleneck.

Average time spent servicing interrupts was 0.095 percent. The time spent servicing interrupts peaked at 0.91 percent from 06:50:01 to 07:00:01.

Average time spent servicing softirqs was 0.20 percent. The time spent servicing softirqs peaked at 1.62 percent from 07:00:01 to 07:10:01.

Graph of CPU utilization

Processor number 0 was busy for an average of 13.05 percent of the time. The processor was busy with user work 9.36 percent of the time and was busy with system work 0.96 percent of the time. The sys/usr ratio on this processor was 0.10. This is below the threshold of 2.50:1. The processor was waiting for I/O 3.30 percent of the time. The processor was handling IRQs 0.38 percent of the time and softirqs 0.78 percent of the time. During the peak interval from 08:10:01 to 08:20:01, this processor was 49.67 percent busy.

Processor number 1 was busy for an average of 7.71 percent of the time. The processor was busy with user work 4.06 percent of the time and was busy with system work 0.29 percent of the time. The sys/usr ratio on this processor was 0.07. The processor was waiting for I/O 0.24 percent of the time. The processor was handling IRQs 0.00 percent of the time and softirqs 0.00 percent of the time. During the peak interval from 04:00:01 to 04:10:01, this processor was 51.86 percent busy.

Processor number 2 was busy for an average of 10.53 percent of the time. The processor was busy with user work 6.95 percent of the time and was busy with system work 0.32 percent of the time. The sys/usr ratio on this processor was 0.05. The processor was waiting for I/O 0.19 percent of the time. The processor was handling IRQs 0.00 percent of the time and softirqs 0.00 percent of the time. During the peak interval from 01:20:01 to 01:30:01, this processor was 86.03 percent busy.

Processor number 3 was busy for an average of 8.87 percent of the time. The processor was busy with user work 4.54 percent of the time and was busy with system work 0.31 percent of the time. The sys/usr ratio on this processor was 0.07. The processor was waiting for I/O 0.22 percent of the time. The processor was handling IRQs 0.00 percent of the time and softirqs 0.00 percent of the time. During the peak interval from 01:40:02 to 01:50:01, this processor was 49.49 percent busy.

Individual Processor Statistics
Processor#    Average   Average   Average   Average   Average   Average   Average      Peak
                %Busy     %User      %Sys     %Nice   %IOwait      %IRQ  %Softirq     %Busy
0               13.05      9.36      0.96      2.73      3.30      0.38      0.78     49.67
1                7.71      4.06      0.29      3.35      0.24      0.00      0.00     51.86
2               10.53      6.95      0.32      3.25      0.19      0.00      0.00     86.03
3                8.87      4.54      0.31      4.02      0.22      0.00      0.00     49.49
Systemwide      10.04      6.23      0.47      3.34      0.99      0.09      0.20     42.96
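
For reference, comparable per-processor breakdowns can be watched between SarCheck runs with mpstat, which ships in the same sysstat package as sar; this is only a suggested sketch, not part of the SarCheck analysis:

    # Per-CPU user, system, iowait, irq and softirq percentages,
    # sampled every 10 seconds for 6 samples
    mpstat -P ALL 10 6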

The maximum number of shared memory segments for the entire system (SHMMNI) is 4096. The SHMALL parameter value is 2147483647 pages; SHMALL sets the total amount of shared memory, in pages, available to the system. The SHMMAX value, which is the maximum size of a single shared memory segment available from the operating system, is 2147483647 bytes, or 524288 pages. The SHMALL value is at least as large as SHMMAX, which is how they should be set.
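
A hedged way to confirm the shared memory limits reported above is to read them directly from /proc or with ipcs:

    # shmmax is in bytes, shmall in pages, shmmni in segments
    cat /proc/sys/kernel/shmmax /proc/sys/kernel/shmall /proc/sys/kernel/shmmni

    # ipcs -l prints the same limits along with the semaphore limits
    ipcs -l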

The maximum number of semaphores per semaphore set (SEMMSL) is 250. The SEMMNS value, the maximum number of semaphores in all semaphore sets for the system, is 32000. The SEMOPM value of 32 is the maximum number of operations that may be specified in one semop() call. The SEMMNI parameter value is 128, which is the total number of semaphore sets available to the system.
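
The four semaphore limits are kept together in a single /proc entry; a minimal check, assuming the usual SEMMSL SEMMNS SEMOPM SEMMNI ordering:

    # Expected output on this system: 250  32000  32  128
    cat /proc/sys/kernel/sem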

The average amount of free memory was 7600500.4 pages or 29689.5 megabytes. The minimum amount of free memory was 338 pages or 1.32 megabytes at 07:50:01.

The average amount of free memory including cached memory and memory used for buffers was 43442499.9 pages or 169697.3 megabytes. This is 95.8 percent of all the memory seen on the system. The minimum amount of free memory including cached memory and memory used for buffers was 41772040 pages or 163172.03 megabytes at 09:50:01.

In the following graph, the 'free' utility showed that the system could spare enough memory to support useful cache/buffers.

Graph of megabytes of free memory remaining

No significant memory bottleneck was seen. Some swap out activity was seen.

The value of the page_cluster parameter was 3. This means that 8 pages are read at once. Values of 2 or 3 are typically better for systems with small memory sizes, systems where response time is important, and systems where most I/O is not sequential. There may be an advantage in raising this value if you want to speed up programs which do mostly sequential I/O.
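
The pages-read-at-once figure is 2 raised to the page_cluster value (2^3 = 8 here). A minimal sketch of checking and raising it, assuming the kernel's usual /proc/sys/vm/page-cluster spelling of the entry:

    # Current value; pages read per swap-in = 2^page-cluster (3 -> 8 pages)
    cat /proc/sys/vm/page-cluster

    # Raise to 4 (16 pages at once) to favor mostly sequential I/O
    sysctl -w vm.page-cluster="4"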

The average page in rate was 2238.46 per second. Page ins peaked at 25389.21 per second from 07:20:01 to 07:30:01. An unusually high page in rate was detected. This may be normal for your environment, but it is still worth noting. The average page out rate was 2122.30 per second. Page outs peaked at 48061.68 per second from 01:10:01 to 01:20:01. An unusually high page out rate was detected. This may be normal for your environment, but it is still worth noting.

Graph of swap out rate

The average swap in rate was greater than zero but less than .01 per second. Swap ins peaked at 0.07 per second from 07:30:01 to 07:40:01. The average swap out rate was 4.24 per second. Swap outs peaked at 113.33 per second from 07:20:01 to 07:30:01. An unusually high swap out rate was detected. This may be normal for your environment, but it is still worth noting.

The /proc/sys/vm/swappiness file was seen. The swappiness value was 60. A higher value has been recommended for swappiness. This recommendation works best in a server environment where it makes sense to write unused pages of memory to swap in order to free the memory for some other process. It may cause response time problems for someone trying to use the system as a desktop.

Graph of swap space used

The amount of swap space in use peaked at 13289.88 megabytes at 07:30:01. The average amount of swap space in use was 3333.08 megabytes. The size of swap space was 33951.99 megabytes. The peak amount of swap space in use was 39.14 percent of the total.
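
Swap usage of the kind graphed above can be spot-checked at any time with standard utilities:

    # Per-device swap sizes and current usage
    swapon -s

    # Memory and swap totals in megabytes
    free -m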

The value of the logging_level parameter was 0. A value of zero disables logging. This has the least overhead but provides no information about SCSI activity. This program does not use data collected by SCSI logging and this information is provided in case the user wishes to get deeply involved in the SCSI activity of their system.
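
If deeper SCSI tracing is ever wanted, the logging level can be read through /proc; the path below assumes the scsi_mod sysctl interface normally present on this 2.6.18 kernel:

    # Current SCSI logging level (0 = logging disabled)
    cat /proc/sys/dev/scsi/logging_level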

According to data collected from /proc/partitions, the system-wide disk I/O rate averaged 69.89 per second and peaked at 505.15 per second from 07:30:01 to 07:40:01. The read rate averaged 34.34 per second and peaked at 480.57 per second from 07:30:01 to 07:40:01. The write rate averaged 35.54 per second and peaked at 207.21 per second from 02:30:01 to 02:40:01.
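
Comparable per-device I/O rates can be watched live with iostat from the sysstat package; a hedged example:

    # Extended per-device statistics in kilobytes, sampled every 10 seconds
    iostat -d -x -k 10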

Graph of systemwide disk I/O rate

The following graph displays the peak activity of the busiest disk device during each interval. The line at the bottom of the graph shows which disk device was busiest. In this case, it shows that peak activity is not heavy enough to cause consistently poor performance. The busiest disk is not always the same device, which will help maintain good performance if activity increases.

Graph of peak disk percent busy statistics

The -dtoo switch has been used to format disk statistics into the following table.

Disk Device Statistics
Disk      Average      Peak   Average      Peak   Average      Peak   Average      Peak   Average   Average
Device      %busy     %busy    IO/sec    IO/sec    MB/sec    MB/sec     KB/IO     KB/IO  read/sec write/sec
sda          4.25     61.60     38.86    422.32      3.59     47.04     94.57   3524.02     26.40     12.46
sdb          1.69     25.68     26.84    214.04      0.67      9.99     25.55   3319.13      6.96     19.88
dm-0         0.17      3.01      5.37     78.82      0.04      1.08      6.96    187.41      1.57      3.80

The I/O rate on disk device sda averaged 38.86 per second and peaked at 422.32 per second from 06:40:01 to 06:50:01. This is so high that the I/Os are probably not all physical I/Os to a conventional magnetic disk. The disk's transfer rate averaged 3.59 megabytes per second and peaked at 47.04 megabytes per second from 01:10:01 to 01:20:01. The size of an I/O averaged 94.57 kilobytes and peaked at 3524.02 kilobytes from 07:20:01 to 07:30:01. The read rate averaged 26.40 per second. The write rate averaged 12.46 per second. The read rate is much higher than the write rate. This disk was busy for an average of 4.25 percent of the time and was 61.60 percent busy at peak times. The I/O rate was high considering how busy the disk was. This indicates that some of the I/O may have been cached, a situation which is more likely when the disk is performing primarily read operations. According to information collected from the system, disk device sda used the cfq scheduler.

The I/O rate on disk device sdb averaged 26.84 per second and peaked at 214.04 per second from 01:00:01 to 01:10:01. This would be a surprisingly high number for a conventional magnetic disk. The disk's transfer rate averaged 0.67 megabytes per second and peaked at 9.99 megabytes per second from 01:00:01 to 01:10:01. The size of an I/O averaged 25.55 kilobytes and peaked at 3319.13 kilobytes from 01:10:01 to 01:20:01. The read rate averaged 6.96 per second. The write rate averaged 19.88 per second. The write rate is much higher than the read rate. This disk was busy for an average of 1.69 percent of the time and was 25.68 percent busy at peak times. The I/O rate was high considering how busy the disk was. This indicates that some of the I/O may have been cached. According to information collected from the system, disk device sdb used the cfq scheduler.
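
The I/O scheduler noted for sda and sdb can be inspected, and changed for testing, through sysfs; a minimal sketch (a change made this way does not survive a reboot):

    # The scheduler in brackets is the one in use, e.g. noop anticipatory deadline [cfq]
    cat /sys/block/sda/queue/scheduler

    # Switch sda to the deadline elevator for comparison testing
    echo deadline > /sys/block/sda/queue/scheduler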

The I/O rate on disk device dm-0 averaged 5.37 per second and peaked at 78.82 per second from 07:30:01 to 07:40:01. The disk's transfer rate averaged 0.04 megabytes per second and peaked at 1.08 megabytes per second from 07:30:01 to 07:40:01. The size of an I/O averaged 6.96 kilobytes and peaked at 187.41 kilobytes from 07:30:01 to 07:40:01. The read rate averaged 1.57 per second. The write rate averaged 3.80 per second. This disk was busy for an average of 0.17 percent of the time and was 3.01 percent busy at peak times.

The noatime option was not specified on any of the mounted filesystems. This option was checked because non-trivial levels of disk activity were seen. Disk activity on the busiest disk device, sda, peaked at 61.60 percent busy from 06:40:01 to 06:50:01.
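
A hedged sketch of enabling noatime on one filesystem; the device, mount point, and filesystem type below are placeholders and should be matched to the actual /etc/fstab entries:

    # /etc/fstab entry with noatime added (placeholder device and mount point)
    /dev/sda2    /data    ext3    defaults,noatime    1 2

    # Apply to the mounted filesystem without a reboot
    mount -o remount,noatime /data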

The value of the ctrl-alt-del parameter was 0. The value of 0 is better in almost all cases because it prevents an immediate reboot if the ctrl, alt, and delete keys are pressed simultaneously.
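
For completeness, the setting can be confirmed directly from /proc:

    # 0 means ctrl-alt-del is caught by init rather than forcing an immediate reboot
    cat /proc/sys/kernel/ctrl-alt-del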

During multiple time intervals, the ps -elf data indicated a peak of 73 processes present. This was the largest number of processes seen with ps -elf, but it is not likely to be the absolute peak because the operating system does not store a true "high-water mark" for this statistic. An average of 70.6 processes were present.
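
The process counts above come from repeated ps -elf snapshots; a comparable one-off count, skipping the header line, is:

    # Number of processes currently visible to ps
    ps -elf | tail -n +2 | wc -l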

Graph of the number of processes present

No runaway processes, memory leaks, or suspiciously large processes were detected in the data contained in file /opt/sarcheck/ps/20100505. No table was generated because no unusual resource utilization was seen in the ps data.

CAPACITY PLANNING SECTION

This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.

Based on the limited data available in this single day of data, the system cannot support an increase in workload at peak times without some loss of performance or reliability, and the bottleneck is likely to be disk I/O. Implementation of some of the suggestions in the recommendations section may help to increase the system's capacity.

Graph of remaining room for growth

The CPU can support an increase in workload of at least 100 percent at peak times. The system was able to meet its memory needs with minimal activity to reclaim pages of memory and should be able to handle a moderate increase in workload. The busiest disk can support a workload increase of approximately 0 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.
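
As a rough illustration of the linear model, remaining headroom can be approximated as (threshold / peak utilization - 1) x 100, floored at zero. The 55.0 percent disk threshold is the CAPDSK value noted in the Custom Settings section; the 90 percent CPU threshold is an assumed figure used here for illustration only:

    # Hypothetical headroom arithmetic: threshold and peak utilization as arguments
    headroom() { awk -v t="$1" -v p="$2" 'BEGIN { h = (t / p - 1) * 100; if (h < 0) h = 0; printf "%.1f%%\n", h }'; }

    headroom 90 42.96   # CPU:  assumed 90% threshold vs 42.96% peak -> about 109.5%, consistent with the reported 100%+
    headroom 55 61.60   # Disk: 55% CAPDSK threshold vs 61.60% peak -> 0.0%, no headroom remaining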

CUSTOM SETTINGS SECTION

The default CAPDSK threshold was changed in the sarcheck_parms file from 75.0 to 55.0 percent. This value is likely to compromise the accuracy of the analysis.

The default HSIZE was changed in the sarcheck_parms file from 0.75 to 1.20. This was done with the WIDE entry in the sarcheck_parms file.

Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequent damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. This software is provided for the exclusive use of: Your Company. This software expires on 06/09/2010 (mm/dd/yyyy). Code version: SarCheck for Linux 6.02.11. Serial number: 00012345.

Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze -o' at the prompt to produce the order form and follow the instructions.

(c) Copyright 2003-2010 by Aptitune Corporation, Portsmouth NH 03801, USA, All Rights Reserved. http://www.sarcheck.com/

Statistics for system: lindev
Statistics collected on: 05/05/2010
MAC Address: 00:02:B3:3A:41:58
Average combined CPU utilization: 10.04%
Average user CPU utilization: 6.23%
Average sys CPU utilization: 0.47%
Average 'nice' CPU utilization: 3.34%
Peak combined CPU utilization: 42.96% (from 01:20:01 to 01:30:01 on 05/05/2010)
Peak 'not nice' CPU utilization: 39.49% (from 08:10:01 to 08:20:01 on 05/05/2010)
Average time in I/O wait: 0.99%
Average time servicing interrupts: 0.09%
Average time servicing softirqs: 0.20%
Average page out rate: 2122.30/sec
Peak page out rate: 48061.68/sec (from 01:10:01 to 01:20:01 on 05/05/2010)
Average swap out rate: 4.24/sec
Peak swap out rate: 113.33/sec (from 07:20:01 to 07:30:01 on 05/05/2010)
Average swap space in use: 3333.08 megabytes
Peak swap space in use: 13289.88 megabytes (at 07:30:01 on 05/05/2010)
Average amount of free memory: 7600500 pages or 29689.5 megabytes
Minimum amount of free memory: 338 pages or 1.32 megabytes (at 07:50:01 on 05/05/2010)
Average system-wide I/O rate: 69.89/sec
Peak system-wide I/O rate: 505.15/sec (from 07:30:01 to 07:40:01 on 05/05/2010)
Average read rate: 34.34/sec
Peak read rate: 480.57/sec (from 07:30:01 to 07:40:01 on 05/05/2010)
Average write rate: 35.54/sec
Peak write rate: 207.21/sec (from 02:30:01 to 02:40:01 on 05/05/2010)
Disk device w/highest peak: sda
Avg pct busy for that disk: 4.25%
Peak pct busy for that disk: 61.60% (from 06:40:01 to 06:50:01 on 05/05/2010)
Avg number of processes seen by ps: 70.6
Max number of processes seen by ps: 73
Approx CPU capacity remaining: 100%+
Approx I/O bandwidth remaining: 0.0%
Can memory support add'l load: Moderate