|
Computer applications deliver their business value
only after they have been implemented in production. This basic
fact is often overlooked -- and is the focus of another Technology
Strategists service, Production
Readiness Assessments. Over time, even a well-implemented
application can slow down or become unstable -- diminishing the
flow of business value or making that value increasingly expensive
to deliver.
The common response to poor application performance is often to
invest in newer, faster or larger hardware. The falling prices of
computer hardware make this a very attractive approach. Any
hardware vendor will be more than happy to make a new sale
-- and it will address the problem in some cases. But what of the
cases where it doesn't? Or where the cost of the proposed upgrade
is too high? Or where an upgrade has already failed to correct the
performance problem? Technology Strategists' response to these
types of problems is to apply a prescriptive, analytical approach
to the business system as a whole, not just the troublesome
application. Understanding how the application realizes business
value can be very helpful in diagnosing the problem at hand and
developing practical solutions. Just as helpful is knowing
specifically what the perceived problems are -- subjective
measurements of unhappiness are very difficult to address.
Objective measures are essential to focus the assessment and
define completion criteria. System utilization
measurements of all involved devices are an essential component.
All too often, applications in trouble are not monitored for
resource utilization, so early warnings of impending problems are
just not seen. Essentially all computer systems have some
sort of performance/utilization measurement tool -- Windows has
perfmon, Unix/Linux has sar, VMS has monitor, etc. Many
databases have built-in performance and accounting tools.
The general problem with any of these tools is that there is far
too much detail -- experience helps in weeding out extraneous
details and associating business activity with system workload.
This is particularly true of network activity monitoring -- an
essential component of all layered applications. In general,
though, the key measures are processor utilization and disk
activity, coupled with statistics on the business volumes being
processed. Usually the processor is waiting either
for the disk or for user response. Batch jobs, by contrast, should
inherently consume all of some resource -- processor or disk.
(Outright memory shortages have become rare -- with virtual memory
management, they show up instead as excessive disk
activity to page/swap files.) Performance of the storage subsystem
can often be directly estimated from the length of time the
processor was not busy during a job. But the quantity of I/O
activity must be known -- this can take some work on platforms
that do not normally report batch resource usage (like Windows).
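
The estimate described above can be sketched in a few lines. This is a minimal illustration, not a measurement tool: it assumes a single-streamed batch job whose processor is idle only while waiting on storage, and the function name and all figures are hypothetical.

```python
def estimate_io_service_time(elapsed_s, cpu_busy_s, io_count):
    """Estimate average seconds per I/O for a batch job.

    Assumption: the job is single-streamed, so any time the processor
    was not busy was spent waiting for the storage subsystem.
    """
    if io_count <= 0:
        raise ValueError("io_count must be positive")
    idle_s = max(elapsed_s - cpu_busy_s, 0.0)
    return idle_s / io_count


# Hypothetical batch job: 1 hour elapsed, 15 minutes of processor
# time, and 270,000 I/O requests reported for the job.
elapsed_s = 3600.0
cpu_busy_s = 900.0
io_count = 270_000

per_io_s = estimate_io_service_time(elapsed_s, cpu_busy_s, io_count)
io_wait_fraction = (elapsed_s - cpu_busy_s) / elapsed_s

print(f"Average time per I/O: {per_io_s * 1000:.1f} ms")
print(f"Share of elapsed time spent waiting: {io_wait_fraction:.0%}")
```

With these made-up figures the job spends three quarters of its elapsed time waiting, at roughly 10 ms per I/O. Overlapped or cached I/O would make the real per-request time differ, so the result is best read as a rough indicator.
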
Where the host may misreport actual disk activity, as with
external intelligent storage arrays, the array's own management
tools often show the true activity picture rather than the logical
perspective seen by the host. If the application is
essentially waiting for the disk, a faster processor would likely
just create more expensive idle time. And if the volume of
information being moved is large and cannot be reduced, there may
be few solutions available that do not involve making application
or database changes. It is not at all unusual to find that
performance issues are application design issues in disguise --
and not all of these are solvable. Technology
Strategists has investigated a number of performance issues where
the resource monitor tools were displaying paradoxical
information. These were situations where the business response was
slow, but neither the processor nor the disk appeared busy.
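
One way such a paradox can arise is when the host counts only logical requests while a storage controller performs several physical operations for each of them. The sketch below is a rough illustration under stated assumptions: it uses the commonly quoted small-write penalties (four physical I/Os per logical write for RAID5, two for a mirrored layout), ignores controller caching and full-stripe writes, and all workload figures and names are hypothetical.

```python
# Assumed textbook penalties: physical I/Os per logical small write.
WRITE_PENALTY = {"raid5": 4, "raid0+1": 2}


def backend_iops(logical_reads, logical_writes, layout):
    """Physical I/Os per second the disks must carry for a host workload."""
    return logical_reads + logical_writes * WRITE_PENALTY[layout]


def disk_utilization(logical_reads, logical_writes, layout, disks, iops_per_disk):
    """Fraction of aggregate disk capability consumed (may exceed 1.0)."""
    capacity = disks * iops_per_disk
    return backend_iops(logical_reads, logical_writes, layout) / capacity


# Hypothetical write-intensive workload: 100 reads/s and 400 writes/s as
# seen by the host, on 8 disks assumed to sustain 200 I/Os per second each.
reads, writes, disks, per_disk = 100, 400, 8, 200

host_view_util = (reads + writes) / (disks * per_disk)
raid5_util = disk_utilization(reads, writes, "raid5", disks, per_disk)
mirrored_util = disk_utilization(reads, writes, "raid0+1", disks, per_disk)

print(f"Host's logical view: {host_view_util:.0%}")
print(f"RAID5 back end:      {raid5_util:.0%}")
print(f"RAID0+1 back end:    {mirrored_util:.0%}")
```

With these figures the host sees the disks about one third busy, while the RAID5 back end is asked for slightly more than the disks can deliver; the mirrored layout carries the same logical load at just over half of capacity.
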
In one case, RAID5 had been used to hold a write-intensive
database behind an external controller. Inspection of the
controller monitors showed that the disks were fully utilized with
repeated updates of the RAID5 parity values. Disk usage as reported
by the host (a Sun/Solaris system) showed only the logical requests.
When this was corrected, by changing to a RAID0+1 architecture, the
next bottleneck to be exposed turned out to be a design flaw in
update serialization that ultimately required program redesign to
address. This is the typical story with performance --
peeling back each problem reveals another, underlying bottleneck
that must in turn be addressed. This continues until the desired
performance objectives are achieved or the cost/benefit curve
becomes unattractive. At times the operations staff may not
recognize that 'normal' system activity
is actually a problem. This happens easily when there is staff
turnover or when degradation has occurred gradually over time. A
consultant can provide a fresh perspective and question what is
really normal -- perhaps maintenance activities are needed? Or an
archiving process is required to cut the database down to size? And what
should be monitored regularly -- and within what limits should it
operate? Establishing the relationship between
visible, countable chunks of business activity and the underlying
system responses can also help in understanding whether the
expectations of performance are realistic. Business transactions
can sometimes be very complicated -- when these are serialized for
processing the resultant response times can be quite long even
though the individual system operations are quick. Web-based
applications (or indeed any multi-tier application) can be
particularly vulnerable to this problem when integrating
information from distributed locations in cyberspace. Building a
transaction map to decompose business operations into system
activities can be a valuable approach to solving performance
problems -- does the system need to do the work faster, do less of
it, or do it differently?
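
A transaction map of this kind can be sketched as a simple list of steps. This is only an illustration of the idea: the `Step` structure, the step names and every timing are hypothetical, and the steps are assumed to run strictly one after another.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One system activity within a business transaction (hypothetical)."""
    name: str
    tier: str
    seconds: float  # time for a single call
    calls: int = 1  # how many times the transaction makes this call

    @property
    def total(self):
        return self.seconds * self.calls


# Hypothetical map of one "enter order" business transaction.
transaction_map = [
    Step("render order page", "web", 0.05),
    Step("look up customer", "database", 0.02),
    Step("price each line item", "remote service", 0.15, calls=20),
    Step("write order", "database", 0.03),
]


def serialized_response_time(steps):
    """Response time when every step runs one after another."""
    return sum(step.total for step in steps)


def slowest_first(steps):
    """Steps ordered by their contribution to the response time."""
    return sorted(steps, key=lambda step: step.total, reverse=True)


total = serialized_response_time(transaction_map)
print(f"Serialized response time: {total:.2f} s")
for step in slowest_first(transaction_map):
    print(f"  {step.name:<22} {step.tier:<15} {step.total:.2f} s")
```

In this made-up example no single call takes longer than 0.15 seconds, yet the transaction takes about 3.1 seconds because one quick remote call is repeated twenty times in series. That points toward doing less (fewer calls) or doing it differently (batching the calls) before making any one step faster.
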
In the final analysis, performance should be looked upon as a
business requirement that must be specified as part of the initial
application requirements. Establishing a service level
agreement that documents the performance requirement, measurement
approach and resolution strategy can head off many future problems.
Also, if developers and software vendors are aware up front of
the specific performance requirements for an application, they
are likely to make different design and implementation
decisions. It is all a part of having (and maintaining) the
right tool for the job. |