Design Specifications

General Notes On Java Performance

Sun and others have shown that the raw performance of Java code can approximate that of C code. While this is impressive, I do not believe that the performance issue lies in raw execution alone. The problem with Java is not how fast it can iterate through a loop, perform method calls or allocate memory.

Rather, the problem lies in the liberal use of object oriented design in the Java language. Let me first and foremost say on the record that I applaud object oriented design and programming. Software so designed is easier to debug, maintain, extend and integrate. And Java has object oriented design ingrained in its very bones.

Because Java advocates and requires the use of object oriented features to perform even the most mundane task (except, that is, for tight performance-measuring iterations), it will always be slower than older programming languages, C and C++ in particular, that still advocate the use of tight procedural code. Consider, for example, the case of reading a line from a file and translating all its letters to upper case:
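A minimal sketch of the Java side of this task might look as follows. The class and method names are hypothetical, and a StringReader stands in for the file so that the example is self-contained; a FileReader would be passed in for a real file. The comments mark where objects are created along the way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

public class UpperCaseLine {

    // Reads the first line from the source and returns it in upper case,
    // or null if the source is empty. (Hypothetical helper for illustration.)
    static String readLineUpper(Reader source) {
        try {
            // One object for the reader, plus its internal character buffer.
            BufferedReader reader = new BufferedReader(source);
            // readLine() builds and returns a new String for the line.
            String line = reader.readLine();
            // Strings are immutable, so toUpperCase() typically returns
            // yet another newly allocated String.
            return line == null ? null : line.toUpperCase(Locale.ROOT);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a file; prints HELLO, WORLD
        Reader source = new StringReader("hello, world\nsecond line");
        System.out.println(readLineUpper(source));
    }
}
```

Even in this short form, the task involves a reader object, a buffer, and two strings, which is the kind of overhead the comparison is concerned with.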

Even if Java and C/C++ had the exact same raw performance, the Java operation calls for more memory allocation, object creation/destruction sequences, and method calls to achieve the exact same task.
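For contrast, the tight procedural style attributed here to C can be imitated in Java itself. The sketch below is an assumption made for illustration, not code from the original comparison: the class name, the method and the fixed caller-supplied buffer are all hypothetical. It reuses one character array, creates no strings, handles only ASCII letters and recognizes only '\n' as a line terminator, which are exactly the shortcuts that make such code fast and fragile.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class UpperCaseInPlace {

    // Reads characters up to (not including) the next '\n' into buf,
    // upper-casing ASCII letters as they are stored. Stops early when buf
    // is full, leaving the rest of the line unread. Returns the number of
    // characters stored, or -1 at end of input when nothing was read.
    static int readLineUpper(Reader in, char[] buf) {
        try {
            int n = 0;
            int c = 0;
            while (n < buf.length && (c = in.read()) != -1 && c != '\n') {
                if (c >= 'a' && c <= 'z') {
                    c -= 'a' - 'A'; // ASCII-only case conversion
                }
                buf[n++] = (char) c;
            }
            return (c == -1 && n == 0) ? -1 : n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] buf = new char[80]; // allocated once, reusable for every line
        int n = readLineUpper(new StringReader("hello, world\nsecond line"), buf);
        // The String below exists only to print the result.
        System.out.println(new String(buf, 0, n)); // prints HELLO, WORLD
    }
}
```

The loop itself allocates nothing per line, but it silently truncates long lines and ignores other line terminators and character encodings.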

From a programmer's perspective there are many benefits to the Java approach. The buffer size may be machine dependent, so the application gets the best I/O utilization on each platform. The data input stream object accepts different line termination sequences and can translate between different character encodings, again a portability benefit. The use of dynamically allocated strings rules out buffer overflow problems, a major issue (and a major security concern) in C/C++ programs. Finally, immutable strings allow passing by value and thread safety.

Compared with its C++ counterpart, a Java application can be expected to be easier to develop and maintain, easier to upgrade, much more reliable and fault tolerant, to pose fewer security and data integrity risks, and, given a better computer, to perform better overall. But it will consume more memory and require more processing power.

This change is akin to the shift from command line based applications to a GUI environment. Memory and CPU requirements rise immediately, and applications seem to work slower and take more time to load and operate. Yet the gain in usability seems to be worth the slowing down of the system.

While this is all true, and acceptable, on the client side, it introduces a major problem on the server side, especially in an environment as harsh as the Internet. Web sites expect, and make every effort to ensure, a rapidly growing user base. A successful Web site is one that quickly grows beyond the capabilities of its hosting machine, and much faster than Moore's law allows: it might double its load in an Internet year, while hardware plays catch-up in normal human years. Clearly, any program that slows down the server does not belong out there.

Yet, as said before, Java has its benefits in development time, maintenance and reliability. These benefits alone would outweigh the cost of purchasing faster and better hardware, if the performance difference is kept reasonable. There are three ways to tackle this problem:

      1. Design and build a distributed system, one which can be spread across a number of servers to reduce load and improve performance at the cost of hardware

      2. Employ a different design that is less object oriented and therefore faster, for example, the use of sockets for exchanging information rather than high-level CORBA or RMI

      3. Write optimized code for critical sections

The first approach makes the most sense across the larger system definition: on paper it provides infinite scalability. Distribution also brings other benefits, such as load balancing and fault tolerance. Yet distribution has its limitations, one of them being the cost of installing and maintaining more hardware. Some applications simply do not distribute very well, or, if they do distribute, a heavily loaded part still has to be installed on a single machine.

The second approach is to