Frequently Asked Questions About the Java HotSpot VM
The questions and answers are divided into the following topics.
The Java HotSpot VM Architecture and Use
Benchmarking the Java HotSpot Virtual Machine
The Java HotSpot VM Architecture and Use
This topic contains the following questions with answers.
- What command-line options does the Java HotSpot VM provide for performance tuning, and how do the options relate to those provided by the Exact VM that shipped with previous Java 2 SDK releases on Solaris?
- My application runs slowly. Why?
- My application runs much slower with 1.3/1.4 when compared with the 1.2 Production release for Solaris. Why?
- What's the difference between the -client and -server systems?
- Where is Java HotSpot 2.0 for Solaris?
- Where do I get the server and client systems?
- Pause times are unnecessarily long. What parameters can I use to tune this?
- I have a program which uses a lot of threads and it is slowing down compared with 1.2.
- Does 1.3/1.4 have asynchronous GC?
- Are there any command-line flags/environment flags to control/observe the rate of finalization?
- I'm seeing some thread starvation and erratic behavior using threads in my application. Are there any tuning flags to help me out?
- I would like java to default to -server. I have a lot of scripts which I cannot change (or do not want to change). Is there any way to do this?
- I can't get profiling to work. What should I do?
- I keep running out of file descriptors. What should I do?
- 1.3.0 is so late on Solaris. Will this continue to be a problem?
- My program isn't scaling with the number of processors.
- Should I pool objects to help GC? Should I call System.gc() periodically? Should I warm up my loops first so that Hotspot will compile them?
- How many processors will the HotSpot VM scale to?
- How do I profile heap usage?
- The VM prints "OutOfMemoryError" and exits. Increasing max heap size doesn't help. What's going on?
- What determines when softly referenced objects are flushed?
- My server application isn't getting faster. Why? Is I/O the problem?
- My application has a lot of threads and is running out of memory, but it doesn't under the 1.2 production release for Solaris. Why?
- I'm passing "-Xgenconfig:64m,64m,semispaces:128m,512m,markcompact -Xmx900m" to 1.2. Can you tell me what options should be passed to 1.3/1.4 to have a similar setup?
- What options do I need in order to use the "alternate" one-to-one thread model on Solaris 8 or 9?
- With 1.3.1 we got 4GB heaps on Solaris, why can't I get this to work on Windows?
- I'm seeing lots of full GCs at regular intervals when I turn on -verbose:gc. I've tuned the heap and it makes no difference. What's going on?
- My application uses a database and doesn't seem to scale well. What could be going on?
What command-line options does the Java HotSpot VM provide for performance tuning, and how do those options relate to those provided by the Exact VM that shipped with previous Java 2 SDK releases on Solaris?
See the document Java HotSpot VM Options.
My application runs slowly. Why?
First, we need to understand what percentage of time your application spends running bytecodes. If your program is I/O bound or spends its time in native methods, then there isn't much a new VM can do to speed it up: VM technology speeds up the time spent running bytecode. Solaris graphical code under X uses native methods, so that is an example of code which the VM cannot make faster.
My application runs much slower with 1.3/1.4 when compared with the 1.2 Production release for Solaris. Why?
Assuming that you are running a lot of bytecode, make sure that you are using the correct mode of the virtual machine. For applications where small footprint and fast startup are important, use -client. For applications where overall performance is the most important, use -server. Don't forget that -server or -client must be the first argument to java; -client is the default. If this isn't your problem, read on for more tuning parameters you can try, and also see Java HotSpot VM Options.
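For example, assuming your application's main class is MyApp (a stand-in name):
java -client MyApp    (fast startup and small footprint; the default)
java -server MyApp    (best overall performance once warmed up)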
What's the difference between the -client and -server systems?
These two systems are different binaries. They are essentially two different compilers (JITs) interfacing to the same runtime system. The client system is optimal for applications which need fast startup times or small footprints; the server system is optimal for applications where overall performance is most important. In general the client system is better suited for GUIs. Some of the other differences include the compilation policy used, heap defaults, and inlining policy.
Where is Java HotSpot 2.0 for Solaris?
Java HotSpot 2.0 for Windows is the server system. Java 2 SE 1.3 for Windows contains the client system. Java 2 SE v 1.3.0 for Solaris contains both of the above systems in one distribution, and you choose which system you want by specifying -server or -client. 'java -server' will give you the equivalent functionality of HotSpot 2.0 for Windows, and 'java -client' will give you the equivalent functionality of Java 2 SE 1.3 for Windows. The default is 'java -client'.
Where do I get the server and client systems?
If you load the JRE or SDK on Linux or Solaris, from 1.3 on, you'll have both client and server systems available. For Windows, if you download the JRE, you get only the client system; you'll need to download the SDK to get both systems.
Pause times are unnecessarily long. What parameters can I use to tune this?
There are several things to try in this arena. First, give -Xincgc a try. This uses the "Train" garbage collection algorithm, which attempts to collect a fraction of the heap instead of the entire thing at once. The train algorithm clusters objects that reference each other together, and collects these clusters individually. For most programs this results in shorter pauses, although throughput is usually worse.
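For example, to try the incremental collector while watching collection behavior (MyApp is a stand-in for your main class):
java -Xincgc -verbose:gc MyApp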
Next, you might try decreasing the amount of heap used. A larger heap will cause garbage collection pauses to increase because there is more heap to scan. Try -Xmx32m. If your application requires more memory than that, you can adjust the size of the eden (young generation space) with -XX:NewSize=... and -XX:MaxNewSize=... (for 1.3/1.4) or -Xmn in 1.4. For some applications a very large eden helps; for others it will increase the times of minor collections. For most programs, collecting eden is much faster than collecting other generations because most objects die young. If you currently invoke with something like:
-Xms260m -Xmx260m
try:
-Xms384m -Xmx384m -XX:NewSize=128m -XX:MaxNewSize=128m
which will dedicate 1/3rd of the memory to eden. For 1.3, MaxNewSize is set to 32mb on Sparc and 2.5mb on Intel based machines. NewRatio (the ratio between the young and old generations) has a value of 2 on Sparc Server, 12 on client Intel, and 8 everywhere else. As you can quickly determine, this is superseded by MaxNewSize's defaults, rendering NewRatio ineffective for even moderately sized heaps. In 1.4, MaxNewSize has been effectively set to infinity, and NewRatio can be used instead to set the value of the new generation. Using the above as an example, you can do the following in 1.4:
-Xms384m -Xmx384m -XX:NewRatio=2
which gives a 2:1 ratio between the old and new generations, since the old generation would be 256m and the new generation would be 128m, for a total of 384m.
If you are worried about the number of garbage collections, but less worried about pause times, then increasing the heap should cause the number of full garbage collections to decrease; this is especially true if you increase the size of the eden space as well.
Many systems have less efficient memory management than HotSpot. To work around this, some programs keep an "object pool", saving previously allocated objects in some freelist-like data structure and reusing them instead of allocating new ones. But... don't use object pools! Object pools fool the collector into thinking objects are live when they really aren't. This may have worked before exact garbage collection became popular (in the 1.x systems), but it is just not a good idea for any modern Java virtual machine.
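As a minimal sketch of the anti-pattern (the class name and buffer size are made up for illustration), note that every buffer pushed onto the freelist stays strongly reachable, so the collector can never reclaim it:
import java.util.LinkedList;

public class BufferPool {
    private final LinkedList free = new LinkedList();

    public byte[] acquire() {
        // Reuse a pooled buffer if one exists, otherwise allocate a new one.
        return free.isEmpty() ? new byte[4096] : (byte[]) free.removeFirst();
    }

    public void release(byte[] buf) {
        free.addFirst(buf); // keeps the buffer reachable, and thus "live", forever
    }
}
Simply letting such buffers go unreferenced instead allows the generational collector to reclaim them cheaply while they are still young.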
See also Tuning Garbage Collection with the 1.3.1 Java Virtual Machine.
I have a program which uses a lot of threads and it is slowing down compared with 1.2.
We have an undocumented option, -Xconcurrentio, which generally helps programs with many threads, but it is only available on Solaris (this is because Solaris offers more than one threading model). The main feature turned on with -Xconcurrentio is LWP based synchronization instead of thread based synchronization. We have found certain applications to speed up by over 40%. In 1.4, LWP based synchronization is the default, but -Xconcurrentio can still help since it turns on some other internal options. Finally, there is an alternate thread library which is the default on Solaris 9 and can also be used on Solaris 8. Please see the Java and Solaris Threads document.
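For example (MyApp is a stand-in for your main class):
java -Xconcurrentio MyApp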
Does 1.3/1.4 have asynchronous GC?
No, it does not, but this feature is planned for version 1.4.1. This is also known as concurrent GC.
Are there any command-line flags/environment flags to control/observe the rate of finalization?
Currently there are no controls for this, although they are planned for 1.5.
I'm seeing some thread starvation and erratic behavior using threads in my application. Are there any tuning flags to help me out?
There are a couple of things you can try on Solaris (unfortunately, there's only one thread model available on Windows/Linux). Using bound threads when the application uses few threads may benefit you; try -XX:+UseBoundThreads. Also, you can try to load the alternate libthread.so in /usr/lib/lwp/ on Solaris 8 by changing your LD_LIBRARY_PATH to include /usr/lib/lwp before /usr/lib. Both of these give better throughput and system utilization for certain applications, especially those using fewer threads.
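For example, to pick up the alternate libthread in a Bourne-style shell (MyApp is a stand-in for your application):
LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
java MyApp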
For applications using many threads, /usr/lib/libthread.so and unbound threads are still the best bet; see also -Xconcurrentio (above) for such applications.
I would like java to default to -server. I have a lot of scripts which I cannot change (or do not want to change). Is there any way to do this?
Yes. You can change <InstallationDir>/jre/lib/jvm.cfg (the default installation directory on Solaris is /usr/j2se) so that -server appears on the first uncommented line in the file. This will cause -server to be used as the default.
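For illustration only (the exact contents of jvm.cfg vary by release, so edit the copy your installation shipped with rather than pasting this sketch), the file lists one VM per line and the first uncommented entry becomes the default, so moving -server above -client would look something like:
-server KNOWN
-client KNOWN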
I can't get profiling to work. What should I do?
First, make sure you are running with -Xrunhprof, and try -Xrunhprof:help to see the different kinds of profiling available. If you are still having problems (especially running out of memory), version 1.3.1 will have significant improvements in the profiling system and will probably address your problems. In the meantime, you can either run the older VM (1.2.2 or HotSpot 1.0.1 for Solaris) or try using ctrl-\ to interrupt the java process and get a thread dump.
I keep running out of file descriptors. What should I do?
Certain applications will use a lot of file descriptors. The only thing that you can do is to set the number of file descriptors allowed on the system higher. The hard limit default is 1024 and the soft limit default is 64. To set this higher you need to modify /etc/system by adding the following 2 definitions:
set rlim_fd_max = 4096
set rlim_fd_cur = 4096
to get them both at 4096. Now use ulimit (sh, ksh) or limit (csh) to increase the number of file descriptors. See man -s1 ulimit for details on how to do this.
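For example (a sketch; the soft limit cannot exceed the hard limit configured above):
ulimit -n 4096            (sh, ksh)
limit descriptors 4096    (csh)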
1.3.0 is so late on Solaris. Will this continue to be a problem?
No! In the future we will be releasing on all platforms simultaneously.
My program isn't scaling with the number of processors.
Scaling problems could be caused by a multitude of things. First, your application may not be written in a scalable manner (if you use a lot of synchronization, for one example, or if you have only one thread, as another). It may also be that you are utilizing OS system resources which are not scalable. Finally, if you have many threads, it may be that garbage collection is getting in the way. Garbage collection in HotSpot is single threaded (a concurrent and parallel garbage collector will be available in version 1.5), which means that all of your Java threads must be suspended during each collection. To investigate, first make sure your application isn't doing its own garbage collection via an explicit "System.gc()" call. This used to be a popular style when generational garbage collectors didn't exist (the 1.x versions of Java at Sun) but should no longer be necessary. Next, turn on -verbose:gc to view and time the collections. This will help determine if garbage collection is the problem. If it is, check out the question on pause times above to help with garbage collection issues. See also Tuning Garbage Collection with the 1.3.1 Java Virtual Machine.
The threading model used by the VM may also be a problem. The Solaris threading model uses a many-to-many mapping of Java threads to OS threads. In other words, a single Java thread may be mapped to different OS threads during its lifetime, and any given OS thread may run more than one Java thread.
You can try reducing the variability by using -XX:+UseBoundThreads. This will map the Java thread to a particular Solaris LWP (this option is not available on Windows/Linux).
One can also attempt, if on Solaris 8, to use the alternate libthread in /usr/lib/lwp. Using /usr/lib/lwp/libthread.so switches to a one-level thread model: user-level scheduling is disallowed, and only the kernel schedules the thread. If a thread blocks on a mutex, its LWP blocks in the kernel. In general, if there is high lock contention the standard libthread will work better than the alternate libthread, as the alternate libthread's operations are heavier weight.
Should I pool objects to help GC? Should I call System.gc() periodically? Should I warm up my loops first so that Hotspot will compile them?
The answer to all of these is No!
Pooling objects will cause them to live longer than necessary. The garbage collection methods will be much more efficient if you let the VM do the memory management. We strongly advise taking out object pools.
Don't call System.gc(); HotSpot will determine when garbage collection is appropriate and will generally do a much better job. If you are having problems with garbage collection pause times or collections taking too long, see the pause time question above.
Warming up loops for HotSpot is not necessary. HotSpot now contains On Stack Replacement technology, which will compile a running (interpreted) method and replace it while it is still running in a loop. There is no need to waste your application's time warming up seemingly infinite (or very long running) loops in order to get better application performance.
See also Tuning Garbage Collection with the 1.3.1 Java Virtual Machine.
How many processors will the HotSpot VM scale to?
There are practical limitations to scalability, and often garbage collection will be a bottleneck when large numbers of processors are employed. Scalability is the #1 priority for our development team. Currently we run applications on 30 CPUs, and occasionally more, and we see throughput improvements for those applications which are written in a scalable way.
How do I profile heap usage?
Try using -Xaprof to get a profile of the allocations (objects and sizes) of your application.
You can also try -Xrunhprof:heap=all (or other options; run -Xrunhprof:help for a list).
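For example, to record allocation sites with deeper stack traces (MyApp is a stand-in; by default hprof writes its report to java.hprof.txt on exit):
java -Xrunhprof:heap=sites,depth=8 MyApp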
The VM prints "OutOfMemoryError" and exits. Increasing max heap size doesn't help. What's going on?
The Java HotSpot VM cannot expand its heap size if memory is completely allocated and no swap space is available. This can occur, for example, when several applications are running simultaneously. When this happens, the VM will exit after printing a message similar to the following.
Exception java.lang.OutOfMemoryError: requested <size> bytes
If you see this symptom, consider increasing the available swap space by allocating more of your disk for virtual memory and/or by limiting the number of applications you run simultaneously. You may also be able to avoid this problem by setting the command-line flags -Xmx and -Xms to the same value to prevent the VM from trying to expand the heap. Note that simply increasing the value of -Xmx will not help when no swap space is available.
This issue is being tracked in bug 4697804.
What determines when softly referenced objects are flushed?
Starting with Java HotSpot VM implementations in J2SE 1.3.1, softly reachable objects will remain alive for some amount of time after the last time they were referenced. The default value is one second of lifetime per free megabyte in the heap. This value can be adjusted using the -XX:SoftRefLRUPolicyMSPerMB flag, which accepts integer values representing milliseconds. For example, to change the value from one second to 2.5 seconds, use this flag:
-XX:SoftRefLRUPolicyMSPerMB=2500
The Java HotSpot Server VM uses the maximum possible heap size (as set with the -Xmx option) to calculate free space remaining.
The Java Hotspot Client VM uses the current heap size to calculate the free space.
This means that the general tendency is for the Server VM to grow the heap rather than flush soft references, and -Xmx therefore has a significant effect on when soft references are garbage collected.
On the other hand, the Client VM will have a greater tendency to flush soft references rather than grow the heap.
The behavior described above is true for the current (J2SE 1.3.1 and J2SE 1.4 pre-release) versions of the Java HotSpot VMs. This behavior is not part of the VM specification, however, and is subject to change in future releases. Likewise the -XX:SoftRefLRUPolicyMSPerMB flag is not guaranteed to be present in any given release.
Prior to version 1.3.1, the Java HotSpot VMs cleared soft references whenever they were found.
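For context, the typical use of a soft reference is a memory-sensitive cache: the referent can be recomputed or reloaded if the collector clears it under memory pressure. A minimal sketch (the class name and loadData method are illustrative):
import java.lang.ref.SoftReference;

public class SoftCache {
    private SoftReference ref;  // the collector may clear this when memory is low

    public byte[] getData() {
        byte[] data = (ref == null) ? null : (byte[]) ref.get();
        if (data == null) {
            data = loadData();              // recompute or reload the value
            ref = new SoftReference(data);  // and cache it softly again
        }
        return data;
    }

    private byte[] loadData() {
        return new byte[1024 * 1024];       // stand-in for expensive work
    }
}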
My server application isn't getting faster. Why? Is I/O the problem?
If you're blocked doing I/O, then no matter which version of Java you use, you will not be able to speed this up. If your application is using many threads, try -Xconcurrentio (Solaris only); this can make very large differences in throughput (we've noticed 40%+ improvements on certain applications).
My application has a lot of threads and is running out of memory, but it doesn't under the 1.2 production release for Solaris. Why?
You may be running into a problem with the default stack size for threads. With the 1.2 system, the default is 128k; for HotSpot on Sparc it is 512k, and for HotSpot on Solaris Intel it's 256k (on Linux Intel and Windows it is whatever the default stack size is when creating a thread in the OS).
Reduce your stack size by running with the -Xss option. For example:
java -server -Xss64k
64k is the least amount of stack space allowed per thread.
I'm passing "-Xgenconfig:64m,64m,semispaces:128m,512m,markcompact -Xmx900m" to 1.2. Can you tell me what options should be passed to 1.3/1.4 to have a similar setup?
"64m,64m,semispaces" means a young generation consisting of a semispaces collector with an initial semi-space size 64m and a maximum semi-space size of 64m. In 1.2, generation sizing does not include any scratch areas, and in a standard semispace collector 50% of the space is scratch. In 1.3/1.4, generation sizing includes scratch (you can argue either way). For 1.3, the similar setting would be:
-XX:NewSize=128m -XX:MaxNewSize=128m
The SurvivorRatio is 64 for sparc-solaris, meaning that you get a 124mb eden and two 2mb semispaces. That is pretty small, but should work fine. To get closer to 1.2 behavior one can in addition set:
-XX:SurvivorRatio=2
which will provide a 64mb eden and two 32mb semispaces.
The default value for MaxNewSize is 32mb (in 1.3 Solaris Sparc, 2.5mb on 1.3 Intel), so the 1.2 setting above gives a 4X larger new generation.
"128m,512m,markcompact" means a mark-compact old generation with initial 128mb size and max 512mb size. In 1.3/1.4, the old generation is sized as the remaining heap size not occupied by the young generation, so you should specify
-XX:NewSize=128m -XX:MaxNewSize=128m -Xms256m -Xmx900m
and maybe -XX:SurvivorRatio=2.
What options do I need in order to use the "alternate" one-to-one thread model on Solaris 8 or 9?
On Solaris 8 you need to add /usr/lib/lwp to your LD_LIBRARY_PATH. On Solaris 9 (or higher) the one-to-one model is the default and you do not need to do anything.
With 1.3.1 we got 4GB heaps on Solaris, why can't I get this to work on Windows?
It's due to fragmentation of the address space. After Windows has loaded all its pieces, plus the Java DLLs, the largest available contiguous chunk of addresses for the heap is about 1.4-1.6GB on Windows NT. It might be smaller on Windows 98.
If you really need more space, you can try rebasing java.exe and the JDK DLLs higher, but that won't buy you much (maybe a couple hundred megabytes).
I'm seeing lots of full GCs at regular intervals when I turn on -verbose:gc. I've tuned the heap and it makes no difference. What's going on?
If you're using RMI, then you could be running into distributed GC. Also, some applications add explicit GCs thinking that it will make their application faster. Luckily, you can disable this with an option in 1.3/1.4. Try -XX:+DisableExplicitGC along with -verbose:gc and see if this helps.
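For example (MyApp is a stand-in for your main class):
java -XX:+DisableExplicitGC -verbose:gc MyApp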
My application uses a database and doesn't seem to scale well. What could be going on?
Oracle provides two types of database drivers: a type-2 driver, called the OCI (Oracle Call Interface) driver, that utilizes native code, and a type-4 pure Java driver called the thin driver. In single processor environments, the thin driver works somewhat better than the OCI driver because of the JNI overhead associated with the OCI driver. On multiprocessor configurations, synchronization points within Solaris used by the OCI driver become big bottlenecks and prevent scaling. We recommend using the thin driver in all cases, as in the sketch below.
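A minimal sketch of connecting through the thin driver (the host, port, SID, and credentials are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;

public class ThinDriverExample {
    public static void main(String[] args) throws Exception {
        // Load the pure-Java (type-4) thin driver; no native code is involved.
        Class.forName("oracle.jdbc.driver.OracleDriver");
        // URL form: jdbc:oracle:thin:@<host>:<port>:<SID>
        Connection conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "passwd");
        // ... run your queries ...
        conn.close();
    }
}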
Benchmarking the Java HotSpot VM
This topic has the following questions and answers.
- I write a simple loop to time a simple operation and HotSpot looks even slower than Java 2 SDK. What am I doing wrong? Here's my program
- I'm trying to time method invocation time. I don't want there to be any extra work done, so I'm using an empty method. But when I run with HotSpot I get times that are unbelievably fast.
- Okay, so I'll put some random code in the body of the method so it's not empty and the inlining can't just remove it. Here's my new method
- I'm trying to benchmark object allocation and garbage collection. So I have a harness...
- I have a graphics-intensive or GUI-based program. I've tried it on HotSpot and it doesn't seem to perform much better than the Java 2 SDK, and only slightly better than on JDK1.1.x implementations. Why isn't HotSpot making my graphics code go faster?
- What do you recommend for benchmarking HotSpot, or any virtual machine?
I write a simple loop to time a simple operation and HotSpot looks even slower than Java 2 SDK. What am I doing wrong? Here's my program:
public class Benchmark {
    public static void main(String[] args) {
        // Representative sketch of the kind of microbenchmark being described:
        // time one run of a simple loop from a cold start.
        long start = System.currentTimeMillis();
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        long end = System.currentTimeMillis();
        System.out.println("Sum: " + sum + " took " + (end - start) + " ms");
    }
}
You are writing a microbenchmark.
Remember how HotSpot works. It starts by running your program with an interpreter. When it discovers that some method is "hot" -- that is, executed a lot, either because it is called a lot or because it contains loops that loop a lot -- it sends that method off to be compiled. After that, one of two things will happen: either the next time the method is called the compiled version will be invoked (instead of the interpreted version), or the currently long running loop will be replaced, while still running, with the compiled method. The latter is known as "on stack replacement" and exists in the 1.3/1.4 HotSpot based systems.
In the meantime, if you insist on using/writing microbenchmarks like this, you can work around the problem by moving the body of main to a new method and calling it once from main to give the compiler a chance to compile the code, then calling it again in the timing bracket to see how fast HotSpot is.
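A sketch of that workaround, under the same assumptions as the program above (the class and method names are illustrative):
public class Benchmark2 {
    static int run() {
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        run();  // warm-up call: gives HotSpot a chance to compile run()

        long start = System.currentTimeMillis();
        int sum = run();  // timed call: now runs the compiled code
        long end = System.currentTimeMillis();
        System.out.println("Sum: " + sum + " took " + (end - start) + " ms");
    }
}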
I'm trying to time method invocation time. I don't want there to be any extra work done, so I'm using an empty method. But when I run with HotSpot I get times that are unbelievably fast. Here's my code:
public class EmptyMethod {
    public static void method() {
    }

    public static void main(String[] args) {
        // Representative sketch of the timing harness being described:
        // time an empty loop, then a loop that calls the empty method,
        // and report the difference.
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
        }
        long loopTime = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
            method();
        }
        long methodTime = System.currentTimeMillis() - start;

        System.out.println("methodTime-loopTime = " + (methodTime - loopTime) + " ms");
    }
}
Empty methods don't count. And you are also seeing that generated code is sensitive to alignment.
The call to the empty method is being inlined away, so there really is no call there to time. Small methods will be inlined by the compiler at their call sites. This reduces the overhead of calls to small methods. This is particularly helpful for the accessor methods used to provide data abstraction. If the method is actually empty, the inlining completely removes the call.
Code is generated into memory and executed from there. The way the code is laid out in memory makes a big difference in the way it executes. In this example on my machine, the loop that claims to call the method is better aligned and so runs faster than the loop that's trying to figure out how long it takes to run an empty loop, so I get negative numbers for methodTime-loopTime.
Okay, so I'll put some random code in the body of the method so it's not empty and the inlining can't just remove it. Here's my new method (and the call site is changed to call method(17)):
public static void method(int arg) { int value = arg + 25; }
The HotSpot compiler is smart enough not to generate code for dead variables.
In the method above, the local variable is never used, so there's no reason to compute its value. So then the method body is empty again and when the code gets compiled (and inlined, because we removed enough code to make it small enough for inlining) it turns into an empty method again.
This can be surprising to people not used to dealing with optimizing compilers, which can be fairly clever about discovering and eliminating dead code. Compilers can also occasionally be fairly stupid about it, so don't count on the compiler to do arbitrary optimizations of your code.
Dead code elimination also extends to control flow. If the compiler can see that a particular "variable" is in fact a constant at a test, it may choose not to compile code for the branch that will never be executed. This makes it tricky to make microbenchmarks "tricky enough" to actually time what you think you are timing.
Dead code elimination is quite useful in real code. Not that people intentionally write dead code; but often the compiler discovers dead code due to inlining where constants (e.g., actual parameters to methods) replace variables, making certain control flows dead.
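A schematic example of control-flow dead code (DEBUG and work are illustrative names):
public class DeadBranch {
    static final boolean DEBUG = false;  // a compile-time constant

    static void work() {
        if (DEBUG) {
            // The compiler can prove this branch is never taken,
            // so it need not generate any code for it.
            System.out.println("debug trace");
        }
    }

    public static void main(String[] args) {
        work();
    }
}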
I'm trying to benchmark object allocation and garbage collection. So I have a harness like the one above, but the body of the method is:
public static void method() { Object o = new Object(); }
That's the optimal case for the HotSpot storage manager. You will get numbers that are unrealistically good.
You are allocating objects that need no initialization and dropping them on the floor instantly. (No, the compiler is not smart enough to optimize away the allocation.) Real programs do allocate a fair number of short-lived temporary objects, but they also hold on to some objects for longer than this simple test program. The HotSpot storage manager does more work for the objects that are retained for longer, so beware of trying to scale up numbers from tests like this to real systems.
I have a graphics-intensive or GUI-based program. I've tried it on HotSpot and it doesn't seem to perform much better than the Java 2 SDK, and only slightly better than on JDK1.1.x implementations. Why isn't HotSpot making my graphics code go faster?
Graphics programs spend a lot of their time in native libraries.
The overall performance of a Java application depends on four factors:
- The design of the application
- The speed at which the virtual machine executes the Java bytecodes
- The speed at which the libraries that perform basic functional tasks execute (in native code)
- The speed of the underlying hardware and operating system
HotSpot is a replacement for the Java 2 SDK virtual machine. The virtual machine is responsible for byte code execution, storage allocation, thread synchronization, etc. Running with the virtual machine are native code libraries that handle input and output through the operating system, especially graphics operations through the window system. The HotSpot virtual machine uses the same native code libraries that the Java 2 SDK uses, so programs that spend significant portions of their time in those native code libraries will not see their performance on HotSpot improved as much as programs that spend most of their time executing byte codes.
In addition, HotSpot is a Java 2 virtual machine, and so graphics operations go through the new Java2D APIs. These APIs are significantly more featureful than the old AWT APIs, but come with an overhead not present in JDK 1.1.x systems.
This observation about native code applies to other native libraries that come with the Java 2 SDK, or any native code libraries that you happen to use with your application.
What do you recommend for benchmarking HotSpot, or any virtual machine?
We like to use the SPEC JVM98 benchmark. We use it for tracking our own progress over time, and we use it for comparing ourselves to other virtual machines.
The SPEC JVM98 benchmark was developed by a consortium of interested vendors under the auspices of the Standard Performance Evaluation Corporation (SPEC). It is the only industry-standard benchmark for Java platforms. The benchmark is a collection of kernels from several types of programs, most of them based on real applications. The benchmark seems to have a good mix of operations and realistic behaviors (method invocations, storage allocation and lifetimes, input and output). We find that the benchmark is predictive of the performance we see across a number of real applications. It comes with an easy-to-use harness that ensures that it is run the same way on all platforms, so fair comparisons can be made between platforms. The SPEC JVM98 benchmark is available from http://www.spec.org/osg/jvm98/.
Other than that, we like benchmarking real applications. Those are usually harder to obtain, somewhat more difficult to run, and more difficult to compare one against the other.