Memory Model of Java Virtual Machine

The Java Virtual Machine is an abstract computing machine. It is responsible for most of the features that made Java a great technology. A few of Java's major selling points were:

  • Platform Independence
    • Hardware Independence
    • Operating System Independence
  • Security from malicious programs

It is called an abstract computing machine because it is defined by a specification: any implementation that follows the specification correctly acts as a machine that executes instructions.

Oracle (which acquired Sun Microsystems) offers an implementation of the JVM. The JVM understands a particular binary format called the class file format, and it executes the instructions contained in class files. In other words, if you can produce a valid class file, the JVM can execute it. Hence the JVM will accept a valid class file produced from any language, not necessarily Java.
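To make the class file format a little more concrete: every class file starts with the magic number 0xCAFEBABE, followed by a minor and a major version number. The sketch below checks those header fields in a byte array. The class and method names are made up for illustration, and the sample header is hand-written rather than read from a real file (major version 52 corresponds to Java 8).

```java
import java.nio.ByteBuffer;

public class ClassFileHeader {
    // Every valid class file begins with these four bytes.
    static final int MAGIC = 0xCAFEBABE;

    // True if the bytes start with the class file magic number (ByteBuffer is big-endian by default).
    static boolean hasClassFileMagic(byte[] bytes) {
        return bytes.length >= 4 && ByteBuffer.wrap(bytes).getInt(0) == MAGIC;
    }

    // major_version is the unsigned 16-bit value at offset 6, after magic (4 bytes) and minor_version (2 bytes).
    static int majorVersion(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getShort(6) & 0xFFFF;
    }

    public static void main(String[] args) {
        // Hand-written sample header: magic, minor_version 0, major_version 52.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        System.out.println(hasClassFileMagic(header)); // true
        System.out.println(majorVersion(header));      // 52
    }
}
```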

The Memory Model of the Java Virtual Machine

It is absolutely alright if you are not familiar with the memory model of the underlying JVM. Most of the time we do not need it, but sometimes this knowledge can save you many hours of setup and debugging, especially in memory-intensive applications that process gigabytes of data, such as caching solutions like Ehcache and Terracotta.

Knowledge of the JVM memory model is certainly beneficial if you work on such products.

A running JVM can have multiple threads of execution, so it needs memory both for each of these threads and for its own internal working.

At run time, the JVM manages certain areas of the memory allocated to it. These are called run-time data areas, and I divide them into two major categories:

  1. Shared: created when the JVM starts up and destroyed when the JVM exits.
  2. Managed per thread: created when a thread is created and destroyed when the thread exits.
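A minimal sketch of the two categories, assuming we only care about where data conceptually lives: the AtomicInteger object is allocated on the heap (a shared area), so both threads update the same instance, while each thread's localCounter lives in a frame on that thread's own stack. The class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedVsPerThread {
    // This object lives on the heap, so every thread sees the same instance.
    static final AtomicInteger sharedCounter = new AtomicInteger();

    static int work(int iterations) {
        int localCounter = 0; // local variable: conceptually held in this thread's own stack frame
        for (int i = 0; i < iterations; i++) {
            localCounter++;
            sharedCounter.incrementAndGet();
        }
        return localCounter;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] locals = new int[2];
        Thread a = new Thread(() -> locals[0] = work(1000));
        Thread b = new Thread(() -> locals[1] = work(1000));
        a.start();
        b.start();
        a.join(); // join() guarantees the writes made by each thread are visible here
        b.join();
        System.out.println(locals[0] + " " + locals[1] + " " + sharedCounter.get());
    }
}
```

Run once in a fresh JVM, this prints 1000 1000 2000: each local count reflects only its own thread's work, while the shared counter sees the work of both threads.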

Here are the run-time data areas:

Program Counter Register

This area is managed per thread, which means that each thread of execution has its own program counter (pc) register. At any point in time, it contains the address of the JVM instruction currently being executed by that thread.

At any point in time, a thread is executing the code of a single method. If that method is a native method, the value of the program counter register is undefined. In all other cases, it contains the address of the instruction being executed.
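Java code cannot read the program counter register directly. The closest thing the standard library exposes (since Java 9) is StackWalker.StackFrame.getByteCodeIndex(), which reports the index into a method's bytecode at which a frame is currently executing, and returns a negative number for native methods. The sketch below treats that index as a stand-in for the pc value of the calling frame; the helper name is hypothetical.

```java
import java.lang.StackWalker.StackFrame;

public class BytecodeIndexDemo {
    // Returns the bytecode index recorded for the caller's frame, i.e. the point
    // in the caller's code array where it is paused while this method runs.
    static int callerBytecodeIndex() {
        return StackWalker.getInstance()
                .walk(frames -> frames.skip(1).findFirst()) // skip this method's own frame
                .map(StackFrame::getByteCodeIndex)
                .orElse(-1);
    }

    public static void main(String[] args) {
        int first = callerBytecodeIndex();
        int second = callerBytecodeIndex();
        // The second call site comes later in main's bytecode, so its index is larger.
        System.out.println(first + " -> " + second);
    }
}
```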

The Java Virtual Machine Stack

This area is managed per thread, so each thread has its own private stack. This stack is used to store frames.

A frame is used to store data and partial results, perform dynamic linking, and dispatch exceptions. Each frame belongs to a single method invocation: every time a method is invoked, a new frame is created and pushed onto the stack of the thread executing the method. When the method completes (normally or abruptly), its frame is destroyed.
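The one-frame-per-invocation rule can be observed with StackWalker (Java 9+). This sketch, with hypothetical method names, counts the frames on the current thread's stack at the bottom of a recursion; each pending invocation contributes exactly one frame.

```java
public class FrameDemo {
    // Counts the frames currently on this thread's stack, from this method down to the bottom.
    static long currentFrameCount() {
        return StackWalker.getInstance().walk(frames -> frames.count());
    }

    // Recurses n times, then reports the frame count at the deepest point.
    static long frameCountAtDepth(int n) {
        if (n == 0) {
            return currentFrameCount();
        }
        return frameCountAtDepth(n - 1); // each call pushes one more frame
    }

    public static void main(String[] args) {
        long base = frameCountAtDepth(0);
        long deeper = frameCountAtDepth(5);
        System.out.println(deeper - base); // 5: one extra frame per pending invocation
    }
}
```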

Implementations of the abstract machine (the JVM) can choose either a fixed-size stack or a dynamically sized stack. They may let the user control the stack size by supplying a size (for a fixed-size stack) or a minimum and maximum allowable size (for a dynamically sized stack).

StackOverflowError: Since we are discussing the JVM stack, we should also talk about StackOverflowError. As noted above, the stack can be of fixed or dynamic size.

Suppose the stack is of fixed size and one of our computations (e.g. a recursive method that keeps pushing frames onto the stack) needs a larger stack than is permitted. In such a scenario it is impossible to allocate more space, so the JVM throws a StackOverflowError.
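A sketch of how stack size and StackOverflowError relate, under stated assumptions: on HotSpot the default thread stack size is set with the -Xss option, and the Thread constructor that takes a stackSize argument lets a program request a size for one thread. The Javadoc describes that argument as a request that some platforms ignore, so the depths printed here vary by JVM and platform. The names are illustrative.

```java
public class StackDepthDemo {
    // Counts frames pushed by recurse(); only one worker thread runs at a time.
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // each call pushes one more frame onto this thread's JVM stack
    }

    // Runs the recursion on a new thread that requests the given stack size in bytes
    // and returns the depth reached when StackOverflowError was thrown.
    static int maxDepth(long requestedStackSize) throws InterruptedException {
        int[] result = new int[1];
        Runnable task = () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth; // the stack could not grow any further
            }
        };
        Thread worker = new Thread(null, task, "deep-recursion", requestedStackSize);
        worker.start();
        worker.join(); // join() makes result[0] visible to the calling thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("256 KB requested: " + maxDepth(256L * 1024));
        System.out.println("8 MB requested:   " + maxDepth(8L * 1024 * 1024));
    }
}
```

On platforms that honour the request, the larger stack usually allows a noticeably deeper recursion.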

OutOfMemoryError: If the stack is of dynamic size and some computation needs more space, the JVM attempts to grow the stack within the permissible maximum size. If not enough memory is available to grow it, the JVM throws an OutOfMemoryError.

Also, when a new thread is started, the JVM tries to allocate memory for its stack. If not enough memory is available to create this stack, the JVM throws an OutOfMemoryError.

Native Method Stacks

This data area is similar to the JVM stack and is also managed per thread. It is used to support the execution of native methods, and the same errors (StackOverflowError and OutOfMemoryError) are associated with it. So everything in the section above applies here too, except that these stacks are sometimes called C stacks.
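Native methods are declared in Java but implemented in another language, and calling one is what puts the native method stack to work. A small reflective check, with a hypothetical helper name, shows which JDK methods are native; in OpenJDK, System.currentTimeMillis() is declared native while String.length() is ordinary bytecode.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodCheck {
    // Reports whether the public method with the given name and parameter types is native.
    static boolean isNative(Class<?> owner, String name, Class<?>... params) throws NoSuchMethodException {
        Method m = owner.getMethod(name, params);
        return Modifier.isNative(m.getModifiers());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(isNative(System.class, "currentTimeMillis")); // true
        System.out.println(isNative(String.class, "length"));            // false
    }
}
```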

The areas above are managed per thread. Now let us discuss the shared ones.

The Heap

This area is created at the time of JVM startup and is destroyed when the JVM stops. The heap is shared among all the threads of execution.

What is the heap used for?

The heap serves several purposes and also contains other data areas, which are explained below.

  1. The memory for class instances and arrays is allocated from the heap. This means that the moment a thread executes a statement like int[] x = new int[10], a block of memory to store this array is allocated from the heap. A JVM implementation may have an automatic storage management system (we know it as the garbage collector). The memory allocated for objects and arrays can be reclaimed by the storage management system when it is appropriate (what counts as appropriate is a complex topic that needs a separate discussion).
  2. The method area is logically a part of the heap, so the heap must provide space for it. The method area is shared among all threads of execution. It stores per-class structures, e.g. field and method data, the run-time constant pool, and the code for methods and constructors, including special initialization methods like <clinit> and <init>.
    1. The run-time constant pool is logically a part of the method area. It is a per-class or per-interface run-time representation of the constant_pool table in a class file, and it is created when the class or interface is created. It contains constants such as numeric and string literals known at compile time, e.g. String s = "techieme.in" or int k = 100000 (javac encodes small values such as 10 directly in the bytecode instead). It also contains field and method references that must be resolved at run time.
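A small sketch of both list items: new allocates a fresh object or array on the heap every time, while a string literal is loaded through the run-time constant pool and, because string literals are interned, every occurrence of the same literal yields the same String instance. The class name is illustrative.

```java
public class ConstantPoolDemo {
    // The literal is loaded from this class's run-time constant pool; string literals
    // are interned, so every occurrence of "techieme.in" yields the same String object.
    static String literal() {
        return "techieme.in";
    }

    public static void main(String[] args) {
        int[] x = new int[10];                 // array allocated from the heap
        String a = literal();
        String b = "techieme.in";              // same literal, same interned instance
        String c = new String("techieme.in"); // 'new' always allocates a fresh heap object
        System.out.println(x[0]);             // 0: array elements start at their default value
        System.out.println(a == b);           // true
        System.out.println(a == c);           // false
        System.out.println(a == c.intern());  // true: intern() returns the pooled instance
    }
}
```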

Error condition associated with the Heap

As with the stack, the heap can be of fixed or dynamic size. If a computation requires more heap than the automatic storage management system can make available, the JVM throws an OutOfMemoryError.

The same error is thrown if memory cannot be made available for the method area or for creating a class's run-time constant pool.
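A sketch of a heap allocation failure, assuming a 64-bit HotSpot JVM: requesting Integer.MAX_VALUE longs (about 16 GB) typically fails immediately with an OutOfMemoryError, either because it exceeds the maximum heap (set with -Xmx) or because it exceeds the VM's limit on array length. Catching OutOfMemoryError is done here only for demonstration; real code should rarely do it. The names are illustrative.

```java
public class HeapOomDemo {
    // Requests roughly 16 GB (Integer.MAX_VALUE longs) and reports whether the request failed.
    static boolean hugeArrayFails() {
        try {
            long[] huge = new long[Integer.MAX_VALUE];
            System.out.println("Unexpectedly allocated " + huge.length + " longs");
            return false;
        } catch (OutOfMemoryError e) {
            // The message is JVM-specific, e.g. "Java heap space" or "Requested array size exceeds VM limit".
            System.out.println("OutOfMemoryError: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        // maxMemory() is the most heap the JVM will attempt to use.
        System.out.println("Max heap: " + Runtime.getRuntime().maxMemory() + " bytes");
        System.out.println("Allocation failed: " + hugeArrayFails());
    }
}
```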

References: The Java Virtual Machine Specification, Java SE 7 Edition, https://docs.oracle.com/javase/specs/jvms/se7/jvms7.pdf

Java Virtual Machine (JVM) Startup

In the last post we got a brief idea of the memory model of the Java Virtual Machine. We also came across terms like the creation, loading, linking and initialization of classes. In this post we will try to improve our understanding of these terms.

At the Java Virtual Machine Startup

The JVM starts up by creating an initial class, which it loads using the bootstrap class loader (we will learn about class loaders in another post). Some JVM implementations let us pass the name of the initial class on the command line.
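In the Java API, the bootstrap class loader is represented by null: Class.getClassLoader() returns null for classes it defined, such as java.lang.String. The sketch below (hypothetical names) prints which loader defined a class. For the demo class itself, the exact loader depends on how the program was launched.

```java
public class LoaderDemo {
    // Class.getClassLoader() returns null for classes defined by the bootstrap class loader.
    static String loaderName(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(loaderName(String.class));     // bootstrap
        System.out.println(loaderName(LoaderDemo.class)); // an application-level loader
    }
}
```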

The JVM then links this initial class. Linking is itself a complex process with several steps: verification, preparation, resolution, access control and method overriding.

After linking, the JVM initializes the class and invokes its method with the signature public static void main(String[] args). Executing main may in turn cause other classes to be loaded, linked and initialized, and additional methods to be invoked.
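A minimal sketch of this startup order, with hypothetical class names: the initial class is initialized (its static block runs) before main is invoked, and a nested class used by main is initialized only when main first uses it.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupOrder {
    static final List<String> EVENTS = new ArrayList<>();

    static {
        EVENTS.add("StartupOrder initialized"); // runs before main is invoked
    }

    static class Helper {
        static {
            EVENTS.add("Helper initialized"); // runs only when Helper is first used
        }

        static void greet() {
            EVENTS.add("Helper.greet() invoked");
        }
    }

    public static void main(String[] args) {
        EVENTS.add("main() invoked");
        Helper.greet(); // the first use of Helper triggers its initialization
        System.out.println(EVENTS);
    }
}
```

Launched as the initial class, this prints [StartupOrder initialized, main() invoked, Helper initialized, Helper.greet() invoked].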

Now let us discuss each of these steps in more detail.

Creating & Loading Classes

Creation of a class C means constructing an internal representation of that class in the JVM. This is done in the method area. The internal representation is specific to the JVM implementation.

A class is loaded by a class loader in two steps. First, the class loader obtains an array of bytes representing the ClassFile structure. Second, it defines the class by calling the defineClass method, which derives the class from that array of bytes.
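A sketch of the two loading steps, under deliberately artificial assumptions: step one builds the bytes of a ClassFile structure by hand (a hypothetical empty class named Hello with no fields or methods, in the Java 6 class file format), and step two passes them to ClassLoader.defineClass from a small subclass. It also illustrates the earlier point that the JVM accepts any valid class file, whether or not javac produced it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DefineClassDemo {
    // Step 1: produce an array of bytes in the ClassFile format, built by hand here.
    static byte[] helloClassBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                           // magic
        out.writeShort(0);                                  // minor_version
        out.writeShort(50);                                 // major_version (Java 6 format)
        out.writeShort(5);                                  // constant_pool_count: entries #1..#4
        out.writeByte(1); out.writeUTF("Hello");            // #1 CONSTANT_Utf8 (u2 length + bytes)
        out.writeByte(7); out.writeShort(1);                // #2 CONSTANT_Class -> #1
        out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 CONSTANT_Utf8
        out.writeByte(7); out.writeShort(3);                // #4 CONSTANT_Class -> #3
        out.writeShort(0x0021);                             // access_flags: ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                                  // this_class -> #2
        out.writeShort(4);                                  // super_class -> #4
        out.writeShort(0);                                  // interfaces_count
        out.writeShort(0);                                  // fields_count
        out.writeShort(0);                                  // methods_count
        out.writeShort(0);                                  // attributes_count
        out.flush();
        return buffer.toByteArray();
    }

    // Step 2: a class loader derives a Class from the bytes via the protected defineClass method.
    static class BytesClassLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws IOException {
        Class<?> hello = new BytesClassLoader().define("Hello", helloClassBytes());
        System.out.println(hello.getName() + " extends " + hello.getSuperclass().getName());
    }
}
```

Running main prints Hello extends java.lang.Object.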

We will discuss class loaders and their types in a different post, but this is the step where most ClassNotFoundException and NoClassDefFoundError problems arise.
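The simplest way to see the loading step fail is to ask for a class that does not exist. In this sketch the name com.example.NoSuchClassHere is hypothetical and deliberately absent, and Class.forName reports it with a ClassNotFoundException. A NoClassDefFoundError, by contrast, typically appears when a class that was present at compile time cannot be found at run time.

```java
public class MissingClassDemo {
    // Returns true if loading the named class fails with ClassNotFoundException.
    static boolean isMissing(String className) {
        try {
            Class.forName(className);
            return false;
        } catch (ClassNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isMissing("java.lang.String"));            // false
        System.out.println(isMissing("com.example.NoSuchClassHere")); // true (hypothetical name)
    }
}
```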

Linking

Linking takes place after class loading. It involves verifying and preparing the class or interface, its direct superclass and direct superinterfaces, and, if the class is an array type, its element type. Linking can create new data structures, since one class can depend on other classes, so it can also fail with an OutOfMemoryError.

Verification

The JVM enforces structural constraints that a class or interface must satisfy. Verification ensures that the binary representation of the class or interface actually satisfies these constraints.

Verification may cause other classes to be loaded, but those classes need not be verified as part of verifying this class.

If the structural constraints are not satisfied, verification fails with a VerifyError.

Preparation

This phase involves creation of static fields of a class or interface and initializing them with their default values. This does not execute the explicit initializer for the static fields.

For example, the statement static int i = 100; is effectively a two-step process: first i is created with its default value (as if static int i = 0;), and then i = 100; is executed. Only the first step happens in the preparation phase. Static initializer blocks are not executed during preparation either; they run in the initialization phase.
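The effect of preparation can be observed from Java code. In this sketch (illustrative names), the initializer of observed runs first and calls peek(), which reads i before the explicit initializer i = 100 has executed, so it sees the default value 0 assigned during preparation. Note that i must not be declared final here: a static final int initialized with a constant is a compile-time constant and behaves differently.

```java
public class PreparationDemo {
    // Static initializers run in textual order during initialization, so this one runs first.
    static int observed = peek();
    static int i = 100;

    // Reads i before its explicit initializer has run, so it sees the default value
    // that was assigned during preparation.
    static int peek() {
        return i;
    }

    public static void main(String[] args) {
        System.out.println(observed); // 0
        System.out.println(i);        // 100
    }
}
```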

Resolution

Each class and interface has its own run-time constant pool, which contains symbolic references to other types (classes, interfaces or array types) that are required for this class to be usable.

Resolution is the process of dynamically determining concrete values from these symbolic references. If an error occurs during resolution, an IncompatibleClassChangeError (or one of its subclasses) must be thrown at a point in the program that uses the symbolic reference.

The specification defines several kinds of resolution, e.g. class and interface resolution, field resolution, method resolution, interface method resolution, method type and method handle resolution, and call site specifier resolution.
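Here is a quick way to see which errors belong to the IncompatibleClassChangeError family mentioned above. The sketch (illustrative names) checks, via reflection, that several errors that resolution can throw are subclasses of IncompatibleClassChangeError. NoClassDefFoundError, by contrast, extends LinkageError directly and is not part of this family.

```java
import java.util.List;

public class ResolutionErrors {
    // Errors that the JDK declares as subclasses of IncompatibleClassChangeError.
    static final List<Class<? extends Throwable>> RESOLUTION_ERRORS = List.of(
            NoSuchFieldError.class,
            NoSuchMethodError.class,
            IllegalAccessError.class,
            AbstractMethodError.class,
            InstantiationError.class);

    static boolean allAreIncompatibleClassChangeErrors() {
        return RESOLUTION_ERRORS.stream()
                .allMatch(IncompatibleClassChangeError.class::isAssignableFrom);
    }

    public static void main(String[] args) {
        for (Class<? extends Throwable> error : RESOLUTION_ERRORS) {
            System.out.println(error.getSimpleName() + " extends "
                    + error.getSuperclass().getSimpleName());
        }
        System.out.println(allAreIncompatibleClassChangeErrors()); // true
    }
}
```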

It is alright if you do not know much about these kinds of resolution yet. We may cover them in a post dedicated solely to resolution.

Initialization

This phase consists of executing the initialization method (<clinit>) of the class or interface. This is the tricky part: the JVM is multithreaded, so the same class might be initialized from several threads at the same time, and initialization of a class may even be requested recursively while that class is being initialized. The JVM is fully responsible for synchronizing this initialization.

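Before starting initialization, a thread acquires the class's initialization lock. If another thread is already initializing the class, the current thread waits until that initialization has completed; otherwise it proceeds with the initialization.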
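A sketch of the guarantee this locking provides, with hypothetical names: eight threads race to read Config.VALUE, the first use triggers initialization, and the other threads wait for it to finish. The static initializer sleeps briefly only to make the race more likely. However the threads interleave, <clinit> runs once and every thread sees the same value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class InitOnceDemo {
    // Counts how many times Config's static initializer (<clinit>) actually runs.
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Config {
        static final long VALUE;

        static {
            INIT_RUNS.incrementAndGet();
            try {
                Thread.sleep(50); // widen the window in which other threads arrive
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            VALUE = System.nanoTime();
        }
    }

    // Starts several threads that all read Config.VALUE at roughly the same moment.
    static Set<Long> readConcurrently(int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        Callable<Long> read = () -> {
            start.await();
            return Config.VALUE; // first use triggers initialization; other threads wait for it
        };
        List<Future<Long>> results = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            results.add(pool.submit(read));
        }
        start.countDown();
        Set<Long> seen = new HashSet<>();
        for (Future<Long> f : results) {
            seen.add(f.get());
        }
        pool.shutdown();
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("distinct values seen: " + readConcurrently(8).size()); // 1
        System.out.println("<clinit> runs: " + INIT_RUNS.get());                  // 1
    }
}
```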

Exiting the Java Virtual Machine

The Java Virtual Machine exits when some thread invokes the exit method of class Runtime or class System, or the halt method of class Runtime, and the exit or halt operation is permitted by the security manager.

Conclusion

This is a brief explanation of the phases that are executed after the JVM starts. We will delve deeper into class loading and related issues, and into the resolution of symbolic references, in a more detailed post.

I have tried to describe these steps as simply as possible; if you want more details, look out for my next post.

Keep reading and stay subscribed 🙂