JVM Thread Actual Memory Usage

As containerized environments like Kubernetes become prevalent, knowing how much physical memory the JVM will use becomes ever more important. Typical thread-per-request web frameworks can easily use thousands of threads, each of which adds to the memory footprint. In this article I explore the base amount of physical memory a JVM thread stack uses on Linux. This can help guide decisions on sizing thread pools.

The short summary

A thread stack will use 16kB of physical RAM if the thread does nothing but sleep. That is the base overhead of a JVM thread. Further stack memory consumption depends on what the thread puts on the stack.

This was tested on my local machine against the 64-bit HotSpot 1.8 JVM, with threads that simply call Thread.sleep(), using the default -Xss1024k.

The details

The JVM has a flag called -Xss which specifies the stack size of a thread. By default in 64-bit JVMs on Linux, this is 1MB. That doesn’t mean each thread will actually consume 1MB of physical resources. All it means is that the JVM will reserve 1MB of virtual memory for each thread's stack.
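Besides the global -Xss flag, java.lang.Thread also has a constructor that takes a requested stack size for a single thread. The following is a minimal sketch of that constructor in use; the class name, the thread name and the 256kB value are made up for illustration, and the Javadoc describes the stack size argument as a hint that the JVM may round or ignore.

```java
class PerThreadStackSize {
    // Starts one thread with the requested stack size (in bytes), waits for it
    // to finish, and returns the name the thread saw for itself.
    static String runWithStackSize(long stackSizeBytes) {
        final String[] seen = new String[1];
        Runnable task = () -> seen[0] = Thread.currentThread().getName();
        // Thread(ThreadGroup, Runnable, String, long): the last argument is the
        // requested stack size; the JVM is free to adjust it.
        Thread t = new Thread(null, task, "small-stack", stackSizeBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // 256kB is an arbitrary illustrative value, not a recommendation.
        System.out.println("Ran on thread: " + runWithStackSize(256 * 1024));
    }
}
```

This only changes how much virtual memory is reserved for that one thread; it says nothing about how much physical memory the thread ends up touching.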

All modern operating systems allocate physical RAM lazily. So an allocation simply earmarks the virtual pages; they won’t be backed by physical pages until they are actually used.
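To make the gap between reserved and resident memory concrete, here is a small back-of-the-envelope sketch. The class and method names are invented for this example, and the 16kB per thread is the figure measured in this article for a thread that only sleeps, so treat the resident number as a lower bound rather than a prediction.

```java
class StackFootprintEstimate {
    // Virtual memory reserved for stacks: one full -Xss region per thread.
    static long reservedKb(int threads, long xssKb) {
        return threads * xssKb;
    }

    // Physical memory backing the stacks, given a measured Rss per thread.
    static long residentKb(int threads, long rssPerThreadKb) {
        return threads * rssPerThreadKb;
    }

    public static void main(String[] args) {
        int threads = 1024;
        // 1024kB is the default -Xss; 16kB is the idle-thread Rss from this article.
        System.out.println("Reserved: " + reservedKb(threads, 1024) / 1024 + " MB");
        System.out.println("Resident: " + residentKb(threads, 16) / 1024 + " MB");
    }
}
```

For 1024 idle threads this works out to 1024 MB reserved but only 16 MB resident.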

What this means is that the actual stack memory usage of a thread is determined mostly by the local variables held in each stack frame and by the depth of the method call chain. A thread whose run() method does nothing but call Thread.sleep() seems to consume 16kB.
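One way to see the effect of call depth is to run a thread that recurses for a while before it goes to sleep, so that all of its frames are still live when the stack is inspected. This is only a sketch: the class name, the thread name and the depth of 2000 are arbitrary choices for illustration, and how much physical memory the frames occupy will depend on the JVM version and on whether the method has been JIT-compiled.

```java
class DeepStackThread {
    // Recurses `remaining` more levels, runs `atBottom` while every frame is
    // still on the stack, then returns the number of levels descended.
    static int descend(int remaining, Runnable atBottom) {
        if (remaining == 0) {
            atBottom.run();
            return 0;
        }
        return 1 + descend(remaining - 1, atBottom);
    }

    public static void main(String[] args) {
        Runnable sleepForALongTime = () -> {
            try {
                Thread.sleep(1000000000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        };
        // 2000 is an arbitrary depth chosen for illustration.
        new Thread(() -> descend(2000, sleepForALongTime), "deep-stack").start();
    }
}
```

Inspecting this thread's stack region in /proc/<pid>/smaps should show a larger Rss than for a thread that sleeps directly in run().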

The experiment

You can verify this yourself on a Linux machine.

public class ThreadUsage {
    public static void main(String[] args) {
        // Start 1024 threads that do nothing but sleep.
        for (int i = 0; i < 1024; i++) {
            new Thread() {
                @Override
                public void run() {
                    System.out.println("Sleeping... " + Thread.currentThread().getName());
                    try {
                        Thread.sleep(1000000000);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                }
            }.start();
        }
        // Put the main thread to sleep as well.
        try {
            Thread.sleep(1000000000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}

Compile this with javac ThreadUsage.java, then run it with java ThreadUsage.

Next, view its allocations (Linux only):

jcmd -l # find the pid
sudo cat /proc/<pid>/smaps | less

If you scroll down you should be able to find the stack allocations: each thread gets its own mapping, and there will be 1025 of them. They look like this:

7f5fc8bff000-7f5fc8cfd000 rw-p 00000000 00:00 0 
Size:               1016 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                  16 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        16 kB
Referenced:           16 kB
Anonymous:            16 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

The reserved stack size is 1016 kB. You can play with -Xss (like -Xss512k), and you’ll see this value adjust accordingly. (I’m not sure where the missing 8kB goes.)

Rss (resident set size) represents the physical pages allocated for this stack. As can be seen, it’s 16kB.
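Scrolling through a thousand entries by hand gets tedious, so the same check can be roughly automated. The sketch below sums the Rss of every region whose Size matches a given value. The class and method names are hypothetical, and it assumes the stack regions can be recognised purely by their size (1016 kB in the output above), which could also match unrelated mappings of the same size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class SmapsStackRss {
    // Returns {number of matching regions, their total Rss in kB} for all
    // regions whose "Size:" field equals sizeKb.
    static long[] sumRss(List<String> smapsLines, long sizeKb) {
        long count = 0;
        long totalRssKb = 0;
        boolean matching = false;
        for (String line : smapsLines) {
            if (line.startsWith("Size:")) {
                // "Size:" is the first field of each region, so it starts a new entry.
                matching = parseKb(line) == sizeKb;
                if (matching) {
                    count++;
                }
            } else if (matching && line.startsWith("Rss:")) {
                totalRssKb += parseKb(line);
            }
        }
        return new long[] {count, totalRssKb};
    }

    // Parses a field line such as "Rss:                  16 kB" into 16.
    static long parseKb(String line) {
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[1]);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the pid to inspect; reading another user's process may need root.
        List<String> lines = Files.readAllLines(Paths.get("/proc", args[0], "smaps"));
        long[] result = sumRss(lines, 1016);
        System.out.println(result[0] + " regions, " + result[1] + " kB resident");
    }
}
```

Run it with the pid of the ThreadUsage process as the only argument, for example java SmapsStackRss <pid>.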

2 thoughts on “JVM Thread Actual Memory Usage”

    1. Physical memory will be allocated on demand. So it depends on what the thread does: if it keeps many local variables on the stack or makes recursive calls, the actual stack memory usage grows. As shown, 16kB seems to be the lower bound.
