Threading Primitives in WebKit (Briefly)

A nice overview of the threading primitives in Chromium was posted in the Chromium Advent Calendar. Just as Chromium/Blink maintain their own threading primitives, the threading primitives in WebKit have also changed considerably since the fork. In this post, I briefly introduce the threading primitives in WebKit.

WTF::Thread

In the old days, we had ThreadIdentifier, which was just a uint32_t identifier for a thread. We kept an internal hash table mapping each ThreadIdentifier to a platform thread handle such as pthread_t.

createThread(String name, Function) -> ThreadIdentifier
waitForCompletion(ThreadIdentifier) -> bool

However, this design had several problems. First, it had very limited extensibility: attaching additional information to a thread was not as easy as extending a class. Second, the interface required looking up the corresponding thread handle in the hash table every time we called a threading operation. Finally, and worst of all, we could not know whether a given ThreadIdentifier was still in use, because we could not manage the lifetime of the holder of the thread (here, the ThreadIdentifier). As a result, ThreadIdentifier values had to be monotonically increasing and could never be reused. The following code illustrates the problem.

ThreadIdentifier thread1 = createThread(...);
waitForCompletion(thread1);

ThreadIdentifier thread2 = createThread(...);
// thread2 must not get the same number as thread1, since someone may still use thread1 to retrieve information.
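
To make the cost of the identifier-based design concrete, here is a minimal sketch of such a table. It is not the old WebKit code: the ThreadTable class is hypothetical, and std::thread stands in for the platform handle. It only shows the per-call lookup and the never-reused counter described above.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

namespace sketch {

using ThreadIdentifier = std::uint32_t;

// Hypothetical stand-in for the internal table: identifier -> native handle.
class ThreadTable {
public:
    ThreadIdentifier createThread(std::function<void()> entry)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        // Identifiers only ever grow; the number of a finished thread is never reused.
        ThreadIdentifier identifier = m_nextIdentifier++;
        m_threads.emplace(identifier, std::thread(std::move(entry)));
        return identifier;
    }

    // Returns false when the identifier is unknown (never issued, or already joined).
    bool waitForCompletion(ThreadIdentifier identifier)
    {
        std::thread thread;
        {
            std::lock_guard<std::mutex> locker(m_lock);
            auto it = m_threads.find(identifier); // Every operation pays for this lookup.
            if (it == m_threads.end())
                return false;
            thread = std::move(it->second);
            m_threads.erase(it);
        }
        thread.join(); // Join outside the lock so other callers are not blocked.
        return true;
    }

private:
    std::mutex m_lock;
    ThreadIdentifier m_nextIdentifier { 1 };
    std::unordered_map<ThreadIdentifier, std::thread> m_threads;
};

} // namespace sketch
```

In this sketch every created thread must be waited for before the table is destroyed, since destroying a joinable std::thread terminates the program. Note also that a stale identifier is only detectable because numbers are never recycled.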

The Thread class in WTF (of course, WTF stands for Web Template Framework) is a brand new abstraction over native threads that solves the above problems. Thread is a ref-counted object. Thread::create(...) -> Ref<Thread> creates it, and it offers threading operations as member functions. For example, thread->waitForCompletion() is the join operation in WebKit threading.

{
    Ref<Thread> thread = Thread::create("thread name", [&] {
        ...
    });
    thread->waitForCompletion();

    // The Thread object is still alive, but the native thread has already finished.
}
// The Thread object is destroyed here.

This Thread class is portable. It just works (TM) on macOS, Linux (and UNIX environments including FreeBSD), and Windows. This portability is important for building advanced features on top of the Thread abstraction.

There is a one-to-one correspondence between a native thread and a Thread. The ref-counted object is held in thread-local storage (TLS) and retained while the thread is running. We can get the current thread from TLS by calling Thread::current() -> Thread&. So, for example, checking whether a given Thread is the current one is done with a pointer comparison: thread == &Thread::current().

Users of a thread can also retain its Thread to perform threading operations on it. Since Thread is ref-counted, it is destroyed when nobody retains it: the Thread is destructed once (1) no users retain it and (2) the thread itself has finished.
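
The two-sided lifetime rule can be sketched with standard C++ as follows. This is only an illustration under assumptions: std::shared_ptr plays the role of Ref<Thread>, std::thread plays the role of the native handle, and liveThreadObjects is a hypothetical counter added purely so that destruction can be observed.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

namespace sketch {

// Counts live Thread objects; it exists only so the lifetime can be observed.
std::atomic<int> liveThreadObjects { 0 };

class Thread {
public:
    static std::shared_ptr<Thread> create(std::function<void()> entry)
    {
        std::shared_ptr<Thread> thread(new Thread);
        // The native thread holds its own reference (self) while it runs.
        thread->m_handle = std::thread([self = thread, entry = std::move(entry)]() mutable {
            entry();
            self.reset(); // Condition (2): the thread itself has finished.
        });
        return thread;
    }

    ~Thread()
    {
        // Reached only when neither a user nor the running thread holds a reference.
        if (m_handle.joinable())
            m_handle.detach();
        --liveThreadObjects;
    }

    void waitForCompletion()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        if (m_handle.joinable())
            m_handle.join();
    }

private:
    Thread() { ++liveThreadObjects; }

    std::mutex m_lock;
    std::thread m_handle;
};

} // namespace sketch
```

If the creator drops its reference early, the object survives until the thread body returns. If the creator keeps its reference, the object outlives the native thread, which matches the earlier snippet where the Thread object is still alive after waitForCompletion().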

By introducing Thread, (1) we can easily attach any information we want to Thread. Moreover, (2) we can manage the lifetime of the holder of the thread through the ref-counted Thread object: we can destroy a Thread when it is no longer used. With ThreadIdentifier, we could not recycle an unused ThreadIdentifier since it might still be in use somewhere.

Note that Thread::current() works even if we call it from threads not created by WebKit (a.k.a. external threads). In that case, a Thread is created on first use and stored in TLS. When the thread finishes, the TLS slot and the Thread it holds are automatically destroyed.
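
The lazy creation for external threads can be sketched with a C++ thread_local variable. This is an assumption-laden illustration rather than WTF's implementation: the class below is a hypothetical stand-in, and thread_local replaces whatever TLS mechanism WTF actually uses.

```cpp
#include <memory>
#include <thread>

namespace sketch {

class Thread {
public:
    // Returns the Thread for the calling native thread, creating it on first
    // use. A thread that was not started through this class (an "external"
    // thread) therefore still gets its own Thread object.
    static Thread& current()
    {
        // One slot per native thread; the reference held here is released
        // automatically when that native thread exits.
        thread_local std::shared_ptr<Thread> slot = std::make_shared<Thread>();
        return *slot;
    }

    // The pointer comparison described in the text.
    bool isCurrent() const { return this == &current(); }

    std::thread::id id() const { return m_id; }

private:
    std::thread::id m_id { std::this_thread::get_id() };
};

} // namespace sketch
```

Calling sketch::Thread::current() twice on the same thread yields the same object, while a std::thread started elsewhere observes a different one.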

Advanced features of Thread

One interesting aspect of our Thread is that it has a bunch of advanced features that are not typically offered by standard libraries such as C++ std::thread. This stems from the fact that our WTF library is tightly coupled with JavaScriptCore (JSC): Thread offers the advanced features that JSC needs.

For example, Thread::suspend() -> Expected<void, PlatformSuspendError> is a platform-independent way to suspend a thread. It is portable (it works on macOS, Linux, and Windows) and is used for the stop-the-world phase of garbage collection (GC)¹. Here is a list of such advanced features in our Thread ;). They are the building blocks of our GC in JSC.

Thread::suspend() -> Expected<void, PlatformSuspendError>
Thread::resume() -> void
Thread::getRegisters(PlatformRegisters&) -> size_t
Thread::stack() -> const StackBounds&

In POSIX environments, we have a further feature.

Thread::signal(int) -> bool

While macOS and Windows have platform APIs to suspend and resume threads, Linux does not. There we implement suspend and resume with a POSIX signal and a semaphore, which is a typical way to implement stop-the-world GC operations.
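
The signal-plus-semaphore technique can be sketched as below. This is not WebKit's implementation: the choice of SIGUSR1 and SIGUSR2, the single global semaphore (so only one suspension at a time), and all function names are assumptions made for illustration, and the sketch assumes Linux with unnamed POSIX semaphores.

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <atomic>
#include <cerrno>
#include <chrono>
#include <thread>

namespace sketch {

// Hypothetical choice of signals for this sketch.
constexpr int suspendSignal = SIGUSR1;
constexpr int resumeSignal = SIGUSR2;

static sem_t acknowledgement; // Posted by the target once it is parked in the handler.

static void handleSuspend(int)
{
    int savedErrno = errno;
    sem_post(&acknowledgement); // Async-signal-safe: tell the suspender we are parked.
    // resumeSignal is blocked while this handler runs (see sa_mask below), so it
    // cannot be lost before sigsuspend, which unblocks it and sleeps atomically.
    sigset_t waitMask;
    sigfillset(&waitMask);
    sigdelset(&waitMask, resumeSignal);
    sigsuspend(&waitMask);
    errno = savedErrno;
}

static void handleResume(int) { } // Only needs to interrupt sigsuspend.

void initializeSuspendSupport()
{
    sem_init(&acknowledgement, 0, 0);
    struct sigaction action = {};
    sigfillset(&action.sa_mask); // Block every signal while a handler runs.
    action.sa_flags = SA_RESTART;
    action.sa_handler = handleSuspend;
    sigaction(suspendSignal, &action, nullptr);
    action.sa_handler = handleResume;
    sigaction(resumeSignal, &action, nullptr);
}

// Returns once the target thread is parked inside handleSuspend.
void suspend(pthread_t target)
{
    pthread_kill(target, suspendSignal);
    while (sem_wait(&acknowledgement) == -1 && errno == EINTR) { }
}

void resume(pthread_t target)
{
    pthread_kill(target, resumeSignal);
}

// Demonstration: a spinning worker makes no progress while it is suspended.
bool workerStopsWhileSuspended()
{
    initializeSuspendSupport();
    std::atomic<unsigned long> counter { 0 };
    std::atomic<bool> stop { false };
    std::thread worker([&] {
        while (!stop)
            ++counter;
    });

    suspend(worker.native_handle());
    unsigned long before = counter;
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    bool frozen = counter == before;

    resume(worker.native_handle());
    while (counter == before)
        std::this_thread::yield(); // Wait until the worker makes progress again.
    stop = true;
    worker.join();
    return frozen;
}

} // namespace sketch
```

The semaphore is what makes suspend() synchronous: the caller only continues after the target has acknowledged from inside its signal handler, at which point the target's stack and registers are stable enough to inspect.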

Locking

I will not say much about locking in this post since there is a very nice blog post about it on webkit.org. WTF offers WTF::Lock and WTF::Condition.

WTF::ThreadGroup

Grouping live threads is useful. Consider a multi-threaded environment in which various threads take the lock of one VM, run JS, and release the lock. When GC happens, the conservative GC needs to scan the stacks and registers of the live threads that have touched this VM.

ThreadGroup offers exactly this feature. We can add a Thread to a ThreadGroup. When a Thread finishes its execution, it cooperatively removes itself from the ThreadGroups it was added to. If you take the lock of a ThreadGroup, all the threads in it are kept alive until the lock is released. We can then iterate over the live threads in the ThreadGroup and suspend each one to perform stop-the-world.
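
The add, self-removal, and locked iteration described above can be sketched as follows. This is a simplified model, not the WTF code: Thread here is a plain object with no native thread behind it, didExit() is a hypothetical hook that a real thread would call when it finishes, and std::shared_ptr and std::weak_ptr replace WTF's reference types.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

namespace sketch {

class ThreadGroup;

// Stand-in for a thread object; no native thread is started in this sketch.
class Thread {
public:
    // Called by the thread when it finishes: leave every group it was added to.
    void didExit();

private:
    friend class ThreadGroup;
    std::mutex m_lock;
    bool m_didExit { false };
    std::vector<std::weak_ptr<ThreadGroup>> m_groups;
};

class ThreadGroup : public std::enable_shared_from_this<ThreadGroup> {
public:
    static std::shared_ptr<ThreadGroup> create()
    {
        return std::shared_ptr<ThreadGroup>(new ThreadGroup);
    }

    // Returns false if the thread has already exited and so cannot be added.
    bool add(const std::shared_ptr<Thread>& thread)
    {
        std::lock_guard<std::mutex> groupLocker(m_lock);
        std::lock_guard<std::mutex> threadLocker(thread->m_lock);
        if (thread->m_didExit)
            return false;
        if (std::find(m_threads.begin(), m_threads.end(), thread) == m_threads.end()) {
            m_threads.push_back(thread);
            thread->m_groups.emplace_back(shared_from_this());
        }
        return true;
    }

    // Visits the members while holding the group lock. Members cannot remove
    // themselves during the visit, so they stay alive until it returns.
    template<typename Visitor>
    void forEachThread(Visitor&& visitor)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        for (auto& thread : m_threads)
            visitor(*thread);
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        return m_threads.size();
    }

private:
    friend class Thread;
    ThreadGroup() = default;

    void remove(Thread& thread)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        m_threads.erase(std::remove_if(m_threads.begin(), m_threads.end(),
            [&](const std::shared_ptr<Thread>& member) { return member.get() == &thread; }),
            m_threads.end());
    }

    std::mutex m_lock;
    std::vector<std::shared_ptr<Thread>> m_threads;
};

inline void Thread::didExit()
{
    std::vector<std::weak_ptr<ThreadGroup>> groups;
    {
        std::lock_guard<std::mutex> locker(m_lock);
        m_didExit = true;
        groups.swap(m_groups);
    }
    // The thread lock is released before any group lock is taken, so the locks
    // are never nested in the opposite order to ThreadGroup::add().
    for (auto& weakGroup : groups) {
        if (auto group = weakGroup.lock())
            group->remove(*this);
    }
}

} // namespace sketch
```

The m_didExit flag is one way to handle the race mentioned in the text: an add() that arrives after the thread has finished is rejected instead of leaving a dead member in the group.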

While the concept of ThreadGroup is simple, its implementation is a bit tricky. A Thread can finish concurrently and be removed from a ThreadGroup. Any thread can add any Thread to multiple ThreadGroups at any time. If you are interested in the implementation, you can look into the change.

WTF::WorkQueue and WTF::AutomaticThread

WorkQueue is a simple abstraction. We can put a task (Function) into the queue, and the thread running inside the WorkQueue takes tasks from it and runs them.
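
A serial queue of this kind can be sketched in standard C++ as below. It is a minimal illustration under assumptions, not WTF::WorkQueue itself: std::function stands in for Function, a condition variable is used to wake the worker, and the class layout is hypothetical.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

namespace sketch {

class WorkQueue {
public:
    WorkQueue()
        : m_worker([this] { run(); })
    {
    }

    ~WorkQueue()
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_shouldStop = true;
        }
        m_condition.notify_one();
        m_worker.join(); // The worker drains the remaining tasks first (see run()).
    }

    void dispatch(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_tasks.push_back(std::move(task));
        }
        m_condition.notify_one();
    }

private:
    void run()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> locker(m_lock);
                m_condition.wait(locker, [this] { return m_shouldStop || !m_tasks.empty(); });
                if (m_tasks.empty())
                    return; // Stop was requested and nothing is left to run.
                task = std::move(m_tasks.front());
                m_tasks.pop_front();
            }
            task(); // Run outside the lock so dispatch() is never blocked by a task.
        }
    }

    std::mutex m_lock;
    std::condition_variable m_condition;
    std::deque<std::function<void()>> m_tasks;
    bool m_shouldStop { false };
    std::thread m_worker; // Declared last so the other members exist before it starts.
};

} // namespace sketch
```

Tasks run one at a time in the order they were dispatched, and destroying the queue waits for the tasks already queued.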

There is also an abstraction similar to WorkQueue: AutomaticThread. This is a very fancy feature used in JSC. AutomaticThread also polls for tasks and runs them. While WorkQueue accepts any function as a task, AutomaticThread implements the body of the task in a virtual member function, but the semantics are very similar. The difference is that the underlying Thread is automatically destroyed when the AutomaticThread has been idle for more than 10 seconds.

Reducing the number of threads significantly affects the memory consumption of the browser. An outstanding example is malloc. Recent malloc libraries use TLS to gain high performance in multi-threaded environments. For example, various malloc implementations (including bmalloc in WebKit) keep a synchronization-free cache in TLS to speed up the fast path. This cache remains until the thread is destroyed!

AutomaticThread is mainly used for concurrent JIT compiler threads in JSC.
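
The idle-timeout behaviour can be sketched as follows. This is not the AutomaticThread API: the sketch uses a queue of std::function tasks instead of a virtual member function, the class name IdleExitingWorker is hypothetical, and the timeout is a constructor parameter so that it can be much shorter than 10 seconds.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

namespace sketch {

class IdleExitingWorker {
public:
    explicit IdleExitingWorker(std::chrono::milliseconds idleTimeout)
        : m_state(std::make_shared<State>())
    {
        m_state->idleTimeout = idleTimeout;
    }

    void dispatch(std::function<void()> task)
    {
        std::lock_guard<std::mutex> locker(m_state->lock);
        m_state->tasks.push_back(std::move(task));
        if (m_state->hasThread) {
            m_state->condition.notify_one();
            return;
        }
        // No native thread exists right now, so start one on demand.
        m_state->hasThread = true;
        ++m_state->threadsStarted;
        std::thread([state = m_state] { run(*state); }).detach();
    }

    bool hasThread() const
    {
        std::lock_guard<std::mutex> locker(m_state->lock);
        return m_state->hasThread;
    }

    int threadsStarted() const
    {
        std::lock_guard<std::mutex> locker(m_state->lock);
        return m_state->threadsStarted;
    }

private:
    // Shared with the detached thread, which keeps it alive while running.
    struct State {
        std::mutex lock;
        std::condition_variable condition;
        std::deque<std::function<void()>> tasks;
        std::chrono::milliseconds idleTimeout { 0 };
        bool hasThread { false };
        int threadsStarted { 0 };
    };

    static void run(State& state)
    {
        std::unique_lock<std::mutex> locker(state.lock);
        for (;;) {
            bool hasWork = state.condition.wait_for(locker, state.idleTimeout,
                [&] { return !state.tasks.empty(); });
            if (!hasWork) {
                // Idle for the whole timeout: let the native thread (and any
                // per-thread caches it owns) go away.
                state.hasThread = false;
                return;
            }
            std::function<void()> task = std::move(state.tasks.front());
            state.tasks.pop_front();
            locker.unlock();
            task();
            locker.lock();
        }
    }

    std::shared_ptr<State> m_state;
};

} // namespace sketch
```

After the timeout passes with an empty queue, hasThread() becomes false, and the next dispatch() starts a fresh native thread. Tasks that capture references must not outlive what they refer to, since the detached thread may still be running them after the IdleExitingWorker object is gone.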

WTF::ParallelHelperPool

ParallelHelperPool is an interesting thread pool that is intended to be shared by multiple parallel tasks. Suppose we have a task that can be executed in parallel, e.g. GC’s parallel marking. We set this task on the pool, and the pool runs it in parallel. As noted in the code, this abstraction is suitable when there are multiple concurrent tasks that may all want parallelism. Since its threads are managed by AutomaticThread, they are automatically destroyed if the pool is not used. Currently it is used for GC’s parallel marking (and, recently, these threads also perform parallel marking constraint solving).

WTF::ThreadMessage

ThreadMessage is a feature that executes a lambda while a specified thread is suspended. It is built on Thread::suspend and Thread::resume.

sendMessage(thread, [] (PlatformRegisters& registers) {
    ...
});

Since suspend and resume are portable, this ThreadMessage feature is also portable. By using sendMessage, we can modify some data while the thread is suspended.

In WebKit, it is used to insert a trap into a running VM (called a VM trap). One example is terminating a running VM without introducing runtime cost. You may have seen a dialog like “Your JS takes too much time. Do you want to stop it?”.

In JSC, we have the check_trap bytecode. In the optimizing compilers, it just emits a nop. When we would like to terminate a running VM, we call sendMessage on the thread running this VM. While the thread is suspended, we rewrite the JIT-generated nop with hlt on x86, and then resume the thread. When the resumed thread hits this hlt, it raises a fault signal, which we handle in our signal handler. In the signal handler, we throw an uncatchable JS exception for VM termination, and the VM is terminated. Since ordinary optimized JIT code just executes a nop, this feature adds no runtime cost.

WTF::ThreadSpecific

WTF::ThreadSpecific<> offers a portable abstraction of TLS on POSIX and Windows. TLS is storage for per-thread data. It helps achieve high performance in multi-threaded environments since accessing data in TLS does not require synchronization. WTF::ThreadSpecific<> uses pthread_getspecific and pthread_setspecific in POSIX environments. On Windows, we use fiber local storage (FLS) under the hood.
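
On POSIX, a wrapper of this kind can be sketched directly on top of pthread keys. The following is a minimal illustration, not WTF's code: the lazy allocation on first access and the operator names are assumptions, and the key is intentionally never deleted.

```cpp
#include <pthread.h>
#include <thread>

namespace sketch {

template<typename T>
class ThreadSpecific {
public:
    ThreadSpecific()
    {
        // The callback runs at thread exit for each thread whose slot is non-null.
        pthread_key_create(&m_key, [](void* pointer) { delete static_cast<T*>(pointer); });
    }

    ThreadSpecific(const ThreadSpecific&) = delete;
    ThreadSpecific& operator=(const ThreadSpecific&) = delete;

    // Returns the calling thread's instance, creating it on first access.
    T& operator*()
    {
        void* pointer = pthread_getspecific(m_key);
        if (!pointer) {
            pointer = new T();
            pthread_setspecific(m_key, pointer);
        }
        return *static_cast<T*>(pointer);
    }

    T* operator->() { return &**this; }

private:
    pthread_key_t m_key;
};

// Demonstration: each thread sees its own, lazily created value.
inline bool valuesArePerThread()
{
    static ThreadSpecific<int> value;
    *value = 1;
    int seenByOtherThread = -1;
    std::thread other([&] {
        seenByOtherThread = *value; // A fresh int, value-initialized to 0.
        *value = 99;                // Invisible to the first thread.
    });
    other.join();
    return seenByOtherThread == 0 && *value == 1;
}

} // namespace sketch
```

No lock is taken on the access path: each thread only ever touches its own slot, which is why TLS suits per-thread caches.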

One interesting TLS feature is FAST_TLS on Darwin. It uses platform-provided, system-reserved TLS slots such as __PTK_FRAMEWORK_JAVASCRIPTCORE_KEY3. You can check them in the system library header. These slots are intended for system libraries. For these slots, you can use _pthread_getspecific_direct(KEY), which compiles to code that accesses memory through a segment register and an offset, so it is quite fast. It is a nice example of co-designing the platform and system libraries.

Summary

WebKit has the WTF utility library, which offers various fancy threading primitives. As we encourage more parallelism in WebKit, we will add more features to WTF.

¹ This stop-the-world functionality is also used to implement the sampling profiler in JSC. A sampling profiler thread periodically stops the JS VM thread, retrieves execution context data including stack traces, and resumes the thread.
