A nice overview of threading primitives in Chromium was posted in the Chromium Advent Calendar. Just as Chromium/Blink maintain their own threading primitives, the threading primitives in WebKit have also changed substantially since the fork. In this post, I briefly introduce the threading primitives in WebKit.
In the old days, we had ThreadIdentifier, which is actually a uint32_t identifier for a thread. We had an internal hash table between ThreadIdentifier and the native thread handle, and threading operations took this identifier:
createThread(String name, Function) -> ThreadIdentifier
waitForCompletion(ThreadIdentifier) -> bool
However, this design has several problems.
First, it has very limited extensibility: attaching additional information to a thread is not as easy as extending a class.
Second, this interface requires looking up the corresponding thread handle in the hash table every time we call a threading operation.
Finally, and worst of all, we cannot know whether a given ThreadIdentifier is still in use.
The following code explains the above problem.
We cannot manage the lifetime of the holder of the thread. This means that ThreadIdentifier must be monotonically increasing.
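The problem can be sketched with a small model of the old design. This is not WebKit's actual code: LegacyThreadTable and staleIdentifierIsRejected are hypothetical names, and std::thread stands in for the native thread handle. It only illustrates why every operation needs a table lookup and why a stale identifier stays safe only as long as identifiers are never recycled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical model of the identifier-based design, for illustration only.
using ThreadIdentifier = uint32_t;

class LegacyThreadTable {
public:
    ThreadIdentifier createThread(std::function<void()> entry)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        ThreadIdentifier identifier = ++m_lastIdentifier; // Monotonic, never reused.
        m_threads.emplace(identifier, std::thread(std::move(entry)));
        return identifier;
    }

    // Every operation pays for a hash table lookup under a lock.
    bool waitForCompletion(ThreadIdentifier identifier)
    {
        std::thread handle;
        {
            std::lock_guard<std::mutex> locker(m_lock);
            auto iterator = m_threads.find(identifier);
            if (iterator == m_threads.end())
                return false; // Unknown or already joined; the caller cannot tell which.
            handle = std::move(iterator->second);
            m_threads.erase(iterator);
        }
        handle.join();
        return true;
    }

private:
    std::mutex m_lock;
    std::unordered_map<ThreadIdentifier, std::thread> m_threads;
    ThreadIdentifier m_lastIdentifier { 0 };
};

inline bool staleIdentifierIsRejected()
{
    LegacyThreadTable table;
    ThreadIdentifier first = table.createThread([] { });
    bool joined = table.waitForCompletion(first);

    ThreadIdentifier second = table.createThread([] { });
    // A copy of `first` may still be stored somewhere. If identifiers were
    // recycled, `second` could equal `first` and the stale copy would
    // silently refer to the wrong thread.
    bool staleJoin = table.waitForCompletion(first);
    table.waitForCompletion(second);
    return joined && !staleJoin && first != second;
}
```

Because nothing tracks who still holds an identifier, the table can only stay correct by never handing out the same number twice.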
The Thread class in WTF (of course, WTF stands for Web Template Framework) is a brand new abstraction over native threads that solves the above problems.
It is ref-counted: Thread::create(...) -> Ref<Thread> creates it, and it offers threading operations as its member functions.
waitForCompletion is the join function in WebKit threading.
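As a rough illustration of the object-based design, here is a minimal sketch built on the C++ standard library. It is not the WTF implementation: RefCountedThreadSketch and runAndJoin are hypothetical names, and std::shared_ptr stands in for Ref<Thread>. Unlike the real Thread, this sketch does not retain itself while the native thread is running.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Simplified stand-in for a ref-counted thread object.
class RefCountedThreadSketch {
public:
    static std::shared_ptr<RefCountedThreadSketch> create(std::string name, std::function<void()> entry)
    {
        std::shared_ptr<RefCountedThreadSketch> thread(new RefCountedThreadSketch(std::move(name)));
        thread->m_handle = std::thread(std::move(entry));
        return thread;
    }

    ~RefCountedThreadSketch()
    {
        // If nobody joined, let the native thread finish on its own.
        if (m_handle.joinable())
            m_handle.detach();
    }

    // Threading operations are member functions; no table lookup is needed.
    void waitForCompletion()
    {
        if (m_handle.joinable())
            m_handle.join();
    }

    // Extra per-thread information is just another member.
    const std::string& name() const { return m_name; }

private:
    explicit RefCountedThreadSketch(std::string name)
        : m_name(std::move(name))
    {
    }

    std::string m_name;
    std::thread m_handle;
};

inline int runAndJoin()
{
    int result = 0;
    auto thread = RefCountedThreadSketch::create("Worker", [&result] { result = 42; });
    thread->waitForCompletion(); // Joining makes the write to `result` visible here.
    return result;
}
```

Holding the object itself, rather than a number, is what makes the lifetime question answerable: the object exists exactly as long as somebody retains it.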
The Thread class is portable. It just works (TM) on macOS, Linux (and UNIX environments including FreeBSD), and Windows.
This matters because we build advanced features on top of this class.
There is a one-to-one correspondence between a native thread and its Thread object.
This ref-counted object is held in thread local storage (TLS) and retained while the thread is running.
We can get the current thread from TLS by calling Thread::current() -> Thread&.
So, for example, checking whether a given Thread is the current one is done with the pointer comparison thread == &Thread::current().
A user of the thread can also retain the Thread to perform threading operations on it.
Since Thread is ref-counted, it is destroyed when nobody retains it.
When (1) no users retain the thread and (2) the thread itself has finished, the Thread is destructed.
With Thread, (1) we can easily attach any information we want to it.
Moreover, (2) we can manage the lifetime of the holder of the thread through ref-counting. We can destroy the Thread when it is no longer used.
With ThreadIdentifier, we were not able to recycle an unused ThreadIdentifier since it might still be used somewhere.
Thread::current() works even if we call it from threads that were not created by WebKit (a.k.a. external threads).
In that case, a Thread is created and stored in TLS. When the thread finishes, this TLS entry and the Thread it holds are automatically destroyed.
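The TLS-based lookup described above can be sketched with C++ thread_local. This is only an approximation under stated assumptions: CurrentThreadSketch and currentIsPerThread are hypothetical names, and the real Thread stores considerably more state.

```cpp
#include <cassert>
#include <memory>
#include <thread>

class CurrentThreadSketch {
public:
    static CurrentThreadSketch& current()
    {
        // One slot per native thread. It is filled on first use, so this also
        // works on threads that the library did not create, and it is released
        // automatically when the thread finishes.
        thread_local std::shared_ptr<CurrentThreadSketch> slot = std::make_shared<CurrentThreadSketch>();
        return *slot;
    }

    // "Is this the current thread?" is a plain pointer comparison.
    bool isCurrent() const { return this == &current(); }
};

inline bool currentIsPerThread()
{
    CurrentThreadSketch& mainThread = CurrentThreadSketch::current();
    bool sameOnSecondCall = &mainThread == &CurrentThreadSketch::current();

    bool seenAsCurrentElsewhere = true;
    std::thread external([&] {
        // Created directly with std::thread, i.e. an "external" thread.
        seenAsCurrentElsewhere = mainThread.isCurrent();
    });
    external.join();

    return sameOnSecondCall && mainThread.isCurrent() && !seenAsCurrentElsewhere;
}
```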
Advanced features of Thread
One of the interesting aspects of our Thread is that it has a bunch of advanced features that are not typically offered by standard libraries such as C++'s.
This is because our Thread offers the advanced features that are necessary for JSC.
Thread::suspend() -> Expected<void, PlatformSuspendError> is a platform-independent way to suspend a thread.
It is portable (working on macOS, Linux, and Windows) and is used for the stop-the-world phase of garbage collection (GC) [1].
Here is a list of such advanced features in our Thread ;). They are the building blocks of our GC in JSC.
Thread::suspend() -> Expected<void, PlatformSuspendError>
Thread::resume() -> void
Thread::getRegisters(PlatformRegisters&) -> size_t
Thread::stack() -> const StackBounds&
In POSIX environments, we have a further feature.
Thread::signal(int) -> bool
While macOS and Windows have platform APIs to suspend and resume threads, Linux does not have one. We implement it with POSIX signals and a semaphore, which is a typical way to implement a stop-the-world GC operation.
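A minimal sketch of the signal-and-semaphore technique follows, assuming Linux with unnamed POSIX semaphores. It is not the WTF implementation: the choice of SIGUSR1 and SIGUSR2, the single global semaphore, and all names in the sketch namespace are assumptions made for illustration, and it supports only one suspender at a time.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <thread>

namespace sketch {

// Signal numbers are an assumption made for this sketch.
static const int suspendSignal = SIGUSR1;
static const int resumeSignal = SIGUSR2;
static sem_t acknowledgement; // Posted by the target thread from its handler.

static void handleSignal(int signalNumber)
{
    if (signalNumber == resumeSignal)
        return; // Only needed to wake up sigsuspend() below.

    int savedErrno = errno;
    sem_post(&acknowledgement); // Async-signal-safe: "I am suspended".
    sigset_t waitMask;
    sigfillset(&waitMask);
    sigdelset(&waitMask, resumeSignal);
    sigsuspend(&waitMask); // Park here until resumeSignal is delivered.
    sem_post(&acknowledgement); // "I am running again".
    errno = savedErrno;
}

static void installHandlers()
{
    sem_init(&acknowledgement, 0, 0);
    struct sigaction action = {};
    action.sa_handler = handleSignal;
    action.sa_flags = SA_RESTART;
    sigfillset(&action.sa_mask); // Keep resumeSignal blocked until sigsuspend().
    sigaction(suspendSignal, &action, nullptr);
    sigaction(resumeSignal, &action, nullptr);
}

static void signalAndWait(pthread_t target, int signalNumber)
{
    pthread_kill(target, signalNumber);
    while (sem_wait(&acknowledgement) == -1 && errno == EINTR) { }
}

static void suspend(pthread_t target) { signalAndWait(target, suspendSignal); }
static void resume(pthread_t target) { signalAndWait(target, resumeSignal); }

static bool counterStopsWhileSuspended()
{
    installHandlers();
    std::atomic<unsigned long> counter { 0 };
    std::atomic<bool> stop { false };
    std::thread worker([&] {
        while (!stop.load())
            counter.fetch_add(1);
    });
    pthread_t target = worker.native_handle();

    suspend(target); // Returns only after the worker is parked in its handler.
    unsigned long before = counter.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    unsigned long after = counter.load();
    resume(target);

    stop.store(true);
    worker.join();
    return before == after;
}

} // namespace sketch
```

The semaphore is what turns an asynchronous signal into a synchronous operation: the suspender does not proceed until the target has confirmed from inside its handler that it has stopped.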
I will not say much about locking in this post since there is a very nice blog post about it on webkit.org.
Grouping live threads is useful. Consider a multi-threaded environment in which various threads take the lock of one VM, run JS, and release the lock. When GC happens, the conservative GC would like to scan the stacks and registers of the live threads that have touched this VM before.
ThreadGroup offers exactly this feature. We can add a Thread to a ThreadGroup. When a Thread finishes its execution, it cooperatively removes itself from the ThreadGroups it was added to.
If you take the lock of a ThreadGroup, all the threads included in this ThreadGroup are kept alive until the lock is released.
We can iterate over the live threads in a ThreadGroup and suspend each thread to perform stop-the-world.
While the concept of ThreadGroup is simple, its implementation is a bit tricky.
A Thread can finish concurrently and be removed from a ThreadGroup.
Any thread can add any Thread to multiple ThreadGroups at any time.
If you are interested in the implementation, you can look into the change.
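To make the add, self-remove, and iterate-under-lock behaviour concrete, here is a simplified sketch. It is not the WTF implementation: ThreadRecord, ThreadGroupSketch, didExit, and liveCountAfterOneExit are hypothetical names, and the locking scheme is only one possible way to handle a thread that exits while it is being added.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_set>
#include <vector>

class ThreadGroupSketch;

// Stand-in for the per-thread object.
class ThreadRecord {
public:
    void didExit(); // Called by the thread itself when it finishes.

private:
    friend class ThreadGroupSketch;
    std::mutex m_lock;
    bool m_didExit { false };
    std::vector<std::weak_ptr<ThreadGroupSketch>> m_groups;
};

class ThreadGroupSketch : public std::enable_shared_from_this<ThreadGroupSketch> {
public:
    static std::shared_ptr<ThreadGroupSketch> create()
    {
        return std::shared_ptr<ThreadGroupSketch>(new ThreadGroupSketch);
    }

    bool add(ThreadRecord& thread)
    {
        std::lock_guard<std::mutex> groupLocker(m_lock);
        std::lock_guard<std::mutex> threadLocker(thread.m_lock);
        if (thread.m_didExit)
            return false; // Too late: the thread has already finished.
        if (m_threads.insert(&thread).second)
            thread.m_groups.push_back(shared_from_this());
        return true;
    }

    void remove(ThreadRecord& thread)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        m_threads.erase(&thread);
    }

    // While the group lock is held, no member can finish removing itself,
    // so every visited thread is still alive.
    template<typename Visitor>
    void forEachLiveThread(const Visitor& visitor)
    {
        std::lock_guard<std::mutex> locker(m_lock);
        for (ThreadRecord* thread : m_threads)
            visitor(*thread);
    }

private:
    ThreadGroupSketch() = default;
    std::mutex m_lock;
    std::unordered_set<ThreadRecord*> m_threads;
};

inline void ThreadRecord::didExit()
{
    std::vector<std::weak_ptr<ThreadGroupSketch>> groups;
    {
        std::lock_guard<std::mutex> locker(m_lock);
        m_didExit = true;
        groups.swap(m_groups);
    }
    // The thread lock is released before taking any group lock,
    // so the lock order never conflicts with add().
    for (auto& weakGroup : groups) {
        if (auto group = weakGroup.lock())
            group->remove(*this);
    }
}

inline std::size_t liveCountAfterOneExit()
{
    auto group = ThreadGroupSketch::create();
    ThreadRecord first;
    ThreadRecord second;
    group->add(first);
    group->add(second);
    first.didExit();

    std::size_t count = 0;
    group->forEachLiveThread([&count](ThreadRecord&) { ++count; });
    return count;
}
```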
WTF::WorkQueue and WTF::AutomaticThread
WorkQueue is a simple abstraction. We can put a task (a Function) into the queue, and the thread running inside the WorkQueue polls for tasks and runs them.
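The idea can be sketched with the C++ standard library. This is an illustration, not the WTF::WorkQueue API: WorkQueueSketch, dispatch, and runThreeTasks are hypothetical names, and the worker waits on a condition variable rather than literally polling.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class WorkQueueSketch {
public:
    WorkQueueSketch()
        : m_worker([this] { run(); })
    {
    }

    ~WorkQueueSketch()
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_isShuttingDown = true;
        }
        m_condition.notify_one();
        m_worker.join(); // The worker drains remaining tasks before exiting.
    }

    void dispatch(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_tasks.push_back(std::move(task));
        }
        m_condition.notify_one();
    }

private:
    void run()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> locker(m_lock);
                m_condition.wait(locker, [this] { return m_isShuttingDown || !m_tasks.empty(); });
                if (m_tasks.empty())
                    return; // Shutting down and nothing left to run.
                task = std::move(m_tasks.front());
                m_tasks.pop_front();
            }
            task(); // Run outside the lock so dispatch() is never blocked by a task.
        }
    }

    std::mutex m_lock;
    std::condition_variable m_condition;
    std::deque<std::function<void()>> m_tasks;
    bool m_isShuttingDown { false };
    std::thread m_worker; // Declared last so the other members exist before it starts.
};

inline int runThreeTasks()
{
    int digits = 0;
    {
        WorkQueueSketch queue;
        for (int i = 1; i <= 3; ++i)
            queue.dispatch([&digits, i] { digits = digits * 10 + i; });
    } // The destructor joins the worker after all tasks have run.
    return digits;
}
```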
There is also a similar abstraction, AutomaticThread.
This is a very fancy feature used in JSC.
Like WorkQueue, AutomaticThread can poll for tasks and run them.
While WorkQueue can take any function as a task, AutomaticThread implements the body of the task in a virtual member function, but the semantics are very similar.
The difference is that the underlying Thread is automatically destroyed when the AutomaticThread has been idle for more than 10 seconds.
Reducing the number of threads significantly affects the memory consumption of the browser.
An outstanding example is malloc. A malloc library uses TLS to gain high performance in multi-threaded environments.
For example, various malloc implementations (including bmalloc in WebKit) keep a synchronization-free cache in TLS to speed up the fast case.
This cache remains until the thread is destroyed!
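The idle-timeout behaviour can be sketched as follows. This is not the WTF::AutomaticThread API: AutomaticThreadSketch, notify, stop, work, startCount, and CountingThread are hypothetical names, the timeout is a constructor parameter instead of the fixed 10 seconds, and the sketch assumes a single controlling thread calls notify() and stop().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class AutomaticThreadSketch {
public:
    explicit AutomaticThreadSketch(std::chrono::milliseconds idleTimeout)
        : m_idleTimeout(idleTimeout)
    {
    }

    virtual ~AutomaticThreadSketch() = default; // Subclasses call stop() first.

    // Announce one unit of work, starting a thread if none is running.
    void notify()
    {
        std::lock_guard<std::mutex> locker(m_lock);
        ++m_pending;
        if (m_running) {
            m_condition.notify_one();
            return;
        }
        if (m_thread.joinable())
            m_thread.join(); // Reap the previous thread, which already timed out.
        m_running = true;
        ++m_startCount;
        m_thread = std::thread([this] { threadMain(); });
    }

    void stop()
    {
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_stopping = true;
        }
        m_condition.notify_one();
        if (m_thread.joinable())
            m_thread.join();
    }

    int startCount() const { return m_startCount; }

protected:
    virtual void work() = 0; // The body of the task lives in the subclass.

private:
    void threadMain()
    {
        std::unique_lock<std::mutex> locker(m_lock);
        for (;;) {
            m_condition.wait_for(locker, m_idleTimeout, [this] { return m_pending > 0 || m_stopping; });
            if (m_pending > 0) {
                --m_pending;
                locker.unlock();
                work();
                locker.lock();
                continue;
            }
            // Idle for the whole timeout (or asked to stop): let the thread die.
            m_running = false;
            return;
        }
    }

    const std::chrono::milliseconds m_idleTimeout;
    std::mutex m_lock;
    std::condition_variable m_condition;
    int m_pending { 0 };
    int m_startCount { 0 };
    bool m_running { false };
    bool m_stopping { false };
    std::thread m_thread;
};

class CountingThread : public AutomaticThreadSketch {
public:
    using AutomaticThreadSketch::AutomaticThreadSketch;
    ~CountingThread() { stop(); }
    int processed() const { return m_processed.load(); }

protected:
    void work() override { ++m_processed; }

private:
    std::atomic<int> m_processed { 0 };
};

inline bool threadIsRecreatedAfterIdling()
{
    CountingThread counter(std::chrono::milliseconds(20));
    counter.notify();
    std::this_thread::sleep_for(std::chrono::milliseconds(200)); // Long enough to time out.
    counter.notify(); // Needs a fresh native thread.
    counter.stop();
    return counter.startCount() == 2 && counter.processed() == 2;
}
```

Letting the native thread exit while idle is what allows per-thread resources, such as an allocator's TLS cache, to be released.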
ParallelHelperPool is an interesting thread pool that is intended to be shared by multiple parallel tasks.
Suppose we have a task that can be executed in parallel, e.g. GC’s parallel marking.
We set this task on the pool, and the pool runs it in parallel.
As noted in the code, this abstraction is suitable for the case where there are multiple concurrent tasks that may all want parallelism.
Since its threads are managed by AutomaticThread, they are automatically destroyed if the pool is not used.
Currently it is used for GC’s parallel marking (and recently, these threads also perform parallel marking constraint solving).
ThreadMessage is a feature that executes a lambda while a specified thread is suspended.
It is built on top of suspend and resume.
Since suspend and resume are portable, this ThreadMessage feature is also portable.
By using sendMessage, we can modify some data while a thread is suspended.
In WebKit, it is used to insert a trap into a running VM (called a VM trap). One example is terminating a running VM without introducing runtime cost. You may have seen a dialog like “Your JS takes too much time. Do you want to stop it?”.
In JSC, we have the check_trap bytecode. In the optimizing compiler, it just emits a nop.
When we would like to terminate a running VM, we sendMessage to the thread running this VM.
While the thread is suspended, we rewrite the JIT-generated nop to hlt on x86, and then resume the thread.
When the resumed thread hits this hlt, it causes a fault signal, and we handle this fault in our signal handler.
In the signal handler, we throw an uncatchable JS exception for VM termination, and the VM is terminated.
Since we just execute a nop in the usual optimizing JIT code, this feature introduces no runtime cost.
WTF::ThreadSpecific<> offers a portable abstraction of TLS on POSIX and Windows.
TLS is storage for per-thread data. It helps achieve high performance in multi-threaded environments since accessing data in TLS does not require synchronization.
It uses pthread_getspecific and pthread_setspecific in POSIX environments.
On Windows, we use fiber local storage (FLS) under the hood.
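On the POSIX side, such a wrapper can be sketched with the pthread key API. This is not the WTF::ThreadSpecific implementation: ThreadSpecificSketch and valuesArePerThread are hypothetical names, the key is never deleted, and the value is created lazily on first access.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>

template<typename T>
class ThreadSpecificSketch {
public:
    ThreadSpecificSketch()
    {
        // The destructor callback runs when a thread that set a value exits.
        pthread_key_create(&m_key, destroy);
    }

    T& operator*()
    {
        void* pointer = pthread_getspecific(m_key);
        if (!pointer) {
            pointer = new T(); // First access on this thread: value-initialize.
            pthread_setspecific(m_key, pointer);
        }
        return *static_cast<T*>(pointer);
    }

private:
    static void destroy(void* pointer) { delete static_cast<T*>(pointer); }

    pthread_key_t m_key;
};

inline bool valuesArePerThread()
{
    static ThreadSpecificSketch<int> slot;
    *slot = 1;

    int seenByOther = -1;
    std::thread other([&] {
        seenByOther = *slot; // This thread gets its own, fresh value.
        *slot = 5;           // Does not affect the first thread's value.
    });
    other.join();

    return seenByOther == 0 && *slot == 1;
}
```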
One interesting feature of TLS is FAST_TLS on Darwin.
It consists of platform-provided, system-reserved TLS slots.
You can check them in the system library header.
These slots are intended to be used by system libraries.
For these slots, you can use _pthread_getspecific_direct(KEY), which is compiled to code that accesses memory through a segment register and an offset, so it is quite fast.
It is a nice example of co-designing the platform and system libraries.
WebKit has the WTF utility library, which offers various fancy threading primitives.
As we encourage more parallelism in WebKit, we will add more features to WTF.
[1] This stop-the-world functionality is also used to implement sampling profilers in JSC. One sampling profiler thread periodically stops the JS VM thread, retrieves execution context data including stack traces, and resumes the thread.