Using Multiple IOThreads
This document explains the IOThread feature and how to write code that runs outside the BQL (Big QEMU Lock).
The main loop and IOThreads
QEMU is an event-driven program that can do several things at once using an event loop. The VNC server and the QMP monitor are both processed from the same event loop, which monitors their file descriptors until they become readable and then invokes a callback.
The default event loop is called the main loop (see main-loop.c). It is possible to create additional event loop threads using -object iothread,id=my-iothread.
Side note: The main loop and IOThread are both event loops but their code is not shared completely. Sometimes it is useful to remember that although they are conceptually similar they are currently not interchangeable.
Why IOThreads are useful
IOThreads allow the user to control the placement of work. The main loop is a scalability bottleneck on hosts with many CPUs. Work can be spread across several IOThreads instead of just one main loop. When set up correctly this can improve I/O latency and reduce jitter seen by the guest.
The main loop is also deeply associated with the BQL, which is a scalability bottleneck in itself. vCPU threads and the main loop use the BQL to serialize execution of QEMU code. This mutex is necessary because a lot of QEMU’s code historically was not thread-safe.
Because all I/O processing is done in a single main loop, and because the BQL is contended by all vCPU threads and the main loop, it is desirable to place work into IOThreads.
The experimental virtio-blk data-plane implementation has been benchmarked and shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
How to program for IOThreads
The main difference between legacy code and new code that can run in an IOThread is dealing explicitly with the event loop object, AioContext (see include/block/aio.h). Code that only works in the main loop implicitly uses the main loop’s AioContext. Code that supports running in IOThreads must be aware of its AioContext.
AioContext supports the following services:
- File descriptor monitoring (read/write/error on POSIX hosts)
- Event notifiers (inter-thread signalling)
- Timers
- Bottom Halves (BH) deferred callbacks
There are several old APIs that use the main loop AioContext:
- LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
- LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
- LEGACY timer_new_ms() - create a timer
- LEGACY qemu_bh_new() - create a BH
- LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
- LEGACY qemu_aio_wait() - run an event loop iteration
Since they implicitly work on the main loop they cannot be used in code that runs in an IOThread. They might cause a crash or deadlock if called from an IOThread since the BQL is not held.
Instead, use the AioContext functions directly (see include/block/aio.h):
- aio_set_fd_handler() - monitor a file descriptor
- aio_set_event_notifier() - monitor an event notifier
- aio_timer_new() - create a timer
- aio_bh_new() - create a BH
- aio_bh_new_guarded() - create a BH with a device re-entrancy guard
- aio_poll() - run an event loop iteration
The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a MemReentrancyGuard argument, which is used to check for and prevent re-entrancy problems. For BHs associated with devices, the re-entrancy guard is contained in the corresponding DeviceState and named mem_reentrancy_guard.
The AioContext can be obtained from the IOThread using iothread_get_aio_context() or for the main loop using qemu_get_aio_context(). Code that takes an AioContext argument works in both IOThreads and the main loop, depending on which AioContext instance the caller passes in.
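As an illustration of the paragraph above, the following is a minimal sketch of a helper that picks the AioContext for a device depending on whether an IOThread was configured. It is not code from the QEMU tree: the AioContext and IOThread definitions and the two getter functions are stand-in stubs that only mimic the call shape of the real ones, so that the sketch builds on its own, and my_device_get_aio_context() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in stubs (assumptions) so the sketch builds outside the QEMU tree.
 * The real AioContext lives in include/block/aio.h and is far more
 * elaborate than these toy structs.
 */
typedef struct AioContext { const char *name; } AioContext;
typedef struct IOThread { AioContext ctx; } IOThread;

static AioContext toy_main_ctx = { "main-loop" };

/* Stub with the same call shape as the real function */
static AioContext *qemu_get_aio_context(void)
{
    return &toy_main_ctx;
}

/* Stub with the same call shape as the real function */
static AioContext *iothread_get_aio_context(IOThread *iothread)
{
    return &iothread->ctx;
}

/*
 * Hypothetical device helper: use the IOThread's AioContext when one was
 * configured, otherwise fall back to the main loop's AioContext.
 */
static AioContext *my_device_get_aio_context(IOThread *iothread)
{
    return iothread ? iothread_get_aio_context(iothread)
                    : qemu_get_aio_context();
}
```

Code that receives the returned pointer and passes it on to the aio_*() functions does not need to know which of the two event loops it ends up running in.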
How to synchronize with an IOThread
Variables that can be accessed by multiple threads require some form of synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(), aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger activity in an IOThread.
Side note: the best way to schedule a function call across threads is to call aio_bh_schedule_oneshot().
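The call pattern for scheduling a function in another event loop can be sketched as follows. This is a toy model under stated assumptions, not QEMU's implementation: AioContext here is a stand-in struct holding a lock-free list of pending callbacks, aio_bh_schedule_oneshot() is a stub with the same argument order as the real call, and toy_run_pending() is a made-up replacement for one event loop iteration. Counter and the counter_*() functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Same callback type shape as QEMU's BH callbacks */
typedef void QEMUBHFunc(void *opaque);

/* Toy stand-ins (assumptions); the real AioContext is far more elaborate */
typedef struct ToyBH {
    QEMUBHFunc *cb;
    void *opaque;
    struct ToyBH *next;
} ToyBH;

typedef struct AioContext {
    _Atomic(ToyBH *) pending;   /* list of callbacks waiting to run */
} AioContext;

static void toy_aio_context_init(AioContext *ctx)
{
    atomic_init(&ctx->pending, (ToyBH *)NULL);
}

/* Stub with the real call shape: queue cb(opaque) to run once in ctx */
static void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb,
                                    void *opaque)
{
    ToyBH *bh = malloc(sizeof(*bh));

    assert(bh != NULL);
    bh->cb = cb;
    bh->opaque = opaque;
    bh->next = atomic_load(&ctx->pending);
    /* Atomic push; on failure bh->next is refreshed with the current head */
    while (!atomic_compare_exchange_weak(&ctx->pending, &bh->next, bh)) {
    }
}

/* Toy event loop iteration: run and free all pending callbacks */
static int toy_run_pending(AioContext *ctx)
{
    ToyBH *bh = atomic_exchange(&ctx->pending, (ToyBH *)NULL);
    int n = 0;

    while (bh) {
        ToyBH *next = bh->next;
        bh->cb(bh->opaque);
        free(bh);
        bh = next;
        n++;
    }
    return n;
}

/* Hypothetical state that is only touched from the event loop thread */
typedef struct Counter { int value; } Counter;

static void counter_inc_bh(void *opaque)
{
    Counter *c = opaque;
    c->value++;
}

/* May be called from any thread: defers the update to ctx's event loop */
static void counter_request_inc(AioContext *ctx, Counter *c)
{
    aio_bh_schedule_oneshot(ctx, counter_inc_bh, c);
}
```

The design point is that the caller never modifies Counter itself. It only hands the work to the thread that runs the target AioContext, so the state needs no additional locking.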
The main loop thread can wait synchronously for a condition using AIO_WAIT_WHILE().
AioContext and the block layer
The AioContext originates from the QEMU block layer, even though nowadays AioContext is a generic event loop that can be used by any QEMU subsystem.
The block layer has support for AioContext integrated. Each BlockDriverState is associated with an AioContext using bdrv_try_change_aio_context() and bdrv_get_aio_context(). This allows block layer code to process I/O inside the right AioContext. Other subsystems may wish to follow a similar approach.
Block layer code must therefore expect to run in an IOThread and avoid using old APIs that implicitly use the main loop. See How to program for IOThreads for information on how to do that.
Code running in the monitor typically needs to ensure that past requests from the guest are completed. When a block device is running in an IOThread, the IOThread can also process requests from the guest (via ioeventfd). To ensure that earlier requests have completed and that no new ones are processed while the monitor code runs, wrap the code between bdrv_drained_begin() and bdrv_drained_end(), thus creating a “drained section”.
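The shape of a drained section can be sketched as follows. This is only a toy model: BlockDriverState is a stand-in struct, and the two bdrv_drained_*() functions are stubs with the same call shape as the real ones that merely pretend to wait for in-flight requests. my_monitor_command() is a hypothetical handler name.

```c
#include <assert.h>

/* Toy stand-in (assumption); the real BlockDriverState is much larger */
typedef struct BlockDriverState {
    int quiesce_counter;  /* > 0 while inside a drained section */
    int in_flight;        /* requests currently being processed */
} BlockDriverState;

/* Stub: the real function waits until in-flight requests have completed */
static void bdrv_drained_begin(BlockDriverState *bs)
{
    bs->quiesce_counter++;
    bs->in_flight = 0;    /* toy: pretend the wait has finished them */
}

/* Stub: the real function lets request processing resume */
static void bdrv_drained_end(BlockDriverState *bs)
{
    assert(bs->quiesce_counter > 0);
    bs->quiesce_counter--;
}

/*
 * Hypothetical monitor command handler. Returns the number of requests
 * that were in flight while the critical code ran (expected to be 0).
 */
static int my_monitor_command(BlockDriverState *bs)
{
    int seen_in_flight;

    bdrv_drained_begin(bs);
    /* Drained section: past requests are done, no new ones are processed */
    seen_in_flight = bs->in_flight;
    bdrv_drained_end(bs);

    return seen_in_flight;
}
```

Every bdrv_drained_begin() needs a matching bdrv_drained_end() on all paths out of the function, including error paths.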
Long-running jobs (usually in the form of coroutines) are often scheduled in the BlockDriverState’s AioContext. The functions bdrv_add/remove_aio_context_notifier, or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to get a notification whenever bdrv_try_change_aio_context() moves a BlockDriverState to a different AioContext.
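A minimal sketch of how a job might track such moves is shown below. It rests on several assumptions: AioContext and BlockDriverState are toy structs, bdrv_add_aio_context_notifier() is a stub that mimics the argument order of the real function but stores only a single notifier, and toy_change_context() is a made-up helper that stands in for the effect of bdrv_try_change_aio_context(). MyJob and the my_job_*() functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (assumptions) so the sketch builds outside the QEMU tree */
typedef struct AioContext { int id; } AioContext;

typedef struct BlockDriverState {
    AioContext *ctx;
    void (*attached)(AioContext *new_context, void *opaque);
    void (*detach)(void *opaque);
    void *opaque;
} BlockDriverState;

/* Stub with the real argument order; this toy keeps only one notifier */
static void bdrv_add_aio_context_notifier(
    BlockDriverState *bs,
    void (*attached_aio_context)(AioContext *new_context, void *opaque),
    void (*detach_aio_context)(void *opaque),
    void *opaque)
{
    bs->attached = attached_aio_context;
    bs->detach = detach_aio_context;
    bs->opaque = opaque;
}

/* Made-up helper modelling a move of bs to a different AioContext */
static void toy_change_context(BlockDriverState *bs, AioContext *new_ctx)
{
    if (bs->detach) {
        bs->detach(bs->opaque);            /* old context is going away */
    }
    bs->ctx = new_ctx;
    if (bs->attached) {
        bs->attached(new_ctx, bs->opaque); /* new context is in place */
    }
}

/* Hypothetical long-running job that remembers where it should run */
typedef struct MyJob {
    AioContext *ctx;
    int paused;
} MyJob;

static void my_job_detach(void *opaque)
{
    MyJob *job = opaque;
    job->paused = 1;       /* stop using the old AioContext */
    job->ctx = NULL;
}

static void my_job_attached(AioContext *new_context, void *opaque)
{
    MyJob *job = opaque;
    job->ctx = new_context; /* continue in the new AioContext */
    job->paused = 0;
}

static void my_job_start(MyJob *job, BlockDriverState *bs)
{
    job->ctx = bs->ctx;
    job->paused = 0;
    bdrv_add_aio_context_notifier(bs, my_job_attached, my_job_detach, job);
}
```

In real code the notifier would also be removed again with the matching remove function before the job's state is freed.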