Multithreading support in memcached


OVERVIEW

By default, memcached is compiled as a single-threaded application. This is
the most CPU-efficient mode of operation, and it is appropriate for memcached
instances that run on single-processor servers or whose request volume is
low enough that available CPU power is not a bottleneck.

More heavily-used memcached instances can benefit from multithreaded mode.
To enable it, pass the "--enable-threads" option to the configure script:

./configure --enable-threads

You must have the POSIX thread functions (pthread_*) available on your
system in order to use memcached's multithreaded mode.

Once you have a thread-capable memcached executable, you can control the
number of threads with the "-t" option; the default is 4. On a machine
dedicated to memcached, you will typically want one thread per processor
core. Due to memcached's nonblocking architecture, there is no real
advantage to using more threads than the number of CPUs on the machine;
doing so will increase lock contention and is likely to degrade performance.
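
For example, on a four-core server dedicated to memcached, you might start
a daemonized instance with a 1024 MB memory limit and four worker threads
("-d" and "-m" are the standard daemonize and memory-limit options):

./memcached -d -m 1024 -t 4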


INTERNALS

The threading support is mostly implemented as a series of wrapper functions
that protect calls to underlying code with one of a small number of locks.
In single-threaded mode, the wrappers are replaced with direct invocations
of the target code using #define; that is done in memcached.h. This approach
allows memcached to be compiled in either single- or multi-threaded mode.
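
In sketch form, the pattern looks something like this (the names below are
illustrative, not the actual memcached symbols):

  #include <pthread.h>
  #include <stddef.h>

  /* One of the small number of global locks. */
  static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

  /* The underlying, lock-unaware implementation. */
  void *do_assoc_find(const char *key, size_t nkey);

  /* Multithreaded build: the wrapper serializes access. */
  void *mt_assoc_find(const char *key, size_t nkey) {
      pthread_mutex_lock(&cache_lock);
      void *it = do_assoc_find(key, nkey);
      pthread_mutex_unlock(&cache_lock);
      return it;
  }

  /* Single-threaded build: memcached.h maps the call straight
     through, e.g. "#define assoc_find(k, n) do_assoc_find(k, n)". */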

Each thread has its own instance of libevent (a "base" in libevent
terminology). The only direct interaction between threads is for new
connections. One of the threads handles the TCP listen socket; each new
connection is passed to a different thread on a round-robin basis. After
that, each thread operates on its set of connections as if it were running
in single-threaded mode, using libevent to manage nonblocking I/O as usual.
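
A minimal sketch of that hand-off, assuming the listener passes each new
file descriptor to a worker over a per-thread notification pipe (the pipe
and the fixed thread count here are assumptions for illustration, not the
exact memcached bookkeeping):

  #include <event.h>
  #include <stdio.h>
  #include <unistd.h>

  #define NUM_THREADS 4

  typedef struct {
      struct event_base *base;  /* this thread's own libevent instance */
      int notify_send_fd;       /* write end of the thread's wakeup pipe */
  } worker_thread;

  static worker_thread threads[NUM_THREADS];
  static int last_thread = 0;

  /* Runs on the listener thread for each accepted connection: pick the
     next worker round-robin and hand it the new socket. Since all the
     threads share one fd table, sending the integer fd is enough. */
  void dispatch_conn(int new_fd) {
      worker_thread *t = &threads[last_thread];
      last_thread = (last_thread + 1) % NUM_THREADS;
      if (write(t->notify_send_fd, &new_fd, sizeof(new_fd)) < 0)
          perror("dispatch_conn");
  }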

UDP requests are a bit different, since there is only one UDP socket, shared
by all clients. That socket is monitored by all of the threads. When a
datagram comes in, every thread that isn't already processing another
request will receive a "socket readable" callback from libevent. Only one
thread will successfully read the request; the others will go back to sleep
or, in the case of a very busy server, will read whatever other UDP requests
are waiting in the socket buffer. Note that on moderately busy servers this
results in increased CPU consumption, since threads will constantly wake up
and find no input waiting for them. But short of major surgery on the I/O
code, this is not easy to avoid.
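
Each thread's callback might look roughly like the following sketch, which
assumes the shared socket has been made nonblocking; process_udp_request()
is a hypothetical handler standing in for the real request path:

  #include <errno.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  void process_udp_request(const char *buf, ssize_t len);

  /* Registered with libevent by every worker thread on the one shared
     UDP socket. */
  void udp_read_cb(int fd, short which, void *arg) {
      char buf[1500];
      for (;;) {
          ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
          if (n < 0) {
              if (errno == EAGAIN || errno == EWOULDBLOCK) {
                  /* Another thread drained the socket first; this
                     wakeup was spurious, so go back to sleep. */
                  return;
              }
              return;  /* real error: also just bail out in this sketch */
          }
          process_udp_request(buf, n);
      }
  }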


TO DO

The locking is currently very coarse-grained. There is, for example, one
lock that protects all the calls to the hashtable-related functions. Since
memcached spends much of its CPU time on command parsing and response
assembly, rather than on managing the hashtable per se, this is not a huge
bottleneck for small numbers of processors. However, the locking will likely
have to be refined if memcached ever needs to run well on massively
parallel machines.

One cheap optimization to reduce contention on that lock: move the hash
computation so that it happens before the lock is acquired whenever
possible. Right now the hash is computed at the lowest levels of the
functions in assoc.c. If it were instead computed in memcached.c, then
passed along with the key and length into the items.c code and down into
assoc.c, each thread would hold the hashtable lock for less time.
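
In sketch form (do_assoc_find_hv is a hypothetical variant of the assoc.c
lookup that accepts a precomputed hash value):

  #include <pthread.h>
  #include <stddef.h>
  #include <stdint.h>

  extern pthread_mutex_t cache_lock;
  uint32_t hash(const char *key, size_t nkey);
  void *do_assoc_find_hv(const char *key, size_t nkey, uint32_t hv);

  void *assoc_find(const char *key, size_t nkey) {
      /* Hash the key before taking the lock; this part needs no
         synchronization, so it no longer adds to the lock hold time. */
      uint32_t hv = hash(key, nkey);
      pthread_mutex_lock(&cache_lock);
      void *it = do_assoc_find_hv(key, nkey, hv);
      pthread_mutex_unlock(&cache_lock);
      return it;
  }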