[wsf-c-dev] Memory management
James Clark
james at wso2.com
Tue Dec 12 07:55:19 PST 2006
At the moment all our memory management is low-level using functions
with malloc/free semantics. The main problem with this is that it's
inconvenient and difficult to ensure that memory always gets cleaned up.
It's especially difficult to ensure that memory gets cleaned up reliably
on all error paths: it would be a huge amount of work to build a set of
tests that gave coverage of all of this.
I can see two kinds of solution:
a) C++ destructors (and overloading operator= etc)
b) Apache-style pools
With (a), the main advantage is that it provides a very natural,
easy-to-use way of managing memory. On the other hand it has some
disadvantages:
- it forces the use of C++
- it may not be very convenient for people using the API from C
- it means we can't use exceptions, not even if we roll our own using
setjmp/longjmp, because getting the destructors called when an exception
is thrown needs runtime support which will cause us problems as
discussed in my previous message on C++
- we would end up using reference counting (managed using constructors
and destructors) to handle shared data structures. Heavy use of
reference counting is not so good from a performance point of view
especially in an SMP situation. Most CPU architectures support atomic
instructions that you can use to modify the reference counts safely
without using an operating system mutex, but it's still relatively slow.
It will be particularly bad when we have multiple threads accessing
shared data (such as configuration data) read-only. Modifying reference
counts will mean that you can end up with the same cache-line being
modified simultaneously on different cores, which slows down the cache a
lot.
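
To make the reference-counting cost concrete, here is a minimal sketch
in C11 of an atomically reference-counted shared object. The names
(config, config_retain, config_release) are hypothetical and are not
Axis2/C APIs. The point it illustrates is that a reader which takes its
own reference writes to the shared counter even though it only reads
the payload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared, read-mostly object; not an Axis2/C type. */
typedef struct config {
    atomic_int refs;        /* shared counter: written on every retain/release */
    int        timeout_ms;  /* the read-only payload */
} config;

config *config_create(int timeout_ms)
{
    config *c = malloc(sizeof *c);
    if (c != NULL) {
        atomic_init(&c->refs, 1);   /* the creator holds the first reference */
        c->timeout_ms = timeout_ms;
    }
    return c;
}

/* Take an extra reference. This is a write to shared memory. */
config *config_retain(config *c)
{
    assert(c != NULL);
    atomic_fetch_add_explicit(&c->refs, 1, memory_order_relaxed);
    return c;
}

/* Drop a reference; returns 1 if this call freed the object, else 0. */
int config_release(config *c)
{
    assert(c != NULL);
    if (atomic_fetch_sub_explicit(&c->refs, 1, memory_order_acq_rel) == 1) {
        free(c);
        return 1;
    }
    return 0;
}
```

No mutex is involved, but every config_retain and config_release pair
still modifies the cache line that holds refs, which is the contention
described above.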
So on the whole I think (a) is not a good approach, which brings us to
(b). As you probably know, the idea of Apache pools is that you
allocate memory from a pool, and instead of individually freeing
each allocated chunk of memory, you free the pool, which frees all
chunks of memory allocated using the pool. Note that it's still
possible to use C++ with this: you use an overloaded operator new with a
pool argument (new (pool) Foo), and an empty operator delete. This is
not just an Apache thing, it's a well-known, widely used technique, with
a significant amount of CS literature about it (it's called "regions"
rather than "pools" in the literature).
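
As a rough illustration of the pool idea, the sketch below implements
the smallest possible pool in C. It is not the APR implementation, and
the names (pool, pool_create, pool_alloc, pool_destroy) are
hypothetical. For simplicity it calls malloc once per chunk and links
the chunks together; a production pool would normally carve
allocations out of larger blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header in front of each chunk. The union keeps the payload that
   follows it suitably aligned for any object type (C11 max_align_t). */
typedef union chunk {
    union chunk *next;
    max_align_t  align;
} chunk;

typedef struct pool {
    chunk *chunks;   /* every chunk allocated from this pool */
    size_t live;     /* number of chunks, kept only for illustration */
} pool;

pool *pool_create(void)
{
    pool *p = malloc(sizeof *p);
    if (p != NULL) {
        p->chunks = NULL;
        p->live = 0;
    }
    return p;
}

void *pool_alloc(pool *p, size_t size)
{
    assert(p != NULL);
    if (size > SIZE_MAX - sizeof(chunk))
        return NULL;                      /* request would overflow */
    chunk *c = malloc(sizeof(chunk) + size);
    if (c == NULL)
        return NULL;
    c->next = p->chunks;                  /* remember it for pool_destroy */
    p->chunks = c;
    p->live++;
    return c + 1;                         /* payload starts after the header */
}

/* Frees every chunk and the pool itself; returns the number of chunks freed. */
size_t pool_destroy(pool *p)
{
    assert(p != NULL);
    size_t freed = 0;
    while (p->chunks != NULL) {
        chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
        freed++;
    }
    free(p);
    return freed;
}
```

Callers never free individual chunks. An error path only has to make
sure pool_destroy is eventually called, which is what makes cleanup
tractable.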
One major issue with (b) is how to make it possible to process large
messages without using memory proportional to the size of the message.
I can think of two solutions:
(i) Use sub-pools. For any pool, you can create a sub-pool and allocate
from that. A sub-pool can be freed explicitly, but if it's not, then it
gets freed along with its parent pool. You then process a large message
by dividing it into chunks that can be processed separately, and
processing each such chunk in a newly allocated sub-pool, which you free
when you are done with that chunk.
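
The sub-pool scheme in (i) can be sketched as follows. This is an
illustrative sketch only: the names (subpool, subpool_create,
subpool_alloc, subpool_destroy, process_message) are hypothetical, and
the per-part "work" is a placeholder memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef union sp_chunk {
    union sp_chunk *next;
    max_align_t     align;         /* keeps the payload aligned */
} sp_chunk;

typedef struct subpool {
    struct subpool *parent;        /* NULL for a root pool */
    struct subpool *first_child;   /* sub-pools that have not been freed yet */
    struct subpool *next_sibling;
    sp_chunk       *chunks;
} subpool;

/* Create a root pool (parent == NULL) or a sub-pool of parent. */
subpool *subpool_create(subpool *parent)
{
    subpool *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->parent = parent;
    p->first_child = NULL;
    p->chunks = NULL;
    p->next_sibling = parent ? parent->first_child : NULL;
    if (parent != NULL)
        parent->first_child = p;
    return p;
}

void *subpool_alloc(subpool *p, size_t size)
{
    assert(p != NULL);
    if (size > (size_t)-1 - sizeof(sp_chunk))
        return NULL;
    sp_chunk *c = malloc(sizeof(sp_chunk) + size);
    if (c == NULL)
        return NULL;
    c->next = p->chunks;
    p->chunks = c;
    return c + 1;
}

/* Free a pool, all of its chunks, and any sub-pools still attached to it. */
void subpool_destroy(subpool *p)
{
    assert(p != NULL);
    while (p->first_child != NULL)
        subpool_destroy(p->first_child);   /* each call unlinks that child */
    while (p->chunks != NULL) {
        sp_chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
    }
    if (p->parent != NULL) {               /* unlink from the parent's list */
        subpool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->next_sibling;
        *link = p->next_sibling;
    }
    free(p);
}

/* Process a message part by part, using one short-lived sub-pool per part.
   Returns the number of parts processed. */
size_t process_message(subpool *fiber_pool, size_t n_parts, size_t part_size)
{
    size_t done = 0;
    for (size_t i = 0; i < n_parts; i++) {
        subpool *sub = subpool_create(fiber_pool);
        if (sub == NULL)
            break;
        void *scratch = subpool_alloc(sub, part_size);
        if (scratch != NULL) {
            memset(scratch, 0, part_size); /* placeholder for real work */
            done++;
        }
        subpool_destroy(sub);              /* memory for this part is released here */
    }
    return done;
}
```

Because each part's sub-pool is destroyed before the next one is
created, peak memory depends on the size of a part rather than the
size of the whole message. A sub-pool that is forgotten is still
reclaimed when its parent is destroyed.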
(ii) Extend the pool interface to allow freeing individual chunks in the
pool in addition to freeing the entire pool. There's a good paper on
this approach here:
http://www.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf
It's quite a persuasive paper, but I'm not convinced. In particular, it
doesn't evaluate the relative merits of this approach compared to (i)
(which is really doing a very similar thing).
On the whole I'm inclined to go for (i), because it's a lot easier to
implement efficiently. If we find we need it, we can later extend the
implementation to support an individual free operation.
The next question is what do we associate pools with. The key
requirement is that a pool has to be associated with something that has
an inherent lifetime, so that when that thing's lifetime is over, the
pool can automatically be freed. I think the main thing to associate a
pool with is the fiber (logical thread of control), which I discussed in
my message about Concurrency. Typically such a fiber would be started
by the first message in a MEP, and the fiber would end when the MEP was
over. So it's fairly similar to having the fiber be associated with the
operation. One difference is in the middle-tier scenario. If in the
course of responding to a request, a server needs to invoke other web
services, then those invocations would be part of the same fiber and
thus use the same pool, which would allow information to be copied
efficiently from the incoming response from the backend to the outgoing
response to the client.
The final question is how do pools relate to the environment object
currently used in Axis. At first I thought it would just be a matter of
putting a pointer to a pool in the environment in place of the current
allocator object. Unfortunately I don't think that is going to work
well, when you have multiple pools involved. Let's suppose
- there's an object S allocated from the pool associated with the
service context
- the current environment's pool is the fiber's pool
- we want to call a method foo on S to, say, set a property
S.foo() may need to allocate memory (for example to grow a hash-table).
That memory needs to be allocated from the service context pool. If we
just pass the current environment to S.foo() and S passes that on to the
hash-table add function, then the memory would end up getting allocated
from the fiber's pool, leading to memory corruption. It doesn't work
for S to store an environment when it's created and then pass that to
the hash-table addition function, because the error handling stuff (in
particular the pointer to the jmp_buf that we would longjmp to on an
exception), needs to come from the current environment. What would need
to happen is that when S is created, it would have to extract the pool
from the environment and store it. Then when S.foo() is called, it
would have to create a new environment, which is the same as the current
environment but with the pool replaced by the stored pool, and then pass
this new environment to the hash-table function, and then release this
new environment before it returns. Blech! Overall I think the approach
of putting the pool in the environment would be awkward and bug-prone.
The pool needs to be separate from the environment because the
environment is determined by the thread but the pool isn't. Also every
function needs an env, but not every function needs a pool. For example,
in the above case, S would get a pool when it was created which it
should store (or which the hash-table would store). S.foo() wouldn't be
passed a pool, so there would be only one pool around (the one that S
stored on creation) and thus there would be little danger of foo() using
the wrong pool to grow its hash-table.
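
The "store the pool at creation" pattern might look like the sketch
below. Everything here is hypothetical: fixed_pool is a toy
fixed-size bump allocator standing in for a real pool, service stands
in for S, and a growable array of property values stands in for the
hash-table. What matters is that service_add_property (the foo of the
example) takes no pool argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy fixed-size bump allocator, standing in for a real pool. */
typedef struct fixed_pool {
    _Alignas(max_align_t) unsigned char buf[4096];
    size_t used;
} fixed_pool;

void fixed_pool_init(fixed_pool *p) { p->used = 0; }

void *fixed_pool_alloc(fixed_pool *p, size_t size)
{
    const size_t a = _Alignof(max_align_t);
    assert(p != NULL);
    if (size > sizeof p->buf)
        return NULL;
    size_t rounded = (size + a - 1) / a * a;   /* keep the next chunk aligned */
    if (rounded > sizeof p->buf - p->used)
        return NULL;
    void *mem = p->buf + p->used;
    p->used += rounded;
    return mem;
}

/* Stand-in for S: it remembers the pool it was created from. */
typedef struct service {
    fixed_pool  *pool;    /* stored at creation, used for all later growth */
    const char **props;
    size_t       len, cap;
} service;

service *service_create(fixed_pool *pool)
{
    service *s = fixed_pool_alloc(pool, sizeof *s);
    if (s != NULL) {
        s->pool = pool;
        s->props = NULL;
        s->len = s->cap = 0;
    }
    return s;
}

/* No pool parameter: growth always comes from the pool stored in s.
   Returns 0 on success, -1 if the pool is exhausted. */
int service_add_property(service *s, const char *value)
{
    assert(s != NULL);
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        const char **grown = fixed_pool_alloc(s->pool, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        if (s->len > 0)
            memcpy(grown, s->props, s->len * sizeof *grown);
        s->props = grown;   /* the old array is reclaimed with the pool */
        s->cap = new_cap;
    }
    s->props[s->len++] = value;
    return 0;
}
```

Since the caller has no way to hand a different pool to
service_add_property, the fiber's pool cannot be used by accident to
grow a structure that belongs to the service context.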
One approach is for every function to have an env argument, and for any
function that needs to allocate also to have a pool argument. Having
two boilerplate arguments for many functions would be a bit much. I
think a better approach would be to pass the pool as an argument where
it's needed, but to put the env in thread-local storage. Thread-local
storage has some disadvantages:
- it may be much slower than accessing memory from an argument; in our
case, this doesn't matter because the only bit of the environment which
needs to be accessed fast is the allocator, and that is now moved into
the pool
- it's not portable, in the sense that it requires OS-support beyond
what ANSI C provides. However, as a practical matter, all major
operating systems do support thread-local storage. For example, POSIX
has pthread_key_create, Windows has TlsAlloc. In the worst case, you
can have a hash-table keyed to the current thread id. And in the
single-threaded case, thread-local storage can be implemented as
static storage.
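
On POSIX, an encapsulated thread-local env could be sketched with
pthread_key_create as below. The thread_env type, its error_code field,
and the env_current function are assumptions made for illustration;
they are not the existing Axis2/C environment API. A Windows build
would need an equivalent built on TlsAlloc.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread environment; the real one would carry error
   state, logging, and so on, but no allocator. */
typedef struct thread_env {
    int error_code;
} thread_env;

static pthread_key_t  env_key;
static pthread_once_t env_once = PTHREAD_ONCE_INIT;

/* Destructor: called by the pthreads runtime when a thread exits with a
   non-NULL value stored under env_key. */
static void env_free(void *p)
{
    free(p);
}

static void env_make_key(void)
{
    if (pthread_key_create(&env_key, env_free) != 0)
        abort();   /* cannot continue without the key */
}

/* Return the calling thread's environment, creating it on first use.
   Returns NULL if memory could not be allocated. */
thread_env *env_current(void)
{
    pthread_once(&env_once, env_make_key);
    thread_env *e = pthread_getspecific(env_key);
    if (e == NULL) {
        e = calloc(1, sizeof *e);
        if (e != NULL && pthread_setspecific(env_key, e) != 0) {
            free(e);
            e = NULL;
        }
    }
    return e;
}
```

All use of the key is hidden behind env_current, so callers never see
the platform-specific mechanism, and functions no longer need an env
parameter.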
Thus so long as use of thread-local storage is properly encapsulated and
isn't used in a performance critical way, I don't think it will be a
problem. On the contrary, it will make the code more pleasant to work
with, and will actually improve performance (because you won't waste
time passing the env around from function to function).
The downside is that it would be more work to convert the current Axis2
code-base over to such a scheme. But I think it's doable:
- the first stage would be to globally replace the env argument by a
pool argument, and put the env in thread-local storage; this would be a
big change but should be fairly mechanical
- the second stage is to convert objects that may not be allocated from
the per-fiber pool so that they store a pool when they are created
instead of always getting the pool as an argument. Initially this can be
done just for objects that live longer than the per-fiber pool (you've
probably done this already with the httpd pool patches). Then,
over time, we can convert other objects and introduce sub-pools to allow
for the processing of large messages.
James