This section will describe how a slab is structured and managed. The struct which describes it is much simpler than the cache descriptor, but how the slab is arranged is considerably more complex. We begin with the descriptor.
    typedef struct slab_s {
        struct list_head list;
        unsigned long colouroff;
        void *s_mem;
        unsigned int inuse;
        kmem_bufctl_t free;
    } slab_t;
The reader will note that, given the slab manager or an object within the slab, there does not appear to be an obvious way to determine what slab or cache they belong to. This is addressed by using the list field in the struct page that makes up the cache. SET_PAGE_CACHE() and SET_PAGE_SLAB() use the next and prev fields of the page list to track what cache and slab an object belongs to. To get the descriptors from the page, the macros GET_PAGE_CACHE() and GET_PAGE_SLAB() are available. This set of relationships is illustrated in Figure 8.6.
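The relationship between a page and its owners can be sketched in a few lines of C. This is a minimal illustration rather than the kernel source: the struct definitions are cut-down stand-ins, the helper tag_pages() is hypothetical, and the choice of next for the cache and prev for the slab is an assumption modelled on the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct list_head { struct list_head *next, *prev; };
struct page { struct list_head list; };

typedef struct kmem_cache_s { const char *name; } kmem_cache_t;
typedef struct slab_s { unsigned int inuse; } slab_t;

/* The page's list pointers are reused as back-pointers.
 * Assumption: next holds the cache and prev holds the slab. */
#define SET_PAGE_CACHE(pg, x) ((pg)->list.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg)    ((kmem_cache_t *)(pg)->list.next)
#define SET_PAGE_SLAB(pg, x)  ((pg)->list.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg)     ((slab_t *)(pg)->list.prev)

/* Hypothetical helper: tag every page backing a slab with its owners. */
static void tag_pages(struct page *pages, size_t npages,
                      kmem_cache_t *cachep, slab_t *slabp)
{
    assert(pages != NULL);
    for (size_t i = 0; i < npages; i++) {
        SET_PAGE_CACHE(&pages[i], cachep);
        SET_PAGE_SLAB(&pages[i], slabp);
    }
}
```

Once the pages are tagged, any object address that can be mapped to its struct page can be traced back to its slab and cache without storing a pointer in the object itself.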
The last issue is where the slab management struct is kept. Slab managers are kept either on-slab or off-slab (CFLGS_OFF_SLAB set in the static flags). Where they are placed is determined by the size of the object during cache creation.
If the objects are larger than a threshold (512 bytes on x86), CFLGS_OFF_SLAB is set in the cache flags and the slab descriptor, or manager, is kept off-slab in one of the sizes caches (see Section 8.4). A sizes cache that is large enough to contain the struct is selected and kmem_cache_slabmgmt() allocates from it as necessary. This limits the number of objects that can be stored on the slab because there is limited space for the bufctls, but that is unimportant as the objects are large and there should not be many in the slab.
Alternatively, the slab manager is kept at the beginning of the slab. When stored on-slab, enough space is kept at the beginning of the slab to store both the slab_t and the kmem_bufctl_t array. The array is responsible for tracking where the next free object is stored and is discussed later in the chapter. The objects are stored after the kmem_bufctl_t array.
Figure 8.7 should help clarify what a slab with the descriptor on-slab looks like and Figure 8.8 illustrates how a cache uses a sizes cache to store the slab descriptor when the descriptor is kept off-slab.
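The placement decision and the space the management data needs can be sketched as follows. This is a rough illustration under stated assumptions: PAGE_SIZE is taken to be 4096 bytes, the threshold is treated as inclusive, and both function names are hypothetical rather than kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL                    /* assumed: x86 page size */
#define OFF_SLAB_THRESHOLD (PAGE_SIZE >> 3) /* 512 bytes, the threshold quoted in the text */

typedef unsigned int kmem_bufctl_t;

/* Nonzero if the descriptor should be kept off-slab.
 * Assumption: an object of exactly the threshold size counts as large. */
static int descriptor_off_slab(size_t objsize)
{
    assert(objsize > 0);
    return objsize >= OFF_SLAB_THRESHOLD;
}

/* Bytes of management data: the descriptor plus one bufctl entry per object. */
static size_t mgmt_bytes(size_t desc_size, unsigned int num)
{
    return desc_size + num * sizeof(kmem_bufctl_t);
}
```

For an on-slab descriptor, mgmt_bytes() is the space reserved at the start of the slab before the first object. For an off-slab descriptor, it is the size the chosen sizes cache must be able to hold.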
At this point, we have seen how the cache is created, but on creation, it is an empty cache with empty lists for its slabs_full, slabs_partial and slabs_free. A cache is grown with the function kmem_cache_grow() when no objects are left in the slabs_partial list and there are no slabs in slabs_free. The tasks it fulfills are
The slab allocator needs a quick and simple way of tracking where free objects are on the partially filled slabs. It achieves this using a kmem_bufctl_t array that is associated with each slab manager, as it is up to the slab manager to know where its free objects are.
Historically, and according to the paper describing the slab allocator [bonwick94], kmem_bufctl_t was a linked list of objects. In Linux 2.2.x, this struct was a union of three items: a pointer to the next free object, a pointer to the slab manager and a pointer to the object. Which one it was depended on the state of the object.
Today, the slab and cache an object belongs to is determined by the struct page and the kmem_bufctl_t is simply an integer array of object indices. The number of elements in the array is the same as the number of objects on the slab.
    typedef unsigned int kmem_bufctl_t;
As the array is kept after the slab descriptor and there is no pointer to the first element directly, a helper macro slab_bufctl() is provided.
    #define slab_bufctl(slabp) \
        ((kmem_bufctl_t *)(((slab_t*)slabp)+1))
This seemingly cryptic macro is quite simple when broken down. The parameter slabp is a pointer to the slab manager. The block ((slab_t*)slabp)+1 casts slabp to a slab_t pointer and adds 1 to it. Because of pointer arithmetic, this advances by the size of one slab_t and gives a slab_t * pointer to the beginning of the kmem_bufctl_t array. (kmem_bufctl_t *) then casts that pointer to the required type. This results in blocks of code that contain slab_bufctl(slabp)[i]. Translated, that says: take a pointer to a slab descriptor, offset it with slab_bufctl() to the beginning of the kmem_bufctl_t array and give the i-th element of the array.
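The pointer arithmetic can be checked with a small user-space sketch. The struct and macro follow the definitions shown above, with list_head reduced to a stand-in; alloc_slab_mgmt() is a hypothetical helper that uses calloc() in place of the kernel allocators, purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };
typedef unsigned int kmem_bufctl_t;

typedef struct slab_s {
    struct list_head list;
    unsigned long colouroff;
    void *s_mem;
    unsigned int inuse;
    kmem_bufctl_t free;
} slab_t;

/* Step over one slab_t, then view what follows as the bufctl array. */
#define slab_bufctl(slabp) ((kmem_bufctl_t *)(((slab_t *)(slabp)) + 1))

/* Hypothetical helper: allocate a descriptor with room for num bufctl
 * entries placed directly behind it, mimicking the on-slab layout. */
static slab_t *alloc_slab_mgmt(unsigned int num)
{
    slab_t *slabp = calloc(1, sizeof(slab_t) + num * sizeof(kmem_bufctl_t));
    assert(slabp != NULL);
    return slabp;
}
```

With this layout, slab_bufctl(slabp) always points exactly sizeof(slab_t) bytes past the descriptor, so no separate pointer to the array needs to be stored.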
The index of the next free object in the slab is stored in slab_t->free, eliminating the need for a linked list to track free objects. When objects are allocated or freed, this index is updated based on information in the kmem_bufctl_t array.
When a cache is grown, all the objects and the kmem_bufctl_t array on the slab are initialized. The array is filled with the index of each object beginning with 1 and ending with the marker BUFCTL_END. For a slab with 5 objects, the elements of the array would look like Figure 8.10.
The value 0 is stored in slab_t->free as the 0th object is the first free object to be used. The idea is that for a given object n, the index of the next free object will be stored in kmem_bufctl_t[n]. Looking at the array above, the next free object after 0 is 1. After 1, there is 2 and so on. As the array is used, this arrangement will make the array act as a LIFO for free objects.
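The initial state of the array can be sketched as below. The function name init_bufctl() is hypothetical and the value used for BUFCTL_END is an assumption; only the pattern of entries is taken from the description above.

```c
#include <assert.h>

typedef unsigned int kmem_bufctl_t;

#define BUFCTL_END 0xffffFFFF /* assumed value for the end marker */

/* Fill the array so that entry n holds the index of the free object
 * that follows object n; the last entry terminates the chain.
 * Returns the index to store in slab_t->free. */
static kmem_bufctl_t init_bufctl(kmem_bufctl_t *bufctl, unsigned int num)
{
    assert(num > 0);
    for (unsigned int i = 0; i < num; i++)
        bufctl[i] = i + 1;
    bufctl[num - 1] = BUFCTL_END;
    return 0; /* object 0 is the first to be handed out */
}
```

For a slab with 5 objects this produces the entries 1, 2, 3, 4 and BUFCTL_END, with 0 as the starting value of free.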
kmem_cache_alloc() is the function which allocates an object by calling kmem_cache_alloc_one_tail(), which does the "real" work and updates the index into the kmem_bufctl_t array.
slab_t->free has the index of the first free object. The index of the next free object is at kmem_bufctl_t[slab_t->free]. In code terms, this looks like:
    objp = slabp->s_mem + slabp->free*cachep->objsize;
    slabp->free=slab_bufctl(slabp)[slabp->free];
slabp->s_mem is a pointer to the first object on the slab. slabp->free is the index of the object to allocate, and it has to be multiplied by the size of an object to obtain the object's offset from s_mem. The index of the next free object to allocate is stored at kmem_bufctl_t[slabp->free]. There is no pointer directly to the array, hence the helper macro slab_bufctl() is used. Note that the kmem_bufctl_t array is not changed during allocations, but the elements belonging to allocated objects become unreachable. For example, after two allocations, index 0 and 1 of the kmem_bufctl_t array are not pointed to by any other element.
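The allocation step can be tried in user space with a simplified slab. The struct mini_slab, the function mini_alloc() and the BUFCTL_END value are assumptions made for this sketch; the real slab_t keeps the array behind the descriptor rather than inside it, and the object size lives in the cache descriptor.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int kmem_bufctl_t;

#define BUFCTL_END 0xffffFFFF /* assumed value for the end marker */

/* Simplified stand-in for slab_t with a fixed, embedded bufctl array. */
struct mini_slab {
    char *s_mem;             /* address of the first object */
    size_t objsize;          /* size of each object in bytes */
    unsigned int inuse;      /* number of allocated objects */
    kmem_bufctl_t free;      /* index of the next free object */
    kmem_bufctl_t bufctl[5]; /* next-free index for each object */
};

/* Hand out the object at index free, then follow the chain. */
static void *mini_alloc(struct mini_slab *slabp)
{
    assert(slabp != NULL);
    if (slabp->free == BUFCTL_END)
        return NULL; /* no free objects left on this slab */

    void *objp = slabp->s_mem + slabp->free * slabp->objsize;
    slabp->free = slabp->bufctl[slabp->free];
    slabp->inuse++;
    return objp;
}
```

Starting from the freshly initialized array, two calls return objects 0 and 1 and leave free at 2, while the array entries themselves are untouched.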
The kmem_bufctl_t array is only updated when an object is freed in the function kmem_cache_free_one(). The array is updated with this block of code:
    unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;

    slab_bufctl(slabp)[objnr] = slabp->free;
    slabp->free = objnr;
objp is the object about to be freed and objnr is its index. kmem_bufctl_t[objnr] is updated to point to the current value of slabp->free, effectively placing the object pointed to by free on the pseudo linked list. slabp->free is then updated to the object being freed so that it will be the next one allocated.
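The free path and its LIFO behaviour can be illustrated with the same kind of simplified slab. As before, struct mini_slab, mini_free() and the BUFCTL_END value are assumptions for this sketch and not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int kmem_bufctl_t;

#define BUFCTL_END 0xffffFFFF /* assumed value for the end marker */

/* Simplified stand-in for slab_t with a fixed, embedded bufctl array. */
struct mini_slab {
    char *s_mem;             /* address of the first object */
    size_t objsize;          /* size of each object in bytes */
    unsigned int inuse;      /* number of allocated objects */
    kmem_bufctl_t free;      /* index of the next free object */
    kmem_bufctl_t bufctl[5]; /* next-free index for each object */
};

/* Push the freed object onto the front of the free chain. */
static void mini_free(struct mini_slab *slabp, void *objp)
{
    assert(slabp != NULL && objp != NULL);
    /* Recover the object's index from its offset into the slab. */
    unsigned int objnr =
        (unsigned int)((size_t)((char *)objp - slabp->s_mem) / slabp->objsize);

    slabp->bufctl[objnr] = slabp->free; /* old head follows this object */
    slabp->free = objnr;                /* this object is the new head */
    slabp->inuse--;
}
```

If objects 0 and 1 are allocated and then freed in that order, free ends up as 1 and bufctl[1] as 0, so the most recently freed object is the next one to be allocated.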
During cache creation, the function kmem_cache_estimate() is called to estimate how many objects may be stored on a single slab taking into account whether the slab descriptor must be stored on or off-slab and the size of each kmem_bufctl_t needed to track if an object is free or not. It returns the number of objects that may be stored and how many bytes are wasted. The number of wasted bytes is important if cache colouring is to be used.
The calculation is quite basic and takes the following steps
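One way such an estimate could be computed is sketched below. It is an illustration under stated assumptions, not a transcription of kmem_cache_estimate(): the page size and L1 cache line size are assumed values, the function name and parameter list are hypothetical, and the descriptor and bufctl sizes are passed in (both 0 for an off-slab descriptor) so that the arithmetic is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE      4096UL /* assumed page size */
#define L1_CACHE_BYTES 32UL   /* assumed L1 cache line size */
#define L1_CACHE_ALIGN(x) \
    (((x) + L1_CACHE_BYTES - 1) & ~(L1_CACHE_BYTES - 1))

/* base:  bytes taken by an on-slab descriptor (0 if off-slab)
 * extra: bytes of bufctl needed per object   (0 if off-slab) */
static void estimate_slab(unsigned int gfporder, size_t size,
                          size_t base, size_t extra,
                          size_t *left_over, unsigned int *num)
{
    size_t total = PAGE_SIZE << gfporder; /* bytes available in the slab */
    unsigned int i = 0;

    assert(size > 0);

    /* Keep adding objects until objects plus aligned management data overflow. */
    while (i * size + L1_CACHE_ALIGN(base + i * extra) <= total)
        i++;
    if (i > 0)
        i--; /* step back to the last count that fitted */

    *num = i;
    *left_over = total - i * size - L1_CACHE_ALIGN(base + i * extra);
}
```

With 100-byte objects on a single page, a 24-byte descriptor and 4-byte bufctl entries, this gives 39 objects with 4 bytes left over. With the descriptor off-slab, it gives 40 objects with 96 bytes left over.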
When a cache is being shrunk or destroyed, the slabs will be deleted. As the objects may have destructors, they must be called so the tasks of this function are
The call graph in Figure 8.11 is very simple.