


8.2 Slabs

This section will describe how a slab is structured and managed. The struct which describes it is much simpler than the cache descriptor, but how the slab is arranged is considerably more complex. We begin with the descriptor.

typedef struct slab_s {
        struct list_head        list;
        unsigned long           colouroff;
        void                    *s_mem;
        unsigned int            inuse;
        kmem_bufctl_t           free;
} slab_t;

list is the list the slab belongs to. It is one of slabs_full, slabs_partial and slabs_free from the cache manager

colouroff is the colour offset from the base address of the slab to the first object. The address of the first object, s_mem, is the base address plus colouroff.

s_mem is the starting address of the first object within the slab

inuse gives the number of active objects in the slab

free is the index of the next free object in the slab. It is used with the array of bufctls, which stores the locations of free objects. See Section 8.2.3

The reader will note that given the slab manager or an object within the slab, there does not appear to be an obvious way to determine what slab or cache they belong to. This is addressed by using the list field in the struct page of each page that makes up the cache. SET_PAGE_CACHE() and SET_PAGE_SLAB() use next and prev on the page list to track what cache and slab an object belongs to. To get the descriptors from the page, the macros GET_PAGE_CACHE() and GET_PAGE_SLAB() are available. This set of relationships is illustrated in Figure 8.6.
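As a rough illustration of this relationship, the following user-space sketch models how the two pointers in the page's list field can be reused. The struct definitions are simplified stand-ins, link_pages() is a hypothetical helper, and the choice of next for the cache and prev for the slab is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields needed here. */
struct list_head { struct list_head *next, *prev; };
struct page { struct list_head list; };
typedef struct kmem_cache_s { const char *name; } kmem_cache_t;
typedef struct slab_s { unsigned int inuse; } slab_t;

/* Assumption of this sketch: next holds the cache, prev holds the slab. */
#define SET_PAGE_CACHE(pg, x) ((pg)->list.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg)    ((kmem_cache_t *)(pg)->list.next)
#define SET_PAGE_SLAB(pg, x)  ((pg)->list.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg)     ((slab_t *)(pg)->list.prev)

/* Hypothetical helper: record in every page of a slab who owns it. */
static void link_pages(struct page *pages, size_t npages,
                       kmem_cache_t *cachep, slab_t *slabp)
{
        size_t i;

        assert(pages != NULL);
        for (i = 0; i < npages; i++) {
                SET_PAGE_CACHE(&pages[i], cachep);
                SET_PAGE_SLAB(&pages[i], slabp);
        }
}
```

Given only the struct page for an object's address, GET_PAGE_CACHE() and GET_PAGE_SLAB() then recover both descriptors without any extra per-object bookkeeping.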



Figure 8.6: Page to Cache and Slab Relationship


The last issue is where the slab management struct is kept. Slab managers are kept either on-slab or off-slab (CFLGS_OFF_SLAB set in the static flags). Where they are placed is determined by the size of the object during cache creation.


8.2.1 Storing the Slab Descriptor

If the objects are larger than a threshold (512 bytes on x86), CFLGS_OFF_SLAB is set in the cache flags and the slab descriptor, or manager, is kept off-slab. One of the sizes caches (see Section 8.4) that is large enough to contain the struct is selected and kmem_cache_slabmgmt() allocates from it as necessary. This limits the number of objects that can be stored on the slab because there is limited space for the bufctls, but that is unimportant as the objects are large and there should not be many in the slab.



Figure 8.7: Slab With Descriptor On-Slab


Alternatively, the slab manager is kept at the beginning of the slab. When stored on-slab, enough space is kept at the beginning of the slab to store both the slab_t and the kmem_bufctl_t array. The array is responsible for tracking where the next free object is stored and is discussed later in the chapter. The objects are stored after the kmem_bufctl_t array.

Figure 8.7 should help clarify what a slab with the descriptor on-slab looks like and Figure 8.8 illustrates how a cache uses a sizes cache to store the slab descriptor when the descriptor is kept off-slab.



Figure 8.8: Slab With Descriptor Off-Slab



8.2.2 Slab Creation



Figure 8.9: Call Graph: kmem_cache_grow


At this point, we have seen how the cache is created, but on creation, it is an empty cache with empty lists for its slabs_full, slabs_partial and slabs_free. A cache is grown with the function kmem_cache_grow() when no objects are left in the slabs_partial list and there are no slabs in slabs_free. The tasks it fulfills are


8.2.3 Tracking Free Objects

The slab allocator has to have a quick and simple way of tracking where free objects are on the partially filled slabs. It achieves this using a kmem_bufctl_t array that is associated with each slab manager as obviously it is up to the slab manager to know where its free objects are.

Historically, and according to the paper describing the slab allocator [bonwick94], kmem_bufctl_t was a linked list of objects. In Linux 2.2.x, this struct was a union of three items: a pointer to the next free object, a pointer to the slab manager and a pointer to the object. Which it was depended on the state of the object.

Today, the slab and cache an object belongs to are determined by the struct page and the kmem_bufctl_t is simply an integer array of object indices. The number of elements in the array is the same as the number of objects on the slab.

typedef unsigned int kmem_bufctl_t;

As the array is kept after the slab descriptor and there is no pointer to the first element directly, a helper macro slab_bufctl() is provided.

#define slab_bufctl(slabp) \
        ((kmem_bufctl_t *)(((slab_t*)slabp)+1))

This seemingly cryptic macro is quite simple when broken down. The parameter slabp is a pointer to the slab manager. The expression ((slab_t*)slabp)+1 casts slabp to a slab_t pointer and adds 1 to it. This gives a slab_t * pointer to the memory immediately after the descriptor, which is the beginning of the kmem_bufctl_t array. (kmem_bufctl_t *) then casts that pointer to the required type. This results in blocks of code that contain slab_bufctl(slabp)[i]. Translated, that says: take a pointer to a slab descriptor, offset it with slab_bufctl() to the beginning of the kmem_bufctl_t array and give the i-th element of the array.
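To make the pointer arithmetic concrete, here is a small user-space sketch, assuming a reduced slab_t and a hypothetical alloc_slab_mgmt() helper that allocates the descriptor and the array in one block. It is only meant to show that the array starts exactly sizeof(slab_t) bytes after the descriptor.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int kmem_bufctl_t;

/* Reduced slab descriptor for illustration; the real one has more fields. */
typedef struct slab_s {
        void          *s_mem;
        unsigned int   inuse;
        kmem_bufctl_t  free;
} slab_t;

#define slab_bufctl(slabp) \
        ((kmem_bufctl_t *)(((slab_t *)(slabp)) + 1))

/* Hypothetical helper: one allocation holding the descriptor
 * followed directly by num bufctl entries. */
static slab_t *alloc_slab_mgmt(unsigned int num)
{
        slab_t *slabp = calloc(1, sizeof(slab_t) + num * sizeof(kmem_bufctl_t));

        assert(slabp != NULL);
        return slabp;
}
```

With slabp returned by alloc_slab_mgmt(4), the expression slab_bufctl(slabp)[2] reads or writes the third entry of the array that follows the descriptor.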

The index of the next free object in the slab is stored in slab_t->free, eliminating the need for a linked list to track free objects. When objects are allocated or freed, this index is updated based on information in the kmem_bufctl_t array.

8.2.4 Initialising the kmem_bufctl_t Array

When a cache is grown, all the objects and the kmem_bufctl_t array on the slab are initialised. The array is filled with the index of each object beginning with 1 and ending with the marker BUFCTL_END. For a slab with 5 objects, the elements of the array would look like Figure 8.10.



Figure 8.10: Initialised kmem_bufctl_t Array


The value 0 is stored in slab_t->free as the 0th object is the first free object to be used. The idea is that for a given object n, the index of the next free object will be stored in kmem_bufctl_t[n]. Looking at the array above, the next object free after 0 is 1. After 1, there is 2 and so on. As the array is used, this arrangement will make the array act as a LIFO for free objects.
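The initial state described above can be sketched as follows. init_bufctl() is a hypothetical helper written for this illustration, and the value chosen for BUFCTL_END is an assumption; only its role as an end marker matters here.

```c
#include <assert.h>

typedef unsigned int kmem_bufctl_t;

/* Assumed value for the end-of-list marker in this sketch. */
#define BUFCTL_END 0xffffFFFF

/*
 * Fill bufctl[] so that entry i names the next free object, i + 1,
 * and the last entry holds BUFCTL_END. Returns the index of the first
 * free object, which is the value that would be stored in slab_t->free.
 */
static kmem_bufctl_t init_bufctl(kmem_bufctl_t *bufctl, unsigned int num)
{
        unsigned int i;

        assert(num > 0);
        for (i = 0; i < num; i++)
                bufctl[i] = i + 1;
        bufctl[num - 1] = BUFCTL_END;
        return 0;
}
```

For a slab with 5 objects this yields the entries 1, 2, 3, 4, BUFCTL_END, with 0 as the first free index.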

8.2.5 Finding the Next Free Object

kmem_cache_alloc() is the function which allocates an object. It calls kmem_cache_alloc_one_tail(), which does the "real" work and uses the kmem_bufctl_t array to update slab_t->free.

slab_t->free has the index of the first free object. The index of the next free object is at kmem_bufctl_t[slab_t->free]. In code terms, this looks like

        objp = slabp->s_mem + slabp->free*cachep->objsize;
        slabp->free=slab_bufctl(slabp)[slabp->free];

slabp->s_mem is the address of the first object on the slab. slabp->free is the index of the object to allocate and it has to be multiplied by the size of an object to give its offset from s_mem.

The index of the next free object to allocate is stored at kmem_bufctl_t[slabp->free]. There is no pointer directly to the array, hence the helper macro slab_bufctl() is used. Note that the kmem_bufctl_t array is not changed during allocations but that the elements belonging to allocated objects become unreachable. For example, after two allocations, index 0 and 1 of the kmem_bufctl_t array are not pointed to by any other element.
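The allocation step can be tried out in user space with a toy model. In this sketch, struct toy_slab and toy_alloc() are hypothetical names, the bufctl array is reached through a plain pointer instead of slab_bufctl(), s_mem is a char * so that the pointer arithmetic is standard C, and the BUFCTL_END value is assumed.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int kmem_bufctl_t;
#define BUFCTL_END 0xffffFFFF   /* assumed end marker */

/* Toy slab: the fields used by allocation plus a pointer to the array. */
struct toy_slab {
        char          *s_mem;    /* address of object 0 */
        size_t         objsize;  /* size of each object */
        unsigned int   inuse;    /* number of allocated objects */
        kmem_bufctl_t  free;     /* index of the next free object */
        kmem_bufctl_t *bufctl;   /* stand-in for slab_bufctl(slabp) */
};

/* Hand out the object named by free, then advance free.
 * Returns NULL when the slab has no free objects. */
static void *toy_alloc(struct toy_slab *slabp)
{
        char *objp;

        assert(slabp != NULL);
        if (slabp->free == BUFCTL_END)
                return NULL;
        objp = slabp->s_mem + slabp->free * slabp->objsize;
        slabp->free = slabp->bufctl[slabp->free];
        slabp->inuse++;
        return objp;
}
```

Starting from an initialised three-object slab, two calls return objects 0 and 1 and leave free at 2, while the array contents themselves stay untouched.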

8.2.6 Updating kmem_bufctl_t

The kmem_bufctl_t array is only updated when an object is freed in the function kmem_cache_free_one(). The array is updated with this block of code

                unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;

                slab_bufctl(slabp)[objnr] = slabp->free;
                slabp->free = objnr;

objp is the object about to be freed and objnr is its index. kmem_bufctl_t[objnr] is updated to point to the current value of slabp->free, effectively placing the object pointed to by free on the pseudo linked list. slabp->free is updated to the object being freed so that it will be the next one allocated.
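The LIFO behaviour of freeing can be seen in a toy user-space model. In this sketch, struct toy_slab and toy_free() are hypothetical names, and the bufctl array is reached through a plain pointer where the kernel would use slab_bufctl().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int kmem_bufctl_t;

/* Toy slab: the fields used when freeing plus a pointer to the array. */
struct toy_slab {
        char          *s_mem;    /* address of object 0 */
        size_t         objsize;  /* size of each object */
        unsigned int   inuse;    /* number of allocated objects */
        kmem_bufctl_t  free;     /* index of the next free object */
        kmem_bufctl_t *bufctl;   /* stand-in for slab_bufctl(slabp) */
};

/* Push the freed object onto the front of the pseudo linked list. */
static void toy_free(struct toy_slab *slabp, void *objp)
{
        unsigned int objnr =
                (unsigned int)(((char *)objp - slabp->s_mem) / slabp->objsize);

        assert(slabp->inuse > 0);
        slabp->bufctl[objnr] = slabp->free;  /* old head follows this object */
        slabp->free = objnr;                 /* freed object is the new head */
        slabp->inuse--;
}
```

If objects 0 and 1 of a three-object slab are in use and are freed in that order, free ends up as 1 and bufctl[1] as 0, so the most recently freed object is the first to be handed out again.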


8.2.7 Calculating the Number of Objects on a Slab

During cache creation, the function kmem_cache_estimate() is called to estimate how many objects may be stored on a single slab taking into account whether the slab descriptor must be stored on or off-slab and the size of each kmem_bufctl_t needed to track if an object is free or not. It returns the number of objects that may be stored and how many bytes are wasted. The number of wasted bytes is important if cache colouring is to be used.
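A minimal sketch of one way such an estimate could be computed is shown below. It is not the kernel's function: estimate() and align_up() are hypothetical names, the descriptor and bufctl sizes are passed in as parameters, and rounding the on-slab management area up to an alignment boundary is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
        return (n + align - 1) & ~(align - 1);
}

/*
 * slab_bytes: bytes in the pages backing one slab
 * objsize:    size of one object
 * base:       size of the slab descriptor, or 0 if it is kept off-slab
 * extra:      size of one kmem_bufctl_t, or 0 if kept off-slab
 * align:      alignment applied to the on-slab management area
 */
static void estimate(size_t slab_bytes, size_t objsize, size_t base,
                     size_t extra, size_t align,
                     unsigned int *num, size_t *left_over)
{
        unsigned int i = 0;

        assert(objsize > 0);
        /* Grow i until objects plus management data no longer fit. */
        while (i * objsize + align_up(base + i * extra, align) <= slab_bytes)
                i++;
        if (i > 0)
                i--;

        *num = i;
        *left_over = slab_bytes - i * objsize
                     - align_up(base + i * extra, align);
}
```

With a 4096-byte slab, 100-byte objects, a 24-byte descriptor, 4-byte bufctls and 32-byte alignment (all illustrative numbers), the on-slab case fits 39 objects with 4 bytes left over, while the off-slab case fits 40 objects with 96 bytes left over.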

The calculation is quite basic and takes the following steps


8.2.8 Slab Destroying

When a cache is being shrunk or destroyed, the slabs will be deleted. As the objects may have destructors, they must be called so the tasks of this function are

The call graph at Figure 8.11 is very simple.



Figure 8.11: Call Graph: kmem_slab_destroy



Mel 2003-01-14