Next: 6.4 Free Pages Up: 6. Physical Page Allocation Previous: 6.2 Managing Free Blocks   Contents   Index


6.3 Allocating Pages



\includegraphics[width=10cm]{graphs/alloc_pages.ps}
Figure 6.2: Call Graph: alloc_pages


For allocation, the buddy system rounds each request up to the nearest power-of-two number of pages; the exponent is referred to as the order of the allocation. If no free block of the requested order can be found, a higher order block is split into two buddies. One is allocated and the other is placed on the free list for the lower order. Figure 6.3 shows a $2^4$ block being split and how the buddies are added to the free lists until a block for the process is available. When the block is later freed, its buddy is checked. If both are free, they are merged to form a higher order block, which is placed on the higher order free list, where its own buddy is checked in turn, and so on. If the buddy is not free, the freed block is added to the free list at the current order. During these list manipulations, interrupts have to be disabled to prevent an interrupt handler from manipulating the lists while a process has them in an inconsistent state. This is achieved by using an interrupt safe spinlock.
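The arithmetic behind orders and buddies can be sketched in a few lines of C. This is a minimal userspace illustration, not kernel code: the helper names pages_to_order(), buddy_index() and merged_index() are hypothetical, and the sketch assumes blocks are identified by the index of their first page and are aligned to their own size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest order such that 2^order >= nr_pages. */
static unsigned int pages_to_order(size_t nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages > 0);
    while (((size_t)1 << order) < nr_pages)
        order++;
    return order;
}

/*
 * Hypothetical helper: index of the buddy of the block that starts at
 * page_idx. Assuming a block of a given order is aligned to 2^order
 * pages, flipping bit 'order' of its index yields its buddy.
 */
static size_t buddy_index(size_t page_idx, unsigned int order)
{
    assert((page_idx & (((size_t)1 << order) - 1)) == 0);
    return page_idx ^ ((size_t)1 << order);
}

/*
 * Hypothetical helper: first page of the order+1 block formed when a
 * block and its buddy are merged (bit 'order' is cleared).
 */
static size_t merged_index(size_t page_idx, unsigned int order)
{
    assert((page_idx & (((size_t)1 << order) - 1)) == 0);
    return page_idx & ~((size_t)1 << order);
}
```

Under these assumptions a request for 5 pages is served from an order 3 block (8 pages), and the order 2 blocks starting at pages 8 and 12 are buddies that merge into the order 3 block starting at page 8.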

The second decision is which node to use. Linux uses a node-local allocation policy, which means that the memory bank associated with the running CPU is used for allocating pages. The important function here is _alloc_pages(), which differs depending on whether a UMA (mm/page_alloc.c) or NUMA (mm/numa.c) architecture is in use.
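A node-local policy can be illustrated with a toy model. The sketch below is not the code in mm/numa.c: the tables toy_cpu_to_node and toy_node_free, the function toy_alloc_on_node() and the fallback to the remaining nodes when the local node is exhausted are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_NODES 2
#define TOY_NR_CPUS  4

/* Hypothetical topology: CPUs 0-1 sit on node 0, CPUs 2-3 on node 1. */
static const int toy_cpu_to_node[TOY_NR_CPUS] = { 0, 0, 1, 1 };

/* Hypothetical count of free pages on each node. */
static unsigned long toy_node_free[TOY_NR_NODES] = { 8, 8 };

/*
 * Try the node local to 'cpu' first, then the remaining nodes in order
 * (the fallback order is an assumption of this sketch). Returns the id
 * of the node that satisfied the request, or -1 if none could.
 */
static int toy_alloc_on_node(int cpu, unsigned long nr_pages)
{
    int local, i;

    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    local = toy_cpu_to_node[cpu];

    for (i = 0; i < TOY_NR_NODES; i++) {
        int nid = (local + i) % TOY_NR_NODES;

        if (toy_node_free[nid] >= nr_pages) {
            toy_node_free[nid] -= nr_pages;
            return nid;
        }
    }
    return -1;
}
```

In this model a request made on CPU 2 is served from node 1 as long as that node has enough free pages, and only then spills over to node 0.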

Whichever API is used, __alloc_pages() in mm/page_alloc.c does all the real work; it is never called directly. See Figure 6.2 for the call graph. This function selects which zone to allocate from. It starts with the requested zone but will fall back to other zones if absolutely necessary. Which zones to fall back on is decided at boot time by the function build_zonelists(), but generally HIGHMEM will fall back to NORMAL, which in turn will fall back to DMA. If the number of free pages reaches the pages_low watermark, it will wake kswapd to begin freeing pages from zones and, if memory is extremely tight, the caller will do the work of kswapd itself.
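The zone fallback and watermark check can be sketched as a walk over an ordered list of zones. This is a deliberately simplified model rather than the logic of __alloc_pages(): struct toy_zone, toy_select_zone() and the two-pass strategy are hypothetical, and the wake-up of kswapd is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced zone descriptor. */
struct toy_zone {
    const char *name;
    unsigned long free_pages;
    unsigned long pages_low;   /* watermark below which kswapd is woken */
};

/*
 * Walk the fallback list in order (e.g. HIGHMEM, NORMAL, DMA) and take
 * nr_pages from the first zone that stays above its pages_low watermark.
 * If no zone qualifies, set *wake_kswapd and accept any zone that can
 * still satisfy the request (an assumption of this sketch).
 * Returns the position of the chosen zone in the list, or -1.
 */
static int toy_select_zone(struct toy_zone **zonelist, size_t nr_zones,
                           unsigned long nr_pages, int *wake_kswapd)
{
    size_t i;

    assert(zonelist != NULL && wake_kswapd != NULL);
    *wake_kswapd = 0;

    /* First pass: respect the pages_low watermark. */
    for (i = 0; i < nr_zones; i++) {
        struct toy_zone *z = zonelist[i];

        if (z->free_pages >= nr_pages &&
            z->free_pages - nr_pages > z->pages_low) {
            z->free_pages -= nr_pages;
            return (int)i;
        }
    }

    /* Memory is getting tight: ask for pages to be freed. */
    *wake_kswapd = 1;

    /* Second pass: ignore the watermark. */
    for (i = 0; i < nr_zones; i++) {
        struct toy_zone *z = zonelist[i];

        if (z->free_pages >= nr_pages) {
            z->free_pages -= nr_pages;
            return (int)i;
        }
    }
    return -1;
}
```

The order of the zonelist array passed in plays the role of the fallback order built at boot time; the function itself has no knowledge of zone types.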



\includegraphics[width=10cm]{graphs/buddy_allocation.ps}
Figure 6.3: Allocating physical pages


The function rmqueue() allocates the block of pages, splitting higher order blocks if one of the appropriate size is not available.
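The splitting step can be sketched with a toy set of per-order free lists. This is a userspace illustration and not the kernel's rmqueue(): the names toy_free_area, toy_free_list_push() and toy_rmqueue(), the fixed-size arrays, and the choice of which half of a split block is kept are all assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ORDER  10
#define TOY_MAX_BLOCKS 64

/* Hypothetical free list: a small stack of block start indices. */
struct toy_free_area {
    size_t blocks[TOY_MAX_BLOCKS];
    size_t count;
};

static struct toy_free_area toy_free_area[TOY_MAX_ORDER];

/* Put the block starting at page 'idx' on the free list for 'order'. */
static void toy_free_list_push(unsigned int order, size_t idx)
{
    assert(order < TOY_MAX_ORDER);
    assert(toy_free_area[order].count < TOY_MAX_BLOCKS);
    toy_free_area[order].blocks[toy_free_area[order].count++] = idx;
}

/*
 * Take a block of 2^order pages. If the list for 'order' is empty, take
 * a block from the next non-empty higher order and split it repeatedly,
 * leaving one half on each lower free list and carrying on with the
 * other half. Returns the first page of the block, or (size_t)-1.
 */
static size_t toy_rmqueue(unsigned int order)
{
    unsigned int cur;

    assert(order < TOY_MAX_ORDER);
    for (cur = order; cur < TOY_MAX_ORDER; cur++) {
        struct toy_free_area *area = &toy_free_area[cur];
        size_t idx;

        if (area->count == 0)
            continue;

        idx = area->blocks[--area->count];
        while (cur > order) {
            cur--;
            /* Lower half goes back on the free list ... */
            toy_free_list_push(cur, idx);
            /* ... and the upper half is split further or returned. */
            idx += (size_t)1 << cur;
        }
        return idx;
    }
    return (size_t)-1;
}
```

Starting from a single free $2^4$ block at page 0, a request for order 2 leaves an order 3 block at page 0 and an order 2 block at page 8 on the free lists and returns the block at page 12.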


Mel 2003-01-14