For allocation, the buddy system rounds a request for a number of pages up
to the nearest power of two. The exponent of that power of two is referred
to as the order of the allocation. If a free block of the requested order
cannot be found, a higher order block is split into two buddies. One is
allocated and the other is placed on the free list for the lower order.
Figure 6.3 shows where a block is
split and how the buddies are added to the free lists until a block for
the process is available. When the block is later freed, its buddy is
checked. If the buddy is also free, the two are merged to form a higher
order block, which is placed on the free list for the higher order, where
its own buddy is checked in turn, and so on. If the buddy is not free, the
freed block is added to the free list at the current order. During these
list manipulations, interrupts have to be disabled to
prevent an interrupt handler manipulating the lists while a process has them
in an inconsistent state. This is achieved by using an interrupt safe spinlock.
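The rounding and merging steps described above can be sketched in a few lines of C. This is a toy model, not kernel code: the names pages_to_order(), buddy_index(), toy_free() and the on_free_list table are hypothetical, the free lists are modelled as a small table of flags, and the interrupt safe spinlock is left out. Finding the buddy by flipping one bit of the block's starting page index is a common buddy-allocator technique and is assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MERGE_MAX_ORDER 3                        /* toy: largest block is 2^3 pages */
#define MERGE_NR_PAGES  (1UL << MERGE_MAX_ORDER)

/* on_free_list[order][i] is true when the block of 2^order pages that
 * starts at page i is on the free list for that order (hypothetical
 * stand-in for the real per-order free lists). */
static bool on_free_list[MERGE_MAX_ORDER + 1][MERGE_NR_PAGES];

/* Smallest order such that 2^order pages cover nr_pages. */
static unsigned int pages_to_order(unsigned long nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages > 0);
    while ((1UL << order) < nr_pages)
        order++;
    return order;
}

/* A block and its buddy differ only in bit 'order' of their start index. */
static unsigned long buddy_index(unsigned long idx, unsigned int order)
{
    return idx ^ (1UL << order);
}

/* Free the block at idx, merging upwards while the buddy is also free.
 * Returns the order at which the (possibly merged) block was queued.
 * A real implementation would hold an interrupt safe lock here. */
static unsigned int toy_free(unsigned long idx, unsigned int order)
{
    while (order < MERGE_MAX_ORDER) {
        unsigned long buddy = buddy_index(idx, order);

        if (!on_free_list[order][buddy])
            break;                               /* buddy in use: stop merging */
        on_free_list[order][buddy] = false;      /* take buddy off its list */
        idx &= ~(1UL << order);                  /* merged block starts at lower half */
        order++;
    }
    on_free_list[order][idx] = true;
    return order;
}
```

For example, with every page initially allocated, freeing page 0 queues it at order 0, and then freeing page 1 finds its buddy (page 0) free and queues the merged pair at order 1.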
The second decision is which node to use. Linux uses a node-local allocation policy, under which pages are allocated from the memory bank associated with the running CPU. Here, the function _alloc_pages() is what is important. Its implementation differs depending on whether a UMA (function in mm/page_alloc.c) or NUMA (function in mm/numa.c) architecture is in use.
No matter which API is used, all of them rely on __alloc_pages() in mm/page_alloc.c to do the real work. It is never called directly; see Figure 6.2 for the call graph. This function selects which zone to allocate from. It starts with the requested zone but will fall back to other zones if absolutely necessary. The zones to fall back on are decided at boot time by the function build_zonelists(), but generally HIGHMEM will fall back to NORMAL, which in turn will fall back to DMA. If the number of free pages reaches the pages_low watermark, it will wake kswapd to begin freeing up pages from zones, and if memory is extremely tight, the caller will do the work of kswapd itself.
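The fallback idea can be illustrated with a deliberately simplified sketch. It is not the logic of __alloc_pages(): struct toy_zone, toy_pick_zone() and the exact watermark test are assumptions made for illustration, and the only behaviour taken from the text is that zones are tried in a fixed fallback order and that a zone running down to its pages_low watermark is a reason to wake kswapd.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a zone descriptor. */
struct toy_zone {
    const char *name;
    unsigned long free_pages;
    unsigned long pages_low;     /* watermark below which reclaim is wanted */
};

/* Walk a NULL-terminated fallback list in order and take nr_pages from the
 * first zone that would stay above its pages_low watermark afterwards.
 * *wake_kswapd is set when at least one zone had to be skipped, which is
 * where this sketch assumes the reclaim daemon would be woken. */
static struct toy_zone *toy_pick_zone(struct toy_zone **zonelist,
                                      unsigned long nr_pages,
                                      int *wake_kswapd)
{
    *wake_kswapd = 0;
    for (size_t i = 0; zonelist[i] != NULL; i++) {
        struct toy_zone *z = zonelist[i];

        if (z->free_pages >= nr_pages &&
            z->free_pages - nr_pages > z->pages_low) {
            z->free_pages -= nr_pages;
            return z;
        }
        *wake_kswapd = 1;        /* zone too close to its watermark */
    }
    return NULL;                 /* every zone in the list is under pressure */
}
```

A list ordered HIGHMEM, NORMAL, DMA mirrors the usual fallback order mentioned above: a request aimed at HIGHMEM is satisfied from NORMAL when HIGHMEM is near its watermark, and the caller is told that reclaim should start.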
The function rmqueue() allocates the block of pages, splitting a higher order block if one of the appropriate size is not available.
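The splitting step can be sketched as follows. This is a toy model rather than the body of rmqueue(): toy_alloc() and the split_free table are hypothetical names, the free lists are again modelled as flags, locking is omitted, and the choice to hand out the lower half of each split and queue the upper half is arbitrary for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SPLIT_MAX_ORDER 3                        /* toy: largest block is 2^3 pages */
#define SPLIT_NR_PAGES  (1UL << SPLIT_MAX_ORDER)

/* split_free[order][i] is true when the block of 2^order pages starting
 * at page i is on the free list for that order (hypothetical model). */
static bool split_free[SPLIT_MAX_ORDER + 1][SPLIT_NR_PAGES];

/* Return the start index of a free block of 2^order pages, or -1.
 * If no block of that order is free, the smallest larger free block is
 * split repeatedly; at each split one buddy is queued on the lower
 * order's free list and the other is split further or returned. */
static long toy_alloc(unsigned int order)
{
    for (unsigned int cur = order; cur <= SPLIT_MAX_ORDER; cur++) {
        for (unsigned long idx = 0; idx < SPLIT_NR_PAGES; idx += 1UL << cur) {
            if (!split_free[cur][idx])
                continue;
            split_free[cur][idx] = false;        /* remove block from its list */
            while (cur > order) {
                cur--;
                /* queue the upper buddy, keep splitting the lower one */
                split_free[cur][idx + (1UL << cur)] = true;
            }
            return (long)idx;
        }
    }
    return -1;                                   /* nothing large enough is free */
}
```

Starting from a single free order-3 block at page 0, a request for order 0 returns page 0 and leaves free blocks at page 4 (order 2), page 2 (order 1) and page 1 (order 0), one buddy per split.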