Appendix F Physical Page Allocation
F.1 Allocating Pages
F.1.1 Function: alloc_pages
Source: include/linux/mm.h
The call graph for this function is shown in Figure 6.3. It is
declared as follows:
439 static inline struct page * alloc_pages(unsigned int gfp_mask,
unsigned int order)
440 {
444 if (order >= MAX_ORDER)
445 return NULL;
446 return _alloc_pages(gfp_mask, order);
447 }
- 439 The gfp_mask (Get Free Pages) flags tell the allocator how it may behave. For example, if __GFP_WAIT is not set, the allocator will not block and instead returns NULL if memory is tight. The order is the power-of-two number of pages to allocate
- 444-445 A simple debugging check optimized away at compile time
- 446 This function is described next
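To illustrate how a caller uses this interface, the following fragment (a hypothetical example, not taken from the kernel source; example_alloc() is an invented name) allocates and then frees a block of 2^2 pages:

#include <linux/errno.h>
#include <linux/mm.h>

/* Hypothetical example: allocate and free a block of 2^2 = 4 pages.
 * GFP_KERNEL allows the allocator to sleep if memory is tight. */
static int example_alloc(void)
{
        struct page *block;

        block = alloc_pages(GFP_KERNEL, 2);
        if (!block)
                return -ENOMEM;    /* order too large or memory exhausted */

        /* ... use the pages ... */

        __free_pages(block, 2);    /* free with the same order */
        return 0;
}

Note that __free_pages() (See Section F.3.1) must be passed the same order that was used for the allocation.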
F.1.2 Function: _alloc_pages
Source: mm/page_alloc.c
The function _alloc_pages() comes in two varieties. The
first is designed to only work with UMA architectures such as the x86
and is in mm/page_alloc.c. It only refers to the static node
contig_page_data. The second is in mm/numa.c and
is a simple extension. It uses a node-local allocation policy which means
that memory will be allocated from the bank closest to the processor.
For the purposes of this book, only the mm/page_alloc.c
version will be examined but developers on NUMA architectures should read
_alloc_pages() and _alloc_pages_pgdat() as well in
mm/numa.c
244 #ifndef CONFIG_DISCONTIGMEM
245 struct page *_alloc_pages(unsigned int gfp_mask,
unsigned int order)
246 {
247 return __alloc_pages(gfp_mask, order,
248 contig_page_data.node_zonelists+(gfp_mask & GFP_ZONEMASK));
249 }
250 #endif
- 244 The ifndef is for UMA architectures like the x86. NUMA architectures use the _alloc_pages() function in mm/numa.c which employs a node-local policy for allocations
- 245 The gfp_mask flags tell the allocator how it may behave. The order is the power-of-two number of pages to allocate
- 247 node_zonelists is an array of preferred fallback zones to allocate from. It is initialised in build_zonelists() (See Section B.1.6). The low bits of gfp_mask, selected by GFP_ZONEMASK, indicate which zone is preferable to allocate from. Applying the bitmask gfp_mask & GFP_ZONEMASK will give the index in node_zonelists we prefer to allocate from.
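As a rough illustration of this index arithmetic, the following user-space sketch applies the mask to a few gfp_mask values. The flag values are those found in the 2.4 headers and are shown purely for illustration:

#include <stdio.h>

/* Zone modifier flags as found in 2.4 include/linux/mm.h (illustrative). */
#define __GFP_DMA      0x01
#define __GFP_HIGHMEM  0x02
#define GFP_ZONEMASK   0x0f

int main(void)
{
        unsigned int masks[] = { 0, __GFP_DMA, __GFP_HIGHMEM };
        int i;

        /* The low bits of gfp_mask select which fallback list in
         * node_zonelists[] the allocation starts from. */
        for (i = 0; i < 3; i++)
                printf("gfp_mask low bits 0x%02x -> node_zonelists[%u]\n",
                       masks[i], masks[i] & GFP_ZONEMASK);
        return 0;
}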
F.1.3 Function: __alloc_pages
Source: mm/page_alloc.c
At this stage, we've reached what is described as the “heart of the
zoned buddy allocator”, the __alloc_pages() function. It is
responsible for cycling through the fallback zones and selecting one suitable
for the allocation. If memory is tight, it will take some steps to address
the problem. It will wake kswapd and if necessary it will do the
work of kswapd manually.
327 struct page * __alloc_pages(unsigned int gfp_mask,
unsigned int order,
zonelist_t *zonelist)
328 {
329 unsigned long min;
330 zone_t **zone, * classzone;
331 struct page * page;
332 int freed;
333
334 zone = zonelist->zones;
335 classzone = *zone;
336 if (classzone == NULL)
337 return NULL;
338 min = 1UL << order;
339 for (;;) {
340 zone_t *z = *(zone++);
341 if (!z)
342 break;
343
344 min += z->pages_low;
345 if (z->free_pages > min) {
346 page = rmqueue(z, order);
347 if (page)
348 return page;
349 }
350 }
- 334 Set zone to be the preferred zone to allocate from
- 335 The preferred zone is recorded as the classzone. If one of the pages_low watermarks is reached later, the classzone is marked as needing balance
- 336-337 An unnecessary sanity check. build_zonelists() would need to be seriously broken for this to happen
- 338-350 This style of block appears a number of times in this function. It reads as “cycle through all zones in this fallback list and see if the allocation can be satisfied without violating watermarks”. Note that the pages_low for each fallback zone is added together. This is deliberate to reduce the probability a fallback zone will be used.
- 340 z is the zone currently being examined. The zone variable is moved to the next fallback zone
- 341-342 If this is the last zone in the fallback list, break
- 344 Increment the number of pages to be allocated by the watermark for easy comparisons. This happens for each zone in the fallback zones. While this at first appears to be a bug, the behaviour is intended to reduce the probability a fallback zone is used (see the sketch after this list)
- 345-349 Allocate the page block if it can be assigned without reaching the pages_low watermark. rmqueue() (See Section F.1.4) is responsible for removing the block of pages from the zone
- 347-348 If the pages could be allocated, return a pointer to them
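The effect of the cumulative watermark check can be seen with the following user-space sketch. The zones, their pages_low watermarks and free page counts are invented purely for illustration:

#include <stdio.h>

/* Hypothetical zones with invented pages_low and free page counts. */
struct fake_zone {
        const char *name;
        unsigned long pages_low;
        unsigned long free_pages;
};

int main(void)
{
        struct fake_zone zones[] = {
                { "Normal", 255, 200 },
                { "DMA",    128, 500 },
        };
        unsigned int order = 2;            /* requesting 4 pages */
        unsigned long min = 1UL << order;
        int i;

        for (i = 0; i < 2; i++) {
                /* Each fallback zone must clear the sum of all pages_low
                 * watermarks seen so far, making the allocator increasingly
                 * reluctant to fall back. */
                min += zones[i].pages_low;
                if (zones[i].free_pages > min) {
                        printf("allocate from %s (free %lu > min %lu)\n",
                               zones[i].name, zones[i].free_pages, min);
                        return 0;
                }
                printf("skip %s (free %lu <= min %lu)\n",
                       zones[i].name, zones[i].free_pages, min);
        }
        printf("no zone can satisfy the request at pages_low\n");
        return 0;
}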
352 classzone->need_balance = 1;
353 mb();
354 if (waitqueue_active(&kswapd_wait))
355 wake_up_interruptible(&kswapd_wait);
356
357 zone = zonelist->zones;
358 min = 1UL << order;
359 for (;;) {
360 unsigned long local_min;
361 zone_t *z = *(zone++);
362 if (!z)
363 break;
364
365 local_min = z->pages_min;
366 if (!(gfp_mask & __GFP_WAIT))
367 local_min >>= 2;
368 min += local_min;
369 if (z->free_pages > min) {
370 page = rmqueue(z, order);
371 if (page)
372 return page;
373 }
374 }
375
- 352 Mark the preferred zone as needing balance. This flag will be read later by kswapd
- 353 This is a memory barrier. It ensures that all CPUs will see any changes made to variables before this line of code. This is important because kswapd could be running on a different processor to the memory allocator
- 354-355 Wake up kswapd if it is asleep
- 357-358 Begin again with the first preferred zone and min value
- 360-374 Cycle through all the zones. This time, allocate the pages if they can be allocated without hitting the pages_min watermark
- 365 local_min is how low the number of free pages in this zone may go
- 366-367 If the process cannot wait or reschedule (__GFP_WAIT is clear), then allow the zone to be put under more memory pressure than the watermark normally allows
376 /* here we're in the low on memory slow path */
377
378 rebalance:
379 if (current->flags & (PF_MEMALLOC | PF_MEMDIE)) {
380 zone = zonelist->zones;
381 for (;;) {
382 zone_t *z = *(zone++);
383 if (!z)
384 break;
385
386 page = rmqueue(z, order);
387 if (page)
388 return page;
389 }
390 return NULL;
391 }
- 378 This label is returned to after an attempt is made to synchronously free pages. From this line on, the low-on-memory path has been reached. It is likely the process will sleep
- 379-391 These two flags are only set by the OOM killer. As the process is trying to kill itself cleanly, allocate the pages if at all possible as it is known they will be freed very soon
393 /* Atomic allocations - we can't balance anything */
394 if (!(gfp_mask & __GFP_WAIT))
395 return NULL;
396
397 page = balance_classzone(classzone, gfp_mask, order, &freed);
398 if (page)
399 return page;
400
401 zone = zonelist->zones;
402 min = 1UL << order;
403 for (;;) {
404 zone_t *z = *(zone++);
405 if (!z)
406 break;
407
408 min += z->pages_min;
409 if (z->free_pages > min) {
410 page = rmqueue(z, order);
411 if (page)
412 return page;
413 }
414 }
415
416 /* Don't let big-order allocations loop */
417 if (order > 3)
418 return NULL;
419
420 /* Yield for kswapd, and try again */
421 yield();
422 goto rebalance;
423 }
- 394-395 If the calling process cannot sleep, return NULL as the only way to allocate the pages from here involves sleeping
- 397 balance_classzone() (See Section F.1.6) performs the work of kswapd in a synchronous fashion. The principal difference is that instead of freeing the memory into a global pool, it is kept for the process on the current→local_pages linked list
- 398-399 If a page block of the right order has been freed, return it. Just because this is NULL does not mean an allocation will fail as it could be a higher order of pages that was released
- 403-414 This is identical to the block above. Allocate the page blocks if it can be done without hitting the pages_min watermark
- 417-418 Satisfying a large allocation such as 2^4 pages is difficult. If it has not been satisfied by now, it is better to simply return NULL
- 421 Yield the processor to give kswapd a chance to work
- 422 Attempt to balance the zones again and allocate
F.1.4 Function: rmqueue
Source: mm/page_alloc.c
This function is called from __alloc_pages(). It is responsible
for finding a block of memory large enough to be used for the allocation. If
a block of memory of the requested size is not available, it will look for
a larger order that may be split into two buddies. The actual splitting is
performed by the expand() (See Section F.1.5) function.
198 static FASTCALL(struct page *rmqueue(zone_t *zone,
unsigned int order));
199 static struct page * rmqueue(zone_t *zone, unsigned int order)
200 {
201 free_area_t * area = zone->free_area + order;
202 unsigned int curr_order = order;
203 struct list_head *head, *curr;
204 unsigned long flags;
205 struct page *page;
206
207 spin_lock_irqsave(&zone->lock, flags);
208 do {
209 head = &area->free_list;
210 curr = head->next;
211
212 if (curr != head) {
213 unsigned int index;
214
215 page = list_entry(curr, struct page, list);
216 if (BAD_RANGE(zone,page))
217 BUG();
218 list_del(curr);
219 index = page - zone->zone_mem_map;
220 if (curr_order != MAX_ORDER-1)
221 MARK_USED(index, curr_order, area);
222 zone->free_pages -= 1UL << order;
223
224 page = expand(zone, page, index, order,
curr_order, area);
225 spin_unlock_irqrestore(&zone->lock, flags);
226
227 set_page_count(page, 1);
228 if (BAD_RANGE(zone,page))
229 BUG();
230 if (PageLRU(page))
231 BUG();
232 if (PageActive(page))
233 BUG();
234 return page;
235 }
236 curr_order++;
237 area++;
238 } while (curr_order < MAX_ORDER);
239 spin_unlock_irqrestore(&zone->lock, flags);
240
241 return NULL;
242 }
- 199 The parameters are the zone to allocate from and what order of pages are required
- 201 Because the free_area is an array of linked lists, the order may be used as an index within the array
- 207 Acquire the zone lock
- 208-238 This while block is responsible for finding what order of pages we will need to allocate from. If there is not a free block at the order we are interested in, check the higher orders until a suitable one is found
- 209 head is the list of free page blocks for this order
- 210 curr is the first block of pages
- 212-235 If there is a free page block at this order, then allocate it
- 215 page is set to be a pointer to the first page in the free block
- 216-217 Sanity check to make sure this page belongs to this zone and is within the zone_mem_map. It is unclear how this could possibly happen without severe bugs in the allocator itself that would place blocks in the wrong zones
- 218 As the block is going to be allocated, remove it from the free list
- 219 index treats the zone_mem_map as an array of pages so that index will be the offset within the array
- 220-221 Toggle the bit that represents this pair of buddies. MARK_USED() is a macro which calculates which bit to toggle (see the sketch after this list)
- 222 Update the statistics for this zone. 1UL<<order is the number of pages being allocated
- 224 expand() (See Section F.1.5) is the function responsible for splitting page blocks of higher orders
- 225 No other updates to the zone need to take place so release the lock
- 227 Show that the page is in use
- 228-233 Sanity checks
- 234 The page block has been successfully allocated so return it
- 236-237 If a page block was not free of the correct order, move to a higher order of page blocks and see what can be found there
- 239 No other updates to the zone need to take place so release the lock
- 241 No page blocks of the requested or higher order are available so return failure
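The bit-index arithmetic behind MARK_USED() can be illustrated with a small user-space sketch. It assumes, as in 2.4, that each bit in free_area→map covers a pair of buddies so the page index within zone_mem_map is scaled down by 2^(order+1); the page indices are invented:

#include <stdio.h>

/* Sketch of the bit-index calculation used by MARK_USED(): one bit per
 * pair of buddies, so the page index is divided by 2^(order+1). */
int main(void)
{
        unsigned long page_idx[] = { 0, 8, 12, 40 };
        unsigned int order = 2;            /* 4-page blocks */
        int i;

        for (i = 0; i < 4; i++)
                printf("page index %lu, order %u -> buddy bit %lu\n",
                       page_idx[i], order, page_idx[i] >> (1 + order));
        return 0;
}

Note that page indices 8 and 12 map to the same bit because, at order 2, they are a buddy pair.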
F.1.5 Function: expand
Source: mm/page_alloc.c
This function splits page blocks of higher orders until a page block of the
needed order is available.
177 static inline struct page * expand (zone_t *zone,
struct page *page,
unsigned long index,
int low,
int high,
free_area_t * area)
179 {
180 unsigned long size = 1 << high;
181
182 while (high > low) {
183 if (BAD_RANGE(zone,page))
184 BUG();
185 area--;
186 high--;
187 size >>= 1;
188 list_add(&(page)->list, &(area)->free_list);
189 MARK_USED(index, high, area);
190 index += size;
191 page += size;
192 }
193 if (BAD_RANGE(zone,page))
194 BUG();
195 return page;
196 }
- 177 The parameters are
  - zone is where the allocation is coming from
  - page is the first page of the block being split
  - index is the index of page within mem_map
  - low is the order of pages needed for the allocation
  - high is the order of pages that is being split for the allocation
  - area is the free_area_t representing the high order block of pages
- 180 size is the number of pages in the block that is to be split
- 182-192 Keep splitting until a block of the needed page order is found (a worked example follows this list)
- 183-184 Sanity check to make sure this page belongs to this zone and is within the zone_mem_map
- 185 area is now the next free_area_t representing the lower order of page blocks
- 186 high is the next order of page blocks to be split
- 187 The size of the block being split is now half as big
- 188 Of the pair of buddies, the one lower in the mem_map is added to the free list for the lower order
- 189 Toggle the bit representing the pair of buddies
- 190 index is now the index of the second buddy of the newly created pair
- 191 page now points to the second buddy of the newly created pair
- 193-194 Sanity check
- 195 The blocks have been successfully split so return the page
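The following user-space walk-through shows the arithmetic of the split when an order-3 block starting at page index 32 is broken down for an order-0 allocation. The indices are invented for illustration:

#include <stdio.h>

/* Walk through the splitting performed by expand(): at each step the lower
 * buddy of the remaining block is placed on a free list and the upper buddy
 * is split further. */
int main(void)
{
        unsigned long index = 32;          /* first page of the order-3 block */
        int low = 0, high = 3;
        unsigned long size = 1UL << high;

        while (high > low) {
                high--;
                size >>= 1;
                printf("free lower buddy: pages %lu-%lu (order %d)\n",
                       index, index + size - 1, high);
                index += size;             /* continue with the upper half */
        }
        printf("return page %lu to the caller (order %d)\n", index, low);
        return 0;
}

Because the lower buddy is freed at each step, the page finally returned to the caller is the highest sub-block of the original high-order block.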
F.1.6 Function: balance_classzone
Source: mm/page_alloc.c
This function is part of the direct-reclaim path. Allocators which can sleep
will call this function to start performing the work of kswapd
in a synchronous fashion. As the process is performing the work itself,
the pages it frees of the desired order are reserved in a linked list in
current→local_pages and the number of page blocks in
the list is stored in current→nr_local_pages. Note that the number of
page blocks is not the same as the number of pages; a page block could
be of any order.
253 static struct page * balance_classzone(zone_t * classzone,
unsigned int gfp_mask,
unsigned int order,
int * freed)
254 {
255 struct page * page = NULL;
256 int __freed = 0;
257
258 if (!(gfp_mask & __GFP_WAIT))
259 goto out;
260 if (in_interrupt())
261 BUG();
262
263 current->allocation_order = order;
264 current->flags |= PF_MEMALLOC | PF_FREE_PAGES;
265
266 __freed = try_to_free_pages_zone(classzone, gfp_mask);
267
268 current->flags &= ~(PF_MEMALLOC | PF_FREE_PAGES);
269
- 258-259 If the caller is not allowed to sleep, then goto out to exit the function. For this to occur, the function would have to be called directly or __alloc_pages() would need to be deliberately broken
- 260-261 This function may not be used by interrupts. Again, deliberate damage would have to be introduced for this condition to occur
- 263 Record the desired size of the allocation in current→allocation_order. This is actually unused although it could have been used to only add pages of the desired order to the local_pages list. As it is, the order of pages in the list is stored in page→index
- 264 Set the flags which will tell the free functions to add the pages to the local_pages list
- 266 Free pages directly from the desired zone with try_to_free_pages_zone() (See Section J.5.3). This is where the direct-reclaim path intersects with kswapd
- 268 Clear the flags again so that the free functions do not continue to add pages to the local_pages list
270 if (current->nr_local_pages) {
271 struct list_head * entry, * local_pages;
272 struct page * tmp;
273 int nr_pages;
274
275 local_pages = &current->local_pages;
276
277 if (likely(__freed)) {
278 /* pick from the last inserted so we're lifo */
279 entry = local_pages->next;
280 do {
281 tmp = list_entry(entry, struct page, list);
282 if (tmp->index == order &&
memclass(page_zone(tmp), classzone)) {
283 list_del(entry);
284 current->nr_local_pages--;
285 set_page_count(tmp, 1);
286 page = tmp;
287
288 if (page->buffers)
289 BUG();
290 if (page->mapping)
291 BUG();
292 if (!VALID_PAGE(page))
293 BUG();
294 if (PageLocked(page))
295 BUG();
296 if (PageLRU(page))
297 BUG();
298 if (PageActive(page))
299 BUG();
300 if (PageDirty(page))
301 BUG();
302
303 break;
304 }
305 } while ((entry = entry->next) != local_pages);
306 }
Presuming that pages exist in the local_pages list, this function
will cycle through the list looking for a page block belonging to the desired
zone and order.
- 270 Only enter this block if pages are stored in the local list
- 275 Start at the beginning of the list
- 277 If pages were freed with try_to_free_pages_zone() then...
- 279 The last one inserted is chosen first as it is likely to be cache hot and it is desirable to use pages that have been recently referenced
- 280-305 Cycle through the pages in the list until we find one of the desired order and zone
- 281 Get the page from this list entry
- 282 The order of the page block is stored in page→index so check if the order matches the desired order and that it belongs to the right zone. It is unlikely that pages from another zone are on this list but it could occur if swap_out() is called to free pages directly from process page tables
- 283 This is a page of the right order and zone so remove it from the list
- 284 Decrement the number of page blocks in the list
- 285 Set the usage count to 1 as the page block is about to be handed back to the caller
- 286 Set page as it will be returned. tmp is needed for the next block which frees the remaining pages in the local list
- 288-301 Perform the same checks that are performed in __free_pages_ok() to ensure the page is in a safe state to be reused
- 305 Move to the next page in the list if the current one was not of the desired order and zone
308 nr_pages = current->nr_local_pages;
309 /* free in reverse order so that the global
* order will be lifo */
310 while ((entry = local_pages->prev) != local_pages) {
311 list_del(entry);
312 tmp = list_entry(entry, struct page, list);
313 __free_pages_ok(tmp, tmp->index);
314 if (!nr_pages--)
315 BUG();
316 }
317 current->nr_local_pages = 0;
318 }
319 out:
320 *freed = __freed;
321 return page;
322 }
This block frees the remaining pages in the list.
- 308 Get the number of page blocks that are to be freed
- 310 Loop until the local_pages list is empty
- 311 Remove this page block from the list
- 312 Get the struct page for the entry
- 313 Free the page block with __free_pages_ok() (See Section F.3.2)
- 314-315 If the count of page blocks reaches zero while there are still pages in the list, it means that the accounting is seriously broken somewhere or that someone added pages to the local_pages list manually, so call BUG()
- 317 Set the number of page blocks to 0 as they have all been freed
- 320 Update the freed parameter to tell the caller how many pages were freed in total
- 321 Return the page block of the requested order and zone. If the freeing failed, this will be returning NULL
F.2 Allocation Helper Functions
This section will cover miscellaneous helper functions and macros
the Buddy Allocator uses to allocate pages. Very few of them do “real”
work and are available just for the convenience of the programmer.
F.2.1 Function: alloc_page
Source: include/linux/mm.h
This trivial macro just calls alloc_pages() with an order of 0 to
return 1 page. It is declared as follows
449 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
F.2.2 Function: __get_free_page
Source: include/linux/mm.h
This trivial macro calls __get_free_pages() with an order
of 0 to return 1 page. It is declared as follows
454 #define __get_free_page(gfp_mask) \
455 __get_free_pages((gfp_mask),0)
F.2.3 Function: __get_free_pages
Source: mm/page_alloc.c
This function is for callers who do not want to worry about struct pages and only
want an address they can use. It is declared as follows
428 unsigned long __get_free_pages(unsigned int gfp_mask,
unsigned int order)
429 {
430 struct page * page;
431
432 page = alloc_pages(gfp_mask, order);
433 if (!page)
434 return 0;
435 return (unsigned long) page_address(page);
436 }
- 432 alloc_pages() does the work of allocating the page block. See Section F.1.1
- 433-434 Make sure the page is valid
- 435 page_address() returns the virtual address of the page
F.2.4 Function: __get_dma_pages
Source: include/linux/mm.h
This is of principal interest to device drivers. It will return memory from
ZONE_DMA suitable for use with DMA devices. It is declared as follows
457 #define __get_dma_pages(gfp_mask, order) \
458 __get_free_pages((gfp_mask) | GFP_DMA,(order))
- 458 The gfp_mask is or-ed with GFP_DMA
to tell the allocator to allocate from ZONE_DMA
F.2.5 Function: get_zeroed_page
Source: mm/page_alloc.c
This function will allocate one page and then zero out the contents of it. It is
declared as follows
438 unsigned long get_zeroed_page(unsigned int gfp_mask)
439 {
440 struct page * page;
441
442 page = alloc_pages(gfp_mask, 0);
443 if (page) {
444 void *address = page_address(page);
445 clear_page(address);
446 return (unsigned long) address;
447 }
448 return 0;
449 }
- 438 gfp_mask contains the flags which affect allocator behaviour
- 442 alloc_pages() does the work of allocating the page block. See Section F.1.1
- 444 page_address() returns the virtual address of the page
- 445 clear_page() will fill the contents of the page with zeros
- 446 Return the address of the zeroed page
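The following hypothetical driver-style fragment (not taken from the kernel source; example_helpers() is an invented name) shows how these allocation helpers are typically paired with the free helpers of Section F.4:

#include <linux/errno.h>
#include <linux/mm.h>

/* Hypothetical fragment: the caller deals only in addresses, never in
 * struct pages. */
static int example_helpers(void)
{
        unsigned long buf, zeroed;

        buf = __get_free_pages(GFP_KERNEL, 1);      /* 2^1 = 2 pages */
        if (!buf)
                return -ENOMEM;

        zeroed = get_zeroed_page(GFP_KERNEL);       /* 1 cleared page */
        if (!zeroed) {
                free_pages(buf, 1);
                return -ENOMEM;
        }

        /* ... use the buffers ... */

        free_page(zeroed);
        free_pages(buf, 1);
        return 0;
}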
F.3 Free Pages
F.3.1 Function: __free_pages
Source: mm/page_alloc.c
The call graph for this function is shown in Figure
6.4. Just to be confusing, the opposite
to alloc_pages() is not free_pages(); it is
__free_pages(). free_pages() is a helper function
which takes an address as a parameter and will be discussed in a later section.
451 void __free_pages(struct page *page, unsigned int order)
452 {
453 if (!PageReserved(page) && put_page_testzero(page))
454 __free_pages_ok(page, order);
455 }
- 451 The parameters are the page we wish to free and what order block it is
- 453 Sanity check. PageReserved() indicates that the page is reserved by the boot memory allocator. put_page_testzero() is just a macro wrapper around atomic_dec_and_test(); it decrements the usage count and checks that it reaches zero
- 454 Call the function that does all the hard work
F.3.2 Function: __free_pages_ok
Source: mm/page_alloc.c
This function will do the actual freeing of the page and coalesce the buddies
if possible.
81 static void FASTCALL(__free_pages_ok (struct page *page,
unsigned int order));
82 static void __free_pages_ok (struct page *page, unsigned int order)
83 {
84 unsigned long index, page_idx, mask, flags;
85 free_area_t *area;
86 struct page *base;
87 zone_t *zone;
88
93 if (PageLRU(page)) {
94 if (unlikely(in_interrupt()))
95 BUG();
96 lru_cache_del(page);
97 }
98
99 if (page->buffers)
100 BUG();
101 if (page->mapping)
102 BUG();
103 if (!VALID_PAGE(page))
104 BUG();
105 if (PageLocked(page))
106 BUG();
107 if (PageActive(page))
108 BUG();
109 page->flags &= ~((1<<PG_referenced) | (1<<PG_dirty));
- 82 The parameters are the beginning of the page block to free and what order number of pages are to be freed
- 93-97 A dirty page on the LRU will still have the LRU bit set when pinned for IO. On IO completion, it is freed so it must now be removed from the LRU list
- 99-108 Sanity checks
- 109 The flags showing a page has been referenced and is dirty have to be cleared because the page is now free and not in use
110
111 if (current->flags & PF_FREE_PAGES)
112 goto local_freelist;
113 back_local_freelist:
114
115 zone = page_zone(page);
116
117 mask = (~0UL) << order;
118 base = zone->zone_mem_map;
119 page_idx = page - base;
120 if (page_idx & ~mask)
121 BUG();
122 index = page_idx >> (1 + order);
123
124 area = zone->free_area + order;
125
- 111-112 If this flag is set, the pages freed are to be kept for the process doing the freeing. This is set by balance_classzone() (See Section F.1.6) during page allocation if the caller is freeing the pages itself rather than waiting for kswapd to do the work
- 115 The zone the page belongs to is encoded within the page flags. The page_zone() macro returns the zone
- 117 The calculation of mask is discussed in the companion document. It is basically related to the address calculation of the buddy (a worked example of this setup arithmetic follows this list)
- 118 base is the beginning of this zone_mem_map. For the buddy calculation to work, it has to be relative to an address 0 so that the indices will be a power of two
- 119 page_idx treats the zone_mem_map as an array of pages. This is the index of the page within the map
- 120-121 If the index is not the proper power of two, things are severely broken and calculation of the buddy will not work
- 122 This index is the bit index within free_area→map
- 124 area is the area storing the free lists and map for the order block the pages are being freed from
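The following user-space sketch works through this setup arithmetic for an invented order-2 block at page index 40:

#include <stdio.h>

/* Sketch of the setup arithmetic in __free_pages_ok() for a block of
 * 2^order pages starting at page_idx (illustrative numbers only). */
int main(void)
{
        unsigned int order = 2;
        unsigned long page_idx = 40;       /* must be aligned to the order */
        unsigned long mask = (~0UL) << order;

        printf("-mask        = %lu pages being freed\n", -mask);
        printf("aligned?     = %s\n", (page_idx & ~mask) ? "no (BUG)" : "yes");
        printf("bitmap index = %lu\n", page_idx >> (1 + order));
        return 0;
}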
126 spin_lock_irqsave(&zone->lock, flags);
127
128 zone->free_pages -= mask;
129
130 while (mask + (1 << (MAX_ORDER-1))) {
131 struct page *buddy1, *buddy2;
132
133 if (area >= zone->free_area + MAX_ORDER)
134 BUG();
135 if (!__test_and_change_bit(index, area->map))
136 /*
137 * the buddy page is still allocated.
138 */
139 break;
140 /*
141 * Move the buddy up one level.
142 * This code is taking advantage of the identity:
143 * -mask = 1+~mask
144 */
145 buddy1 = base + (page_idx ^ -mask);
146 buddy2 = base + page_idx;
147 if (BAD_RANGE(zone,buddy1))
148 BUG();
149 if (BAD_RANGE(zone,buddy2))
150 BUG();
151
152 list_del(&buddy1->list);
153 mask <<= 1;
154 area++;
155 index >>= 1;
156 page_idx &= mask;
157 }
- 126 The zone is about to be altered so take out the lock. The lock is an interrupt-safe lock as it is possible for interrupt handlers to allocate a page in this path
- 128 Another side effect of the calculation of mask is that -mask is the number of pages that are to be freed
- 130-157 The allocator will keep trying to coalesce blocks together until it either cannot merge or reaches the highest order that can be merged. mask will be adjusted for each order block that is merged. When the highest order that can be merged is reached, this while loop will evaluate to 0 and exit (see the sketch after this list)
- 133-134 If by some miracle, mask is corrupt, this check will make sure the free_area array will not be read beyond the end
- 135 Toggle the bit representing this pair of buddies. If the bit was previously zero, both buddies were in use. As this buddy is being freed, one is still in use and cannot be merged
- 145-146 The calculation of the two addresses is discussed in Chapter 6
- 147-150 Sanity check to make sure the pages are within the correct zone_mem_map and actually belong to this zone
- 152 The buddy has been freed so remove it from any list it was part of
- 153-156 Prepare to examine the higher order buddy for merging
- 153 Move the mask one bit to the left for order 2^(k+1)
- 154 area is a pointer within an array so area++ moves to the next index
- 155 The index in the bitmap of the higher order
- 156 The page index within the zone_mem_map for the buddy to merge
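The coalescing arithmetic can be followed with the user-space sketch below. It frees an invented order-0 page at index 5 and assumes every buddy it meets is also free; MAX_ORDER is reduced to 4 for brevity (the 2.4 default is 10):

#include <stdio.h>

#define MAX_ORDER 4    /* reduced for brevity; 10 in 2.4 */

/* Sketch of the coalescing loop in __free_pages_ok(), assuming every
 * buddy encountered is free and can be merged. */
int main(void)
{
        unsigned long page_idx = 5;
        unsigned int order = 0;
        unsigned long mask = (~0UL) << order;

        while (mask + (1UL << (MAX_ORDER - 1))) {
                /* -mask == 1 << order, so the XOR flips the single bit
                 * that separates a block from its buddy. */
                unsigned long buddy_idx = page_idx ^ -mask;

                printf("order %u: block at %lu, buddy at %lu -> merge\n",
                       order, page_idx, buddy_idx);
                mask <<= 1;
                order++;
                page_idx &= mask;
        }
        printf("final block: page %lu on the order %u free list\n",
               page_idx, order);
        return 0;
}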
158 list_add(&(base + page_idx)->list, &area->free_list);
159
160 spin_unlock_irqrestore(&zone->lock, flags);
161 return;
162
163 local_freelist:
164 if (current->nr_local_pages)
165 goto back_local_freelist;
166 if (in_interrupt())
167 goto back_local_freelist;
168
169 list_add(&page->list, &current->local_pages);
170 page->index = order;
171 current->nr_local_pages++;
172 }
- 158 As much merging as possible has completed and a new page block is free so add it to the free_list for this order
- 160-161 Changes to the zone are complete so free the lock and return
- 163 This is the code path taken when the pages are not freed to the main pool but instead are reserved for the process doing the freeing
- 164-165 If the process already has reserved pages, it is not allowed to reserve any more so go back and free in the normal fashion. This is unusual as balance_classzone() assumes that more than one page block may be returned on this list. It is likely to be an oversight but may still work if the first page block freed is of the same order and zone as required by balance_classzone()
- 166-167 An interrupt does not have process context so it has to free in the normal fashion. It is unclear how an interrupt could end up here at all. This check is likely to be bogus and impossible to be true
- 169 Add the page block to the list of the process's local_pages
- 170 Record what order allocation it was for freeing later
- 171 Increase the use count for nr_local_pages
F.4 Free Helper Functions
These functions are very similar to the page allocation helper
functions in that they do no “real” work themselves and depend on the
__free_pages() function to perform the actual free.
F.4.1 Function: free_pages
Source: mm/page_alloc.c
This function takes an address instead of a page as a parameter to free. It
is declared as follows
457 void free_pages(unsigned long addr, unsigned int order)
458 {
459 if (addr != 0)
460 __free_pages(virt_to_page(addr), order);
461 }
- 460 __free_pages() is discussed in Section F.3.1. The macro virt_to_page() returns the struct page for the addr
F.4.2 Function: __free_page
Source: include/linux/mm.h
This trivial macro just calls the function __free_pages() (See
Section F.3.1) with an order 0 for 1 page. It is declared
as follows
472 #define __free_page(page) __free_pages((page), 0)
F.4.3 Function: free_page
Source: include/linux/mm.h
This trivial macro just calls the function free_pages(). The
essential difference between this macro and __free_page() is that
this macro takes a virtual address as a parameter while
__free_page() takes a struct page.
472 #define free_page(addr) free_pages((addr),0)