Appendix F Physical Page Allocation
F.1 Allocating Pages
F.1.1 Function: alloc_pages
Source: include/linux/mm.h
The call graph for this function is shown in Figure 6.3. It is
declared as follows:
439 static inline struct page * alloc_pages(unsigned int gfp_mask,
unsigned int order)
440 {
444 if (order >= MAX_ORDER)
445 return NULL;
446 return _alloc_pages(gfp_mask, order);
447 }
- 439 The gfp_mask (Get Free Pages) flags tell the allocator how it may behave. For example, if __GFP_WAIT is not set, the allocator will not block and instead returns NULL if memory is tight. The order is the power-of-two number of pages to allocate
- 444-445 A simple debugging check optimized away at compile time
- 446 This function is described next
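To illustrate how a caller uses this interface, the following fragment (a hypothetical example, not taken from the kernel source; example_alloc() is an invented name) allocates and then frees a block of 2^2 pages:

#include <linux/errno.h>
#include <linux/mm.h>

/* Hypothetical example: allocate and free a block of 2^2 = 4 pages.
 * GFP_KERNEL allows the allocator to sleep if memory is tight. */
static int example_alloc(void)
{
        struct page *block;

        block = alloc_pages(GFP_KERNEL, 2);
        if (!block)
                return -ENOMEM;    /* order too large or memory exhausted */

        /* ... use the pages ... */

        __free_pages(block, 2);    /* free with the same order */
        return 0;
}

Note that __free_pages() (See Section F.3.1) must be passed the same order that was used for the allocation.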
F.1.2 Function: _alloc_pages
Source: mm/page_alloc.c
The function _alloc_pages() comes in two varieties. The
first is designed to only work with UMA architectures such as the x86
and is in mm/page_alloc.c. It only refers to the static node
contig_page_data. The second is in mm/numa.c and
is a simple extension. It uses a node-local allocation policy which means
that memory will be allocated from the bank closest to the processor.
For the purposes of this book, only the mm/page_alloc.c
version will be examined but developers on NUMA architectures should read
_alloc_pages() and _alloc_pages_pgdat() as well in
mm/numa.c
244 #ifndef CONFIG_DISCONTIGMEM
245 struct page *_alloc_pages(unsigned int gfp_mask,
unsigned int order)
246 {
247 return __alloc_pages(gfp_mask, order,
248 contig_page_data.node_zonelists+(gfp_mask & GFP_ZONEMASK));
249 }
250 #endif
- 244 The ifndef is for UMA architectures like the x86. NUMA architectures use the _alloc_pages() function in mm/numa.c which employs a node-local policy for allocations
- 245 The gfp_mask flags tell the allocator how it may behave. The order is the power-of-two number of pages to allocate
- 247 node_zonelists is an array of preferred fallback zones to allocate from. It is initialised in build_zonelists() (See Section B.1.6). The low bits of gfp_mask, selected by GFP_ZONEMASK, indicate which zone is preferable to allocate from. Applying the bitmask gfp_mask & GFP_ZONEMASK will give the index in node_zonelists we prefer to allocate from.
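As a rough illustration of this index arithmetic, the following user-space sketch applies the mask to a few gfp_mask values. The flag values are those found in the 2.4 headers and are shown purely for illustration:

#include <stdio.h>

/* Zone modifier flags as found in 2.4 include/linux/mm.h (illustrative). */
#define __GFP_DMA      0x01
#define __GFP_HIGHMEM  0x02
#define GFP_ZONEMASK   0x0f

int main(void)
{
        unsigned int masks[] = { 0, __GFP_DMA, __GFP_HIGHMEM };
        int i;

        /* The low bits of gfp_mask select which fallback list in
         * node_zonelists[] the allocation starts from. */
        for (i = 0; i < 3; i++)
                printf("gfp_mask low bits 0x%02x -> node_zonelists[%u]\n",
                       masks[i], masks[i] & GFP_ZONEMASK);
        return 0;
}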
F.1.3 Function: __alloc_pages
Source: mm/page_alloc.c
At this stage, we've reached what is described as the “heart of the
zoned buddy allocator”, the __alloc_pages() function. It is
responsible for cycling through the fallback zones and selecting one suitable
for the allocation. If memory is tight, it will take some steps to address
the problem. It will wake kswapd and if necessary it will do the
work of kswapd manually.
327 struct page * __alloc_pages(unsigned int gfp_mask,
unsigned int order,
zonelist_t *zonelist)
328 {
329 unsigned long min;
330 zone_t **zone, * classzone;
331 struct page * page;
332 int freed;
333
334 zone = zonelist->zones;
335 classzone = *zone;
336 if (classzone == NULL)
337 return NULL;
338 min = 1UL << order;
339 for (;;) {
340 zone_t *z = *(zone++);
341 if (!z)
342 break;
343
344 min += z->pages_low;
345 if (z->free_pages > min) {
346 page = rmqueue(z, order);
347 if (page)
348 return page;
349 }
350 }
- 334 Set zone to be the preferred zone to allocate from
- 335 The preferred zone is recorded as the classzone. If one of the pages_low watermarks is reached later, the classzone is marked as needing balance
- 336-337 An unnecessary sanity check. build_zonelists() would need to be seriously broken for this to happen
- 338-350 This style of block appears a number of times in this function. It reads as “cycle through all zones in this fallback list and see if the allocation can be satisfied without violating watermarks”. Note that the pages_low for each fallback zone is added together. This is deliberate to reduce the probability a fallback zone will be used.
- 340 z is the zone currently being examined. The zone variable is moved to the next fallback zone
- 341-342 If this is the last zone in the fallback list, break
- 344 Increment the number of pages to be allocated by the watermark for easy comparisons. This happens for each zone in the fallback zones. While this at first appears to be a bug, the behaviour is intended to reduce the probability a fallback zone is used (see the sketch after this list)
- 345-349 Allocate the page block if it can be assigned without reaching the pages_low watermark. rmqueue() (See Section F.1.4) is responsible for removing the block of pages from the zone
- 347-348 If the pages could be allocated, return a pointer to them
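The effect of the cumulative watermark check can be seen with the following user-space sketch. The zones, their pages_low watermarks and free page counts are invented purely for illustration:

#include <stdio.h>

/* Hypothetical zones with invented pages_low and free page counts. */
struct fake_zone {
        const char *name;
        unsigned long pages_low;
        unsigned long free_pages;
};

int main(void)
{
        struct fake_zone zones[] = {
                { "Normal", 255, 200 },
                { "DMA",    128, 500 },
        };
        unsigned int order = 2;            /* requesting 4 pages */
        unsigned long min = 1UL << order;
        int i;

        for (i = 0; i < 2; i++) {
                /* Each fallback zone must clear the sum of all pages_low
                 * watermarks seen so far, making the allocator increasingly
                 * reluctant to fall back. */
                min += zones[i].pages_low;
                if (zones[i].free_pages > min) {
                        printf("allocate from %s (free %lu > min %lu)\n",
                               zones[i].name, zones[i].free_pages, min);
                        return 0;
                }
                printf("skip %s (free %lu <= min %lu)\n",
                       zones[i].name, zones[i].free_pages, min);
        }
        printf("no zone can satisfy the request at pages_low\n");
        return 0;
}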
352 classzone->need_balance = 1;
353 mb();
354 if (waitqueue_active(&kswapd_wait))
355 wake_up_interruptible(&kswapd_wait);
356
357 zone = zonelist->zones;
358 min = 1UL << order;
359 for (;;) {
360 unsigned long local_min;
361 zone_t *z = *(zone++);
362 if (!z)
363 break;
364
365 local_min = z->pages_min;
366 if (!(gfp_mask & __GFP_WAIT))
367 local_min >>= 2;
368 min += local_min;
369 if (z->free_pages > min) {
370 page = rmqueue(z, order);
371 if (page)
372 return page;
373 }
374 }
375
- 352 Mark the preferred zone as needing balance. This flag will be read later by kswapd
- 353 This is a memory barrier. It ensures that all CPUs will see any changes made to variables before this line of code. This is important because kswapd could be running on a different processor to the memory allocator
- 354-355 Wake up kswapd if it is asleep
- 357-358 Begin again with the first preferred zone and min value
- 360-374 Cycle through all the zones. This time, allocate the pages if they can be allocated without hitting the pages_min watermark
- 365 local_min is how low the number of free pages in this zone may go
- 366-367 If the process cannot wait or reschedule (__GFP_WAIT is clear), then allow the zone to be put under more memory pressure than the watermark normally allows
376 /* here we're in the low on memory slow path */
377
378 rebalance:
379 if (current->flags & (PF_MEMALLOC | PF_MEMDIE)) {
380 zone = zonelist->zones;
381 for (;;) {
382 zone_t *z = *(zone++);
383 if (!z)
384 break;
385
386 page = rmqueue(z, order);
387 if (page)
388 return page;
389 }
390 return NULL;
391 }
- 378 This label is returned to after an attempt is made to synchronously free pages. From this line on, the low-on-memory path has been reached. It is likely the process will sleep
- 379-391 These two flags are only set by the OOM killer. As the process is trying to kill itself cleanly, allocate the pages if at all possible as it is known they will be freed very soon
393 /* Atomic allocations - we can't balance anything */
394 if (!(gfp_mask & __GFP_WAIT))
395 return NULL;
396
397 page = balance_classzone(classzone, gfp_mask, order, &freed);
398 if (page)
399 return page;
400
401 zone = zonelist->zones;
402 min = 1UL << order;
403 for (;;) {
404 zone_t *z = *(zone++);
405 if (!z)
406 break;
407
408 min += z->pages_min;
409 if (z->free_pages > min) {
410 page = rmqueue(z, order);
411 if (page)
412 return page;
413 }
414 }
415
416 /* Don't let big-order allocations loop */
417 if (order > 3)
418 return NULL;
419
420 /* Yield for kswapd, and try again */
421 yield();
422 goto rebalance;
423 }
- 394-395 If the calling process cannot sleep, return NULL as the only way to allocate the pages from here involves sleeping
- 397 balance_classzone() (See Section F.1.6) performs the work of kswapd in a synchronous fashion. The principal difference is that instead of freeing the memory into a global pool, it is kept for the process on the current→local_pages linked list
- 398-399 If a page block of the right order has been freed, return it. Just because this is NULL does not mean an allocation will fail as it could be a higher order of pages that was released
- 403-414 This is identical to the block above. Allocate the page blocks if it can be done without hitting the pages_min watermark
- 417-418 Satisfying a large allocation such as 2^4 pages is difficult. If it has not been satisfied by now, it is better to simply return NULL
- 421 Yield the processor to give kswapd a chance to work
- 422 Attempt to balance the zones again and allocate
F.1.4 Function: rmqueue
Source: mm/page_alloc.c
This function is called from __alloc_pages(). It is responsible
for finding a block of memory large enough to be used for the allocation. If
a block of memory of the requested size is not available, it will look for
a larger order that may be split into two buddies. The actual splitting is
performed by the expand() (See Section F.1.5) function.
198 static FASTCALL(struct page *rmqueue(zone_t *zone,
unsigned int order));
199 static struct page * rmqueue(zone_t *zone, unsigned int order)
200 {
201 free_area_t * area = zone->free_area + order;
202 unsigned int curr_order = order;
203 struct list_head *head, *curr;
204 unsigned long flags;
205 struct page *page;
206
207 spin_lock_irqsave(&zone->lock, flags);
208 do {
209 head = &area->free_list;
210 curr = head->next;
211
212 if (curr != head) {
213 unsigned int index;
214
215 page = list_entry(curr, struct page, list);
216 if (BAD_RANGE(zone,page))
217 BUG();
218 list_del(curr);
219 index = page - zone->zone_mem_map;
220 if (curr_order != MAX_ORDER-1)
221 MARK_USED(index, curr_order, area);
222 zone->free_pages -= 1UL << order;
223
224 page = expand(zone, page, index, order,
curr_order, area);
225 spin_unlock_irqrestore(&zone->lock, flags);
226
227 set_page_count(page, 1);
228 if (BAD_RANGE(zone,page))
229 BUG();
230 if (PageLRU(page))
231 BUG();
232 if (PageActive(page))
233 BUG();
234 return page;
235 }
236 curr_order++;
237 area++;
238 } while (curr_order < MAX_ORDER);
239 spin_unlock_irqrestore(&zone->lock, flags);
240
241 return NULL;
242 }
- 199 The parameters are the zone to allocate from and what order of pages are required
- 201 Because the free_area is an array of linked lists, the order may be used as an index within the array
- 207 Acquire the zone lock
- 208-238 This while block is responsible for finding what order of pages we will need to allocate from. If there is not a free block at the order we are interested in, check the higher orders until a suitable one is found
- 209 head is the list of free page blocks for this order
- 210 curr is the first block of pages
- 212-235 If there is a free page block at this order, then allocate it
- 215 page is set to be a pointer to the first page in the free block
- 216-217 Sanity check to make sure this page belongs to this zone and is within the zone_mem_map. It is unclear how this could possibly happen without severe bugs in the allocator itself that would place blocks in the wrong zones
- 218 As the block is going to be allocated, remove it from the free list
- 219 index treats the zone_mem_map as an array of pages so that index will be the offset within the array
- 220-221 Toggle the bit that represents this pair of buddies. MARK_USED() is a macro which calculates which bit to toggle (see the sketch after this list)
- 222 Update the statistics for this zone. 1UL<<order is the number of pages being allocated
- 224 expand() (See Section F.1.5) is the function responsible for splitting page blocks of higher orders
- 225 No other updates to the zone need to take place so release the lock
- 227 Show that the page is in use
- 228-233 Sanity checks
- 234 The page block has been successfully allocated so return it
- 236-237 If a page block was not free of the correct order, move to a higher order of page blocks and see what can be found there
- 239 No other updates to the zone need to take place so release the lock
- 241 No page blocks of the requested or higher order are available so return failure
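The bit-index arithmetic behind MARK_USED() can be illustrated with a small user-space sketch. It assumes, as in 2.4, that each bit in free_area→map covers a pair of buddies so the page index within zone_mem_map is scaled down by 2^(order+1); the page indices are invented:

#include <stdio.h>

/* Sketch of the bit-index calculation used by MARK_USED(): one bit per
 * pair of buddies, so the page index is divided by 2^(order+1). */
int main(void)
{
        unsigned long page_idx[] = { 0, 8, 12, 40 };
        unsigned int order = 2;            /* 4-page blocks */
        int i;

        for (i = 0; i < 4; i++)
                printf("page index %lu, order %u -> buddy bit %lu\n",
                       page_idx[i], order, page_idx[i] >> (1 + order));
        return 0;
}

Note that page indices 8 and 12 map to the same bit because, at order 2, they are a buddy pair.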
F.1.5 Function: expand
Source: mm/page_alloc.c
This function splits page blocks of higher orders until a page block of the
needed order is available.
177 static inline struct page * expand (zone_t *zone,
struct page *page,
unsigned long index,
int low,
int high,
free_area_t * area)
179 {
180 unsigned long size = 1 << high;
181
182 while (high > low) {
183 if (BAD_RANGE(zone,page))
184 BUG();
185 area--;
186 high--;
187 size >>= 1;
188 list_add(&(page)->list, &(area)->free_list);
189 MARK_USED(index, high, area);
190 index += size;
191 page += size;
192 }
193 if (BAD_RANGE(zone,page))
194 BUG();
195 return page;
196 }
- 177 The parameters are
  - zone is where the allocation is coming from
  - page is the first page of the block being split
  - index is the index of page within mem_map
  - low is the order of pages needed for the allocation
  - high is the order of pages that is being split for the allocation
  - area is the free_area_t representing the high order block of pages
- 180 size is the number of pages in the block that is to be split
- 182-192 Keep splitting until a block of the needed page order is found (a worked example follows this list)
- 183-184 Sanity check to make sure this page belongs to this zone and is within the zone_mem_map
- 185 area is now the next free_area_t representing the lower order of page blocks
- 186 high is the next order of page blocks to be split
- 187 The size of the block being split is now half as big
- 188 Of the pair of buddies, the one lower in the mem_map is added to the free list for the lower order
- 189 Toggle the bit representing the pair of buddies
- 190 index is now the index of the second buddy of the newly created pair
- 191 page now points to the second buddy of the newly created pair
- 193-194 Sanity check
- 195 The blocks have been successfully split so return the page
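The following user-space walk-through shows the arithmetic of the split when an order-3 block starting at page index 32 is broken down for an order-0 allocation. The indices are invented for illustration:

#include <stdio.h>

/* Walk through the splitting performed by expand(): at each step the lower
 * buddy of the remaining block is placed on a free list and the upper buddy
 * is split further. */
int main(void)
{
        unsigned long index = 32;          /* first page of the order-3 block */
        int low = 0, high = 3;
        unsigned long size = 1UL << high;

        while (high > low) {
                high--;
                size >>= 1;
                printf("free lower buddy: pages %lu-%lu (order %d)\n",
                       index, index + size - 1, high);
                index += size;             /* continue with the upper half */
        }
        printf("return page %lu to the caller (order %d)\n", index, low);
        return 0;
}

Because the lower buddy is freed at each step, the page finally returned to the caller is the highest sub-block of the original high-order block.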
F.1.6 Function: balance_classzone
Source: mm/page_alloc.c
This function is part of the direct-reclaim path. Allocators which can sleep
will call this function to start performing the work of kswapd
in a synchronous fashion. As the process is performing the work itself,
the pages it frees of the desired order are reserved in a linked list in
current→local_pages and the number of page blocks in
the list is stored in current→nr_local_pages. Note that the number of
page blocks is not the same as the number of pages; a page block could
be of any order.
253 static struct page * balance_classzone(zone_t * classzone,
unsigned int gfp_mask,
unsigned int order,
int * freed)
254 {
255 struct page * page = NULL;
256 int __freed = 0;
257
258 if (!(gfp_mask & __GFP_WAIT))
259 goto out;
260 if (in_interrupt())
261 BUG();
262
263 current->allocation_order = order;
264 current->flags |= PF_MEMALLOC | PF_FREE_PAGES;
265
266 __freed = try_to_free_pages_zone(classzone, gfp_mask);
267
268 current->flags &= ~(PF_MEMALLOC | PF_FREE_PAGES);
269
- 258-259 If the caller is not allowed to sleep, then goto out to exit the function. For this to occur, the function would have to be called directly or __alloc_pages() would need to be deliberately broken
- 260-261 This function may not be used by interrupts. Again, deliberate damage would have to be introduced for this condition to occur
- 263 Record the desired size of the allocation in current→allocation_order. This is actually unused although it could have been used to only add pages of the desired order to the local_pages list. As it is, the order of pages in the list is stored in page→index
- 264 Set the flags which will tell the free functions to add the pages to the local_pages list
- 266 Free pages directly from the desired zone with try_to_free_pages_zone() (See Section J.5.3). This is where the direct-reclaim path intersects with kswapd
- 268 Clear the flags again so that the free functions do not continue to add pages to the local_pages list
270 if (current->nr_local_pages) {
271 struct list_head * entry, * local_pages;
272 struct page * tmp;
273 int nr_pages;
274
275 local_pages = &current->local_pages;
276
277 if (likely(__freed)) {
278 /* pick from the last inserted so we're lifo */
279 entry = local_pages->next;
280 do {
281 tmp = list_entry(entry, struct page, list);
282 if (tmp->index == order &&
memclass(page_zone(tmp), classzone)) {
283 list_del(entry);
284 current->nr_local_pages--;
285 set_page_count(tmp, 1);
286 page = tmp;
287
288 if (page->buffers)
289 BUG();
290 if (page->mapping)
291 BUG();
292 if (!VALID_PAGE(page))
293 BUG();
294 if (PageLocked(page))
295 BUG();
296 if (PageLRU(page))
297 BUG();
298 if (PageActive(page))
299 BUG();
300 if (PageDirty(page))
301 BUG();
302
303 break;
304 }
305 } while ((entry = entry->next) != local_pages);
306 }
Presuming that pages exist in the local_pages list, this function
will cycle through the list looking for a page block belonging to the desired
zone and order.
- 270 Only enter this block if pages are stored in the local list
- 275 Start at the beginning of the list
- 277 If pages were freed with try_to_free_pages_zone() then...
- 279 The last one inserted is chosen first as it is likely to be cache hot and it is desirable to use pages that have been recently referenced
- 280-305 Cycle through the pages in the list until we find one of the desired order and zone
- 281 Get the page from this list entry
- 282 The order of the page block is stored in page→index so check if the order matches the desired order and that it belongs to the right zone. It is unlikely that pages from another zone are on this list but it could occur if swap_out() is called to free pages directly from process page tables
- 283 This is a page of the right order and zone so remove it from the list
- 284 Decrement the number of page blocks in the list
- 285 Set the usage count to 1 as the page block is about to be handed back to the caller
- 286 Set page as it will be returned. tmp is needed for the next block which frees the remaining pages in the local list
- 288-301 Perform the same checks that are performed in __free_pages_ok() to ensure the page is in a safe state to be reused
- 305 Move to the next page in the list if the current one was not of the desired order and zone
308 nr_pages = current->nr_local_pages;
309 /* free in reverse order so that the global
* order will be lifo */
310 while ((entry = local_pages->prev) != local_pages) {
311 list_del(entry);
312 tmp = list_entry(entry, struct page, list);
313 __free_pages_ok(tmp, tmp->index);
314 if (!nr_pages--)
315 BUG();
316 }
317 current->nr_local_pages = 0;
318 }
319 out:
320 *freed = __freed;
321 return page;
322 }
This block frees the remaining pages in the list.
- 308 Get the number of page blocks that are to be freed
- 310 Loop until the local_pages list is empty
- 311 Remove this page block from the list
- 312 Get the struct page for the entry
- 313 Free the page block with __free_pages_ok() (See Section F.3.2)
- 314-315 If the count of page blocks reaches zero while there are still pages in the list, it means that the accounting is seriously broken somewhere or that someone added pages to the local_pages list manually, so call BUG()
- 317 Set the number of page blocks to 0 as they have all been freed
- 320 Update the freed parameter to tell the caller how many pages were freed in total
- 321 Return the page block of the requested order and zone. If the freeing failed, this will be returning NULL
F.2 Allocation Helper Functions
This section will cover miscellaneous helper functions and macros
the Buddy Allocator uses to allocate pages. Very few of them do “real”
work and are available just for the convenience of the programmer.
F.2.1 Function: alloc_page
Source: include/linux/mm.h
This trivial macro just calls alloc_pages() with an order of 0 to
return 1 page. It is declared as follows
449 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
F.2.2 Function: __get_free_page
Source: include/linux/mm.h
This trivial macro calls __get_free_pages() with an order
of 0 to return 1 page. It is declared as follows
454 #define __get_free_page(gfp_mask) \
455 __get_free_pages((gfp_mask),0)
F.2.3 Function: __get_free_pages
Source: mm/page_alloc.c
This function is for callers who do not want to worry about struct pages and only
want an address they can use. It is declared as follows
428 unsigned long __get_free_pages(unsigned int gfp_mask,
unsigned int order)
429 {
430 struct page * page;
431
432 page = alloc_pages(gfp_mask, order);
433 if (!page)
434 return 0;
435 return (unsigned long) page_address(page);
436 }
- 432 alloc_pages() does the work of allocating the page block. See Section F.1.1
- 433-434 Make sure the page is valid
- 435 page_address() returns the virtual address of the page
F.2.4 Function: __get_dma_pages
Source: include/linux/mm.h
This is of principal interest to device drivers. It will return memory from
ZONE_DMA suitable for use with DMA devices. It is declared as follows
457 #define __get_dma_pages(gfp_mask, order) \
458 __get_free_pages((gfp_mask) | GFP_DMA,(order))
- 458 The gfp_mask is or-ed with GFP_DMA
to tell the allocator to allocate from ZONE_DMA
F.2.5 Function: get_zeroed_page
Source: mm/page_alloc.c
This function will allocate one page and then zero out the contents of it. It is
declared as follows
438 unsigned long get_zeroed_page(unsigned int gfp_mask)
439 {
440 struct page * page;
441
442 page = alloc_pages(gfp_mask, 0);
443 if (page) {
444 void *address = page_address(page);
445 clear_page(address);
446 return (unsigned long) address;
447 }
448 return 0;
449 }
- 438 gfp_mask contains the flags which affect allocator behaviour
- 442 alloc_pages() does the work of allocating the page block. See Section F.1.1
- 444 page_address() returns the virtual address of the page
- 445 clear_page() will fill the contents of the page with zeros
- 446 Return the address of the zeroed page
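The following hypothetical driver-style fragment (not taken from the kernel source; example_helpers() is an invented name) shows how these allocation helpers are typically paired with the free helpers of Section F.4:

#include <linux/errno.h>
#include <linux/mm.h>

/* Hypothetical fragment: the caller deals only in addresses, never in
 * struct pages. */
static int example_helpers(void)
{
        unsigned long buf, zeroed;

        buf = __get_free_pages(GFP_KERNEL, 1);      /* 2^1 = 2 pages */
        if (!buf)
                return -ENOMEM;

        zeroed = get_zeroed_page(GFP_KERNEL);       /* 1 cleared page */
        if (!zeroed) {
                free_pages(buf, 1);
                return -ENOMEM;
        }

        /* ... use the buffers ... */

        free_page(zeroed);
        free_pages(buf, 1);
        return 0;
}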
F.3 Free Pages
F.3.1 Function: __free_pages
Source: mm/page_alloc.c
The call graph for this function is shown in Figure
6.4. Just to be confusing, the opposite
to alloc_pages() is not free_pages(); it is
__free_pages(). free_pages() is a helper function
which takes an address as a parameter and will be discussed in a later section.
451 void __free_pages(struct page *page, unsigned int order)
452 {
453 if (!PageReserved(page) && put_page_testzero(page))
454 __free_pages_ok(page, order);
455 }
- 451 The parameters are the page we wish to free and what order block it is
- 453 Sanity check. PageReserved() indicates that the page is reserved by the boot memory allocator. put_page_testzero() is just a macro wrapper around atomic_dec_and_test(); it decrements the usage count and checks that it reaches zero
- 454 Call the function that does all the hard work
F.3.2 Function: __free_pages_ok
Source: mm/page_alloc.c
This function will do the actual freeing of the page and coalesce the buddies
if possible.
81 static void FASTCALL(__free_pages_ok (struct page *page,
unsigned int order));
82 static void __free_pages_ok (struct page *page, unsigned int order)
83 {
84 unsigned long index, page_idx, mask, flags;
85 free_area_t *area;
86 struct page *base;
87 zone_t *zone;
88
93 if (PageLRU(page)) {
94 if (unlikely(in_interrupt()))
95 BUG();
96 lru_cache_del(page);
97 }
98
99 if (page->buffers)
100 BUG();
101 if (page->mapping)
102 BUG();
103 if (!VALID_PAGE(page))
104 BUG();
105 if (PageLocked(page))
106 BUG();
107 if (PageActive(page))
108 BUG();
109 page->flags &= ~((1<<PG_referenced) | (1<<PG_dirty));
- 82 The parameters are the beginning of the page block to free and what order number of pages are to be freed
- 93-97 A dirty page on the LRU will still have the LRU bit set when pinned for IO. On IO completion, it is freed so it must now be removed from the LRU list
- 99-108 Sanity checks
- 109 The flags showing a page has been referenced and is dirty have to be cleared because the page is now free and not in use
110
111 if (current->flags & PF_FREE_PAGES)
112 goto local_freelist;
113 back_local_freelist:
114
115 zone = page_zone(page);
116
117 mask = (~0UL) << order;
118 base = zone->zone_mem_map;
119 page_idx = page - base;
120 if (page_idx & ~mask)
121 BUG();
122 index = page_idx >> (1 + order);
123
124 area = zone->free_area + order;
125
- 111-112 If this flag is set, the pages freed are to be kept for the process doing the freeing. This is set by balance_classzone() (See Section F.1.6) during page allocation if the caller is freeing the pages itself rather than waiting for kswapd to do the work
- 115 The zone the page belongs to is encoded within the page flags. The page_zone() macro returns the zone
- 117 The calculation of mask is discussed in the companion document. It is basically related to the address calculation of the buddy (a worked example of this setup arithmetic follows this list)
- 118 base is the beginning of this zone_mem_map. For the buddy calculation to work, it has to be relative to an address 0 so that the indices will be a power of two
- 119 page_idx treats the zone_mem_map as an array of pages. This is the index of the page within the map
- 120-121 If the index is not the proper power of two, things are severely broken and calculation of the buddy will not work
- 122 This index is the bit index within free_area→map
- 124 area is the area storing the free lists and map for the order block the pages are being freed from
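The following user-space sketch works through this setup arithmetic for an invented order-2 block at page index 40:

#include <stdio.h>

/* Sketch of the setup arithmetic in __free_pages_ok() for a block of
 * 2^order pages starting at page_idx (illustrative numbers only). */
int main(void)
{
        unsigned int order = 2;
        unsigned long page_idx = 40;       /* must be aligned to the order */
        unsigned long mask = (~0UL) << order;

        printf("-mask        = %lu pages being freed\n", -mask);
        printf("aligned?     = %s\n", (page_idx & ~mask) ? "no (BUG)" : "yes");
        printf("bitmap index = %lu\n", page_idx >> (1 + order));
        return 0;
}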
126 spin_lock_irqsave(&zone->lock, flags);
127
128 zone->free_pages -= mask;
129
130 while (mask + (1 << (MAX_ORDER-1))) {
131 struct page *buddy1, *buddy2;
132
133 if (area >= zone->free_area + MAX_ORDER)
134 BUG();
135 if (!__test_and_change_bit(index, area->map))
136 /*
137 * the buddy page is still allocated.
138 */
139 break;
140 /*
141 * Move the buddy up one level.
142 * This code is taking advantage of the identity:
143 * -mask = 1+~mask
144 */
145 buddy1 = base + (page_idx ^ -mask);
146 buddy2 = base + page_idx;
147 if (BAD_RANGE(zone,buddy1))
148 BUG();
149 if (BAD_RANGE(zone,buddy2))
150 BUG();
151
152 list_del(&buddy1->list);
153 mask <<= 1;
154 area++;
155 index >>= 1;
156 page_idx &= mask;
157 }
- 126 The zone is about to be altered so take out the lock. The lock is an interrupt-safe lock as it is possible for interrupt handlers to allocate a page in this path
- 128 Another side effect of the calculation of mask is that -mask is the number of pages that are to be freed
- 130-157 The allocator will keep trying to coalesce blocks together until it either cannot merge or reaches the highest order that can be merged. mask will be adjusted for each order block that is merged. When the highest order that can be merged is reached, this while loop will evaluate to 0 and exit (see the sketch after this list)
- 133-134 If by some miracle, mask is corrupt, this check will make sure the free_area array will not be read beyond the end
- 135 Toggle the bit representing this pair of buddies. If the bit was previously zero, both buddies were in use. As this buddy is being freed, one is still in use and cannot be merged
- 145-146 The calculation of the two addresses is discussed in Chapter 6
- 147-150 Sanity check to make sure the pages are within the correct zone_mem_map and actually belong to this zone
- 152 The buddy has been freed so remove it from any list it was part of
- 153-156 Prepare to examine the higher order buddy for merging
- 153 Move the mask one bit to the left for order 2^(k+1)
- 154 area is a pointer within an array so area++ moves to the next index
- 155 The index in the bitmap of the higher order
- 156 The page index within the zone_mem_map for the buddy to merge
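The coalescing arithmetic can be followed with the user-space sketch below. It frees an invented order-0 page at index 5 and assumes every buddy it meets is also free; MAX_ORDER is reduced to 4 for brevity (the 2.4 default is 10):

#include <stdio.h>

#define MAX_ORDER 4    /* reduced for brevity; 10 in 2.4 */

/* Sketch of the coalescing loop in __free_pages_ok(), assuming every
 * buddy encountered is free and can be merged. */
int main(void)
{
        unsigned long page_idx = 5;
        unsigned int order = 0;
        unsigned long mask = (~0UL) << order;

        while (mask + (1UL << (MAX_ORDER - 1))) {
                /* -mask == 1 << order, so the XOR flips the single bit
                 * that separates a block from its buddy. */
                unsigned long buddy_idx = page_idx ^ -mask;

                printf("order %u: block at %lu, buddy at %lu -> merge\n",
                       order, page_idx, buddy_idx);
                mask <<= 1;
                order++;
                page_idx &= mask;
        }
        printf("final block: page %lu on the order %u free list\n",
               page_idx, order);
        return 0;
}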
158 list_add(&(base + page_idx)->list, &area->free_list);
159
160 spin_unlock_irqrestore(&zone->lock, flags);
161 return;
162
163 local_freelist:
164 if (current->nr_local_pages)
165 goto back_local_freelist;
166 if (in_interrupt())
167 goto back_local_freelist;
168
169 list_add(&page->list, &current->local_pages);
170 page->index = order;
171 current->nr_local_pages++;
172 }
- 158 As much merging as possible has completed and a new page block is free so add it to the free_list for this order
- 160-161 Changes to the zone are complete so free the lock and return
- 163 This is the code path taken when the pages are not freed to the main pool but instead are reserved for the process doing the freeing
- 164-165 If the process already has reserved pages, it is not allowed to reserve any more so go back and free in the normal fashion. This is unusual as balance_classzone() assumes that more than one page block may be returned on this list. It is likely to be an oversight but may still work if the first page block freed is of the same order and zone as required by balance_classzone()
- 166-167 An interrupt does not have process context so it has to free in the normal fashion. It is unclear how an interrupt could end up here at all. This check is likely to be bogus and impossible to be true
- 169 Add the page block to the list of the process's local_pages
- 170 Record what order allocation it was for freeing later
- 171 Increase the use count for nr_local_pages
F.4 Free Helper Functions
These functions are very similar to the page allocation helper
functions in that they do no “real” work themselves and depend on the
__free_pages() function to perform the actual free.
F.4.1 Function: free_pages
Source: mm/page_alloc.c
This function takes an address instead of a page as a parameter to free. It
is declared as follows
457 void free_pages(unsigned long addr, unsigned int order)
458 {
459 if (addr != 0)
460 __free_pages(virt_to_page(addr), order);
461 }
- 460 __free_pages() is discussed in Section F.3.1. The macro virt_to_page() returns the struct page for the addr
F.4.2 Function: __free_page
Source: include/linux/mm.h
This trivial macro just calls the function __free_pages() (See
Section F.3.1) with an order 0 for 1 page. It is declared
as follows
472 #define __free_page(page) __free_pages((page), 0)
F.4.3 Function: free_page
Source: include/linux/mm.h
This trivial macro just calls the function free_pages(). The
essential difference between this macro and __free_page() is that
this macro takes a virtual address as a parameter while
__free_page() takes a struct page.
472 #define free_page(addr) free_pages((addr),0)