H.1 Cache Manipulation
H.1.1 Cache Creation
H.1.1.1 Function: kmem_cache_create
Source: mm/slab.c
The call graph for this function is shown in Figure 8.3. This
function is responsible for the creation of a new cache and will be dealt
with in chunks due to its size. The chunks are, roughly:
- Perform basic sanity checks for bad usage
- Perform debugging checks if CONFIG_SLAB_DEBUG is set
- Allocate a kmem_cache_t from the cache_cache
slab cache
- Align the object size to the word size
- Calculate how many objects will fit on a slab
- Align the slab size to the hardware cache
- Calculate colour offsets
- Initialise remaining fields in cache descriptor
- Add the new cache to the cache chain
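As a usage illustration before diving into the source, the sketch below shows
how a 2.4 module might call this interface; struct my_request, my_req_cachep
and my_init() are hypothetical names invented for the example, not taken from
the source.

#include <linux/init.h>
#include <linux/slab.h>
#include <linux/errno.h>

struct my_request { int id; char buf[60]; };    /* illustrative object type */

static kmem_cache_t *my_req_cachep;

static int __init my_init(void)
{
        /* name, object size, offset, flags, constructor, destructor */
        my_req_cachep = kmem_cache_create("my_request",
                                          sizeof(struct my_request), 0,
                                          SLAB_HWCACHE_ALIGN, NULL, NULL);
        return my_req_cachep ? 0 : -ENOMEM;
}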
621 kmem_cache_t *
622 kmem_cache_create (const char *name, size_t size,
623 size_t offset, unsigned long flags,
void (*ctor)(void*, kmem_cache_t *, unsigned long),
624 void (*dtor)(void*, kmem_cache_t *, unsigned long))
625 {
626 const char *func_nm = KERN_ERR "kmem_create: ";
627 size_t left_over, align, slab_size;
628 kmem_cache_t *cachep = NULL;
629
633 if ((!name) ||
634 ((strlen(name) >= CACHE_NAMELEN - 1)) ||
635 in_interrupt() ||
636 (size < BYTES_PER_WORD) ||
637 (size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
638 (dtor && !ctor) ||
639 (offset < 0 || offset > size))
640 BUG();
641
Perform basic sanity checks for bad usage
- 622The parameters of the function are
- name The human readable name of the cache
- size The size of an object
- offset This is used to specify a specific alignment for objects in the
cache, but it is usually left as 0
- flags Static cache flags
- ctor A constructor function to call for each object during slab
creation
- dtor The corresponding destructor function. It is expected the
destructor function leaves an object in an initialised state
- 633-640 These are all serious usage bugs that prevent the cache from even
attempting to be created
- 634If the human readable name is greater than the maximum size for
a cache name (CACHE_NAMELEN)
- 635An interrupt handler cannot create a cache as access to
interrupt-safe spinlocks and semaphores are needed
- 636The object size must be at least a word in size. The slab allocator
is not suitable for objects whose size is measured in individual bytes
- 637The largest possible slab that can be created is
2^MAX_OBJ_ORDER pages, which provides 32 pages
- 638A destructor cannot be used if no constructor is available
- 639The offset cannot be before the slab or beyond the boundary of the
first page
- 640Call BUG() to exit
642 #if DEBUG
643 if ((flags & SLAB_DEBUG_INITIAL) && !ctor) {
645 printk("%sNo con, but init state check
requested - %s\n", func_nm, name);
646 flags &= ~SLAB_DEBUG_INITIAL;
647 }
648
649 if ((flags & SLAB_POISON) && ctor) {
651 printk("%sPoisoning requested, but con given - %s\n",
func_nm, name);
652 flags &= ~SLAB_POISON;
653 }
654 #if FORCED_DEBUG
655 if ((size < (PAGE_SIZE>>3)) &&
!(flags & SLAB_MUST_HWCACHE_ALIGN))
660 flags |= SLAB_RED_ZONE;
661 if (!ctor)
662 flags |= SLAB_POISON;
663 #endif
664 #endif
670 BUG_ON(flags & ~CREATE_MASK);
This block performs debugging checks if CONFIG_SLAB_DEBUG is set
- 643-646The flag SLAB_DEBUG_INITIAL requests that
the constructor check the objects to make sure they are in an initialised
state. For this, a constructor must exist. If it does not, the flag
is cleared
- 649-653A slab can be poisoned with a known pattern to make sure
an object wasn't used before it was allocated but a constructor would ruin
this pattern falsely reporting a bug. If a constructor exists, remove the
SLAB_POISON flag if set
- 655-660Only small objects will be red zoned for debugging. Red zoning
large objects would cause severe fragmentation
- 661-662If there is no constructor, set the poison bit
- 670The CREATE_MASK is set with all the allowable
flags kmem_cache_create() (See Section H.1.1.1) can
be called with. This prevents callers from using debugging flags when they are
not available and calls BUG() instead
673 cachep =
(kmem_cache_t *) kmem_cache_alloc(&cache_cache,
SLAB_KERNEL);
674 if (!cachep)
675 goto opps;
676 memset(cachep, 0, sizeof(kmem_cache_t));
Allocate a kmem_cache_t from the cache_cache
slab cache.
- 673Allocate a cache descriptor object from the
cache_cache with kmem_cache_alloc()
(See Section H.3.2.1)
- 674-675If out of memory goto opps which handles the oom
situation
- 676Zero fill the object to prevent surprises with uninitialised data
682 if (size & (BYTES_PER_WORD-1)) {
683 size += (BYTES_PER_WORD-1);
684 size &= ~(BYTES_PER_WORD-1);
685 printk("%sForcing size word alignment
- %s\n", func_nm, name);
686 }
687
688 #if DEBUG
689 if (flags & SLAB_RED_ZONE) {
694 flags &= ~SLAB_HWCACHE_ALIGN;
695 size += 2*BYTES_PER_WORD;
696 }
697 #endif
698 align = BYTES_PER_WORD;
699 if (flags & SLAB_HWCACHE_ALIGN)
700 align = L1_CACHE_BYTES;
701
703 if (size >= (PAGE_SIZE>>3))
708 flags |= CFLGS_OFF_SLAB;
709
710 if (flags & SLAB_HWCACHE_ALIGN) {
714 while (size < align/2)
715 align /= 2;
716 size = (size+align-1)&(~(align-1));
717 }
Align the object size to some word-sized boundary.
- 682If the size is not aligned to the size of a word then...
- 683-684Add BYTES_PER_WORD-1 to the size, then mask out
the lower bits. This effectively rounds the object size up to the next
word boundary
- 685Print out an informational message for debugging purposes
- 688-697If debugging is enabled then the alignments have to change
slightly
- 694Do not bother trying to align things to the hardware cache if the
slab will be red zoned. The red zoning of the object is going to offset it
by moving the object one word away from the cache boundary
- 695The size of the object increases by two BYTES_PER_WORD
to store the red zone mark at either end of the object
- 698Initialise the alignment to be to a word boundary. This will change
if the caller has requested a CPU cache alignment
- 699-700If requested, align the objects to the L1 CPU cache
- 703If the objects are large, store the slab descriptors off-slab. This
will allow better packing of objects into the slab
- 710If hardware cache alignment is requested, the size of the objects
must be adjusted to align themselves to the hardware cache
- 714-715Try to pack objects into one cache line if they fit while
still keeping the alignment. This is important for architectures (e.g. Alpha
or Pentium 4) with large L1 cache lines. align will be adjusted to be
the smallest value that still gives hardware cache alignment. For machines with
large L1 cache lines, two or more small objects may fit into each line. For
example, two objects from the size-32 cache will fit on one cache line on a
Pentium 4
- 716Round the cache size up to the hardware cache alignment
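The rounding and packing arithmetic above is easier to see with a small
userspace sketch; the word size, cache line size and object size below are
purely illustrative assumptions.

#include <stdio.h>

#define BYTES_PER_WORD 4
#define L1_CACHE_BYTES 32

int main(void)
{
        size_t size = 10;               /* hypothetical object size */
        size_t align;

        /* lines 682-684: round the size up to the next word boundary */
        if (size & (BYTES_PER_WORD - 1)) {
                size += (BYTES_PER_WORD - 1);
                size &= ~(BYTES_PER_WORD - 1);
        }

        /* lines 698-716 with SLAB_HWCACHE_ALIGN: halve the alignment while
         * two objects would still fit in one cache line, then round up */
        align = L1_CACHE_BYTES;
        while (size < align / 2)
                align /= 2;
        size = (size + align - 1) & ~(align - 1);

        /* prints size=16 align=16, so two objects share one 32 byte line */
        printf("size=%zu align=%zu\n", size, align);
        return 0;
}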
724 do {
725 unsigned int break_flag = 0;
726 cal_wastage:
727 kmem_cache_estimate(cachep->gfporder,
size, flags,
728 &left_over,
&cachep->num);
729 if (break_flag)
730 break;
731 if (cachep->gfporder >= MAX_GFP_ORDER)
732 break;
733 if (!cachep->num)
734 goto next;
735 if (flags & CFLGS_OFF_SLAB &&
cachep->num > offslab_limit) {
737 cachep->gfporder--;
738 break_flag++;
739 goto cal_wastage;
740 }
741
746 if (cachep->gfporder >= slab_break_gfp_order)
747 break;
748
749 if ((left_over*8) <= (PAGE_SIZE<<cachep->gfporder))
750 break;
751 next:
752 cachep->gfporder++;
753 } while (1);
754
755 if (!cachep->num) {
756 printk("kmem_cache_create: couldn't
create cache %s.\n", name);
757 kmem_cache_free(&cache_cache, cachep);
758 cachep = NULL;
759 goto opps;
760 }
Calculate how many objects will fit on a slab and adjust the slab size as
necessary
- 727-728kmem_cache_estimate() (see Section H.1.2.1)
calculates the number of objects that can fit on a
slab at the current gfp order and what the amount of leftover bytes will be
- 729-730The break_flag is set if the number of objects
fitting on the slab exceeds the number that can be kept when offslab slab
descriptors are used
- 731-732The order number of pages used must not exceed
MAX_GFP_ORDER (5)
- 733-734If not even one object fits, goto next, which will increase
the gfporder used for the cache
- 735If the slab descriptor is kept off-slab but the number of objects
exceeds the number that can be tracked with off-slab bufctls, then ...
- 737Reduce the order number of pages used
- 738Set the break_flag so the loop will exit
- 739Calculate the new wastage figures
- 746-747The slab_break_gfp_order is the order to not
exceed unless 0 objects fit on the slab. This check ensures the order is
not exceeded
- 749-750This is a rough check for internal fragmentation. If the wastage
as a fraction of the total size of the cache is less than one eighth, it is
acceptable
- 752If the fragmentation is too high, increase the gfp order and
recalculate the number of objects that can be stored and the wastage
- 755If after adjustments, objects still do not fit in the cache, it
cannot be created
- 757-758Free the cache descriptor and set the pointer to NULL
- 759Goto opps which simply returns the NULL pointer
761 slab_size = L1_CACHE_ALIGN(
cachep->num*sizeof(kmem_bufctl_t) +
sizeof(slab_t));
762
767 if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
768 flags &= ~CFLGS_OFF_SLAB;
769 left_over -= slab_size;
770 }
Align the slab size to the hardware cache
- 761slab_size is the total size of the slab descriptor, not the
size of the slab itself. It is the size of the slab_t struct plus the
space for one bufctl per object, aligned to the L1 cache
- 767-769If there is enough left over space for the slab descriptor
and it was specified to place the descriptor off-slab, remove the flag and
update the amount of left_over bytes. This will impact
the cache colouring, but with the large objects associated with off-slab
descriptors, this is not a problem
773 offset += (align-1);
774 offset &= ~(align-1);
775 if (!offset)
776 offset = L1_CACHE_BYTES;
777 cachep->colour_off = offset;
778 cachep->colour = left_over/offset;
Calculate colour offsets.
- 773-774offset is the offset within the page the caller
requested. This will make sure the offset requested is at the correct alignment
for cache usage
- 775-776If somehow the offset is 0, then set it to be aligned for the
CPU cache
- 777This is the offset to use to keep objects on different cache lines.
Each slab created will be given a different colour offset
- 778This is the number of different offsets that can be used
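A short userspace sketch of this colouring arithmetic, assuming a 32 byte
cache line and 100 bytes of left over space per slab (both numbers chosen
purely for illustration):

#include <stdio.h>

#define L1_CACHE_BYTES 32

int main(void)
{
        size_t align = L1_CACHE_BYTES;
        size_t offset = 0;              /* caller passed offset == 0 */
        size_t left_over = 100;         /* hypothetical wastage per slab */

        /* lines 773-776: round the requested offset up to the alignment and
         * fall back to one cache line if it works out as 0 */
        offset += (align - 1);
        offset &= ~(align - 1);
        if (!offset)
                offset = L1_CACHE_BYTES;

        /* prints colour_off=32 colour=3: successive slabs place their first
         * object at byte 0, 32 or 64 before wrapping back to 0 */
        printf("colour_off=%zu colour=%zu\n", offset, left_over / offset);
        return 0;
}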
781 if (!cachep->gfporder && !(flags & CFLGS_OFF_SLAB))
782 flags |= CFLGS_OPTIMIZE;
783
784 cachep->flags = flags;
785 cachep->gfpflags = 0;
786 if (flags & SLAB_CACHE_DMA)
787 cachep->gfpflags |= GFP_DMA;
788 spin_lock_init(&cachep->spinlock);
789 cachep->objsize = size;
790 INIT_LIST_HEAD(&cachep->slabs_full);
791 INIT_LIST_HEAD(&cachep->slabs_partial);
792 INIT_LIST_HEAD(&cachep->slabs_free);
793
794 if (flags & CFLGS_OFF_SLAB)
795 cachep->slabp_cache =
kmem_find_general_cachep(slab_size,0);
796 cachep->ctor = ctor;
797 cachep->dtor = dtor;
799 strcpy(cachep->name, name);
800
801 #ifdef CONFIG_SMP
802 if (g_cpucache_up)
803 enable_cpucache(cachep);
804 #endif
Initialise remaining fields in cache descriptor
- 781-782For caches with slabs of only 1 page, the
CFLGS_OPTIMIZE flag is set. In reality it makes no difference
as the flag is unused
- 784Set the cache static flags
- 785Zero out the gfpflags. This is a defunct operation, as the memset()
after the cache descriptor was allocated has already done this
- 786-787If the slab is for DMA use, set the GFP_DMA
flag so the buddy allocator will use ZONE_DMA
- 788Initialise the spinlock for accessing the cache
- 789Copy in the object size, which now takes hardware cache alignment into
account if necessary
- 790-792Initialise the slab lists
- 794-795If the descriptor is kept off-slab, allocate a slab manager and
place it for use in slabp_cache. See Section H.2.1.2
- 796-797Set the pointers to the constructor and destructor functions
- 799Copy in the human readable name
- 802-803If per-cpu caches are enabled, create a set for this cache. See
Section 8.5
806 down(&cache_chain_sem);
807 {
808 struct list_head *p;
809
810 list_for_each(p, &cache_chain) {
811 kmem_cache_t *pc = list_entry(p,
kmem_cache_t, next);
812
814 if (!strcmp(pc->name, name))
815 BUG();
816 }
817 }
818
822 list_add(&cachep->next, &cache_chain);
823 up(&cache_chain_sem);
824 opps:
825 return cachep;
826 }
Add the new cache to the cache chain
- 806Acquire the semaphore used to synchronise access to the cache chain
- 810-816Check every cache on the cache chain and make sure there is
no other cache with the same name. If there is, it means two caches of the
same type are being created, which is a serious bug
- 811Get the cache from the list
- 814-815Compare the names and if they match, BUG(). It is
worth noting that the new cache is not deleted, but this error is the result
of sloppy programming during development and not a normal scenario
- 822Link the cache into the chain.
- 823Release the cache chain semaphore.
- 825Return the new cache pointer
H.1.2 Calculating the Number of Objects on a Slab
H.1.2.1 Function: kmem_cache_estimate
Source: mm/slab.c
During cache creation, it is determined how many objects can be stored in
a slab and how much wastage there will be. The following function calculates
how many objects may be stored, taking into account whether the slab descriptor
and bufctls must be stored on-slab.
388 static void kmem_cache_estimate (unsigned long gfporder,
size_t size,
389 int flags, size_t *left_over, unsigned int *num)
390 {
391 int i;
392 size_t wastage = PAGE_SIZE<<gfporder;
393 size_t extra = 0;
394 size_t base = 0;
395
396 if (!(flags & CFLGS_OFF_SLAB)) {
397 base = sizeof(slab_t);
398 extra = sizeof(kmem_bufctl_t);
399 }
400 i = 0;
401 while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)
402 i++;
403 if (i > 0)
404 i--;
405
406 if (i > SLAB_LIMIT)
407 i = SLAB_LIMIT;
408
409 *num = i;
410 wastage -= i*size;
411 wastage -= L1_CACHE_ALIGN(base+i*extra);
412 *left_over = wastage;
413 }
- 388The parameters of the function are as follows
- gfporder 2^gfporder pages will be allocated for each slab
- size The size of each object
- flags The cache flags
- left_over The number of bytes left over in the slab. Returned to caller
- num The number of objects that will fit in a slab. Returned to caller
- 392wastage is decremented through the function. It starts with
the maximum possible amount of wastage.
- 393extra is the number of bytes needed to store
kmem_bufctl_t
- 394base is where usable memory in the slab starts
- 396If the slab descriptor is kept on-slab, the base begins at the
end of the slab_t struct and the number of bytes needed to
store the bufctl for each object is the size of kmem_bufctl_t
- 400i becomes the number of objects the slab can hold
- 401-402This counts up the number of objects that the cache can store.
i*size is the size of the object
itself. L1_CACHE_ALIGN(base+i*extra) is slightly
trickier. This is calculating the amount of memory needed to
store the kmem_bufctl_t needed for every object in the
slab. As it is at the beginning of the slab, it is L1 cache aligned
so that the first object in the slab will be aligned to the hardware
cache. i*extra will calculate the amount of space needed
to hold a kmem_bufctl_t for this object. As wastage
starts out as the size of the slab, its use is overloaded here.
- 403-404Because the previous loop counts until the slab overflows, the
number of objects that can be stored is i-1.
- 406-407SLAB_LIMIT is the absolute largest number of
objects a slab can store. It is defined as 0xffffFFFE as this is the
largest number kmem_bufctl_t, which is an unsigned
integer, can hold
- 409num is now the number of objects a slab can hold
- 410Take away the space taken up by all the objects from wastage
- 411Take away the space taken up by the kmem_bufctl_t
- 412wastage has now been calculated as the left over space in the slab
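To make the arithmetic concrete, the userspace sketch below reproduces the
estimate for an on-slab descriptor using assumed values of PAGE_SIZE = 4096, a
32 byte cache line, a 256 byte object and illustrative sizes for slab_t and
kmem_bufctl_t; the exact kernel values differ by architecture.

#include <stdio.h>

#define PAGE_SIZE        4096
#define L1_CACHE_BYTES   32
#define L1_CACHE_ALIGN(x) (((x) + L1_CACHE_BYTES - 1) & ~(L1_CACHE_BYTES - 1))

int main(void)
{
        size_t size = 256;               /* object size */
        size_t base = 16;                /* assumed sizeof(slab_t) */
        size_t extra = 4;                /* assumed sizeof(kmem_bufctl_t) */
        size_t wastage = PAGE_SIZE;      /* gfporder == 0, so one page */
        int i = 0;

        /* count objects until the slab overflows, then back off by one */
        while (i * size + L1_CACHE_ALIGN(base + i * extra) <= wastage)
                i++;
        if (i > 0)
                i--;

        wastage -= i * size;
        wastage -= L1_CACHE_ALIGN(base + i * extra);

        /* prints num=15 left_over=160 */
        printf("num=%d left_over=%zu\n", i, wastage);
        return 0;
}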
H.1.3 Cache Shrinking
The call graph for kmem_cache_shrink() is shown in Figure
8.5. Two varieties of shrink functions are provided. kmem_cache_shrink()
removes all slabs from
slabs_free and returns the number of pages freed as a
result. __kmem_cache_shrink() frees all slabs from slabs_free and
then verifies that slabs_partial and slabs_full are empty. This
is important during cache destruction when it doesn't matter how many pages
are freed, just that the cache is empty.
H.1.3.1 Function: kmem_cache_shrink
Source: mm/slab.c
This function performs basic debugging checks and then acquires the cache
descriptor lock before freeing slabs. At one time, it also used to call
drain_cpu_caches() to free up objects on the per-cpu cache. It is
curious that this was removed as it is possible slabs could not be freed
due to an object being allocated on a per-cpu cache but not in use.
966 int kmem_cache_shrink(kmem_cache_t *cachep)
967 {
968 int ret;
969
970 if (!cachep || in_interrupt() ||
!is_chained_kmem_cache(cachep))
971 BUG();
972
973 spin_lock_irq(&cachep->spinlock);
974 ret = __kmem_cache_shrink_locked(cachep);
975 spin_unlock_irq(&cachep->spinlock);
976
977 return ret << cachep->gfporder;
978 }
- 966The parameter is the cache being shrunk
- 970Check that
- The cache pointer is not NULL
- An interrupt is not the caller
- The cache is on the cache chain and not a bad pointer
- 973Acquire the cache descriptor lock and disable interrupts
- 974Shrink the cache
- 975Release the cache lock and enable interrupts
- 976This returns the number of pages freed but does not take into
account the objects freed by draining the CPU.
H.1.3.2 Function: __kmem_cache_shrink
Source: mm/slab.c
This function is identical to kmem_cache_shrink() except it returns
whether the cache is empty or not. This is important during cache destruction
when it is not important how much memory was freed, just that it is safe to
delete the cache and not leak memory.
945 static int __kmem_cache_shrink(kmem_cache_t *cachep)
946 {
947 int ret;
948
949 drain_cpu_caches(cachep);
950
951 spin_lock_irq(&cachep->spinlock);
952 __kmem_cache_shrink_locked(cachep);
953 ret = !list_empty(&cachep->slabs_full) ||
954 !list_empty(&cachep->slabs_partial);
955 spin_unlock_irq(&cachep->spinlock);
956 return ret;
957 }
- 949Remove all objects from the per-CPU objects cache
- 951Acquire the cache descriptor lock and disable interrupts
- 952Free all slabs in the slabs_free list
- 953-954Check that the slabs_partial and slabs_full lists
are empty
- 955Release the cache descriptor lock and re-enable interrupts
- 956Return if the cache has all its slabs free or not
H.1.3.3 Function: __kmem_cache_shrink_locked
Source: mm/slab.c
This does the dirty work of freeing slabs. It will keep destroying them until
the growing flag is set, indicating the cache is in use, or until there are no
more slabs in slabs_free.
917 static int __kmem_cache_shrink_locked(kmem_cache_t *cachep)
918 {
919 slab_t *slabp;
920 int ret = 0;
921
923 while (!cachep->growing) {
924 struct list_head *p;
925
926 p = cachep->slabs_free.prev;
927 if (p == &cachep->slabs_free)
928 break;
929
930 slabp = list_entry(cachep->slabs_free.prev,
slab_t, list);
931 #if DEBUG
932 if (slabp->inuse)
933 BUG();
934 #endif
935 list_del(&slabp->list);
936
937 spin_unlock_irq(&cachep->spinlock);
938 kmem_slab_destroy(cachep, slabp);
939 ret++;
940 spin_lock_irq(&cachep->spinlock);
941 }
942 return ret;
943 }
- 923While the cache is not growing, free slabs
- 926-930Get the last slab on the slabs_free list
- 932-933If debugging is available, make sure it is not in use. If it is
in use, it should not be on the slabs_free list in the first place
- 935Remove the slab from the list
- 937Release the cache spinlock and re-enable interrupts. This function is
called with interrupts disabled, and the lock is dropped while the slab is
destroyed so that interrupts are enabled again as quickly as possible.
- 938Delete the slab with kmem_slab_destroy()
(See Section H.2.3.1)
- 939Record the number of slabs freed
- 940Acquire the cache descriptor lock and disable interrupts
H.1.4 Cache Destroying
When a module is unloaded, it is responsible for destroying any cache it has
created because, during cache creation, it is ensured there are not two caches
of the same name. Core kernel code often does not destroy its caches because
their existence persists for the life of the system. The steps taken to destroy
a cache are:
- Delete the cache from the cache chain
- Shrink the cache to delete all slabs (see Section 8.1.8)
- Free any per CPU caches (kfree())
- Delete the cache descriptor from the cache_cache (see Section:
8.3.3)
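Continuing the hypothetical module example from Section H.1.1.1, the
corresponding cleanup path might look like the sketch below; the names remain
illustrative and the error handling is deliberately minimal.

#include <linux/init.h>
#include <linux/slab.h>
#include <linux/kernel.h>

static void __exit my_exit(void)
{
        /* kmem_cache_destroy() returns non-zero if objects are still
         * allocated, so everything must have been freed back first */
        if (kmem_cache_destroy(my_req_cachep))
                printk(KERN_ERR "my_request: cache still in use\n");
}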
H.1.4.1 Function: kmem_cache_destroy
Source: mm/slab.c
The call graph for this function is shown in Figure 8.7.
997 int kmem_cache_destroy (kmem_cache_t * cachep)
998 {
999 if (!cachep || in_interrupt() || cachep->growing)
1000 BUG();
1001
1002 /* Find the cache in the chain of caches. */
1003 down(&cache_chain_sem);
1004 /* the chain is never empty, cache_cache is never destroyed */
1005 if (clock_searchp == cachep)
1006 clock_searchp = list_entry(cachep->next.next,
1007 kmem_cache_t, next);
1008 list_del(&cachep->next);
1009 up(&cache_chain_sem);
1010
1011 if (__kmem_cache_shrink(cachep)) {
1012 printk(KERN_ERR
"kmem_cache_destroy: Can't free all objects %p\n",
1013 cachep);
1014 down(&cache_chain_sem);
1015 list_add(&cachep->next,&cache_chain);
1016 up(&cache_chain_sem);
1017 return 1;
1018 }
1019 #ifdef CONFIG_SMP
1020 {
1021 int i;
1022 for (i = 0; i < NR_CPUS; i++)
1023 kfree(cachep->cpudata[i]);
1024 }
1025 #endif
1026 kmem_cache_free(&cache_cache, cachep);
1027
1028 return 0;
1029 }
- 999-1000Sanity check. Make sure the cachep is not null,
that an interrupt is not trying to do this and that the cache has not been
marked as growing, indicating it is in use
- 1003Acquire the semaphore for accessing the cache chain
- 1005-1007If clock_searchp, used by kmem_cache_reap(), is currently pointing
at this cache, move it on to the next cache in the chain
- 1008Delete this cache from the cache chain
- 1009Release the cache chain semaphore
- 1011Shrink the cache to free all slabs with
__kmem_cache_shrink() (See Section H.1.3.2)
- 1012-1017The shrink function returns true if there are still slabs
in the cache. If there are, the cache cannot be destroyed, so it is added back
into the cache chain and the error reported
- 1022-1023If SMP is enabled, the per-cpu data structures are deleted
with kfree() (See Section H.4.3.1)
- 1026Delete the cache descriptor from the cache_cache with
kmem_cache_free() (See Section H.3.3.1)
H.1.5 Cache Reaping
H.1.5.1 Function: kmem_cache_reap
Source: mm/slab.c
The call graph for this function is shown in Figure 8.4.
Because of the size of this function, it will be broken up into three separate
sections. The first is simple function preamble. The second is the selection
of a cache to reap and the third is the freeing of the slabs. The basic tasks
were described in Section 8.1.7.
1738 int kmem_cache_reap (int gfp_mask)
1739 {
1740 slab_t *slabp;
1741 kmem_cache_t *searchp;
1742 kmem_cache_t *best_cachep;
1743 unsigned int best_pages;
1744 unsigned int best_len;
1745 unsigned int scan;
1746 int ret = 0;
1747
1748 if (gfp_mask & __GFP_WAIT)
1749 down(&cache_chain_sem);
1750 else
1751 if (down_trylock(&cache_chain_sem))
1752 return 0;
1753
1754 scan = REAP_SCANLEN;
1755 best_len = 0;
1756 best_pages = 0;
1757 best_cachep = NULL;
1758 searchp = clock_searchp;
- 1738The only parameter is the GFP flag. The only check made is against
the __GFP_WAIT flag. As the only caller, kswapd,
can sleep, this parameter is virtually worthless
- 1748-1749Can the caller sleep? If yes, then acquire the semaphore
- 1751-1752Else, try and acquire the semaphore and if not available,
return
- 1754REAP_SCANLEN (10) is the number of caches to examine.
- 1758Set searchp to be the last cache that was examined at
the last reap
1759 do {
1760 unsigned int pages;
1761 struct list_head* p;
1762 unsigned int full_free;
1763
1765 if (searchp->flags & SLAB_NO_REAP)
1766 goto next;
1767 spin_lock_irq(&searchp->spinlock);
1768 if (searchp->growing)
1769 goto next_unlock;
1770 if (searchp->dflags & DFLGS_GROWN) {
1771 searchp->dflags &= ~DFLGS_GROWN;
1772 goto next_unlock;
1773 }
1774 #ifdef CONFIG_SMP
1775 {
1776 cpucache_t *cc = cc_data(searchp);
1777 if (cc && cc->avail) {
1778 __free_block(searchp, cc_entry(cc),
cc->avail);
1779 cc->avail = 0;
1780 }
1781 }
1782 #endif
1783
1784 full_free = 0;
1785 p = searchp->slabs_free.next;
1786 while (p != &searchp->slabs_free) {
1787 slabp = list_entry(p, slab_t, list);
1788 #if DEBUG
1789 if (slabp->inuse)
1790 BUG();
1791 #endif
1792 full_free++;
1793 p = p->next;
1794 }
1795
1801 pages = full_free * (1<<searchp->gfporder);
1802 if (searchp->ctor)
1803 pages = (pages*4+1)/5;
1804 if (searchp->gfporder)
1805 pages = (pages*4+1)/5;
1806 if (pages > best_pages) {
1807 best_cachep = searchp;
1808 best_len = full_free;
1809 best_pages = pages;
1810 if (pages >= REAP_PERFECT) {
1811 clock_searchp =
list_entry(searchp->next.next,
1812 kmem_cache_t,next);
1813 goto perfect;
1814 }
1815 }
1816 next_unlock:
1817 spin_unlock_irq(&searchp->spinlock);
1818 next:
1819 searchp =
list_entry(searchp->next.next,kmem_cache_t,next);
1820 } while (--scan && searchp != clock_searchp);
This block examines REAP_SCANLEN number of caches to select one
to free
- 1767Acquire an interrupt safe lock to the cache descriptor
- 1768-1769If the cache is growing, skip it
- 1770-1773If the cache has grown recently, skip it and clear the flag
- 1775-1781Free any per CPU objects to the global pool
- 1786-1794Count the number of slabs in the slabs_free list
- 1801Calculate the number of pages all the slabs hold
- 1802-1803If the objects have constructors, reduce the page count by one
fifth to make it less likely to be selected for reaping
- 1804-1805If the slabs consist of more than one page, reduce the page
count by one fifth. This is because high order pages are hard to acquire
- 1806If this is the best candidate found for reaping so far, check if
it is perfect for reaping
- 1807-1809Record the new maximums
- 1808best_len is recorded so that it is easy to know how many slabs
make up half of the free list
- 1810If this cache is perfect for reaping then
- 1811Update clock_searchp
- 1813Goto perfect where half the slabs will be freed
- 1816This label is reached if it was found the cache was growing after
acquiring the lock
- 1817Release the cache descriptor lock
- 1818Move to the next entry in the cache chain
- 1820Scan while REAP_SCANLEN has not been reached and we
have not cycled around the whole cache chain
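The weighting on lines 1801-1805 can be illustrated with a small userspace
calculation; the figures are invented purely to show the effect of the two
reductions.

#include <stdio.h>

int main(void)
{
        unsigned int full_free = 10;     /* free slabs counted on this cache */
        unsigned int gfporder = 0;       /* each slab is a single page */
        int has_ctor = 1;                /* the cache has a constructor */

        unsigned int pages = full_free * (1 << gfporder);
        if (has_ctor)
                pages = (pages * 4 + 1) / 5;    /* 10 becomes 8 */
        if (gfporder)
                pages = (pages * 4 + 1) / 5;    /* not applied for order 0 */

        /* prints weighted pages = 8, so a constructor-less cache with 9
         * free pages would still be preferred over this one */
        printf("weighted pages = %u\n", pages);
        return 0;
}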
1822 clock_searchp = searchp;
1823
1824 if (!best_cachep)
1826 goto out;
1827
1828 spin_lock_irq(&best_cachep->spinlock);
1829 perfect:
1830 /* free only 50% of the free slabs */
1831 best_len = (best_len + 1)/2;
1832 for (scan = 0; scan < best_len; scan++) {
1833 struct list_head *p;
1834
1835 if (best_cachep->growing)
1836 break;
1837 p = best_cachep->slabs_free.prev;
1838 if (p == &best_cachep->slabs_free)
1839 break;
1840 slabp = list_entry(p,slab_t,list);
1841 #if DEBUG
1842 if (slabp->inuse)
1843 BUG();
1844 #endif
1845 list_del(&slabp->list);
1846 STATS_INC_REAPED(best_cachep);
1847
1848 /* Safe to drop the lock. The slab is no longer
1849 * lined to the cache.
1850 */
1851 spin_unlock_irq(&best_cachep->spinlock);
1852 kmem_slab_destroy(best_cachep, slabp);
1853 spin_lock_irq(&best_cachep->spinlock);
1854 }
1855 spin_unlock_irq(&best_cachep->spinlock);
1856 ret = scan * (1 << best_cachep->gfporder);
1857 out:
1858 up(&cache_chain_sem);
1859 return ret;
1860 }
This block will free half of the slabs from the selected cache
- 1822Update clock_searchp for the next cache reap
- 1824-1826If a cache was not found, goto out to release the cache chain
semaphore and exit
- 1828Acquire the cache descriptor spinlock and disable interrupts. The
cache descriptor has to be held by an interrupt safe lock because some
caches may be used from interrupt context. The slab allocator has no way to
differentiate between interrupt safe and unsafe caches
- 1831Adjust best_len to be the number of slabs to free
- 1832-1854Free best_len number of slabs
- 1835-1836If the cache started growing, exit the loop
- 1837Get a slab from the list
- 1838-1839If there are no slabs left in the list, exit
- 1840Get the slab pointer
- 1842-1843If debugging is enabled, make sure there are no active objects
in the slab
- 1845Remove the slab from the slabs_free list
- 1846Update statistics if enabled
- 1851Release the cache descriptor lock and enable interrupts
- 1852Destroy the slab. See Section 8.2.8
- 1853Re-acquire the cache descriptor spinlock and disable interrupts
- 1855Release the cache descriptor lock and enable interrupts
- 1856ret is the number of pages that were freed
- 1858-1859Release the cache chain semaphore and return the number of pages
freed
H.2.1 Storing the Slab Descriptor
H.2.1.1 Function: kmem_cache_slabmgmt
Source: mm/slab.c
This function will either allocate space to keep the slab descriptor
off-slab or reserve enough space at the beginning of the slab for the
descriptor and the bufctls.
1032 static inline slab_t * kmem_cache_slabmgmt (
kmem_cache_t *cachep,
1033 void *objp,
int colour_off,
int local_flags)
1034 {
1035 slab_t *slabp;
1036
1037 if (OFF_SLAB(cachep)) {
1039 slabp = kmem_cache_alloc(cachep->slabp_cache,
local_flags);
1040 if (!slabp)
1041 return NULL;
1042 } else {
1047 slabp = objp+colour_off;
1048 colour_off += L1_CACHE_ALIGN(cachep->num *
1049 sizeof(kmem_bufctl_t) +
sizeof(slab_t));
1050 }
1051 slabp->inuse = 0;
1052 slabp->colouroff = colour_off;
1053 slabp->s_mem = objp+colour_off;
1054
1055 return slabp;
1056 }
- 1032 The parameters of the function are
- cachep The cache the slab is to be allocated to
- objp When the function is called, this points to the beginning of the slab
- colour_off The colour offset for this slab
- local_flags These are the flags for the cache
- 1037-1042 If the slab descriptor is kept off cache....
- 1039 Allocate memory from the sizes cache. During cache creation,
slabp_cache is set to the appropriate size cache to allocate from.
- 1040 If the allocation failed, return
- 1042-1050 Reserve space at the beginning of the slab
- 1047 The address of the slab descriptor will be the beginning of the slab
(objp) plus the colour offset
- 1048 colour_off is calculated to be the offset where the first
object will be placed. The address is L1 cache aligned. cachep->num *
sizeof(kmem_bufctl_t) is the amount of space needed to hold the bufctls for
each object in the slab and sizeof(slab_t) is the size of the slab
descriptor. This effectively has reserved the space at the beginning of the
slab
- 1051The number of objects in use on the slab is 0
- 1052The colouroff is updated for placement of the new object
- 1053The address of the first object is calculated as the address of
the beginning of the slab plus the offset
H.2.1.2 Function: kmem_find_general_cachep
Source: mm/slab.c
If the slab descriptor is to be kept off-slab, this function, called during
cache creation, will find the appropriate sizes cache to use. The result is
stored within the cache descriptor in the field slabp_cache.
1620 kmem_cache_t * kmem_find_general_cachep (size_t size,
int gfpflags)
1621 {
1622 cache_sizes_t *csizep = cache_sizes;
1623
1628 for ( ; csizep->cs_size; csizep++) {
1629 if (size > csizep->cs_size)
1630 continue;
1631 break;
1632 }
1633 return (gfpflags & GFP_DMA) ? csizep->cs_dmacachep :
csizep->cs_cachep;
1634 }
- 1620 size is the size of the slab descriptor.
gfpflags is always 0 as DMA memory is not needed for a slab
descriptor
- 1628-1632 Starting with the smallest size, keep increasing the size
until a cache is found with buffers large enough to store the slab descriptor
- 1633 Return either a normal or DMA sized cache depending on the
gfpflags passed in. In reality, only the cs_cachep
is ever passed back
H.2.2 Slab Creation
H.2.2.1 Function: kmem_cache_grow
Source: mm/slab.c
The call graph for this function is shown in Figure 8.11. The
basic tasks for this function are:
- Perform basic sanity checks to guard against bad usage
- Calculate colour offset for objects in this slab
- Allocate memory for slab and acquire a slab descriptor
- Link the pages used for the slab to the slab and cache descriptors
- Initialise objects in the slab
- Add the slab to the cache
1105 static int kmem_cache_grow (kmem_cache_t * cachep, int flags)
1106 {
1107 slab_t *slabp;
1108 struct page *page;
1109 void *objp;
1110 size_t offset;
1111 unsigned int i, local_flags;
1112 unsigned long ctor_flags;
1113 unsigned long save_flags;
Basic declarations. The parameters of the function are
- cachep The cache to allocate a new slab to
- flags The flags for a slab creation
1118 if (flags & ~(SLAB_DMA|SLAB_LEVEL_MASK|SLAB_NO_GROW))
1119 BUG();
1120 if (flags & SLAB_NO_GROW)
1121 return 0;
1122
1129 if (in_interrupt() &&
(flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
1130 BUG();
1131
1132 ctor_flags = SLAB_CTOR_CONSTRUCTOR;
1133 local_flags = (flags & SLAB_LEVEL_MASK);
1134 if (local_flags == SLAB_ATOMIC)
1139 ctor_flags |= SLAB_CTOR_ATOMIC;
Perform basic sanity checks to guard against bad usage. The checks are made
here rather than kmem_cache_alloc() to protect the speed-critical
path. There is no point checking the flags every time an object needs to
be allocated.
- 1118-1119Make sure only allowable flags are used for allocation
- 1120-1121Do not grow the cache if this is set. In reality, it is never
set
- 1129-1130If this is called within interrupt context, make
sure the ATOMIC flag is set so we do not sleep when
kmem_getpages() (See Section H.7.0.3) is called
- 1132This flag tells the constructor it is to init the object
- 1133The local_flags are just those relevant to the page allocator
- 1134-1139If the SLAB_ATOMIC flag is set, the constructor
needs to know about it in case it wants to make new allocations
1142 spin_lock_irqsave(&cachep->spinlock, save_flags);
1143
1145 offset = cachep->colour_next;
1146 cachep->colour_next++;
1147 if (cachep->colour_next >= cachep->colour)
1148 cachep->colour_next = 0;
1149 offset *= cachep->colour_off;
1150 cachep->dflags |= DFLGS_GROWN;
1151
1152 cachep->growing++;
1153 spin_unlock_irqrestore(&cachep->spinlock, save_flags);
Calculate colour offset for objects in this slab
- 1142Acquire an interrupt safe lock for accessing the cache descriptor
- 1145Get the offset for objects in this slab
- 1146Move to the next colour offset
- 1147-1148If colour has been reached, there are no more offsets
available, so reset colour_next to 0
- 1149colour_off is the size of each offset, so offset
* colour_off gives how many bytes to offset the objects by
- 1150Mark the cache that it is growing so that
kmem_cache_reap() (See Section H.1.5.1) will ignore
this cache
- 1152Increase the count for callers growing this cache
- 1153Free the spinlock and re-enable interrupts
1165 if (!(objp = kmem_getpages(cachep, flags)))
1166 goto failed;
1167
1169 if (!(slabp = kmem_cache_slabmgmt(cachep,
objp, offset,
local_flags)))
1160 goto opps1;
Allocate memory for slab and acquire a slab descriptor
- 1165-1166Allocate pages from the page allocator for the slab with
kmem_getpages() (See Section H.7.0.3)
- 1169Acquire a slab descriptor with kmem_cache_slabmgmt()
(See Section H.2.1.1)
1173 i = 1 << cachep->gfporder;
1174 page = virt_to_page(objp);
1175 do {
1176 SET_PAGE_CACHE(page, cachep);
1177 SET_PAGE_SLAB(page, slabp);
1178 PageSetSlab(page);
1179 page++;
1180 } while (--i);
Link the pages used for the slab to the slab and cache descriptors
- 1173i is the number of pages used for the slab. Each page has
to be linked to the slab and cache descriptors.
- 1174objp is a pointer to the beginning of the slab. The
macro virt_to_page() will give the struct page
for that address
- 1175-1180Link each page's list field to the slab and cache descriptors
- 1176SET_PAGE_CACHE() links the page to the cache
descriptor using the page→list.next field
- 1177SET_PAGE_SLAB() links the page to the slab
descriptor using the page→list.prev field
- 1178Set the PG_slab page flag. The full set of
PG_ flags is listed in Table 2.1
- 1179Move to the next page for this slab to be linked
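For reference, these macros simply overload the page→list fields; in 2.4
mm/slab.c they are defined approximately as follows (quoted from memory, so
treat the exact form as an approximation). This is how kmem_cache_free() later
recovers both the cache and slab descriptors from nothing but an object
pointer.

#define SET_PAGE_CACHE(pg, x)  ((pg)->list.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg)     ((kmem_cache_t *)(pg)->list.next)
#define SET_PAGE_SLAB(pg, x)   ((pg)->list.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg)      ((slab_t *)(pg)->list.prev)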
1182 kmem_cache_init_objs(cachep, slabp, ctor_flags);
- 1182Initialise all objects (See Section H.3.1.1)
1184 spin_lock_irqsave(&cachep->spinlock, save_flags);
1185 cachep->growing--;
1186
1188 list_add_tail(&slabp->list, &cachep->slabs_free);
1189 STATS_INC_GROWN(cachep);
1190 cachep->failures = 0;
1191
1192 spin_unlock_irqrestore(&cachep->spinlock, save_flags);
1193 return 1;
Add the slab to the cache
- 1184Acquire the cache descriptor spinlock in an interrupt safe fashion
- 1185Decrease the growing count
- 1188Add the slab to the end of the slabs_free list
- 1189If STATS is set, increase the
cachep→grown field with STATS_INC_GROWN()
- 1190Set failures to 0. This field is never used elsewhere
- 1192Unlock the spinlock in an interrupt safe fashion
- 1193Return success
1194 opps1:
1195 kmem_freepages(cachep, objp);
1196 failed:
1197 spin_lock_irqsave(&cachep->spinlock, save_flags);
1198 cachep->growing--;
1199 spin_unlock_irqrestore(&cachep->spinlock, save_flags);
1300 return 0;
1301 }
Error handling
- 1194-1195opps1 is reached if the pages for the slab were
allocated. They must be freed
- 1197Acquire the spinlock for accessing the cache descriptor
- 1198Reduce the growing count
- 1199Release the spinlock
- 1300Return failure
H.2.3 Slab Destroying
H.2.3.1 Function: kmem_slab_destroy
Source: mm/slab.c
The call graph for this function is shown in Figure
8.13. For readability, the debugging sections have
been omitted from this function, but they are almost identical to the debugging
section during object allocation. See Section H.3.1.1
for how the markers and poison pattern are checked.
555 static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
556 {
557 if (cachep->dtor
561 ) {
562 int i;
563 for (i = 0; i < cachep->num; i++) {
564 void* objp = slabp->s_mem+cachep->objsize*i;
565-574 DEBUG: Check red zone markers
575 if (cachep->dtor)
576 (cachep->dtor)(objp, cachep, 0);
577-584 DEBUG: Check poison pattern
585 }
586 }
587
588 kmem_freepages(cachep, slabp->s_mem-slabp->colouroff);
589 if (OFF_SLAB(cachep))
590 kmem_cache_free(cachep->slabp_cache, slabp);
591 }
- 557-586If a destructor is available, call it for each object in the
slab
- 563-585Cycle through each object in the slab
- 564Calculate the address of the object to destroy
- 575-576Call the destructor
- 588Free the pages being used for the slab
- 589If the slab descriptor is being kept off-slab, then free the memory
being used for it
H.3 Objects
This section will cover how objects are managed. At this point, most
of the real hard work has been completed by either the cache or slab managers.
H.3.1 Initialising Objects in a Slab
H.3.1.1 Function: kmem_cache_init_objs
Source: mm/slab.c
The vast majority of this function is involved with debugging, so we will
start with the function without the debugging and explain that in detail before
handling the debugging part. The two sections that are debugging are marked
in the code excerpt below as Part 1 and Part 2.
1058 static inline void kmem_cache_init_objs (kmem_cache_t * cachep,
1059 slab_t * slabp, unsigned long ctor_flags)
1060 {
1061 int i;
1062
1063 for (i = 0; i < cachep->num; i++) {
1064 void* objp = slabp->s_mem+cachep->objsize*i;
1065-1072 /* Debugging Part 1 */
1079 if (cachep->ctor)
1080 cachep->ctor(objp, cachep, ctor_flags);
1081-1094 /* Debugging Part 2 */
1095 slab_bufctl(slabp)[i] = i+1;
1096 }
1097 slab_bufctl(slabp)[i-1] = BUFCTL_END;
1098 slabp->free = 0;
1099 }
- 1058The parameters of the function are
- cachep The cache the objects are being initialised for
- slabp The slab the objects are in
- ctor_flags Flags the constructor needs, such as whether this is an atomic
allocation or not
- 1063Initialise cache→num number of objects
- 1064The base address for objects in the slab is s_mem. The
address of the object to initialise is then s_mem plus i * (size of a single
object)
- 1079-1080If a constructor is available, call it
- 1095The macro slab_bufctl() casts slabp to
a slab_t slab descriptor and adds one to it. This brings
the pointer to the end of the slab descriptor and then casts it back to
a kmem_bufctl_t effectively giving the beginning of the
bufctl array.
- 1098The index of the first free object is 0 in the bufctl
array
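The macro itself is defined in mm/slab.c approximately as shown below: the
bufctl array sits in memory directly after the slab descriptor. After this
loop runs on a slab with, say, four objects, the array holds
{ 1, 2, 3, BUFCTL_END } and slabp→free is 0, forming a free list of object
indices threaded through the array.

/* approximate 2.4 definition */
#define slab_bufctl(slabp) \
        ((kmem_bufctl_t *)(((slab_t *)(slabp)) + 1))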
That covers the core of initialising objects. Next the first debugging part
will be covered
1065 #if DEBUG
1066 if (cachep->flags & SLAB_RED_ZONE) {
1067 *((unsigned long*)(objp)) = RED_MAGIC1;
1068 *((unsigned long*)(objp + cachep->objsize -
1069 BYTES_PER_WORD)) = RED_MAGIC1;
1070 objp += BYTES_PER_WORD;
1071 }
1072 #endif
- 1066If the cache is to be red zoned, then place a marker at either
end of the object
- 1067Place the marker at the beginning of the object
- 1068Place the marker at the end of the object. Remember that the
size of the object takes into account the size of the red markers when red
zoning is enabled
- 1070Increase the objp pointer by the size of the marker for the
benefit of the constructor which is called after this debugging block
1081 #if DEBUG
1082 if (cachep->flags & SLAB_RED_ZONE)
1083 objp -= BYTES_PER_WORD;
1084 if (cachep->flags & SLAB_POISON)
1086 kmem_poison_obj(cachep, objp);
1087 if (cachep->flags & SLAB_RED_ZONE) {
1088 if (*((unsigned long*)(objp)) != RED_MAGIC1)
1089 BUG();
1090 if (*((unsigned long*)(objp + cachep->objsize -
1091 BYTES_PER_WORD)) != RED_MAGIC1)
1092 BUG();
1093 }
1094 #endif
This is the debugging block that takes place after the constructor, if it
exists, has been called.
- 1082-1083The objp pointer was increased by the size of
the red marker in the previous debugging block so move it back again
- 1084-1086If there was no constructor, poison the object with a known
pattern that can be examined later to trap uninitialised writes
- 1088Check to make sure the red marker at the beginning of the object
was preserved to trap writes before the object
- 1090-1091Check to make sure writes didn't take place past the end
of the object
H.3.2 Object Allocation
H.3.2.1 Function: kmem_cache_alloc
Source: mm/slab.c
The call graph for this function is shown in Figure
8.14. This trivial function simply calls __kmem_cache_alloc().
1529 void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)
1531 {
1532 return __kmem_cache_alloc(cachep, flags);
1533 }
H.3.2.2 Function: __kmem_cache_alloc (UP Case)
Source: mm/slab.c
This will take the parts of the function specific to the UP case. The SMP case
will be dealt with in the next section.
1338 static inline void * __kmem_cache_alloc (kmem_cache_t *cachep,
int flags)
1339 {
1340 unsigned long save_flags;
1341 void* objp;
1342
1343 kmem_cache_alloc_head(cachep, flags);
1344 try_again:
1345 local_irq_save(save_flags);
1367 objp = kmem_cache_alloc_one(cachep);
1369 local_irq_restore(save_flags);
1370 return objp;
1371 alloc_new_slab:
1376 local_irq_restore(save_flags);
1377 if (kmem_cache_grow(cachep, flags))
1381 goto try_again;
1382 return NULL;
1383 }
- 1338The parameters are the cache to allocate from and allocation
specific flags
- 1343This function makes sure the appropriate combination of DMA flags
are in use
- 1345Disable interrupts and save the flags. This function is used by
interrupts so this is the only way to provide synchronisation in the UP case
- 1367kmem_cache_alloc_one() (see Section H.3.2.5)
allocates an object from one of the lists and returns
it. If no objects are free, this macro (note it isn't a function) will goto
alloc_new_slab at the end of this function
- 1369-1370Restore interrupts and return
- 1376At this label, no objects were free in slabs_partial
and slabs_free is empty so a new slab is needed
- 1377Allocate a new slab (see Section 8.2.2)
- 1379A new slab is available so try again
- 1382No slabs could be allocated so return failure
H.3.2.3 Function: __kmem_cache_alloc (SMP Case)
Source: mm/slab.c
This is what the function looks like in the SMP case
1338 static inline void * __kmem_cache_alloc (kmem_cache_t *cachep,
int flags)
1339 {
1340 unsigned long save_flags;
1341 void* objp;
1342
1343 kmem_cache_alloc_head(cachep, flags);
1344 try_again:
1345 local_irq_save(save_flags);
1347 {
1348 cpucache_t *cc = cc_data(cachep);
1349
1350 if (cc) {
1351 if (cc->avail) {
1352 STATS_INC_ALLOCHIT(cachep);
1353 objp = cc_entry(cc)[--cc->avail];
1354 } else {
1355 STATS_INC_ALLOCMISS(cachep);
1356 objp =
kmem_cache_alloc_batch(cachep,cc,flags);
1357 if (!objp)
1358 goto alloc_new_slab_nolock;
1359 }
1360 } else {
1361 spin_lock(&cachep->spinlock);
1362 objp = kmem_cache_alloc_one(cachep);
1363 spin_unlock(&cachep->spinlock);
1364 }
1365 }
1366 local_irq_restore(save_flags);
1370 return objp;
1371 alloc_new_slab:
1373 spin_unlock(&cachep->spinlock);
1374 alloc_new_slab_nolock:
1375 local_irq_restore(save_flags);
1377 if (kmem_cache_grow(cachep, flags))
1381 goto try_again;
1382 return NULL;
1383 }
- 1338-1347Same as UP case
- 1349Obtain the per CPU data for this cpu
- 1350-1360If a per CPU cache is available then ....
- 1351If there is an object available then ....
- 1352Update statistics for this cache if enabled
- 1353Get an object and update the avail figure
- 1354Else an object is not available so ....
- 1355Update statistics for this cache if enabled
- 1356Allocate batchcount number of objects, place all but one
of them in the per CPU cache and return the last one to objp
- 1357-1358The allocation failed, so goto
alloc_new_slab_nolock to grow the cache and allocate a new slab
- 1360-1364If a per CPU cache is not available, take out the cache
spinlock and allocate one object in the same way the UP case does. This is
the case during the initialisation for the cache_cache for example
- 1363Object was successfully assigned, release cache spinlock
- 1366-1370Re-enable interrupts and return the allocated object
- 1371-1372If kmem_cache_alloc_one() failed to allocate an
object, it will goto here with the spinlock still held so it must be released
- 1375-1383Same as the UP case
H.3.2.4 Function: kmem_cache_alloc_head
Source: mm/slab.c
This simple function ensures the right combination of slab and GFP flags are
used for allocation from a slab. If a cache is for DMA use, this function will
make sure the caller does not accidentally request normal memory and vice versa
1231 static inline void kmem_cache_alloc_head(kmem_cache_t *cachep,
int flags)
1232 {
1233 if (flags & SLAB_DMA) {
1234 if (!(cachep->gfpflags & GFP_DMA))
1235 BUG();
1236 } else {
1237 if (cachep->gfpflags & GFP_DMA)
1238 BUG();
1239 }
1240 }
- 1231The parameters are the cache we are allocating from and the flags
requested for the allocation
- 1233If the caller has requested memory for DMA use and ....
- 1234The cache is not using DMA memory then BUG()
- 1237Else if the caller has not requested DMA memory and this cache is
for DMA use, BUG()
H.3.2.5 Function: kmem_cache_alloc_one
Source: mm/slab.c
This is a preprocessor macro. It may seem strange to not make this an
inline function but it is a preprocessor macro for a goto optimisation in
__kmem_cache_alloc() (see Section H.3.2.2)
1283 #define kmem_cache_alloc_one(cachep) \
1284 ({ \
1285 struct list_head * slabs_partial, * entry; \
1286 slab_t *slabp; \
1287 \
1288 slabs_partial = &(cachep)->slabs_partial; \
1289 entry = slabs_partial->next; \
1290 if (unlikely(entry == slabs_partial)) { \
1291 struct list_head * slabs_free; \
1292 slabs_free = &(cachep)->slabs_free; \
1293 entry = slabs_free->next; \
1294 if (unlikely(entry == slabs_free)) \
1295 goto alloc_new_slab; \
1296 list_del(entry); \
1297 list_add(entry, slabs_partial); \
1298 } \
1299 \
1300 slabp = list_entry(entry, slab_t, list); \
1301 kmem_cache_alloc_one_tail(cachep, slabp); \
1302 })
- 1288-1289Get the first slab from the slabs_partial list
- 1290-1298If a slab is not available from this list, execute this block
- 1291-1293Get the first slab from the slabs_free list
- 1294-1295If there are no slabs on slabs_free,
then goto alloc_new_slab. This goto label is in
__kmem_cache_alloc() and it will grow the cache by one slab
- 1296-1297Else remove the slab from the free list and place it on
the slabs_partial list because an object is about to be removed
from it
- 1300Obtain the slab from the list
- 1301Allocate one object from the slab
H.3.2.6 Function: kmem_cache_alloc_one_tail
Source: mm/slab.c
This function is responsible for the allocation of one object from a slab.
Much of it is debugging code.
1242 static inline void * kmem_cache_alloc_one_tail (
kmem_cache_t *cachep,
1243 slab_t *slabp)
1244 {
1245 void *objp;
1246
1247 STATS_INC_ALLOCED(cachep);
1248 STATS_INC_ACTIVE(cachep);
1249 STATS_SET_HIGH(cachep);
1250
1252 slabp->inuse++;
1253 objp = slabp->s_mem + slabp->free*cachep->objsize;
1254 slabp->free=slab_bufctl(slabp)[slabp->free];
1255
1256 if (unlikely(slabp->free == BUFCTL_END)) {
1257 list_del(&slabp->list);
1258 list_add(&slabp->list, &cachep->slabs_full);
1259 }
1260 #if DEBUG
1261 if (cachep->flags & SLAB_POISON)
1262 if (kmem_check_poison_obj(cachep, objp))
1263 BUG();
1264 if (cachep->flags & SLAB_RED_ZONE) {
1266 if (xchg((unsigned long *)objp, RED_MAGIC2) !=
1267 RED_MAGIC1)
1268 BUG();
1269 if (xchg((unsigned long *)(objp+cachep->objsize -
1270 BYTES_PER_WORD), RED_MAGIC2) != RED_MAGIC1)
1271 BUG();
1272 objp += BYTES_PER_WORD;
1273 }
1274 #endif
1275 return objp;
1276 }
- 1242-1243The parameters are the cache and slab being allocated from
- 1247-1249If stats are enabled, this will set three statistics.
ALLOCED is the total number of objects that have been
allocated. ACTIVE is the number of active objects in the cache.
HIGH is the maximum number of objects that were active at a
single time
- 1252inuse is the number of objects active on this slab
- 1253Get a pointer to a free object. s_mem is a pointer to
the first object on the slab. free is an index of a free object in
the slab. index * object size gives an offset within the slab
- 1254This updates the free pointer to be an index of the next free
object
- 1256-1259If the slab is full, remove it from the
slabs_partial list and place it on the slabs_full.
- 1260-1274Debugging code
- 1275Without debugging, the object is returned to the caller
- 1261-1263If the object was poisoned with a known pattern, check it
to guard against uninitialised access
- 1266-1267If red zoning was enabled, check the marker at the beginning
of the object and confirm it is safe. Change the red marker to check for
writes before the object later
- 1269-1271Check the marker at the end of the object and change it to
check for writes after the object later
- 1272Update the object pointer to point to after the red marker
- 1275Return the object
H.3.2.7 Function: kmem_cache_alloc_batch
Source: mm/slab.c
This function allocates a batch of objects to a per-CPU cache of
objects. It is only used in the SMP case. In many ways it is very
similar to kmem_cache_alloc_one() (See Section H.3.2.5).
1305 void* kmem_cache_alloc_batch(kmem_cache_t* cachep,
cpucache_t* cc, int flags)
1306 {
1307 int batchcount = cachep->batchcount;
1308
1309 spin_lock(&cachep->spinlock);
1310 while (batchcount--) {
1311 struct list_head * slabs_partial, * entry;
1312 slab_t *slabp;
1313 /* Get slab alloc is to come from. */
1314 slabs_partial = &(cachep)->slabs_partial;
1315 entry = slabs_partial->next;
1316 if (unlikely(entry == slabs_partial)) {
1317 struct list_head * slabs_free;
1318 slabs_free = &(cachep)->slabs_free;
1319 entry = slabs_free->next;
1320 if (unlikely(entry == slabs_free))
1321 break;
1322 list_del(entry);
1323 list_add(entry, slabs_partial);
1324 }
1325
1326 slabp = list_entry(entry, slab_t, list);
1327 cc_entry(cc)[cc->avail++] =
1328 kmem_cache_alloc_one_tail(cachep, slabp);
1329 }
1330 spin_unlock(&cachep->spinlock);
1331
1332 if (cc->avail)
1333 return cc_entry(cc)[--cc->avail];
1334 return NULL;
1335 }
- 1305The parameters are the cache to allocate from, the per CPU cache
to fill and allocation flags
- 1307batchcount is the number of objects to allocate
- 1309Obtain the spinlock for access to the cache descriptor
- 1310-1329Loop batchcount times
- 1311-1324This is essentially the same as kmem_cache_alloc_one() (See Section H.3.2.5).
It selects a slab from either
slabs_partial or slabs_free to allocate from. If
none are available, break out of the loop
- 1326-1327Call kmem_cache_alloc_one_tail()
(See Section H.3.2.6) and place it in the per CPU cache
- 1330Release the cache descriptor lock
- 1332-1333Take one of the objects allocated in this batch and return it
- 1334If no object was allocated, return.
__kmem_cache_alloc() (See Section H.3.2.2) will grow the cache by one slab and try again
H.3.3 Object Freeing
H.3.3.1 Function: kmem_cache_free
Source: mm/slab.c
The call graph for this function is shown in Figure 8.15.
1576 void kmem_cache_free (kmem_cache_t *cachep, void *objp)
1577 {
1578 unsigned long flags;
1579 #if DEBUG
1580 CHECK_PAGE(virt_to_page(objp));
1581 if (cachep != GET_PAGE_CACHE(virt_to_page(objp)))
1582 BUG();
1583 #endif
1584
1585 local_irq_save(flags);
1586 __kmem_cache_free(cachep, objp);
1587 local_irq_restore(flags);
1588 }
- 1576The parameters are the cache the object is being freed from and the
object itself
- 1579-1583If debugging is enabled, the page will first be checked
with CHECK_PAGE() to make sure it is a slab page. Secondly the
page list will be examined to make sure it belongs to this cache (See Figure
8.8)
- 1585Interrupts are disabled to protect the path
- 1586__kmem_cache_free()
(See Section H.3.3.2) will free the object to the per-CPU
cache for the SMP case and to the global pool in the normal case
- 1587Re-enable interrupts
H.3.3.2 Function: __kmem_cache_free (UP Case)
Source: mm/slab.c
This covers what the function looks like in the UP case. Clearly, it simply
releases the object to the slab.
1493 static inline void __kmem_cache_free (kmem_cache_t *cachep,
void* objp)
1494 {
1517 kmem_cache_free_one(cachep, objp);
1519 }
H.3.3.3 Function: __kmem_cache_free (SMP Case)
Source: mm/slab.c
This case is slightly more interesting. In this case, the object is released
to the per-cpu cache if it is available.
1493 static inline void __kmem_cache_free (kmem_cache_t *cachep,
void* objp)
1494 {
1496 cpucache_t *cc = cc_data(cachep);
1497
1498 CHECK_PAGE(virt_to_page(objp));
1499 if (cc) {
1500 int batchcount;
1501 if (cc->avail < cc->limit) {
1502 STATS_INC_FREEHIT(cachep);
1503 cc_entry(cc)[cc->avail++] = objp;
1504 return;
1505 }
1506 STATS_INC_FREEMISS(cachep);
1507 batchcount = cachep->batchcount;
1508 cc->avail -= batchcount;
1509 free_block(cachep,
1510 &cc_entry(cc)[cc->avail],batchcount);
1511 cc_entry(cc)[cc->avail++] = objp;
1512 return;
1513 } else {
1514 free_block(cachep, &objp, 1);
1515 }
1519 }
- 1496Get the data for this per CPU cache (See Section 8.5.1)
- 1498Make sure the page is a slab page
- 1499-1513If a per-CPU cache is available, try to use it. This is not
always available. During cache destruction for instance, the per CPU caches
are already gone
- 1501-1505If the number of objects available in the per CPU cache is below
the limit, then add the object to the free list and return
- 1506Update statistics if enabled
- 1507The pool has overflowed, so batchcount number of objects are going
to be freed to the global pool
- 1508Update the number of available (avail) objects
- 1509-1510Free a block of objects to the global cache
- 1511Free the requested object and place it on the per CPU pool
- 1513If the per-CPU cache is not available, then free this object to
the global pool
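The per-CPU structures used here are, approximately, the following from 2.4
mm/slab.c (quoted from memory); the array of free object pointers is stored
directly after the small header, which is what cc_entry() relies on.

typedef struct cpucache_s {
        unsigned int avail;     /* object pointers currently held */
        unsigned int limit;     /* maximum held before a batch is flushed */
} cpucache_t;

#define cc_entry(cpucache) ((void **)(((cpucache_t *)(cpucache)) + 1))
#define cc_data(cachep)    ((cachep)->cpudata[smp_processor_id()])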
H.3.3.4 Function: kmem_cache_free_one
Source: mm/slab.c
1414 static inline void kmem_cache_free_one(kmem_cache_t *cachep,
void *objp)
1415 {
1416 slab_t* slabp;
1417
1418 CHECK_PAGE(virt_to_page(objp));
1425 slabp = GET_PAGE_SLAB(virt_to_page(objp));
1426
1427 #if DEBUG
1428 if (cachep->flags & SLAB_DEBUG_INITIAL)
1433 cachep->ctor(objp, cachep,
SLAB_CTOR_CONSTRUCTOR|SLAB_CTOR_VERIFY);
1434
1435 if (cachep->flags & SLAB_RED_ZONE) {
1436 objp -= BYTES_PER_WORD;
1437 if (xchg((unsigned long *)objp, RED_MAGIC1) !=
RED_MAGIC2)
1438 BUG();
1440 if (xchg((unsigned long *)(objp+cachep->objsize -
1441 BYTES_PER_WORD), RED_MAGIC1) !=
RED_MAGIC2)
1443 BUG();
1444 }
1445 if (cachep->flags & SLAB_POISON)
1446 kmem_poison_obj(cachep, objp);
1447 if (kmem_extra_free_checks(cachep, slabp, objp))
1448 return;
1449 #endif
1450 {
1451 unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;
1452
1453 slab_bufctl(slabp)[objnr] = slabp->free;
1454 slabp->free = objnr;
1455 }
1456 STATS_DEC_ACTIVE(cachep);
1457
1459 {
1460 int inuse = slabp->inuse;
1461 if (unlikely(!--slabp->inuse)) {
1462 /* Was partial or full, now empty. */
1463 list_del(&slabp->list);
1464 list_add(&slabp->list, &cachep->slabs_free);
1465 } else if (unlikely(inuse == cachep->num)) {
1466 /* Was full. */
1467 list_del(&slabp->list);
1468 list_add(&slabp->list, &cachep->slabs_partial);
1469 }
1470 }
1471 }
- 1418Make sure the page is a slab page
- 1425Get the slab descriptor for the page
- 1427-1449Debugging material. Discussed at end of section
- 1451Calculate the index of the object being freed
- 1454As this object is now free, update the bufctl to reflect that
- 1456If statistics are enabled, decrease the number of active objects in
the cache
- 1461-1464If inuse reaches 0, the slab is free and is moved to
the slabs_free list
- 1465-1468If the number in use equals the number of objects in a slab,
it is full so move it to the slabs_full list
- 1471End of function
- 1428-1433If SLAB_DEBUG_INITIAL is set, the constructor
is called to verify the object is in an initialised state
- 1435-1444Verify the red marks at either end of the object are still
there. This will check for writes beyond the boundaries of the object and for
double frees
- 1445-1446Poison the freed object with a known pattern
- 1447-1448This function will confirm the object is a part of this
slab and cache. It will then check the free list (bufctl) to make
sure this is not a double free
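Lines 1451-1454 above treat the slab's bufctl array as a stack of free object indices. The following standalone sketch shows the same push operation; the type and struct names here are simplified stand-ins, not the kernel's.

#include <stddef.h>

typedef unsigned int bufctl_t;

struct mini_slab {
        void     *s_mem;      /* address of the first object on the slab */
        bufctl_t  free;       /* index of the first free object */
        bufctl_t  bufctl[];   /* one entry per object, chaining the free list */
};

/* Push a freed object onto the slab's free list, as lines 1451-1454 do. */
static void slab_push_free(struct mini_slab *slabp, void *objp, size_t objsize)
{
        unsigned int objnr = ((char *)objp - (char *)slabp->s_mem) / objsize;

        slabp->bufctl[objnr] = slabp->free;   /* chain to the old head */
        slabp->free = objnr;                  /* freed object becomes the head */
}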
H.3.3.5 Function: free_block
Source: mm/slab.c
This function is only used in the SMP case when the per-CPU cache gets too full. It frees a batch of objects to the global pool in bulk.
1481 static void free_block (kmem_cache_t* cachep, void** objpp,
int len)
1482 {
1483 spin_lock(&cachep->spinlock);
1484 __free_block(cachep, objpp, len);
1485 spin_unlock(&cachep->spinlock);
1486 }
- 1481The parameters are;
- cachep The cache that objects are being freed from
- objpp Pointer to the first object to free
- len The number of objects to free
- 1483Acquire the cache descriptor spinlock
- 1484__free_block()(See Section H.3.3.6) performs the actual task of freeing up each of the objects
- 1485Release the lock
H.3.3.6 Function: __free_block
Source: mm/slab.c
This function is responsible for freeing each of the objects in
the per-CPU array objpp.
1474 static inline void __free_block (kmem_cache_t* cachep,
1475 void** objpp, int len)
1476 {
1477 for ( ; len > 0; len--, objpp++)
1478 kmem_cache_free_one(cachep, *objpp);
1479 }
- 1474The parameters are the cachep the objects belong to,
the list of objects(objpp) and the number of objects to free
(len)
- 1477Loop len number of times
- 1478Free an object from the array
H.4 Sizes Cache
H.4.1 Initialising the Sizes Cache
H.4.1.1 Function: kmem_cache_sizes_init
Source: mm/slab.c
This function is responsible for creating pairs of caches for small memory
buffers suitable for either normal or DMA memory.
436 void __init kmem_cache_sizes_init(void)
437 {
438 cache_sizes_t *sizes = cache_sizes;
439 char name[20];
440
444 if (num_physpages > (32 << 20) >> PAGE_SHIFT)
445 slab_break_gfp_order = BREAK_GFP_ORDER_HI;
446 do {
452 snprintf(name, sizeof(name), "size-%Zd",
sizes->cs_size);
453 if (!(sizes->cs_cachep =
454 kmem_cache_create(name, sizes->cs_size,
455 0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
456 BUG();
457 }
458
460 if (!(OFF_SLAB(sizes->cs_cachep))) {
461 offslab_limit = sizes->cs_size-sizeof(slab_t);
462 offslab_limit /= 2;
463 }
464 snprintf(name, sizeof(name), "size-%Zd(DMA)",
sizes->cs_size);
465 sizes->cs_dmacachep = kmem_cache_create(name,
sizes->cs_size, 0,
466 SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN,
NULL, NULL);
467 if (!sizes->cs_dmacachep)
468 BUG();
469 sizes++;
470 } while (sizes->cs_size);
471 }
- 438Get a pointer to the cache_sizes array
- 439The human readable name of the cache. It should be sized CACHE_NAMELEN, which is defined to be 20 bytes long
- 444-445slab_break_gfp_order determines the maximum number of pages a slab may use, unless 0 objects would otherwise fit on the slab. It is statically initialised to BREAK_GFP_ORDER_LO (1). This check sees if more than 32MiB of memory is available and, if it is, allows BREAK_GFP_ORDER_HI pages to be used because internal fragmentation is more acceptable when more memory is available.
- 446-470Create two caches for each size of memory allocation needed
- 452Store the human readable cache name in name
- 453-454Create the cache, aligned to the L1 cache
- 460-463Calculate the off-slab bufctl limit which determines the number of objects that can be stored on a slab when the slab descriptor is kept off-slab.
- 464Build the human readable name for the equivalent cache suitable for DMA use
- 465-466Create the cache, aligned to the L1 cache and suitable for DMA use
- 467If the cache failed to allocate, it is a bug. If memory is unavailable this early, the machine will not boot
- 469Move to the next element in the cache_sizes array
- 470The array is terminated with a 0 as the last element
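With 4KiB pages, for instance, this loop typically produces caches named size-32 through size-131072 along with their size-N(DMA) twins; the precise set of sizes comes from the cache_sizes table for the architecture.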
H.4.2.1 Function: kmalloc
Source: mm/slab.c
The call graph for this function is shown in Figure 8.16.
1555 void * kmalloc (size_t size, int flags)
1556 {
1557 cache_sizes_t *csizep = cache_sizes;
1558
1559 for (; csizep->cs_size; csizep++) {
1560 if (size > csizep->cs_size)
1561 continue;
1562 return __kmem_cache_alloc(flags & GFP_DMA ?
1563 csizep->cs_dmacachep :
csizep->cs_cachep, flags);
1564 }
1565 return NULL;
1566 }
- 1557cache_sizes is the array of caches for each size
(See Section 8.4)
- 1559-1564Starting
with the smallest cache, examine the size of each
cache until one large enough to satisfy the request is found
- 1562If the allocation is for use with DMA, allocate an object from
cs_dmacachep else use the cs_cachep
- 1565If a sizes cache of sufficient size was not available or an object
could not be allocated, return failure
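For instance, a request for 100 bytes skips the 32-byte and 64-byte caches and is satisfied from the 128-byte cache, so requests are effectively rounded up to the next cache size (these figures assume the usual cache_sizes table for 4KiB pages).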
H.4.3.1 Function: kfree
Source: mm/slab.c
The call graph for this function is shown in Figure 8.17. It is
worth noting that the work this function does is almost identical to the
function kmem_cache_free() with debugging enabled (See Section
H.3.3.1).
1597 void kfree (const void *objp)
1598 {
1599 kmem_cache_t *c;
1600 unsigned long flags;
1601
1602 if (!objp)
1603 return;
1604 local_irq_save(flags);
1605 CHECK_PAGE(virt_to_page(objp));
1606 c = GET_PAGE_CACHE(virt_to_page(objp));
1607 __kmem_cache_free(c, (void*)objp);
1608 local_irq_restore(flags);
1609 }
- 1602Return if the pointer is NULL. This is possible if a caller
used kmalloc() and had a catch-all failure routine which called
kfree() immediately
- 1604Disable interrupts
- 1605Make sure the page this object is in is a slab page
- 1606Get the cache this pointer belongs to (See Section 8.2)
- 1607Free the memory object
- 1608Re-enable interrupts
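As a usage illustration, a caller simply allocates from whichever sizes cache is large enough and releases the buffer with kfree(). The structure and function below are hypothetical, not from the kernel source; only kmalloc(), kfree() and the GFP flags are the real interfaces.

#include <linux/slab.h>
#include <linux/errno.h>

struct io_request {            /* hypothetical caller structure */
        int  unit;
        char buffer[200];
};

static int queue_io_request(int unit)
{
        struct io_request *req;

        /* Rounded up to the nearest sizes cache, typically size-256 here */
        req = kmalloc(sizeof(struct io_request), GFP_KERNEL);
        if (!req)
                return -ENOMEM;
        req->unit = unit;
        /* ... submit the request here ... */
        kfree(req);
        return 0;
}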
H.5 Per-CPU Object Cache
The structure of the Per-CPU object cache and how objects are added or removed
from them is covered in detail in Sections 8.5.1 and 8.5.2.
H.5.1 Enabling Per-CPU Caches
H.5.1.1 Function: enable_all_cpucaches
Source: mm/slab.c
Figure H.1: Call Graph: enable_all_cpucaches()
This function locks the cache chain and enables the cpucache for every cache.
This is important after the cache_cache and sizes cache have
been enabled.
1714 static void enable_all_cpucaches (void)
1715 {
1716 struct list_head* p;
1717
1718 down(&cache_chain_sem);
1719
1720 p = &cache_cache.next;
1721 do {
1722 kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next);
1723
1724 enable_cpucache(cachep);
1725 p = cachep->next.next;
1726 } while (p != &cache_cache.next);
1727
1728 up(&cache_chain_sem);
1729 }
- 1718Obtain the semaphore to the cache chain
- 1720Start p at the head of the cache chain (&cache_cache.next)
- 1721-1726Cycle through the whole chain
- 1722Get a cache from the chain. Note that because p starts at &cache_cache.next, the first cache returned by list_entry() is cache_cache itself, even though it benefits little from a cpucache as it is so rarely used
- 1724Enable the cpucache
- 1725Move to the next cache on the chain
- 1728Release the cache chain semaphore
H.5.1.2 Function: enable_cpucache
Source: mm/slab.c
This function calculates what the size of a cpucache should be
based on the size of the objects the cache contains before calling
kmem_tune_cpucache() which does the actual allocation.
1693 static void enable_cpucache (kmem_cache_t *cachep)
1694 {
1695 int err;
1696 int limit;
1697
1699 if (cachep->objsize > PAGE_SIZE)
1700 return;
1701 if (cachep->objsize > 1024)
1702 limit = 60;
1703 else if (cachep->objsize > 256)
1704 limit = 124;
1705 else
1706 limit = 252;
1707
1708 err = kmem_tune_cpucache(cachep, limit, limit/2);
1709 if (err)
1710 printk(KERN_ERR
"enable_cpucache failed for %s, error %d.\n",
1711 cachep->name, -err);
1712 }
- 1699-1700If an object is larger than a page, do not create a per-CPU cache as it would be too expensive
- 1701-1702If an object is larger than 1KiB, limit the cpucache to 60 objects so that the amount of memory pinned in per-CPU caches stays small
- 1703-1706For smaller objects, higher limits are used: 124 objects for objects larger than 256 bytes and 252 objects otherwise
- 1708Allocate the memory for the cpucache, using a batchcount of half the limit
- 1710-1711Print out an error message if the allocation failed
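For example, a cache of 512-byte objects falls into the middle band, so kmem_tune_cpucache() is called with a limit of 124 and a batchcount of 62 (limit/2).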
H.5.1.3 Function: kmem_tune_cpucache
Source: mm/slab.c
This function is responsible for allocating memory for the cpucaches. For
each CPU on the system, kmalloc gives a block of memory large
enough for one cpu cache and fills a ccupdate_struct_t
struct. The function smp_call_function_all_cpus() then calls
do_ccupdate_local() which swaps the new information with the old
information in the cache descriptor.
1639 static int kmem_tune_cpucache (kmem_cache_t* cachep,
int limit, int batchcount)
1640 {
1641 ccupdate_struct_t new;
1642 int i;
1643
1644 /*
1645 * These are admin-provided, so we are more graceful.
1646 */
1647 if (limit < 0)
1648 return -EINVAL;
1649 if (batchcount < 0)
1650 return -EINVAL;
1651 if (batchcount > limit)
1652 return -EINVAL;
1653 if (limit != 0 && !batchcount)
1654 return -EINVAL;
1655
1656 memset(&new.new,0,sizeof(new.new));
1657 if (limit) {
1658 for (i = 0; i< smp_num_cpus; i++) {
1659 cpucache_t* ccnew;
1660
1661 ccnew = kmalloc(sizeof(void*)*limit+
1662 sizeof(cpucache_t),
GFP_KERNEL);
1663 if (!ccnew)
1664 goto oom;
1665 ccnew->limit = limit;
1666 ccnew->avail = 0;
1667 new.new[cpu_logical_map(i)] = ccnew;
1668 }
1669 }
1670 new.cachep = cachep;
1671 spin_lock_irq(&cachep->spinlock);
1672 cachep->batchcount = batchcount;
1673 spin_unlock_irq(&cachep->spinlock);
1674
1675 smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
1676
1677 for (i = 0; i < smp_num_cpus; i++) {
1678 cpucache_t* ccold = new.new[cpu_logical_map(i)];
1679 if (!ccold)
1680 continue;
1681 local_irq_disable();
1682 free_block(cachep, cc_entry(ccold), ccold->avail);
1683 local_irq_enable();
1684 kfree(ccold);
1685 }
1686 return 0;
1687 oom:
1688 for (i--; i >= 0; i--)
1689 kfree(new.new[cpu_logical_map(i)]);
1690 return -ENOMEM;
1691 }
- 1639The parameters of the function are
- cachep The cache this cpucache is being allocated for
- limit The total number of objects that can exist in the cpucache
- batchcount The number of objects to allocate in one batch when the
cpucache is empty
- 1647The number of objects in the cache cannot be negative
- 1649A negative number of objects cannot be allocated in batch
- 1651A batch of objects greater than the limit cannot be allocated
- 1653A batchcount must be provided if the limit is positive
- 1656Zero fill the update struct
- 1657If a limit is provided, allocate memory for the cpucache
- 1658-1668For every CPU, allocate a cpucache
- 1661The amount of memory needed is limit number of pointers
and the size of the cpucache descriptor
- 1663If out of memory, clean up and exit
- 1665-1666Fill in the fields for the cpucache descriptor
- 1667Fill in the information for the ccupdate_struct_t struct
- 1670Tell the ccupdate_struct_t struct what cache is being updated
- 1671-1673Acquire an interrupt safe lock to the cache descriptor and
set its batchcount
- 1675Get each CPU to update its cpucache information for itself. This
swaps the old cpucaches in the cache descriptor with the new ones in
new using do_ccupdate_local()
(See Section H.5.2.2)
- 1677-1685After smp_call_function_all_cpus()
(See Section H.5.2.1), the old cpucaches are in
new. This block of code cycles through them all, frees any objects
in them and deletes the old cpucache
- 1686Return success
- 1688In the event there is no memory, delete all cpucaches that have
been allocated up until this point and return failure
H.5.2 Updating Per-CPU Information
H.5.2.1 Function: smp_call_function_all_cpus
Source: mm/slab.c
This calls the function func() for all CPUs. In the context of
the slab allocator, the function is do_ccupdate_local() and the
argument is ccupdate_struct_t.
859 static void smp_call_function_all_cpus(void (*func) (void *arg),
void *arg)
860 {
861 local_irq_disable();
862 func(arg);
863 local_irq_enable();
864
865 if (smp_call_function(func, arg, 1, 1))
866 BUG();
867 }
- 861-863Disable interrupts locally and call the function for this CPU
- 865For all other CPUs, call the function.
smp_call_function() is an architecture specific function and will
not be discussed further here
H.5.2.2 Function: do_ccupdate_local
Source: mm/slab.c
This function swaps the cpucache information in the cache descriptor with the
information in info for this CPU.
874 static void do_ccupdate_local(void *info)
875 {
876 ccupdate_struct_t *new = (ccupdate_struct_t *)info;
877 cpucache_t *old = cc_data(new->cachep);
878
879 cc_data(new->cachep) = new->new[smp_processor_id()];
880 new->new[smp_processor_id()] = old;
881 }
- 876info is a pointer to the ccupdate_struct_t which was passed in by smp_call_function_all_cpus()(See Section H.5.2.1)
- 877Part of the ccupdate_struct_t is a pointer
to the cache this cpucache belongs to. cc_data() returns the
cpucache_t for this processor
- 879Place the new cpucache in the cache descriptor. cc_data() returns the pointer to the cpucache for this CPU
- 880Replace the pointer in new with the old cpucache so it can be deleted later by the caller of smp_call_function_all_cpus(), kmem_tune_cpucache() for example (a minimal model of this pointer swap follows)
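The essential trick is that the pointers are exchanged rather than copied, so after the swap the update structure holds the old cpucaches for the caller to drain and kfree(). A minimal model of that exchange is shown below; the names are illustrative, not the kernel's.

#define MODEL_NR_CPUS 4                  /* stand-in for smp_num_cpus */

struct mini_update {
        void *new_data[MODEL_NR_CPUS];   /* one prepared cpucache per CPU */
};

/* Swap the live per-CPU pointer with the prepared one for this CPU. */
static void swap_for_cpu(void **live_slot, struct mini_update *upd, int cpu)
{
        void *old = *live_slot;

        *live_slot = upd->new_data[cpu]; /* install the new cpucache */
        upd->new_data[cpu] = old;        /* hand the old one back to the caller */
}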
H.5.3 Draining a Per-CPU Cache
This function is called to drain all objects in a per-CPU cache. It is called when a cache needs to be shrunk so that its slabs can be freed. A slab would not be freeable if one of its objects was held in a per-CPU cache, even though that object is not in use.
H.5.3.1 Function: drain_cpu_caches
Source: mm/slab.c
885 static void drain_cpu_caches(kmem_cache_t *cachep)
886 {
887 ccupdate_struct_t new;
888 int i;
889
890 memset(&new.new,0,sizeof(new.new));
891
892 new.cachep = cachep;
893
894 down(&cache_chain_sem);
895 smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
896
897 for (i = 0; i < smp_num_cpus; i++) {
898 cpucache_t* ccold = new.new[cpu_logical_map(i)];
899 if (!ccold || (ccold->avail == 0))
900 continue;
901 local_irq_disable();
902 free_block(cachep, cc_entry(ccold), ccold->avail);
903 local_irq_enable();
904 ccold->avail = 0;
905 }
906 smp_call_function_all_cpus(do_ccupdate_local, (void *)&new);
907 up(&cache_chain_sem);
908 }
- 890Zero-fill the update structure. No new cpucaches are supplied, so the swap below will pull the existing per-CPU data out of the cache descriptor
- 892Set new.cachep to cachep so that
smp_call_function_all_cpus() knows what cache it is affecting
- 894Acquire the cache chain semaphore
- 895do_ccupdate_local()(See Section H.5.2.2)
swaps the cpucache_t information in the cache descriptor with
the ones in new so they can be altered here
- 897-905For each CPU in the system:
- 898Get the cpucache descriptor for this CPU
- 899If the structure does not exist for some reason or there are no objects available in it, move to the next CPU
- 901Disable interrupts on this processor. It is possible an allocation
from an interrupt handler elsewhere would try to access the per CPU cache
- 902Free the block of objects with free_block()
(See Section H.3.3.5)
- 903Re-enable interrupts
- 904Set avail to 0 to show that no objects are available
- 906The information for each CPU has been updated so call
do_ccupdate_local() (See Section H.5.2.2) for each
CPU to put the information back into the cache descriptor
- 907Release the semaphore for the cache chain
H.6 Slab Allocator Initialisation
H.6.0.2 Function: kmem_cache_init
Source: mm/slab.c
This function will
- Initialise the cache chain linked list
- Initialise a mutex for accessing the cache chain
- Calculate the cache_cache colour
416 void __init kmem_cache_init(void)
417 {
418 size_t left_over;
419
420 init_MUTEX(&cache_chain_sem);
421 INIT_LIST_HEAD(&cache_chain);
422
423 kmem_cache_estimate(0, cache_cache.objsize, 0,
424 &left_over, &cache_cache.num);
425 if (!cache_cache.num)
426 BUG();
427
428 cache_cache.colour = left_over/cache_cache.colour_off;
429 cache_cache.colour_next = 0;
430 }
- 420Initialise the semaphore for accessing the cache chain
- 421Initialise the cache chain linked list
- 423kmem_cache_estimate()(See Section H.1.2.1)
calculates the number of objects and amount of bytes wasted
- 425If even one kmem_cache_t cannot be stored in a page,
there is something seriously wrong
- 428colour is the number of different cache lines that can
be used while still keeping L1 cache alignment
- 429colour_next indicates which line to use next. Start at 0
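For example, if kmem_cache_estimate() reported 160 bytes left over and colour_off is a 32-byte L1 cache line, colour would be 160/32 = 5, so successive cache_cache slabs can begin at five distinct offsets (the figures here are purely illustrative).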
H.7 Interfacing with the Buddy Allocator
H.7.0.3 Function: kmem_getpages
Source: mm/slab.c
This function allocates pages from the buddy allocator for the slab allocator.
486 static inline void * kmem_getpages (kmem_cache_t *cachep,
unsigned long flags)
487 {
488 void *addr;
495 flags |= cachep->gfpflags;
496 addr = (void*) __get_free_pages(flags, cachep->gfporder);
503 return addr;
504 }
- 495Whatever flags were requested for the allocation, append the cache's gfpflags to them. The only flag that may be appended is GFP_DMA, which is set if the cache requires DMA memory
- 496Allocate from the buddy allocator with
__get_free_pages() (See Section F.2.3)
- 503Return the pages or NULL if it failed
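For example, if the cache was created with SLAB_CACHE_DMA, cachep->gfpflags holds GFP_DMA, so a request made with GFP_KERNEL ends up calling __get_free_pages(GFP_KERNEL | GFP_DMA, cachep->gfporder).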
H.7.0.4 Function: kmem_freepages
Source: mm/slab.c
This frees pages for the slab allocator. Before it calls the buddy allocator
API, it will remove the PG_slab bit from the page flags.
507 static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
508 {
509 unsigned long i = (1<<cachep->gfporder);
510 struct page *page = virt_to_page(addr);
511
517 while (i--) {
518 PageClearSlab(page);
519 page++;
520 }
521 free_pages((unsigned long)addr, cachep->gfporder);
522 }
- 509Calculate the number of pages (1<<gfporder) that were used for the original allocation
- 510Get the struct page for the address
- 517-520Clear the PG_slab bit on each page
- 521Free the pages to the buddy allocator with free_pages()
(See Section F.4.1)
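To take a concrete case, a cache with a gfporder of 1 allocated two contiguous pages, so i starts at 2, PG_slab is cleared on both struct pages and the two-page block (8KiB with 4KiB pages) is handed back to free_pages().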