view ols/2011/index.html @ 124:41ca6e4a8b6d default tip

Add Ottawa Linux Symposium 2011 and 2012 index pages.
author Rob Landley <>
date Fri, 26 Jul 2013 15:23:43 -0500

<title>Ottawa Linux Symposium (OLS) papers for 2011</title>

<p>Ottawa Linux Symposium (OLS) Papers for 2011:</p>

<hr><h2><a href="ols2011-gadre.pdf">X-XEN : Huge Page Support in Xen</a> - A.&nbsp;Gadre, K.&nbsp;Kabra, A.&nbsp;Vasani, K.&nbsp;Darak</h2>

<p>Huge pages are memory pages of size 2MB (on x86-PAE and x86_64). The
number of page walks required to translate a virtual address to a
physical 2MB page is lower than the number required to translate a
virtual address to a physical 4kB page. Also, the number
of TLB entries per 2MB chunk of memory is reduced by a factor of 512
compared to 4kB pages. In this way, huge pages improve the performance of
applications that perform memory-intensive operations. In the
context of virtualization, i.e. the Xen hypervisor, we propose a design and
implementation to support huge pages for paravirtualized guest paging

<p>Our design reserves 2MB pages (MFNs) from the domain's committed memory,
as specified in the configuration before the domain boots. The rest of the
memory continues to be used as 4kB pages. Thus the availability of
huge pages is guaranteed, and actual physical huge pages can be provided
to the paravirtualized domain. This increases the performance of
applications hosted on the guest operating system that require huge
page support. This design solves the problem of finding an available 2MB
chunk in the guest's (virtualized) physical address space as well as in
Xen's physical address space, where one might otherwise be unavailable
due to fragmentation.

<hr><h2><a href="ols2011-lim.pdf">NPTL Optimization for Lightweight Embedded Devices</a> - Geunsik Lim, Hyun-Jin Choi, Sang-Bum Suh</h2>


<p>One of the main changes in the current Linux kernel is that the Linux thread model moved from LinuxThreads to NPTL for scalability and high performance. Each user-space thread is allocated one kernel thread (a 1:1 mapping model), which makes thread creation and termination fast. Managing and scheduling each thread within a single process takes advantage of multiprocessor hardware: because the kernel manages threads directly, each thread can be scheduled individually, and the threads of a process can run simultaneously on different CPUs. In addition, system services are not delayed while a thread is blocked. In other words, even if one thread makes a blocking system call, the other threads are not blocked.</p>

<p>NPTL's features in Linux 2.6 dramatically improved server and desktop performance over Linux 2.4. However, embedded systems such as DTVs and mobile phones are extremely limited in physical resources such as CPU and memory, and NPTL lacks some effective and suitable features for embedded environments. Examples include the thread's stack size, enforced or arbitrary thread priority manipulation in a non-preemptive kernel, thread naming to identify each thread's essential role, and so on.</p>

<p>This paper describes a lightweight NPTL (Native POSIX Threads Library), optimized to run effectively on embedded systems.

<hr><h2><a href="ols2011-suzaki.pdf">Analysis of Disk Access Patterns on File Systems for Content Addressable Storage</a> - Kuniyasu Suzaki, Kengo Iijima, Toshiki Yagi, Cyrille Artho</h2>


<p>CAS (Content Addressable Storage) is a virtual disk with deduplication, which merges same-content chunks and reduces the consumption of physical storage. The performance of CAS depends on the allocation strategy of the individual file system and its access patterns (size, frequency, and locality of reference), since the effect of merging depends on the size of the chunk (access unit) used in deduplication.
We propose a method to evaluate the affinity between a file system and CAS, which compares the degree of deduplication achieved by storing many same-content files throughout a file system. The results show the affinity and semantic gap between CAS and the file systems tested: ext3, ext4, XFS, JFS and ReiserFS (the bootable file systems), NILFS, btrfs, FAT32 and NTFS.</p>

<p>We also measured disk accesses through the five bootable file systems at installation (Ubuntu 10.10) and at boot time, and found a variety of access patterns, even when the same contents were installed. The results indicate that the five file systems scatter data from a macroscopic view, but keep data blocks contiguous from a microscopic view.

<hr><h2><a href="ols2011-lissy1.pdf">Verifications around the Linux kernel</a> - A.&nbsp;Lissy, S.&nbsp;Lauri&egrave;re, P.&nbsp;Martineau</h2>

<p>Ensuring software safety has always been necessary, whether you are designing an on-board aircraft computer or a next-gen mobile phone, even if the purpose of the
verification is not the same in both cases. We present the current state
of the art in verification of the Linux kernel and, by extension,
describe what has been done on other kernels. We conclude with future
needs that must be addressed, and some avenues for improvement that should be
<hr><h2><a href="ols2011-lissy2.pdf">Faults in Patched Kernel</a> - A.&nbsp;Lissy, S.&nbsp;Lauri&egrave;re, P.&nbsp;Martineau</h2>

<p>Tools such as Coccinelle, Sparse, and Undertaker have been designed to
detect faults in the Linux kernel, and studies of their results on the
vanilla tree have been published. We are interested in a specific point:
Linux distributions patch the kernel (like other software), and since those patches
might target less common use cases, they may receive a lower level of quality
assurance and have fewer of their bugs found.
So we ask: is there any difference between upstream and
distribution kernels in terms of faults? We present an existing tool,
Undertaker, and detail a methodology for reliably counting bugs in patched and
non-patched kernel source code, applied to vanilla and distribution
kernels (Debian, Mandriva, openSUSE). We show that the difference is negligible,
but in favor of patched kernels.

<hr><h2><a href="ols2011-mitake.pdf">Towards Co-existing of Linux and Real-Time OSes</a> - H.&nbsp;Mitake, T-H.&nbsp;Lin, H.&nbsp;Shimada, Y.&nbsp;Kinebuchi, N.&nbsp;Li, T.&nbsp;Nakajima</h2>

<p>The capability of real-time resource management in the Linux kernel is
dramatically improving due to the effective contribution of the real-time Linux
community. However, to develop commercial products cost-effectively,
it must be possible to re-use existing real-time applications from
other real-time OSes whose OS API differs significantly from the POSIX
interface. A virtual machine monitor that executes multiple operating
systems simultaneously is a promising solution, but existing virtual
machine monitors such as Xen and KVM are hard to use for embedded
systems due to their complexity and throughput-oriented designs.
In this paper, we introduce a lightweight processor abstraction layer
named SPUMONE. SPUMONE provides virtual CPUs (vCPUs) for respective guest OSes,
and schedules them according to their priorities. In a typical case,
SPUMONE schedules Linux with a low priority and an RTOS with a high priority. The
important features of SPUMONE are the exploitation of an interrupt
prioritizing mechanism and a vCPU migration mechanism that
improves real-time capabilities in order to make the virtualization layer
more suitable for embedded systems. We also discuss why the
traditional virtual machine monitor design is not appropriate for
embedded systems, and how the features of SPUMONE allow us to design
modern complex embedded systems with less effort.

<hr><h2><a href="ols2011-vasavada.pdf">Comparing different approaches for Incremental Checkpointing: The Showdown</a> - M.&nbsp;Vasavada, F.&nbsp;Mueller, P.&nbsp;Hargrove, E.&nbsp;Roman</h2>


<p>The rapid increase in the number of cores and nodes in high
performance computing (HPC) has made petascale computing a reality,
with exascale on the horizon. Harnessing such computational power
presents a challenge, because system reliability deteriorates as the
number of building components, each of a given single-unit
reliability, increases. Today's high-end HPC installations require applications
to perform checkpointing if they want to run at scale, so that failures
during runs lasting hours or days can be dealt with by restarting from
the last checkpoint. Yet such checkpointing results in high overheads
due to the often simultaneous writes from all nodes to the parallel file
system (PFS), which reduces the productivity of such systems in terms of
throughput computing. Recent work on checkpoint/restart (C/R) has shown
that incremental C/R techniques can reduce the amount of data written
at checkpoints and thus the overall C/R overhead and impact on the PFS.</p>

<p>The contributions of this work are twofold. First, it presents the
design and implementation of two memory management schemes that enable
incremental checkpointing. We describe unique approaches to
incremental checkpointing that do not require kernel patching in one
case and only require minimal kernel extensions in the other
case. The work is carried out within the latest Berkeley Labs
Checkpoint Restart (BLCR) as part of an upcoming release. Second, we
evaluate the two schemes in terms of their system overhead for
single-node microbenchmarks and multi-node cluster workloads. In
short, this work is the final showdown between page write bit (WB) protection
and dirty bit (DB) page tracking as a hardware means to support incremental
checkpointing. Our results show savings of the DB approach over the WB approach in almost
all the tests. Further, DB has the potential for a significant reduction
in kernel activity, which is of utmost relevance for proactive fault
tolerance, where an imminent fault can be circumvented if DB-based live
migration moves a process away from hardware about to fail.

<hr><h2><a href="ols2011-clavis.pdf">User-level scheduling on NUMA multicore systems under Linux</a> - Sergey Blagodurov, Alexandra Fedorova</h2>

<p>The problem of scheduling on multicore systems remains one of the hottest and most challenging topics in systems research. The introduction of non-uniform memory access (NUMA) multicore architectures further complicates this problem, as on NUMA systems the scheduler needs to consider not only the placement of threads on cores, but also the placement of memory. Hardware performance counters and hardware-supported instruction sampling, available on major CPU models, can help tackle the scheduling problem, as they provide a wide variety of potentially useful information characterizing system behavior. The challenge, however, is to determine what information from counters is most useful for scheduling and how to properly obtain it at user level.</p>

<p>In this paper we provide a brief overview of user-level scheduling techniques in Linux, discuss the types of hardware counter information that are most useful for scheduling, and demonstrate how this information can be used in an online user-level scheduler. The Clavis scheduler, created as a result of this research, is released as an open source project.

<hr><h2><a href="ols2011-vallee.pdf">Management of Virtual Large-scale High-performance Computing Systems</a> - Geoffroy Vall&eacute;e, Thomas Naughton, Stephen L.&nbsp;Scott</h2>

<p>Linux is widely used on high-performance computing (HPC) systems,
from commodity clusters to Cray supercomputers (which run the Cray Linux
Environment). These platforms primarily differ in their system configuration:
some only use SSH to access compute nodes, whereas others employ full resource
management systems (e.g., Torque and ALPS on Cray XT systems).
Furthermore, the latest improvements in system-level virtualization techniques, 
such as hardware support, virtual machine migration for system resilience
purposes, and reduction of virtualization overheads, enable the usage of 
virtual machines on HPC platforms.</p>

<p>Currently, tools for the management of virtual machines in the
context of HPC systems are still quite basic, and often tightly coupled to
the target platform.
In this document, we present a new system tool for the management of virtual
machines in the context of large-scale HPC systems, including a run-time
system and the support for all major virtualization solutions.
The proposed solution is based on two key aspects.
First, Virtual System Environments (VSE), introduced in a previous
study, provide a flexible method to define the software environment that will
be used within virtual machines. Secondly, we propose a new system run-time for
the management and deployment of VSEs on HPC systems, which supports a wide
range of system configurations. For instance, this generic run-time can
interact with resource managers such as Torque for the management of virtual
machines.</p>

<p>Finally, the proposed solution provides appropriate abstractions to enable
use with a variety of virtualization solutions on different Linux HPC
platforms, including Xen, KVM and the HPC-oriented Palacios.</p>

<hr><h2><a href="ols2011-grekhov.pdf">The Easy-Portable Method of Illegal Memory Access Errors Detection for Embedded Computing Systems</a> - Ekaterina Gorelkina, Alexey Gerenkov, Sergey Grekhov</h2>


<p>Applications on embedded systems are becoming more and more complex and require
more effective debugging facilities, particularly for detecting memory access
errors. Existing tools usually depend strongly on the processor architecture,
which makes them difficult to use given the wide variety of CPU types. This paper
suggests an easily portable solution to the problem of detecting heap memory
overflow errors. The proposed technique substitutes the standard allocation functions
in order to create additional memory regions (so-called <i>red zones</i>) for detecting overflows,
and intercepts the page fault mechanism to track memory accesses. Tests have
shown that this approach detects illegal memory access errors in the heap with
sufficient precision. Moreover, its processor-dependent part is small, which makes the
method easily portable across embedded systems with their wide variety of processor types.

<hr><h2><a href="ols2011-giraldeau.pdf">Recovering System Metrics from Kernel Trace</a> - F.&nbsp;Giraldeau, J.&nbsp;Desfossez, D.&nbsp;Goulet, M.&nbsp;Dagenais, M.&nbsp;Desnoyers</h2>

<p>Important Linux kernel subsystems are statically instrumented with tracepoints, which enables the gathering of detailed information about a running system, such as process scheduling, system calls and memory management. Each time a tracepoint is encountered, an event is generated and can be recorded to disk for offline analysis. Kernel tracing provides system-wide instrumentation that has low performance impact, suitable for tracing online systems in order to debug hard-to-reproduce errors or analyze the performance.</p>

<p>Despite these benefits, a kernel trace may be difficult to analyze due to the large number of events. Moreover, trace events expose low-level behavior of the kernel that requires deep understanding of kernel internals to analyze. In many cases, the meaning of an event may depend on previous events. To get valuable information from a kernel trace, fast and reliable analysis tools are required.</p>

<p>In this paper, we present the trace analyses required to provide familiar and meaningful metrics to system administrators and software developers, including CPU, disk, file and network usage. We present an open source prototype implementation that performs these analyses with the LTTng tracer. It leverages kernel traces for performance optimization and debugging.

<hr><h2><a href="ols2011-masters.pdf">State of the kernel</a> - John C.&nbsp;Masters</h2>

<p>Slides from the talk follow.

<hr><h2><a href="ols2011-riker.pdf">Android Development</a> - Tim&nbsp;Riker</h2>

<p>Slides from the talk follow.