
<html>
<title>Ottawa Linux Symposium (OLS) papers for 2012</title>
<body>

<p>Ottawa Linux Symposium (OLS) Papers for 2012:</p>

<hr><h2><a href="ols2012-komu.pdf">Sockets and Beyond: Assessing the Source Code of Network Applications</a> - M.&nbsp;Komu, S.&nbsp;Varjonen, A.&nbsp;Gurtov, S.&nbsp;Tarkoma</h2>

<p>Network applications are typically developed with frameworks that hide
the details of low-level networking. The motivation is to allow
developers to focus on application-specific logic rather than
low-level mechanics of networking, such as name resolution, reliability,
asynchronous processing and quality of service. In this article, we
characterize statistically how open-source applications use the
Sockets API and identify a number of requirements for network applications based on our
analysis. The analysis considers five fundamental questions: naming
with end-host identifiers, name resolution, multiple end-host
identifiers, multiple transport protocols and
security. We discuss the significance of these findings for
network application frameworks and their development. As two of our
key contributions, we present generic solutions for a problem with OpenSSL
initialization in C-based applications and for a multihoming issue with
UDP that affects all four of the analyzed frameworks.
</p>
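<p>As a rough illustration of the naming and name-resolution concerns discussed above (not code from the paper), the following Python sketch shows the protocol-agnostic connection pattern the Sockets API offers for handling multiple end-host identifiers and address families; the host name and port are placeholders.</p>

<pre>
import socket

def connect_any(host, port):
    """Try each resolved address until one connects.

    AF_UNSPEC lets getaddrinfo return both IPv4 and IPv6 end-host
    identifiers, so the caller does not hard-code an address family.
    """
    last_err = None
    for family, socktype, proto, _canon, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.connect(addr)
            return sock
        except OSError as err:
            last_err = err
    raise last_err or OSError("name resolution returned no usable address")

if __name__ == "__main__":
    conn = connect_any("example.org", 80)   # placeholder target
    print("connected using address family", conn.family)
    conn.close()
</pre>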

<hr><h2><a href="ols2012-lim.pdf">Load-Balancing for Improving User Responsiveness on Multicore Embedded Systems</a> - Geunsik Lim, Changwoo Min, YoungIk Eom</h2>


<p>Most commercial embedded devices have been deployed with a single processor architecture. The code size and complexity of applications running on embedded devices are rapidly increasing due to the emergence of application business models such as the Google Play Store and the Apple App Store. As a result, high-performance multicore CPUs have become a major trend in the embedded market as well as in the personal computer market. </p>

<p>Due to this trend, many device manufacturers have been able to adopt more attractive user interfaces and high-performance applications for better user experiences on multicore systems.</p>

<p>In this paper, we describe how to improve real-time performance by reducing the user waiting time on multicore systems that use a partitioned per-CPU run queue scheduling technique. Rather than focusing on a naive load-balancing scheme for equally balanced CPU usage, our approach tries to minimize the cost of task migration by considering the importance level of running tasks and to optimize per-CPU utilization on multicore embedded systems.</p>

<p>Consequently, our approach improves real-time characteristics such as cache efficiency, user responsiveness, and latency. Experimental results under heavy background stress show that our approach reduces the average scheduling latency of an urgent task by a factor of 2.3.
</p>
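<p>The paper's changes live inside the kernel scheduler; as a user-space illustration of the per-CPU placement that partitioned run queues imply, the Python sketch below pins tasks to specific CPUs with Linux's affinity interface. The pid value is a placeholder, and this is not the authors' load-balancing scheme.</p>

<pre>
import os

URGENT_PID = 1234                 # placeholder pid of an "urgent" task

# Keep this process on CPU 0 and try to keep the urgent task on CPU 1,
# so each stays on its own per-CPU run queue instead of migrating.
os.sched_setaffinity(0, {0})      # pid 0 means the calling process
print("this process now runs on CPUs:", os.sched_getaffinity(0))

try:
    os.sched_setaffinity(URGENT_PID, {1})
except (ProcessLookupError, PermissionError) as err:
    print("could not pin placeholder pid:", err)
</pre>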

<hr><h2><a href="ols2012-mansoor.pdf">Experiences with Power Management Enabling on the Intel Medfield Phone</a> -  R.&nbsp;Muralidhar, H.&nbsp;Seshadri, V.&nbsp;Bhimarao, V.&nbsp;Rudramuni, I.&nbsp;Mansoor, S.&nbsp;Thomas, B.&nbsp;K.&nbsp;Veera, Y.&nbsp;Singh, S.&nbsp;Ramachandra</h2>

<p>Medfield is Intel's first smartphone SOC platform built on a 32&nbsp;nm process, and the platform implements several key innovations in hardware and software to accomplish aggressive power management. It has multiple logical and physical power partitions that enable software/firmware to selectively control power to functional components, and to the entire platform as well, with very low latencies. </p>

<p>This paper describes the architecture, implementation and key experiences from enabling power management on the Intel Medfield phone platform. We describe how the standard Linux and Android power management architectures integrate with the capabilities provided by the platform to deliver aggressive power management. We also present some of the key lessons from our power management experience that we believe will be useful to other Linux/Android-based platforms.
</p>
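<p>The platform-specific power partitions described in the paper are reached through firmware interfaces, but the standard Linux suspend path they integrate with is visible from user space. A minimal Python sketch, assuming a Linux system with the usual /sys/power files:</p>

<pre>
# Inspect the generic Linux power-management sysfs interface that the
# Android and upstream suspend paths are built on.  Entering a sleep
# state requires root, so it is shown commented out.

def read_sysfs(path):
    with open(path) as f:
        return f.read().strip()

print("supported sleep states:", read_sysfs("/sys/power/state"))
print("current wakeup count:", read_sysfs("/sys/power/wakeup_count"))

# To actually suspend to RAM (root only):
# with open("/sys/power/state", "w") as f:
#     f.write("mem")
</pre>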

<hr><h2><a href="ols2012-adepoutovitch.pdf">File Systems: More Cooperations - Less Integration.</a> - A.&nbsp;Depoutovitch, A.&nbsp;Warkentin</h2>

<p>Conventionally, file systems manage the storage space available to user programs and provide it through the file interface.
Information about the physical location of used and unused space is hidden from users. This makes the file system's free space unavailable to other storage stack kernel components, whether for performance reasons or to avoid layering violations. This forces file system architects to integrate additional functionality, such as snapshotting and volume management, inside file systems, increasing their complexity. </p>

<p>We propose a simple and easy-to-implement file system interface that allows different software components to efficiently share free storage space with a file system at a block level. We demonstrate the benefits of the new interface by optimizing an existing volume manager to store snapshot data in the file system free space, instead of requiring the space to be reserved in advance, which would make it unavailable for other uses.
</p>
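<p>The block-level free-space interface proposed in the paper is not what mainline Linux ships; the closest existing analogue is the FITRIM ioctl, by which a mounted file system reports its unused space downward to the block device. A minimal Python sketch, assuming root privileges, discard support, and a placeholder mount point:</p>

<pre>
import fcntl
import os
import struct

# FITRIM is defined in linux/fs.h as _IOWR('X', 121, struct fstrim_range).
FITRIM = 0xC0185879

def trim_free_space(mount_point="/mnt/data"):   # placeholder mount point
    """Ask the file system to pass its unused space down to the device."""
    # struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
    rng = bytearray(struct.pack("QQQ", 0, 2**64 - 1, 0))
    fd = os.open(mount_point, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITRIM, rng, True)       # kernel updates len in place
    finally:
        os.close(fd)
    _start, trimmed, _minlen = struct.unpack("QQQ", rng)
    return trimmed                               # bytes of free space trimmed

if __name__ == "__main__":
    print("free space reported to the device:", trim_free_space(), "bytes")
</pre>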

<hr><h2><a href="ols2012-warkentin.pdf">``Now if we could get a solution to the home directory dotfile hell!''</a> - A.&nbsp;Warkentin</h2>

<p>Unix environments have traditionally consisted of
multi-user and diverse multi-computer configurations, backed by
expensive network-attached storage.
The recent growth and proliferation of desktop- and single-machine-centric
GUI environments, however, has made it very difficult to share
a network-mounted home directory
across multiple machines. This is particularly noticeable in the
context of concurrent graphical logins or logins
into systems with a different installed software base. The typical
offenders are the &ldquo;modern&rdquo; bits of software such as
desktop environments (e.g.&nbsp;GNOME), services (dbus, PulseAudio), and
applications (Firefox),
which all abuse dotfiles.</p>

<p>Frequent changes to configuration
formats prevent the same set of configuration files from being easily used
across even close versions of the same software. And whereas dotfiles
historically contained read-once configuration,
they are now misused for runtime lock files and writeable configuration databases,
with no effort to guarantee correctness across concurrent accesses and
differently-versioned components. Running such software concurrently, across different
machines with a network mounted home directory, results in corruption, data loss, misbehavior
and deadlock, as the majority of configuration is system-, machine-, and installation-specific,
rather than user-specific.</p>

<p>This paper explores a simpler alternative to rewriting all
existing broken software, namely, implementing separate host-specific profiles via
filesystem redirection of dotfile accesses. Several approaches are
discussed and the presented solution, the
Host Profile File System, although Linux-centric, can be easily
adapted to other similar environments such as OS X, Solaris and the
BSDs.
</p>
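<p>As a crude user-space illustration of the per-host profile idea (not the Host Profile File System itself, which redirects accesses transparently at the file system level), the Python sketch below points one dotfile at a host-specific copy; the dotfile name is a placeholder.</p>

<pre>
import os
import socket

DOTFILE = os.path.expanduser("~/.exampleconfig")      # placeholder dotfile

def use_host_profile(dotfile=DOTFILE):
    """Make the dotfile a symlink to a copy private to this host."""
    host_copy = "{}.{}".format(dotfile, socket.gethostname())
    if not os.path.exists(host_copy):
        open(host_copy, "a").close()          # start with an empty per-host config
    if os.path.islink(dotfile) or not os.path.exists(dotfile):
        if os.path.lexists(dotfile):
            os.remove(dotfile)                # replace a stale symlink
        os.symlink(host_copy, dotfile)        # never clobber a real file

use_host_profile()
</pre>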

<hr><h2><a href="ols2012-subramanian.pdf">Improving RAID1 Synchronization Performance Using File System Metadata</a> - H.&nbsp;Subramanian, A.&nbsp;Warkentin, A.&nbsp;Depoutovitch</h2>

<p>Linux MD software RAID1 is used ubiquitously by end users,
corporations and as a core technology component of other software
products and solutions, such as the VMware vSphere
Appliance (vSA). MD RAID1 mode provides data persistence and
availability in the face of hard drive failures by maintaining two or
more copies (mirrors) of the same data. vSA makes data available
even in the event of a failure of other hardware and software
components, e.g.&nbsp;storage adapter, network, or the entire
vSphere server. For recovery from a failure, MD has a mechanism for change
tracking and mirror synchronization.</p>

<p>However, data synchronization can consume a significant amount of
time and resources. In the worst case scenario, when one of the
mirrors has to be replaced with a new one, it may take up to a few
days to synchronize the data on a large multi-terabyte disk volume.
During this time, the MD RAID1 volume and the contained user data are
vulnerable to failures and MD operates below optimal performance.
Because disk sizes continue to grow at a much faster pace than
disk speeds, this problem is only going to get worse in the
near future.</p>

<p>This paper presents a solution for improving the synchronization of
MD RAID1 volumes by leveraging information about disk utilization
already tracked by file systems. We describe and compare three
different implementations that tap into the file system and assist
the MD RAID1 synchronization algorithm to avoid copying unused
data. With a real-life average disk utilization of 43%, our approach
reduces the
synchronization time of a typical MD RAID1 volume compared to the
existing synchronization mechanism.
</p>
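<p>MD itself does this work in kernel space on whole block devices; the toy Python loop below only illustrates the core idea of skipping blocks the file system reports as unused. The block_in_use callback is a hypothetical stand-in for the allocation information (for example, ext4's block bitmaps) that the paper's implementations tap into.</p>

<pre>
BLOCK_SIZE = 64 * 1024

def resync(source, target, total_blocks, block_in_use):
    """Copy only the blocks reported as allocated from source to target."""
    copied = skipped = 0
    for block in range(total_blocks):
        if not block_in_use(block):
            skipped += 1                      # free block: nothing to mirror
            continue
        source.seek(block * BLOCK_SIZE)
        target.seek(block * BLOCK_SIZE)
        target.write(source.read(BLOCK_SIZE))
        copied += 1
    return copied, skipped
</pre>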

<hr><h2><a href="ols2012-verma.pdf">Out of band Systems Management in enterprise Computing Environment</a> - D.&nbsp;Verma, S.&nbsp;Gowda, A.&nbsp;Vellimalai, S.&nbsp;Prabhakar</h2>

<p>Out-of-band systems management provides an innovative mechanism to keep the digital ecosystem inside data centers in shape even when the parent system goes down. This is an emerging trend in which the monitoring and safeguarding of servers are offloaded to a separate embedded system, most likely an embedded Linux implementation. </p>

<p>In today's context, where virtualized servers/workloads are the most prevalent compute nodes inside a data center, it is important to evaluate systems management and its associated challenges from that perspective. This paper explains how to leverage out-of-band systems management infrastructure in a virtualized environment. </p>

<hr><h2><a href="ols2012-thiell.pdf">ClusterShell, a scalable execution framework for parallel tasks</a> - S.&nbsp;Thiell, A. Degr&eacute;mont, H.&nbsp;Doreau, A.&nbsp;Cedeyn</h2>

<p>
Cluster-wide administrative tasks and other distributed jobs are often
executed by administrators using locally developed tools and do not rely on a
solid, common and efficient execution framework. This document covers this
subject by giving an overview of ClusterShell, an open source Python
middleware framework developed to improve the administration of HPC Linux
clusters or server farms.</p>

<p>ClusterShell provides an event-driven library interface that eases the
management of parallel system tasks, such as copying files, executing shell
commands and gathering results. By default, remote shell commands rely on SSH,
a standard and secure network protocol. Based on a scalable, distributed
execution model using asynchronous and non-blocking I/O, the library has shown
very good performance on petaflop systems. Furthermore, by providing efficient
support for node sets and, more particularly, node group bindings, the library
and its associated tools can ease cluster installations and daily tasks
performed by administrators.</p>

<p>In addition to the library interface, this document addresses resiliency and
topology changes in homogeneous or heterogeneous environments. It also focuses
on scalability challenges encountered during software development and
on the lessons learned to achieve maximum performance from a Python software
engineering point of view. </p>
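<p>A minimal use of the ClusterShell event-driven task interface, assuming passwordless SSH access to a placeholder node set, runs one command everywhere and gathers identical outputs together:</p>

<pre>
from ClusterShell.NodeSet import NodeSet
from ClusterShell.Task import task_self

task = task_self()
task.shell("uname -r", nodes="node[1-8]")    # placeholder node set
task.resume()                                # execute and wait for completion

for buf, nodes in task.iter_buffers():       # nodes grouped by identical output
    print(NodeSet.fromlist(nodes), buf)
</pre>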

<hr><h2><a href="ols2012-salve.pdf">DEXT3: Block Level Inline Deduplication for EXT3 File System</a> - A.&nbsp;More, Z.&nbsp;Shaikh, V.&nbsp;Salve</h2>

<p>Deduplication is an intelligent storage and compression technique that avoids saving redundant data onto the disk. Solid State Disk (SSD) media have gained popularity owing to their low power demands, resistance to natural shocks and vibrations, and high-quality random access performance. However, these media come with limitations such as high cost, small capacity and a limited erase-write cycle lifespan. Inline deduplication helps alleviate these problems by avoiding redundant writes to the disk and making efficient use of disk space. In this paper, a block-level inline deduplication layer for the EXT3 file system, named the DEXT3 layer, is proposed. This layer identifies the possibility of writing redundant data to the disk by maintaining an in-core metadata structure of the previously written data. The metadata structure is made persistent on disk, ensuring that the deduplication process does not crumble owing to a system shutdown or reboot. The DEXT3 layer also handles the modification and deletion of a file whose blocks are referenced by other files, which would otherwise have created data loss issues for the referring files.
</p>
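<p>DEXT3 does its work inside the kernel on EXT3's write path; the short user-space Python sketch below only illustrates the content-hash-based inline deduplication idea, with an arbitrary block size and hash choice.</p>

<pre>
import hashlib

BLOCK_SIZE = 4096                  # arbitrary block size for the sketch

class DedupStore:
    """User-space sketch of block-level inline deduplication."""

    def __init__(self):
        self.blocks = []           # the "disk": unique physical blocks
        self.index = {}            # content hash to physical block number

    def write(self, data):
        """Store data block by block, returning the logical-to-physical map."""
        mapping = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.index:       # new content: really write it
                self.index[digest] = len(self.blocks)
                self.blocks.append(block)
            mapping.append(self.index[digest]) # duplicate: just reference it
        return mapping

store = DedupStore()
mapping = store.write(b"A" * 8192 + b"B" * 4096)
print("logical blocks:", len(mapping), "physical blocks:", len(store.blocks))
</pre>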

<hr><h2><a href="ols2012-chang.pdf">ARMvisor: System Virtualization for ARM</a> - J-H.&nbsp;Ding, C-J.&nbsp;Lin, P-H.&nbsp;Chang, C-H.&nbsp;Tsang, W-C.&nbsp;Hsu, Y-C.&nbsp;Chung</h2>

<p>In recent years, system virtualization technology has gradually shifted its focus from data centers to embedded systems for enhancing security, simplifying the process of application porting as well as increasing system robustness and reliability. In traditional servers, which are mostly based on x86 or PowerPC processors, Kernel-based Virtual Machine (KVM) is a commonly adopted virtual machine monitor.
However, there are no such KVM implementations available for the ARM architecture, which dominates modern embedded systems. In order to understand the challenges of system virtualization for embedded systems, we have implemented a hypervisor, called ARMvisor, which is based on KVM for the ARM architecture.</p>

<p>In a typical hypervisor, there are three major components: CPU virtualization, memory virtualization, and I/O virtualization. For CPU virtualization, ARMvisor uses traditional &ldquo;trap and emulate&rdquo; to deal with sensitive instructions. Since there is no hardware support for virtualization in ARM architecture versions V6 and earlier, we have to patch the guest OS to force critical instructions to trap. For memory virtualization, the functionality of the MMU, which translates a guest virtual address to a host physical address, is emulated. In ARMvisor, a shadow page table is dynamically allocated to avoid the inefficiency and inflexibility of static allocation for the guest OSes. In addition, ARMvisor uses R-Map to take care of protecting the memory space of the guest OS. For I/O virtualization, ARMvisor relies on QEMU to emulate I/O devices. We have implemented all three ARMvisor components on top of KVM in the ARM-based Linux kernel. At this time, we can successfully run a guest Ubuntu system on an Ubuntu host OS with ARMvisor on the ARM-based TI BeagleBoard.
</p>
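<p>ARMvisor performs trap-and-emulate in kernel mode on real ARM sensitive instructions; the toy Python sketch below only makes the control flow concrete, and the instruction name and virtual CPU state are invented for illustration.</p>

<pre>
class TrapToHypervisor(Exception):
    """Raised when the de-privileged guest touches a sensitive resource."""
    def __init__(self, insn, operand):
        self.insn, self.operand = insn, operand

def guest_disable_interrupts(vcpu):
    # The guest cannot modify the real CPSR; the access traps instead.
    raise TrapToHypervisor("cpsr_write", {"irq_masked": True})

EMULATORS = {
    "cpsr_write": lambda vcpu, op: vcpu.update(op),  # emulate on shadow state
}

def run_guest(vcpu):
    try:
        guest_disable_interrupts(vcpu)
    except TrapToHypervisor as trap:
        EMULATORS[trap.insn](vcpu, trap.operand)     # hypervisor handles the trap

vcpu = {"irq_masked": False}
run_guest(vcpu)
print("virtual CPU state after emulation:", vcpu)
</pre>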

<hr><h2><a href="ols2012-lissy.pdf">Clustering the Kernel</a> - A.&nbsp;Lissy, J.&nbsp;Parpaillon, P.&nbsp;Martineau</h2>

<p>Model-checking techniques are limited in the number of states
that can be handled, even with new optimizations to increase capacity.
To be able to apply these techniques to a very large code base such as the
Linux kernel, we propose to slice the problem into parts that are manageable for
model-checking. A first step toward this goal is to study the
current topology of internal dependencies in the kernel.
</p>
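<p>The paper studies dependencies internal to the kernel; as a rough user-space proxy for that topology, the Python sketch below builds a dependency graph of loadable modules from modules.dep (which covers only modules, so it is an approximation for exploration).</p>

<pre>
import os
from collections import defaultdict

def module_graph(dep_file=None):
    """Build a module dependency graph from this kernel's modules.dep."""
    if dep_file is None:
        dep_file = "/lib/modules/{}/modules.dep".format(os.uname().release)
    graph = defaultdict(set)
    with open(dep_file) as f:
        for line in f:
            target, _sep, deps = line.partition(":")
            name = os.path.basename(target.strip())
            for dep in deps.split():
                graph[name].add(os.path.basename(dep))
    return graph

graph = module_graph()
print(len(graph), "modules depend on at least one other module")
</pre>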

<hr><h2><a href="ols2012-zeldovich.pdf">Non-scalable locks are dangerous</a> - Silas Boyd-Wickizer, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich</h2>

<p>
  Several operating systems rely on non-scalable spin locks for serialization.
  For example, the Linux kernel uses ticket spin locks, even though scalable
  locks have better theoretical properties.
  Using Linux on a 48-core machine, this paper
  shows that non-scalable locks can cause dramatic collapse in the
  performance of real workloads, even for very short critical sections.
  The nature and sudden onset of
  collapse are explained with a new Markov-based performance model.  Replacing
  the offending non-scalable spin locks with scalable spin locks avoids the
  collapse and requires modest changes to source code.</p>

<hr><h2><a href="ols2012-brahmaroutu.pdf">Fine Grained Linux I/O Subsystem Enhancements to Harness Solid State Storage</a> - S.&nbsp;Brahmaroutu, R.&nbsp;Patel, H.&nbsp;Rajagopalan, S.&nbsp;Vidyadhara, A.&nbsp;Vellimalai</h2>

<p>Enterprise Solid State Storage (SSS) devices are a high-performing class of devices targeted at business-critical applications that can benefit from fast-access storage. While it is exciting to see the improving affordability and applicability of the technology, enterprise software and Operating Systems (OS) have not undergone the pertinent design modifications to reap the benefits offered by SSS. This paper investigates the I/O submission path to identify the critical system components that significantly impact SSS performance. Specifically, our analysis focuses on the Linux I/O schedulers on the submission side of the I/O. We demonstrate that the Deadline scheduler offers the best performance under random I/O intensive workloads for SATA SSS. Further, we establish that no I/O scheduler, including Deadline, is optimal for PCIe SSS, and we quantify the possible performance improvements with a new design that leverages device-level I/O ordering intelligence and other I/O stack enhancements.
</p>
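<p>The scheduler choice examined in the paper is exposed through sysfs; a minimal Python sketch, assuming a placeholder device name and root privileges for the (commented-out) switch:</p>

<pre>
DEVICE = "sda"                                         # placeholder device
SCHED_PATH = "/sys/block/{}/queue/scheduler".format(DEVICE)

with open(SCHED_PATH) as f:
    # The active scheduler is shown in square brackets.
    print("I/O schedulers:", f.read().strip())

# Root only: select the deadline scheduler (called "mq-deadline" on
# newer multi-queue kernels).
# with open(SCHED_PATH, "w") as f:
#     f.write("deadline")
</pre>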

<hr><h2><a href="ols2012-wang.pdf">Optimizing eCryptfs for better performance and security</a> - Li Wang, Y.&nbsp;Wen, J.&nbsp;Kong, X.&nbsp;Yi</h2>

<p>This paper describes the improvements we have made to eCryptfs, a POSIX-compliant
enterprise-class stacked cryptographic filesystem for Linux. The major improvements are as follows.
First, for stacked filesystems, by default the Linux VFS framework maintains page caches for each level of filesystem in the stack, which means that the same part of file data will be cached multiple times. However, in some situations this multiple caching is unnecessary and wasteful, which motivates us to perform redundant cache elimination, ideally halving memory consumption and avoiding unnecessary memory copies between page caches. The benefits are verified by experiments, and this approach is applicable to other stacked filesystems. Second, as a filesystem that emphasizes security, we equip eCryptfs with HMAC verification, which enables eCryptfs to detect unauthorized data modification and unexpected data corruption, and the experiments demonstrate that the decrease in throughput is modest. Furthermore, two minor optimizations are introduced. One is a thread pool, working in a pipeline manner to perform encryption and write-out, to fully exploit parallelism, with notable performance improvements. The other is a simple but effective write optimization. In addition, we discuss ongoing and future work on eCryptfs.
</p>
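<p>The HMAC verification added to eCryptfs lives in the kernel and is tied to its on-disk format; the Python sketch below only illustrates the per-block integrity idea with the standard library, using an arbitrary block size and a throwaway key.</p>

<pre>
import hashlib
import hmac

BLOCK_SIZE = 4096                 # arbitrary block size for the sketch

def protect(data, key):
    """Split data into blocks and attach an HMAC tag to each one."""
    blocks = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        blocks.append((block, hmac.new(key, block, hashlib.sha256).digest()))
    return blocks

def verify(blocks, key):
    """Detect unauthorized modification or silent corruption on read."""
    for block, tag in blocks:
        expected = hmac.new(key, block, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("block failed HMAC verification")
    return b"".join(block for block, _tag in blocks)

key = b"demo-key"                 # real systems derive this from user keys
stored = protect(b"secret data" * 1000, key)
assert verify(stored, key) == b"secret data" * 1000
</pre>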

<hr><h2><a href="ols2012-messier.pdf">Android SDK under Linux</a> - Jean-Francois Messier</h2>

<p>This is a tutorial about installing the various components required to
set up a working Android development station under Linux. The commands are
simple ones and are written to be as independent as possible of your
flavour of Linux. All commands and other scripts are in a set of files
that will be available on-line. Some processes that would usually
require user attention have been scripted to run unattended and are
pre-downloaded. The entire set of files (a couple of gigs) can be copied
after the tutorial for those with a portable USB key or hard disk.
</p>

</body>
</html>