Entries in 'grid research'


Google’s Seattle Conference on Scalability

Here is a good set of pointers about Google’s recent Seattle Conference on Scalability:

http://glinden.blogspot.com/2007/06/more-on-google-scalability-conference.html

Workspace EC2 integration; Contextualization

It’s been busy lately: I attended the first dev.Globus All Hands Meeting and TeraGrid ’07 right here in Madison.

At TG07, Kate gave a talk which is online. The paper she presented discusses, among other things, contextualization: the structure and mechanisms by which an appliance/workspace is “told” what it needs in order to adapt to its deployed environment. This is not just adaptation to site-specific services but also to other appliances that may be deployed with it, such as in a virtual cluster deployment.

Amidst the bustle we implemented a new backend for the Workspace Service that targets Amazon’s Elastic Compute Cloud (EC2). We’ve deployed it to the University of Chicago’s Teraport cluster and, for now, will pay for usage by selected collaborators.

Besides being somewhat fun to implement (including getting the Globus and Amazon Secure Message stacks on the same wavelength), I think it’s going to be interesting.

Because grid resources are cautiously approaching the pioneering switch to virtualizing resources [1], even in part, it is going to be interesting and educational to see what people will be able to accomplish with workspaces when a large pool of resources is actually available on tap — today.

Because the same deployment protocols can be used for both native and EC2 resources, there are of course capacity overflow use cases. In the right situations, VMs are a good mechanism for providers to dynamically reach more consumers as the need arises.
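The overflow case can be pictured with a small sketch. Everything in it is hypothetical: `Backend`, `place_workspaces`, and the slot counts are illustrative names and numbers, not part of the Workspace Service or the EC2 API. The only idea it shows is that a request fills local capacity first and spills the remainder to a second pool that accepts the same deployment request.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A hypothetical resource pool that can host some number of VMs."""
    name: str
    free_slots: int


def place_workspaces(requested, backends):
    """Fill backends in order; return ({backend name: VM count}, unmet count)."""
    placement = {}
    remaining = requested
    for backend in backends:
        if remaining == 0:
            break
        take = min(remaining, backend.free_slots)
        if take > 0:
            placement[backend.name] = take
            backend.free_slots -= take
            remaining -= take
    return placement, remaining


# Assumed capacities: 8 free local slots, a much larger overflow pool.
local = Backend("local-cluster", free_slots=8)
overflow = Backend("ec2", free_slots=100)

placement, unmet = place_workspaces(12, [local, overflow])
print(placement, unmet)
```

Ordering the list of backends is the whole policy here; a real provider would presumably also weigh cost and data locality before overflowing.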

For a feature list and description, see What is the EC2 backend?

——-

[1] and some would say inevitable switch, even with the performance costs. Consider also that ‘virtualizing resources’ may mean physical node re-imaging, cf. Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid.

CFP: Special issue on Networks for Grid Applications

From Call for papers: Special issue on Networks for Grid Applications

Grid developers and practitioners are increasingly realising the importance of an efficient network support. Entire classes of applications would greatly benefit by a network-aware Grid middleware, able to effectively manage the network resource in terms of scheduling, access and use. Conversely, the peculiar requirements of Grid applications provide stimulating drivers for new challenging research towards the development of Grid-aware networks.

Cooperation between Grid middleware and network infrastructure driven by a common control plane is a key factor to effectively empower the global Grid platform for the execution of network-intensive applications, requiring massive data transfers, very fast and low-latency connections, and stable and guaranteed transmission rates. Big e-science projects, as well as industrial and engineering applications for data analysis, image processing, multimedia, or visualisation just to name a few are awaiting an efficient Grid network support. They would be boosted by a global Grid platform enabling end-to-end dynamic bandwidth allocation, broadband and low-latency access, interdomain access control, and other network performance monitoring capabilities.

As a natural extension of the discussion forum provided by the Gridnets conference series, this special section aims at gathering top-quality contributions to the most debated topics currently tackled in Grid networking research. Topics include, but are not limited to:

* Network architectures and technologies for grids
* The network as a first class Grid resource: network resource information publication, brokering and co-scheduling with other Grid resources
* Interaction of the network with distributed data management systems
* Network monitoring, traffic characterisation and performance analysis
* Inter-layer interactions: optical layer with higher layer protocols, integration among layers
* Experience with pre-production Grid network infrastructures and exchange points
* Peer-to-peer network enhancements applied to the Grid
* Network support for wireless and ad hoc grids
* Data replication and multicasting strategies and novel data transport protocols
* Fault-tolerance, self healing networks
* Security and scalability issues when connecting a large number of sites within a virtual organization VPN
* Simulations
* New concepts and requirements which may fundamentally reshape the evolution of Networks.
* Integration of advanced optical networking technologies into the Grid environment
* End to end lightpath provisioning software systems and emergent standards

A Scalable Approach To Deploying And Managing Appliances

Our paper about virtual appliance configuration and management was accepted to the TeraGrid 2007 conference and is now online: A Scalable Approach To Deploying And Managing Appliances.

This paper examines configuration and security issues that large and heterogeneous deployments of virtual appliances/workspaces will face.

From the introduction:

The goal of this paper is to develop a holistic approach that would provide scalable and sustainable ways of managing and deploying virtual workspaces implemented as VM images. We will discuss ways of leveraging existing configuration management tools, exemplified by the Bcfg2 system, for VM image lifecycle management that will allow systems staff to deploy robust virtualized resources for their users. We will also describe the process of contextualization — integration of an appliance in its deployment context — and discuss its reference implementation using Bcfg2 and the Workspace Service.

A Resource Management Model for VM-Based Virtual Workspaces

My colleague Borja Sotomayor’s Master’s paper, A Resource Management Model for VM-Based Virtual Workspaces, is now available for download. Congratulations Borja!

This is a long but well-organized paper that goes into detail about different resource management scenarios for VMs and grid computing. It includes discussion and experimental results of combining different scheduling techniques for VMs (including advance reservations) and accurately dealing with overheads (this problem is introduced in Overhead Matters: A Model for Virtual Resource Management).

The abstract follows. I also recommend the two-page introduction to get a better idea of what this is all about.

Virtual workspaces provide an abstraction for dynamically deployable execution environments on a Grid. For this abstraction to be effective, it must be possible to provide on-demand software environments and enforceable fine grained resource allocations for these workspaces. Virtual machines are a promising vehicle to realize the virtual workspace abstraction, as they allow us to instantiate a precisely defined virtual resource, configured with desired software configuration and hardware properties, on a set of physical resources.

In this paper, we describe a model of virtual machine provisioning in a Grid environment that allows us to define such virtual resources and instantiate them on a physical Grid infrastructure. Our model focuses, firstly, on providing users with an accurate representation of virtual resources. To accomplish this, the overhead resulting from instantiating and managing virtual resources is scheduled at the same level as virtual resources, instead of being deducted from a user’s resource allocation. Secondly, our model also focuses on efficiently managing virtual resources by reducing the amount of overhead.

We argue that this model, compared to resource management models that rely on the job abstraction for remote execution, enables resource providers to accurately provision resources to users, while using their physical resources efficiently. We show experimental results that demonstrate the benefits of this model both from the resource providers and the user’s perspective, in two common resource management scenarios for virtual workspaces: advance reservations and batch-style submissions.
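To make the overhead point in the abstract concrete, here is a minimal sketch of the two accounting styles it contrasts. It is not taken from the paper: the function names, the time units, and the single fixed `overhead` value (standing in for something like image staging) are all assumptions made for illustration.

```python
def deducted_model(start, duration, overhead):
    """Overhead is taken out of the user's own allocation."""
    return {
        "slot": (start, start + duration),
        "usable": duration - overhead,
    }


def scheduled_overhead_model(start, duration, overhead):
    """Overhead gets its own slot ahead of the requested start time."""
    return {
        "overhead_slot": (start - overhead, start),
        "slot": (start, start + duration),
        "usable": duration,
    }


# Hypothetical request: start at t=100, run for 60 units, 10 units of overhead.
deducted = deducted_model(100, 60, 10)
scheduled = scheduled_overhead_model(100, 60, 10)
print(deducted)
print(scheduled)
```

In the first style the user asks for 60 units and can use only 50; in the second the provider reserves an extra slot for the overhead, so the user gets what was requested.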

For more relevant talks and papers from the group, see the Workspace publications page.

Virtualization Workshop Update

In this month’s Globus Consortium Journal is an article by Kate Keahey giving an update on VTDC 06 (she was the PC). She discusses adoption issues, especially current missing links. Highly recommended if you are interested in the intersection between Grid computing and virtualization!

CFP: Grid 2007

Technical paper submissions for Grid 2007, which is being held in Austin, TX, September 19-21, are due on April 7. CFP (pdf).

Container abstractions in grid computing

This all also gets me thinking about what container abstraction is the best for grid applications… I think it is a very complicated subject. My off-the-cuff conclusion is that if we had perfect infrastructure available to deploy each kind, grids would probably be able to put it all to use to best satisfy different scenarios and constraints (constraints that come from both the client and the resource provider). That’s getting way ahead of things though. A lot of this software, and the tools to manage it, are still maturing. And for the time being, production grids are just warming up to the idea of one virtualization platform (Xen), not five at once :-)

In the long run, an important factor is the onus placed on the remote user when preparing its environment for deployment across a grid. With VMs or any kind of “contained” guest, you’ve always got to lock in your “capsule” to a certain environment in order for the container to accept it, be it:

For grid applications, it is yet to be seen how important locking in to instruction sets is, but Xen is still a great option (acceptable performance, very portable and very isolated). The choice can affect a lot of things: ease of maintenance, security policies, resource availability, performance, etc.

What is apparent are the advantages of having a consistent compiler chain, libc, and other libraries. It can mean the difference between being able to use a site’s resources or not (see slide 18). Even if the dependencies at a site seem to line up with requirements, it could take a large effort to actually verify the environment. Xen-based VMs provide a path out of this mess.

As for needing to customize below the Linux userspace API (or needing some other OS entirely), I’ve always thought it would be cool to see more code developed for kernelspace (in the vein of the tux webserver). Pervasively available virtualization platforms may make this a real option for grid applications or infrastructure. Then again, some memory protection is a good thing :-).

Ultimately, the workspace abstraction is geared to handle many different implementations, e.g. physical workspaces (node re-imaging) and different kinds of VMMs. After all, they are all just containers with different enforcement and isolation capabilities. In the long run, it is going to be very interesting to seriously evaluate the different approaches (under both pathological and real grid application workloads) vs. the current Xen backend.

Xen vs. kernel containers: performance and efficiency

(This is part of a series of entries)

Because Xen and KVM both support unmodified guests, I’d speculate that in the long run their raw CPU performance will converge on whatever concrete limit hardware-assisted virtualization imposes. Paravirtualization may continue to reign here, or it may not. The harder issues to think about are disk and network I/O.

I was part of an investigation into how to make resource guarantees for workspaces under even the worst conditions on non-dedicated VMMs (Division of Labor: Tools for Growth and Scalability of Grids). The amount of CPU needed to support the guests’ I/O work (what I like to casually call the “on behalf of” work in the service domain) was pretty high and we looked at how to measure what guarantees were needed for the service domain itself to make sure the guest guarantees were met. So we had to write code that would extrapolate the CPU reservations needed across all domains (including the service domain).
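To give a flavor of that kind of extrapolation, here is a minimal sketch. It is not the code from the paper: the linear cost model, the `cpu_per_mbit` and `baseline` values, and all of the names are assumptions made for illustration only.

```python
def service_domain_cpu(guest_io_mbit, cpu_per_mbit=0.5, baseline=5.0):
    """Estimate the CPU percentage the service domain needs.

    Assumes (hypothetically) that the "on behalf of" work grows linearly
    with the guests' aggregate I/O rate, on top of a fixed baseline.
    """
    return baseline + cpu_per_mbit * sum(guest_io_mbit.values())


def total_reservation(guest_cpu, guest_io_mbit, capacity=100.0):
    """Add the extrapolated service-domain share to the guest reservations."""
    service = service_domain_cpu(guest_io_mbit)
    total = sum(guest_cpu.values()) + service
    return {"service_domain": service, "total": total, "fits": total <= capacity}


# Hypothetical guests: CPU reservations in percent, I/O rates in Mbit/s.
guest_cpu = {"vm1": 30.0, "vm2": 40.0}
guest_io = {"vm1": 10.0, "vm2": 20.0}

plan = total_reservation(guest_cpu, guest_io)
print(plan)
```

The point of the sketch is the admission check: a set of guest guarantees only holds if the service domain's own share also fits on the machine.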

One major source of the extra CPU work is context switching overhead: the service domain needs to switch in to process pending I/O events (on large SMPs, I’ve heard recommendations to just dedicate a CPU to the service domain). Also, in networking’s case, the packets are zero copy but they must still traverse the bridging stack in the service domain.

One important thing to consider for the long run on this issue is that there is a lot of work being done to make slices of HW such as InfiniBand available directly to guest VMs. This will obviate the need for a driver domain to context switch in. See High Performance VMM-Bypass I/O in Virtual Machines.

Container based, kernelspace solutions offer a way out of a lot of this overhead by being implemented directly in the kernel that is doing the “on behalf of” work. They also take advantage of the resource management code already in the Linux kernel.

They can more effectively schedule resources being used inside their regular userspace right alongside the VMs (I’m assuming) — and more easily know what kernel work should be “charged” to what process (I’m assuming). These two things could prove useful, avoiding some of the monitoring and juggling that is needed to correctly do that in a Xen environment (see e.g., the Division of Labor paper mentioned above and the Xen related work from HP).

There is an interesting paper out of Princeton: Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors.

The authors contrast Xen and VServer and present cases where hard-partitioning (that you find in Xen) breeds too much overhead for grid and high performance use cases. Where full fault isolation and OS heterogeneity are not needed, they advocate that the CPU overhead issues of Xen I/O and VM context switches can be avoided.

(The idea presented there of live updating the kernel (as you migrate the VM) is interesting. For jobs that take months (that will miss out on kernel updates to their template images) or services that should not be interrupted, this presents an interesting alternative for important security updates (though for Linux, I’m under the impression that security problems are far more of a problem in userspace).)

TeraGrid07: CFP

THE ANNUAL TERAGRID CONFERENCE, TERAGRID ’07: BROADENING PARTICIPATION IN TERAGRID, invites all interested individuals and organizations to participate. Attendees will include scientists and engineers, faculty, post docs, graduate and undergraduate students, high school teachers, representatives from federal agencies, grid computing industry representatives, and staff from TeraGrid resource providers and partners.

Submissions should address the development of grid computing capabilities and the applications of the TeraGrid to research and education, in particular:

* Scientific impacts that are the results of work on the TeraGrid and with TeraGrid partners
* Technology development, capabilities, and services
* Grid education/training and grids in support of education
* Education, outreach, and training

Full papers are due January 12, 2007.

http://www.union.wisc.edu/teragrid07/

[[ UPDATE: paper deadline has been extended to February 8th ]]

