Entries in 'scheduling'

NC State’s Virtual Computing Lab

I ran across an interesting project that started in 2004.

From http://vcl.ncsu.edu/:

The Virtual Computing Lab (VCL) is a remote access service that allows you to reserve a computer with a desired set of applications for yourself, and remotely access it over the Internet.

You can use all your favorite applications such as Matlab, Maple, SAS, Solidworks, and many others. Linux, Solaris and numerous Windows environments are now available to all NC State students and faculty.

Leasing custom environments to “public-ish” users via PXE or similar technology was happening in other places in 2004, but I never saw anything at this scale.

It is clear that some kind of reconfiguration/resetting happens:

What rights do I have on the VCL machine?

On custom Windows and Linux environments you have administrative and root level rights. Since the VCL system reloads each expired reservation with a clean environment, there is no threat of any residual data being left on a machine for the next user.

On Linux and Solaris Lab machine environments, you only have user level rights. The same permissions as you would experience at the console of a walk-in lab.

I wonder when they added the VM support mentioned at http://vcl.ncsu.edu/help/general-information/how-it-works:

The management nodes each control a subset of the VCL resources. These can be blades, virtual machines or lab machines. Currently, a set of individual blades or virtual machines can only be managed by a single management node. Typically there are anywhere from 80-120 physical computer nodes (blades) under one management node. Again the physical computer nodes can either be running a bare metal environment or a Virtual Machine hypervisor.

Here are deployment stats captured on Aug 25, 2008:

  • Total blades online: 438
  • Total blades offline: 87
  • Active Reservations: 49

Cool.

Workspace Service TP1.3.1

Some cool new features:

On behalf of the workspace team, I am happy to announce the TP 1.3.1 release of the Workspace Service. You can download the new release from: http://workspace.globus.org/downloads/index.html

The main new feature in this release is the implementation of the workspace pilot which provides non-invasive adaptations to batch schedulers (such as PBS) enabling sites to run virtual machines alongside jobs. The details of this approach are described in: http://workspace.globus.org/papers/workspace-pilot-paper-submitted.pdf

In addition, the release also contains the ensemble service that allows clients to create ensembles of heterogeneous virtual machines to be deployed and managed together, improvements to the client, and several bug fixes. The complete changelog can be found at: http://workspace.globus.org/vm/TP1.3.1/index.html#changelog

We welcome comments, feedback, and bug reports. Information about the project, software downloads, documentation and instructions on how to join the workspace-user mailing list for support questions can be found at: http://workspace.globus.org

Happy Valentine’s Day!

As you can read there, the main new feature is the pilot infrastructure. The paper Kate refers to in the announcement is a relatively short read and lays out the ideas (and a practical evaluation) in an organized way. But briefly: the pilot is a program the service submits to a local site resource manager in order to obtain time on the VMM nodes. When not allocated to the workspace service, these nodes are used for jobs as normal. Those jobs run in normal system accounts in Xen domain 0 with no guest VMs running.

Importantly, the approach leaves the site resource manager in full control of the nodes and requires no modifications to it, save perhaps for configuration changes you might like to make. For example, you can mark particular nodes as able to accommodate guest VMs: the workspace service supports sending pilot requests to particular LRM queues, or providing a particular node property with the request. This lets you control not just when but where VMs can run.
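To make the pilot idea concrete, here is a toy Python sketch. It is not the workspace service's code: the classes, the `submit_pilot` helper, and the `"vmm"` node property are all hypothetical names chosen for illustration. It only models the point made above, that the pilot goes through the site resource manager like any other job and can be steered to nodes marked as VM-capable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    properties: set = field(default_factory=set)
    running: Optional[str] = None  # whatever the LRM has placed here, if anything


class ToyLRM:
    """Hypothetical stand-in for a site resource manager such as PBS."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def submit(self, job_name, required_property=None):
        """Place a job on the first free node that has the required property."""
        for node in self.nodes:
            if node.running is not None:
                continue
            if required_property and required_property not in node.properties:
                continue
            node.running = job_name
            return node
        return None  # a real LRM would queue the job instead of giving up

    def finish(self, node):
        """Job (or pilot) is done: the node goes back to the pool."""
        node.running = None


def submit_pilot(lrm, vm_name):
    """The service obtains a VM-capable node by submitting an ordinary job."""
    # "vmm" is an assumed property name a site admin might assign.
    return lrm.submit(f"pilot:{vm_name}", required_property="vmm")


lrm = ToyLRM([Node("n1"), Node("n2", {"vmm"}), Node("n3", {"vmm"})])
batch_node = lrm.submit("batch-job-1")    # a normal job, no VM involved
pilot_node = submit_pilot(lrm, "guest-a")  # the pilot lands only on a "vmm" node
print(batch_node.name, pilot_node.name)
```

The LRM stays the only component that decides placement. When the pilot's time is up, calling `finish` on its node makes that node available for normal jobs again.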

Several extra safeguards have been added to make sure the node is returned from VM hosting mode at the proper time, including support for:

Also included is a one-command “kill 9” facility for administrators as a “worst case scenario” contingency.

 

So as a buzzword experiment, I want to put in a particular keyword here and see how the search engine hits work out :-). I think you know what it may be…

Cloud computing

Go make a cloud!

And with the workspace pilot, you won’t have to switch over all at once. Take it for a test run and tell us about it on workspace-user.

We’ve got some exciting stuff in the pipeline for the next few months, too (see the last release announcement and the self-configuring 100 node VM cluster news). I am really happy with where the project has been recently and where it is going.

- Tim

Workspace Service TP1.3

Kate Keahey writes:

On behalf of the workspace team, I am happy to announce the TP 1.3 release of the Workspace Service. You can download the new release from: http://workspace.globus.org/downloads/index.html

This release adds significant new features. First, the workspace service can now start multiple VM instances grafted off of the same VM image in one request; the VM instances can be managed as a group or individually. Second, we added accounting functionality and services allowing users to query accounting information. Last but not least, we added configuration enhancements to make service administration easier, as well as numerous functionality and usability enhancements for the client.

These new features and enhancements necessitated some WSDL changes as well as an addition of a new namespace. The current release has a technology preview status: both interfaces and implementation are likely to change to some extent.

This new 1.3 release provides a baseline for a cycle of releases that aim to break up the service into several replaceable components that could be easily and/or independently used. In particular, the next releases will include the following functionality:

  • the contextualization service allowing VMs to be adapted at deployment time to the context of a site, an organization, or other VMs
  • the workspace pilot which can be used in conjunction with existing batch schedulers (such as e.g. PBS) to run VMs as well as normal jobs on the same cluster

These technologies are currently undergoing alpha testing by selected friendly communities. If you are interested in testing, and don’t mind a little bit of adventure, give us a call.

We welcome comments, feedback, and bug reports. Information about the project, software downloads, documentation and instructions on how to join the workspace-user mailing list for support questions can be found at: http://workspace.globus.org/

GridWay 5.2.1 released

See the release notes for details.

Linux 2.6.21: tickless idling

In “Torvalds releases 2.6.21 kernel,” Steven J. Vaughan-Nichols reports at length on the 2.6.21 kernel’s clockevents and dyntick (dynamic ticks) patches.

The result of these merges is a uniform timekeeping/scheduling interface (clockevents) and the ability to take the CPU into a true idle state if nothing is going on, shutting down the timer event and instead waiting for a regular interrupt or a scheduled, future event (this is dynticks).

This could apparently help with virtualization scheduling:

In the future, both these features will be used to improve virtualization. The virtualization manager, rather than scheduling by HZ, will determine which program or virtual operating system should have the lion’s share of the processor’s time. This is not a pie in the sky idea. The technique was already being used with Linux on IBM mainframes years ago, when trying to deal with a thousand virtual Linux servers at once using HZ scheduling. It led to situations where the timer interrupt overhead alone was using up almost all of the processors’ time.

I’d imagine a thousand virtual servers splitting the system at once could do that :-)
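A back-of-the-envelope calculation shows why. The numbers below are assumptions for illustration, not measurements: a thousand guests each taking a periodic tick at 100 Hz or 1000 Hz, versus the same guests going tickless while idle and waking only for an assumed two scheduled events per second.

```python
def periodic_tick_interrupts(guests, hz):
    """Periodic tick: every guest takes hz timer interrupts per second, busy or idle."""
    return guests * hz


def tickless_idle_interrupts(guests, events_per_second):
    """Tickless idle: an idle guest wakes only for its own scheduled events."""
    return guests * events_per_second


guests = 1000
periodic_100 = periodic_tick_interrupts(guests, 100)
periodic_1000 = periodic_tick_interrupts(guests, 1000)
tickless = tickless_idle_interrupts(guests, 2)  # assumed: 2 wakeups/s per idle guest

print(periodic_100, periodic_1000, tickless)  # 100000 1000000 2000
```

Even at 100 Hz the host fields a hundred thousand timer interrupts per second for guests that have nothing to do, which is the kind of overhead the quote describes.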

He points to this informative LWN article from February, which further explains clockevents and dynticks but also suggests that going tickless during idle times is just the beginning for dynticks:

What’s in 2.6.21 is, thus, not a full dynamic tick implementation. Eliminating the tick during idle times is a good step forward, but there is value in getting rid of the tick while the system is running as well - especially on virtualized systems which may be sharing a host with quite a few other clients. The dynamic tick documentation file suggests that the developers have this goal in mind

Well. On to more important stuff this may promise: battery life! I am joining this LWN commenter, crossing my fingers:

I haven’t seen benchmarks on this in particular, but I remember reading a piece a while ago where a guy claimed that his laptop battery lasts TWICE as long with a 100Hz tick than with a 1000Hz tick… So, I’m expecting great things…

A Resource Management Model for VM-Based Virtual Workspaces

My colleague Borja Sotomayor’s Master’s paper, A Resource Management Model for VM-Based Virtual Workspaces, is now available for download. Congratulations Borja!

This is a long but well-organized paper that goes into detail about different resource management scenarios for VMs and grid computing. It includes discussion and experimental results of combining different scheduling techniques for VMs (including advance reservation) and accurately dealing with overheads (this problem is introduced in Overhead Matters: A Model for Virtual Resource Management).

The abstract follows. I also recommend the two page introduction to get a better idea of what this is all about.

Virtual workspaces provide an abstraction for dynamically deployable execution environments on a Grid. For this abstraction to be effective, it must be possible to provide on-demand software environments and enforceable fine grained resource allocations for these workspaces. Virtual machines are a promising vehicle to realize the virtual workspace abstraction, as they allow us to instantiate a precisely defined virtual resource, configured with desired software configuration and hardware properties, on a set of physical resources.

In this paper, we describe a model of virtual machine provisioning in a Grid environment that allows us to define such virtual resources and instantiate them on a physical Grid infrastructure. Our model focuses, firstly, on providing users with an accurate representation of virtual resources. To accomplish this, the overhead resulting from instantiating and managing virtual resources is scheduled at the same level as virtual resources, instead of being deducted from a user’s resource allocation. Secondly, our model also focuses on efficiently managing virtual resources by reducing the amount of overhead.

We argue that this model, compared to resource management models that rely on the job abstraction for remote execution, enables resource providers to accurately provision resources to users, while using their physical resources efficiently. We show experimental results that demonstrate the benefits of this model both from the resource providers and the user’s perspective, in two common resource management scenarios for virtual workspaces: advance reservations and batch-style submissions.
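The difference between the two ways of accounting for overhead can be sketched in a few lines of Python. This is only an illustration of the idea in the abstract, not the paper's scheduler: the function names and the numbers (a 60-minute request at 14:00 with 10 minutes of image staging) are made up.

```python
def deducted_from_allocation(start, duration, overhead):
    """Overhead eats into the user's slot: the VM is usable later and for less time."""
    usable_start = start + overhead
    usable_minutes = duration - overhead
    return usable_start, usable_minutes


def scheduled_separately(start, duration, overhead):
    """Overhead gets its own slot before the start, so the user keeps the full duration."""
    overhead_start = start - overhead
    return overhead_start, start, duration


# Times are minutes after midnight; the request is 14:00 for 60 minutes.
request_start, request_minutes, staging_minutes = 14 * 60, 60, 10

print(deducted_from_allocation(request_start, request_minutes, staging_minutes))  # (850, 50)
print(scheduled_separately(request_start, request_minutes, staging_minutes))      # (830, 840, 60)
```

In the first case the user who asked for an hour gets 50 usable minutes starting at 14:10. In the second, staging is booked from 13:50 and the full hour starts at 14:00 as requested, at the price of the provider having to plan for the staging slot.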

For more relevant talks and papers from the group, see the Workspace publications page.