Entries in 'network'

One-click clusters, VWS TP1.3.3

A lot of developments with the workspace service and science clouds recently!

The cluster technology lets you bootstrap generic images into new network and security contexts on the fly. We built a sample cluster on top of it: one command creates the cluster, and you are immediately ready to submit jobs to a Torque cluster fronted by GRAM and GridFTP, which use a newly created self-signed certificate:

 

  1. cloud-client.sh --run --hours 12 --cluster base-cluster.xml
  2. Wait a few minutes; once the cluster has launched, note the head-node hostname
  3. scp -r root@HOSTNAME:certs/*  lib/certs/

    (SSH was bootstrapped end to end already)

  4. Make sure your grid tools trust this certificate and then submit work

 

This can be done with nearly anything that can run on a non-virtual cluster. Check out these links for more information:

Cloud bandwidth management

Interesting #13 here (well, they’re all interesting): 25 radical network research projects you should know about.

This points us to Cloud Control with Distributed Rate Limiting which is a paper about distributed bandwidth management.

From the conclusion:

As cloud-based services transition from marketing vaporware to real, deployed systems, the demands on traditional Web-hosting and Internet service providers are likely to shift dramatically. In particular, current models of resource provisioning and accounting lack the flexibility to effectively support the dynamic composition and rapidly shifting load enabled by the software as a service paradigm. We have identified one key aspect of this problem, namely the need to rate limit network traffic in a distributed fashion, and provided two novel algorithms to address this pressing need.

Check out the summary at networkworld but also here is an excerpt from a UCSD post about it:

If half your company’s bandwidth is allocated to your mirror in New York, and it’s the middle of the night there, and your sites in London and Tokyo are slammed, that New York bandwidth is going to waste. UC San Diego computer scientists have designed, implemented, and evaluated a new bandwidth management system for cloud-based applications capable of solving this problem.

The UCSD algorithm enables distributed rate limiters to work together to enforce global bandwidth rate limits, and dynamically shift bandwidth allocations across multiple sites or networks, according to current network demand.
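To make the New York / London / Tokyo example concrete, here is a minimal sketch of splitting one global limit across sites according to current demand. It uses plain max-min fairness computed in one place, which is my own stand-in and not one of the paper's distributed algorithms; the site names, demand figures, and the function name are all made up for illustration.

```python
def max_min_fair_share(global_limit, demands):
    """Split global_limit across sites by max-min fairness.

    demands maps a site name to its current demand, in the same
    unit as global_limit (for example Mbit/s).
    """
    allocation = {}
    remaining = float(global_limit)
    # Serve the smallest demands first so unused share flows to busy sites.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        site, demand = pending.pop(0)
        # Equal split of what is left among the sites not yet served.
        fair_share = remaining / (len(pending) + 1)
        granted = min(demand, fair_share)
        allocation[site] = granted
        remaining -= granted
    return allocation


# Hypothetical demands: New York is idle, London and Tokyo are slammed.
demands = {"new_york": 0, "london": 80, "tokyo": 60}
limits = max_min_fair_share(100, demands)
print(limits)
```

With a static split, the idle site's share would sit unused; here it is handed to the two busy sites instead, while the total never exceeds the global limit.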

CFP: Special issue on Networks for Grid Applications

From Call for papers: Special issue on Networks for Grid Applications

Grid developers and practitioners are increasingly realising the importance of an efficient network support. Entire classes of applications would greatly benefit by a network-aware Grid middleware, able to effectively manage the network resource in terms of scheduling, access and use. Conversely, the peculiar requirements of Grid applications provide stimulating drivers for new challenging research towards the development of Grid-aware networks.

Cooperation between Grid middleware and network infrastructure driven by a common control plane is a key factor to effectively empower the global Grid platform for the execution of network-intensive applications, requiring massive data transfers, very fast and low-latency connections, and stable and guaranteed transmission rates. Big e-science projects, as well as industrial and engineering applications for data analysis, image processing, multimedia, or visualisation just to name a few are awaiting an efficient Grid network support. They would be boosted by a global Grid platform enabling end-to-end dynamic bandwidth allocation, broadband and low-latency access, interdomain access control, and other network performance monitoring capabilities.

As a natural extension of the discussion forum provided by the Gridnets conference series, this special section aims at gathering top-quality contributions to the most debated topics currently tackled in Grid networking research. Topics include, but are not limited to:

* Network architectures and technologies for grids
* The network as a first class Grid resource: network resource information publication, brokering and co-scheduling with other Grid resources
* Interaction of the network with distributed data management systems
* Network monitoring, traffic characterisation and performance analysis
* Inter-layer interactions: optical layer with higher layer protocols, integration among layers
* Experience with pre-production Grid network infrastructures and exchange points
* Peer-to-peer network enhancements applied to the Grid
* Network support for wireless and ad hoc grids
* Data replication and multicasting strategies and novel data transport protocols
* Fault-tolerance, self healing networks
* Security and scalability issues when connecting a large number of sites within a virtual organization VPN
* Simulations
* New concepts and requirements which may fundamentally reshape the evolution of Networks.
* Integration of advanced optical networking technologies into the Grid environment
* End to end lightpath provisioning software systems and emergent standards

DRBD, LVM, GNBD, and Xen for free and reliable SAN

At home, I wanted a reliable disk solution for backups and also wanted a big, blank and resizable storage system for virtual machines. I knew I wanted to be able to get at the shared disk remotely from other nodes and wanted to be able to replace broken hardware quickly if something failed. I also didn’t want to spend a lot of time reconfiguring OSs and software in the case of a total system failure.

I have two cheap computers and so I put some big disks in them and mirrored the disks over the network. Instead of using one file server node and RAID1, this is something like a “whole system RAID”. If anything at all breaks in either computer, hosted services can keep running and data is unharmed except for whatever was unsynced in RAM.

To accomplish the disk mirror I used DRBD. DRBD is a special block device designed for highly available clusters: it mirrors activity directly at the block device level across the network to another disk, much like a RAID1 configuration over the network. It lets you build something like the shared storage devices on a SAN, but without any special hardware. This provides the basic reliability layer.
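The idea of mirroring at the block level can be shown with a toy model. This is only a sketch of the concept, assuming fully synchronous replication (a write is acknowledged once both copies exist); the class and method names are invented and have nothing to do with DRBD's real interfaces, and the "network" is just a second in-memory list.

```python
class MirroredDisk:
    """Toy model of a network-mirrored block device (not DRBD's real API)."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        # Two independent copies of every block, both initially zeroed.
        self.local = [bytes(block_size)] * num_blocks
        self.peer = [bytes(block_size)] * num_blocks

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.local[index] = data  # write to the local disk
        self.peer[index] = data   # stand-in for shipping the block to the peer
        return True               # acknowledged only after both copies exist

    def read_block(self, index, local_failed=False):
        # If the local disk is gone, the peer copy still has the data.
        return self.peer[index] if local_failed else self.local[index]


mirror = MirroredDisk(num_blocks=8, block_size=4)
mirror.write_block(3, b"data")
print(mirror.read_block(3, local_failed=True))
```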

diagram: two hosts mirrored with DRBD over crossover cable

Linux Logical Volume Management (LVM) is a popular tool that lets you flexibly manage disk space. Instead of just partitioning the disk, using LVM lets us do on-the-fly logical partition resizing, snapshots (including hosting snapshot+diffs), and adding more physical disks into the volume group as needs grow (you can even resize a logical partition across multiple underlying disks). Each logical partition is formatted with a filesystem of its own. Using LVM avoided some future headaches I think.

That is how the disk is set up. Now, how to access it remotely? You could run a shared filesystem of course, exporting via an NFS server on host A (or B). Instead, having heard good things about Global Network Block Device (GNBD) on the Xen mailing lists, I chose to export the logical block devices (from LVM) directly over the network with GNBD. Another node makes a GNBD import and the block device appears to be a local block device there, ready to mount into the file hierarchy. This is like iSCSI but it is a snap to set up and use.

And if that other node is a Xen domain 0, that block device is very handily ready to be used as a VM image, just as if it was a raw partition on that node.

diagram: one of the nodes of the disk array exporting an LVM partition over GNBD to a Xen dom0

Here’s an example Xen configuration using the imported block device:

disk = ['phy:/dev/gnbd/vmimage001,sda1,w']

The guest VM needs no awareness of all these tools. It just sees its sda1 and mounts it like anything else Xen presents to it as one of its “hardware” partitions.
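For context, here is roughly what a whole guest configuration around that disk line might look like. Xen's xm-style config files use Python syntax, so this is a sketch in that form; the guest name, kernel path, and memory size are assumptions for illustration, and only the disk line comes from my actual setup.

```python
# Hypothetical xm-style domU configuration (Xen config files use Python syntax).
name = "vm001"                     # assumed guest name
kernel = "/boot/vmlinuz-2.6-xen"   # assumed path to a Xen-capable guest kernel
memory = 256                       # MB of RAM for the guest (assumed)
vif = [""]                         # one network interface with default settings

# The GNBD-imported device in dom0, presented to the guest as sda1, writable.
disk = ["phy:/dev/gnbd/vmimage001,sda1,w"]

root = "/dev/sda1 ro"              # kernel root argument: mount sda1 as /
```

Each disk entry has three comma-separated fields: the backend device in dom0, the device name the guest sees, and the access mode.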

Instead of just using the file store for backups and VMs that are used intermittently, I’m also running persistent services like websites, the incremental-backup server and a media server in VMs stored there.

First, this allows for basic backups of the LAN services without any backup software. That’s nice to have, although I really prefer a combination of incremental backups and RAID1. (Here we also avoid a Russell’s paradox situation with the backup server.)

Second, keeping time-consuming-to-configure services in a VM allows me to replace hardware quickly, including whole computers in the event of a total failure: the only software I’ll ever need to reinstall is {Linux, DRBD, LVM, GNBD} for a file server node and {Linux, Xen} for a VMM node.

As long as net latency is really low (here it is sub-ms) it doesn’t really matter that the disk is remote for any of my uses. The VMs always respond very well.

(I should mention: you could of course take GNBD out of the picture and run the VMs on host A and B if Xen were installed there)

Another bonus: using GNBD, you can live-migrate the VM to any node that can do a GNBD import. This is nice to have. I only live migrate manually, though. Both DRBD and GNBD have some features that allow for seamless failover but I don’t really need any of this at home.

To learn more about that, check out this paper on the new DRBD (it is interesting): http://www.drbd.org/fileadmin/drbd/publications/drbd8_wpnr.pdf

Thinking about high availability in this kind of setup for a minute, a possible and simple-to-execute arrangement for services that need to be up at all times would be to take two DRBD-mirrored nodes, run VMs on one or both of them, and have the physical nodes heartbeat each other. This is a simpler approach than a centralized file server with block device export: here we just have two peer VMMs that are “watching out” for each other.

You’d have two master/slave arrangements, so in the normal operating case: one VMM with partition A as DRBD master and partition B as secondary, then on the other VMM you have partition A as secondary and partition B as master. VMs run from a partition that is the DRBD master.

Let’s say you split four services into four VMs and put two VMs on each physical node. The disk in one of the physical nodes fails entirely and a monitor process notices. The heartbeat script makes sure the OK node is now the DRBD master for both partition A and B. Then it boots the two VMs previously hosted on the failed node on the OK node, re-allocating RAM for the time being to accommodate all four VMs.

diagram: 2 VMs migrate to the OK node

The applications in those VMs recover just as if they went through a normal hard system reset (their network addresses can stay the same since both physical nodes are on the same LAN). Once the administrator gets the alert email and puts a new disk in, another script is ready to resync DRBD and then migrate two of the VMs back to their normal place.
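The failover step of such a heartbeat script could be planned along these lines. This is a sketch only: it builds the command lists and does not run them, it assumes the failed node is down or already demoted, and the node names, resource names, VM names, and config paths are all hypothetical. The commands themselves are the usual "drbdadm primary" and "xm create".

```python
def plan_failover(failed_node, drbd_primary, vm_placement, vm_configs):
    """Return the commands the surviving node should run, in order.

    drbd_primary maps a DRBD resource to the node that is its master.
    vm_placement maps a VM name to the node it normally runs on.
    vm_configs maps a VM name to its Xen config file path.
    """
    commands = []
    # Take over mastership of every resource the failed node was serving.
    for resource, node in drbd_primary.items():
        if node == failed_node:
            commands.append(["drbdadm", "primary", resource])
    # Boot every VM that was hosted on the failed node.
    for vm, node in vm_placement.items():
        if node == failed_node:
            commands.append(["xm", "create", vm_configs[vm]])
    return commands


# Hypothetical layout: two nodes, two partitions, four VMs.
drbd_primary = {"partition_a": "node1", "partition_b": "node2"}
vm_placement = {"web": "node1", "mail": "node1", "backup": "node2", "media": "node2"}
vm_configs = {vm: "/etc/xen/%s.cfg" % vm for vm in vm_placement}

plan = plan_failover("node2", drbd_primary, vm_placement, vm_configs)
for command in plan:
    print(" ".join(command))
```

Keeping the planning separate from execution makes the logic easy to dry-run before trusting it with real disks.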

This seems like something to consider for a highly hammered and important head node (like a Globus GRAM node for example). All it takes is another node, commodity hardware and open source software!

Virtual Workspaces Service TP1.2.2, DHCP support added

Kate Keahey writes on workspace-announce:

We are happy to announce the release TP1.2.2 of the Workspace Service. The new release DHCP-enables workspace network configuration mechanisms. In addition, we have streamlined the workspace logistics information, added unit tests, and extended the documentation.

For a detailed changelog, see the TP1.2.2 documentation:

http://workspace.globus.org/vm/TP1.2.2/index.html

You can download the new release from:

http://workspace.globus.org/downloads/index.html

This is an exciting release because we include an “invisible” method for configuring VMs as they are deployed. We added support for running a local DHCP server on hypervisor nodes that, before the VM is deployed, is dynamically keyed with the specific information intended for each of the workspace’s NICs (including IP, DNS, default route, hostname, etc).

Because we assign a unique MAC address to each NIC (and also dynamically install ebtables rules to make sure that the NIC can only use that MAC address), the DHCP server can be configured to respond with the exact information needed.

Further, the DHCP broadcast request is routed to the local DHCP server only, meaning that if your site already has a DHCP server running, the request will not make it there. Even if there is no DHCP server already present, this (1) prevents unnecessary noise on the network and (2) prevents other VMs from running their own DHCP servers and responding to the requests.

Support for ensuring only the intended IP address is used by each NIC was also added.
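To illustrate what "keyed with the specific information" for a NIC could look like, here is a small sketch that renders a per-NIC entry in the style of an ISC dhcpd host declaration. This is an assumption for illustration, not the service's actual code or file layout; the function name and all addresses are made up.

```python
def dhcp_host_entry(hostname, mac, ip, router, dns):
    """Render an ISC dhcpd-style host declaration for one workspace NIC."""
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"      # match requests from this MAC only
        f"  fixed-address {ip};\n"           # always hand out this IP
        f'  option host-name "{hostname}";\n'
        f"  option routers {router};\n"      # default route
        f"  option domain-name-servers {dns};\n"
        f"}}\n"
    )


# Hypothetical NIC: the MAC uses the Xen OUI prefix 00:16:3e.
entry = dhcp_host_entry(
    "vm1", "00:16:3e:00:00:01", "192.168.0.10", "192.168.0.1", "192.168.0.2"
)
print(entry)
```

Because the MAC address is unique and locked down per NIC, a lookup keyed on it is enough for the server to answer with exactly the intended settings.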

For more information, see the administrator guide’s DHCP overview and configuration section. That section also includes a link to a design document that goes into more depth.

Xen network diagrams

Gabriel Gunderson has posted a new set of Xen network diagrams. There are some other diagrams here on the Xen wiki.