Entries in 'scalability'


Cloud bandwidth management

Interesting #13 here (well, they’re all interesting): 25 radical network research projects you should know about.

This points us to Cloud Control with Distributed Rate Limiting, a paper about distributed bandwidth management.

From the conclusion:

As cloud-based services transition from marketing vaporware to real, deployed systems, the demands on traditional Web-hosting and Internet service providers are likely to shift dramatically. In particular, current models of resource provisioning and accounting lack the flexibility to effectively support the dynamic composition and rapidly shifting load enabled by the software as a service paradigm. We have identified one key aspect of this problem, namely the need to rate limit network traffic in a distributed fashion, and provided two novel algorithms to address this pressing need.

Check out the summary at Network World; here is also an excerpt from a UCSD post about it:

If half your company’s bandwidth is allocated to your mirror in New York, and it’s the middle of the night there, and your sites in London and Tokyo are slammed, that New York bandwidth is going to waste. UC San Diego computer scientists have designed, implemented, and evaluated a new bandwidth management system for cloud-based applications capable of solving this problem.

The UCSD algorithm enables distributed rate limiters to work together to enforce global bandwidth rate limits, and dynamically shift bandwidth allocations across multiple sites or networks, according to current network demand.
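To make the New York / London / Tokyo example concrete, here is a minimal sketch of the general idea: one global limit, divided among sites according to what each currently wants. This is not the algorithm from the paper (which has the limiters coordinate among themselves); it is just a centralized, demand-proportional split, and the site names and Mbps figures are made up for illustration.

```python
def allocate(global_limit, demands):
    """Split a global rate limit across sites in proportion to current demand.

    demands maps a site name to the rate it currently wants (same unit as
    global_limit). A site never receives more than it asked for.
    """
    total = sum(demands.values())
    if total <= global_limit:
        # Everyone fits under the global cap, so nobody is throttled.
        return dict(demands)
    # Oversubscribed: scale every site down by the same factor.
    scale = global_limit / total
    return {site: rate * scale for site, rate in demands.items()}


# Hypothetical demands in Mbps: New York is asleep, London and Tokyo are busy.
limits = allocate(100, {"new_york": 5, "london": 80, "tokyo": 60})
```

With a static split that reserves half the bandwidth for New York, London and Tokyo would share the remaining 50 Mbps. Here the idle site gives up what it is not using and the busy sites share nearly all of the global limit.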

Cloud lock-in is not such a big deal

There’s been a lot of talk about the dangers of getting locked in to cloud platforms, developing an application that is only suited to one platform.

Here’s a, let’s say… “embellished” example: Gangsta cloud wars could pivot on the traffic-driving power of Google and Microsoft/Yahoo.

When you’re using VMs like Xen (e.g. on EC2), if you design things for it you “should be able to” move without a ton of hassle (research. plan.). The workspace project has been working on portability and usability (see The first one-click STAR production cluster) and one of the things we can do now is use the same VM image on a regular cluster (such as on the Teraport cloud) and EC2. The contextualization software can be configured to sense if it is on EC2 or not (and will bootstrap accordingly). It “would be nice” if such things were standardized but this is not a real problem right now (IMHO).
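For a rough idea of what "sense if it is on EC2 or not" can look like, here is a small sketch. It is not the workspace contextualization code. It simply asks the EC2 instance metadata address (169.254.169.254) for an instance id and falls back to a regular cluster bootstrap if nothing answers. The function names and the "ec2" / "cluster" labels are assumptions for illustration, and a real deployment may need a different probe.

```python
import urllib.error
import urllib.request

# Link-local address where EC2 serves instance metadata to the guest.
EC2_METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def fetch_instance_id(timeout=1.0):
    """Return the EC2 instance id, or None if the metadata service is unreachable."""
    try:
        with urllib.request.urlopen(EC2_METADATA_URL, timeout=timeout) as resp:
            return resp.read().decode("ascii").strip() or None
    except (urllib.error.URLError, OSError):
        # No route, timeout, or refused connection: assume we are not on EC2.
        return None


def choose_bootstrap(fetch=fetch_instance_id):
    """Pick a bootstrap mode (hypothetical labels) based on the probe result."""
    return "ec2" if fetch() else "cluster"
```

Passing the probe in as an argument keeps the decision logic testable off EC2, e.g. `choose_bootstrap(lambda: None)` returns `"cluster"`.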

About something more “strongly typed” like Google’s AppEngine. Application migration might be a bit harder, but not if the APIs are well known and repeatable. Google’s SDK is even Apache 2 licensed.

To that point, have a look at Announcing AppDrop.com (host Google App Engine projects on EC2). It’s not there yet (database is a flat file) but, hey, it was developed in a few days. Cool. Read more at http://appdrop.com.

The long term idea is not that this would solve all your problems magically but that such things are possible, and if there’s a real market for choices, it seems like more work on things of this nature is also inevitable.

I’m no datacenter business expert, but the biggest problem right now seems to be that few people will be able to compete on costs/efficiencies of scale with Google/Amazon/Microsoft/eBay. (<predictions…>) It feels like it would naturally approach the straight web hosting business, though. Let’s say a standard, open source cloud computing infrastructure emerges (such as Apache httpd in the analogy). There will be various levels of players as far as the capital they have and certainly better and worse companies to choose from (including those that differentiate on service etc). But if you’re really sweating the savings an enormous company could provide with such efficiencies vs. a normal size company/datacenter, you’re probably at the point where you could save a whole lot more by buying your own computers.(</predictions…>)

Miscellaneous point about lock-in: something user-facing that ties you to a provider does not seem like a wise idea (e.g. Google’s Users API).

Google launches application hosting

They’ve taken the application level approach (Python currently).

And unlike Sun’s attempt (which also required porting the app to a platform, instead of the looser requirements of the EC2 style), there is an interesting entry incentive:

“It’s free to get started. Every Google App Engine application can use up to 500MB of persistent storage and enough bandwidth and CPU for 5 million monthly page views.”

http://code.google.com/appengine/

http://code.google.com/appengine/docs/whatisgoogleappengine.html

http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html

http://appgallery.appspot.com/

http://groups.google.com/group/google-appengine

Nimbus: The University of Chicago Science Cloud

If you’re on the workspace-announce list, you will have already seen the “Science Cloud Available at the University of Chicago” email.

Built with the workspace service, we’ve made some nice client enhancements to get to “cloud simplicity” and it’s up and running on 16 nodes and already serving guests. See the documentation for command samples; the idea is to make it as simple as possible. On the service side, Nimbus uses TP1.3.1 with some very small additions (mostly this differs because of a new authorization plugin). Building cloud computing solutions is the main business of the workspace service.

Have a look!

Workspace Service TP1.3.1

Some cool new features:

On behalf of the workspace team, I am happy to announce the TP 1.3.1 release of the Workspace Service. You can download the new release from: http://workspace.globus.org/downloads/index.html

The main new feature in this release is the implementation of the workspace pilot which provides non-invasive adaptations to batch schedulers (such as PBS) enabling sites to run virtual machines alongside jobs. The details of this approach are described in: http://workspace.globus.org/papers/workspace-pilot-paper-submitted.pdf

In addition, the release also contains the ensemble service that allows clients to create ensembles of heterogeneous virtual machines to be deployed and managed together, improvements to the client, and several bug fixes. The complete changelog can be found at: http://workspace.globus.org/vm/TP1.3.1/index.html#changelog

We welcome comments, feedback, and bug reports. Information about the project, software downloads, documentation and instructions on how to join the workspace-user mailing list for support questions can be found at: http://workspace.globus.org

Happy Valentine’s Day!

As you can read there, the main new feature is the pilot infrastructure. The paper Kate refers to in the announcement is a relatively short read and lays out the ideas (and a practical evaluation) in an organized way. But briefly: the pilot is a program the service will submit to a local site resource manager in order to obtain time on the VMM nodes. When not allocated to the workspace service, these nodes will be used for jobs as normal. Those jobs run in normal system accounts in Xen domain 0 with no guest VMs running.

Importantly, the approach leaves the site resource manager in full control of the nodes and requires no modifications to the site resource manager, save perhaps for configuration changes you might like to make. For example, you can mark particular nodes as able to accommodate guest VMs: the workspace service supports sending pilot requests to particular LRM queues, or providing a particular node property etc. This allows you to really organize not just when but where VMs can run.

Several extra safeguards have been added to make sure the node is returned from VM hosting mode at the proper time, including support for:

Also included is a one-command “kill 9” facility for administrators as a “worst case scenario” contingency.

 

So as a buzzword experiment, I want to put in a particular keyword here and see how the search engine hits work out :-). I think you know what it may be…

Cloud computing

Go make a cloud!

And with the workspace pilot, you won’t have to switch over all at once. Take it for a test run and tell us about it on workspace-user.

We’ve got some exciting stuff in the pipeline for the next few months, too (see the last release announcement and the self-configuring 100 node VM cluster news). I am really happy with where the project is going and has been recently.

- Tim

One dollar for a million SQS operations

Amazon SQS is a distributed message queue system with a simple, robust API and real infrastructure to back it. And their prices just dropped significantly, from a penny per 100 messages sent to a penny per 10,000 requests:

Dear Amazon SQS Developers,

We wanted to let you know about some changes we are making to Amazon SQS, based on customer feedback and watching the way customers are using the service. One thing we’ve heard consistently is that customers want to be able to use SQS along with our other services (e.g. Amazon EC2, Amazon S3), but need SQS to be less expensive for this to be more feasible. We looked at our architecture and feature set, and found a way to make a few, targeted changes, by deprecating a few infrequently used requests, which allow us to operate the service much more efficiently. Simultaneously, we are introducing a new pricing structure that replaces the previous per-messages-sent charge ($0.10/1,000 messages) with a new per-request fee ($0.01/10,000 requests, including all Amazon SQS operations). The net result is that the new pricing will result in significantly lower charges for most developers being billed for SQS.

I’m hoping we’ll look back in five years and reminisce about how they charged so much for EC2 as well :-) (I do think it’s a good price now unless you are looking to continually use many, many computers).
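For the SQS numbers, the arithmetic behind the title is easy to check. The two prices below come straight from the announcement; the figure of three requests per message (send, receive, delete) is only an assumption about a typical usage pattern, since the new fee counts every operation rather than just messages sent.

```python
from fractions import Fraction

# Prices in dollars, taken from the announcement quoted above.
OLD_PRICE_PER_MESSAGE = Fraction(10, 100) / 1000   # $0.10 per 1,000 messages sent
NEW_PRICE_PER_REQUEST = Fraction(1, 100) / 10000   # $0.01 per 10,000 requests

# Assumption for illustration: each message costs a send, a receive and a delete.
REQUESTS_PER_MESSAGE = 3


def old_cost(messages):
    """Dollars billed under the old per-message-sent pricing."""
    return OLD_PRICE_PER_MESSAGE * messages


def new_cost(requests):
    """Dollars billed under the new per-request pricing."""
    return NEW_PRICE_PER_REQUEST * requests


messages = 1_000_000
print("old:", old_cost(messages))                          # a million messages sent
print("new:", new_cost(messages * REQUESTS_PER_MESSAGE))   # the same messages, counted as requests
print("per million requests:", new_cost(1_000_000))
```

Exact fractions are used instead of floats so that the comparison does not pick up rounding noise.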

Virtual Cluster Appliances

This Better Know a VM entry, Virtual Cluster Appliances, gives an overview of VM contextualization technology which is scheduled to be part of the next workspace service release. This is not just relevant to classic grid computing, but to any situation where you’d like to automatically launch many virtual machines that work together and want them to securely organize themselves and adapt to the deployment environment. It can even be used for one VM; we’ll look at such cases later.

Volunteer computing mixed with traditional grid computing

http://www.utexas.edu/oncampus/2007/11/15/tacc-feature/

The Texas Advanced Computing Center (TACC) recently announced its partnership with the World Community Grid. It will assist the project by running World Community Grid software on its employee PCs, installing the client on the new Stampede cluster – helping scientists scale their research for the World Community Grid – and allowing other large TACC clusters to run Grid computations when there are idle processors.
[…]
“We look forward to working with IBM to explore how researchers can most effectively utilize both TACC advanced systems and the World Community Grid to address problems with deep impact to society as well as science.”

