The 42nd fastest supercomputer on earth doesn’t exist. This fall, Amazon built a virtual supercomputer atop its Elastic Compute Cloud, a web service that spins up virtual servers whenever you want them, and this nonexistent mega-machine outraced all but 41 of the world’s real supercomputers.
Yes, beneath Amazon’s virtual supercomputer, there’s real hardware. When all is said and done, it’s a cluster of machines, like any other supercomputer. But that virtual layer means something. This isn’t a supercomputer that Amazon uses for its own purposes. It’s a supercomputer that can be used by anyone.
Amazon is the poster child for the age of cloud computing. Alongside their massive e-tail business, Jeff Bezos and company have built a worldwide network of data centers that gives anyone instant access to computing resources, including not only virtual servers but virtual storage and all sorts of other services that can be accessed from any machine on the net. This global infrastructure is so large, it can run one of the fastest supercomputers on earth — even as it’s running thousands upon thousands of other virtual servers for the world’s businesses and developers.
This not only shows the breadth of Amazon’s service. It shows that in the internet age, just about anyone can run a supercomputer-sized application without actually building a supercomputer. “If you wanted to spin up a ten or twenty thousand [processor] core cluster, you could do it with a single mouse click,” says Jason Stowe, the CEO of Cycle Computing, an outfit that helps researchers and businesses run supercomputing applications atop EC2. “Fluid dynamics simulations. Molecular dynamics simulations. Financial analysis. Risk analysis. DNA sequencing. All of those things can run exceptionally well atop the [Amazon EC2 infrastructure].”
And you could do it for a pittance, at least compared to the cost of erecting your own supercomputer. This fall, Cycle Computing set up a virtual supercomputer for an unnamed pharmaceutical giant that spanned 30,000 processor cores, and it cost $1,279 an hour. Stowe, who has spent more than two decades in the supercomputing game, working with supercomputers at Carnegie Mellon University and Cornell, says there’s still a need for dedicated supercomputers you install in your own data center, but things are changing.
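To put that hourly price in perspective, here is a rough back-of-the-envelope sketch. It uses only the two figures quoted above (30,000 cores, $1,279 an hour) and assumes the price is spread evenly across every core, which is a simplification rather than how Amazon actually bills. The 8-hour run length is a hypothetical value chosen purely for illustration. Under those assumptions, the price works out to a little over four cents per core per hour.

```python
# Figures quoted in the article for the Cycle Computing cluster.
CORES = 30_000
HOURLY_COST_USD = 1_279.0


def cost_per_core_hour(hourly_cost: float, cores: int) -> float:
    """Return the hourly price divided evenly across all cores (an assumption)."""
    return hourly_cost / cores


def run_cost(hourly_cost: float, hours: float) -> float:
    """Return the total price of keeping the whole cluster up for `hours`."""
    return hourly_cost * hours


# Hypothetical run length, not a figure from the article.
EXAMPLE_RUN_HOURS = 8

print(f"{cost_per_core_hour(HOURLY_COST_USD, CORES):.4f} USD per core-hour")
print(f"{run_cost(HOURLY_COST_USD, EXAMPLE_RUN_HOURS):,.2f} USD for an "
      f"{EXAMPLE_RUN_HOURS}-hour run")
```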
“I’ve been doing this kind of stuff for a while,” he says, “and I think that five or 10 years from now, researchers won’t be worrying about administering their own clusters. They’ll be spinning up the infrastructure they need [from services like EC2] to answer the question they have. The days of having your own internal cluster are numbered.”
To Cloud or Not to Cloud
The old guard does not agree. Last month, during a roundtable discussion at the Four Seasons hotel in San Francisco, many of the companies that help build the world’s supercomputers, including Cray and Penguin Computing, insisted that cloud services can’t match what you get from a dedicated cluster when it comes to “high-performance computing,” or HPC. “Cloud for HPC is still hype,” said Charlie Wuischpard, the CEO of Penguin Computing. “You can do some wacky experiments to show you could use HPC in that environment, but it’s really not something you would use today.”
But it is being used today. And Amazon’s climb up the Top 500 supercomputer list shows that EC2 has the capacity to compete with at least the supercomputers that are built with ordinary microprocessors and other commodity hardware parts. “Rather than building your own cluster,” says Jack Dongarra, the University of Tennessee professor who oversees the annual list of the Top 500 supercomputers, “Amazon is an option.”
Amazon’s virtual supercomputer wasn’t nearly as powerful as the massive computing clusters sitting at the peak of the Top 500. It could handle about 240 trillion calculations a second (aka 240 teraflops), while the machine at the top of the list, Japan’s K Computer, reaches more than 10 quadrillion calculations a second, or 10.51 petaflops. As Dongarra points out, clusters like the K Computer use specialized hardware you won’t find at Amazon or in other supercomputers below, say, the top 25 on earth. “The top 25 are rather specialized machines,” Dongarra says. “They’re designed in some sense for a subset of very specialized applications.”
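The gap between those two figures is easier to judge once both are in the same unit. The sketch below converts the quoted numbers (about 240 teraflops for Amazon, 10.51 petaflops for the K Computer) to raw calculations per second and divides one by the other. It is simple unit arithmetic on the article's figures, not a benchmark; by this measure the K Computer is roughly 44 times faster.

```python
# Standard SI prefixes, expressed as calculations per second.
TERA = 1e12
PETA = 1e15

# Figures quoted in the article.
AMAZON_FLOPS = 240 * TERA        # about 240 teraflops
K_COMPUTER_FLOPS = 10.51 * PETA  # 10.51 petaflops


def speed_ratio(faster: float, slower: float) -> float:
    """Return how many times faster the first machine is than the second."""
    return faster / slower


ratio = speed_ratio(K_COMPUTER_FLOPS, AMAZON_FLOPS)
print(f"K Computer is about {ratio:.1f}x faster than Amazon's cluster")
```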
But according to Dongarra, you could still run these specialized applications atop Amazon. They just wouldn’t be quite as fast. And though some researchers and businesses need petaflops, others will do just fine with teraflops.
The irony is that Charlie Wuischpard and Penguin Computing actually offer their own online supercomputing service. They call it Penguin-On-Demand. But this is a little different from Amazon EC2. In essence, Penguin is offering remote access to a specific set of machines running in one of its data centers, whereas Amazon offers access to a virtual infrastructure that is shared among everyone using the service. “[POD] is not a virtualized resource,” Wuischpard tells us. “It’s especially built for high-performance computing workloads. Amazon is now trying to add this sort of thing to their toolkit, if you will, but I still think we have a leg up on them.”
The distinction between the two is rather difficult to get at. Ultimately, it comes down to two things: Penguin can tell you exactly where your application is running, and it has a long history with supercomputing. “There is a lot of difficulty in getting your application to run in the cloud,” Wuischpard says. “There’s network drivers and compilers and other stuff. You could figure out a lot of that on your own, but part of our aim with POD is to provide our expertise in building and running these machines to help our customers get on board and start using it.” According to Chuck Moore, a corporate fellow and technology group CTO at chip designer Advanced Micro Devices, applications will require a significant rewrite if you’re moving them from an old-school supercomputer to a service like Amazon.
Some operations do prefer Penguin’s service to Amazon. Earthtime, a company that offers 3-D maps of the world much like Google Street View offers 2-D images, uses POD to generate these 3-D models, and company founder and chief technology officer John Ristevski cites Penguin’s support as a reason his company doesn’t use Amazon. “You need a certain level of support, help with things like loading data off our disks and tweaking the performance of the cluster to suit our needs,” he tells Wired. “That’s not something we’ll ever get from Amazon. Amazon is never going to manage the distribution of the jobs or the processing itself, which is something that Penguin does.”
But with Amazon, a company like Cycle Computing can provide this sort of help, and even Penguin CEO Charlie Wuischpard acknowledges that the gap between Amazon and dedicated supercomputers is shrinking. Amazon built its virtual supercomputer for the Top 500 list as a way of announcing a new type of virtual server instance on EC2 that’s specifically designed for HPC applications. It’s unclear how Amazon ran its benchmark tests for the Top 500 list (the company did not respond to multiple requests for comment), but it looks like it ran the tests on a new cluster of physical machines before they were actually added to Amazon’s public service. Amazon previously offered instances for HPC applications, but these new CC2 instances are even beefier.
Spin Up, Spin Down
The point is that Amazon is an option. And it’s a rather convenient option. For Jason Stowe, the CEO of Cycle Computing, the idea of building a 30,000-core supercomputer with no hardware that costs just $1,279 an hour to run is something that can’t be ignored. “It’s just absurd,” he says. “If you created a 30,000-core cluster in a data center, that would cost you $5 million, $10 million, and you’d have to pick a vendor, buy all the hardware, wait for it to come, rack it, stack it, cable it, and actually get it working. You’d have to wait six months, 12 months before you got it running.”
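Stowe's comparison can be turned into a rough break-even estimate: how many hours of cloud time the quoted $5 million to $10 million would buy at $1,279 an hour. The sketch below is deliberately naive. It assumes the hourly price stays fixed and ignores power, cooling, staff, and hardware depreciation on the data center side, none of which the article quantifies. On those assumptions, the purchase price alone covers roughly 160 to 330 days of running the cloud cluster around the clock.

```python
HOURS_PER_DAY = 24

# Figures quoted in the article.
HOURLY_RATE_USD = 1_279.0
CAPITAL_COSTS_USD = (5_000_000.0, 10_000_000.0)


def break_even_hours(capital_cost: float, hourly_rate: float) -> float:
    """Return the hours of cloud use whose total price equals the capital cost.

    Naive by design: operating costs of an owned cluster are ignored.
    """
    return capital_cost / hourly_rate


for capital in CAPITAL_COSTS_USD:
    hours = break_even_hours(capital, HOURLY_RATE_USD)
    days = hours / HOURS_PER_DAY
    print(f"${capital:,.0f} buys about {hours:,.0f} cluster-hours "
          f"(about {days:,.0f} days of continuous use)")
```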
And by that time, he says, your application may have changed. “Your question may have evolved since you first provisioned your infrastructure,” Stowe says. “You may need more than 30,000 cores.” The added twist is that after you spin up 30,000 cores on Amazon, you can just as easily spin them down when you don’t need them.
Stowe agrees that Amazon isn’t for everyone. He acknowledges that Amazon’s virtualization layer may put a real drag on certain applications (a dedicated supercomputer runs without virtualization), but he says there are far more applications that will run just fine on a cloud service. And any drag will be much less than the six to 12 months it would take to build a supercomputer, not to mention the expense. “Your application may run 5 percent slower,” he says. “But you’re still getting access to world-class compute power.”
[Wired]