When the Amman Stock Exchange’s IT team found problems with high availability last year, they came up with an elegant virtualisation solution to tackle the risks associated with downtime.
“The most challenging thing at a stock exchange is high availability. The stock exchange operates for three or three-and-a-half hours every day, and the trading system has to be available for every second of these three-and-a-half hours.”
So says Engineer Mohammed Khatib, Director of ICT, Amman Stock Exchange. Being at the digital helm of a stock exchange, where a country’s economy can either soar or tumble, he’s well aware of the problems that can come about when the IT system experiences any downtime whatsoever.
“If you’re down for a few seconds, people will know about it,” he says. “There’s a process that you have to go through, where you have to disclose that you’ve had a problem, stop the service, and disclose when the service is going to come back again. The publicity is really bad and the reputational cost is extremely high.”
Khatib, then, was facing a serious problem: last year, the stock exchange ran into high-availability issues with its website and web services.
“We’re running multiple domains – one is for the official website, and one is for a product that we call MarketWatch, which is where you can look at the prices of the stocks and see where they’re going,” he says. “All in all, we’re running approximately 60 Apache servers, so it’s quite extensive.”
The Amman Stock Exchange hosts these services over an SDN1 link, a 155Mbps connection. Speed, Khatib says, was not the issue. It was the disaster recovery (DR) site that caused headaches for the IT team.
“We had a choice of replicating everything at the DR site, but that would create problems of replicas, and the integrity of the data and switching. We thought of using the traditional cluster technology that we always used for everything else, which is Cluster A at Site A, Cluster B at Site B, and then SAN-to-SAN mirroring,” says Khatib.
The problem with this approach, however, was that failing over from one site to the other would typically take three to five minutes, an unacceptable amount of downtime for a stock exchange. Instead, Khatib came up with a solution that put products to a use they were never intended for, one that earned him a speaker’s slot at the recent IDC Middle East CIO Summit.
“I wanted a technology that would replicate while everything is being treated as one cluster stretched across two locations, rather than individual clusters at both locations, and it wouldn’t affect the high availability,” he explains.
“We started looking at storage virtualisation because any storage virtualisation appliance will give you a reliable replica of the data, while, if you have the fibre infrastructure that we have, you can have all the machines in both locations connected to the same appliance rather than to different appliances. At the same time, you cluster these appliances, so that, if one in one location fails, the other one takes over.”
It sounds like a simple solution. However, Khatib and his team then realised that it would do nothing to help if the data itself was corrupted, since the appliances would faithfully replicate the corrupted data to both sites. The Amman Stock Exchange invests heavily in firewalls, IPSs and data leakage prevention, but a highly skilled hacker might still be able to get through. In that scenario, Khatib says, storage virtualisation alone would not cut it.
The team moved on to another idea, but it seemed even more impractical: “We thought that the only solution was to create a redundant software environment that is updated and managed completely separately. This means 60 servers over here, 60 servers over there, then another 60 here and another 60 there. It becomes ridiculous,” he says.
The answer, it seemed, was to look to server virtualisation. Having scoured the vendor market for a viable solution, Khatib settled on VMware’s Enterprise Plus, the highest-tier product in the vendor’s portfolio. The attraction, Khatib says, is that virtual machines expand and contract according to the load they receive.
“We thought that we’d create Environment A and B – software-wise – and we’ll put five gigantic servers at Site A, another five gigantic servers at Site B, virtualise everything and just throw it at the VMware.”
The result, Khatib says, is that the virtual machines handling live traffic expand fully while they are being hit, whereas the redundant environment shrinks to a minimum because its virtual machines simply sit idle. The redundant environment is kept up to date, but it does not serve any traffic itself.
Meanwhile, the IT team can switch between Environment A and B through their load balancers. Switching from A to B causes Environment A to shrink gradually back to its minimum size, while B immediately expands to absorb the extra traffic. It’s a simple yet elegant solution.
The IT team completed the project with IBM because, Khatib says, they liked the vendor’s storage virtualisation appliance, the SAN Volume Controller (SVC). What’s impressive is that neither IBM’s SVC nor VMware’s Enterprise Plus was originally designed to improve high availability. They are virtualisation solutions, certainly, but Khatib and his team combined them to solve an altogether different problem.
But how well does the system work? Khatib, for one, seems happy with the results.
“The services have always been running, but now all the high availability issues are done automatically, and the convergence time between the two sites is less than one second – it’s really fast,” he says. “I think we have managed to leverage storage virtualisation and server virtualisation effectively to achieve a goal that was not intended.”
That said, the solution wasn’t simply made available overnight; Khatib says that it took months of work before he could even start thinking about migrating from the old system.
“These projects are complicated. You have to be patient, and your vendor has to be very patient,” he explains. “The implementation of something like this, it took two months of extensive work – two months of complete focus on the project – and then another four months for migration. It’s quite intense.”
For much of those four months, the exchange was still running on the old environment: “It is easier to have redundancy in equipment, and redundancy of set-up is in fact cheaper for you than downtime,” Khatib says. “And downtime is really unmeasurable because of the reputational cost. Using the load balancers, you just switch between one or the other.”
Of course, migrating during the working week was hardly possible, given how important it was for the system to stay online. Khatib says that he worked on the migration over weekends, starting at, say, 7pm on a Thursday, the end of the Jordanian working week, when the system receives hardly any hits at all. He monitored the new environment over the weekend and then tested it at the beginning of the following week. At any hint of something going wrong, he would immediately switch back to the old environment.
“On Sunday, it’s live on the new set-up, and you’re on your toes – anything goes wrong, you switch back to the old environment,” he says. And even after everything was set up and running smoothly, Khatib says that the team did a parallel run for a month, just to eliminate the threat of any downtime.
In the pink
“It’s very smooth now,” Khatib says. “I think storage virtualisation is actually one of the best things that we have done, because now that the set-up is there, you can easily migrate your other applications into it. And it’s a very good luxury to be able to treat two sites as one.”
The solution wouldn’t work for everyone, though, Khatib points out. The project depended on the Amman Stock Exchange’s fibre infrastructure, which, Khatib says, is either impossible or at least very expensive to install. Luckily for him, he was able to complete the installation of the exchange’s fibre infrastructure last year.
What’s more, Khatib was wise in his choice of vendors. The products may have been exactly what he was looking for, but given the set-up and migration time, his vendors had to be on hand for six months, so it is high praise indeed that Khatib emerged from the process a satisfied customer.
“You look at partners who are willing to be committed, who are looking at this as strategic, who are looking at you as a business partner – not someone they want to make a quick profit from. These kinds of projects are really difficult, and you have to be very careful who you’re working with,” he says.
In the current economic climate, there is no telling how the Amman Stock Exchange will fare financially in the coming years. But one thing is for sure: there’s no way the economy will falter due to a lack of high availability.