How to migrate legacy data centres to the cloud

Francis Miers, director at Automation Consultants, explains the importance of ensuring everything is properly tested before migrating your applications from a legacy data centre to the cloud. 

Moving your data centre to the cloud can lower total cost of ownership, improve flexibility and time-to-market, and increase system availability and durability. With options ranging from Amazon Web Services (AWS) to Microsoft Azure and Google Cloud Platform (GCP), amongst others, there has never been a better time for businesses looking to update their infrastructure to make the jump.

When migrating a system of any significant complexity or business importance, it is normally necessary to perform a trial migration before attempting to migrate the live application. A trial migration is done by copying the live data and performing the migration with the copied data, while keeping real users on the existing system. If all goes well, a copy of the live application, with copied data, should be up and running in the cloud. At this point, the trial migration should be tested.

Testing a complex system is not straightforward, even with a successful trial migration. A complex system still interfaces with many other systems. Care must be taken that the trial copy does not try to connect to live systems, including live databases. If it does, it can send incorrect data to those live systems, causing them to malfunction. On the other hand, the trial copy will not work correctly, and is hard to test, unless it can connect to the systems it was designed to connect to.
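One way to guard against the trial copy reaching live systems is to check its connection settings before it is started. The following is a minimal sketch in Python, not a complete safeguard: the host names, the configuration keys and the function find_live_endpoints are all hypothetical, and it assumes the trial copy's endpoints are held as URLs in a simple key-value configuration.

```python
from urllib.parse import urlparse

# Hypothetical list of live hosts that the trial copy must never contact.
LIVE_HOSTS = {"db.prod.example.com", "payments.prod.example.com"}


def find_live_endpoints(config: dict[str, str]) -> list[str]:
    """Return the configuration keys whose URL points at a known live host."""
    offenders = []
    for key, url in config.items():
        host = urlparse(url).hostname
        if host in LIVE_HOSTS:
            offenders.append(key)
    return sorted(offenders)


# Hypothetical configuration of the trial copy.
trial_config = {
    "orders_db": "postgresql://db.prod.example.com:5432/orders",
    "payments_api": "http://localhost:8081/payments",
}

# Lists the settings that still point at live systems.
print(find_live_endpoints(trial_config))
```

A check like this only catches endpoints that appear in configuration. Network-level controls, such as firewall rules isolating the trial environment, would still be needed for addresses embedded elsewhere.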

This problem can be solved to some extent by creating stubs to connect to the trial copy's interfaces. A stub mimics the responses of a real system to which the system under test must be connected. Various stubbing tools exist, for example SmartBear ServiceV Pro, CA Lisa and IBM Rational Test Virtualization Server. These tools provide dummy responses to the interfaces of the system under test, allowing it to run and be tested. Of course, the testing must account for the fact that the set of responses provided by the stubs is never as full as what would be provided by a real system connected to the interface.
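To illustrate the idea, a very small stub can be written with Python's standard library alone. This sketch is not how the commercial tools above work internally. The exchange-rate interface, the canned data and the names stub_response, StubHandler and make_server are assumptions made up for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned responses, keyed by request path.
CANNED = {"/rates/GBP-USD": {"pair": "GBP-USD", "rate": 1.25}}


def stub_response(path: str) -> tuple[int, dict]:
    """Return the HTTP status and JSON body the stub gives for a path."""
    if path in CANNED:
        return 200, CANNED[path]
    # Anything not stubbed is reported clearly rather than silently ignored.
    return 404, {"error": "not stubbed", "path": path}


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = stub_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port: int) -> HTTPServer:
    """Create (but do not start) a stub server on the local machine."""
    return HTTPServer(("127.0.0.1", port), StubHandler)
```

Calling make_server(8081).serve_forever() would start the stub, and the trial copy could then be pointed at http://127.0.0.1:8081 instead of the real system. The 404 for unknown paths makes the limitation described above visible: any request the stub was not prepared for shows up as a gap in the test rather than as a false pass.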

The appropriate amount of testing should be driven by two factors: how important the system is, and how complex it is. If a system is customer-facing, revenue-generating and very complex (e.g. TSB’s online banking system), a lot of testing is appropriate. If the system is simple and of minor business importance (e.g. a meeting-room booking system), in-depth testing would be wasteful.

Testing a migration is not the same as testing in software development. Migrations do not generally involve code changes, so the application’s functionality can be assumed to work, except where it calls on outside systems, i.e. where it relies on its interfaces. Testing should therefore focus on the interfaces. If an application performs a complex calculation, there is no need to test the calculation itself, but if the calculation relies on getting a piece of data from an external system, that aspect of it should be tested. Migration test plans should therefore be long in proportion to the importance and complexity of the application, and should focus on testing the interfaces.

Testing should not be limited to the technical staff performing the migration, but should also include user acceptance testing (UAT). Thanks to their subject matter knowledge, users often notice bugs that escape the technical staff. UAT can also help build users' confidence in, and support for, the migration. Complex migrations can be long and arduous, and user support can make the difference between success and failure.

Migrating an entire data centre normally takes months or years. Applications are not migrated all at once; instead a rolling programme of migrations is put in place with trial migrations, testing and live migrations (see below) all taking place at the same time on different applications. One or more live migrations will take place on most weekends.

Migrations

The migration of a live application to the cloud must often take place in a restricted time window such as a weekend. With some systems, such as some data warehouses, daily use is not required, and a longer planned outage may be possible. In the main, however, users must be able to use the system until close of business on a Friday, and will expect it to be up and running when they return to work the next Monday.

Given the tight time window, a live migration of any complexity must be very carefully planned. Most operations must work right the first time or the migration will overrun. This is a major reason for performing a trial migration beforehand. The trial migration should have eliminated any errors, and allowed every operation to be timed to ensure that the whole process will fit in the time window. If, during the trial migration, any operation is found to take more time than is available during the go-live weekend, steps can be taken to make it go faster, such as writing a script to automate it. Alternatively, it can be started earlier in the weekend, run in parallel with other tasks, or broken down so that the live migration takes place in phases over more than one weekend.
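The arithmetic behind this check can be sketched in a few lines of Python. The step names, durations, dependencies and the 24-hour window below are invented for illustration, and the calculation assumes that any step may start as soon as its prerequisites have finished, which ignores staffing and bandwidth limits.

```python
from functools import lru_cache

# Hypothetical timings from a trial migration: step -> (hours, prerequisites).
STEPS = {
    "freeze_writes": (0.5, []),
    "export_database": (6.0, ["freeze_writes"]),
    "copy_file_store": (10.0, ["freeze_writes"]),
    "import_database": (8.0, ["export_database"]),
    "smoke_tests": (2.0, ["import_database", "copy_file_store"]),
}

# Hypothetical window, leaving the rest of the weekend for a possible roll-back.
WINDOW_HOURS = 24.0


def sequential_hours(steps) -> float:
    """Total time if every step is run one after another."""
    return sum(hours for hours, _ in steps.values())


def parallel_hours(steps) -> float:
    """Total time if independent steps run in parallel (longest dependency chain)."""

    @lru_cache(maxsize=None)
    def finish(name: str) -> float:
        hours, deps = steps[name]
        return hours + max((finish(d) for d in deps), default=0.0)

    return max(finish(name) for name in steps)


for label, total in [("sequential", sequential_hours(STEPS)),
                     ("parallel", parallel_hours(STEPS))]:
    verdict = "fits" if total <= WINDOW_HOURS else "overruns"
    print(f"{label}: {total} hours, {verdict} the {WINDOW_HOURS}-hour window")
```

With these made-up figures the steps take 26.5 hours end to end and would overrun, whereas copying the file store in parallel with the database export and import brings the total down to 16.5 hours.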

Even with the best of planning, problems can occur over a go-live weekend. To guard against this, a roll-back plan should be put in place. Under roll-back, the migration is aborted, and the system is returned to its state before the migration. This creates delay and additional cost in the migration programme, but it prevents an unplanned outage with all the damage that it can inflict on the organisation.

A go-live weekend consists of carefully planned, intense activity. Once the migration steps are complete, a set of quick sanity tests (“smoke tests”) is run. If they pass, the decision is taken to commit to the migration (i.e. not to roll back), the system goes live in the cloud ready for use on the Monday, and any necessary communication is sent to users.
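The commit-or-roll-back decision can be thought of as a simple gate over the smoke tests. The Python sketch below shows only that logic. The check names and the function go_live_decision are hypothetical, and in practice the decision is taken by people weighing the results rather than by a script alone.

```python
from typing import Callable


def go_live_decision(checks: dict[str, Callable[[], bool]]) -> tuple[str, list[str]]:
    """Run each smoke test and return the decision plus the names of failed checks."""
    failed = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            # A check that crashes is treated the same as a check that fails.
            passed = False
        if not passed:
            failed.append(name)
    decision = "commit" if not failed else "roll back"
    return decision, failed


# Hypothetical smoke tests; real ones would log in, load a page, query the database, etc.
smoke_tests = {
    "login_page_loads": lambda: True,
    "database_reachable": lambda: True,
    "payment_interface_responds": lambda: False,
}

print(go_live_decision(smoke_tests))
```

Treating any single failure as grounds for roll-back is a deliberately cautious assumption. A real programme might classify some checks as non-blocking.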

Operations and optimisations

After the go-live, the migration team will remain on hand for a few days to react quickly in case anything should go wrong. During and immediately after this period, performance can be optimised, e.g. by tuning the database, adjusting the size of the cloud servers running the application, or making use of content delivery networks such as Amazon CloudFront.

Conclusions

Migration of applications to the cloud has much in common with migration to another data centre, but there are certain particularities. It is not possible to use legacy hardware or network protocols in the cloud, and latency is likely to be a factor unless the cloud data centre happens to be very close. On the other hand, AWS and the other cloud platforms provide a range of tools and aids to help with migration, such as tools to discover existing systems and VPNs to connect your data centre directly to their cloud. Their partners also have training and experience in performing migrations to the cloud.

Migrations of complex systems to the cloud are complex and challenging. Careful planning and testing are essential to avoid very damaging consequences. The rewards, though, can be great, including lower IT costs and greater reliability, and the possibility for your organisation to innovate and bring products to market faster.