Recover from Failed Amazon EC2 Instances

One thing that’s not immediately obvious about Amazon EC2 instances is that they can fail. In fact, Amazon says:

It’s inevitable that EC2 instances will fail, and you need to plan for it. An instance failure isn’t a problem if your application is designed to handle it.

The EC2 forums are littered with posts from users whose EC2 instances have become unresponsive and can’t be stopped or restarted. Instances can get “stuck” in the “stopping” state for 24 hours or more. Amazon generally recommends issuing a forced stop via the client tools’ “ec2-stop-instances --force” command, but this doesn’t seem to work in most cases.
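If you script against the API instead of using the client tools, the same forced stop can be requested from Python. The following is a minimal sketch, assuming a boto3 EC2 client is passed in; `force_stop` is a hypothetical helper name used only for illustration, and there is no reason to expect it to succeed where the command-line version does not.

```python
def force_stop(ec2, instance_id):
    """Ask EC2 to force-stop one instance and return the state it reports.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # Force=True is the API equivalent of the command-line force option.
    response = ec2.stop_instances(InstanceIds=[instance_id], Force=True)
    # The response lists each affected instance with its current state,
    # typically "stopping" right after the request is accepted.
    return response["StoppingInstances"][0]["CurrentState"]["Name"]
```

The returned state only confirms that the request was accepted. A stuck instance can keep reporting “stopping” long afterwards, which is why the backup procedure matters.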

I recommend everyone follow these steps to prepare for a failure scenario:

  1. In the “Instances” panel, create a new instance using the same AMI as your production instance. This is your backup instance. “Stop” the instance after it is created. (Amazon does not charge instance hours for a stopped instance, although you still pay for the storage of any EBS volumes attached to it.)
  2. In “Volumes”: detach and then delete the volume that was created as part of this new instance.
  3. Still in “Volumes”: create a snapshot of your production volume.
  4. Go to the “Snapshots” section of the panel, select your new snapshot and choose “create volume from snapshot.” Be sure to choose the same availability zone as your backup instance, since a volume can only be attached to an instance in its own zone. I’ve seen some caching issues here, so if you don’t see your snapshot when selecting this menu, be sure to refresh.
  5. Go back to “Volumes” and choose “attach volume” on your new available volume. Choose your stopped backup instance and type in the same device name as your original volume (visible under “attachment information” for the volume).
  6. Go ahead and start your backup instance. It should be an exact copy of your production instance as of the snapshot.
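
Steps 3 through 6 can also be scripted so the backup is easy to refresh. The following is a minimal sketch, not a definitive implementation: it assumes a boto3 EC2 client is passed in, that the backup instance already exists and is stopped, and that its original volume has been detached and deleted (steps 1 and 2). The function name `clone_volume_to_backup` and its parameters are hypothetical.

```python
def clone_volume_to_backup(ec2, prod_volume_id, backup_instance_id, device, zone):
    """Snapshot the production volume and attach a copy to the backup instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    `device` is the device name of the original volume, and `zone` is the
    availability zone of the backup instance.
    """
    # Step 3: snapshot the production volume and wait until it completes.
    snapshot_id = ec2.create_snapshot(
        VolumeId=prod_volume_id,
        Description="Backup of production volume",
    )["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Step 4: create a volume from the snapshot in the backup instance's zone.
    volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=zone,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Step 5: attach the new volume under the original device name.
    ec2.attach_volume(
        VolumeId=volume_id,
        InstanceId=backup_instance_id,
        Device=device,
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    # Step 6: start the backup instance.
    ec2.start_instances(InstanceIds=[backup_instance_id])
    return volume_id
```

Waiting between steps avoids the kind of stale-state problem mentioned in step 4, since each call only proceeds once EC2 reports the previous resource as ready.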

