Question:
I am currently running a process on an ec2 server that needs to run consistently in the background. I tried to login to the server and I continue to get a Network Error: Connection timed out prompt. When I check the instance, I get the following message:
Instance reachability check failed at February 22, 2020 at 11:15:00 PM UTC-5 (1 days, 13 hours and 34 minutes ago)
To troubleshoot, I have tried rebooting the server but that did not correct the problem. How do I correct this and also prevent it from happening again?
Answer:
An instance status check failure indicates a problem with the
instance, such as:
- Failure to boot the operating system
- Failure to mount volumes correctly
- File system issues
- Incompatible drivers
- Kernel panic
- Severe memory pressures
You can check following for troubleshooting
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesStopping.html
For future reprting and auto recovery you can create a CloudWatch
AlarmFor second part
Nothing you can do to stop its occurrence, but for up-time and availability YES you can create another EC2 and add ALB on the top of both instances which checks the health of instance, so that your users/customers/service might be available during recovery time (from second instance). You can increase number of instances as more as you want for high availability (obviously it involves cost)