AMI FAQ
At night, when I'm not actively working on the servers, I stop all of them. To do this, first resize all Asgard-controlled auto scaling groups down to min = 0 and max = 0. After the clusters have resized to zero, use the EC2 console to stop (not terminate) the WXS Catalog server, the Asgard server, and the Eureka server. Note that after starting them again you will have to rerun the steps in the next question.
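For anyone who prefers a script to the consoles, the same shutdown order could be sketched roughly as below. This is only an illustration under stated assumptions: it assumes the AWS CLI is installed and configured, it bypasses Asgard rather than driving it, and `shutdown_commands`, the group names, and the instance IDs are hypothetical. The function only builds the commands and does not run them, and it does not wait for the groups to reach zero before the stop step.

```python
# Hypothetical helper: builds AWS CLI argument lists, it does not execute them.
def shutdown_commands(group_names, instance_ids):
    """Return commands that scale each group to zero, then stop the servers."""
    commands = []
    for name in group_names:
        # Set min, max, and desired capacity to 0 so the group drains.
        commands.append([
            "aws", "autoscaling", "update-auto-scaling-group",
            "--auto-scaling-group-name", name,
            "--min-size", "0", "--max-size", "0", "--desired-capacity", "0",
        ])
    if instance_ids:
        # stop-instances stops (does not terminate) the listed instances.
        commands.append(
            ["aws", "ec2", "stop-instances", "--instance-ids", *instance_ids]
        )
    return commands
```

Each returned list could be passed to `subprocess.run` once you have confirmed that the groups have actually resized to zero.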
Question: OK, I restarted the Asgard/Eureka/WXS Catalog servers and nothing is working. How do I fix it?
The issue is that the IP addresses (and internal hostnames) change when you restart an instance. The clusters are therefore still trying to locate the old names you previously put into AcmeAirUserDataProvider.groovy. To fix this you need to:
- log in to the Asgard server, run killall -9 java, and then update AcmeAirUserDataProvider.groovy with the new internal DNS names of the WXS Catalog server and the Eureka server.
- restart Asgard with service tomcatd start
- let Asgard load (takes a few minutes) and delete the three clusters. This step is required because there is no easy way in Asgard to update a launch configuration (the old launch configurations still store the old internal DNS names).
- recreate the three Asgard clusters from scratch
- re-run the grid data loader program
Log locations by component:
- Acme Air Web App, Acme Air Auth Service : /opt/wlp/usr/servers/server1/logs
- WXS Container, WXS Catalog : /opt/ObjectGrid/acmeair-netflix/logs
- Asgard, Eureka : /opt/apache-tomcat-VERSION/logs
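When hunting for a specific error across these directories, a small search helper can save some clicking around. The sketch below is a hypothetical example, not part of Acme Air: `find_lines` and `LOG_DIR_PATTERNS` are made-up names, and the Tomcat directory is matched with a wildcard only because the exact VERSION is not given here.

```python
import glob
import os

# Directories from the list above. The wildcard stands in for the unspecified
# Tomcat VERSION (an assumption, adjust it to your install).
LOG_DIR_PATTERNS = [
    "/opt/wlp/usr/servers/server1/logs",
    "/opt/ObjectGrid/acmeair-netflix/logs",
    "/opt/apache-tomcat-*/logs",
]

def find_lines(patterns, needle):
    """Return (path, line) pairs for every log line that contains needle."""
    hits = []
    for pattern in patterns:
        # glob returns nothing for directories that do not exist on this host.
        for directory in glob.glob(pattern):
            for root, _dirs, files in os.walk(directory):
                for name in sorted(files):
                    path = os.path.join(root, name)
                    with open(path, errors="replace") as handle:
                        for line in handle:
                            if needle in line:
                                hits.append((path, line.rstrip("\n")))
    return hits
```

For example, `find_lines(LOG_DIR_PATTERNS, "UnknownHostException")` would list every line mentioning that exception on whichever server you run it on.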
[9/9/13 15:01:23:139 UTC] 00000033 SystemOut O ERROR - 1008081 - [DiscoveryClient_Heartbeat] (DiscoveryClient.java:1228) - DiscoveryClient_ACMEAIR-AUTH-SERVICE/domU-12-31-39-05-1A-E0 - was unable to send heartbeat!
com.sun.jersey.api.client.ClientHandlerException: java.net.UnknownHostException: ip-10-139-53-106
at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:184)
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:120)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:680)
at com.sun.jersey.api.client.WebResource.put(WebResource.java:211)
at com.netflix.discovery.DiscoveryClient.makeRemoteCall(DiscoveryClient.java:789)
at com.netflix.discovery.DiscoveryClient.makeRemoteCall(DiscoveryClient.java:753)
at com.netflix.discovery.DiscoveryClient.access$200(DiscoveryClient.java:87)
at com.netflix.discovery.DiscoveryClient$HeartbeatThread.run(DiscoveryClient.java:1212)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Caused by: java.net.UnknownHostException: ip-10-139-53-106
at java.net.InetAddress.getAllByName0(InetAddress.java:1370)
at java.net.InetAddress.getAllByName(InetAddress.java:1274)
at java.net.InetAddress.getAllByName(InetAddress.java:1199)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:242)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:130)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:170)
If you are getting an error like the one above, check whether the server came up on a "domU-" internal hostname. If it did, kill the instance off and let EC2 allocate another one on an "ip-" hostname. This is a known issue that has not yet been resolved.
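The check described above can be automated against the log text. The following is a minimal sketch that assumes the heartbeat error always has the shape shown in the trace; `HEARTBEAT_RE`, `heartbeat_failure`, and `is_domu` are hypothetical names, not part of Eureka or Acme Air.

```python
import re

# Matches the application and instance hostname in a heartbeat error such as
# "DiscoveryClient_ACMEAIR-AUTH-SERVICE/domU-12-31-39-05-1A-E0 - was unable ..."
HEARTBEAT_RE = re.compile(
    r"DiscoveryClient_([^/\s]+)/(\S+) - was unable to send heartbeat"
)

def heartbeat_failure(line):
    """Return (app_name, hostname) from a heartbeat error line, or None."""
    match = HEARTBEAT_RE.search(line)
    return (match.group(1), match.group(2)) if match else None

def is_domu(hostname):
    """True if the instance came up on a "domU-" internal hostname."""
    return hostname.startswith("domU-")
```

Feeding each log line to `heartbeat_failure` and then passing the hostname to `is_domu` flags the instances that should be replaced. On the instance itself, `is_domu(socket.gethostname())` would be the equivalent check.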