Stack Creation Failures
Sean Smith edited this page Feb 21, 2019 · 5 revisions
If you see:
$ pcluster create mycluster
Creating stack named: parallelcluster-mycluster
Status: parallelcluster-mycluster - ROLLBACK_IN_PROGRESS
Cluster creation failed. Failed events:
- AWS::EC2::Instance MasterServer Received FAILURE signal with UniqueId i-07af1cb218dd6a081
then there was a problem creating the cluster. To diagnose it, re-run the create with the --norollback
flag, then ssh into the cluster:
$ pcluster create mycluster --norollback
...
$ pcluster ssh mycluster
From there, three log files will tell you what error occurred:
- /var/log/cfn-init.log: start here. You'll likely see an error like "Command chef failed"; look above that line for the specific error.
- /var/log/cloud-init.log: if you don't see anything telling in cfn-init.log, this is a good resource.
- /var/log/cloud-init-output.log: shows the stdout of the cfn-init command. Usually not needed.
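Since the specific error usually sits just above the "Command chef failed" line, a small helper can print that line together with the lines leading up to it. This is only a sketch: the function name show_chef_error and the default of 20 context lines are hypothetical choices for illustration, not part of pcluster.

```shell
# show_chef_error [LOGFILE] [CONTEXT]
# Hypothetical helper: prints each "Command chef failed" line along with
# the CONTEXT lines before it (default 20), prefixed with line numbers.
show_chef_error() {
    log="${1:-/var/log/cfn-init.log}"
    context="${2:-20}"
    grep -n -B "$context" 'Command chef failed' "$log"
}
```

Run it on the master node as `show_chef_error`, or pass a larger context such as `show_chef_error /var/log/cfn-init.log 50` if the traceback is long. Reading the log may require sudo, depending on its permissions.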
If you see this in the /var/log/cfn-init.log file:
STDERR: Traceback (most recent call last):
File "/usr/local/sbin/attachVolume.py", line 90, in <module>
main()
File "/usr/local/sbin/attachVolume.py", line 68, in main
response = ec2.attach_volume(VolumeId=volumeId, InstanceId=instanceId, Device=dev)
File "/usr/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python2.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidParameterValue) when calling the AttachVolume operation: Invalid value '/dev/sdb' for unixDevice. Attachment point /dev/sdb is already in use
then you have hit a known bug in pcluster-2.1.1 that affects NVMe-based instance types, such as c5, m5, and z1d. See #823 for more info.
Upgrade to a version later than 2.1.1 to fix it.
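To check whether an installed version is already past 2.1.1, a version comparison along these lines may help. It is a sketch under assumptions: version_gt is a hypothetical helper name, it relies on `sort -V` (available in GNU coreutils), and the commented usage assumes that `pcluster version` prints only the bare version number and that pcluster was installed with pip as the aws-parallelcluster package.

```shell
# version_gt A B
# Hypothetical helper: succeeds (exit 0) only if version A is strictly
# newer than version B. Relies on `sort -V` for version-aware ordering.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Possible usage (assumes `pcluster version` prints just the number):
#   version_gt "$(pcluster version)" 2.1.1 || pip install --upgrade aws-parallelcluster
```

After upgrading, delete the failed stack and re-run the create so the cluster is built with the fixed version.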