Stack Creation Failures

Sean Smith edited this page Feb 21, 2019 · 5 revisions

If you see:

$ pcluster create mycluster
Creating stack named: parallelcluster-mycluster
Status: parallelcluster-mycluster - ROLLBACK_IN_PROGRESS                        
Cluster creation failed.  Failed events:
  - AWS::EC2::Instance MasterServer Received FAILURE signal with UniqueId i-07af1cb218dd6a081

There was a problem creating the cluster. To diagnose it, re-run the create command with the --norollback flag, which keeps the failed stack's resources in place instead of deleting them, then ssh into the cluster:

$ pcluster create mycluster --norollback
...
$ pcluster ssh mycluster

From there, three log files will tell you what error occurred:

  • /var/log/cfn-init.log: start here. You'll likely see an error like Command chef failed; look above that line for the specific error.
  • /var/log/cloud-init.log: if nothing in cfn-init.log is telling, this is a good resource.
  • /var/log/cloud-init-output.log: contains the stdout of the cfn-init command. Usually not needed.
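
To make "look above that line" concrete, here is a minimal sketch of a helper that prints the lines preceding each occurrence of a failure marker. The function name show_failure_context and its defaults (20 lines of context, the marker Command chef failed) are assumptions chosen for illustration; they are not part of pcluster.

```shell
# Hypothetical helper (not part of pcluster): print the lines leading up to
# each occurrence of a failure marker in a log file.
# Arguments: log path, marker string, number of preceding lines to show.
show_failure_context() {
  log="${1:-/var/log/cfn-init.log}"
  marker="${2:-Command chef failed}"
  before="${3:-20}"
  # -n prefixes line numbers, -F treats the marker as a literal string,
  # -B prints that many lines of context before each match.
  grep -n -F -B "$before" -- "$marker" "$log"
}

# Usage on the master node (reading the log may require sudo, depending on the AMI):
#   show_failure_context /var/log/cfn-init.log
#   show_failure_context /var/log/cloud-init.log "Traceback" 5
```

If the marker does not appear in the log, grep prints nothing and returns a non-zero status, which is a hint to move on to the next log file.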

Attach Volume Failure

Error

If you see this in the /var/log/cfn-init.log file:

STDERR: Traceback (most recent call last):
  File "/usr/local/sbin/attachVolume.py", line 90, in <module>
    main()
  File "/usr/local/sbin/attachVolume.py", line 68, in main
    response = ec2.attach_volume(VolumeId=volumeId, InstanceId=instanceId, Device=dev)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidParameterValue) when calling the AttachVolume operation: Invalid value '/dev/sdb' for unixDevice. Attachment point /dev/sdb is already in use

Fix

This is a known bug in pcluster-2.1.1 that affects NVMe-based instance types, such as c5, m5, and z1d. See #823 for more info.

Upgrade to a version later than 2.1.1 to fix it.
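
As a rough sketch of how to check that the installed version is past the affected one: the helper below, version_gt, is a hypothetical name introduced here, and it relies on sort -V (available in GNU coreutils). The usage comments assume pcluster was installed with pip and that pcluster version prints a bare version string; adjust if your setup differs.

```shell
# Hypothetical helper: succeeds if version $1 is strictly newer than version $2.
# Relies on sort -V (version sort), which GNU coreutils provides.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage on the machine where pcluster is installed (assumes a pip install):
#   pip install --upgrade aws-parallelcluster
#   version_gt "$(pcluster version)" "2.1.1" && echo "installed version is newer than 2.1.1"
```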