Closed
Description
We are using ephemeral arm64 builders, and builds intermittently stall due to a lack of capacity, even with on-demand instances.
Every time this happens, the build permanently stalls and has to be manually cancelled.
After some digging, it looks like adding InsufficientInstanceCapacity to the list of error codes treated as a "Scaling Error" should fix this.
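To illustrate the proposed change, here is a minimal sketch of how a fleet failure might be classified. It is not the project's actual code: the names `SCALING_ERROR_CODES`, `isScalingError`, and `classifyFleetFailure` are hypothetical, the list entries other than InsufficientInstanceCapacity are only illustrative, and the idea that a recognized scaling error leads to a retry while anything else is ignored is an assumption based on the "Ignoring error" message in the log.

```typescript
// Shape of one entry in the `Errors` array of the fleet response (see the log).
interface FleetError {
  ErrorCode?: string;
  ErrorMessage?: string;
}

// Hypothetical list of error codes treated as scaling errors.
// Only 'InsufficientInstanceCapacity' comes from this issue; the other
// entries are illustrative placeholders.
const SCALING_ERROR_CODES: readonly string[] = [
  'RequestLimitExceeded',
  'MaxSpotInstanceCountExceeded',
  'InsufficientInstanceCapacity', // proposed addition
];

// True if at least one fleet error carries a code from the list above.
function isScalingError(errors: FleetError[]): boolean {
  return errors.some(
    (e) => e.ErrorCode !== undefined && SCALING_ERROR_CODES.includes(e.ErrorCode),
  );
}

type Outcome = 'retry' | 'ignore';

// Assumed behavior: scaling errors are retried, everything else is ignored.
function classifyFleetFailure(errors: FleetError[]): Outcome {
  return isScalingError(errors) ? 'retry' : 'ignore';
}
```

With a list like this, the error in the log would be classified as 'retry' instead of being ignored, so the job would get another chance to obtain a runner rather than stalling.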
Redacted CloudWatch log:
2023-02-06T10:27:56.604+11:00 2023-02-05 23:27:56.494 WARN [runners:34ecb39a-ae35-5506-b158-efc0938129a8 index.js:120365 createRunner] No instances created by fleet request. Check configuration! Response:
2023-02-06T10:27:56.604+11:00 {
2023-02-06T10:27:56.604+11:00 FleetId: 'fleet-92368284-5b0d-44bc-0e18-af80f19be5e5',
2023-02-06T10:27:56.604+11:00 Errors: [
2023-02-06T10:27:56.604+11:00 {
2023-02-06T10:27:56.604+11:00 LaunchTemplateAndOverrides: {
2023-02-06T10:27:56.604+11:00 LaunchTemplateSpecification: {
2023-02-06T10:27:56.604+11:00 LaunchTemplateId: 'lt-REDACTED',
2023-02-06T10:27:56.604+11:00 Version: '4'
2023-02-06T10:27:56.604+11:00 },
2023-02-06T10:27:56.604+11:00 Overrides: {
2023-02-06T10:27:56.604+11:00 InstanceType: 'c6gd.8xlarge',
2023-02-06T10:27:56.604+11:00 SubnetId: 'subnet-REDACTED'
2023-02-06T10:27:56.604+11:00 }
2023-02-06T10:27:56.604+11:00 },
2023-02-06T10:27:56.604+11:00 Lifecycle: 'on-demand',
2023-02-06T10:27:56.604+11:00 ErrorCode: 'InsufficientInstanceCapacity',
2023-02-06T10:27:56.604+11:00 ErrorMessage: 'We currently do not have sufficient c6gd.8xlarge capacity in the Availability Zone you requested (REDACTED). Our system will be working on provisioning additional capacity. You can currently get c6gd.8xlarge capacity by not specifying an Availability Zone in your request or choosing us-west-2b, us-west-2c, us-west-2d.'
2023-02-06T10:27:56.604+11:00 }
2023-02-06T10:27:56.604+11:00 ],
2023-02-06T10:27:56.604+11:00 Instances: []
2023-02-06T10:27:56.604+11:00 }
2023-02-06T10:27:56.622+11:00 2023-02-05 23:27:56.621 WARN [runners:34ecb39a-ae35-5506-b158-efc0938129a8 index.js:120384 createRunner] Create fleet failed, error not recognized as scaling error.
2023-02-06T10:27:56.622+11:00 [
2023-02-06T10:27:56.622+11:00 {
2023-02-06T10:27:56.622+11:00 LaunchTemplateAndOverrides: {
2023-02-06T10:27:56.622+11:00 LaunchTemplateSpecification: {
2023-02-06T10:27:56.622+11:00 LaunchTemplateId: 'lt-REDACTED',
2023-02-06T10:27:56.622+11:00 Version: '4'
2023-02-06T10:27:56.622+11:00 },
2023-02-06T10:27:56.622+11:00 Overrides: {
2023-02-06T10:27:56.622+11:00 InstanceType: 'c6gd.8xlarge',
2023-02-06T10:27:56.622+11:00 SubnetId: 'subnet-REDACTED'
2023-02-06T10:27:56.622+11:00 }
2023-02-06T10:27:56.622+11:00 },
2023-02-06T10:27:56.622+11:00 Lifecycle: 'on-demand',
2023-02-06T10:27:56.622+11:00 ErrorCode: 'InsufficientInstanceCapacity',
2023-02-06T10:27:56.622+11:00 ErrorMessage: 'We currently do not have sufficient c6gd.8xlarge capacity in the Availability Zone you requested (REDACTED). Our system will be working on provisioning additional capacity. You can currently get c6gd.8xlarge capacity by not specifying an Availability Zone in your request or choosing REDACTED.'
2023-02-06T10:27:56.622+11:00 }
2023-02-06T10:27:56.622+11:00 ]
2023-02-06T10:27:56.622+11:00 2023-02-05 23:27:56.622 WARN [scale-runners:34ecb39a-ae35-5506-b158-efc0938129a8 index.js:120511 Runtime.handler] Ignoring error: Create fleet failed, no instance created. {"runnerType":"Org","runnerOwner":"gravitational","event":"workflow_job","id":"11123528452"}
2023-02-06T10:27:56.623+11:00 END RequestId: 34ecb39a-ae35-5506-b158-efc0938129a8
2023-02-06T10:27:56.623+11:00 REPORT RequestId: 34ecb39a-ae35-5506-b158-efc0938129a8 Duration: 2269.85 ms Billed Duration: 2270 ms Memory Size: 512 MB Max Memory Used: 219 MB