Troubleshooting AWS runners
Learn how to troubleshoot AWS runners.
If you encounter any issues while setting up or operating a runner, please follow these steps:
- Review the common problems.
- If the issue persists, reach out to support.
Contacting Support
To start a support chat, use the bubble icon located in the bottom right corner of the application. When contacting support, please include the following information:
- Any error messages and relevant screenshots.
Copy Runner ID and Version
- Navigate to Settings > Runners.
- Locate your runner card.
- Click ... in the top right corner and select Copy ID.
- The Runner Version is displayed as the last item in the menu.
Find CloudFormation Stack
- Navigate to Settings > Runners.
- Open the runner card to find the Stack Name, URL, and region.
Retrieve Runner Logs (ECS Task Logs)
Using ECS Console
To view the logs for the runner using the ECS console:
- Navigate to the AWS ECS console.
- Locate the cluster by the stack name.
- Select the service associated with the runner.
- Go to the Tasks tab and find the most recent failed or active task.
- Click the task ID to open the details.
- Check the Logs tab or find the CloudWatch log stream.
Note that each task has two log groups: one for the runner itself and one for Prometheus (monitoring). Use the runner's log group when troubleshooting.
Using AWS CLI
To look up the cluster name and task ID using the AWS CLI, follow these steps:
- List all clusters and find your cluster name by the stack name.
- List the tasks in that cluster and find your task ID.
- Once you have the cluster name and task ID, view the logs for the runner.
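The three lookups above can be sketched with standard AWS CLI subcommands. This is a minimal sketch, not the exact commands from the runner tooling: the function names are illustrative, the log group name must be supplied by you (it is shown on the task's Logs tab), and it assumes the log stream name contains the task ID.

```shell
# Sketch only: function names are illustrative, not part of the runner tooling.

# Print ECS cluster ARNs whose name contains the given stack name.
find_cluster() {
  aws ecs list-clusters --query 'clusterArns[]' --output text \
    | tr '\t' '\n' \
    | grep -- "$1"
}

# Print task IDs in a cluster (the last path segment of each task ARN).
# $2 is the desired status: RUNNING (default), or STOPPED for recently stopped tasks.
list_task_ids() {
  aws ecs list-tasks --cluster "$1" --desired-status "${2:-RUNNING}" \
    --query 'taskArns[]' --output text \
    | tr '\t' '\n' \
    | sed 's|.*/||'
}

# Print log stream names in a log group that mention the task ID.
# Assumption: the stream name contains the task ID.
find_log_streams() {
  aws logs describe-log-streams --log-group-name "$1" \
    --order-by LastEventTime --descending \
    --query 'logStreams[].logStreamName' --output text \
    | tr '\t' '\n' \
    | grep -- "$2"
}

# Print the most recent messages from one log stream, one per line.
show_logs() {
  aws logs get-log-events --log-group-name "$1" --log-stream-name "$2" \
    --query 'events[].[message]' --output text
}
```

A typical sequence would be find_cluster with your stack name, list_task_ids with the resulting cluster, find_log_streams with the runner's log group and the task ID, and finally show_logs with the stream name that was found.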
Common Problems
Network misconfigurations are the most frequent cause of installation issues. Refer to the infrastructure prerequisites to ensure all requirements are met. The common problems below are listed with their symptoms and diagnostics.
CloudFormation Stack Fails
- Symptoms:
  - Stack Event Status: ROLLBACK_COMPLETE or ROLLBACK_IN_PROGRESS due to missing VPC, availability zones, or subnets.
  - Stack Event Status Reasons:
    - Parameter validation failed: parameter value for EC2RunnerInstancesSubnet does not exist.
    - Parameter validation failed: parameter value for parameter name EC2RunnerInstancesSubnet does not exist.
    - Parameter validation failed: parameter value for parameter name EC2RunnerAzs does not exist.
- Diagnostics:
  - On the initial page of the CloudFormation stack creation, ensure you select a VPC, at least one availability zone, and a subnet.
  - Choose subnets across multiple availability zones for fault tolerance.
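To see the status reasons without opening the console, and to check which subnets and availability zones a VPC actually offers, the following sketch may help. The function names are illustrative; the stack name and VPC ID are values you supply.

```shell
# Sketch only: function names are illustrative.

# Print every stack event that carries a status reason
# (timestamp, logical resource, status, reason).
stack_event_reasons() {
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query 'StackEvents[?ResourceStatusReason].[Timestamp, LogicalResourceId, ResourceStatus, ResourceStatusReason]' \
    --output table
}

# Print the subnets of a VPC with their availability zone and CIDR block,
# so you can pick subnets that span more than one zone.
vpc_subnets() {
  aws ec2 describe-subnets --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[].[SubnetId, AvailabilityZone, CidrBlock]' \
    --output table
}
```

The output of stack_event_reasons also includes routine reasons such as resource creation being initiated, so look for the rows with a failed or rollback status.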
Runner Task Fails
- Symptoms:
  - Stack Event Status: CREATE_FAILED or ROLLBACK_IN_PROGRESS because the runner task fails to launch or is stuck in a pending state.
  - Stack Event Status Reason: ECS Deployment Circuit Breaker was triggered.
  - Runner task fails initialization with errors such as ResourceInitializationError: ...
  - Secrets Manager or other AWS services are inaccessible to the runner.
  - The runner cannot pull container images or resolve DNS queries.
- Diagnostics:
  - Verify that the VPC has an Internet Gateway or NAT Gateway configured.
  - Update the route tables to direct public subnets to the Internet Gateway and private subnets to the NAT Gateway.
  - For private subnets, add VPC endpoints for services like Secrets Manager, S3, and ECR.
  - Confirm that security groups allow outbound traffic to the required services.
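The network checks above can be inspected from the command line. The following is a sketch under the assumption that you know the VPC ID used by the stack; the function names are illustrative.

```shell
# Sketch only: function names are illustrative.

# Print the Internet Gateways attached to a VPC (empty output means none).
vpc_internet_gateways() {
  aws ec2 describe-internet-gateways \
    --filters "Name=attachment.vpc-id,Values=$1" \
    --query 'InternetGateways[].InternetGatewayId' --output text
}

# Print each route table with its associated subnets and its routes
# (destination, gateway target, NAT gateway target).
vpc_routes() {
  aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$1" \
    --query 'RouteTables[].{Table:RouteTableId,Subnets:Associations[].SubnetId,Routes:Routes[].[DestinationCidrBlock,GatewayId,NatGatewayId]}' \
    --output json
}

# Print the VPC endpoints in a VPC with their service name, type, and state.
vpc_endpoints() {
  aws ec2 describe-vpc-endpoints --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].[ServiceName, VpcEndpointType, State]' \
    --output table
}
```

In the vpc_routes output, a public subnet would normally show a default route whose gateway ID starts with igw-, and a private subnet a default route with a NAT gateway ID.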
Instance Type Not Available
If you encounter an error stating that the requested instance type is unavailable in a specific availability zone (e.g., “The selected instance type m6i.xlarge is not available in the automatically assigned zone us-east-1e”), this is often due to regional or zone-specific availability constraints within AWS.
Some zones, like us-east-1d and us-east-1e, have been reported to experience resource shortages more frequently. If possible, avoid using these zones exclusively and instead install your runners across multiple zones or regions.
Here’s how you can address this:
- Install a Runner in a Different Region:
  - Some instance types may be unavailable in certain regions or zones due to resource constraints. Refer to AWS instance type availability for details. If necessary, install runners in a different AWS region that supports your preferred instance type.
- Select Multiple Availability Zones:
  - When installing a runner using the AWS CloudFormation Stack, ensure that you select multiple subnets. For example, instead of restricting your environment to only the subnet corresponding to us-east-1e, include subnets corresponding to the us-east-1a and us-east-1b zones to improve availability.
  - You can also update the existing stack parameters.
- Use an Alternate Instance Type:
  - If the desired instance type (e.g., m6i.xlarge) is unavailable, consider using a different instance type, such as c5.xlarge, which may have better availability.
  - To update, create a new environment class using the alternate instance type and disable the existing class.
- Retry Later:
  - Instance availability can be transient. If none of the above options resolve the issue, wait and try again later, as AWS resources might become available after a brief period.
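Before choosing subnets or an alternate instance type, you can check which availability zones offer a given type. This is a small sketch with an illustrative function name. Note that it reports where the type is offered at all, not whether capacity is currently available.

```shell
# Sketch only: the function name is illustrative.

# Print the availability zones in a region that offer an instance type,
# one zone per line, sorted.
# $1 = instance type (e.g., m6i.xlarge), $2 = region (e.g., us-east-1)
instance_type_zones() {
  aws ec2 describe-instance-type-offerings \
    --region "$2" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$1" \
    --query 'InstanceTypeOfferings[].Location' --output text \
    | tr '\t' '\n' \
    | sort
}
```

For example, instance_type_zones m6i.xlarge us-east-1 lists the zones to prefer when selecting subnets; a zone missing from the output does not offer that type.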
Unexpected Costs
- Symptoms:
  - You notice unexpected charges in your AWS bill that you believe are related to the runner infrastructure.
  - You continue receiving bills for resources even after deleting a runner.
- Diagnostics:
  - Use the Controls for Managing Costs guide to investigate the specific AWS resources contributing to the charges.
  - After deleting a runner, verify that the associated CloudFormation stack has been fully deleted. Additionally, check for any residual resources such as EC2 instances or EBS volumes associated with environment IDs, and manually delete them if necessary to avoid ongoing costs.
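The post-deletion checks can be sketched as follows. The function names are illustrative, and the tag lookup rests on an assumption that is not confirmed by this guide: that the environment ID appears somewhere in a tag value on the leftover resources. Review the results before deleting anything.

```shell
# Sketch only: function names are illustrative.

# Print stacks whose deletion failed or is still in progress.
stacks_not_fully_deleted() {
  aws cloudformation list-stacks \
    --stack-status-filter DELETE_FAILED DELETE_IN_PROGRESS \
    --query 'StackSummaries[].[StackName, StackStatus]' --output table
}

# Print EC2 resources (instances, volumes, ...) that have a tag value
# containing the given environment ID.
# Assumption: the environment ID is present in a tag value.
resources_tagged_with() {
  aws ec2 describe-tags --filters "Name=value,Values=*$1*" \
    --query 'Tags[].[ResourceType, ResourceId, Key, Value]' --output table
}

# Print EBS volumes that are not attached to any instance.
unattached_volumes() {
  aws ec2 describe-volumes --filters "Name=status,Values=available" \
    --query 'Volumes[].[VolumeId, Size, CreateTime]' --output table
}
```

These commands query the CLI's default region; run them with the region of the deleted runner configured, for example through the AWS_REGION environment variable.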