Automated CI/CD Debugging with GitLab & Amazon Bedrock


An Interactive Flow for GitLab & Amazon Bedrock Integration

Introduction: The Challenge of CI/CD Failures

In a complex multi-account AWS environment, diagnosing CI/CD pipeline failures can be time-consuming. This workflow uses a custom GitLab Component to invoke an Amazon Bedrock Agent when a pipeline fails. The agent analyzes the failure, determines the likely root cause, and proposes a resolution. That resolution is either a code fix submitted through a Merge Request or a configuration change, which reduces the manual effort needed to debug the pipeline.

1

GitLab CI Job Fails

A job in the pipeline (e.g., `terraform apply`) fails during execution.

What's Happening:

The pipeline executes as normal, but a command returns a non-zero exit code, triggering a failure state in GitLab. This is the entry point for our automated debugging process.

Example Error Log:

Error: creating EC2 Instance: InvalidIAMInstanceProfile.Name: The IAM instance profile my-instance-pofile does not exist
  on main.tf line 25, in resource "aws_instance" "web":
  25: resource "aws_instance" "web" {
ERROR: Job failed: exit code 1
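Terraform diagnostics like the one above follow a regular shape: an `Error:` summary, then an `on <file> line <n>` location. Because of this, a script can extract the key facts before any AI analysis. The following is a minimal sketch under that assumption. The regular expressions and the `extract_terraform_errors` helper are hypothetical and are not part of the component described here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Terraform diagnostics start with "Error: <summary>", optionally prefixed by the
# "│" border that recent Terraform versions draw around diagnostics.
ERROR_RE = re.compile(r"^\s*(?:│\s*)?Error:\s*(?P<summary>.+)$")
# The source location follows as "on <file> line <n>, in ...".
LOCATION_RE = re.compile(r"^\s*(?:│\s*)?on (?P<file>\S+) line (?P<line>\d+)")


@dataclass
class TerraformError:
    summary: str
    file: Optional[str] = None
    line: Optional[int] = None


def extract_terraform_errors(log: str) -> List[TerraformError]:
    """Collect each 'Error:' summary and attach the first source location after it."""
    errors: List[TerraformError] = []
    for raw in log.splitlines():
        error_match = ERROR_RE.match(raw)
        if error_match:
            errors.append(TerraformError(summary=error_match.group("summary").strip()))
            continue
        location_match = LOCATION_RE.match(raw)
        if location_match and errors and errors[-1].file is None:
            errors[-1].file = location_match.group("file")
            errors[-1].line = int(location_match.group("line"))
    return errors


if __name__ == "__main__":
    sample = (
        "Error: creating EC2 Instance: InvalidIAMInstanceProfile.Name: "
        "The IAM instance profile my-instance-pofile does not exist\n"
        '  on main.tf line 25, in resource "aws_instance" "web":\n'
        '  25: resource "aws_instance" "web" {\n'
        "ERROR: Job failed: exit code 1\n"
    )
    for err in extract_terraform_errors(sample):
        print(f"{err.file}:{err.line}: {err.summary}")
```

The case-sensitive `Error:` match deliberately skips GitLab's own `ERROR: Job failed` line.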
2

Post-Stage Debug Component Runs

A component in the `.post` stage triggers on failure.

What's Happening:

This job runs only if at least one job in an earlier stage has failed. It uses a custom GitLab Component template, which contains the logic to gather context and call the Bedrock Agent. The job obtains AWS credentials from the IAM role attached (through an instance profile) to the EC2 instance that hosts the GitLab Runner. That role must therefore have the necessary permissions, including `bedrock:InvokeAgent`.

Example `.gitlab-ci.yml` Usage:

include:
  - component: $CI_SERVER_FQDN/my-org/gitlab-components/bedrock-debugger@1.0.0
    inputs:
      bedrock_agent_id: 'ABC123XYZ'
      bedrock_agent_alias_id: 'TSTALIASID'
      aws_region: 'us-east-1'

debug:on-failure:
  stage: .post
  script:
    - /usr/bin/python3 /path/to/script/in/component.py
  when: on_failure
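The runner's instance role needs permission to call the agent. A least-privilege identity policy might look like the one generated below. This is a sketch: the helper name and the account ID are placeholders. The agent-alias ARN format (`arn:aws:bedrock:<region>:<account>:agent-alias/<agent-id>/<alias-id>`) should be checked against the IAM service authorization reference for Amazon Bedrock before use.

```python
import json
from typing import Any, Dict


def bedrock_invoke_policy(region: str, account_id: str,
                          agent_id: str, alias_id: str) -> Dict[str, Any]:
    """Identity policy allowing InvokeAgent on a single agent alias only."""
    # ARN format assumed from the Bedrock service authorization reference.
    alias_arn = f"arn:aws:bedrock:{region}:{account_id}:agent-alias/{agent_id}/{alias_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeDebuggerAgent",
                "Effect": "Allow",
                "Action": "bedrock:InvokeAgent",
                "Resource": alias_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Agent and alias IDs match the component inputs; the account ID is a placeholder.
    policy = bedrock_invoke_policy("us-east-1", "111122223333", "ABC123XYZ", "TSTALIASID")
    print(json.dumps(policy, indent=2))
```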
3

Invoke Amazon Bedrock Agent

The component script sends job logs and metadata to the agent.

What's Happening:

A Python script (part of the component) uses the GitLab API to fetch the logs of the failed job. It combines them with predefined CI/CD variables (such as `CI_PROJECT_URL` and `CI_COMMIT_SHA`). It then sends everything to the specified Bedrock Agent through the `InvokeAgent` API, using the AWS SDK for Python (Boto3).
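The log-fetching half of that script could look roughly like the sketch below. It assumes a GitLab API token exposed to the job as a masked CI/CD variable named `GITLAB_API_TOKEN` (a hypothetical name). It uses the Jobs API endpoints for listing a pipeline's jobs and reading a job's log (`/jobs/:job_id/trace`). The truncation length is an arbitrary choice.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Dict


def _get(url: str, token: str) -> bytes:
    """GET a GitLab API URL using a token sent in the PRIVATE-TOKEN header."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def failed_jobs_url(api_url: str, project_id: str, pipeline_id: str) -> str:
    """URL of the 'list pipeline jobs' endpoint, filtered to failed jobs."""
    query = urllib.parse.urlencode({"scope[]": "failed", "per_page": "100"})
    return f"{api_url}/projects/{project_id}/pipelines/{pipeline_id}/jobs?{query}"


def fetch_failed_job_logs(api_url: str, project_id: str, pipeline_id: str,
                          token: str, max_chars: int = 20_000) -> Dict[str, str]:
    """Return {job name: tail of its log} for every failed job in the pipeline."""
    jobs = json.loads(_get(failed_jobs_url(api_url, project_id, pipeline_id), token))
    logs: Dict[str, str] = {}
    for job in jobs:
        trace = _get(f"{api_url}/projects/{project_id}/jobs/{job['id']}/trace", token)
        # Keep only the end of the log: the error is usually there, and the
        # agent's input size is limited.
        logs[job["name"]] = trace.decode("utf-8", errors="replace")[-max_chars:]
    return logs


if __name__ == "__main__":
    # CI_API_V4_URL, CI_PROJECT_ID and CI_PIPELINE_ID are predefined GitLab variables;
    # GITLAB_API_TOKEN is a hypothetical masked CI/CD variable holding an API token.
    failed = fetch_failed_job_logs(
        os.environ["CI_API_V4_URL"],
        os.environ["CI_PROJECT_ID"],
        os.environ["CI_PIPELINE_ID"],
        os.environ["GITLAB_API_TOKEN"],
    )
    print(json.dumps({name: len(log) for name, log in failed.items()}))
```

The `.post` job itself is still running when this executes, so it does not appear in the failed-jobs list.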

Example Payload to Bedrock (request body; the agent ID, alias ID, and session ID are passed as URI parameters, and the CI/CD variables are shown unexpanded):

{
  "sessionState": {
    "sessionAttributes": {},
    "promptSessionAttributes": {}
  },
  "inputText": "A GitLab CI job failed. Here are the details. Please perform a root cause analysis.\n\nProject URL: ${CI_PROJECT_URL}\nCommit: ${CI_COMMIT_SHA}\n\nLogs:\nError: creating EC2 Instance: InvalidIAMInstanceProfile.Name..."
}
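The Bedrock half of the script can be sketched with Boto3's `bedrock-agent-runtime` client. `invoke_agent` takes the agent ID, alias ID, and a session ID alongside `inputText` and `sessionState`. It returns the answer as an event stream of chunks. The helper names and the environment variables in the `__main__` block are assumptions for illustration.

```python
import os
import uuid
from typing import Dict


def build_input_text(project_url: str, commit_sha: str, logs: Dict[str, str]) -> str:
    """Assemble the inputText prompt in the same shape as the example payload."""
    lines = [
        "A GitLab CI job failed. Here are the details. Please perform a root cause analysis.",
        "",
        f"Project URL: {project_url}",
        f"Commit: {commit_sha}",
    ]
    for job_name, log in logs.items():
        lines += ["", f"Logs ({job_name}):", log]
    return "\n".join(lines)


def invoke_debug_agent(input_text: str, agent_id: str, alias_id: str, region: str) -> str:
    """Call InvokeAgent and return the agent's full text answer."""
    import boto3  # imported here so build_input_text works without the AWS SDK installed

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),  # a fresh session per failed pipeline
        inputText=input_text,
        sessionState={"sessionAttributes": {}, "promptSessionAttributes": {}},
    )
    # The answer arrives as an event stream; concatenate the text chunks.
    parts = []
    for event in response["completion"]:
        if "chunk" in event:
            parts.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(parts)


if __name__ == "__main__":
    # BEDROCK_AGENT_ID, BEDROCK_AGENT_ALIAS_ID and AWS_REGION are hypothetical variables
    # the component would set from its inputs; the log text is a placeholder.
    prompt = build_input_text(
        os.environ["CI_PROJECT_URL"],
        os.environ["CI_COMMIT_SHA"],
        {"terraform-apply": "Error: creating EC2 Instance: ..."},
    )
    print(invoke_debug_agent(prompt, os.environ["BEDROCK_AGENT_ID"],
                             os.environ["BEDROCK_AGENT_ALIAS_ID"], os.environ["AWS_REGION"]))
```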
4

Agent Performs Analysis

The agent analyzes the logs, queries AWS, and inspects the source code.

What's Happening:

The Bedrock Agent invokes its configured action groups. It can:
1. Parse Logs: Identify the specific error messages and the files and lines they reference.
2. Query AWS: Use the action groups' Lambda functions, running under their execution role, to make AWS SDK calls that check the state of resources (e.g., whether an IAM instance profile exists).
3. Examine Code: Use a GitLab Project Access Token available to the action group (for example, stored in AWS Secrets Manager) to clone the repository at the failing commit and analyze the Terraform or script files referenced in the error log.
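For the "Query AWS" capability, an action group backed by a Lambda function could expose a check like the one below. This is a sketch assuming a function-details action group with a hypothetical function that takes a `profile_name` parameter. The event fields (`actionGroup`, `function`, `parameters`) and the `functionResponse` shape follow the Bedrock Agents Lambda contract, but verify them against the current documentation. The IAM client is injectable so the logic can be tested without AWS.

```python
from typing import Any, Dict


def _param(event: Dict[str, Any], name: str) -> str:
    """Read a named parameter from a function-details action group event."""
    for param in event.get("parameters", []):
        if param["name"] == name:
            return param["value"]
    raise KeyError(f"missing parameter {name!r}")


def check_instance_profile(iam: Any, profile_name: str) -> str:
    """Describe whether an IAM instance profile exists and which roles it holds."""
    try:
        result = iam.get_instance_profile(InstanceProfileName=profile_name)
    except iam.exceptions.NoSuchEntityException:
        return f"Instance profile '{profile_name}' does not exist in this account."
    roles = [role["RoleName"] for role in result["InstanceProfile"]["Roles"]]
    return f"Instance profile '{profile_name}' exists (roles: {', '.join(roles) or 'none'})."


def lambda_handler(event: Dict[str, Any], context: Any, iam: Any = None) -> Dict[str, Any]:
    """Entry point for a hypothetical 'check_instance_profile' agent function."""
    if iam is None:
        import boto3  # available in the Lambda Python runtime
        iam = boto3.client("iam")
    body = check_instance_profile(iam, _param(event, "profile_name"))
    # Response shape for function-details action groups.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```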

5

Root Cause Identified & Resolution Proposed

The agent determines the failure type and generates a solution.
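The agent makes this decision through model reasoning, not a fixed rule. A deterministic heuristic still helps illustrate the distinction between the two outcomes. In the hypothetical rule below, a referenced name that is missing from AWS but nearly identical to an existing one suggests a typo in code, which calls for a Merge Request. A name with no close match suggests the resource itself is missing, which calls for a configuration change. The 0.85 similarity cutoff is arbitrary.

```python
import difflib
from typing import List, Optional, Tuple


def classify_missing_reference(referenced: str, existing: List[str],
                               cutoff: float = 0.85) -> Tuple[str, Optional[str]]:
    """Hypothetical triage rule for a name the failed job could not resolve.

    - the name exists after all      -> ("ok", name): the failure lies elsewhere
    - a near-identical name exists   -> ("code_fix", closest): likely a typo in code
    - nothing similar exists         -> ("config_change", None): the resource is missing
    """
    if referenced in existing:
        return ("ok", referenced)
    close = difflib.get_close_matches(referenced, existing, n=1, cutoff=cutoff)
    if close:
        return ("code_fix", close[0])
    return ("config_change", None)


if __name__ == "__main__":
    profiles = ["my-instance-profile", "batch-worker-profile"]
    print(classify_missing_reference("my-instance-pofile", profiles))
```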

Outcome: Merge Request Created

The agent identifies a typo in the Terraform source code. It generates a fix and uses the GitLab API to create a new branch and submit a Merge Request for review.

Example MR Description:
AI-Generated Fix for Pipeline Failure

Root Cause:
The job failed due to `InvalidIAMInstanceProfile.Name`. Analysis of `main.tf` revealed a typo in the `iam_instance_profile` name.

Proposed Change:
Corrected `my-instance-pofile` to `my-instance-profile` in `resource "aws_instance" "web"`.
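Creating the branch and Merge Request needs only two GitLab REST API calls. The Commits API can create a branch from a given SHA and commit an updated file in one request, and the Merge Requests API opens the MR. The sketch below builds both payloads. The function names, branch name, and placeholder values are hypothetical. The naive string replacement stands in for however the agent actually generates its fix.

```python
import json
import urllib.request
from typing import Any, Dict


def build_fix_commit(file_path: str, original: str, wrong: str, right: str,
                     branch: str, base_sha: str) -> Dict[str, Any]:
    """Payload for POST /projects/:id/repository/commits that creates `branch`
    from `base_sha` and commits the corrected file in a single request."""
    if wrong not in original:
        raise ValueError(f"{wrong!r} not found in {file_path}")
    return {
        "branch": branch,
        "start_sha": base_sha,
        "commit_message": f"Fix {wrong} -> {right} in {file_path}",
        "actions": [
            {"action": "update", "file_path": file_path,
             "content": original.replace(wrong, right)},
        ],
    }


def build_merge_request(branch: str, target_branch: str, description: str) -> Dict[str, Any]:
    """Payload for POST /projects/:id/merge_requests."""
    return {
        "source_branch": branch,
        "target_branch": target_branch,
        "title": "AI-Generated Fix for Pipeline Failure",
        "description": description,
        "remove_source_branch": True,
    }


def post_json(url: str, token: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload to the GitLab API and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    tf = 'resource "aws_instance" "web" {\n  iam_instance_profile = "my-instance-pofile"\n}\n'
    commit = build_fix_commit("main.tf", tf, "my-instance-pofile", "my-instance-profile",
                              branch="ai-fix/pipeline-123", base_sha="<failing commit SHA>")
    print(commit["actions"][0]["content"])
    # Against a real project (API_URL, PROJECT_ID and TOKEN are placeholders):
    # post_json(f"{API_URL}/projects/{PROJECT_ID}/repository/commits", TOKEN, commit)
    # post_json(f"{API_URL}/projects/{PROJECT_ID}/merge_requests", TOKEN,
    #           build_merge_request("ai-fix/pipeline-123", "main", "Root Cause: ..."))
```

Committing through `start_sha` ties the fix to the exact commit that failed, rather than to whatever the target branch points at when the agent finishes.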