Introduction: The Challenge of CI/CD Failures

In a complex multi-account AWS environment, CI/CD pipeline failures can be time-consuming to diagnose. This workflow demonstrates an automated approach: a custom GitLab Component invokes an Amazon Bedrock Agent, which analyzes the failure, determines the root cause, and proposes a resolution, either a code fix submitted as a Merge Request or a recommended configuration change. The goal is to reduce the manual debugging effort.

GitLab CI Job Fails

A job in the pipeline (e.g., `terraform apply`) fails during execution.

What's Happening:

The pipeline executes as normal, but a command returns a non-zero exit code, triggering a failure state in GitLab. This is the entry point for our automated debugging process.

Example Error Log:

    Error: creating EC2 Instance: InvalidIAMInstanceProfile.Name: The IAM instance profile my-instance-pofile does not exist

      on main.tf line 25, in resource "aws_instance" "web":
      25: resource "aws_instance" "web" {

    ERROR: Job failed: exit code 1
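To make the shape of such a log concrete, here is a minimal Python sketch that pulls the error summary and the source location out of Terraform-style output. The two regular expressions and the helper name `extract_failure` are assumptions chosen for illustration. Real Terraform output varies between versions, so treat the patterns as a starting point rather than a complete parser.

```python
import re
from typing import Optional

# Sample modelled on the error log in this section.
SAMPLE_LOG = """\
Error: creating EC2 Instance: InvalidIAMInstanceProfile.Name: The IAM instance profile my-instance-pofile does not exist

  on main.tf line 25, in resource "aws_instance" "web":
  25: resource "aws_instance" "web" {

ERROR: Job failed: exit code 1
"""

# Assumed layout: an "Error: <summary>" line, later followed by "on <file> line <n>".
ERROR_RE = re.compile(r"^Error: (?P<summary>.+)$", re.MULTILINE)
LOCATION_RE = re.compile(r"^\s*on (?P<file>\S+) line (?P<line>\d+)", re.MULTILINE)


def extract_failure(log_text: str) -> Optional[dict]:
    """Return the first Terraform error and its source location, if present."""
    error = ERROR_RE.search(log_text)
    if error is None:
        return None
    result = {"summary": error.group("summary").strip(), "file": None, "line": None}
    # Only look for a location after the error line itself.
    location = LOCATION_RE.search(log_text, error.end())
    if location:
        result["file"] = location.group("file")
        result["line"] = int(location.group("line"))
    return result
```

Calling `extract_failure(SAMPLE_LOG)` yields the summary text together with `main.tf` and line `25`, which is the kind of structured context the later steps pass to the agent.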

Post-Stage Debug Component Runs

A component in the `.post` stage triggers on failure.

What's Happening:

This job runs only when at least one job in an earlier stage has failed. It uses a custom GitLab Component template, which contains the logic to gather context and call the Bedrock Agent. The GitLab Runner executing this job runs on an EC2 instance whose IAM instance profile role grants the necessary permissions.

Example `.gitlab-ci.yml` Usage:

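    include:
      # Component references take the form <instance FQDN>/<project path>/<component name>@<version>
      - component: '$CI_SERVER_FQDN/my-org/gitlab-components/bedrock-debugger@1.0.0'
        inputs:
          bedrock_agent_id: 'ABC123XYZ'
          bedrock_agent_alias_id: 'TSTALIASID'
          aws_region: 'us-east-1'

    # Runs in the final stage, and only if an earlier job failed
    debug:on-failure:
      stage: .post
      script:
        - /usr/bin/python3 /path/to/script/in/component.py
      when: on_failure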
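The "gather context" part of the component could look roughly like the following Python sketch, which lists the failed jobs of the current pipeline and downloads each job's log through the GitLab REST API. It uses only the standard library. The function names are illustrative, and the token is assumed to be an access token with at least `read_api` scope supplied through a CI/CD variable of your choosing.

```python
import json
import urllib.parse
import urllib.request


def failed_jobs_url(api_url: str, project_id: str, pipeline_id: str) -> str:
    """Build the GitLab API URL that lists only the failed jobs of a pipeline."""
    query = urllib.parse.urlencode({"scope[]": "failed"})
    return f"{api_url}/projects/{project_id}/pipelines/{pipeline_id}/jobs?{query}"


def job_trace_url(api_url: str, project_id: str, job_id: int) -> str:
    """Build the GitLab API URL that returns the raw log (trace) of one job."""
    return f"{api_url}/projects/{project_id}/jobs/{job_id}/trace"


def http_get(url: str, token: str) -> bytes:
    """GET a GitLab API URL, authenticating with an access token."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def collect_failed_logs(api_url, project_id, pipeline_id, token, fetch=http_get):
    """Return {job_name: log_text} for every failed job in the pipeline.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    jobs = json.loads(fetch(failed_jobs_url(api_url, project_id, pipeline_id), token))
    logs = {}
    for job in jobs:
        raw = fetch(job_trace_url(api_url, project_id, job["id"]), token)
        logs[job["name"]] = raw.decode("utf-8", errors="replace")
    return logs
```

Inside a CI job, the first three arguments would typically come from the predefined variables `CI_API_V4_URL`, `CI_PROJECT_ID`, and `CI_PIPELINE_ID`.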

Invoke Amazon Bedrock Agent

The component script sends job logs and metadata to the agent.

What's Happening:

A Python script (part of the component) uses the GitLab API to fetch the logs of the failed job. It combines them with predefined CI/CD variables (such as the project URL and commit SHA) and sends the result as the input text of an `InvokeAgent` request to the specified agent alias, using the AWS SDK for Python (Boto3).

Example Payload to Bedrock:

    {
      "inputText": "A GitLab CI job failed. Here are the details. Please perform a root cause analysis.\n\nProject URL: ${CI_PROJECT_URL}\nCommit: ${CI_COMMIT_SHA}\n\nLogs:\nError: creating EC2 Instance: InvalidIAMInstanceProfile.Name...",
      "sessionState": {
        "sessionAttributes": {},
        "promptSessionAttributes": {}
      }
    }
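A minimal sketch of how the component script might assemble this input and call the agent is shown below. It assumes a Boto3 `bedrock-agent-runtime` client is passed in. The helper names and the `max_log_chars` cut-off are illustrative choices, not part of the original component.

```python
import uuid


def build_input_text(project_url: str, commit_sha: str, log_text: str,
                     max_log_chars: int = 20000) -> str:
    """Assemble the prompt, keeping only the tail of the log where errors usually appear."""
    tail = log_text[-max_log_chars:]
    return (
        "A GitLab CI job failed. Here are the details. "
        "Please perform a root cause analysis.\n\n"
        f"Project URL: {project_url}\nCommit: {commit_sha}\n\nLogs:\n{tail}"
    )


def invoke_agent(client, agent_id: str, alias_id: str, input_text: str) -> str:
    """Call InvokeAgent and join the streamed response chunks into one string."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),  # assumption: one fresh session per failed pipeline
        inputText=input_text,
    )
    parts = []
    for event in response["completion"]:  # event stream; only "chunk" events carry text
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)
```

In the component, the client would be created with `boto3.client("bedrock-agent-runtime", region_name="us-east-1")`, and the agent and alias IDs would come from the component inputs.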

Agent Performs Analysis

Bedrock analyzes logs, queries AWS, and checks source code.

What's Happening:

The Bedrock Agent executes its pre-configured action groups. It can:
1. Parse Logs: Identify specific error messages.
2. Query AWS: Use the Lambda functions behind its action groups, and their IAM permissions, to make AWS SDK or AWS CLI calls that check the state of resources (e.g., whether an IAM role exists).
3. Examine Code: Use a GitLab Project Access Token (configured in the agent) to clone the repository at the specific commit and analyze the Terraform or script files referenced in the error log.
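As an illustration of the "Query AWS" step, the following Python sketch shows what one action-group Lambda function might look like when the action group is defined with function details. The action, the parameter name `profile_name`, and the helper names are hypothetical. The IAM call (`get_instance_profile`) and the response envelope follow the documented Boto3 and Bedrock Agents formats as far as I know them, so verify both against the current documentation before relying on them.

```python
def instance_profile_exists(iam_client, name: str) -> bool:
    """Return True if the named IAM instance profile exists in this account."""
    try:
        iam_client.get_instance_profile(InstanceProfileName=name)
        return True
    except iam_client.exceptions.NoSuchEntityException:
        return False


def handle_event(event: dict, iam_client) -> dict:
    """Answer one action-group invocation using the function-details response format."""
    # Bedrock passes parameters as a list of {"name", "type", "value"} objects.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    name = params.get("profile_name", "")  # hypothetical parameter name
    exists = instance_profile_exists(iam_client, name)
    body = f"Instance profile '{name}' {'exists' if exists else 'does not exist'}."
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }


def lambda_handler(event, context):
    """Lambda entry point; boto3 ships with the AWS Lambda Python runtime."""
    import boto3
    return handle_event(event, boto3.client("iam"))
```

Passing the IAM client in as an argument keeps the lookup logic testable without AWS credentials. Only `lambda_handler` touches Boto3 directly.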

Root Cause Identified & Resolution Proposed

The agent determines the failure type and generates a solution.

Outcome: Merge Request Created

In this example, the agent identifies a typo in the Terraform source code. It generates a suggested fix and uses the GitLab API to create a new branch, commit the change, and submit a Merge Request for review.

Example MR Description:

    AI-Generated Fix for Pipeline Failure

    Root Cause:
    The job failed due to `InvalidIAMInstanceProfile.Name`. Analysis of `main.tf` revealed a typo in the `iam_instance_profile` name.

    Proposed Change:
    Corrected `my-instance-pofile` to `my-instance-profile` in `resource "aws_instance" "web"`.
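The branch-and-Merge-Request step can be sketched with two GitLab REST API calls: `POST /projects/:id/repository/commits`, which can create a new branch from an existing one when `start_branch` is given, and `POST /projects/:id/merge_requests`. The Python below is an assumption about how the agent's action might be written. The function names and the shape of the `fix` dictionary are illustrative only.

```python
import json
import urllib.request


def commit_payload(branch, base_branch, file_path, new_content, message):
    """Body for the commits API; creates `branch` from `base_branch` with one updated file."""
    return {
        "branch": branch,
        "start_branch": base_branch,
        "commit_message": message,
        "actions": [{"action": "update", "file_path": file_path, "content": new_content}],
    }


def merge_request_payload(branch, target_branch, title, description):
    """Body for the merge requests API."""
    return {
        "source_branch": branch,
        "target_branch": target_branch,
        "title": title,
        "description": description,
    }


def post_json(url, token, payload):
    """POST a JSON body to the GitLab API and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def open_fix_merge_request(api_url, project_id, token, fix, post=post_json):
    """Commit the fix on a new branch, then open a Merge Request against the base branch."""
    base = f"{api_url}/projects/{project_id}"
    post(f"{base}/repository/commits", token, commit_payload(
        fix["branch"], fix["base_branch"], fix["file_path"], fix["content"], fix["title"]))
    return post(f"{base}/merge_requests", token, merge_request_payload(
        fix["branch"], fix["base_branch"], fix["title"], fix["description"]))
```

Because the result is a normal Merge Request, the proposed change still goes through the project's usual review and pipeline checks before it is merged.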

Required AWS IAM Permissions

Two IAM roles are involved. The role assumed by the GitLab Runner EC2 instance needs permission to invoke the agent so that the component can function, and the Bedrock Agent's own execution role needs read-only permissions to perform diagnostics.

For the GitLab Runner Component:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "bedrock:InvokeAgent",
          "Resource": "arn:aws:bedrock:us-east-1:123456789012:agent-alias/ABC123XYZ/TSTALIASID"
        }
      ]
    }

For the Bedrock Agent's Execution Role (Read-Only Diagnostics):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "iam:Get*", "iam:List*",
            "ec2:Describe*",
            "s3:Get*", "s3:List*",
            "rds:Describe*",
            "logs:GetLogEvents", "logs:DescribeLogStreams",
            "sts:GetCallerIdentity"
          ],
          "Resource": "*"
        }
      ]
    }

Note: The Bedrock agent also needs a GitLab Project Access Token with `api` scope stored securely (e.g., in AWS Secrets Manager) to access code and create Merge Requests.
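If the token is kept in AWS Secrets Manager, retrieving it at runtime might look like the sketch below. The secret layout (either a plain string or a JSON object with a `token` key) and the function name are assumptions. The calling role would also need `secretsmanager:GetSecretValue` on that secret, which the policies above do not include.

```python
import json


def load_gitlab_token(secrets_client, secret_id: str, key: str = "token") -> str:
    """Read the GitLab token from Secrets Manager.

    Accepts either a plain-string secret or a JSON object holding the token under `key`.
    """
    secret = secrets_client.get_secret_value(SecretId=secret_id)["SecretString"]
    try:
        parsed = json.loads(secret)
    except json.JSONDecodeError:
        return secret  # stored as a plain string
    return parsed[key] if isinstance(parsed, dict) else secret
```

Here `secrets_client` is expected to be `boto3.client("secretsmanager")`. Fetching the token on each invocation, rather than baking it into environment variables, keeps rotation simple.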
