Symptoms
You push a commit, open GitLab, and the pipeline simply isn't there. Or it shows as pending for ten minutes, then twenty, then it's been an hour and nothing has moved. Sometimes a pipeline is created but every job just sits with the spinning clock icon. Other times, a merge request fires no pipeline at all even though you're certain the rules should have matched.
These are the classic presentations of a stuck or non-triggering GitLab CI pipeline:
- Pipeline status stays pending indefinitely with no job output
- Push event completes but no pipeline appears under CI/CD → Pipelines
- Merge request pipelines don't run despite rules targeting MR events
- Scheduled pipelines silently skip execution
- Trigger API calls appear to succeed but no pipeline is created
- All jobs stay in created state and never advance to pending
I've worked through every one of these scenarios across self-managed GitLab instances. What follows is a systematic breakdown of the most common causes, how to confirm each one, and how to fix it without guessing.
Root Cause 1: Runner Is Offline
This is the most frequent culprit. Jobs move from created to pending as soon as GitLab queues them, but they can't advance to running without an available, online runner that matches the job's tags. If all matching runners are offline, the job sits in pending forever.
A runner goes offline for several reasons: the GitLab Runner service crashed, the host was rebooted without the runner service enabled for startup, a Docker daemon died underneath a Docker executor, or the runner was deregistered. In my experience, the most surprising cause is a host-level kernel update that rebooted the machine — the runner was never configured with systemctl enable, so it didn't come back up after the reboot.
How to Identify It
Navigate to Settings → CI/CD → Runners on your project or group. A runner with a gray dot and a Never contacted label, or a last-contact timestamp more than a couple of minutes old, is effectively offline. Confirm from the runner host itself:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo systemctl status gitlab-runner
● gitlab-runner.service - GitLab Runner
Loaded: loaded (/lib/systemd/system/gitlab-runner.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Apr 15 09:12:03 sw-infrarunbook-01 systemd[1]: gitlab-runner.service: Service hold-off time over, scheduling restart.
Apr 15 09:12:03 sw-infrarunbook-01 systemd[1]: Stopped GitLab Runner.
The word disabled in the Loaded line is the smoking gun: the service wasn't enabled to start on boot.
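If you manage more than a couple of runners, checking each host by hand gets tedious. GitLab's runners REST API reports a per-runner status you can filter on. This is a sketch, not a definitive tool: the `list_offline_runners` helper is my own, the host is this article's example instance, and it assumes the `online` boolean present in the API response of recent GitLab versions.

```shell
# list_offline_runners: read the JSON array returned by GET /api/v4/runners
# on stdin and print the id and description of every runner that GitLab
# does not consider online.
list_offline_runners() {
  python3 -c '
import json, sys
for r in json.load(sys.stdin):
    if not r.get("online", False):
        print(r.get("id"), r.get("description", ""))
'
}

# Typical use (token and host are placeholders for your own):
# curl -s -H "PRIVATE-TOKEN: $GITLAB_TOKEN" \
#   "https://gitlab.solvethenetwork.com/api/v4/runners" | list_offline_runners
```

Anything this prints is a host worth logging into with the systemctl checks above.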
How to Fix It
infrarunbook-admin@sw-infrarunbook-01:~$ sudo systemctl start gitlab-runner
infrarunbook-admin@sw-infrarunbook-01:~$ sudo systemctl enable gitlab-runner
Created symlink /etc/systemd/system/multi-user.target.wants/gitlab-runner.service → /lib/systemd/system/gitlab-runner.service
infrarunbook-admin@sw-infrarunbook-01:~$ sudo gitlab-runner verify
Running in system-mode.
Verifying runner... is alive                       runner=Ab3xR7Kp
After the runner comes back online, GitLab picks it up within about 30 seconds and any pending jobs assigned to a matching runner start automatically. If you're still seeing stalls after the service is restored, check the concurrent limit in /etc/gitlab-runner/config.toml — that's covered under the resource limit section below.
Root Cause 2: Webhook Not Configured or Misconfigured
For self-managed GitLab instances that integrate with external systems — or when you're using repository mirroring, push mirrors, or custom webhook triggers — a missing or broken webhook means GitLab never receives the event that should fire the pipeline.
Even on native GitLab without external triggers, this shows up when someone manually deleted a system hook, or when a network change between GitLab and an external CI trigger broke an existing endpoint. I've also seen this happen after migrating a project between groups: the webhooks didn't follow the project to its new home.
How to Identify It
Go to Settings → Webhooks on the project. Click Edit on each webhook and scroll to Recent Deliveries. A failed delivery will show a red indicator and an HTTP error code:
POST https://ci.solvethenetwork.com/hooks/gitlab
Response: 502 Bad Gateway
Delivered at: 2026-04-15 08:43:11 UTC
Duration: 30002ms (timeout)
You can test any webhook manually using the Test button on that same page. For system-level hooks on a self-managed instance, check under Admin → System Hooks. From the GitLab server, the rails logs will show hook delivery attempts:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo tail -f /var/log/gitlab/gitlab-rails/production.log | grep -i webhook
{"severity":"ERROR","time":"2026-04-15T08:43:11.204Z","message":"WebHook Error","url":"https://ci.solvethenetwork.com/hooks/gitlab","http_status":502}
How to Fix It
If the webhook URL is wrong, update it in Settings → Webhooks. If the target endpoint is down, that's a service issue on the receiving end. Confirm connectivity directly from the GitLab host before assuming the webhook configuration is at fault:
infrarunbook-admin@sw-infrarunbook-01:~$ curl -v -X POST https://ci.solvethenetwork.com/hooks/gitlab \
-H "X-Gitlab-Token: your-secret-token" \
-H "Content-Type: application/json" \
-d '{"object_kind":"push"}'
* Connected to ci.solvethenetwork.com (10.10.20.15) port 443
< HTTP/1.1 200 OK
Once the endpoint responds correctly and you've re-saved the webhook, use GitLab's built-in tester to fire a test event before making another real push. That tester isolates whether the problem is on GitLab's side or the receiver's.
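When Recent Deliveries doesn't go back far enough, the structured rails log shown earlier is the next best source. The helper below is a hypothetical sketch written against the JSON log shape in the example above; it summarizes failures so you can spot a pattern:

```shell
# extract_webhook_errors: scan GitLab structured log lines on stdin and print
# the HTTP status and target URL of every "WebHook Error" entry, skipping
# lines that aren't valid JSON.
extract_webhook_errors() {
  python3 -c '
import json, sys
for line in sys.stdin:
    try:
        rec = json.loads(line)
    except ValueError:
        continue
    if rec.get("message") == "WebHook Error":
        print(rec.get("http_status"), rec.get("url"))
'
}

# Typical use on an Omnibus install (log path as in the example above):
# sudo cat /var/log/gitlab/gitlab-rails/production.log \
#   | extract_webhook_errors | sort | uniq -c | sort -rn
```

A long run of the same status and URL usually means one dead receiver rather than a GitLab-side problem.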
Root Cause 3: Branch Protection Blocking Pipeline Execution
Branch protection is a common governance control, but it interacts with CI in ways that catch people off guard. When a branch is protected with strict push restrictions, pushes from automation accounts that aren't on the allowed list can fail silently — or the push succeeds but certain MR pipeline configurations treat the commit as unauthorized.
The more subtle case I've seen: the Allowed to push setting is set to No one, which prevents force pushes and direct commits. A release automation job that pushes a version bump commit to main fails quietly, no pipeline runs, and nobody notices until the release is late.
How to Identify It
Go to Settings → Repository → Protected Branches and review the Allowed to push and Allowed to merge settings for the affected branch. Then look at whether your pipeline rules use the $CI_COMMIT_REF_PROTECTED variable as a condition:
rules:
- if: '$CI_COMMIT_BRANCH == "main" && $CI_COMMIT_REF_PROTECTED == "true"'
If $CI_COMMIT_REF_PROTECTED evaluates to false unexpectedly — maybe because the push came from a service account whose token doesn't trigger that flag correctly — the rule won't match and no pipeline runs. Add a temporary debug job to verify variable state:
debug-vars:
stage: .pre
script:
- echo "Branch: $CI_COMMIT_BRANCH"
- echo "Protected: $CI_COMMIT_REF_PROTECTED"
- echo "User: $GITLAB_USER_LOGIN"
rules:
- when: always
Check the GitLab audit log under Admin → Audit Events for push rejection events tied to the branch. A rejected push won't create a pipeline — full stop.
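You can also pull the same protection settings over the REST API instead of clicking through the UI. The endpoint is GitLab's GET /projects/:id/protected_branches; the helper below and its access_level check (0 means "No access", i.e. No one) are a sketch, with token, host, and project ID as placeholders:

```shell
# summarize_protection: read the JSON array from
# GET /api/v4/projects/:id/protected_branches on stdin and report whether
# direct pushes are possible on each protected branch. An access_level of 0
# corresponds to "No one" in the UI.
summarize_protection() {
  python3 -c '
import json, sys
for b in json.load(sys.stdin):
    levels = [a.get("access_level") for a in b.get("push_access_levels", [])]
    state = "no one may push" if levels == [0] else "push allowed"
    print(b.get("name"), "-", state)
'
}

# Typical use (token, host, and project ID are placeholders):
# curl -s -H "PRIVATE-TOKEN: $GITLAB_TOKEN" \
#   "https://gitlab.solvethenetwork.com/api/v4/projects/42/protected_branches" \
#   | summarize_protection
```

If the branch your automation pushes to shows "no one may push", you've found the silent failure.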
How to Fix It
If legitimate CI automation is being blocked, add the automation account or runner's associated service account to the allowed push list on the protected branch. Better yet, use a project access token scoped to the project with the right permissions — it won't break when an employee leaves and it's easier to audit.
If pipeline rules are the actual problem, avoid depending on $CI_COMMIT_REF_PROTECTED unless you specifically need that distinction. Explicit branch names are more predictable:
rules:
- if: '$CI_COMMIT_BRANCH =~ /^(main|release\/.*)$/'
Root Cause 4: Resource Limit on the Runner
Even when a runner is online and healthy, it will silently refuse new jobs if it's already at capacity. GitLab Runner has a global concurrent limit and a per-runner limit that control how many jobs can run simultaneously. When all slots are occupied, additional jobs queue as pending but don't start — and from GitLab's perspective they look exactly like normal queuing.
The subtler version is memory or CPU pressure on the runner host. If Docker is the executor and the host is thrashing swap, new containers fail to start cleanly. Jobs stay pending, GitLab doesn't distinguish this from normal wait time, and you have no obvious error to look at.
How to Identify It
Check the runner's configuration file first:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo cat /etc/gitlab-runner/config.toml
concurrent = 1
check_interval = 0
[[runners]]
name = "sw-infrarunbook-01-docker"
url = "https://gitlab.solvethenetwork.com/"
token = "glrt-Ab3xR7KpXXXXXXXX"
executor = "docker"
limit = 1
[runners.docker]
image = "alpine:latest"
privileged = false
A concurrent = 1 with limit = 1 means only one job runs at a time on that runner. If a pipeline stage fans out five parallel jobs and this is your only runner, four of them will queue. That's expected behavior — but if you didn't configure it intentionally, it's a surprise. Now check actual resource usage on the host:
infrarunbook-admin@sw-infrarunbook-01:~$ free -h
total used free shared buff/cache available
Mem: 7.7G 7.4G 112M 244M 198M 89M
Swap: 2.0G 1.9G 102M
infrarunbook-admin@sw-infrarunbook-01:~$ docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT
a1b2c3d4e5f6   runner-Ab3xR7Kp-project-42   94.3%   7.1GiB / 7.7GiB
That's a system effectively out of memory. New Docker containers won't start, and jobs will hang at the container creation phase with no useful error surfaced in the GitLab UI.
How to Fix It
Increase the concurrency limits in /etc/gitlab-runner/config.toml if the host has headroom:
concurrent = 4
[[runners]]
name = "sw-infrarunbook-01-docker"
limit = 4
Restart the runner to apply the change:
infrarunbook-admin@sw-infrarunbook-01:~$ sudo systemctl restart gitlab-runner
For memory pressure, kill containers that are lingering past their job completion and prune Docker's storage:
infrarunbook-admin@sw-infrarunbook-01:~$ docker ps -a --filter "status=exited" -q | xargs -r docker rm
infrarunbook-admin@sw-infrarunbook-01:~$ docker system prune -f
Deleted Containers: 14
Deleted Images: 8
Total reclaimed space: 4.2GB
If you're regularly hitting resource ceilings, the long-term answer is horizontal scaling — add more runner hosts, or configure autoscaling runners (for example with the Kubernetes executor) so capacity expands on demand.
Root Cause 5: Trigger Token Is Wrong or Expired
Pipeline triggers let you kick off CI from external systems — deploy scripts, monitoring tools, release automation. They work by sending a POST request with a trigger token to the GitLab API. When that token is wrong, expired, revoked, or pointed at the wrong project ID, the call looks successful at a glance: curl exits cleanly and returns a small JSON body, but no pipeline is ever created. GitLab answers with a 404 in the response rather than rejecting the request loudly.
This catches teams off guard more than any other cause. Someone rotates all credentials in a secrets manager, updates most references, and misses the CI trigger token buried in a deployment script from two years ago. Everything looks fine until someone asks why the nightly release didn't run.
How to Identify It
Trigger tokens are managed under Settings → CI/CD → Pipeline triggers. If the token being sent isn't listed there, it won't work. The tell is in the response body of the API call — curl exits cleanly either way, so scripts that only check the exit code miss the 404:
infrarunbook-admin@sw-infrarunbook-01:~$ curl -s -X POST \
--form token="glptt-WRONG-TOKEN-HERE" \
--form ref="main" \
https://gitlab.solvethenetwork.com/api/v4/projects/42/trigger/pipeline
{"message":"404 Not Found"}
Compare that to a working trigger response:
infrarunbook-admin@sw-infrarunbook-01:~$ curl -s -X POST \
--form token="glptt-CORRECT-TOKEN-HERE" \
--form ref="main" \
https://gitlab.solvethenetwork.com/api/v4/projects/42/trigger/pipeline
{"id":1847,"iid":23,"project_id":42,"sha":"a3f2c1b9d...","ref":"main","status":"pending","source":"trigger",...}
The "status":"pending" and "source":"trigger" fields confirm it worked. If the project ID in the URL is also wrong, you'll get a different 404 — check that the project ID matches the one shown in Settings → General (at the top of the page).
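Because curl exits cleanly either way, any automation that fires triggers should check the response body rather than the exit code. Here's a minimal guard written against the two response shapes shown above; the helper name is invented, and the commented curl line reuses this article's example host and project ID:

```shell
# check_trigger_response: read the JSON body returned by the trigger API on
# stdin; succeed only if a pipeline was actually created (the success body
# carries a numeric "id"), and surface GitLab's error message otherwise.
check_trigger_response() {
  python3 -c '
import json, sys
body = json.load(sys.stdin)
if "id" in body:
    print("pipeline", body["id"], "created with status", body.get("status"))
else:
    print("trigger failed:", body.get("message", "unknown error"))
    sys.exit(1)
'
}

# Typical use (token, host, and project ID are placeholders):
# curl -s -X POST --form token="$TRIGGER_TOKEN" --form ref=main \
#   "https://gitlab.solvethenetwork.com/api/v4/projects/42/trigger/pipeline" \
#   | check_trigger_response || echo "ALERT: trigger did not create a pipeline"
```

Wiring that nonzero exit into your deploy script turns the silent failure into a loud one.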
How to Fix It
Go to Settings → CI/CD → Pipeline triggers and create a new trigger token. Copy it to wherever it's consumed. If you store it in a secrets manager:
infrarunbook-admin@sw-infrarunbook-01:~$ vault kv put secret/gitlab/ci-trigger \
token="glptt-NEW-VALID-TOKEN-HERE"
Key Value
--- -----
created_time 2026-04-15T09:00:00.000000000Z
version         3
Verify the integration end-to-end with the curl test above before calling it done. Then revoke the old token from the GitLab UI — inactive tokens left in place are a security exposure, and cleaning them up keeps the token list meaningful for the next person who has to debug this.
Root Cause 6: Pipeline Rules or Only/Except Misconfiguration
This one is entirely self-inflicted, but it causes real confusion, especially after someone refactors a .gitlab-ci.yml and subtly breaks the logic. If your rules conditions don't match the actual event — wrong branch name, wrong variable value, wrong pipeline source — GitLab creates no pipeline and gives no error. It just does nothing, which looks identical to a webhook failure from the outside.
How to Identify It
Use the CI/CD → Editor lint tool in GitLab to validate syntax. Then look at the commit in GitLab — if the pipeline was evaluated but skipped, you'll see a skipped badge rather than no pipeline at all. A common trap with the rules syntax is confusing pipeline sources:
# This will NEVER run. On a direct push to main, $CI_PIPELINE_SOURCE is
# "push", not "merge_request_event"; in a merge request pipeline,
# $CI_COMMIT_BRANCH isn't set at all. The two conditions can never both hold.
build:
script: make build
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event" && $CI_COMMIT_BRANCH == "main"'
How to Fix It
Be explicit about pipeline sources and list conditions separately:
build:
script: make build
rules:
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
- if: '$CI_COMMIT_BRANCH == "main"'
When the two conditions are on separate lines like this, GitLab evaluates them as OR — the job runs if either condition matches. This is almost always what you actually want.
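You can also lint a candidate config from the command line before pushing it. This sketch uses GitLab's project-scoped lint endpoint (POST /api/v4/projects/:id/ci/lint), whose response includes valid and errors fields; the `lint_report` helper is hypothetical, and the token, host, and project ID in the commented pipeline are placeholders:

```shell
# lint_report: read the JSON response from POST /api/v4/projects/:id/ci/lint
# on stdin; print a verdict and exit nonzero when the config is invalid.
lint_report() {
  python3 -c '
import json, sys
r = json.load(sys.stdin)
if r.get("valid"):
    print("CI config is valid")
else:
    print("CI config is INVALID")
    for e in r.get("errors", []):
        print(" -", e)
    sys.exit(1)
'
}

# Typical use, wrapping the local .gitlab-ci.yml into the request body with jq:
# jq -n --rawfile c .gitlab-ci.yml "{content: \$c}" \
#   | curl -s -X POST -H "PRIVATE-TOKEN: $GITLAB_TOKEN" \
#       -H "Content-Type: application/json" --data @- \
#       "https://gitlab.solvethenetwork.com/api/v4/projects/42/ci/lint" \
#   | lint_report
```

Note that lint catches syntax problems only; the rules logic trap above passes lint and still produces no pipeline.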
Root Cause 7: Runner Tag Mismatch
Jobs define tags, and a job is only picked up if some online runner carries all of that job's tags. If no runner has the full set, the job queues as pending forever. This is easy to introduce when you rename a tag set or add a new job with a tag that was never registered on any runner.
How to Identify It
Click the stuck job under CI/CD → Jobs. GitLab shows a message like: "This job is stuck because no runners are online, assigned, or available." Cross-reference the job's tag list with what the runner has registered:
deploy-prod:
script: ./deploy.sh
tags:
- aws-deploy
- production
infrarunbook-admin@sw-infrarunbook-01:~$ sudo gitlab-runner list
Runtime platform arch=amd64 os=linux pid=1234
Listing configured runners ConfigFile=/etc/gitlab-runner/config.toml
sw-infrarunbook-01-docker Executor=docker Token=Ab3xR7Kp URL=https://gitlab.solvethenetwork.com/
Tags: docker, linux
The runner advertises docker and linux. The job needs aws-deploy and production. No match. Job stuck.
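For scripted checks, the same comparison can be done mechanically instead of by eye. `tags_match` is a hypothetical helper that expects comma-separated tag lists without spaces:

```shell
# tags_match: succeed only if every tag required by the job is present in the
# runner's registered tag list. Both arguments are comma-separated lists
# without spaces, e.g. "aws-deploy,production".
tags_match() {
  job_tags=$1
  runner_tags=$2
  for t in $(printf '%s' "$job_tags" | tr ',' ' '); do
    case ",$runner_tags," in
      *",$t,"*) ;;                          # tag registered on the runner
      *) echo "missing tag: $t"; return 1 ;;
    esac
  done
  echo "all job tags matched"
}

# tags_match "aws-deploy,production" "docker,linux"   # prints: missing tag: aws-deploy
# tags_match "docker" "docker,linux"                  # prints: all job tags matched
```

Run it once per job/runner pair and any "missing tag" line explains the stuck job.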
How to Fix It
Edit the runner's tags under Settings → CI/CD → Runners → Edit runner and add the missing tags. Or update the job's tags to match the available runner tag set. If the job genuinely needs a specific execution environment (one with cloud credentials pre-configured, for example), register a new dedicated runner on infrastructure with those properties and tag it accordingly.
Prevention
Proactive monitoring is the difference between discovering a broken pipeline from a push alert versus hearing about it from a developer asking why their MR hasn't run CI in two days.
Start by enabling systemd persistence on every runner host — this is the single most common oversight I see in self-managed environments. systemctl enable gitlab-runner costs you nothing and prevents the most common class of runner outage. Back it up with a simple cron-based health check that alerts only on failure (the grep -q guard keeps it from mailing on every healthy run, which the naive grep -v version would do):
*/5 * * * * infrarunbook-admin /usr/bin/gitlab-runner verify 2>&1 | grep -q "is alive" || echo "gitlab-runner verify failed" | mail -s "Runner verify failed on sw-infrarunbook-01" ops@solvethenetwork.com
For deeper observability, enable the runner's built-in Prometheus metrics endpoint and scrape it from your monitoring stack:
# In /etc/gitlab-runner/config.toml
listen_address = "10.10.10.5:9252"
Alert on gitlab_runner_jobs{state="pending"} sitting above a threshold for more than five minutes, and on the runner process disappearing from your process monitoring entirely.
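Wired into Prometheus, that alert might look like the rule below. This is a sketch only: the metric and label follow the reference above, while the group name, alert name, and five-minute threshold are assumptions to adapt to your own stack.

```yaml
# Prometheus alerting rule (sketch): fire when any jobs sit pending too long.
groups:
  - name: gitlab-runner
    rules:
      - alert: RunnerJobsStuckPending
        expr: sum(gitlab_runner_jobs{state="pending"}) > 0
        for: 5m            # tolerate brief queuing; alert on sustained backlog
        labels:
          severity: warning
        annotations:
          summary: "GitLab Runner jobs have been pending for over 5 minutes"
```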
Treat trigger tokens like any other credential. Rotate them on a schedule, store them in a secrets manager, and use GitLab's project access tokens rather than personal access tokens for automation — they're scoped to the project and don't break when an employee's account is deprovisioned.
For pipeline rule changes, make the CI lint tool part of your code review checklist. Syntax errors get caught automatically, but logic errors — wrong variable names, mismatched pipeline sources, branch patterns that don't cover all cases — require a human eye. Any significant change to .gitlab-ci.yml should include a test push to a feature branch to confirm rules behave as expected before merging anywhere protected.
Document your runner tags and their purpose. A comment in config.toml explaining why a runner carries a specific tag set costs almost nothing. Six months later, when someone inherits the infrastructure and is staring at a stuck job, that comment is the difference between a ten-minute fix and a two-hour archaeology dig.
