GitLab Runner Performance Optimization: From Slow Pipelines to Speed
A practical guide to optimizing GitLab Runner performance - direct internal connections, Docker executor tuning, caching strategies, and runner configuration.
Ever watched a simple lint check take 5 minutes when it should take 30 seconds? Or stared at a “Pending” pipeline wondering if your runner is even alive? Yeah, me too.
Here’s the thing - most GitLab Runner guides focus on getting it working, not making it fast. But if you’re self-hosting GitLab with a separate runner VM, there are hidden performance killers that nobody warns you about.
This is what we learned after debugging slow pipelines on our self-hosted GitLab setup. Real optimizations that cut our pipeline times by 60% without upgrading hardware.
⏱️ Reading time: 12-15 minutes
Why Pipelines Are Slow (It’s Not Your CPU)
When I first set up our GitLab Runner, pipelines were painfully slow. I checked htop on both the GitLab server and runner - barely any load. The VMs weren’t struggling, so what was going on?
After hours of debugging, here’s what I found:
- Network latency through Cloudflare - Runner was going through the public internet to reach GitLab
- No caching - Every job ran npm ci from scratch
- Suboptimal runner config - Default settings aren’t optimized for performance
- DNS resolution delays - Docker containers couldn’t resolve internal hostnames
Here’s what proper optimization fixes:
- Pipeline times drop 50-70% - Direct connections + caching = speed
- No more “Pending” jobs - Runner picks up work instantly
- Consistent build times - Cache hits mean predictable performance
- Lower resource usage - Less network traffic, fewer redundant operations
For self-hosted GitLab behind Cloudflare or any reverse proxy, the biggest performance killer is usually network routing - not CPU, memory, or disk.
Our Infrastructure Setup
Before diving into optimizations, here’s what we’re working with:
| Component | Specs | IP Address |
|---|---|---|
| GitLab Server | 4 vCPUs, 12GB RAM, 80GB SSD | 192.168.1.10 |
| GitLab Runner | 4 vCPUs, 8GB RAM, 80GB SSD | 192.168.1.11 |
| External Access | Cloudflare Tunnel | gitlab.example.com |
Both VMs run on Proxmox in the same local network. External users access GitLab through Cloudflare Tunnel, which proxies gitlab.example.com to the internal server.
flowchart LR
subgraph Internet
DEV[Developer Laptop]
CF[Cloudflare]
end
subgraph "Local Network (192.168.1.x)"
GS[GitLab Server<br/>192.168.1.10]
GR[GitLab Runner<br/>192.168.1.11]
end
DEV -->|HTTPS| CF
CF -->|Tunnel| GS
GR -.->|"❌ Slow: via Cloudflare"| CF
GR -->|"✅ Fast: Direct HTTP"| GS
style CF fill:#f5a623,stroke:#333
style GS fill:#7b42bc,stroke:#333,color:#fff
style GR fill:#2496ed,stroke:#333,color:#fff
The problem: By default, the runner was connecting to gitlab.example.com, which resolved to Cloudflare IPs, sending all traffic through the internet and back - even though both VMs are on the same network!
Optimization 1: Direct Internal Connection
This is the single biggest performance improvement. Make the runner talk directly to GitLab over the local network.
The Problem
Check your runner logs:
sudo journalctl -u gitlab-runner -n 20 --no-pager
If you see errors like this, your runner is going through Cloudflare:
dial tcp 104.21.9.142:443: i/o timeout
Those IPs (104.21.x.x, 172.67.x.x) are Cloudflare, not your GitLab server.
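You can confirm the misrouting from the runner VM itself. A quick sketch, using the example hostname and addresses from this setup:

```bash
# Where does the public hostname actually resolve? Cloudflare-proxied
# records return 104.21.x.x / 172.67.x.x addresses, not your internal server.
getent hosts gitlab.example.com

# Compare round-trip times: public hostname vs. direct internal IP
ping -c 3 gitlab.example.com
ping -c 3 192.168.1.10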
The Fix
Step 1: Find what port GitLab is listening on
On your GitLab server:
ss -tlun | grep -E '80|443|5443'
Typical output:
tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:*
This shows GitLab is listening on port 80 (HTTP) internally.
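Before touching the runner config, it’s worth confirming from the runner VM that GitLab actually answers on that internal address:

```bash
# Expect an HTTP status line back (usually a 302 redirect to the sign-in page)
curl -sI http://192.168.1.10/ | head -n 1
```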
Step 2: Update runner config.toml
On the runner VM, edit /etc/gitlab-runner/config.toml:
[[runners]]
name = "gitlab-runner-01"
url = "http://192.168.1.10/"
clone_url = "http://192.168.1.10/"
  # ... rest of config
Key changes:
- url - Use internal IP with HTTP (not HTTPS)
- clone_url - Ensures git operations also use the internal network
Step 3: Configure Docker containers to resolve hostnames
Jobs run inside Docker containers that also need to reach GitLab. Add extra_hosts:
[runners.docker]
  extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]
This adds entries to /etc/hosts inside every container, so gitlab.example.com resolves to your internal IP.
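You can sanity-check the mapping before restarting the runner: docker run’s --add-host flag is the CLI equivalent of extra_hosts, so a throwaway container shows exactly what jobs will see:

```bash
# The injected /etc/hosts entry should map the hostname to the internal IP
docker run --rm --add-host gitlab.example.com:192.168.1.10 \
  alpine:latest grep gitlab.example.com /etc/hosts
# Expected: 192.168.1.10    gitlab.example.com
```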
Step 4: Restart the runner
sudo gitlab-runner restart
sudo journalctl -u gitlab-runner -f
You should now see successful job checks without timeout errors.
Before vs After
| Metric | Before (via Cloudflare) | After (Direct) |
|---|---|---|
| Git clone | 15-30 seconds | 2-5 seconds |
| Artifact upload | 10-20 seconds | 1-3 seconds |
| Cache restore | 20-40 seconds | 5-10 seconds |
| Total pipeline | 5-8 minutes | 2-3 minutes |
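You can reproduce the git clone row yourself from the runner VM; the mygroup/myproject path below is just a placeholder for one of your repos (private repos will additionally need a token in the URL):

```bash
# Time a shallow clone over the direct internal connection
time git clone --depth 1 http://192.168.1.10/mygroup/myproject.git /tmp/clone-test
rm -rf /tmp/clone-test
```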
Optimization 2: Runner Resource Configuration
Default runner settings are conservative. Let’s tune them for performance.
Recommended config.toml
concurrent = 2
check_interval = 3
connection_max_age = "15m0s"
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-01"
url = "http://192.168.1.10/"
clone_url = "http://192.168.1.10/"
executor = "docker"
request_concurrency = 2
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 536870912
network_mtu = 0
cpus = "1.5"
memory = "2560m"
pull_policy = ["if-not-present"]
extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]Key Settings Explained
| Setting | Value | Why |
|---|---|---|
| concurrent | 2 | Run 2 jobs simultaneously (adjust based on RAM) |
| check_interval | 3 | Poll for jobs every 3 seconds |
| request_concurrency | 2 | Fixes “long polling” warning |
| cpus | "1.5" | Allocate 1.5 CPUs per container |
| memory | "2560m" | 2.5GB per container |
| shm_size | 536870912 | 512MB shared memory (enough for Node.js) |
| pull_policy | "if-not-present" | Don’t re-pull images every time |
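With pull_policy = ["if-not-present"], images are pulled once and then reused. You can see what the runner already holds locally:

```bash
# Anything listed here is served from local disk instead of re-pulled
docker images --format '{{.Repository}}:{{.Tag}}\t{{.Size}}'
```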
Memory Budget Calculation
For a runner with 8GB RAM:
System/Docker overhead: ~1.5GB
Runner process: ~0.5GB
Container 1: 2.5GB
Container 2: 2.5GB
Buffer: 1.0GB
─────────────────────────────────
Total: 8.0GB ✓
If concurrent × memory exceeds your available RAM, containers will be OOM-killed. Start conservative and increase based on monitoring.
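A rough pre-flight check before raising concurrent, using the numbers from the config above (the overhead figure is an assumption - adjust it to your host):

```bash
# Quick sanity check: does concurrent * per-job memory fit in RAM?
CONCURRENT=2
MEM_PER_JOB_MB=2560        # matches memory = "2560m" above
OVERHEAD_MB=2048           # assumed system + Docker + runner overhead
TOTAL_MB=$(free -m | awk '/^Mem:/ {print $2}')
NEEDED_MB=$(( CONCURRENT * MEM_PER_JOB_MB + OVERHEAD_MB ))
echo "Need ~${NEEDED_MB}MB of ${TOTAL_MB}MB total"
[ "$NEEDED_MB" -le "$TOTAL_MB" ] && echo "Fits" || echo "Risk of OOM kills"
```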
Optimization 3: Pipeline Caching Strategy
Running npm ci on every job wastes 30-60 seconds. Let’s fix that.
The Problem
Without caching, every job in your pipeline:
- Downloads packages from npm registry
- Installs all dependencies from scratch
- Repeats this even though package-lock.json hasn’t changed
The Solution: Dedicated Install Stage
image: node:24.12.0-trixie-slim
stages:
- install
- lint
- build
- test
- deploy
variables:
NPM_CONFIG_CACHE: .npm
npm_config_prefer_offline: 'true'
npm_config_audit: 'false'
npm_config_fund: 'false'
# Global cache - all jobs can pull from this
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull # Most jobs only read cache
# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
stage: install
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull-push # This job updates the cache
script:
- npm ci
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH
How Other Jobs Use the Cache
lint_code:
stage: lint
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci' # Fallback if cache miss
- npm run lint
- npm run format:check
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/
build_site:
stage: build
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
rules:
    - if: $CI_COMMIT_BRANCH
Cache Key Strategy
The cache key is based on package-lock.json:
cache:
key:
files:
      - package-lock.json
This means:
- Same package-lock.json = cache hit = fast
- Changed package-lock.json = cache miss = full install (expected)
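GitLab computes this key from the most recent commit that changed each listed file, so you can see locally when the key will rotate:

```bash
# The cache key rotates whenever this commit hash changes
git log -1 --format=%H -- package-lock.json
```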
Cache Hit vs Miss Performance
| Scenario | npm ci Time | Total Job Time |
|---|---|---|
| Cache miss (first run) | 45-60 seconds | 70-90 seconds |
| Cache hit (subsequent) | 0 seconds | 15-25 seconds |
| Partial cache hit | 10-20 seconds | 30-45 seconds |
Optimization 4: Use needs for Parallel Execution
By default, GitLab runs stages sequentially. The needs keyword enables parallel execution.
Without needs (Sequential)
install → lint → build → test → deploy
  30s     40s     60s     20s     30s    = 180s total
With needs (Parallel)
lint_code:
needs:
- job: install_deps
optional: true # Don't fail if install_deps was skipped
build_site:
needs:
- job: install_deps
optional: true
test_build:
needs:
- job: build_site
      artifacts: true # Download artifacts from build_site
install ──→ lint ──────────→ deploy
  30s  ╲     40s              30s
        ╲
         → build → test ──→
            60s     20s
Jobs that don’t depend on each other run in parallel, reducing total pipeline time.
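To put numbers on the gain, the pipelines API reports a duration per pipeline. A sketch using the internal IP from this setup; TOKEN, PROJECT_ID, and PIPELINE_ID are placeholders for your own values:

```bash
# duration is reported in seconds once the pipeline finishes
curl -s --header "PRIVATE-TOKEN: ${TOKEN}" \
  "http://192.168.1.10/api/v4/projects/${PROJECT_ID}/pipelines/${PIPELINE_ID}" \
  | grep -o '"duration":[0-9]*'
```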
Optimization 5: Sync Develop After Production Deploy
This prevents the “source branch is X commits behind target” error in future merge requests.
# ===============================
# STAGE: POST-DEPLOY
# ===============================
# Add GITLAB_INTERNAL_IP as a CI/CD variable (e.g., 192.168.1.10)
sync_develop:
stage: post-deploy
image: alpine:latest
variables:
GIT_STRATEGY: clone
GIT_DEPTH: 0
before_script:
- apk add --no-cache git
- git config user.email "[email protected]"
- git config user.name "GitLab CI"
- git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
script:
- git fetch origin develop
- git checkout develop
- git merge origin/main --no-edit
- git push origin develop
rules:
- if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
  allow_failure: true
This job needs two CI/CD variables:
- PUSH_TOKEN - Project Access Token with write_repository scope
- GITLAB_INTERNAL_IP - Your GitLab server’s internal IP (e.g., 192.168.1.10)
Create them at Settings → CI/CD → Variables.
Why Use Internal IP in sync_develop?
Notice we use http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/ instead of ${CI_SERVER_HOST}.
That’s because:
- CI_SERVER_HOST = gitlab.example.com (external URL)
- CI_SERVER_PORT = 443 (Cloudflare HTTPS)
- Inside a Docker container, this routes through Cloudflare = slow + may fail
Using the internal IP keeps git operations on the local network.
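If you want to verify the internal remote before wiring it into the pipeline, a read-only check works from any job container, using the same PUSH_TOKEN and GITLAB_INTERNAL_IP variables as above:

```bash
# Lists the develop ref without cloning; fails fast if the token,
# IP, or project path is wrong
git ls-remote "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git" develop
```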
Complete Optimized .gitlab-ci.yml
Here’s a complete example putting it all together:
image: node:24.12.0-trixie-slim
stages:
- install
- lint
- build
- test
- deploy
- post-deploy
default:
interruptible: true
variables:
NPM_CONFIG_CACHE: .npm
npm_config_prefer_offline: 'true'
npm_config_audit: 'false'
npm_config_fund: 'false'
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull
# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
stage: install
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull-push
script:
- npm ci
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: LINT
# ===============================
lint_code:
stage: lint
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run lint
- npm run format:check
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/
lint_commit:
stage: lint
needs:
- job: install_deps
optional: true
variables:
GIT_DEPTH: 0
script:
- '[ -d node_modules ] || npm ci'
- npx commitlint --from $CI_MERGE_REQUEST_DIFF_BASE_SHA --to $CI_MERGE_REQUEST_DIFF_HEAD_SHA
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
# ===============================
# STAGE: BUILD
# ===============================
build_site:
stage: build
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
rules:
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: TEST
# ===============================
test_build:
stage: test
needs:
- job: build_site
artifacts: true
script:
- test -d dist
- test "$(ls -A dist)"
rules:
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: DEPLOY
# ===============================
deploy_develop:
stage: deploy
needs:
- job: build_site
artifacts: true
variables:
NODE_ENV: production
before_script:
- npm install -g wrangler
script:
- wrangler pages deploy dist --project-name=my-project --branch=develop
environment:
name: develop
url: https://develop.my-project.pages.dev
rules:
- if: $CI_COMMIT_BRANCH == "develop"
deploy_production:
stage: deploy
needs:
- job: build_site
artifacts: true
variables:
NODE_ENV: production
before_script:
- npm install -g wrangler
script:
- wrangler pages deploy dist --project-name=my-project --branch=main
environment:
name: production
url: https://my-project.example.com
rules:
- if: $CI_COMMIT_BRANCH == "main"
# ===============================
# STAGE: POST-DEPLOY
# ===============================
sync_develop:
stage: post-deploy
image: alpine:latest
variables:
GIT_STRATEGY: clone
GIT_DEPTH: 0
before_script:
- apk add --no-cache git
- git config user.email "[email protected]"
- git config user.name "GitLab CI"
- git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
script:
- git fetch origin develop
- git checkout develop
- git merge origin/main --no-edit
- git push origin develop
rules:
- if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
  allow_failure: true
Troubleshooting Common Issues
Issue 1: Pipeline Stuck in “Pending”
Symptoms: Jobs show “Pending” indefinitely, runner appears online.
Check runner logs:
sudo journalctl -u gitlab-runner -f
Common causes:
- Runner URL mismatch - config.toml URL doesn’t match GitLab’s expected URL
- Network timeout - Runner can’t reach GitLab
- Tag mismatch - Jobs require tags the runner doesn’t have
Fix: Verify url in config.toml matches what GitLab expects:
sudo gitlab-runner verify
Issue 2: “connection refused” Errors
dial tcp 192.168.1.10:443: connect: connection refused
Cause: Wrong port. GitLab is on port 80 (HTTP), not 443 (HTTPS).
Fix: Use HTTP URL without port:
url = "http://192.168.1.10/"Issue 3: “unauthorized” in Container Jobs
fatal: unable to access 'https://gitlab.example.com/...':
Failed to connect to gitlab.example.com port 443
Cause: Container can’t resolve hostname or wrong port.
Fix: Use internal IP directly in job scripts:
git remote set-url origin "http://oauth2:${TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
Issue 4: Cache Never Hits
Symptoms: npm ci runs every time, “No cache found” in logs.
Common causes:
- Cache key changed (check package-lock.json)
- Cache expired (default 2 weeks)
- Different runner picked up the job
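On the runner VM you can also check whether the docker executor ever wrote a local cache volume. Exact volume names vary by runner version, so treat this as a rough probe:

```bash
# Cache volumes created by the docker executor typically include
# "cache" in their name; no output suggests no cache was ever written
docker volume ls --format '{{.Name}}' | grep -i cache
```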
Check cache status:
# In job log, look for:
Checking cache for <key>...
Successfully extracted cache
# or
No URL provided, cache will not be downloaded
Performance Checklist
Use this checklist to verify your setup:
- Runner connects to internal IP (e.g., http://192.168.1.10/)
- clone_url is set to internal URL
- extra_hosts configured for Docker containers
- pull_policy set to if-not-present
- request_concurrency set to 2+
- npm cache includes both .npm/ and node_modules/
- install_deps job uses policy: pull-push
- Other jobs use policy: pull
- needs configured for parallel execution
- interruptible: true set in defaults
The Bottom Line
Self-hosted GitLab is powerful, but default configurations prioritize compatibility over speed. The biggest wins come from:
- Direct internal connections - Bypass Cloudflare for runner ↔ GitLab traffic
- Aggressive caching - Cache node_modules, not just .npm
- Parallel execution - Use needs to run independent jobs simultaneously
- Proper resource allocation - Tune concurrent, cpus, and memory
Our pipeline went from 5-8 minutes to under 2 minutes with these changes. Your mileage may vary, but expect 50-70% improvement on most setups.
Start with the internal connection fix - it’s the lowest effort, highest impact change you can make.
Next Steps: Implement This Today
Fix 1: Internal Connection
- Update config.toml with internal IP
- Add extra_hosts for Docker containers
- Restart runner: sudo gitlab-runner restart
Fix 2: Enable Caching
- Add install_deps stage to pipeline
- Configure cache with package-lock.json key
- Set pull_policy: if-not-present
Fix 3: Fine-Tuning
- Tune concurrent, cpus, memory based on your workload
- Set up sync_develop job with PUSH_TOKEN
- Add needs dependencies for parallel execution
- Configure artifact expiration policies