Ever watched a simple lint check take 5 minutes when it should take 30 seconds? Or stared at a “Pending” pipeline wondering if your runner is even alive? Yeah, me too.

Here’s the thing - most GitLab Runner guides focus on getting it working, not making it fast. But if you’re self-hosting GitLab with a separate runner VM, there are hidden performance killers that nobody warns you about.

This is what we learned after debugging slow pipelines on our self-hosted GitLab setup. Real optimizations that cut our pipeline times by 60% without upgrading hardware.

⏱️ Reading time: 12-15 minutes

Why Pipelines Are Slow (It’s Not Your CPU)

When I first set up our GitLab Runner, pipelines were painfully slow. I checked htop on both the GitLab server and runner - barely any load. The VMs weren’t struggling, so what was going on?

After hours of debugging, here’s what I found:

  • Network latency through Cloudflare - Runner was going through the public internet to reach GitLab
  • No caching - Every job ran npm ci from scratch
  • Suboptimal runner config - Default settings aren’t optimized for performance
  • DNS resolution delays - Docker containers couldn’t resolve internal hostnames

Here’s what proper optimization fixes:

  • Pipeline times drop 50-70% - Direct connections + caching = speed
  • No more “Pending” jobs - Runner picks up work instantly
  • Consistent build times - Cache hits mean predictable performance
  • Lower resource usage - Less network traffic, fewer redundant operations

The Real Bottleneck

For self-hosted GitLab behind Cloudflare or any reverse proxy, the biggest performance killer is usually network routing - not CPU, memory, or disk.

Our Infrastructure Setup

Before diving into optimizations, here’s what we’re working with:

Component         Specs                          IP Address
GitLab Server     4 vCPUs, 12GB RAM, 80GB SSD    192.168.1.10
GitLab Runner     4 vCPUs, 8GB RAM, 80GB SSD     192.168.1.11
External Access   Cloudflare Tunnel              gitlab.example.com

Both VMs run on Proxmox in the same local network. External users access GitLab through Cloudflare Tunnel, which proxies gitlab.example.com to the internal server.

flowchart LR
  subgraph Internet
      DEV[Developer Laptop]
      CF[Cloudflare]
  end

  subgraph "Local Network (192.168.1.x)"
      GS[GitLab Server<br/>192.168.1.10]
      GR[GitLab Runner<br/>192.168.1.11]
  end

  DEV -->|HTTPS| CF
  CF -->|Tunnel| GS
  GR -.->|"❌ Slow: via Cloudflare"| CF
  GR -->|"✅ Fast: Direct HTTP"| GS

  style CF fill:#f5a623,stroke:#333
  style GS fill:#7b42bc,stroke:#333,color:#fff
  style GR fill:#2496ed,stroke:#333,color:#fff

The problem: By default, the runner was connecting to gitlab.example.com which resolved to Cloudflare IPs, sending all traffic through the internet and back - even though both VMs are on the same network!
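
A quick way to confirm this from the runner VM is to check what the GitLab hostname actually resolves to. Using this article's example hostname (swap in your own):

dig +short gitlab.example.com
# Cloudflare-proxied answers look like 104.21.x.x or 172.67.x.x,
# not your internal 192.168.1.x address.
getent hosts gitlab.example.com   # alternative if dig isn't installed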

Optimization 1: Direct Internal Connection

This is the single biggest performance improvement. Make the runner talk directly to GitLab over the local network.

The Problem

Check your runner logs:

sudo journalctl -u gitlab-runner -n 20 --no-pager

If you see errors like this, your runner is going through Cloudflare:

dial tcp 104.21.9.142:443: i/o timeout

Those IPs (104.21.x.x, 172.67.x.x) are Cloudflare, not your GitLab server.
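
To quantify the detour, time the same request over both paths from the runner VM. A rough comparison using this article's example hostname and internal IP:

# Request via the public hostname (through Cloudflare)
curl -s -o /dev/null -w "via Cloudflare: %{time_total}s\n" https://gitlab.example.com/users/sign_in
# Same request via the internal IP (direct over the LAN)
curl -s -o /dev/null -w "direct:         %{time_total}s\n" http://192.168.1.10/users/sign_in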

The Fix

Step 1: Find what port GitLab is listening on

On your GitLab server:

ss -tlun | grep -E '80|443|5443'

Typical output:

tcp   LISTEN 0      511    0.0.0.0:80    0.0.0.0:*

This shows GitLab is listening on port 80 (HTTP) internally.
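
Before changing anything on the runner, it's worth confirming the runner VM can reach GitLab directly on that port. A minimal check, using this article's example IP:

# From the runner VM - expect an HTTP 200 from the sign-in page
curl -sI http://192.168.1.10/users/sign_in | head -n 1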

Step 2: Update runner config.toml

On the runner VM, edit /etc/gitlab-runner/config.toml:

[[runners]]
  name = "gitlab-runner-01"
  url = "http://192.168.1.10/"
  clone_url = "http://192.168.1.10/"
  # ... rest of config

Key changes:

  • url - Use internal IP with HTTP (not HTTPS)
  • clone_url - Ensures git operations also use internal network

Step 3: Configure Docker containers to resolve hostnames

Jobs run inside Docker containers that also need to reach GitLab. Add extra_hosts:

[runners.docker]
  extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]

This adds entries to /etc/hosts inside every container, so gitlab.example.com resolves to your internal IP.
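
Under the hood this is the same mechanism as Docker's --add-host flag, so you can sanity-check the mapping outside of CI before relying on it. A quick test using this article's hostname and IP:

# Simulate what the runner injects into every job container
docker run --rm --add-host gitlab.example.com:192.168.1.10 alpine:latest \
  sh -c 'grep gitlab /etc/hosts && wget -q --spider http://gitlab.example.com/users/sign_in && echo reachable'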

Step 4: Restart the runner

sudo gitlab-runner restart
sudo journalctl -u gitlab-runner -f

You should now see successful job checks without timeout errors.

Before vs After

Metric            Before (via Cloudflare)   After (Direct)
Git clone         15-30 seconds             2-5 seconds
Artifact upload   10-20 seconds             1-3 seconds
Cache restore     20-40 seconds             5-10 seconds
Total pipeline    5-8 minutes               2-3 minutes

Optimization 2: Runner Resource Configuration

Default runner settings are conservative. Let’s tune them for performance.

concurrent = 2
check_interval = 3
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gitlab-runner-01"
  url = "http://192.168.1.10/"
  clone_url = "http://192.168.1.10/"
  executor = "docker"
  request_concurrency = 2

  [runners.cache]
    MaxUploadedArchiveSize = 0

  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 536870912
    network_mtu = 0
    cpus = "1.5"
    memory = "2560m"
    pull_policy = ["if-not-present"]
    extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]

Key Settings Explained

Setting               Value              Why
concurrent            2                  Run 2 jobs simultaneously (adjust based on RAM)
check_interval        3                  Poll for jobs every 3 seconds
request_concurrency   2                  Fixes the "long polling" warning
cpus                  "1.5"              Allocate 1.5 CPUs per container
memory                "2560m"            2.5GB per container
shm_size              536870912          512MB shared memory (enough for Node.js)
pull_policy           "if-not-present"   Don't re-pull images every time

Memory Budget Calculation

For a runner with 8GB RAM:

System/Docker overhead:  ~1.5GB
Runner process:          ~0.5GB
Container 1:              2.5GB
Container 2:              2.5GB
Buffer:                   1.0GB
─────────────────────────────────
Total:                    8.0GB

Don't Over-Allocate

If concurrent × memory exceeds your available RAM, containers will be OOM-killed. Start conservative and increase based on monitoring.
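
To see whether those limits hold up in practice, watch container usage on the runner VM while a heavy job (like npm run build) is running:

# Per-container CPU and memory while jobs are running
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# Check whether anything has been OOM-killed recently
sudo dmesg -T | grep -i "killed process" | tail -n 5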

Optimization 3: Pipeline Caching Strategy

Running npm ci on every job wastes 30-60 seconds. Let’s fix that.

The Problem

Without caching, every job in your pipeline:

  1. Downloads packages from npm registry
  2. Installs all dependencies from scratch
  3. Repeats this even though package-lock.json hasn’t changed

The Solution: Dedicated Install Stage

image: node:24.12.0-trixie-slim

stages:
  - install
  - lint
  - build
  - test
  - deploy

variables:
  NPM_CONFIG_CACHE: .npm
  npm_config_prefer_offline: 'true'
  npm_config_audit: 'false'
  npm_config_fund: 'false'

# Global cache - all jobs can pull from this
cache:
  key:
    files:
      - package-lock.json
  paths:
    - .npm/
    - node_modules/
  policy: pull # Most jobs only read cache

# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
  stage: install
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
      - node_modules/
    policy: pull-push # This job updates the cache
  script:
    - npm ci
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH

How Other Jobs Use the Cache

lint_code:
  stage: lint
  needs:
    - job: install_deps
      optional: true
  script:
    - '[ -d node_modules ] || npm ci' # Fallback if cache miss
    - npm run lint
    - npm run format:check
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/

build_site:
  stage: build
  needs:
    - job: install_deps
      optional: true
  script:
    - '[ -d node_modules ] || npm ci'
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  rules:
    - if: $CI_COMMIT_BRANCH

Cache Key Strategy

The cache key is based on package-lock.json:

cache:
  key:
    files:
      - package-lock.json

This means:

  • Same package-lock.json = cache hit = fast
  • Changed package-lock.json = cache miss = full install (expected)

Cache Hit vs Miss Performance

Scenario                 npm ci Time     Total Job Time
Cache miss (first run)   45-60 seconds   70-90 seconds
Cache hit (subsequent)   0 seconds       15-25 seconds
Partial cache hit        10-20 seconds   30-45 seconds

Optimization 4: Use needs for Parallel Execution

By default, GitLab runs stages sequentially. The needs keyword enables parallel execution.

Without needs (Sequential)

install ──→ lint ──→ build ──→ test ──→ deploy
  30s       40s      60s       20s      30s      = 180s total

With needs (Parallel)

lint_code:
  needs:
    - job: install_deps
      optional: true # Don't fail if install_deps was skipped

build_site:
  needs:
    - job: install_deps
      optional: true

test_build:
  needs:
    - job: build_site
      artifacts: true # Download artifacts from build_site

install ──┬─→ lint ──────────────────→ deploy
  30s     │   40s                        30s
          └─→ build ──→ test ─────────→
              60s       20s

Jobs that don’t depend on each other run in parallel, reducing total pipeline time.

Optimization 5: Sync Develop After Production Deploy

This prevents the “source branch is X commits behind target” error in future merge requests.

# ===============================
# STAGE: POST-DEPLOY
# ===============================
# Add GITLAB_INTERNAL_IP as a CI/CD variable (e.g., 192.168.1.10)
sync_develop:
  stage: post-deploy
  image: alpine:latest
  variables:
    GIT_STRATEGY: clone
    GIT_DEPTH: 0
  before_script:
    - apk add --no-cache git
    - git config user.email "gitlab-ci@example.com"
    - git config user.name "GitLab CI"
    - git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
  script:
    - git fetch origin develop
    - git checkout develop
    - git merge origin/main --no-edit
    - git push origin develop
  rules:
    - if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
  allow_failure: true

Required CI/CD Variables

This job needs two CI/CD variables:

  • PUSH_TOKEN - Project Access Token with write_repository scope
  • GITLAB_INTERNAL_IP - Your GitLab server’s internal IP (e.g., 192.168.1.10)

Create them at Settings → CI/CD → Variables.
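
If you'd rather script this (or apply it across several projects), the same variables can be created through GitLab's project variables API. A sketch, assuming an access token with api scope in ADMIN_TOKEN and your numeric project ID - ADMIN_TOKEN and <project-id> are placeholders, not something GitLab defines:

# Create GITLAB_INTERNAL_IP as a project CI/CD variable
curl --request POST \
  --header "PRIVATE-TOKEN: ${ADMIN_TOKEN}" \
  --form "key=GITLAB_INTERNAL_IP" \
  --form "value=192.168.1.10" \
  "http://192.168.1.10/api/v4/projects/<project-id>/variables"
# Repeat for PUSH_TOKEN, ideally masked:
#   --form "key=PUSH_TOKEN" --form "value=<token>" --form "masked=true"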

Why Use Internal IP in sync_develop?

Notice we use http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/ instead of ${CI_SERVER_HOST}.

That’s because:

  • CI_SERVER_HOST = gitlab.example.com (external URL)
  • CI_SERVER_PORT = 443 (Cloudflare HTTPS)
  • Inside a Docker container, this routes through Cloudflare, which is slow and can fail outright

Using the internal IP keeps git operations on the local network.

Complete Optimized .gitlab-ci.yml

Here’s a complete example putting it all together:

image: node:24.12.0-trixie-slim

stages:
  - install
  - lint
  - build
  - test
  - deploy
  - post-deploy

default:
  interruptible: true

variables:
  NPM_CONFIG_CACHE: .npm
  npm_config_prefer_offline: 'true'
  npm_config_audit: 'false'
  npm_config_fund: 'false'

cache:
  key:
    files:
      - package-lock.json
  paths:
    - .npm/
    - node_modules/
  policy: pull

# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
  stage: install
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
      - node_modules/
    policy: pull-push
  script:
    - npm ci
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH

# ===============================
# STAGE: LINT
# ===============================
lint_code:
  stage: lint
  needs:
    - job: install_deps
      optional: true
  script:
    - '[ -d node_modules ] || npm ci'
    - npm run lint
    - npm run format:check
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/

lint_commit:
  stage: lint
  needs:
    - job: install_deps
      optional: true
  variables:
    GIT_DEPTH: 0
  script:
    - '[ -d node_modules ] || npm ci'
    - npx commitlint --from $CI_MERGE_REQUEST_DIFF_BASE_SHA --to $CI_COMMIT_SHA
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

# ===============================
# STAGE: BUILD
# ===============================
build_site:
  stage: build
  needs:
    - job: install_deps
      optional: true
  script:
    - '[ -d node_modules ] || npm ci'
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  rules:
    - if: $CI_COMMIT_BRANCH

# ===============================
# STAGE: TEST
# ===============================
test_build:
  stage: test
  needs:
    - job: build_site
      artifacts: true
  script:
    - test -d dist
    - test "$(ls -A dist)"
  rules:
    - if: $CI_COMMIT_BRANCH

# ===============================
# STAGE: DEPLOY
# ===============================
deploy_develop:
  stage: deploy
  needs:
    - job: build_site
      artifacts: true
  variables:
    NODE_ENV: production
  before_script:
    - npm install -g wrangler
  script:
    - wrangler pages deploy dist --project-name=my-project --branch=develop
  environment:
    name: develop
    url: https://develop.my-project.pages.dev
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy_production:
  stage: deploy
  needs:
    - job: build_site
      artifacts: true
  variables:
    NODE_ENV: production
  before_script:
    - npm install -g wrangler
  script:
    - wrangler pages deploy dist --project-name=my-project --branch=main
  environment:
    name: production
    url: https://my-project.example.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

# ===============================
# STAGE: POST-DEPLOY
# ===============================
sync_develop:
  stage: post-deploy
  image: alpine:latest
  variables:
    GIT_STRATEGY: clone
    GIT_DEPTH: 0
  before_script:
    - apk add --no-cache git
    - git config user.email "gitlab-ci@example.com"
    - git config user.name "GitLab CI"
    - git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
  script:
    - git fetch origin develop
    - git checkout develop
    - git merge origin/main --no-edit
    - git push origin develop
  rules:
    - if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
  allow_failure: true

Troubleshooting Common Issues

Issue 1: Pipeline Stuck in “Pending”

Symptoms: Jobs show “Pending” indefinitely, runner appears online.

Check runner logs:

sudo journalctl -u gitlab-runner -f

Common causes:

  1. Runner URL mismatch - config.toml URL doesn’t match GitLab’s expected URL
  2. Network timeout - Runner can’t reach GitLab
  3. Tag mismatch - Jobs require tags the runner doesn’t have

Fix: Verify url in config.toml matches what GitLab expects:

sudo gitlab-runner verify

Issue 2: “connection refused” Errors

dial tcp 192.168.1.10:443: connect: connection refused

Cause: Wrong port. GitLab is on port 80 (HTTP), not 443 (HTTPS).

Fix: Use HTTP URL without port:

url = "http://192.168.1.10/"

Issue 3: “unauthorized” in Container Jobs

fatal: unable to access 'https://gitlab.example.com/...':
Failed to connect to gitlab.example.com port 443

Cause: Container can’t resolve hostname or wrong port.

Fix: Use internal IP directly in job scripts:

git remote set-url origin "http://oauth2:${TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"

Issue 4: Cache Never Hits

Symptoms: npm ci runs every time, “No cache found” in logs.

Common causes:

  1. Cache key changed (check package-lock.json)
  2. Cache expired (default 2 weeks)
  3. Different runner picked up the job

Check cache status:

# In job log, look for:
Checking cache for <key>...
Successfully extracted cache
# or
No URL provided, cache will not be downloaded
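
That last line ("No URL provided") simply means no distributed cache (S3/GCS) is configured, so the cache lives in local Docker volumes on the runner VM. To inspect or reset it, something like this should work - the clear-docker-cache helper ships with the gitlab-runner package, though the path can vary:

# Cache volumes created by the Docker executor
docker volume ls | grep runner-
# Wipe stale cache/build volumes if you suspect corruption
sudo /usr/share/gitlab-runner/clear-docker-cache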

Performance Checklist

Use this checklist to verify your setup:

  • Runner connects to internal IP (e.g., http://192.168.1.10/)
  • clone_url is set to internal URL
  • extra_hosts configured for Docker containers
  • pull_policy set to if-not-present
  • request_concurrency set to 2+
  • npm cache includes both .npm/ and node_modules/
  • install_deps job uses policy: pull-push
  • Other jobs use policy: pull
  • needs configured for parallel execution
  • interruptible: true set in defaults
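
If you want to spot-check the runner side of this list quickly, a rough grep audit of config.toml does the job. This is only a sketch and assumes the example internal IP used throughout this article:

#!/usr/bin/env bash
# Rough audit of /etc/gitlab-runner/config.toml against the checklist above
CONFIG=/etc/gitlab-runner/config.toml
INTERNAL_IP=192.168.1.10   # adjust to your GitLab server's internal IP
check() {  # usage: check <label> <pattern>
  if sudo grep -q "$2" "$CONFIG"; then echo "OK:      $1"; else echo "MISSING: $1"; fi
}
check "internal url"        "url = \"http://${INTERNAL_IP}/\""
check "clone_url"           "clone_url"
check "extra_hosts"         "extra_hosts"
check "pull_policy"         "if-not-present"
check "request_concurrency" "request_concurrency"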

The Bottom Line

Self-hosted GitLab is powerful, but default configurations prioritize compatibility over speed. The biggest wins come from:

  1. Direct internal connections - Bypass Cloudflare for runner ↔ GitLab traffic
  2. Aggressive caching - Cache node_modules, not just .npm
  3. Parallel execution - Use needs to run independent jobs simultaneously
  4. Proper resource allocation - Tune concurrent, cpus, and memory

Our pipeline went from 5-8 minutes to under 2 minutes with these changes. Your mileage may vary, but expect 50-70% improvement on most setups.

Start with the internal connection fix - it’s the lowest effort, highest impact change you can make.

Next Steps: Implement This Today

Quick Wins (30 minutes)

Fix 1: Internal Connection

  • Update config.toml with internal IP
  • Add extra_hosts for Docker containers
  • Restart runner: sudo gitlab-runner restart

Fix 2: Enable Caching

  • Add install_deps stage to pipeline
  • Configure cache with package-lock.json key
  • Set pull_policy: if-not-present

Advanced Optimization (1-2 hours)

  • Tune concurrent, cpus, memory based on your workload
  • Set up sync_develop job with PUSH_TOKEN
  • Add needs dependencies for parallel execution
  • Configure artifact expiration policies

Additional Resources