GitLab Runner Performance Optimization: From Slow Pipelines to Speed
A practical guide to optimizing GitLab Runner performance - direct internal connections, Docker executor tuning, caching strategies, and runner configuration.
Ever watched a simple lint check take 5 minutes when it should take 30 seconds? Or stared at a “Pending” pipeline wondering if your runner is even alive? Yeah, me too.
Here’s the thing - most GitLab Runner guides focus on getting it working, not making it fast. But if you’re self-hosting GitLab with a separate runner VM, there are hidden performance killers that nobody warns you about.
This is what we learned after debugging slow pipelines on our self-hosted GitLab setup. Real optimizations that cut our pipeline times by 60% without upgrading hardware.
⏱️ Reading time: 12-15 minutes
Why Pipelines Are Slow (It’s Not Your CPU)
When I first set up our GitLab Runner, pipelines were painfully slow. I checked htop on both the GitLab server and runner - barely any load. The VMs weren’t struggling, so what was going on?
After hours of debugging, here’s what I found:
- Network latency through Cloudflare - Runner was going through the public internet to reach GitLab
- No caching - Every job ran npm ci from scratch
- Suboptimal runner config - Default settings aren’t optimized for performance
- DNS resolution delays - Docker containers couldn’t resolve internal hostnames
Here’s what proper optimization fixes:
- Pipeline times drop 50-70% - Direct connections + caching = speed
- No more “Pending” jobs - Runner picks up work instantly
- Consistent build times - Cache hits mean predictable performance
- Lower resource usage - Less network traffic, fewer redundant operations
For self-hosted GitLab behind Cloudflare or any reverse proxy, the biggest performance killer is usually network routing - not CPU, memory, or disk.
Our Infrastructure Setup
Before diving into optimizations, here’s what we’re working with:
| Component | Specs | IP Address |
|---|---|---|
| GitLab Server | 4 vCPUs, 12GB RAM, 80GB SSD | 192.168.1.10 |
| GitLab Runner | 4 vCPUs, 8GB RAM, 80GB SSD | 192.168.1.11 |
| External Access | Cloudflare Tunnel | gitlab.example.com |
Both VMs run on Proxmox in the same local network. External users access GitLab through Cloudflare Tunnel, which proxies gitlab.example.com to the internal server.
flowchart LR
subgraph Internet
DEV[Developer Laptop]
CF[Cloudflare]
end
subgraph "Local Network (192.168.1.x)"
GS[GitLab Server<br/>192.168.1.10]
GR[GitLab Runner<br/>192.168.1.11]
end
DEV -->|HTTPS| CF
CF -->|Tunnel| GS
GR -.->|"❌ Slow: via Cloudflare"| CF
GR -->|"✅ Fast: Direct HTTP"| GS
style CF fill:#f5a623,stroke:#333
style GS fill:#7b42bc,stroke:#333,color:#fff
style GR fill:#2496ed,stroke:#333,color:#fff
The problem: By default, the runner was connecting to gitlab.example.com, which resolved to Cloudflare IPs, sending all traffic through the internet and back - even though both VMs are on the same network!
Optimization 1: Direct Internal Connection
This is the single biggest performance improvement. Make the runner talk directly to GitLab over the local network.
The Problem
Check your runner logs:
sudo journalctl -u gitlab-runner -n 20 --no-pager
If you see errors like this, your runner is going through Cloudflare:
dial tcp 104.21.9.142:443: i/o timeout
Those IPs (104.21.x.x, 172.67.x.x) are Cloudflare, not your GitLab server.
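You can confirm the misrouting from the runner VM itself. A quick sketch, using the example hostname and addresses from this setup:

```bash
# Where does the public hostname actually resolve? Cloudflare-proxied
# records return 104.21.x.x / 172.67.x.x addresses, not your internal server.
getent hosts gitlab.example.com

# Compare round-trip times: public hostname vs. direct internal IP
ping -c 3 gitlab.example.com
ping -c 3 192.168.1.10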
The Fix
Step 1: Find what port GitLab is listening on
On your GitLab server:
ss -tlun | grep -E '80|443|5443'
Typical output:
tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:*
This shows GitLab is listening on port 80 (HTTP) internally.
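Before touching the runner config, it’s worth confirming from the runner VM that GitLab actually answers on that internal address:

```bash
# Expect an HTTP status line back (usually a 302 redirect to the sign-in page)
curl -sI http://192.168.1.10/ | head -n 1
```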
Step 2: Update runner config.toml
On the runner VM, edit /etc/gitlab-runner/config.toml:
[[runners]]
name = "gitlab-runner-01"
url = "http://192.168.1.10/"
clone_url = "http://192.168.1.10/"
  # ... rest of config
Key changes:
- url - Use internal IP with HTTP (not HTTPS)
- clone_url - Ensures git operations also use the internal network
Step 3: Configure Docker containers to resolve hostnames
Jobs run inside Docker containers that also need to reach GitLab. Add extra_hosts:
[runners.docker]
  extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]
This adds entries to /etc/hosts inside every container, so gitlab.example.com resolves to your internal IP.
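You can sanity-check the mapping before restarting the runner: docker run’s --add-host flag is the CLI equivalent of extra_hosts, so a throwaway container shows exactly what jobs will see:

```bash
# The injected /etc/hosts entry should map the hostname to the internal IP
docker run --rm --add-host gitlab.example.com:192.168.1.10 \
  alpine:latest grep gitlab.example.com /etc/hosts
# Expected: 192.168.1.10    gitlab.example.com
```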
Step 4: Restart the runner
sudo gitlab-runner restart
sudo journalctl -u gitlab-runner -f
You should now see successful job checks without timeout errors.
Before vs After
| Metric | Before (via Cloudflare) | After (Direct) |
|---|---|---|
| Git clone | 15-30 seconds | 2-5 seconds |
| Artifact upload | 10-20 seconds | 1-3 seconds |
| Cache restore | 20-40 seconds | 5-10 seconds |
| Total pipeline | 5-8 minutes | 2-3 minutes |
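You can reproduce the git clone row yourself from the runner VM; the mygroup/myproject path below is just a placeholder for one of your repos (private repos will additionally need a token in the URL):

```bash
# Time a shallow clone over the direct internal connection
time git clone --depth 1 http://192.168.1.10/mygroup/myproject.git /tmp/clone-test
rm -rf /tmp/clone-test
```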
Optimization 2: Runner Resource Configuration
Default runner settings are conservative. Let’s tune them for performance.
Recommended config.toml
concurrent = 2
check_interval = 3
connection_max_age = "15m0s"
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-01"
url = "http://192.168.1.10/"
clone_url = "http://192.168.1.10/"
executor = "docker"
request_concurrency = 2
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 536870912
network_mtu = 0
cpus = "1.5"
memory = "2560m"
pull_policy = ["if-not-present"]
extra_hosts = ["gitlab.example.com:192.168.1.10", "registry.example.com:192.168.1.10"]Key Settings Explained
| Setting | Value | Why |
|---|---|---|
| concurrent | 2 | Run 2 jobs simultaneously (adjust based on RAM) |
| check_interval | 3 | Poll for jobs every 3 seconds |
| request_concurrency | 2 | Fixes “long polling” warning |
| cpus | "1.5" | Allocate 1.5 CPUs per container |
| memory | "2560m" | 2.5GB per container |
| shm_size | 536870912 | 512MB shared memory (enough for Node.js) |
| pull_policy | "if-not-present" | Don’t re-pull images every time |
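With pull_policy = ["if-not-present"], images are pulled once and then reused. You can see what the runner already holds locally:

```bash
# Anything listed here is served from local disk instead of re-pulled
docker images --format '{{.Repository}}:{{.Tag}}\t{{.Size}}'
```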
Memory Budget Calculation
For a runner with 8GB RAM:
System/Docker overhead: ~1.5GB
Runner process: ~0.5GB
Container 1: 2.5GB
Container 2: 2.5GB
Buffer: 1.0GB
─────────────────────────────────
Total: 8.0GB ✓
If concurrent × memory exceeds your available RAM, containers will be OOM-killed. Start conservative and increase based on monitoring.
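A rough pre-flight check before raising concurrent, using the numbers from the config above (the overhead figure is an assumption - adjust it to your host):

```bash
# Quick sanity check: does concurrent * per-job memory fit in RAM?
CONCURRENT=2
MEM_PER_JOB_MB=2560        # matches memory = "2560m" above
OVERHEAD_MB=2048           # assumed system + Docker + runner overhead
TOTAL_MB=$(free -m | awk '/^Mem:/ {print $2}')
NEEDED_MB=$(( CONCURRENT * MEM_PER_JOB_MB + OVERHEAD_MB ))
echo "Need ~${NEEDED_MB}MB of ${TOTAL_MB}MB total"
[ "$NEEDED_MB" -le "$TOTAL_MB" ] && echo "Fits" || echo "Risk of OOM kills"
```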
Optimization 3: Pipeline Caching Strategy
Running npm ci on every job wastes 30-60 seconds. Let’s fix that.
The Problem
Without caching, every job in your pipeline:
- Downloads packages from npm registry
- Installs all dependencies from scratch
- Repeats this even though package-lock.json hasn’t changed
The Solution: Dedicated Install Stage
image: node:24.12.0-trixie-slim
stages:
- install
- lint
- build
- test
- deploy
variables:
NPM_CONFIG_CACHE: .npm
npm_config_prefer_offline: 'true'
npm_config_audit: 'false'
npm_config_fund: 'false'
# Global cache - all jobs can pull from this
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull # Most jobs only read cache
# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
stage: install
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull-push # This job updates the cache
script:
- npm ci
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH
How Other Jobs Use the Cache
lint_code:
stage: lint
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci' # Fallback if cache miss
- npm run lint
- npm run format:check
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/
build_site:
stage: build
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
rules:
    - if: $CI_COMMIT_BRANCH
Cache Key Strategy
The cache key is based on package-lock.json:
cache:
key:
files:
      - package-lock.json
This means:
- Same package-lock.json = cache hit = fast
- Changed package-lock.json = cache miss = full install (expected)
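GitLab computes this key from the most recent commit that changed each listed file, so you can see locally when the key will rotate:

```bash
# The cache key rotates whenever this commit hash changes
git log -1 --format=%H -- package-lock.json
```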
Cache Hit vs Miss Performance
| Scenario | npm ci Time | Total Job Time |
|---|---|---|
| Cache miss (first run) | 45-60 seconds | 70-90 seconds |
| Cache hit (subsequent) | 0 seconds | 15-25 seconds |
| Partial cache hit | 10-20 seconds | 30-45 seconds |
Optimization 4: Use needs for Parallel Execution
By default, GitLab runs stages sequentially. The needs keyword enables parallel execution.
Without needs (Sequential)
install → lint → build → test → deploy
  30s     40s     60s     20s     30s    = 180s total
With needs (Parallel)
lint_code:
needs:
- job: install_deps
optional: true # Don't fail if install_deps was skipped
build_site:
needs:
- job: install_deps
optional: true
test_build:
needs:
- job: build_site
      artifacts: true # Download artifacts from build_site
install ──→ lint ──────────→ deploy
  30s  ╲     40s              30s
        ╲
         → build → test ──→
            60s     20s
Jobs that don’t depend on each other run in parallel, reducing total pipeline time.
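To put numbers on the gain, the pipelines API reports a duration per pipeline. A sketch using the internal IP from this setup; TOKEN, PROJECT_ID, and PIPELINE_ID are placeholders for your own values:

```bash
# duration is reported in seconds once the pipeline finishes
curl -s --header "PRIVATE-TOKEN: ${TOKEN}" \
  "http://192.168.1.10/api/v4/projects/${PROJECT_ID}/pipelines/${PIPELINE_ID}" \
  | grep -o '"duration":[0-9]*'
```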
Optimization 5: Sync Develop After Production Deploy
This prevents the “source branch is X commits behind target” error in future merge requests.
# ===============================
# STAGE: POST-DEPLOY
# ===============================
# Add GITLAB_INTERNAL_IP as a CI/CD variable (e.g., 192.168.1.10)
sync_develop:
stage: post-deploy
image: alpine:latest
variables:
GIT_STRATEGY: clone
GIT_DEPTH: 0
before_script:
- apk add --no-cache git
- git config user.email "[email protected]"
- git config user.name "GitLab CI"
- git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
script:
- git fetch origin develop
- git checkout develop
- git merge origin/main --no-edit
- git push origin develop
rules:
- if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
  allow_failure: true
This job needs two CI/CD variables:
- PUSH_TOKEN - Project Access Token with write_repository scope
- GITLAB_INTERNAL_IP - Your GitLab server’s internal IP (e.g., 192.168.1.10)
Create them at Settings → CI/CD → Variables.
Why Use Internal IP in sync_develop?
Notice we use http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/ instead of ${CI_SERVER_HOST}.
That’s because:
- CI_SERVER_HOST = gitlab.example.com (external URL)
- CI_SERVER_PORT = 443 (Cloudflare HTTPS)
- Inside a Docker container, this routes through Cloudflare = slow + may fail
Using the internal IP keeps git operations on the local network.
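If you want to verify the internal remote before wiring it into the pipeline, a read-only check works from any job container, using the same PUSH_TOKEN and GITLAB_INTERNAL_IP variables as above:

```bash
# Lists the develop ref without cloning; fails fast if the token,
# IP, or project path is wrong
git ls-remote "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git" develop
```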
Complete Optimized .gitlab-ci.yml
Here’s a complete example putting it all together:
image: node:24.12.0-trixie-slim
stages:
- install
- lint
- build
- test
- deploy
- post-deploy
default:
interruptible: true
variables:
NPM_CONFIG_CACHE: .npm
npm_config_prefer_offline: 'true'
npm_config_audit: 'false'
npm_config_fund: 'false'
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull
# ===============================
# STAGE: INSTALL
# ===============================
install_deps:
stage: install
cache:
key:
files:
- package-lock.json
paths:
- .npm/
- node_modules/
policy: pull-push
script:
- npm ci
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: LINT
# ===============================
lint_code:
stage: lint
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run lint
- npm run format:check
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- if: $CI_COMMIT_BRANCH =~ /^(develop|main)$/
lint_commit:
stage: lint
needs:
- job: install_deps
optional: true
variables:
GIT_DEPTH: 0
script:
- '[ -d node_modules ] || npm ci'
- npx commitlint --from $CI_MERGE_REQUEST_DIFF_BASE_SHA --to $CI_MERGE_REQUEST_DIFF_HEAD_SHA
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
# ===============================
# STAGE: BUILD
# ===============================
build_site:
stage: build
needs:
- job: install_deps
optional: true
script:
- '[ -d node_modules ] || npm ci'
- npm run build
artifacts:
paths:
- dist/
expire_in: 1 day
rules:
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: TEST
# ===============================
test_build:
stage: test
needs:
- job: build_site
artifacts: true
script:
- test -d dist
- test "$(ls -A dist)"
rules:
- if: $CI_COMMIT_BRANCH
# ===============================
# STAGE: DEPLOY
# ===============================
deploy_develop:
stage: deploy
needs:
- job: build_site
artifacts: true
variables:
NODE_ENV: production
before_script:
- npm install -g wrangler
script:
- wrangler pages deploy dist --project-name=my-project --branch=develop
environment:
name: develop
url: https://develop.my-project.pages.dev
rules:
- if: $CI_COMMIT_BRANCH == "develop"
deploy_production:
stage: deploy
needs:
- job: build_site
artifacts: true
variables:
NODE_ENV: production
before_script:
- npm install -g wrangler
script:
- wrangler pages deploy dist --project-name=my-project --branch=main
environment:
name: production
url: https://my-project.example.com
rules:
- if: $CI_COMMIT_BRANCH == "main"
# ===============================
# STAGE: POST-DEPLOY
# ===============================
sync_develop:
stage: post-deploy
image: alpine:latest
variables:
GIT_STRATEGY: clone
GIT_DEPTH: 0
before_script:
- apk add --no-cache git
- git config user.email "[email protected]"
- git config user.name "GitLab CI"
- git remote set-url origin "http://oauth2:${PUSH_TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
script:
- git fetch origin develop
- git checkout develop
- git merge origin/main --no-edit
- git push origin develop
rules:
- if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE == "push"
  allow_failure: true
Troubleshooting Common Issues
Issue 1: Pipeline Stuck in “Pending”
Symptoms: Jobs show “Pending” indefinitely, runner appears online.
Check runner logs:
sudo journalctl -u gitlab-runner -f
Common causes:
- Runner URL mismatch - config.toml URL doesn’t match GitLab’s expected URL
- Network timeout - Runner can’t reach GitLab
- Tag mismatch - Jobs require tags the runner doesn’t have
Fix: Verify url in config.toml matches what GitLab expects:
sudo gitlab-runner verify
Issue 2: “connection refused” Errors
dial tcp 192.168.1.10:443: connect: connection refused
Cause: Wrong port. GitLab is on port 80 (HTTP), not 443 (HTTPS).
Fix: Use HTTP URL without port:
url = "http://192.168.1.10/"Issue 3: “unauthorized” in Container Jobs
fatal: unable to access 'https://gitlab.example.com/...':
Failed to connect to gitlab.example.com port 443
Cause: Container can’t resolve hostname or wrong port.
Fix: Use internal IP directly in job scripts:
git remote set-url origin "http://oauth2:${TOKEN}@${GITLAB_INTERNAL_IP}/${CI_PROJECT_PATH}.git"
Issue 4: Cache Never Hits
Symptoms: npm ci runs every time, “No cache found” in logs.
Common causes:
- Cache key changed (check package-lock.json)
- Cache expired (default 2 weeks)
- Different runner picked up the job
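On the runner VM you can also check whether the docker executor ever wrote a local cache volume. Exact volume names vary by runner version, so treat this as a rough probe:

```bash
# Cache volumes created by the docker executor typically include
# "cache" in their name; no output suggests no cache was ever written
docker volume ls --format '{{.Name}}' | grep -i cache
```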
Check cache status:
# In job log, look for:
Checking cache for <key>...
Successfully extracted cache
# or
No URL provided, cache will not be downloaded
Performance Checklist
Use this checklist to verify your setup:
- Runner connects to internal IP (e.g., http://192.168.1.10/)
- clone_url is set to internal URL
- extra_hosts configured for Docker containers
- pull_policy set to if-not-present
- request_concurrency set to 2+
- npm cache includes both .npm/ and node_modules/
- install_deps job uses policy: pull-push
- Other jobs use policy: pull
- needs configured for parallel execution
- interruptible: true set in defaults
The Bottom Line
Self-hosted GitLab is powerful, but default configurations prioritize compatibility over speed. The biggest wins come from:
- Direct internal connections - Bypass Cloudflare for runner ↔ GitLab traffic
- Aggressive caching - Cache node_modules, not just .npm
- Parallel execution - Use needs to run independent jobs simultaneously
- Proper resource allocation - Tune concurrent, cpus, and memory
Our pipeline went from 5-8 minutes to under 2 minutes with these changes. Your mileage may vary, but expect 50-70% improvement on most setups.
Start with the internal connection fix - it’s the lowest effort, highest impact change you can make.
Next Steps: Implement This Today
Fix 1: Internal Connection
- Update config.toml with internal IP
- Add extra_hosts for Docker containers
- Restart runner: sudo gitlab-runner restart
Fix 2: Enable Caching
- Add install_deps stage to pipeline
- Configure cache with package-lock.json key
- Set pull_policy: if-not-present
Fix 3: Fine-Tuning
- Tune concurrent, cpus, memory based on your workload
- Set up sync_develop job with PUSH_TOKEN
- Add needs dependencies for parallel execution
- Configure artifact expiration policies