Infinite Ephemeral Preview Environments with ArgoCD

How we replaced our shared dev environment with on-demand, isolated preview environments per PR — complete with dedicated domains, env vars from Vault, and automatic cleanup.

Written by Kratik Jain
8 minute read
Posted on April 16, 2026

Howdy, folks!

If you’ve ever worked in a team that ships fast, you’ve probably lived through this exact moment:

“Hey, can you not deploy to dev right now? I’m testing my PR there.”

That sentence used to play on loop in our Slack. This blog is the story of how we made it disappear.


The Problem with a Shared dev Environment

For the longest time, we ran the classic three-environment setup: dev → stage → prod.

The workflow was simple in theory:

  1. You’d open a feature PR.
  2. To test it end-to-end, you’d point the dev environment to your feature branch.
  3. You’d test, find bugs, push fixes, retest.

Sounds fine — until two engineers want to test their PRs at the same time. Suddenly:

  • dev is pinned to someone else’s feature branch.
  • Your changes aren’t even running there.
  • QA is testing a build that’s a Frankenstein of two half-baked features.
  • The “what’s actually deployed on dev right now?” question becomes a daily standup item.

dev stopped being a shared environment and became a shared bottleneck.

We needed a model where every PR got its own isolated, real environment — without humans coordinating who owns dev this hour.


The Solution: One Environment Per PR

The shape of the answer was obvious: ephemeral preview environments, spun up per PR, on demand.

The interesting part is the implementation. Our constraints:

  • It must be opt-in. Not every PR needs a full environment — only the ones that touch deployable services.
  • It must be fully isolated. Its own pods, services, domain, and config.
  • It must be self-cleaning. Once the PR is merged or closed, the environment vanishes.
  • It must give the developer clear feedback right inside the PR, where they’re already working.

Here’s the stack we landed on:

  • ArgoCD ApplicationSet + Pull Request Generator — discovers PRs and templates Argo Applications from them.
  • Helm — our existing app chart, parameterized for preview deploys.
  • GitHub Actions — orchestrates the workflow, posts comments, and copies env vars.
  • HashiCorp Vault — the source of truth for environment variables.
  • ExternalDNS + Route 53 — automatically provisions a DNS record per preview.

The trigger? A single label on the PR: preview.


How It Works

1. Tag a PR with preview

That’s the entire developer-facing API.

The moment a PR is labeled preview, two things kick off in parallel:

  • A GitHub Actions workflow starts preparing the environment-specific config (pulling secrets/env vars from Vault, scoped just for this PR).
  • ArgoCD’s Pull Request Generator notices the labeled PR on its next poll and starts templating an Application for it.

The PR gets an immediate comment from the bot:

🚀 Spinning up your preview environment… I’ll update this comment when it’s ready.

This single comment becomes the dev’s “status page” for their preview. No tab-switching, no Slack-asking.
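On the Actions side, the label trigger can be sketched as a workflow that fires only when the preview label is present. This is a minimal sketch, not our actual workflow; the file name, job name, and elided steps are placeholders:

```yaml
# .github/workflows/preview.yml (hypothetical name)
name: preview-environment
on:
  pull_request:
    # `labeled` fires when the label is added; `synchronize` re-runs on every push
    types: [labeled, synchronize, reopened]

jobs:
  prepare-preview:
    # Only act on PRs that carry the `preview` label
    if: contains(github.event.pull_request.labels.*.name, 'preview')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... authenticate to Vault, seed the per-PR env vars, build and push
      # the image, and post the "Spinning up your preview environment…" comment
```

Including `synchronize` in the trigger is what keeps the preview tracking the branch: every push re-runs the config prep alongside ArgoCD's own sync.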

2. ArgoCD Pull Request Generator does the heavy lifting

ArgoCD’s ApplicationSet controller has a built-in Pull Request Generator that watches your repo and emits a parameter set for every open PR. Combine it with a label filter and you get this beautifully declarative setup (sanitized version of what we actually run):

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: example-service-preview
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: <org>
          repo: example-service
          tokenRef:
            secretName: github-token
            key: token
          labels:
            - preview
        requeueAfterSeconds: 100
  template:
    metadata:
      name: 'example-service-preview-{{number}}'
    spec:
      source:
        repoURL: https://github.com/<org>/gitops.git
        targetRevision: HEAD
        path: charts/universal-chart
        helm:
          valueFiles:
            - values.yaml
            - values/example-service/preview.yaml
          parameters:
            - name: nameOverride
              value: 'example-service-{{number}}'
            - name: fullnameOverride
              value: 'example-service-{{number}}'
            - name: image.repository
              value: '<acct>.dkr.ecr.<region>.amazonaws.com/example-service-preview-{{number}}'
            - name: image.tag
              value: 'preview-{{head_sha}}'
            # Vault paths — preview-scoped env vars + shared sensitive secrets
            - name: secretRef
              value: 'preview/example-service-{{number}}'
            - name: sensitiveSecretRef
              value: 'sensitive/example-service'
            - name: ingress.hosts[0].host
              value: 'example-service-{{number}}.stage.example.com'
      destination:
        server: https://kubernetes.default.svc
        namespace: preview
      project: default
      syncPolicy:
        syncOptions:
          - CreateNamespace=true
          - RespectIgnoreDifferences=true

A few things to call out:

  • labels: [preview] — the entire opt-in mechanism. PRs without this label are completely ignored.
  • {{head_sha}} — every push to the PR branch becomes a new image tag and a new sync. The preview always reflects the latest commit.
  • requeueAfterSeconds: 100 — how often the PR Generator polls GitHub to re-check open PRs, their labels, and their head SHAs. New labels, new commits, removed labels, and closed PRs all flow through this single poll. The default is 30 minutes, far too slow for an interactive workflow, while going much lower than 100s starts chewing through your GitHub API rate limit, especially once you have many ApplicationSets across many repos. 100s is the sweet spot we landed on: changes feel near-instant without straining the API budget.
  • One Helm chart for everything. We use a single internal universal-chart for all services; each service just supplies a tiny preview values file. New services get previews “for free.”
  • Per-PR ECR repo. Each PR’s images go to a dedicated ECR repo with a lifecycle policy that keeps only the last 6 images. No image bloat.
  • <service>-{{number}}.stage.example.com — every PR gets a unique, predictable URL on our stage zone.
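The lifecycle policy from the per-PR ECR bullet is standard ECR policy JSON. A sketch of a rule that keeps only the last 6 images might look like:

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep only the last 6 preview images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 6
      },
      "action": { "type": "expire" }
    }
  ]
}
```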

3. ExternalDNS + Route 53 give it a real (private) URL

The Helm chart renders an Ingress with the per-PR host. ExternalDNS sees the new Ingress, talks to Route 53, and creates the DNS record. Cert-manager handles TLS via a wildcard cert on *.stage.example.com.

One important detail: all preview URLs are private and behind our VPN by default. They resolve to internal load balancers, not public ones. There’s no scenario where a half-baked feature on a PR is reachable from the open internet — only engineers on the VPN can hit it. (For more on how we run that VPN, see the OpenVPN-on-Kubernetes guide.)

The developer doesn’t have to know any of this exists. They just get an HTTPS URL that works the moment they connect to the VPN.
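Rendered for a hypothetical PR #1234, the chart’s Ingress might look roughly like this. The annotation shown for an internal load balancer is the AWS Load Balancer Controller’s; your ingress controller’s knob may differ, and the secret name is illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-service-1234
  namespace: preview
  annotations:
    # Keep the preview on an internal (VPN-only) load balancer
    alb.ingress.kubernetes.io/scheme: internal
spec:
  rules:
    - host: example-service-1234.stage.example.com   # ExternalDNS picks this up
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-service-1234
                port:
                  number: 80
  tls:
    - hosts:
        - example-service-1234.stage.example.com
      secretName: wildcard-stage-example-com   # wildcard cert from cert-manager
```

ExternalDNS reads `spec.rules[].host` to create the Route 53 record, so no DNS-specific annotations are needed on the happy path.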

4. Env vars from Vault — cloned from stage, then yours to mutate

This was the part that made the whole thing actually usable, and it deserves a careful explanation because it’s the most subtle bit.

The goal isn’t “give the preview some secrets.” The goal is:

Let me change env vars for my PR’s testing without touching stage’s env vars.

Stage is a shared environment. If two PRs both need to flip a feature flag or point to a different upstream URL, they’d step on each other — same problem as the old shared dev, just one floor up.

So the pattern is clone-on-create, owned-by-PR:

  1. The GitHub Action authenticates to Vault.
  2. It checks if <project>/preview/<service>-<pr-number> exists in Vault. If yes, leave it alone.
  3. If not, it reads <project>/stage/<service> (the stage env vars) and writes them into <project>/preview/<service>-<pr-number> via Vault APIs:

vault kv get -format json -field=data \
  ${PROJECT}/stage/${SERVICE} > secret.json

vault kv put \
  ${PROJECT}/preview/${SERVICE}-${PR_NUMBER} @secret.json
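The check in step 2 plus the copy in step 3 amount to one small piece of idempotent logic. A minimal sketch in Python, with a plain dict standing in for Vault’s KV store and all names hypothetical:

```python
def clone_on_create(store: dict, project: str, service: str, pr_number: int) -> dict:
    """Seed the per-PR env vars from stage, but never overwrite an
    existing per-PR copy, so a developer's manual edits survive re-runs."""
    preview_path = f"{project}/preview/{service}-{pr_number}"
    if preview_path in store:
        # Already cloned: leave the developer's edits alone
        return store[preview_path]
    stage_path = f"{project}/stage/{service}"
    # Copy, don't alias: mutating the preview must never touch stage
    store[preview_path] = dict(store[stage_path])
    return store[preview_path]
```

Because the function is a no-op once the preview path exists, the workflow can safely run on every push without clobbering flags a developer flipped in the Vault UI.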

That’s it. The preview env now has its own copy of env vars, seeded from stage. The developer can:

  • Flip a feature flag for just their preview.
  • Point to a sandbox payment provider for one specific test.
  • Add a temporary debug variable.

…all without touching stage and without coordinating with anyone else. Each PR comment includes a direct Vault UI link to the PR’s secret path, so devs can edit it from their browser and re-trigger the workflow.

The Helm chart pulls these into the pod via two refs — one for the per-PR copy (secretRef) and one for shared sensitive secrets that should never be cloned into a per-PR path (sensitiveSecretRef, e.g. third-party API keys, prod-shared infra credentials).

5. The PR comment is the source of truth

Every state change posts back to the PR:

Preview workflow started · commit a1b2c3d · view run

Preview is live! https://pr-1234.preview.primetrace.com
Namespace: preview-pr-1234 · Image: my-app:a1b2c3d

Or, when things go sideways:

Preview deploy failed — Helm sync error on Deployment/my-app. view logs

Putting the URL, status, image tag, and a link to logs directly in the PR removed the entire “where do I find my preview?” cognitive overhead.
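One low-effort way to keep a single, continuously updated comment rather than a new comment per event is a sticky-comment action. A sketch of such a step, assuming the marocchino/sticky-pull-request-comment action and an illustrative hostname:

```yaml
- name: Update preview status comment
  uses: marocchino/sticky-pull-request-comment@v2
  with:
    header: preview-status   # keys the comment so later runs edit it in place
    message: |
      Preview is live!
      https://example-service-${{ github.event.pull_request.number }}.stage.example.com
```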

6. Cleanup on merge (or close)

This is where the whole thing stays sustainable.

When the PR is merged or closed, a pr-closed GitHub Action fires and:

  • Deletes the per-PR Vault secret at <project>/preview/<service>-<pr-number>.
  • Deletes the per-PR ECR repo (--force, since we don’t need to keep the images around).
  • ArgoCD’s Pull Request Generator stops emitting that PR’s parameters on its next poll, and the ApplicationSet controller removes the corresponding Application — taking the Deployment, Service, Ingress, and per-PR resources with it.
  • ExternalDNS removes the Route 53 record.
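The first two bullets can be sketched as the pr-closed workflow. This is a trimmed illustration, with the Vault/AWS auth steps elided and repo naming assumed to match the ApplicationSet above:

```yaml
name: pr-closed
on:
  pull_request:
    types: [closed]   # fires on both merge and close

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      # ... authenticate to Vault and AWS first
      - name: Delete per-PR Vault secret
        run: |
          vault kv metadata delete \
            "${PROJECT}/preview/${SERVICE}-${{ github.event.pull_request.number }}"
      - name: Delete per-PR ECR repo
        run: |
          aws ecr delete-repository --force \
            --repository-name "${SERVICE}-preview-${{ github.event.pull_request.number }}"
```

The last two bullets need no workflow at all: the ApplicationSet’s next poll and ExternalDNS’s reconciliation handle them declaratively.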

But what about PRs that never get merged or closed — the ones that just go stale? We have a scheduled GitHub Action for that:

name: Close Stale PRs
on:
  schedule:
    - cron: '30 7 * * *' # daily at 1 PM IST
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          days-before-stale: 15
          days-before-close: 7
          stale-pr-label: 'stale'
          stale-pr-message: >
            This PR has been marked as stale due to inactivity.
            It will be closed in 7 days if no further activity occurs.

A PR with no activity for 15 days gets a stale label. 7 more days of silence and it auto-closes — which fires the same pr-closed cleanup workflow above. No orphaned namespaces, no zombie DNS records, no forgotten Vault secrets, no rotting ECR repos.

The cluster goes back to a clean state on its own.


What This Unlocked

The metrics we cared about:

  • Zero coordination overhead. Two engineers shipping unrelated features no longer interfere with each other.
  • QA can test multiple PRs in parallel. Each PR has a stable URL they can bookmark, share, and revisit.
  • Faster feedback loops. Push a commit → ArgoCD syncs → preview updates in under a minute, with a fresh PR comment.
  • Catch integration bugs earlier. Real DNS, real ingress, real secrets — issues that only show up in a “real” environment now show up at PR time, not at stage time.
  • We deleted dev entirely. This is the punchline. Once every developer could spin up their own real environment on demand, the shared dev cluster had no reason to exist. We turned it off. Less infra to maintain, less drift, less “what’s running on dev right now?” — and one fewer environment for releases to flow through.

Things to Watch Out For

A few lessons from running this in production:

  1. Cost. Every preview spins up real pods. Add a requeueAfterSeconds budget, set sane requests/limits in your preview values file, and consider a max-PR cap if you’re cost-sensitive.
  2. Stateful dependencies. If your app needs a DB, decide early: shared sandbox DB, per-PR ephemeral DB (a small MySQL spun up alongside the preview works for most of our services), or point at stage’s DB. There’s no free answer here — we mostly point previews at the stage MySQL since the env-var override pattern lets each PR target a different schema or read replica when needed.
  3. Vault scoping. Don’t be lazy and grant the preview workflow access to prod paths. Use a dedicated Vault role that can only read preview/*.
  4. Wildcard certs. A wildcard cert on *.preview.example.com saves you from per-PR cert provisioning hell.
  5. Cleanup-on-failure / abandonment. PRs sometimes get abandoned without ever being closed. The stale-PR cron + pr-closed workflow combo is what keeps the cluster honest — without that, you will end up with a graveyard of half-built preview namespaces six months in.
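The wildcard-cert point above can be sketched as a cert-manager Certificate. Note that wildcard certs require a DNS-01 solver; the issuer name and secret name here are hypothetical:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-preview
  namespace: preview
spec:
  secretName: wildcard-preview-example-com   # referenced by each Ingress's tls block
  issuerRef:
    name: letsencrypt-dns01   # hypothetical ClusterIssuer with a DNS-01 solver
    kind: ClusterIssuer
  dnsNames:
    - '*.preview.example.com'
```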

Closing Thoughts

The original problem (“don’t deploy to dev, I’m testing!”) was a process problem masquerading as a tooling problem. We could’ve solved it with a Slack bot that managed an environment lock, or a calendar of who-owns-dev-this-hour. That would’ve worked. It also would’ve felt like 2014.

The right fix was to make the constraint disappear: instead of fighting over one environment, give every PR its own. Once we did that, the shared dev environment had no reason to exist anymore — so we deleted it. The bottleneck became the answer.

The pieces — ArgoCD ApplicationSet, Pull Request Generator, Helm, GitHub Actions, ExternalDNS, Vault — already exist in most modern stacks. The trick is wiring them together with a single, dead-simple developer interface: one label.

Tag your PR preview. Get a private, VPN-only environment with its own URL and its own Vault-backed env vars. Merge the PR (or let it go stale). The environment goes away.

That’s the whole product.


If you’re running into the same shared-dev pain, give this pattern a try — and if you do something clever on top of it, I’d love to hear about it. Reach out on LinkedIn.

Happy shipping. 🚀
