Wire persistent DCRCredentialStore into EmbeddedAuthServer#5196
Wire persistent DCRCredentialStore into EmbeddedAuthServer#5196tgrunnagle wants to merge 6 commits intodcr-3b_issue_5184from
Conversation
b0bf320 to
1736a6e
Compare
Type EmbeddedAuthServer.dcrStore against storage.DCRCredentialStore and derive it from the same storage.Storage value returned by createStorage via a single type assertion, so a Redis-backed authserver reuses already-registered RFC 7591 clients across replicas and restarts instead of re-registering at every boot. Phase 2 left two parallel DCR stores: a runner-side in-memory map in dcr_store.go and the storage-level interface added in sub-issue 1. This collapses the runner-side implementation into a thin storageBackedStore adapter that delegates to storage.DCRCredentialStore, leaving exactly one persistence implementation per backend (storage.MemoryStorage and storage.RedisStorage). NewInMemoryDCRCredentialStore is preserved as a test helper that wraps storage.NewMemoryStorage so existing resolver tests compile unchanged; the standalone inMemoryDCRCredentialStore type and its map / RWMutex are deleted. buildPureOAuth2Config is unchanged — the wiring change swaps the implementation passed to the resolver, not the call shape. Add TestEmbeddedAuthServer_DCRSurvivesRestart in embeddedauthserver_test.go (next to TestNewEmbeddedAuthServer_DCRBoot) covering the durable-restart case: boot, close, rebuild against the same storage.MemoryStorage instance, assert the second resolve makes zero AS requests. The integration_test.go file under pkg/authserver would otherwise be the natural home, but it is in package authserver and importing runner from there would cycle (runner already imports authserver); the test docstring records this constraint.
Fixed issues from code review of #5185 wiring change: - HIGH: Storage backend leaked on NewEmbeddedAuthServer error paths. Split the constructor into a public NewEmbeddedAuthServer that calls createStorage and an unexported newEmbeddedAuthServerWithStorage that owns the cleanup contract via a deferred Close gated on a named return error. Verified by TestNewEmbeddedAuthServer_ClosesStorageOnError using a closeTrackingStorage wrapper. - MEDIUM: Comment claimed interface embedding that did not exist. Embed storage.DCRCredentialStore in the storage.Storage interface instead, promoting the runtime type assertion to a compile-time guarantee (the AC's explicitly preferred outcome). The dead error branch and its outdated comment are gone; mocks regenerated via task gen. - MEDIUM: Test placement deviated from AC instruction. Moved TestEmbeddedAuthServer_DCRSurvivesRestart out of the runner package and into a new pkg/authserver/integration_dcr_restart_test.go in package authserver_test, so the test lives next to the other pkg/authserver integration tests without inducing the runner -> authserver import cycle. Added a small public DCRStore() accessor on EmbeddedAuthServer mirroring existing IDPTokenStorage / UpstreamTokenRefresher accessors. - MEDIUM: Durable-restart not exercised end-to-end. Strengthened the restart test to go through NewEmbeddedAuthServer for the first boot (full constructor path with DCR), capture the storage via the new DCRStore() accessor, and assert the DCR row survives the first server's Close. The full "boot, close, boot again, observe zero /register" scenario remains a documented gap (the production Redis path requires Sentinel which miniredis does not speak); the gap and the conditions under which it can be closed are recorded in the test docstring per the review's accept-the-gap branch.
781a0b9 to
565fade
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## dcr-3b_issue_5184 #5196 +/- ##
=====================================================
+ Coverage 67.81% 67.87% +0.05%
=====================================================
Files 610 610
Lines 62379 62420 +41
=====================================================
+ Hits 42302 42366 +64
+ Misses 16902 16877 -25
- Partials 3175 3177 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: concurrency, architecture, test-coverage, security, general-quality. Codex unavailable (skipped).
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| F1 | Tests leak MemoryStorage cleanupLoop goroutine via newInMemoryDCRResolutionCache | 9/10 | MEDIUM | Fix |
| F3 | Test names TestInMemoryDCRResolutionCache_* reference deleted type |
7/10 | MEDIUM | Fix |
| F4 | newInMemoryDCRResolutionCache doc still markets it for production deployments | 7/10 | MEDIUM | Fix |
| F5 | Restart test name overstates coverage; depends on undocumented MemoryStorage post-Close behavior | 9/10 | MEDIUM | Fix |
| F6 | Storage interface widening (DCRCredentialStore embed) couples impls to DCR; broadens secret reach | 7/10 | MEDIUM | Discuss |
| F7 | resolutionToCredentials/credentialsToResolution drop fields with no round-trip test | 7/10 | MEDIUM | Fix |
| F8 | Deferred-cleanup slog.Warn logs unredacted retErr; can leak upstream response body |
7/10 | MEDIUM | Fix |
| F9 | newMockUpstreamAS duplicates pkg/authserver/runner.newMockAuthorizationServer | 7/10 | LOW | Fix |
| F10 | DCRStore() accessor: secret reach + delegation pattern + lifecycle docs | 8/10 | MEDIUM | Discuss |
| F11 | PR is DRAFT; size over budget; e2e gap not documented | n/a | INFO | Comment |
Overall
The wiring change is structurally correct: deferred-cleanup contract via named return retErr works, no double-close path between the deferred close and EmbeddedAuthServer.Close → server.Close → storage.Close, and closeOnce continues to make the caller path idempotent. The architectural gate from #4978 (buildPureOAuth2Config remains pure) is preserved. The closeTrackingStorage test seam is the right shape for verifying the contract — interface seam, not a hook field.
The dominant cluster of findings is comment drift and test-naming drift around the dcrResolutionCache / inMemoryDCRResolutionCache rename (F3, F4) plus a real test-infra goroutine leak that this PR introduces by routing test fixtures through storage.NewMemoryStorage (F1). These are fast to fix.
F6 and F10 are the design-discussion items. Embedding DCRCredentialStore into Storage was a deliberate choice to land a compile-time guarantee, but it widens the secret-bearing surface across every consumer of storage.Storage and every future backend. The new public DCRStore() accessor returns the raw secret-bearing interface and breaks the e.server.* delegation pattern used by sibling getters; if its only consumer is the integration test, a test-only export would be cleaner. Both are worth a deliberate decision rather than a side effect of compile-time-safety convenience.
F5 is the integration test's coverage story: the TestEmbeddedAuthServer_DCRSurvivesRestart name promises a restart, but the assertion is "reading from the same MemoryStorage instance after Close still works," which is an undocumented contract of MemoryStorage. The test docstring honestly acknowledges the cross-boot scenario is deferred — but the test name and the PR's user-visible "survives restart with Redis" promise are not yet in the suite.
Documentation
dcrResolutionCacheinterface doc andnewInMemoryDCRResolutionCachedoc need updates per F4.EmbeddedAuthServer.dcrStorefield comment +DCRStore()doc need a security note per F10.- The
Storageinterface doc should call out the DCR-credential reach if F6's design lands as embedded.
Verification notes
Three findings raised by individual agents were verified against the post-PR file state and dropped as false positives — they cited the deleted (-) lines in the diff as if they were still in the new file. Specifically: (a) one HIGH security finding claimed RedisStorage does not implement DCRCredentialStore — verified false at pkg/authserver/storage/redis.go:2040 on the base branch dcr-3b_issue_5184; (b) two findings about the dcrResolutionCache interface doc retaining "in-memory" / "never expired" claims — verified false; the post-PR file at lines 33-46 contains only the new "thin adapter" framing.
Process notes
- Self-review caveat: the authenticated user is the PR author, so the review event is necessarily
COMMENT(GitHub rejects REQUEST_CHANGES on own PRs). The findings would otherwise lean towardREQUEST_CHANGESdue to the cluster of MEDIUM consensus items. - Per
.claude/rules/pr-creation.md, this PR exceeds the 400-line / 10-file budget — the author already acknowledged this in the body. Splitting the integration test (217 LOC) into a follow-up would put the wiring change inside budget.
Generated with Claude Code
Addresses #5196 review comments: - MEDIUM types.go (3203565433) F6: Storage no longer embeds DCRCredentialStore. The embed promoted GetDCRCredentials / StoreDCRCredentials onto every Storage consumer (handlers, registration, session, etc.), broadening the surface that can read raw client_secret and registration_access_token. The compile-time guarantee is preserved by the per-backend var _ DCRCredentialStore = (*MemoryStorage)(nil) / (*RedisStorage)(nil) checks; the runner and authserver constructors now obtain the DCR-capable handle via an explicit type assertion at the boundary, fail-loud if a future backend omits DCR. - MEDIUM embeddedauthserver.go (3203565453) F10: DCRStore() delegates through e.server.DCRStore() instead of holding a redundant e.dcrStore field. The Server interface now exposes DCRStore() with a SECURITY + lifecycle doc noting the returned handle surfaces raw secrets and is bound to Server.Close. Eliminates the drift window between EmbeddedAuthServer and the underlying authserver if the server ever swaps backends.
Addresses #5196 review comments: - MEDIUM dcr_store.go (3203565416) F1: replace newInMemoryDCRResolutionCache (which leaked a MemoryStorage cleanupLoop goroutine on every call) with a test-only newMemoryDCRStore in dcr_testhelpers_test.go that takes *testing.T and registers t.Cleanup(stor.Close). Updates ~32 call sites across dcr_test.go, dcr_store_test.go, and embeddedauthserver_test.go. - MEDIUM dcr_store_test.go (3203565423) F3: rename test functions from TestInMemoryDCRResolutionCache_* to TestStorageBackedStore_* so the suite names match the type they actually exercise (the deleted inMemoryDCRResolutionCache type no longer exists). - MEDIUM dcr_store.go (3203565426) F4: dropped the production-marketing paragraphs from the helper's docstring (production code in NewEmbeddedAuthServer no longer reaches it). The new docstring on newMemoryDCRStore states the test-only purpose and the cleanup contract directly. - MEDIUM integration_dcr_restart_test.go (3203565429) F5: rename TestEmbeddedAuthServer_DCRSurvivesRestart -> TestEmbeddedAuthServer_DCRStorePersistsAcrossClose, and refactor to read credentials BEFORE Close. The test no longer silently depends on MemoryStorage's undocumented post-Close readability. Tightened the 44-line docstring to scope it to what is exercisable today and a pointer to the deferred follow-up. - LOW integration_dcr_restart_test.go (3203565449) F9: added "DO NOT COPY THIS A THIRD TIME" tripwire on both newMockUpstreamAS (authserver_test) and newMockAuthorizationServer (runner) directing the next caller to extract the helper to a shared pkg/authserver/internal/testhelpers package before duplicating it.
Addresses #5196 review comments: - MEDIUM dcr_store.go (3203565436) F7: added TestResolutionCredentialsRoundTrip pinning the field-by-field contract between resolutionToCredentials and credentialsToResolution (preserved, dropped, key-recovered, nil-shortcircuit). Added MUST-update-both-converters comments on the DCRResolution struct in dcr.go and the DCRCredentials struct in storage/types.go so a future contributor adding a field to either type sees the converter obligation at the struct definition rather than only at the converters. Documented the ProviderName asymmetry: the field is storage-only ("debug/audit only" per its own docstring) and is intentionally left unpopulated by the runner; the test asserts that invariant so any future threading is paired with the assertion update. - MEDIUM embeddedauthserver.go (3203565446) F8: route both closeErr and retErr in the deferred-cleanup slog.Warn through sanitizeErrorForLog so a wrapped DCR failure whose error chain inlines an upstream /register response body cannot leak userinfo, query, or fragment components into operator logs. Renamed the "original_error" key to "cause" to match the package-wide vocabulary. Added TestNewEmbeddedAuthServer_DeferredCleanupSanitizesLog which captures the Warn record by swapping slog.Default() and asserts both that literal secret markers are scrubbed and that host components survive for operator correlation.
|
Re: F11 (PR state / size / e2e gap from the multi-agent review body): noted, no commit. The size-over-budget acknowledgement and the deferred Sentinel-restart e2e are already called out in the PR body, and the PR remains in draft pending the size discussion. The Sentinel-restart gap is now explicitly recorded in the docstring on |
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Drops the package authserver_test workaround introduced to break the runner -> authserver import cycle that blocked extending pkg/authserver/integration_test.go. The test's actual subject is the runner-side DCR-store wiring (it goes through runner.NewEmbeddedAuthServer and asserts on runner.EmbeddedAuthServer.DCRStore()), so the runner package is the natural home and matches the location of its sibling TestNewEmbeddedAuthServer_DCRBoot / _ClosesStorageOnError tests. The runner-package newMockAuthorizationServer helper replaces the duplicate newMockUpstreamAS the cross-package placement forced; the "DO NOT COPY THIS A THIRD TIME" tripwire on it is dropped now that the duplication is gone.
PR size has been reduced below the XL threshold. Thank you for splitting this up!
|
✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up! |
DRAFT - not ready for review
Summary
DCRCredentialStorethat lived only in the runner package. Restarting (or scaling out to) an authserver dropped every RFC 7591 client registration on the floor and re-registered against the upstream on every boot, which is unworkable for the Datadog-style upstream demo and for any multi-replica deployment. This PR wires the persistentDCRCredentialStoreintroduced in earlier sub-issues (in-memory + Redis backends) intoEmbeddedAuthServerso a Redis-backed authserver reuses already-registered clients across replicas and restarts.EmbeddedAuthServer.dcrStoreis now typed againststorage.DCRCredentialStoreand is derived from the samestorage.Storagevalue returned bycreateStorage, so a singlestorage_type: redisconfig toggles DCR persistence alongside the rest of authserver state. Thestorage.Storageinterface embedsstorage.DCRCredentialStore, promoting the previously-needed runtime type assertion to a compile-time guarantee.DCRCredentialStoreinpkg/authserver/runner/dcr_store.gois collapsed into a thinstorageBackedStoreadapter that delegates tostorage.DCRCredentialStoreand translatesDCRResolution<->DCRCredentialsat the boundary. There is now exactly one persistence implementation per backend.NewEmbeddedAuthServer(creates storage) and an unexportednewEmbeddedAuthServerWithStorage(owns the cleanup contract). Any error after entry closes the storage backend via a deferred cleanup gated on a named return error, so a crash-looping caller no longer leaks the Redis client connection pool /MemoryStoragecleanup goroutine on every restart.buildPureOAuth2Configremains pure (unchanged signature, noctx, no I/O);buildUpstreamConfigsis the boundary that consumes the resolver and overlays DCR-resolved credentials onto each upstream.DCRStore()accessor onEmbeddedAuthServermirroringIDPTokenStorage/UpstreamTokenRefresher, used by integration tests to verify the resolver and the authserver write through the same backend.This PR also lands the dependency stack that #5185 builds on (the persistent
DCRCredentialStoretypes + memory backend, the Redis backend, the operator CRD surface for DCR, and the runner-side DCR resolver wiring). Each layer was developed and reviewed as a separate commit on this branch; commits are sequenced so each one builds and tests cleanly.Closes #5185
Type of change
Test plan
task test)task test-e2e)task lint-fix)Notable test coverage added by this PR:
pkg/authserver/integration_dcr_restart_test.go(new) —TestEmbeddedAuthServer_DCRSurvivesRestartboots anEmbeddedAuthServeragainst a mock AS, captures the DCR store via the newDCRStore()accessor, closes the server, and asserts the persisted DCR row survives the first server'sClose. Lives inpackage authserver_testto avoid the runner -> authserver import cycle. The full "boot, close, boot again, observe zero/register" scenario across a fresh constructor is documented as a gap (the production Redis path requires Sentinel, which miniredis does not speak); test docstring records the conditions under which it can be closed.pkg/authserver/runner/embeddedauthserver_test.go—TestBuildUpstreamConfigs_DCRexercises first-call registration + cache-hit on the second call (zero additional HTTP requests) and asserts the caller'sRunConfig.Upstreamsslice is never mutated.TestNewEmbeddedAuthServer_ClosesStorageOnErroruses acloseTrackingStoragewrapper to verify the deferred-cleanup contract.pkg/authserver/storage/memory_test.go,redis_test.go,redis_integration_test.go— coverage for the persistentDCRCredentialStoreoperations on both backends, includingScopesHashcanonicalisation (sort + dedupe + newline join).API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.The CRD changes are additive:
OAuth2UpstreamConfig.clientIdbecomes optional with a CEL constraint requiring exactly one ofclientIdordcrConfig, and a newdcrConfigfield is added. ExistingMCPExternalAuthConfig/VirtualMCPServerresources that setclientIdcontinue to validate unchanged.Does this introduce a user-facing change?
Yes. Operators of OAuth2 upstreams can now configure RFC 7591 Dynamic Client Registration in the operator CRD via
dcrConfig(withdiscoveryUrlorregistrationEndpoint, plus optionalinitialAccessTokenRef,softwareId,softwareStatement) instead of statically configuringclientId+clientSecret. When the authserver is configured withstorage_type: redis, DCR registrations persist across restarts and are shared across replicas; in single-replica memory mode, registrations live for the process lifetime as before.Special notes for reviewers
DCRCredentialStoretypes + memory + Redis backends), the operator CRD surface, and the Phase 2 resolver wiring. The size is above the usual 400-line / 10-file limit; each commit is self-contained and the stack reads top-to-bottom in commit order. Reviewers may prefer to walk the per-commit diffs./register" cross-constructor restart scenario is not exercised; closing it requires either miniredis-Sentinel emulation or a Docker-based Redis Sentinel cluster in the test harness. The wiring that the second boot would consume — the type ofdcrStorebeing the samestorage.DCRCredentialStorethatauthserver.Newwrites through — is verified at compile time bystorage.Storageembeddingstorage.DCRCredentialStoreand byTestEmbeddedAuthServer_DCRSurvivesRestartasserting the persistence boundary.buildPureOAuth2Configwas kept intentionally pure (noctx, no I/O) to preserve the architectural gate established in Authserver DCR integration (Phase 2, Steps 2a-2g) #4978; the wiring change swaps the implementation passed into the resolver, not the call shape.client_secret,registration_access_token,initial_access_token, refresh tokens) appear as arguments toslog.*calls; the grep assertion from Authserver DCR integration (Phase 2, Steps 2a-2g) #4978 still applies.