feat(lifecycle): skip index creation on large collections and mass-expiration rules#2748
feat(lifecycle): skip index creation on large collections and mass-expiration rules#2748delthas wants to merge 1 commit into
Conversation
Hello delthas,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Request integration branchesWaiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command: Alternatively, the |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
... and 1 file with indirect coverage changes
@@ Coverage Diff @@
## development/9.4 #2748 +/- ##
===================================================
+ Coverage 74.65% 74.69% +0.03%
===================================================
Files 199 200 +1
Lines 13654 13682 +28
===================================================
+ Hits 10194 10220 +26
- Misses 3450 3452 +2
Partials 10 10
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
…piration rules Add two pre-flight heuristics to the conductor's index-creation path so lifecycle stays on v1 (full scan) instead of paying the v2 index build cost: - Collection too big: skip when count > 1M or storageSize > 1 GiB, where the build's transient disk usage is unpredictable on existing large collections. - Mass-expiration rules: skip when every enabled rule has no filter and an Expiration action that is already-past / within ~1 day (date) or days <= 1; the index would add little selectivity and be churned before it pays off. Buckets with no enabled selective rule (all disabled / none) also skip. Both complement the existing free-disk (BB-753) and SosAPI (BB-778) checks; buckets that already have the v2 indexes keep using v2. The per-rule verdict lives in util/mongoRules.js (ruleAllowsIndexSkip / allRulesAreMassExpiration), precomputed once per bucket and plumbed via task.allRulesAreMassExpiration. Issue: BB-779
7523139 to
8e0341f
Compare
|
LGTM — both new guards are well-placed and well-tested. The mass-expiration check correctly short-circuits before the MongoDB round trips, the collection-size check reuses the existing collStats fetch, and the vacuous-truth behavior for empty/null rules is safe because the isLifecycled guard fires first. Test coverage is thorough (19 edge-case tests for mongoRules, 6 new conductor tests including the short-circuit verification). No correctness, async-handling, or metric issues found. |
francoisferrand
left a comment
There was a problem hiding this comment.
not sure if this should land in 9.4 or 9.5, let's confirm this with product team
| return false; | ||
| } | ||
| if (action.date && new Date(action.date).getTime() <= dateCutoff) { | ||
| return true; |
There was a problem hiding this comment.
that is not correct:
- setting a date "10 years ago" will not remove a lot
- esp. once the "mass deletion" has been done, the rule deprecates...but will still prevent creating an index
→ this condition would be something like new Date(action.date).getTime() >= currentDate.getTime() - 24 * 60 * 60 * 1000
| if (action.date && new Date(action.date).getTime() <= dateCutoff) { | ||
| return true; | ||
| } | ||
| if (action.days !== undefined && action.days <= 1) { |
There was a problem hiding this comment.
some is not appropriate for this case: e.g. we often want to expire non-current versions "quickly", but not current versions... If we want to define an heuristic on days, should be when all actions match.
| } | ||
| // Allow a day of slack: a rule expiring within ~1 day still clears the | ||
| // whole collection, same as Days <= 1. | ||
| const dateCutoff = currentDate.getTime() + 24 * 60 * 60 * 1000; |
There was a problem hiding this comment.
this 24h should be another constant
| const MAX_AUTO_INDEX_DOC_COUNT = 1000000; | ||
| const MAX_AUTO_INDEX_STORAGE_BYTES = 1024 * 1024 * 1024; |
There was a problem hiding this comment.
I wonder if we should make these "customizable" (i.e. try read from env) ?
i.e. could we have case where we want/need to override the heuristic threshold, at some customer?
(it seems we would only skip v1, max-expire objects, then create index : so not really needed, but I wonder if there could be corner cases...)
| // Operates on the MongoDB internal lifecycle format (lowercase ruleStatus, | ||
| // actions[], filter.rulePrefix) — NOT the AWS S3 format used by util/rules.js. | ||
|
|
||
| function ruleAllowsIndexSkip(rule, currentDate = new Date()) { |
There was a problem hiding this comment.
function name is not clear...
not sure I like allRulesAreMassExpiration, but at least it is clear → ruleIsMassExpiration ?
| } | ||
|
|
||
| function allRulesAreMassExpiration(rules, currentDate = new Date()) { | ||
| return (rules || []).every(r => ruleAllowsIndexSkip(r, currentDate)); |
There was a problem hiding this comment.
should it not be "any" ?
as long as we have 1 mass-expiration, everything will expire and index is not used... (for now, i.e. until the max expiration has been done)
| const BUCKET_CHECKPOINT_PUSH_NUMBER = 50; | ||
| const BUCKET_CHECKPOINT_PUSH_NUMBER_BUCKETD = 50; | ||
| const ACCOUNT_SPLITTER = ':'; | ||
| const MAX_AUTO_INDEX_DOC_COUNT = 1000000; |
There was a problem hiding this comment.
is 1'000'000 that many documents?
That would be about 1GB of total metadata usage, is it really a high-enough threshold?
(same for index: is 1GB of index really the turning point)
| if (action.date && new Date(action.date).getTime() <= dateCutoff) { | ||
| return true; | ||
| } | ||
| if (action.days !== undefined && action.days <= 1) { |
There was a problem hiding this comment.
is days a good criteria for mass-expiration?
- Some use-case may rely on (lots of) short-lived data, and thus use these even though they could benefit form index (lots of data created in the day)
- What we really need is one "pattern" we can document to expire the whole bucket: this is the one that really needs to be detected. Others are probably corner cases, and handled by index failure handling/...
Summary
Add two pre-flight heuristics to the conductor's index-creation path so we keep lifecycle on v1 (full scan) in these cases instead of paying the build cost:
Collection too big — if
count > 1MORstorageSize > 1 GiB, skip auto-creating v2 lifecycle indexes. The build's transient disk usage on existing large collections is unpredictable; spill files plus the final index can easily exceed 5–10× the_id_index size during the merge phase. Risky on tight deployments.Mass-expiration rules — if every enabled rule has no filter (no
rulePrefix, notags) and anExpirationaction that is either already-past or within ~1 day in the future (date) or smallest-legal (days <= 1), skip auto-creating the indexes. They'd provide little selectivity benefit and be churned out before paying for themselves. The "all rules disabled / no rules at all" case also skips (vacuous truth — no enabled selective rule to benefit from indexing).Both guards complement (do not replace) the BB-753 free-disk check and the BB-778 SosAPI check. Buckets that already have the v2 indexes continue to use v2 regardless of these guards.
Implementation notes
listMongodbBucketsand plumbed viatask.allRulesAreMassExpiration.collStatsfetch BB-753 already performs — no extra round trip.ruleStatus,actions[], etc.), distinct from the AWS S3 format thatutil/rules.jsandRulesReducer.jsconsume. The precedent for parsing the MongoDB internal shape isLifecycleOplogPopulatorUtils.isBucketExtensionEnabled.LifecycleMetrics.onLegacyTask:'collectionTooBig'and'massExpirationRule'.Design decisions (open to bikeshedding)
util/mongoRules.jsfile rather than inlining inLifecycleConductor.jsor extendingLifecycleOplogPopulatorUtils.js. Rationale: the helpers are pure rule-shape predicates, not conductor- or populator-specific. Direct unit testability (19 edge-case tests inmongoRules.spec.js) is worth the extra file. Codebase precedent:util/rules.js+util/RulesReducer.jsare both single-consumer files with their own specs.task.isLifecycledandtask.hasSosApi— both flat primitives). The alternative —task.lifecycleRules: [...]with the helper called inside_indexesGetOrCreate— would offer flexibility for future rule-shape checks but breaks the existing flat-primitive convention. If/when we accumulate 3+ rule-based heuristics, that's the natural refactor moment.Full rationale in BB-779.
Issue: BB-779