fix: filter wildcard DNS false positives during resolution #37
Open
0xParth wants to merge 1 commit into
Domains with wildcard DNS records (*.example.com) cause every predicted subdomain to resolve successfully, producing massive false positive counts. On a tested domain with 21 known subdomains, this inflated results from ~500 predictions to 2,199 "resolved" subdomains, all matching the same wildcard catch-all endpoint.

This adds wildcard detection by probing random subdomains before resolving predictions. When a wildcard is detected, only predictions that resolve to IPs outside the wildcard set are returned.

- Add detect_wildcard() that probes 3 random subdomains per apex
- Modify get_registered_domains() to accept apex_domain for wildcard filtering
- Pass apex_domain from main.py recursive resolution loop
- Add tests for wildcard detection on non-wildcard domains
- Backward compatible: apex_domain is optional, existing behavior preserved

Co-Authored-By: WHO ELSE BUT!!!!
AI-Session-Id: 8ef18548-aeed-4766-afa2-7d0cfcfcc6a2
AI-Tool: claude-code
AI-Model: unknown
Problem
Domains with wildcard DNS records (*.example.com) cause every predicted subdomain to resolve successfully, producing massive false positive counts that compound through the recursive inference loop.

Tested on a real domain with 21 known subdomains:

The recursive loop in _get_domains_for_group() amplifies the problem: wildcard-resolved predictions are fed back as seeds for the next inference round, generating even more predictions that all resolve again.

Fix
Adds wildcard detection to resolve.py before resolving predictions:

- detect_wildcard(apex_domain): probes 3 random subdomains (e.g. a8k2m9x4p1q7w3.example.com). If all resolve and share common IPs, a wildcard is present.
- get_registered_domains(): new optional apex_domain parameter. When provided and a wildcard is detected, only returns predictions that resolve to at least one IP outside the wildcard set.

Why IP-based filtering (not just "resolves = exists")
On wildcard domains, DNS resolution alone is meaningless — everything resolves. But real subdomains with explicit A records often point to different IPs than the wildcard. This approach catches those while filtering out the noise.
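The detection and filtering steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: resolve_ips, random_label, detect_wildcard_sketch, and filter_wildcard_hits are hypothetical names, the resolver is passed in so the logic can be exercised offline, and treating the union of the probe IPs as the "wildcard set" is a guess at one reasonable definition.

```python
import secrets
import socket
import string
from typing import Callable, Iterable, List, Optional, Set

# A resolver maps a hostname to the set of IPs it resolves to (empty if none).
Resolver = Callable[[str], Set[str]]


def resolve_ips(hostname: str) -> Set[str]:
    """Hypothetical helper: resolve a hostname, returning an empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def random_label(length: int = 14) -> str:
    """Random lowercase alphanumeric label that is very unlikely to exist."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def detect_wildcard_sketch(
    apex_domain: str, resolver: Resolver = resolve_ips, probes: int = 3
) -> Optional[Set[str]]:
    """Return the wildcard IP set, or None if no wildcard is detected."""
    ip_sets = [resolver(f"{random_label()}.{apex_domain}") for _ in range(probes)]
    # Every probe must resolve, and all probes must share at least one IP.
    if not ip_sets or not all(ip_sets):
        return None
    if not set.intersection(*ip_sets):
        return None
    # Assumption: the wildcard set is the union of everything the probes returned.
    return set.union(*ip_sets)


def filter_wildcard_hits(
    candidates: Iterable[str], apex_domain: str, resolver: Resolver = resolve_ips
) -> List[str]:
    """Keep candidates that resolve and, under a wildcard, have an IP outside its set."""
    wildcard_ips = detect_wildcard_sketch(apex_domain, resolver)
    kept = []
    for host in candidates:
        ips = resolver(host)
        if not ips:
            continue  # does not resolve at all
        if wildcard_ips is None or ips - wildcard_ips:
            kept.append(host)
    return kept
```

One consequence of this kind of filter is that a real subdomain whose explicit record points at the same IPs as the wildcard would be dropped along with the noise, which matches the "often" in the paragraph above.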
Changes
- subwiz/resolve.py: add detect_wildcard() and the _resolve_ips() helper; update get_registered_domains() with an optional apex_domain param
- subwiz/main.py: pass apex_domain=apex to get_registered_domains() in the recursive loop
- tests/test_resolve.py: add tests for wildcard detection on non-wildcard domains

Backward Compatibility
- apex_domain defaults to None, so existing callers are unaffected
- When apex_domain is not provided, the original fast path (just is_registered()) is used

Test Plan
- test_registered_domains: existing test passes (no regression)
- test_wildcard_detection_non_wildcard: confirms detect_wildcard() returns None for hadrian.io
- test_registered_domains_with_apex: confirms non-wildcard domains still resolve correctly when apex_domain is provided
- Wildcard domain (*.narad.io → CloudFront): 2,199 false positives reduced to 0
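The wildcard-positive case in the plan above depends on a live domain. One way it might also be covered offline is to patch socket.getaddrinfo with a stub that simulates a wildcard zone. The sketch below is an assumption about how such a test could look and does not call the PR's functions: lookup, fake_getaddrinfo, and the example.com records are all hypothetical, and the addresses come from documentation ranges.

```python
import socket
from unittest import mock

WILDCARD_IP = "198.51.100.7"  # documentation-range address, stands in for the catch-all
EXPLICIT = {"api.example.com": "203.0.113.10"}  # one record with its own IP


def fake_getaddrinfo(host, port, *args, **kwargs):
    """Stand-in for socket.getaddrinfo that simulates a *.example.com wildcard zone."""
    if host in EXPLICIT:
        ip = EXPLICIT[host]
    elif host.endswith(".example.com"):
        ip = WILDCARD_IP
    else:
        raise socket.gaierror("name not known")
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (ip, 0))]


def lookup(host):
    """Hypothetical helper: the set of IPs for host, empty if it does not resolve."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        return set()


def test_wildcard_zone_offline():
    with mock.patch("socket.getaddrinfo", side_effect=fake_getaddrinfo):
        # A random-looking probe resolves, so the zone behaves like a wildcard.
        probe = lookup("a8k2m9x4p1q7w3.example.com")
        assert probe == {WILDCARD_IP}
        # A predicted name that only hits the catch-all has no IP outside the set.
        assert lookup("zzz.example.com") - probe == set()
        # A name with an explicit record has an IP outside the wildcard set.
        assert lookup("api.example.com") - probe == {"203.0.113.10"}
        # Names outside the zone do not resolve at all.
        assert lookup("host.invalid") == set()
```

Because the stub replaces the resolver call itself, a test of this shape would run without network access and would not depend on how a third-party domain is configured.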