Skip to content

fix(tmq): modify the condition of checking vnode split finished#35273

Open
wangmm0220 wants to merge 1 commit intomainfrom
fix/6981832031
Open

fix(tmq): modify the condition of checking vnode split finished#35273
wangmm0220 wants to merge 1 commit intomainfrom
fix/6981832031

Conversation

@wangmm0220
Copy link
Copy Markdown
Contributor

Description

Issue(s)

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings April 30, 2026 06:55
@wangmm0220 wangmm0220 requested a review from a team as a code owner April 30, 2026 06:55
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates TMQ test utilities to make vnode-split completion checks more strict/stable, addressing flaky split-finish detection in the TMQ vnode-split test suite.

Changes:

  • Adds a consecutive-success counter when polling information_schema.ins_vnodes to confirm split completion.
  • Increases the polling sleep interval and logs query results during polling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 86 to +90
while True:
tdSql.query("select * from information_schema.ins_vnodes where db_name='dbt' and status='leader' and restored=true")
tdLog.info(tdSql.queryResult)
if tdSql.getRows() == 2:
cnt += 1
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkSplitVgroups() can loop forever if the expected vnode state is never reached (e.g., split fails or the environment differs), which can hang the entire test run. Please add a bounded timeout/retry limit and fail fast (e.g., tdLog.exit(...) with the last observed rows/result) similar to other cluster wait helpers that cap retries (e.g., ClusterComCheck.check_vgroups_status() uses count < count_number and exits on failure).

Copilot uses AI. Check for mistakes.
Comment on lines 87 to 89
tdSql.query("select * from information_schema.ins_vnodes where db_name='dbt' and status='leader' and restored=true")
tdLog.info(tdSql.queryResult)
if tdSql.getRows() == 2:
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop logs tdSql.queryResult at INFO on every poll; in this repo, raw queryResult dumps are typically logged at DEBUG to avoid noisy CI logs (see test/new_test_framework/utils/clusterCommonCheck.py where tdLog.debug(tdSql.queryResult) is used). Consider switching this to tdLog.debug(...) and, since you only use the row count, querying select count(*) instead of select * to reduce output and query overhead.

Suggested change
tdSql.query("select * from information_schema.ins_vnodes where db_name='dbt' and status='leader' and restored=true")
tdLog.info(tdSql.queryResult)
if tdSql.getRows() == 2:
tdSql.query("select count(*) from information_schema.ins_vnodes where db_name='dbt' and status='leader' and restored=true")
tdLog.debug(tdSql.queryResult)
if tdSql.queryResult[0][0] == 2:

Copilot uses AI. Check for mistakes.
Comment on lines 89 to +93
if tdSql.getRows() == 2:
cnt += 1
else:
cnt = 0
if cnt >= 10:
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring tdSql.getRows() == 2 to be observed 10 consecutive times plus sleep(2) makes this wait at least ~20s after the condition first becomes true, and it will noticeably slow multiple TMQ vnode-split tests that call this helper. If the intent is to avoid transient states, consider waiting for a stability duration (e.g., condition true for N seconds) with a small poll interval, or using a smaller consecutive-count threshold, so successful splits don't always pay the extra 20s.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants