Skip to content

Tamil ITN Cardinal Grammar - Dummy PR#430

Closed
retheckj-star wants to merge 2 commits into
NVIDIA:mainfrom
retheckj-star:ta-itn-dummy-pr
Closed

Tamil ITN Cardinal Grammar - Dummy PR#430
retheckj-star wants to merge 2 commits into
NVIDIA:mainfrom
retheckj-star:ta-itn-dummy-pr

Conversation

@retheckj-star
Copy link
Copy Markdown

Language: Tamil
Task: ITN

Dummy PR created as requested.

Copy link
Copy Markdown

@mayuris-00 mayuris-00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core exercise is complete and correct: the folder structure, data files, and all three TODOs are implemented properly, and the 28 core test cases should pass. The following items should be addressed before review sign-off:

Target branch: this PR is opened against NVIDIA/NeMo-text-processing:main. Section 11 requires it to target the designated training/review branch, not main.
DCO sign-off: neither commit contains a Signed-off-by: line. The -s flag is required or the DCO check will fail. Please amend or rebase with sign-off and force-push.
Commit message: please use the specified format, feat(ta): add cardinal ITN tagger, verbalizer and test cases.
Remove the stale # TODO instruction comments in the file-level comments.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comment needed. Correctly empty package marker .

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matches the spec table (1–9) exactly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 → சுழியம் matches the spec, so this is correct for the exercise.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 18 rows (10–20 + round tens) match the spec exactly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall correct; both TODOs are implemented properly. Inline notes:
On the three string_file(...).invert() lines:
-TODO 1 is implemented correctly. .invert() is applied to all three sources, which is required because the TSV files map number to word while ITN needs word to number.

On the # TODO 1: add .invert()... comment:
-This instruction comment is now stale since the line is complete. Please remove it.

On graph = graph_digit | graph_zero | graph_teens_and_ties:
-TODO 2 is correct and appropriate for the core scope. Numbers in the 21–99 range and hundreds would require place-value composition, which is the Section 9 stretch goal and is not expected here.

On the # TODO 2: Combine them... comment:
-Stale instruction comment; please remove.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall correct. Inline notes:
On + pynini.closure(NEMO_NOT_QUOTE, 1):
-TODO 3 is correct. Matching one or more non-quote characters correctly captures the digit value between the quotes.

On the # TODO 3: keep the digits... comment:
-Stale instruction comment; please remove now that the line is complete.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from the Hindi folder as instructed, which is correct for the exercise. Minor observation: there is a stray from pynini.lib import pynutil in the middle of the file that is not used, it is not harmful but can be removed for clean code.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Hindi helper copied over, which is exactly what Section 5 instructs, so no change is required for the exercise.

Comment thread run_cardinal_tests.py
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matches the specification's checker script, and the root location is what Section 8 requires. No change needed.

Comment thread test_cases_cardinal.txt
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 28 cases match Section 8 exactly and use the correct input~expected format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants