Tamil ITN Cardinal Grammar - Dummy PR #430
mayuris-00 left a comment:
The core exercise is complete and correct: the folder structure, data files, and all three TODOs are implemented properly, and the 28 core test cases should pass. The following items should be addressed before review sign-off:
Target branch: this PR is opened against NVIDIA/NeMo-text-processing:main. Section 11 requires it to target the designated training/review branch, not main.
DCO sign-off: neither commit contains a Signed-off-by: trailer, so the DCO check will fail. Commits must be made with the -s flag; please amend or rebase with sign-off and force-push.
Commit message: please use the specified format, feat(ta): add cardinal ITN tagger, verbalizer and test cases.
Remove the stale # TODO instruction comments left in the source files; the inline notes point to each one.
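The two commit-message items can be checked locally before force-pushing. Below is a minimal Python sketch, assuming the subject format quoted above and the standard "Signed-off-by: Name <email>" trailer that git commit -s appends. check_commit_message is a hypothetical helper, not part of the DCO bot or the repository.

```python
import re
from typing import List

# Subject format requested in the review: "feat(ta): <summary>".
SUBJECT_RE = re.compile(r"^feat\(ta\): \S.*$")
# Trailer that `git commit -s` appends: "Signed-off-by: Name <email>".
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>\s@]+@[^<>\s]+>$", re.MULTILINE)


def check_commit_message(message: str) -> List[str]:
    """Hypothetical pre-push check; returns a list of problems (empty if OK)."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not SUBJECT_RE.match(subject):
        problems.append("subject does not follow 'feat(ta): <summary>'")
    if not SIGNOFF_RE.search(message):
        problems.append("missing Signed-off-by trailer (commit with -s)")
    return problems
```

For example, a message whose first line is "feat(ta): add cardinal ITN tagger, verbalizer and test cases" and which ends with a Signed-off-by trailer yields an empty list. The real DCO check may apply stricter rules (for example, matching the trailer against the commit author), so treat this only as a quick local sanity check.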
No comment needed; this is a correctly empty package marker.
Matches the spec table (1–9) exactly.
0 → சுழியம் matches the spec, so this is correct for the exercise.
All 18 rows (10–20 plus the round tens 30–90) match the spec exactly.
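The row count works out as 11 entries for 10–20 plus 7 round tens (30, 40, ..., 90). A minimal coverage check along these lines, assuming each TSV row is "<number><TAB><word>" as the inline notes describe; numbers_in_tsv and missing_rows are hypothetical helpers, not part of the repository:

```python
# Expected number column for the teens/tens file: 10-20 plus 30, 40, ..., 90.
EXPECTED_TEENS_AND_TIES = set(range(10, 21)) | set(range(30, 100, 10))


def numbers_in_tsv(text: str) -> set:
    """Collect the number column of a "<number>\t<word>" TSV (blank lines skipped)."""
    numbers = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        number, word = line.split("\t")  # raises ValueError on a malformed row
        if not word.strip():
            raise ValueError(f"empty word for {number!r}")
        numbers.add(int(number))
    return numbers


def missing_rows(text: str) -> list:
    """Numbers the spec expects but the TSV text does not contain, in order."""
    return sorted(EXPECTED_TEENS_AND_TIES - numbers_in_tsv(text))
```

Run against the full data file, missing_rows should return an empty list.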
Overall correct; both TODOs are implemented properly. Inline notes:
On the three string_file(...).invert() lines:
- TODO 1 is implemented correctly: .invert() is applied to all three sources, which is required because the TSV files map number to word while ITN needs word to number.
On the # TODO 1: add .invert()... comment:
- This instruction comment is now stale since the line is complete. Please remove it.
On graph = graph_digit | graph_zero | graph_teens_and_ties:
- TODO 2 is correct and appropriate for the core scope. Numbers in the 21–99 range (other than the round tens) and hundreds would require place-value composition, which is the Section 9 stretch goal and is not expected here.
On the # TODO 2: Combine them... comment:
- Stale instruction comment; please remove.
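To illustrate what TODO 1 and TODO 2 do together, here is a plain-Python analogy. It is not the pynini API: dicts stand in for the FSTs, flipping each (number, word) row plays the role of .invert(), and merging the three mappings plays the role of the | union. The sample rows are illustrative only; the data/*.tsv files remain the source of truth.

```python
# Illustrative rows in the TSV orientation (number -> word); not the full data.
DIGIT_ROWS = [("1", "ஒன்று"), ("2", "இரண்டு"), ("9", "ஒன்பது")]
ZERO_ROWS = [("0", "சுழியம்")]
TEENS_AND_TIES_ROWS = [("10", "பத்து"), ("20", "இருபது")]


def invert(rows):
    """Analogy for .invert(): flip number -> word into word -> number."""
    return {word: number for number, word in rows}


# Analogy for graph_digit | graph_zero | graph_teens_and_ties: a word is
# accepted if any one of the inverted mappings accepts it.
GRAPH = {}
for rows in (DIGIT_ROWS, ZERO_ROWS, TEENS_AND_TIES_ROWS):
    GRAPH.update(invert(rows))


def tag(word):
    """Return the digit string for a covered word, or None if out of scope."""
    return GRAPH.get(word)
```

Because the three sources cover disjoint words, merging dicts behaves like the union here. A compound such as இருபத்தொன்று (21) returns None, which matches the review's point that place-value composition is out of the core scope.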
Overall correct. Inline notes:
On + pynini.closure(NEMO_NOT_QUOTE, 1):
- TODO 3 is correct. Matching one or more non-quote characters captures the digit value between the quotes.
On the # TODO 3: keep the digits... comment:
- Stale instruction comment; please remove it now that the line is complete.
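The verbalizer rule can be read as a regular expression. Below is a regex analogy, assuming the tagger emits a token of the form cardinal { integer: "..." }; verbalize is a hypothetical helper, not the verbalizer's real code. [^"]+ corresponds to pynini.closure(NEMO_NOT_QUOTE, 1): the + requires at least one non-quote character, just as the closure's lower bound of 1 does.

```python
import re

# integer: "<one or more non-quote characters>"
INTEGER_FIELD = re.compile(r'integer: "([^"]+)"')


def verbalize(tagged: str):
    """Return the captured digits, or None if the field is missing or empty."""
    match = INTEGER_FIELD.search(tagged)
    return match.group(1) if match else None
```

An empty value such as integer: "" does not match, mirroring how the lower bound of 1 on the closure rejects an empty string.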
Copied from the Hindi folder as instructed, which is correct for the exercise. Minor observation: there is a stray, unused from pynini.lib import pynutil in the middle of the file. It is harmless but can be removed to keep the file clean.
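A linter such as flake8 reports this kind of issue (F401 for an unused import, F811 for a redefinition of an unused name). The sketch below shows the underlying idea using only Python's ast module. redundant_from_imports is a hypothetical helper that flags from-imports that are either repeated or never referenced by name. It parses the file without importing it, so pynini does not need to be installed.

```python
import ast


def redundant_from_imports(source: str) -> list:
    """(name, line) pairs for from-imports that are duplicated or never used."""
    tree = ast.parse(source)
    seen = {}        # first line where each imported name was bound
    redundant = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked this way
                name = alias.asname or alias.name
                if name in seen:
                    redundant.append((name, node.lineno))  # repeated import
                seen.setdefault(name, node.lineno)
    # Attribute access like pynutil.insert(...) still contains Name(id="pynutil").
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    redundant += [(name, line) for name, line in seen.items() if name not in used]
    return sorted(redundant)
```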
This is the Hindi helper copied over, which is exactly what Section 5 instructs, so no change is required for the exercise.
Matches the specification's checker script, and the root location is what Section 8 requires. No change needed.
All 28 cases match Section 8 exactly and use the correct input~expected format.
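For reference, the input~expected format can be validated with a few lines of Python. parse_test_cases is a hypothetical helper, not NeMo's own test utility. It assumes one case per line with exactly one ~ separating the spoken Tamil input from the expected written form, and it skips blank lines.

```python
def parse_test_cases(text: str) -> list:
    """Split "input~expected" lines into (spoken, written) pairs."""
    cases = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        if line.count("~") != 1:
            raise ValueError(f"line {lineno}: expected exactly one '~'")
        spoken, written = line.split("~")
        cases.append((spoken, written))
    return cases
```

Applied to the full test file, the result should contain 28 pairs.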
Language: Tamil
Task: ITN
Dummy PR created as requested.