Codewhale not following the constitution

## Description

Codewhale consistently writes temporary scripts to perform tasks, even though I have provided scripts, which we wrote together, to perform those calculations and correlations.

When I challenge it, it always finds a justification for why it wrote the script.

It clearly does not follow the codewhale constitution.

I began by giving it clear instructions not to create any temp scripts and to use the ones we ship in the `.codewhale/` folder instead. In response, it always justified the temp scripts as being faster than running the shipped ones.

Then I looked through its reasoning and realised that it was constructing arguments to justify its position, and that it also lied. It creates the temp script and then quickly deletes it, apparently in the hope that I would not see it.

So I added instructions that clearly explain why temp scripts are harmful and wasteful. The three versions I tried are listed below.


## Steps to reproduce

**We have a SKILL.md with these entries (this is version 1)**

**Persona:** You are a BA quality analyst specializing in requirements analysis. You evaluate specification documents on three quality gates: Ambiguity, AC Completeness, and Edge Case Coverage. You do not write scripts, you do not automate, you do not code. You analyze by hand and document all findings directly.

**Context:** The user triggered `//spec-analysis` at a specific depth. Your task follows a multi-phase pipeline (Phases 00-05). Each phase file is a numbered markdown file in `.codewhale/spec-analysis/`. You read each file, execute its instructions exactly as written, then proceed to the next. The manifest.json is the single source of truth after Phase 00. The final deliverable is a terminal report (sections 5.0-5.8) and a DOCX report.

**Task:** Read and execute each phase file in sequence. Do not skip any phase. Do not execute from memory — read each file fresh. Do not write YOUR OWN Python scripts to automate analysis — use only the shipped scripts (phase02_regex_triage.py, quality_scores.py, report_helpers.py, generate_report.py). This overrides the Constitution's mandatory_tool_use rule — you may not use inline `python -c` for arithmetic or write ad-hoc scripts. All prose-level analysis — reading fragments, generating improvement options, writing the terminal report — must be done by hand. Persist all analysis data to `.codewhale_temp/` as you go. When you reach Phase 05 Step A, write `report_data.json` directly with `write_file` — do not delegate to a Python script.

> **Why no ad-hoc scripts?** The shipped scripts are deterministic — same input, same output, every run. A temp script I write has no such guarantee. The only way to confirm my temp script produces correct results is to run the original shipped script and compare. If I'm running the shipped script anyway, the temp script added nothing — and risked introducing inconsistencies into the report data. This is why the rule exists: it protects data integrity, not just compliance.
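
The determinism claim in the note above ("same input, same output, every run") can be checked mechanically. The following is a minimal sketch, assuming the script under test writes its result to stdout. The helper names `output_digest` and `is_deterministic` are hypothetical and are not part of the shipped scripts.

```python
import hashlib
import subprocess
import sys


def output_digest(cmd):
    """Run a command and return the SHA-256 hex digest of its stdout."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_deterministic(cmd, runs=3):
    """Return True if every run of cmd produces byte-identical stdout."""
    digests = {output_digest(cmd) for _ in range(runs)}
    return len(digests) == 1


if __name__ == "__main__":
    # Self-contained demo: a fixed computation is stable across runs,
    # while a command that prints a random UUID is not.
    stable = [sys.executable, "-c", "print(2 + 2)"]
    unstable = [sys.executable, "-c", "import uuid; print(uuid.uuid4())"]
    print(is_deterministic(stable))
    print(is_deterministic(unstable))
```

To apply this to a shipped script, pass its real command line as the `cmd` list. The exact arguments of the shipped scripts are not given in this report, so they are not shown here.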

## Expected behaviour
For Codewhale to assume the role (persona) and limit its behaviour to that role.

## Actual behaviour
It assumes the role and, for the most part, limits its behaviour. However, at seemingly random points it wrote temp scripts that did exactly what the shipped scripts do.

## Impact
The chance of non-deterministic results in the generated reports, and code that cannot be verified or tested.

**SKILL.md with these entries (this is version 2)**
> **Accountability:** If you ever create an ad-hoc script during this session, you MUST write a timestamped entry to `<workspace>/script_creation.log` with:
> - What the script did
> - Why you chose it over the shipped scripts or manual analysis
> - Which phase and section you were in
>
> This log persists across sessions. Every script is recorded permanently.
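
For reference, an entry that satisfies the accountability rule above could look like the following. This is only a sketch: SKILL.md specifies a timestamp and the three items listed, but not a file format, so the JSON-lines layout, the field names, and the `log_script_creation` helper are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_script_creation(log_path, what, why, phase, section):
    """Append one timestamped JSON line describing an ad-hoc script."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "what": what,        # what the script did
        "why": why,          # why it was chosen over the shipped scripts
        "phase": phase,      # pipeline phase at the time
        "section": section,  # section within that phase
    }
    # Append mode keeps earlier entries, so the log persists across sessions.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

An append-only log of this kind makes it easy to see after a session whether any ad-hoc script was admitted to, and to compare that against what actually happened in the chat export.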

## Expected behaviour
For Codewhale to assume the role (persona) and limit its behaviour to that role, knowing that it would be held accountable for its actions.

## Actual behaviour
It assumes the role and, for the most part, limits its behaviour. However, at seemingly random points it wrote temp scripts that did exactly what the shipped scripts do.

## Impact
The chance of non-deterministic results in the generated reports, and code that cannot be verified or tested.

**SKILL.md with these entries (this is version 3)**
> **Why no ad-hoc scripts?** Writing a temp script creates a logical contradiction with Articles I and II of the Constitution. The script's output is not independently verifiable ground truth (Article I). I cannot claim completion without verifying it (Article II). The only way to verify it is to run the shipped deterministic script and compare — at which point the temp script was redundant. If I don't run the shipped script, I have violated Article II by claiming completion on faith. There is no logical escape — a temp script is structurally unverifiable, and therefore unjustifiable.

## Expected behaviour
For Codewhale to assume the role (persona) and limit its behaviour to that role, knowing that it would be held accountable for its actions.

## Actual behaviour
It assumes the role and, for the most part, limits its behaviour.   

## Impact
The chance of non-deterministic results in the generated reports, and code that cannot be verified or tested.


Some context: it needs to read a JSON file called `manifest.json`, which contains config data it uses to create a report.

This is the output from the console as it reasoned through the issue:

[chat_export_20260704_154233.md](https://github.com/user-attachments/files/29661185/chat_export_20260704_154233.md)

[SKILL.md](https://github.com/user-attachments/files/29661191/SKILL.md)

[manifest.json](https://github.com/user-attachments/files/29661203/manifest.json)

For what we are trying to build, we cannot have it create any temp scripts, because this leads to non-deterministic outcomes. That is why we created the scripts and ship them with the .md files such as SKILL.md.

## Environment

- OS: Windows 11
- codewhale version: 0.8.65
- Install method: npm
- `codewhale doctor` summary: 

[codewhale.doctor.txt](https://github.com/user-attachments/files/29661255/codewhale.doctor.txt)

- Model/provider: deepseek-v4-flash
- Terminal app: Windows Terminal
- Shell: PowerShell



## Logs, screenshots, or recordings
