CompMath-MCQ

Are LLMs Ready for Higher-Level Math?

CompMath-MCQ is a benchmark dataset of 1,528 multiple-choice questions designed to evaluate LLMs on graduate-level computational mathematics. All questions were written from scratch by university professors rather than drawn from existing textbooks or online repositories, which minimizes the risk that models saw them during training.

Each question provides 3 answer choices with exactly one correct answer, enabling fully automatic and deterministic evaluation via the lm_eval library.
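
Because each record carries a single integer gold label, scoring reduces to exact-match accuracy, and uniform random guessing over three options is expected to score about 33.3%. The sketch below illustrates that computation only; the name `accuracy` and the sample predictions are hypothetical and are not part of this repository or of lm_eval.

```python
from typing import Sequence

NUM_CHOICES = 3  # each CompMath-MCQ question has three options
CHANCE_ACCURACY = 1 / NUM_CHOICES  # expected score of uniform random guessing


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of predicted option indices that equal the gold index."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot score an empty set of questions")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical predictions for four questions (not real model output).
score = accuracy([0, 1, 2, 0], [0, 1, 1, 0])
print(f"accuracy={score:.2f}, chance={CHANCE_ACCURACY:.3f}")
```

Comparing a model's score against `CHANCE_ACCURACY` gives a quick sense of how far above random guessing it sits.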

Topics

| Topic | Description |
|---|---|
| Linear Algebra | Matrix norms, eigenvalues, definiteness, decompositions |
| Numerical Optimization | Convergence, gradient methods, constrained optimization |
| Vector Calculus | Gradients, divergence, Jacobians, integral theorems |
| Probability | Distributions, expectation, conditional probability, Bayes |
| Python | NumPy, SciPy, scientific computing idioms |

Quick Start

Load from Hugging Face

The easiest way to use CompMath-MCQ is to load it directly from Hugging Face Datasets:

from datasets import load_dataset
 
dataset = load_dataset("biancaraimondi/CompMath-MCQ", split="test")
 
# Browse a sample
print(dataset[0])
# {
#   'question': 'Given the matrix A = ..., compute the 2-norm and 1-norm of A.',
#   'options': ['\\(\\|A\\|_2 = 4,\\ \\|A\\|_1 = 4\\)', ...],
#   'correct_label': 0,
#   'subtopic': 'Linear Algebra'
# }

You can also load it with pandas:

import pandas as pd
 
df = pd.read_json(
    "hf://datasets/biancaraimondi/CompMath-MCQ/data.json"
)
print(df.head())
print(df["subtopic"].value_counts())
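
Once a record is loaded, it can be turned into a text prompt by labelling the three options. The following is a minimal sketch, assuming an A/B/C layout with a trailing "Answer:" cue; the prompt actually used for evaluation is defined by the lm_eval task YAML in this repository and may differ. The function `format_prompt` and the example record are hypothetical and not taken from the dataset.

```python
LABELS = "ABC"  # one letter per option; the dataset has three options per question


def format_prompt(record: dict) -> str:
    """Render a record as a question followed by lettered options."""
    lines = [record["question"], ""]
    for label, option in zip(LABELS, record["options"]):
        lines.append(f"{label}. {option}")
    lines.append("")
    lines.append("Answer:")
    return "\n".join(lines)


# Hypothetical record that follows the documented fields (not a real item).
example = {
    "question": "What is the rank of the 2x2 identity matrix?",
    "options": ["0", "1", "2"],
    "correct_label": 2,
    "subtopic": "Linear Algebra",
}

prompt = format_prompt(example)
print(prompt)
print("Gold answer:", LABELS[example["correct_label"]])
```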

Dataset Schema

| Field | Type | Description |
|---|---|---|
| question | string | The question text (LaTeX-formatted) |
| options | list[string] | 3 answer choices (LaTeX-formatted) |
| correct_label | int | Index of the correct answer (0, 1, or 2) |
| subtopic | string | One of: Linear Algebra, Numerical Optimization, Vector Calculus, Probability, Python |
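
The schema above can be checked mechanically before running an evaluation. The sketch below is one possible validator written against the documented fields only; `validate_record` and the sample record are hypothetical and are not shipped with the repository.

```python
from typing import Any

SUBTOPICS = {
    "Linear Algebra",
    "Numerical Optimization",
    "Vector Calculus",
    "Probability",
    "Python",
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record passes."""
    problems = []
    if not isinstance(record.get("question"), str):
        problems.append("question must be a string")
    options = record.get("options")
    if not (
        isinstance(options, list)
        and len(options) == 3
        and all(isinstance(option, str) for option in options)
    ):
        problems.append("options must be a list of 3 strings")
    label = record.get("correct_label")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(label, bool) or not isinstance(label, int) or label not in (0, 1, 2):
        problems.append("correct_label must be 0, 1, or 2")
    subtopic = record.get("subtopic")
    if not isinstance(subtopic, str) or subtopic not in SUBTOPICS:
        problems.append("subtopic is not one of the five listed topics")
    return problems


# Hypothetical record used only to exercise the validator.
sample = {
    "question": "Which norm is induced by the Euclidean vector norm?",
    "options": ["The 1-norm", "The 2-norm", "The Frobenius norm"],
    "correct_label": 1,
    "subtopic": "Linear Algebra",
}
print(validate_record(sample))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem in a record at once.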

Evaluation with lm_eval

CompMath-MCQ is designed for plug-and-play evaluation with the Language Model Evaluation Harness.

Setup

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
 
# Install dependencies
pip install -r requirements.txt

Register the Custom Task

Copy the task files into your lm_eval installation:

# Find your lm_eval tasks directory
TASK_DIR=$(python -c "import lm_eval; import os; print(os.path.join(os.path.dirname(lm_eval.__file__), 'tasks'))")
 
# Create the custom task folder and copy files
mkdir -p "$TASK_DIR/my_custom_task"
cp my_eval_task/mcq_lm_eval_data.jsonl "$TASK_DIR/my_custom_task/"
cp my_eval_task/my_mcq_task.yaml "$TASK_DIR/my_custom_task/"
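
After copying, it can be worth confirming that the data file is intact. The sketch below only checks that every non-empty line parses as a JSON object and counts the records; it makes no assumption about the field names inside `mcq_lm_eval_data.jsonl`, which are not documented here. The helper `count_jsonl_records` is hypothetical.

```python
import json
from pathlib import Path


def count_jsonl_records(path: str) -> int:
    """Parse every non-empty line as a JSON object and return the record count."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises json.JSONDecodeError if malformed
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number} is not a JSON object")
            count += 1
    return count
```

For example, `count_jsonl_records("my_eval_task/mcq_lm_eval_data.jsonl")` would report how many records the file holds. If the file stores one record per question, which is an assumption, the count should match the 1,528 questions stated above.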

Run Evaluation

Use the provided script or call lm_eval directly:

# Using the provided script (edit model paths inside first)
bash test_script.sh
 
# Or run directly
lm_eval --model hf \
    --model_args pretrained=meta-llama/Meta-Llama-3-8B \
    --tasks my_mcq_task \
    --output_path results/llama3-8b \
    --batch_size auto

Results are saved to results/{model_name}/.
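
The saved output can then be inspected from Python. The sketch below assumes, without confirmation from this repository, that the harness writes files named `results_*.json` somewhere under the output directory and that each file keeps per-task metrics under a top-level `"results"` key; adjust the pattern and key if your lm_eval version differs. The helper `collect_results` is hypothetical.

```python
import json
from pathlib import Path


def collect_results(results_dir: str) -> dict[str, dict]:
    """Map each results JSON file path to its top-level "results" section."""
    collected = {}
    # Assumption: result files are named results_*.json, possibly in subfolders.
    for path in sorted(Path(results_dir).rglob("results_*.json")):
        with path.open(encoding="utf-8") as handle:
            payload = json.load(handle)
        # Assumption: per-task metrics live under the "results" key.
        collected[str(path)] = payload.get("results", {})
    return collected
```

Calling `collect_results("results/llama3-8b")` would return one entry per results file found, which can then be printed or compared across models.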

Repository Structure

CompMath-MCQ/
├── README.md
├── requirements.txt
├── test_script.sh              # Evaluation runner script
├── my_eval_task/
│   ├── mcq_lm_eval_data.jsonl  # Dataset in lm_eval format
│   └── my_mcq_task.yaml        # lm_eval task definition
└── ...

Citation

If you use CompMath-MCQ in your research, please cite:

@article{raimondi2026compmath,
  title   = {The CompMath-MCQ Dataset: Are LLMs Ready for Higher-Level Math?},
  author  = {Raimondi, Bianca and Pivi, Francesco and Evangelista, Davide and Gabbrielli, Maurizio},
  journal = {arXiv preprint arXiv:2603.03334},
  year    = {2026}
}

License

This dataset is released under a Creative Commons license. See the Hugging Face dataset card for full details.
