A Python tool that transcribes Chinese audio files and formats them into HSK exercise materials with pinyin and English translations.
- Python 3.7+
- OpenAI API key
- uv package manager
- openai Python package
- just command runner (for batch processing)
- Install the required package:
uv add openai
- Set up your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"

Basic usage:
uv run index.py path/to/your/audio.mp3

With custom model:
uv run index.py path/to/your/audio.mp3 --model gpt-4o-mini

For processing multiple MP3 files in a folder, use the batch transcription script:
just transcribe <folder_path> [parallelism]

Examples:
# Process all MP3 files in ~/Desktop/hsk3 with 4 parallel jobs (default)
just transcribe ~/Desktop/hsk3
# Process with 8 parallel jobs for faster processing
just transcribe ~/Desktop/hsk3 8

The batch script will:
- Process all MP3 files in the specified folder
- Only process files whose names are greater than 8 (the script's built-in filtering logic)
- Create a transcribe/ directory with corresponding .txt output files
- Run multiple transcriptions in parallel for efficiency
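
The batch behaviour listed above can be sketched in Python. This is only an illustration, not the actual batch script: it assumes file names are numeric (such as 9.mp3 or 10.mp3), that index.py prints its result to standard output, and that the transcribe/ directory is created inside the input folder. The function names `select_files`, `transcribe_one`, and `transcribe_folder` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def select_files(folder: Path, minimum: int = 8) -> List[Path]:
    """Return MP3 files whose numeric name is greater than `minimum`."""
    selected = []
    for path in folder.glob("*.mp3"):
        # Assumption: names are plain numbers, e.g. 9.mp3 or 10.mp3.
        if path.stem.isdigit() and int(path.stem) > minimum:
            selected.append(path)
    return sorted(selected, key=lambda p: int(p.stem))


def transcribe_one(mp3: Path, out_dir: Path) -> Path:
    """Run index.py on one file and save its standard output as a .txt file."""
    # Assumption: index.py writes the formatted transcript to stdout.
    result = subprocess.run(
        ["uv", "run", "index.py", str(mp3)],
        capture_output=True,
        text=True,
        check=True,
    )
    out_file = out_dir / f"{mp3.stem}.txt"
    out_file.write_text(result.stdout, encoding="utf-8")
    return out_file


def transcribe_folder(folder: Path, parallelism: int = 4) -> List[Path]:
    """Transcribe every selected file using `parallelism` worker threads."""
    # Assumption: the output directory lives inside the input folder.
    out_dir = folder / "transcribe"
    out_dir.mkdir(exist_ok=True)
    files = select_files(folder)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda f: transcribe_one(f, out_dir), files))
```

Threads are enough here because each job spends its time waiting on a subprocess and on network calls rather than on Python computation.
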
Arguments for index.py:
- mp3_file_path (required): Path to the MP3 file to transcribe
- --model (optional): OpenAI model to use for processing (default: gpt-4o)

Arguments for just transcribe:
- folder_path (required): Path to folder containing MP3 files
- parallelism (optional): Number of parallel jobs (default: 4)
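
For reference, the index.py arguments described above could be declared with the standard argparse module roughly as follows. This is a minimal sketch and not necessarily how index.py is written; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one required path and an optional --model flag."""
    parser = argparse.ArgumentParser(
        description="Transcribe a Chinese MP3 file into HSK exercise material."
    )
    # Required positional argument: the audio file to transcribe.
    parser.add_argument("mp3_file_path", help="Path to the MP3 file to transcribe")
    # Optional flag; falls back to gpt-4o when omitted.
    parser.add_argument(
        "--model",
        default="gpt-4o",
        help="OpenAI model to use for processing",
    )
    return parser
```

Calling `build_parser().parse_args(["lesson1.mp3", "--model", "gpt-4o-mini"])` yields an object whose `mp3_file_path` and `model` attributes hold the two values.
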
The tool generates structured output in the following format:
== Part number: ... ==
# Exercise number: ..
## Question number:
- Pinyin: ... (with proper tones and capitalization)
- Simplified chinese: ...
- Translation in english
Example:
uv run index.py lesson1.mp3

Output:
== Part 1: Listening Comprehension ==
# Exercise 1: Daily Conversations
## Question 1:
- Pinyin: Nǐ hǎo, nǐ jiào shén me míng zi?
- Simplified chinese: 你好,你叫什么名字?
- Translation in english: Hello, what is your name?
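
Because the output follows a fixed layout, it can be read back into structured records. The sketch below is not part of the tool; it assumes the output always uses the `== ... ==`, `#`, `##`, and `- key: value` markers shown above, and the function name `parse_output` is hypothetical.

```python
import re
from typing import Dict, List

PART = re.compile(r"^==\s*(.+?)\s*==$")
QUESTION = re.compile(r"^##\s+(.+?):?\s*$")
EXERCISE = re.compile(r"^#\s+(.+)$")
FIELD = re.compile(r"^-\s*([^:]+):\s*(.*)$")


def parse_output(text: str) -> List[Dict[str, str]]:
    """Return one dict per question, tagged with its part and exercise."""
    questions: List[Dict[str, str]] = []
    part = ""
    exercise = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        part_match = PART.match(line)
        question_match = QUESTION.match(line)
        exercise_match = EXERCISE.match(line)
        field_match = FIELD.match(line)
        if part_match:
            part = part_match.group(1)
        elif question_match:
            # Start a new record that inherits the current part and exercise.
            questions.append(
                {"part": part, "exercise": exercise, "question": question_match.group(1)}
            )
        elif exercise_match:
            exercise = exercise_match.group(1)
        elif field_match and questions:
            # Attach "- key: value" lines to the most recent question.
            key, value = field_match.groups()
            questions[-1][key.strip()] = value.strip()
    return questions
```

Each record keeps the field labels exactly as printed (for example "Pinyin" and "Simplified chinese"), so the keys stay in step with the tool's output.
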
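To trim an audio file at a silence:
ffmpeg -i 10.mp3 -af silencedetect=noise=-30dB:d=5 -f null - 2>&1 | grep silence_end | awk 'NR==6' | awk -F ' ' '{print int($5)-int(8)+1}' | xargs -I {} ffmpeg -i 10.mp3 -ss 00:00:00 -t {} output.mp3
This detects silences of at least 5 seconds below -30 dB in 10.mp3, takes the end time of the sixth one, subtracts 7 seconds, and writes the audio from the start up to that point to output.mp3, so the output stops just before silence number 6.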
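
The arithmetic in the pipeline above can also be expressed in Python, which may be easier to adapt to other files. This sketch only parses log text that has already been captured from ffmpeg's silencedetect filter; it does not run ffmpeg itself. The function name `cut_duration` and its parameters are hypothetical.

```python
import re
from typing import Optional

# silencedetect reports lines such as:
# [silencedetect @ 0x...] silence_end: 123.456 | silence_duration: 5.2
SILENCE_END = re.compile(r"silence_end:\s*([0-9.]+)")


def cut_duration(log: str, silence_index: int = 6, offset: int = 7) -> Optional[int]:
    """Return the number of seconds to keep, or None if too few silences exist."""
    ends = SILENCE_END.findall(log)
    if len(ends) < silence_index:
        return None
    # Same arithmetic as the awk step: int(end) - 8 + 1, i.e. int(end) - 7.
    return int(float(ends[silence_index - 1])) - offset
```

The returned value corresponds to what the shell pipeline passes to `-t` in the second ffmpeg call.
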
To convert the generated .txt files to A4 PDFs with a CJK font:
for f in *.txt; do
  paps --paper=a4 --font="Noto Sans CJK SC 11" "$f" | ps2pdf - "${f%.txt}.pdf"
done
This project is licensed under the MIT License - see the LICENSE file for details.