Code for the paper "A Two-Method Framework for Aligning Medical Terminologies", presented at SeWebMeDA-2026, May 10, 2026, Dubrovnik, Croatia.
Ostaszewski S., Kılıç Ö.D., Erol E.E., Dumontier M., Celebi R. (2026). A Two-Method Framework for Aligning Medical Terminologies. SeWebMeDA-2026.
This framework aligns disparate medical terminologies — CPT (procedures), NDC (medications), and ICD-9 (diagnoses) — to SNOMED CT. It combines expert-curated reference sets with two algorithmic strategies:
- Setting A — Missing translation imputation (CPT & NDC): Uses medBERT embeddings and a Logistic Regression classifier to predict SNOMED CT matches for unmapped codes based on textual similarity.
- Setting B — Context-aware ranking (ICD-9): Resolves 1-to-many mappings by ranking SNOMED CT candidates using Node2Vec graph embeddings of the SNOMED CT hierarchy, weighted by the patient's clinical context from their EHR.
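Setting A can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the toy vectors stand in for medBERT embeddings, the single cosine-similarity feature is an assumption about what "textual similarity" looks like as classifier input, and the from-scratch logistic regression (`train_logistic`, `match_probability`) is a hypothetical stand-in for whatever classifier library the project uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit p(match) = sigmoid(w * x + b) on scalar features by gradient descent."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(features, labels)]
        w -= lr * sum(e * x for e, x in zip(errors, features)) / n
        b -= lr * sum(errors) / n
    return w, b


def match_probability(w, b, source_vec, target_vec):
    """Probability that a source code and a SNOMED CT candidate match."""
    return sigmoid(w * cosine(source_vec, target_vec) + b)


# Toy 3-d vectors standing in for text embeddings (assumption, not real data).
toy = {
    "src_knee_xray": [0.9, 0.1, 0.0],
    "src_flu_shot": [0.1, 0.0, 0.9],
    "tgt_knee_imaging": [0.8, 0.2, 0.1],
    "tgt_flu_vaccine": [0.0, 0.1, 0.9],
}
# (source, target, label) pairs: 1 = known mapping, 0 = non-match.
pairs = [
    ("src_knee_xray", "tgt_knee_imaging", 1),
    ("src_knee_xray", "tgt_flu_vaccine", 0),
    ("src_flu_shot", "tgt_flu_vaccine", 1),
    ("src_flu_shot", "tgt_knee_imaging", 0),
]
features = [cosine(toy[s], toy[t]) for s, t, _ in pairs]
labels = [y for _, _, y in pairs]
w, b = train_logistic(features, labels)
```

With the trained weights, `match_probability(w, b, toy["src_knee_xray"], toy["tgt_knee_imaging"])` scores the similar pair above the dissimilar one, which is the behaviour an imputation step would rely on when proposing matches for unmapped codes.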
Each aligned code is assigned a confidence score (κ) that reflects how decisively the patient's context supports the chosen translation.
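A minimal sketch of the context-aware ranking idea in Setting B, under stated assumptions: candidates are scored by their mean cosine similarity to the patient's context codes, and the confidence is taken as the margin between the two best scores. Both choices are illustrative stand-ins, not the paper's definition of κ, and the 2-d vectors and code names are placeholders for real Node2Vec embeddings and SNOMED CT identifiers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(candidates, context, embeddings):
    """Return [(candidate, score)] sorted by mean similarity to the context codes."""
    scored = []
    for cand in candidates:
        sims = [
            cosine(embeddings[cand], embeddings[ctx])
            for ctx in context
            if ctx in embeddings  # skip context codes without an embedding
        ]
        scored.append((cand, sum(sims) / len(sims) if sims else 0.0))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def confidence(ranked):
    """Hypothetical confidence: gap between the best and second-best score."""
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return 1.0  # a single candidate has nothing to compete with
    return ranked[0][1] - ranked[1][1]


# Placeholder embeddings (assumption): two candidates, two context codes.
embeddings = {
    "cand_cardiac": [1.0, 0.0],
    "cand_renal": [0.0, 1.0],
    "ctx_1": [0.9, 0.1],
    "ctx_2": [0.8, 0.2],
}
ranked = rank_candidates(["cand_renal", "cand_cardiac"], ["ctx_1", "ctx_2"], embeddings)
kappa = confidence(ranked)
```

Here the context vectors lie close to `cand_cardiac`, so it is ranked first and the margin is large. A context that sits between both candidates would shrink the margin, which is the sense in which a confidence score can reflect how decisively the context supports one translation.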
.
├── app.py # FastAPI service exposing /align and /align_graph endpoints
├── RankingModel.py # Main ranking model (Setting B)
├── ContextExtractor.py # Extracts patient context codes from an RDF graph
├── embedding_decoder.py # Loads and queries embeddings
├── CodeTreeNode.py # SNOMED CT hierarchy utilities
├── models.py # Pydantic request/response models
├── utils.py # Shared helpers
├── mappers/
│ ├── cpt_snomed_mapper.py
│ ├── icd_snomed_mapper.py
│ ├── loinc_snomed_mapper.py
│ └── ndc_snomed_mapper.py
└── tests/
Requires Python ≥ 3.10. Dependencies are managed with uv.
uv sync
uv run uvicorn app:app --host 0.0.0.0 --port 8000

Or with Docker:

docker build -t entity-alignment .
docker run -p 8000:8000 entity-alignment

Environment variables (.env or Docker args):
| Variable | Description | Default |
|---|---|---|
| SPARQL_ENDPOINT | SPARQL endpoint for patient graphs | http://localhost:3030/dataset/sparql |
| SPARQL_USERNAME | SPARQL auth username | |
| SPARQL_PASSWORD | SPARQL auth password | |
| DECODER_PATH | Path to the trained decoder model | |
| EMBEDDINGS | Path to the embeddings pickle file | |
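As an illustration of how these variables might be consumed, the sketch below reads them from the environment and applies the one default listed in the table. `Settings` and `load_settings` are hypothetical names introduced here; how `app.py` actually loads its configuration is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    sparql_endpoint: str
    sparql_username: Optional[str]
    sparql_password: Optional[str]
    decoder_path: Optional[str]
    embeddings: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, falling back to os.environ."""
    env = os.environ if env is None else env
    return Settings(
        # Only SPARQL_ENDPOINT has a documented default.
        sparql_endpoint=env.get("SPARQL_ENDPOINT", "http://localhost:3030/dataset/sparql"),
        sparql_username=env.get("SPARQL_USERNAME"),
        sparql_password=env.get("SPARQL_PASSWORD"),
        decoder_path=env.get("DECODER_PATH"),
        embeddings=env.get("EMBEDDINGS"),
    )
```

Passing an explicit mapping, for example `load_settings({"EMBEDDINGS": "embeddings.pkl"})`, keeps the function easy to test without touching the real process environment.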
POST /align — Fetch patient graph from SPARQL and return ranked SNOMED CT suggestions.
POST /align_graph — Same, but accepts an RDF graph payload directly.
Both endpoints return a ranked list of SNOMED CT URIs with confidence scores.
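A client call to `/align` could look roughly like the sketch below, which uses only the standard library. The request schema is defined in `models.py` and is not reproduced in this README, so the `patient_id` field is a hypothetical placeholder, as are the helper names `build_align_request` and `send`. The base URL assumes the service is running locally on port 8000 as started above.

```python
import json
import urllib.request


def build_align_request(base_url, payload):
    """Prepare a JSON POST request for the /align endpoint without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/align",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payload: check models.py for the real field names.
request = build_align_request("http://localhost:8000", {"patient_id": "example-patient"})
# suggestions = send(request)  # requires a running service
```

Separating request construction from sending makes it possible to inspect the URL and body before any network call is made.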
| Method | Set 1 Hits@1 | Set 1 Hits@5 |
|---|---|---|
| medBERT text only | 0.06 – 0.12 | 0.53 – 0.63 |
| Shortest SNOMED path | 0.44 – 0.52 | 0.82 – 0.88 |
| Node2Vec (ours) | 0.37 – 0.45 | 0.82 – 0.88 |
Node2Vec matches the Hits@5 of exact shortest-path computation, with somewhat lower Hits@1 (0.37 – 0.45 vs. 0.44 – 0.52), while being ~480× faster (1,000 rankings/second vs. 8 minutes).
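For readers unfamiliar with the metric, Hits@k is the fraction of test cases whose correct target appears among the top k ranked candidates. The sketch below computes it on made-up rankings. The function name `hits_at_k` and the example data are illustrative and not taken from the evaluation code.

```python
def hits_at_k(rankings, gold, k):
    """Fraction of cases whose gold item appears in the top-k of its ranking."""
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not rankings:
        raise ValueError("at least one ranking is required")
    hits = sum(1 for ranking, truth in zip(rankings, gold) if truth in ranking[:k])
    return hits / len(rankings)


# Made-up example: three ranked candidate lists, gold answer "a" each time.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "a"]
hits_1 = hits_at_k(rankings, gold, 1)
hits_3 = hits_at_k(rankings, gold, 3)
```

In this toy case only the first ranking puts the gold item on top, while all three contain it within the top 3, so Hits@k can only stay equal or grow as k increases.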
This work is supported by the Horizon Europe Framework Program under Grant Agreement No. 101057062 (AIDAVA). SNOMED CT licenses are handled via this project.