AI Engineer · Agentic RAG & Reranking · LLM Fine-Tuning & RL · Domain-Specific AI
I work on LLM systems for domain-specific applications in Finance, Biomedical, and Legal AI, spanning retrieval, agents, and model training. I've contributed to Haystack, MTEB, HuggingFace, and scikit-learn, and co-authored MMTEB, published at ICLR 2025.
Developing Open-Source AI @ AVNLP
| Repository | Description |
|---|---|
| BioThink | Self-reflective biomedical QA training with QLoRA + GRPO to generate structured self-reflection tokens, using six reward functions; evaluated across seven metrics via LLM-as-a-Judge. |
| RAG Model Training | Fine-tuning LLMs for Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, and ReZero via SFT and GRPO across finance, biomedical, and open-domain QA. |
| GRPO | Four GRPO implementations comparing format/correctness rewards, DeepSpeed vs. PyTorch training, frozen/server/periodic reference models, and vLLM vs. Transformers rollout generation. |
| LLM Finetuning | SFT, DPO, KTO, ORPO, PPO, and GRPO pipelines with QLoRA/LoRA/DoRA/P-Tuning/Prefix-Tuning adapter training across ARC, FactScore, TriviaQA, PopQA, Earnings Calls, and GSM8K. |
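The GRPO repositories above compare reward shaping and reference-model setups. As an illustrative sketch (not code from those repos), the core group-relative step in GRPO normalizes each rollout's reward against the mean and standard deviation of its own sampling group:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages: each rollout's reward is normalized
    against the mean and std of the group sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four rollouts of one prompt (e.g. format + correctness scores):
advs = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Because advantages are computed per group rather than from a learned value model, GRPO needs no critic, which is what makes comparisons like frozen vs. periodically refreshed reference models the interesting axis.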
| Repository | Description |
|---|---|
| RAG Pipelines | Domain-specific RAG pipelines combining LangGraph orchestration, BAML structured generation, Milvus Hybrid Search, 3-layer metadata enrichment, and instruction-following rerankers for Medical and Financial QA. |
| DSPy Optimizers | DSPy RAG optimization with Weaviate Hybrid Search, Query Rewriting, and Sub-Query Decomposition, using MIPROv2/COPRO/BootstrapFewShot optimizers on FreshQA, HotpotQA, TriviaQA, and PubMedQA. |
| VectorDB | Haystack and LangChain retrieval pipelines spanning Dense/Sparse/Hybrid search, Reranking, Parent-Child Retrieval, Query Enhancement, and Multi-Tenancy across Pinecone, Weaviate, Milvus, Qdrant, and Chroma. |
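The hybrid-search pipelines above merge dense and sparse result lists. A minimal sketch of Reciprocal Rank Fusion, the standard fusion rule for this (illustrative, not the vector stores' built-in implementation):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists (e.g. dense and sparse retrieval results)
    by summing 1 / (k + rank) for each document across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]   # ranked by embedding similarity
sparse = ["d1", "d4", "d2"]  # ranked by BM25
fused = reciprocal_rank_fusion([dense, sparse])
```

The constant `k` (conventionally 60) damps the influence of top ranks so one list cannot dominate the fusion.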
| Repository | Description |
|---|---|
| LLM Rankers | LLM rankers using Pairwise, Setwise, and Listwise techniques with RankZephyr/RankLlama, Pydantic-validated structured generation, and efficient zero-shot sorting. |
| Pairwise Ranking Prompting | Zero-shot pairwise reranking with All-Pairs, Heapsort, and Sliding-K strategies, using bidirectional comparison for position-bias mitigation and Pydantic-validated outputs. |
| Reciprocal Rank Fusion and LLM Rankers | Hybrid retrieval combining Reciprocal Rank Fusion with Diversity, Lost-in-the-Middle, and Similarity rankers, evaluated on BEIR (NDCG, MAP, Recall, Precision). |
| LLM Blender | LLM ensembling framework using PairRanker for cross-attention candidate ranking and GenFuser for top-K output fusion, packaged as a Haystack component. |
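The pairwise rerankers above mitigate position bias by comparing each document pair in both orders. A hedged sketch of that bidirectional check, with a hypothetical `judge` callable standing in for the LLM call:

```python
from typing import Callable, Optional

Judge = Callable[[str, str, str], str]  # (query, passage_a, passage_b) -> "A" or "B"

def bidirectional_compare(query: str, doc_a: str, doc_b: str,
                          judge: Judge) -> Optional[str]:
    """Query the judge with both orderings; count a win only when
    both agree, so position-biased verdicts are treated as ties."""
    first = judge(query, doc_a, doc_b)
    second = judge(query, doc_b, doc_a)
    if first == "A" and second == "B":
        return doc_a
    if first == "B" and second == "A":
        return doc_b
    return None  # orderings disagree: tie

# Deterministic stub judge for demonstration (prefers the longer passage):
def length_judge(query: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"

winner = bidirectional_compare("q", "a longer passage", "short", length_judge)
```

All-Pairs, Heapsort, and Sliding-K differ only in which pairs are compared; each can use this same bidirectional primitive as its comparison step.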
- Haystack - Built the Haystack evaluation framework (`eval`, `EvaluationResult`, `calculate_metrics`) and four metrics (EM, F1, SAS, MRR); added HuggingFace TEI Embedders and a sentence-transformer Diversity Ranker.
- MTEB - Added the complete LegalBench Benchmark (160+ legal classification and retrieval datasets) and four Japanese benchmarks (JMTEB Clustering, JSICK, JaGovFaqs, NLPJournal).
- Haystack Core Integrations - Implemented INSTRUCTOR Embedders, Optimum Embedders (ONNX Runtime), Llama.cpp Generator, Pinecone Document Store, and Cohere V3 Embed model support.
- HuggingFace Transformers, Evaluate - `BioGPTForSequenceClassification` and Trainer-free ViT pre-training scripts in Transformers; scikit-learn integration guides in Evaluate.
- scikit-learn, imbalanced-learn - Three core scikit-learn features: OOB fitted scores for Gradient Boosting, sparse-matrix support for `silhouette_samples`, and multiclass `average_precision_score`.
- voyage-embedders-haystack - Full Haystack integration for Voyage AI: text/document embedders, reranker, multimodal embeddings, and contextualized chunk embeddings; published on PyPI.
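Of the four evaluation metrics listed above, MRR is the simplest to sketch. A minimal illustrative version (not the Haystack implementation): for each query, take the reciprocal of the rank of the first relevant hit, then average over queries.

```python
def mean_reciprocal_rank(results: list[list[bool]]) -> float:
    """results[i] holds per-rank relevance flags for query i,
    ordered by retrieval rank (index 0 = top hit)."""
    total = 0.0
    for hits in results:
        for rank, relevant in enumerate(hits, start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results)

# Query 1: first relevant hit at rank 2; query 2: at rank 1.
score = mean_reciprocal_rank([[False, True], [True]])
```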
MMTEB: Massive Multilingual Text Embedding Benchmark (ICLR 2025)
The largest multilingual text embedding benchmark to date: 500+ tasks across 250+ languages and 10 task categories. Contributed the complete LegalBench suite of 160+ legal-domain classification and retrieval datasets.




