
# Ashwin Mathur

AI Engineer · Agentic RAG & Reranking · LLM Fine-Tuning & RL · Domain-Specific AI

LinkedIn · Email

I work on LLM systems for domain-specific applications in Finance, Bio-Medical, and Legal AI, spanning retrieval, agents, and model training. I've contributed to Haystack, MTEB, HuggingFace, and scikit-learn, and co-authored MMTEB, published at ICLR 2025. I develop open-source AI at AVNLP.

Developing Open-Source AI @ AVNLP

## LLM Training & RL Alignment

| Repository | Description |
| --- | --- |
| BioThink | Self-Reflective Bio-Medical QA training with QLoRA + GRPO to generate structured self-reflection tokens using six reward functions; evaluated across seven metrics via LLM-as-a-Judge. |
| RAG Model Training | Fine-tuning LLMs for Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, and ReZero via SFT and GRPO across finance, biomedical, and open-domain QA. |
| GRPO | Four GRPO implementations comparing format/correctness rewards, DeepSpeed vs. PyTorch training, frozen/server/periodic reference models, and vLLM vs. Transformers rollout generation. |
| LLM Finetuning | SFT, DPO, KTO, ORPO, PPO, and GRPO pipelines with QLoRA/LoRA/DoRA/P-Tuning/Prefix-Tuning adapter training across ARC, FactScore, TriviaQA, PopQA, Earnings Calls, and GSM8K. |
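
The format/correctness rewards mentioned for the GRPO repository can be illustrated with a minimal sketch. The tag names (`<think>`, `<answer>`) and matching rules below are assumptions for illustration, not the repository's actual implementation:

```python
import re

def format_reward(completion: str) -> float:
    """Reward 1.0 when the completion wraps reasoning and answer in the
    expected structured tags (hypothetical tag names), else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """Reward 1.0 when the answer extracted from the tags matches the gold
    answer (case-insensitive exact match), else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0
```

In GRPO-style training, several such reward functions are summed per rollout, and advantages are computed by normalizing rewards within each group of sampled completions.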

## Retrieval Augmented Generation and Agents

| Repository | Description |
| --- | --- |
| RAG Pipelines | Domain-specific RAG pipelines combining LangGraph orchestration, BAML structured generation, Milvus Hybrid Search, 3-layer metadata enrichment, and instruction-following rerankers for Medical and Financial QA. |
| DSPy Optimizers | DSPy RAG optimization with Weaviate Hybrid Search, Query Rewriting, and Sub-Query Decomposition using MIPROv2/COPRO/BootstrapFewShot optimizers on FreshQA, HotpotQA, TriviaQA, and PubMedQA. |
| VectorDB | Haystack and LangChain retrieval pipelines spanning Dense/Sparse/Hybrid search, Reranking, Parent-Child Retrieval, Query Enhancement, and Multi-Tenancy across Pinecone, Weaviate, Milvus, Qdrant, and Chroma. |
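
The hybrid search used throughout these pipelines can be sketched as a weighted blend of normalized dense and sparse scores. This is a minimal illustration of the general technique (score min-max normalization and an `alpha` weight), not the specific fusion any one vector database applies:

```python
def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_scores(dense: dict, sparse: dict, alpha: float = 0.5) -> dict:
    """Blend per-document scores: alpha * dense + (1 - alpha) * sparse.
    Documents missing from one retriever contribute 0 from that side."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    docs = set(dense) | set(sparse)
    return {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
            for d in docs}
```

With `alpha = 1.0` this degenerates to pure dense search, and with `alpha = 0.0` to pure sparse (keyword) search.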

## Information Retrieval & Ranking

| Repository | Description |
| --- | --- |
| LLM Rankers | LLM rankers using Pairwise, Setwise, and Listwise techniques with RankZephyr/RankLlama, Pydantic-validated structured generation, and efficient zero-shot sorting. |
| Pairwise Ranking Prompting | Zero-shot pairwise reranking with All-Pairs, Heapsort, and Sliding-K strategies, using bidirectional comparison for position-bias mitigation and Pydantic-validated outputs. |
| Reciprocal Rank Fusion and LLM Rankers | Hybrid retrieval combining Reciprocal Rank Fusion with Diversity, Lost-in-the-Middle, and Similarity rankers, evaluated on BEIR (NDCG, MAP, Recall, Precision). |
| LLM Blender | LLM ensembling framework using PairRanker for cross-attention candidate ranking and GenFuser for top-K output fusion, packaged as a Haystack component. |
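
Reciprocal Rank Fusion, used above to merge ranked lists from multiple retrievers, is simple enough to sketch in full. The standard formulation scores each document by summing `1 / (k + rank)` over every list it appears in (`k = 60` is the value from the original RRF paper; the repository may use a different constant):

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of document IDs into one list.
    Each document scores sum(1 / (k + rank)) over the lists containing it,
    so documents ranked highly by several retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it needs no score normalization, which is why it pairs well with heterogeneous rankers such as the Diversity and Similarity rankers listed above.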

## Open-Source Contributions

- **Haystack** - Built the Haystack evaluation framework (`eval`, `EvaluationResult`, `calculate_metrics`) and four metrics (EM, F1, SAS, MRR); added HuggingFace TEI Embedders and a sentence-transformer Diversity Ranker.
- **MTEB** - Added the complete LegalBench Benchmark (160+ legal classification and retrieval datasets) and four Japanese benchmarks (JMTEB Clustering, JSICK, JaGovFaqs, NLPJournal).
- **Haystack Core Integrations** - Implemented INSTRUCTOR Embedders, Optimum Embedders (ONNX Runtime), Llama.cpp Generator, Pinecone Document Store, and Cohere V3 Embed model support.
- **HuggingFace Transformers, Evaluate** - BioGPTForSequenceClassification and Trainer-free ViT pre-training scripts in Transformers; scikit-learn integration guides in Evaluate.
- **scikit-learn, imbalanced-learn** - Three core scikit-learn features: OOB fitted scores for Gradient Boosting, sparse-matrix support for `silhouette_samples`, and multiclass `average_precision_score`.
- **voyage-embedders-haystack** - Full Haystack integration for Voyage AI: text/document embedders, reranker, multimodal embeddings, and contextualized chunk embeddings; published on PyPI.

## Publications

### MMTEB: Massive Multilingual Text Embedding Benchmark (ICLR 2025)

The largest multilingual text embedding benchmark: 500+ tasks across 250+ languages and 10 task categories. Contributed the complete LegalBench suite: 160+ legal-domain classification and retrieval datasets.
