ZJU-DAILY/INSIDE

INSIDE: Internalization-aware LLM Serving In Dual-Speed Edge-Cloud Cache Evolution

INSIDE is an edge-cloud collaborative inference framework that enables small language models on resource-constrained edge devices to continuously internalize knowledge through a dual-path learning mechanism, achieving near-cloud accuracy at a fraction of the cost.


Project Structure

INSIDE/
├── codes/
│   ├── core/                        
│   │   ├── index.py                 
│   │   ├── router.py                
│   │   ├── retriever.py             
│   │   ├── assembler.py             
│   │   ├── learner.py               
│   │   ├── cloud_client.py         
│   │   ├── gemini_client.py         
│   │   ├── sql_prompt_generator.py  
│   │   └── pipeline.py             
│   │
│   ├── test/                        
│   │   ├── run_experiment.py        
│   │   ├── debug_experiment.py      
│   │   ├── generate_cloud_cache.py  
│   │   ├── generate_popqa_hotspot.py       
│   │   ├── repair_popqa_hotspot.py         
│   │   └── sql_prompt_generator.py  
│   │
│   └── utils/                       
│       ├── analyze_cloud_cache.py   
│       ├── count_cache_tokens.py    
│       ├── count_tokens.py          
│       ├── inspect_data.py          
│       ├── print_index.py           
│       ├── rejudge_prediction_dump.py  
│       └── tojsonl.py               
│
└── data/
    └── popqa/
        ├── test.tsv
        └── popqa_hotspot.jsonl


🌍 Datasets and Tasks

We evaluate INSIDE across diverse workloads. As summarized in the following table, the tasks span multiple representative domains, including general question answering, long-form QA, mathematical reasoning, and structured code generation (Text-to-SQL).

| Task | Dataset | # Training samples | # Test samples | Description |
|---|---|---|---|---|
| General QA | MS MARCO | 808,731 | 101,093 | Large-scale QA derived from Bing search logs |
| General QA | GooAQ | 3,112,679 | 2,500 | Large-scale QA mined from Google search logs |
| General QA | PopQA | 11,267 | 3,000 | QA benchmark focused on long-tail entities |
| General QA | PopQA_Hotspot | 70,000 | 10,765 | Synthetic benchmark reflecting realistic hotspot workloads |
| Long-form QA | ELI5 | 216,147 | 10,000 | Long-form QA dataset to evaluate token overhead |
| Math Problem Solving | GSM8K | 7,473 | 1,319 | Grade-school math word problems |
| Text-to-SQL | Spider | 7,000 | 1,034 | Cross-domain semantic parsing and Text-to-SQL |

📊 Baselines

We evaluate INSIDE against representative retrieval, routing, and caching systems, as well as pure edge-only and cloud-only execution strategies. The baselines are listed below:

| Baseline | Year | Conference / Journal | Paper / Description |
|---|---|---|---|
| Self-RAG | 2024 | ICLR | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection |
| RouteLLM | 2025 | ICLR | RouteLLM: Learning to Route LLMs with Preference Data |
| GPTCache | 2023 | NLP-OSS | GPTCache: An Open-Source Semantic Cache for LLM Applications |
| IC-Cache | 2025 | SOSP | IC-Cache: Efficient Large Language Model Serving via In-Context Caching |
| All-Edge | - | - | All queries are processed locally by Qwen2.5-7B without retrieval |
| All-ICL | - | - | All queries use retrieval-augmented few-shot prompting on the edge |
| All-Cloud | - | - | All queries are processed by the cloud LLM (DeepSeek-V3.2) |

Quick Start

Step 1: Environment Setup

Make sure you have Python 3.10+ and CUDA (for GPU acceleration) installed, then install the required dependencies:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "transformers>=4.36" peft numpy spacy openai tqdm datasets
python -m spacy download en_core_web_sm
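Before moving on, it can help to confirm that the installation succeeded. The following is a minimal sketch, not part of the repository: it checks the Python version, looks up each package from the commands above by its import name, and asks PyTorch whether CUDA is visible. The helper names are hypothetical.

```python
import importlib.util
import sys

# Import names of the packages installed in Step 1, plus the spaCy model,
# which is installed as a regular Python package.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "transformers", "peft",
    "numpy", "spacy", "openai", "tqdm", "datasets", "en_core_web_sm",
]


def python_is_supported(version_info=None, minimum=(3, 10)):
    """Return True when the interpreter meets the Python 3.10+ requirement."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available():
    """Return True/False as reported by torch, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check works without torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python 3.10+:", python_is_supported())
    print("Missing modules:", missing_modules() or "none")
    print("CUDA available:", cuda_available())
```

An empty "Missing modules" list together with "CUDA available: True" suggests the environment is ready for GPU runs.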

Step 2: Download the Model

Download Qwen2.5-7B-Instruct from HuggingFace and place it under the models/ directory.
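One possible way to fetch the model is through the `huggingface_hub` package, which is installed alongside `transformers`. The sketch below is an assumption rather than the repository's own procedure: the target folder name `models/Qwen2.5-7B-Instruct` is illustrative, so adjust it to whatever path the experiment script expects.

```python
from pathlib import Path

# Assumption: the sub-folder name under models/ is illustrative only.
MODEL_REPO_ID = "Qwen/Qwen2.5-7B-Instruct"
MODEL_DIR = Path("models") / "Qwen2.5-7B-Instruct"


def download_model(repo_id=MODEL_REPO_ID, local_dir=MODEL_DIR):
    """Download every file of the model repository into local_dir."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    # Returns the path of the folder that now holds the model files.
    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_model()` from the repository root starts the download; the weights are large, so make sure enough disk space is available.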

Step 3: Download the Datasets

Download the datasets and place them under data/, following the layout shown in Project Structure (for example, data/popqa/test.tsv).
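A quick check that the files landed in the expected place can save a failed run later. This sketch only covers the two PopQA files listed in Project Structure; the helper names are hypothetical, and no assumption is made about the fields inside each record.

```python
import json
from pathlib import Path

# Paths relative to data/, taken from the Project Structure tree.
EXPECTED_FILES = ["popqa/test.tsv", "popqa/popqa_hotspot.jsonl"]


def missing_data_files(data_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_file()]


def first_jsonl_record(path):
    """Parse and return the first non-empty line of a JSONL file, or None."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                return json.loads(line)
    return None
```

For example, `missing_data_files("data")` returns an empty list once both files are present, and `first_jsonl_record("data/popqa/popqa_hotspot.jsonl")` shows what a single record looks like.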

Step 4: Configure the DeepSeek API Key

Open codes/core/cloud_client.py and set your DeepSeek API key.
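If you prefer not to hardcode the key in a source file, one alternative is to read it from an environment variable. The sketch below is an assumption about how this could be done, not a description of cloud_client.py: the variable name `DEEPSEEK_API_KEY` and both helper functions are hypothetical, and the base URL is DeepSeek's OpenAI-compatible endpoint as we understand it, so verify it against the DeepSeek documentation.

```python
import os

# Assumptions: hypothetical variable name; endpoint to be checked against
# the DeepSeek API documentation.
API_KEY_ENV = "DEEPSEEK_API_KEY"
DEEPSEEK_BASE_URL = "https://api.deepseek.com"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV} environment variable first.")
    return key


def make_client():
    """Build an OpenAI-compatible client that points at the DeepSeek endpoint."""
    from openai import OpenAI  # lazy import; requires the openai package

    return OpenAI(api_key=load_api_key(), base_url=DEEPSEEK_BASE_URL)
```

Keeping the key in the environment also reduces the risk of committing it to version control by accident.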

Step 5: Configure and Run Experiments

Open codes/test/run_experiment.py and adjust the experiment configuration at the top of the file:

CURRENT_DATASET = "popqa_hotspot"   # Dataset: popqa_hotspot, MS, gooaq, eli5, gsm8k, spider
TEST_DAYS = 1                       # Number of simulation days
SAMPLES_PER_DAY = 1000              # Test samples per day
INIT_INDEX_SIZE = 70000             # Initial index size from training data
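These four values interact with the dataset sizes in the table above. As a rough sanity check, the sketch below compares a configuration against the PopQA_Hotspot split sizes. It assumes each test sample is used at most once across the simulated days, which the source does not state; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Mirror of the four settings at the top of run_experiment.py."""
    dataset: str = "popqa_hotspot"
    test_days: int = 1
    samples_per_day: int = 1000
    init_index_size: int = 70000

    @property
    def total_queries(self):
        # Number of test queries consumed over the whole simulation.
        return self.test_days * self.samples_per_day


# (training samples, test samples) from the Datasets and Tasks table.
SPLIT_SIZES = {"popqa_hotspot": (70000, 10765)}


def check_config(cfg, split_sizes=SPLIT_SIZES):
    """Return a list of problems found; an empty list means no issue detected."""
    problems = []
    if cfg.test_days < 1 or cfg.samples_per_day < 1:
        problems.append("test_days and samples_per_day must be positive")
    sizes = split_sizes.get(cfg.dataset)
    if sizes is None:
        return problems  # unknown dataset: nothing to compare against
    train_size, test_size = sizes
    if cfg.init_index_size > train_size:
        problems.append("init_index_size exceeds the training split")
    # Assumption: test samples are not reused across days.
    if cfg.total_queries > test_size:
        problems.append("total queries exceed the test split")
    return problems
```

With the default values shown above, `check_config(ExperimentConfig())` reports no problems, since 1 × 1000 queries fit within the 10,765 test samples.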

Then run the experiment:

cd codes/test
python run_experiment.py

Note: On first run, the system will build the cluster index from the training data, which may take a while. The built index is automatically saved to saved_indices/ and will be reused on subsequent runs.
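The build-once, reuse-later behavior described in the note follows a common caching pattern. The sketch below illustrates that pattern only; it uses `pickle` as a stand-in serialization format and a hypothetical `load_or_build_index` helper, and does not reflect how the repository actually stores its index.

```python
import pickle
from pathlib import Path


def load_or_build_index(cache_path, build_fn):
    """Load a cached index if present; otherwise build it and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Subsequent runs: skip the expensive build step.
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    # First run: build from the training data, then persist the result.
    index = build_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as handle:
        pickle.dump(index, handle)
    return index
```

In this sketch, deleting the cached file forces a rebuild on the next call, which is useful after changing the training data or the index size.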

Step 6: Check Results

After the experiment completes, results are stored in the log/ directory.
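To locate the output of the most recent run, a small helper can list the newest files in that directory. This is a sketch under assumptions: the source does not describe the log file names or format, so the helper only sorts files by modification time, and its name is hypothetical.

```python
from pathlib import Path


def newest_files(log_dir, limit=5):
    """Return up to `limit` files under log_dir, most recently modified first."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    # Search recursively in case results are grouped into sub-folders.
    files = [path for path in root.rglob("*") if path.is_file()]
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return files[:limit]
```

For example, `newest_files("log", limit=1)` returns a one-element list with the latest file, or an empty list if the directory does not exist yet.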

About

Code and datasets for the paper "INSIDE: Internalization-aware LLM Serving In Dual-Speed Edge-Cloud Cache Evolution".
