Rank | Model | LLM | Corpus | Retriever | MMLU-Med (%) | MedQA-US (%) | MedMCQA (%) | PubMedQA* (%) | BioASQ-Y/N (%) | Average (%) |
---|---|---|---|---|---|---|---|---|---|---|
1 | GPT-4 (MedRAG) UVa & NIH (Xiong et al., 2024) | GPT-4-32k-0613 | MedCorp | RRF-4 | 87.24 | 82.80 | 66.65 | 70.60 | 92.56 | 79.97 |
2 | Llama-3 (MedRAG) UVa & NIH (Xiong et al., 2024) | Llama-3-70B | MedCorp | RRF-4 | 85.58 | 76.90 | 68.87 | 71.60 | 89.97 | 78.59 |
3 | Llama-3 (CoT) Meta (Meta et al., 2024) | Llama-3-70B | -- | -- | 85.77 | 80.91 | 70.93 | 59.00 | 83.01 | 75.92 |
4 | GPT-4 (CoT) OpenAI (OpenAI et al., 2023) | GPT-4-32k-0613 | -- | -- | 89.44 | 83.97 | 69.88 | 39.60 | 84.30 | 73.44 |
5 | GPT-3.5 (MedRAG) UVa & NIH (Xiong et al., 2024) | GPT-3.5-16k-0613 | MedCorp | RRF-4 | 75.48 | 66.61 | 58.04 | 67.40 | 90.29 | 71.57 |
6 | Gemini-1.0-Pro (MedRAG) UVa & NIH (Xiong et al., 2024) | Gemini-1.0-Pro | MedCorp | RRF-4 | 73.65 | 61.90 | 59.65 | 74.60 | 86.89 | 71.34 |
7 | Mixtral (MedRAG) UVa & NIH (Xiong et al., 2024) | Mixtral-8x7B | MedCorp | RRF-4 | 75.85 | 60.02 | 56.42 | 67.60 | 87.54 | 69.48 |
8 | Gemini-1.0-Pro (CoT) (Google et al., 2024) | Gemini-1.0-Pro | -- | -- | 72.54 | 60.49 | 55.44 | 46.40 | 76.86 | 62.35 |
9 | Mixtral (CoT) Mistral AI (Jiang et al., 2024) | Mixtral-8x7B | -- | -- | 74.01 | 64.10 | 56.28 | 35.20 | 77.51 | 61.42 |
10 | GPT-3.5 (CoT) OpenAI (Brown et al., 2020) | GPT-3.5-16k-0613 | -- | -- | 72.91 | 65.04 | 55.25 | 36.00 | 74.27 | 60.69 |
11 | MEDITRON (MedRAG) UVa & NIH (Xiong et al., 2024) | MEDITRON-70B | MedCorp | RRF-4 | 65.38 | 49.57 | 52.67 | 56.40 | 76.86 | 60.18 |
12 | MEDITRON (CoT) EPFL (Chen et al., 2023) | MEDITRON-70B | -- | -- | 64.92 | 51.69 | 46.74 | 53.40 | 68.45 | 57.04 |
13 | Llama-2 (MedRAG) UVa & NIH (Xiong et al., 2024) | Llama-2-70B | MedCorp | RRF-4 | 54.55 | 44.93 | 43.08 | 50.40 | 73.95 | 53.38 |
14 | PMC-LLaMA (MedRAG) UVa & NIH (Xiong et al., 2024) | PMC-LLaMA-13B | MedCorp | RRF-4 | 52.53 | 42.58 | 48.29 | 56.00 | 65.21 | 52.92 |
15 | PMC-LLaMA (CoT) SJTU (Wu et al., 2023) | PMC-LLaMA-13B | -- | -- | 52.16 | 44.38 | 46.55 | 55.80 | 63.11 | 52.40 |
16 | Llama-2 (CoT) Meta (Touvron et al., 2023) | Llama-2-70B | -- | -- | 57.39 | 47.84 | 42.60 | 42.20 | 61.17 | 50.24 |
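
The Average column is consistent with an unweighted mean of the five benchmark accuracies in each row. The following is a minimal sketch of that arithmetic, not the leaderboard's own scoring code: the helper name `macro_average` is hypothetical, and the scores are copied from the GPT-4 (MedRAG) row.

```python
from statistics import mean

# Per-benchmark accuracies (%) copied from the GPT-4 (MedRAG) row of the table.
gpt4_medrag = {
    "MMLU-Med": 87.24,
    "MedQA-US": 82.80,
    "MedMCQA": 66.65,
    "PubMedQA*": 70.60,
    "BioASQ-Y/N": 92.56,
}


def macro_average(scores):
    """Unweighted mean across benchmarks, rounded to two decimals.

    Assumption: every benchmark counts equally, regardless of its size.
    """
    return round(mean(scores.values()), 2)


print(macro_average(gpt4_medrag))  # 79.97, matching the Average column
```

Recomputing from the rounded scores shown in the table can differ from the listed average by 0.01 for some rows (for example, Llama-3 (MedRAG) gives 78.58 against a listed 78.59), presumably because the listed averages were computed from unrounded accuracies.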