TY - JOUR
T1 - Quantum many-body physics calculations with large language models
AU - Pan, Haining
AU - Mudur, Nayantara
AU - Taranto, William
AU - Tikhanovskaya, Maria
AU - Venugopalan, Subhashini
AU - Bahri, Yasaman
AU - Brenner, Michael P.
AU - Kim, Eun-Ah
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - Large language models (LLMs) have demonstrated abilities to perform complex tasks in multiple domains, including mathematical and scientific reasoning. We demonstrate that with carefully designed prompts, LLMs can accurately carry out key calculations in research papers in theoretical physics. We focus on a broadly used approximation method in quantum physics: the Hartree-Fock method, which requires an analytic multi-step calculation deriving an approximate Hamiltonian and the corresponding self-consistency equations. To carry out the calculations using LLMs, we design multi-step prompt templates that break down the analytic calculation into standardized steps with placeholders for problem-specific information. We evaluate GPT-4’s performance in executing the calculation for 15 papers from the past decade, demonstrating that, with the correction of intermediate steps, it can correctly derive the final Hartree-Fock Hamiltonian in 13 cases. Aggregating across all research papers, we find an average score of 87.5 (out of 100) on the execution of individual calculation steps. We further use LLMs to mitigate the two primary bottlenecks in this evaluation process: (i) extracting information from papers to fill in templates and (ii) automatic scoring of the calculation steps, demonstrating good results in both cases.
UR - https://www.scopus.com/pages/publications/85218494840
U2 - 10.1038/s42005-025-01956-y
DO - 10.1038/s42005-025-01956-y
M3 - Article
AN - SCOPUS:85218494840
SN - 2399-3650
VL - 8
JO - Communications Physics
JF - Communications Physics
IS - 1
M1 - 49
ER -