PARROT

About PARROT 🦜

PARROT (Practical And Realistic BenchmaRk for CrOss-System SQL Translation) was created to support the task of Cross-System SQL Translation (i.e., SQL-to-SQL translation), which involves adapting a query written for one database system into its functionally equivalent form for another.
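
To make the task concrete, here is a minimal sketch of what a single translation pair could look like. The `TranslationPair` class, its field names, and the example queries are hypothetical illustrations and are not taken from the PARROT dataset or its file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationPair:
    """Hypothetical container for one query and its equivalent in another dialect."""
    source_dialect: str
    target_dialect: str
    source_sql: str
    target_sql: str


# Illustrative pair (not from PARROT): row limiting is written differently.
# MySQL uses LIMIT; Oracle 12c and later accept FETCH FIRST n ROWS ONLY.
pair = TranslationPair(
    source_dialect="mysql",
    target_dialect="oracle",
    source_sql="SELECT name FROM employees ORDER BY salary DESC LIMIT 5",
    target_sql=(
        "SELECT name FROM employees ORDER BY salary DESC "
        "FETCH FIRST 5 ROWS ONLY"
    ),
)

print(f"{pair.source_dialect} -> {pair.target_dialect}")
print(pair.source_sql)
print(pair.target_sql)
```

Both queries return the five highest-paid employees, but neither statement is accepted as written by the other system, which is the kind of system-specific gap the benchmark targets.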

The main dataset comprises 598 translation pairs from 38 open-source benchmarks and real-world business services, specifically prepared to challenge system-specific SQL understanding.

News

Sept. 18, 2025: Our paper "PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation" has been accepted by NeurIPS 2025! 🎉 🎉 🎉
May 15, 2025: We have released PARROT-1.0 (28,003 translation pairs from 38 open-source benchmarks for extensive syntax testing) and published the leaderboard.

Surprise from PARROT

We have experimented with different LLMs that vary in (1) usage license, (2) parameter scale, and (3) task scope. These LLMs attain an average accuracy below 38.53%, underscoring the substantial challenges inherent to SQL-to-SQL translation and the pressing need for more advanced techniques.

Citation

@inproceedings{zhou2025parrot,
  author       = {Wei Zhou and
                  Guoliang Li and
                  Haoyu Wang and
                  Yuxing Han and
                  Xufei Wu and
                  Fan Wu and
                  Xuanhe Zhou},
  title        = {PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation},
  booktitle    = {NeurIPS},
  year         = {2025}
}

@article{zhou2025cracksql,
  author       = {Wei Zhou and
                  Yuyang Gao and
                  Xuanhe Zhou and
                  Guoliang Li},
  title        = {{Cracking SQL Barriers:} {An} LLM-based Dialect Translation System},
  journal      = {Proc. {ACM} Manag. Data},
  volume       = {3},
  number       = {3 (SIGMOD)},
  year         = {2025}
}

@article{zhou2025cracksqldemo,
  author       = {Wei Zhou and
                  Yuyang Gao and
                  Xuanhe Zhou and
                  Guoliang Li},
  title        = {CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models},
  journal      = {arXiv Preprint},
  url          = {https://arxiv.org/abs/2504.00882},
  year         = {2025}
}

We have publicly released PARROT along with detailed usage instructions. For more details, please visit the GitHub repository. To update the leaderboard, ensure that your paper or resource is publicly accessible and submit a pull request.
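
The leaderboards report two metrics, Acc_EX (dialect compatibility) and Acc_RES (result consistency), which this page does not define. The sketch below shows one plausible reading, stated as an assumption: Acc_EX counts translated queries that execute without error on the target system, and Acc_RES counts those whose results also match a reference query. SQLite stands in for the target system, and `run`, `score`, the table `t`, and the sample queries are hypothetical. The benchmark's actual evaluation scripts may differ, for example in which subsets each metric is computed over.

```python
import sqlite3


def run(conn, sql):
    """Return the query's rows sorted, or None if it fails to execute."""
    try:
        # Sorting makes the comparison order-insensitive (a simplification).
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def score(conn, pairs):
    """Score (reference_sql, translated_sql) pairs on the target system.

    Returns (acc_ex, acc_res) as percentages under the assumed definitions.
    """
    executable = 0
    consistent = 0
    for reference_sql, translated_sql in pairs:
        expected = run(conn, reference_sql)
        actual = run(conn, translated_sql)
        if actual is not None:
            executable += 1
            if actual == expected:
                consistent += 1
    total = len(pairs)
    return 100.0 * executable / total, 100.0 * consistent / total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

reference = "SELECT id FROM t ORDER BY id LIMIT 2"
pairs = [
    (reference, "SELECT id FROM t ORDER BY id LIMIT 2"),  # runs, same rows
    (reference, "SELECT id FROM t ORDER BY id"),          # runs, extra row
    (reference, "SELECT TOP 2 id FROM t ORDER BY id"),    # not valid in SQLite
]
acc_ex, acc_res = score(conn, pairs)
print(f"Acc_EX = {acc_ex:.2f}%, Acc_RES = {acc_res:.2f}%")
```

In this toy setup a query can be executable without being consistent, which is why the two numbers are tracked separately.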

Leaderboard - Dialect Compatibility (Acc_EX)

Model	Size	Accuracy (%)
Human Performance Translation Tool + Human DBAs		> 90.00
GPT-4o OpenAI	UNK	53.32
DeepSeek-V3 671B DeepSeek	671B	50.64
Claude 3.7 Sonnet Anthropic	UNK	48.09
DeepSeek-R1 671B DeepSeek	671B	44.42
DeepSeek-R1 32B DeepSeek	32B	41.98
o3-mini OpenAI	UNK	27.94
DeepSeek-Coder-V2 Lite DeepSeek	15.7B	24.84
DeepSeek-R1 7B DeepSeek	7B	17.03

Leaderboard - Result Consistency (Acc_RES)

Model	Size	Accuracy (%)
Human Performance Translation Tool + Human DBAs		> 90.00
o3-mini OpenAI	UNK	54.23
o1-preview OpenAI	UNK	48.69
DeepSeek-R1 671B DeepSeek	671B	40.52
DeepSeek-V3 671B DeepSeek	671B	32.65
Doubao 1.5 Pro Thinking Doubao	UNK	25.70
Claude 3.7 Sonnet Anthropic	UNK	22.74
GPT-4o OpenAI	UNK	21.87
DeepSeek-R1 32B DeepSeek	32B	16.91
Doubao 1.5 Pro Doubao	UNK	14.29

Leaderboard (Oracle) - Dialect Compatibility (Acc_EX)

Model	Size	Accuracy (%)
Human Performance Translation Tool + Human DBAs		> 90.00
Claude 3.7 Sonnet Anthropic	UNK	58.00
GPT-4o OpenAI	UNK	55.17
DeepSeek-V3 671B DeepSeek	671B	51.72
DeepSeek-R1 671B DeepSeek	671B	50.00
o3-mini OpenAI	UNK	43.10
DeepSeek-R1 32B DeepSeek	32B	39.66
DeepSeek-Coder-V2 Lite DeepSeek	15.7B	32.76
DeepSeek-R1 7B DeepSeek	7B	17.24

Leaderboard (MySQL) - Dialect Compatibility (Acc_EX)

Model	Size	Accuracy (%)
Human Performance Translation Tool + Human DBAs		> 90.00
DeepSeek-R1 32B DeepSeek	32B	58.82
DeepSeek-V3 671B DeepSeek	671B	55.88
GPT-4o OpenAI	UNK	50.00
DeepSeek-R1 671B DeepSeek	671B	44.12
Claude 3.7 Sonnet Anthropic	UNK	44.12
DeepSeek-Coder-V2 Lite DeepSeek	15.7B	32.35
DeepSeek-R1 7B DeepSeek	7B	20.59
o3-mini OpenAI	UNK	8.82