Research Papers
Published
Jun 6, 2024
Is MMLU Broken? A Deep Dive into AI Benchmark Errors
Aryo Pradipta Gema|Joshua Ong Jun Leang|Giwon Hong|Alessio Devoto|Alberto Carlo Maria Mancino|Rohit Saxena|Xuanli He|Yu Zhao|Xiaotang Du|Mohammad Reza Ghasemi Madani|Claire Barale|Robert McHardy|Joshua Harris|Jean Kaddour|Emile van Krieken|Pasquale Minervini
Published
Jun 6, 2024
Can AI Predict Court Judgments? An Indian Legal AI Breakthrough
Shubham Kumar Nigam|Anurag Sharma|Danush Khanna|Noel Shallum|Kripabandhu Ghosh|Arnab Bhattacharya
Published
Jun 6, 2024
AgentGym: Training AI Agents Like Olympians
Zhiheng Xi|Yiwen Ding|Wenxiang Chen|Boyang Hong|Honglin Guo|Junzhe Wang|Dingwen Yang|Chenyang Liao|Xin Guo|Wei He|Songyang Gao|Lu Chen|Rui Zheng|Yicheng Zou|Tao Gui|Qi Zhang|Xipeng Qiu|Xuanjing Huang|Zuxuan Wu|Yu-Gang Jiang
Published
Jun 6, 2024
Boosting AI with Limited Human Feedback: The Proto-RM Approach
Jinghan Zhang|Xiting Wang|Yiqiao Jin|Changyu Chen|Xinhao Zhang|Kunpeng Liu
Published
Jun 6, 2024
The Unexpected Upside of AI Hallucinations
Peiqi Sui|Eamon Duede|Sophie Wu|Richard Jean So
Published
Jun 6, 2024
Catching AI Cheaters: How DICE Spots Data Contamination
Shangqing Tu|Kejian Zhu|Yushi Bai|Zijun Yao|Lei Hou|Juanzi Li
Published
Jun 6, 2024
Can AI Write Legal Documents? We Fine-Tuned a Large Language Model to Find Out
Chun-Hsien Lin|Pu-Jen Cheng
Published
Jun 6, 2024
Training Agents Like Large Language Models
Adam Jelley|Yuhan Cao|Dave Bignell|Sam Devlin|Tabish Rashid
Published
Jun 6, 2024
Can AI Learn Human Values? A New Benchmark Puts LLMs to the Test
Yuanyi Ren|Haoran Ye|Hanjun Fang|Xin Zhang|Guojie Song