What the new paper already has:

  - Sentiment analysis (SST-2): GLUE task 2
  - Linguistic acceptability (CoLA): GLUE task 1
  - Question-answering NLI (QNLI): GLUE task 7
  - Winograd schema challenge (WNLI): GLUE task 9
  - Textual entailment (RTE): GLUE task 8

Full list of GLUE (General Language Understanding Evaluation) benchmark tasks, the standard suite of 9 tasks:

🧩 Single-Sentence Tasks

  1. CoLA (Corpus of Linguistic Acceptability)
  2. SST-2 (Stanford Sentiment Treebank)

🔁 Similarity and Paraphrase Tasks

  3. MRPC (Microsoft Research Paraphrase Corpus)
  4. QQP (Quora Question Pairs)
  5. STS-B (Semantic Textual Similarity Benchmark)

🧠 Inference Tasks

  6. MNLI (Multi-Genre Natural Language Inference)
  7. QNLI (Question Natural Language Inference)
  8. RTE (Recognizing Textual Entailment)
  9. WNLI (Winograd NLI)
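
To see which GLUE tasks the new paper still lacks, the two lists above can be compared directly. The following is a minimal sketch in plain Python. The group keys and the `missing_tasks` helper are illustrative names chosen here, not part of any library. The `covered` set assumes the five tasks noted at the top are the complete set the paper reports.

```python
# GLUE tasks grouped as in the list above (task names taken from that list).
GLUE_TASKS = {
    "single_sentence": ["CoLA", "SST-2"],
    "similarity_paraphrase": ["MRPC", "QQP", "STS-B"],
    "inference": ["MNLI", "QNLI", "RTE", "WNLI"],
}

# Assumption: these five are everything the new paper already covers.
covered = {"SST-2", "CoLA", "QNLI", "WNLI", "RTE"}


def missing_tasks(tasks_by_group, covered_tasks):
    """Return, per group, the tasks not in covered_tasks (list order preserved)."""
    return {
        group: [task for task in tasks if task not in covered_tasks]
        for group, tasks in tasks_by_group.items()
    }


missing = missing_tasks(GLUE_TASKS, covered)
for group, tasks in missing.items():
    print(f"{group}: {', '.join(tasks) if tasks else 'all covered'}")
```

Under that assumption, the gaps are the three similarity and paraphrase tasks (MRPC, QQP, STS-B) plus MNLI.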