[New Task] Add ASSIN2 (Brazilian Portuguese) — RTE and STS subtasks #3765

### **Summary**
This issue proposes adding ASSIN2, a standard Brazilian Portuguese benchmark, to the harness. It covers two subtasks:

- ASSIN2-RTE: Recognizing Textual Entailment, binary (entailment vs. none) (6,500 train / 500 validation / 2,448 test sentence pairs, F1-macro metric)
- ASSIN2-STS: Semantic Textual Similarity on the same sentence pairs, with relatedness scored on a 1 to 5 scale (Pearson correlation metric)
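For reference, both metrics can be computed as follows. This is a minimal, dependency-free sketch with illustrative function names; an actual harness config would more likely reuse existing implementations such as scikit-learn's `f1_score(..., average="macro")` and SciPy's `scipy.stats.pearsonr`.

```python
import math
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    # Macro averaging: each class counts equally, regardless of its frequency.
    return sum(f1s) / len(f1s)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    gold = ["Entailment", "None", "Entailment", "None"]
    pred = ["Entailment", "Entailment", "Entailment", "None"]
    print(macro_f1(gold, pred))  # ≈ 0.733 (F1 of 0.8 and 0.667, averaged)
    print(pearson([1, 2, 3], [2, 4, 6]))  # ≈ 1.0
```

Macro averaging matters here because it weights the entailment and non-entailment classes equally, so a model cannot score well by favoring the majority label.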

Dataset available at nilc-nlp/assin2 on Hugging Face.

### **Motivation**
ASSIN2 is the successor to ASSIN v1 (already in portuguese_bench) and has been one of the most widely used Brazilian Portuguese NLP benchmarks since 2020. Adding it would extend the existing portuguese_bench task group with a more recent evaluation suite and would complement the recently proposed ifeval_pt (#3622).

### **Implementation plan**

- Two task configs: assin2_rte and assin2_sts under portuguese_bench
- Dataset: nilc-nlp/assin2
- RTE: few-shot multiple choice, F1-macro
- STS: generation, Pearson correlation
- Happy to implement if maintainers are interested
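A rough sketch of what the document-processing helpers for the two configs could look like. Everything here is hypothetical: the function names, the Portuguese prompt wording, and the assumed dataset fields (`premise`, `hypothesis`, `entailment_judgment` with 0 = NONE and 1 = ENTAILMENT, `relatedness_score` in [1, 5]) should be checked against the nilc-nlp/assin2 dataset card before use.

```python
import re
from typing import Optional

# Hypothetical helpers in the style of a task utils.py. Field names and label
# order are assumptions based on the nilc-nlp/assin2 dataset card.

RTE_CHOICES = ["Não", "Sim"]  # assumed: index 0 = NONE, index 1 = ENTAILMENT


def rte_doc_to_text(doc: dict) -> str:
    # Hypothetical prompt for the multiple-choice RTE config.
    return (
        f"Premissa: {doc['premise']}\n"
        f"Hipótese: {doc['hypothesis']}\n"
        "A premissa implica a hipótese? Resposta:"
    )


def rte_doc_to_target(doc: dict) -> int:
    # Multiple choice: the target is an index into RTE_CHOICES.
    return int(doc["entailment_judgment"])


def sts_doc_to_text(doc: dict) -> str:
    # Hypothetical prompt for the generative STS config.
    return (
        f"Frase 1: {doc['premise']}\n"
        f"Frase 2: {doc['hypothesis']}\n"
        "Em uma escala de 1 a 5, quão semelhantes são as frases? Resposta:"
    )


# First number in the generation; accepts "4.5" or the Portuguese "4,5".
_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def sts_parse_score(generation: str) -> Optional[float]:
    """Extract the first number from a generation and clamp it to [1, 5]."""
    match = _NUMBER.search(generation)
    if match is None:
        return None
    value = float(match.group().replace(",", "."))
    return min(5.0, max(1.0, value))


if __name__ == "__main__":
    doc = {
        "premise": "Um homem está tocando violão.",
        "hypothesis": "Um homem está tocando um instrumento.",
        "entailment_judgment": 1,
        "relatedness_score": 4.5,
    }
    print(rte_doc_to_text(doc))
    print(RTE_CHOICES[rte_doc_to_target(doc)])  # Sim
    print(sts_parse_score(" 4,5"))  # 4.5
```

Generations that contain no number return `None`; whether to drop those, count them as failures, or map them to a fallback value is a design choice worth settling with the maintainers, since it directly affects the Pearson score.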

### **References**

- Dataset: huggingface.co/datasets/nilc-nlp/assin2
- Paper: Real et al., 2020, "The ASSIN 2 Shared Task: A Quick Overview"
