Cross-Slavic Transfer Analysis Framework
1. Core Experimental Setup
Three main conditions:
- Zero-shot transfer: Model trained on Russian/Ukrainian, tested on Belarusian (no Belarusian training data)
- Few-shot transfer: Model trained on Russian/Ukrainian + small Belarusian sample (50/100/500 examples)
- Fully supervised: Model trained entirely on Belarusian
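The three conditions differ only in how the training set is assembled, which can be sketched as below. This is a minimal illustration, not a fixed implementation: the function names, the condition labels, and the choice to draw nested Belarusian samples (so the 100-example sample contains the 50-example one) are assumptions made here.

```python
import random

FEW_SHOT_SIZES = (50, 100, 500)  # Belarusian sample sizes from the few-shot condition


def few_shot_subsets(belarusian_pool, sizes=FEW_SHOT_SIZES, seed=0):
    """Draw nested Belarusian samples: each larger sample contains the smaller ones."""
    if max(sizes) > len(belarusian_pool):
        raise ValueError("Belarusian pool is smaller than the largest requested sample")
    shuffled = list(belarusian_pool)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps samples reproducible
    return {k: shuffled[:k] for k in sizes}


def training_set(condition, source_data, belarusian_pool, k=None, seed=0):
    """Assemble the training examples for one of the three conditions."""
    if condition == "zero_shot":
        return list(source_data)  # Russian/Ukrainian only, no Belarusian data
    if condition == "few_shot":
        if k is None:
            raise ValueError("few_shot requires a sample size k")
        sample = few_shot_subsets(belarusian_pool, sizes=(k,), seed=seed)[k]
        return list(source_data) + sample
    if condition == "fully_supervised":
        return list(belarusian_pool)  # Belarusian only
    raise ValueError(f"unknown condition: {condition!r}")
```

Nesting the samples means differences between the 50, 100, and 500 settings reflect added data rather than a different random draw.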
Models to test:
- Multilingual models: XLM-R, mDeBERTa-v3, mBERT
- Language-specific: Russian BERT, Ukrainian BERT, Belarusian HPLT BERT
- LLMs (evaluated via prompting): Gemma 2, Llama 3, other multilingual models
2. Transfer Directions to Compare
- Russian → Belarusian
- Ukrainian → Belarusian
- Polish → Belarusian (bonus: West Slavic baseline)
- Multi-Slavic (Ru+Uk+Pl) → Belarusian
- Belarusian only (upper bound)
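Crossing these directions with the few-shot sizes and tasks gives the full set of runs. The sketch below enumerates that grid. The dictionary keys, the language-code labels, and the decision to treat the Belarusian-only upper bound as a separate run per task are assumptions for illustration.

```python
from itertools import product

# Source languages per transfer direction (ISO 639-1 codes used as labels).
TRANSFER_DIRECTIONS = {
    "ru->be": ("ru",),
    "uk->be": ("uk",),
    "pl->be": ("pl",),
    "multi->be": ("ru", "uk", "pl"),
}
TASKS = ("CB", "COPA", "BoolQ")
BELARUSIAN_SHOTS = (0, 50, 100, 500)  # 0 = zero-shot


def experiment_grid():
    """List every (direction, Belarusian sample size, task) run, plus the upper bound."""
    runs = []
    for (name, sources), shots, task in product(
        TRANSFER_DIRECTIONS.items(), BELARUSIAN_SHOTS, TASKS
    ):
        runs.append(
            {"direction": name, "sources": sources, "belarusian_shots": shots, "task": task}
        )
    # Belarusian-only upper bound: one run per task, no source-language data.
    for task in TASKS:
        runs.append(
            {"direction": "be-only", "sources": ("be",), "belarusian_shots": None, "task": task}
        )
    return runs
```

Under these assumptions the grid has 4 directions × 4 sample sizes × 3 tasks, plus 3 upper-bound runs, per model.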
Key research questions:
- Does Ukrainian transfer better than Russian? (Ukrainian is geographically/politically closer, but Russian is more prevalent)
- How much Belarusian data is needed to match Russian-only performance?
- Do different tasks show different transfer patterns?
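The second question can be read as a threshold search over the tested sample sizes. The helper below is a hypothetical sketch of that reading: it assumes scores are "higher is better" and that the reference is the Russian-only transfer score on the same task.

```python
def min_examples_to_match(scores_by_size, reference_score):
    """Return the smallest tested Belarusian sample size whose score
    reaches the reference score, or None if no tested size does."""
    for size in sorted(scores_by_size):
        if scores_by_size[size] >= reference_score:
            return size
    return None
```

The answer is only as fine-grained as the sizes tested (50/100/500 in the setup above); a result of `None` means more Belarusian data than the largest tested sample is needed.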
3. Task-Specific Transfer Analysis
For each task (CB, COPA, BoolQ), measure:
A) Performance gaps:
Transfer deficit = Belarusian-only score - Transfer score
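The transfer deficit can be computed per task as in the sketch below. The function names and the dictionary layout (task name mapped to score) are assumptions; the formula itself is the one stated above.

```python
def transfer_deficit(belarusian_only_score, transfer_score):
    """Transfer deficit = Belarusian-only score - transfer score.
    A positive value means transfer falls short of the upper bound."""
    return belarusian_only_score - transfer_score


def deficits_by_task(belarusian_only, transfer):
    """Apply the deficit formula to every task in two {task: score} tables."""
    if set(belarusian_only) != set(transfer):
        raise ValueError("both score tables must cover the same tasks")
    return {
        task: transfer_deficit(belarusian_only[task], transfer[task])
        for task in belarusian_only
    }
```

A negative deficit would indicate that the transfer model outperforms the Belarusian-only model on that task.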