- Simple GLA

The gating mechanism is from https://arxiv.org/abs/2103.02143. Compared to GLA, the gate is head-wise rather than element-wise. As a result, the RetNet kernel can be adapted for training with plain matmuls and without numerical instability. Simple GLA is faster than GLA but has less expressive power; I will use it as a baseline for GLA.

$S_{t+1} = g_{t+1} \odot S_{t} + K_{t+1} V_{t+1}^{\top}$, where $g_{t+1}$ is a scalar (one gate per head).
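
To make the recurrence concrete, here is a minimal recurrent sketch in PyTorch. It is illustrative only: the function name, tensor layout, and the assumption that `g` holds the multiplicative gate values directly are my own choices and do not necessarily match the kernels in `chunk.py` or the reference in `naive.py`.

```python
import torch

def simple_gla_recurrent(q, k, v, g):
    """Naive reference for S_{t+1} = g_{t+1} * S_t + k_{t+1} v_{t+1}^T, o_{t+1} = q_{t+1} S_{t+1}.

    Hypothetical shapes:
        q, k: (batch, heads, seq_len, d_k)
        v:    (batch, heads, seq_len, d_v)
        g:    (batch, heads, seq_len)  -- one scalar gate per head per step
    """
    B, H, T, Dk = k.shape
    Dv = v.shape[-1]
    S = k.new_zeros(B, H, Dk, Dv)                      # recurrent state
    outputs = []
    for t in range(T):
        gate = g[:, :, t][..., None, None]             # head-wise scalar gate, broadcast over the state
        outer = k[:, :, t, :, None] * v[:, :, t, None, :]  # rank-1 update K_t V_t^T
        S = gate * S + outer
        outputs.append(torch.einsum('bhk,bhkv->bhv', q[:, :, t], S))
    return torch.stack(outputs, dim=2)                 # (batch, heads, seq_len, d_v)
```

Because the gate is a single scalar per head, the decay factors between any two time steps collapse into one matrix of cumulative products, which is what allows the chunked, matmul-based RetNet-style kernel mentioned above.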