theluyuan/RWKV-Runner

Latest commit: josc146 510683c57e remove enableHighPrecisionForLastLayer (2023-06-09 20:49:45 +08:00)

.github/workflows (2023-06-09 20:37:05 +08:00)
.vscode: dev config (2023-06-05 22:57:01 +08:00)
backend-golang: improve update process for macOS and Linux (2023-06-09 20:38:19 +08:00)
backend-python: add logs for state cache and switch-model (2023-06-09 20:46:19 +08:00)
build: update Readme_Install.txt (2023-06-08 17:11:11 +08:00)
deploy-examples/ChatGPT-Next-Web: deploy example for linux (2023-06-08 00:07:08 +08:00)
frontend: remove enableHighPrecisionForLastLayer (2023-06-09 20:49:45 +08:00)
.gitattributes: upload .gitattributes (2023-05-30 13:17:45 +08:00)
.gitignore: add logs (2023-06-03 17:12:59 +08:00)
exportModelsJson.js: update manifest.json (2023-05-07 16:09:16 +08:00)
go.mod: update (2023-05-17 23:27:52 +08:00)
go.sum: update (2023-05-17 23:27:52 +08:00)
LICENSE: navigate card (2023-05-05 13:41:54 +08:00)
main.go: chore (2023-05-31 14:14:25 +08:00)
Makefile: dev config (2023-06-05 22:57:01 +08:00)
manifest.json: update manifest.json (2023-06-07 19:45:53 +08:00)
README_ZH.md: update readme (2023-06-09 12:08:09 +08:00)
README.md: update readme (2023-06-09 12:08:09 +08:00)
vendor.yml: upload vendor.yml (2023-05-30 10:35:24 +08:00)
wails.json: init (2023-05-03 23:38:54 +08:00)

README.md

RWKV Runner

This project aims to remove the barriers to using large language models by automating everything for you. All you need is a lightweight executable of just a few megabytes. The project also provides an OpenAI-compatible API, which means every ChatGPT client is an RWKV client.

English | 简体中文

Install

FAQs | Preview | Download | Server-Deploy-Examples

The default configs enable custom CUDA kernel acceleration, which is much faster and uses much less VRAM. If you run into compatibility issues, go to the Configs page and turn off `Use Custom CUDA kernel to Accelerate`.

If Windows Defender flags the program as a virus, you can try downloading v1.0.8 or v1.0.9 and letting it update itself to the latest version, or add it to the trusted list.

Adjusting the API parameters for each task can give better results. For example, for translation tasks, try setting Temperature to 1 and Top_P to 0.3.
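As a minimal sketch, assuming the local server accepts the standard OpenAI `temperature` and `top_p` request fields at `http://127.0.0.1:8000/chat/completions` (the address used elsewhere in this README), the translation settings above could be sent from Python as follows. The helper names `build_chat_body` and `post_chat` are hypothetical and not part of RWKV Runner.

```python
import json
import urllib.request

# Assumption: RWKV Runner's local server uses this default address and accepts
# the OpenAI-style "temperature" and "top_p" request fields.
BASE_URL = "http://127.0.0.1:8000"


def build_chat_body(prompt: str, temperature: float = 1.0, top_p: float = 0.3) -> dict:
    """Build an OpenAI-style chat request body with task-specific sampling settings.

    The defaults mirror the translation suggestion: Temperature 1, Top_P 0.3.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }


def post_chat(body: dict, base_url: str = BASE_URL) -> dict:
    """POST `body` to /chat/completions and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_chat(build_chat_body("Translate to French: Good morning"))` only works once a model has been started; other tasks can pass their own `temperature` and `top_p` values.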

Features

RWKV model management and one-click startup
Fully compatible with the OpenAI API, making every ChatGPT client an RWKV client. After starting the model, open http://127.0.0.1:8000/docs to view more details.
Automatic dependency installation, requiring only a lightweight executable program
Preset configs for 2 GB to 32 GB of VRAM, so it works well on almost any computer
User-friendly chat and completion interface
Parameter configuration that is easy to understand and operate
Built-in model conversion tool
Built-in download management and remote model inspection
Multilingual localization
Theme switching
Automatic updates
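Because the API follows the OpenAI format, client code that reads `choices[0].message.content` should work unchanged. The sketch below assumes RWKV Runner returns that standard non-streaming chat completion shape; `extract_reply` is a hypothetical helper, and the exact response fields can be confirmed at http://127.0.0.1:8000/docs.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion response.

    Assumes the shape {"choices": [{"message": {"role": ..., "content": ...}}]}.
    Raises ValueError if the response does not contain a text reply.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if not isinstance(content, str):
        raise ValueError("first choice has no text content")
    return content
```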

API Concurrency Stress Testing

ab -p body.json -T application/json -c 20 -n 100 -l http://127.0.0.1:8000/chat/completions

body.json:

{
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ]
}
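For environments without ApacheBench, a rough Python analogue of the `ab` command above (20 concurrent workers, 100 POST requests with the same `body.json` payload) might look like the sketch below. It is a simplified stand-in, not the project's own tooling: `run_load_test` and `make_sender` are hypothetical names, it reports only success counts and mean/max latency rather than ab's full statistics, and it counts any exception (including the `HTTPError` that `urllib` raises for 4xx/5xx responses) as a failed request.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_sender(url: str, body: bytes) -> Callable[[], int]:
    """Return a function that POSTs `body` as JSON to `url` and returns the HTTP status."""
    def send() -> int:
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the response body before closing
            return response.status
    return send


def run_load_test(send: Callable[[], int], total: int = 100, concurrency: int = 20) -> dict:
    """Call `send` `total` times across `concurrency` threads and summarize the results.

    Latencies are wall-clock seconds per request; exceptions count as failures.
    """
    if total < 1:
        raise ValueError("total must be at least 1")

    def timed() -> tuple[int, float]:
        start = time.perf_counter()
        try:
            status = send()
        except Exception:
            status = 0  # network or HTTP errors are recorded as failures
        return status, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed(), range(total)))

    latencies = sorted(elapsed for _, elapsed in results)
    succeeded = sum(1 for status, _ in results if 200 <= status < 300)
    return {
        "requests": total,
        "succeeded": succeeded,
        "failed": total - succeeded,
        "mean_latency": sum(latencies) / total,
        "max_latency": latencies[-1],
    }
```

With a model running, `run_load_test(make_sender("http://127.0.0.1:8000/chat/completions", open("body.json", "rb").read()))` mirrors the `-c 20 -n 100` settings. Threads are sufficient here because each worker spends nearly all its time waiting on the network.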

Todo

Model training functionality
CUDA operator int8 acceleration
macOS support
Linux support
Local State Cache DB

Related Repositories:

RWKV-4-Raven: https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main
ChatRWKV: https://github.com/BlinkDL/ChatRWKV
RWKV-LM: https://github.com/BlinkDL/RWKV-LM

Preview

Homepage

Chat

Completion

Configuration

Model Management

Download Management

Settings

Languages

TypeScript 66.9%, Python 19.9%, Go 8%, Ruby 1.8%, JavaScript 1.3%, Other 2%
