Support using a directory as training data

2023-07-07 21:57:01 +08:00
parent 6fbb86667c
commit bcb125e168
4 changed files with 79 additions and 18 deletions
--- a/frontend/src/_locales/zh-hans/main.json
+++ b/frontend/src/_locales/zh-hans/main.json
@@ -231,5 +231,6 @@
  "You are using WSL 1 for training, please upgrade to WSL 2. e.g. Run \"wsl --set-version Ubuntu-22.04 2\"": "你正在使用WSL 1进行训练，请升级到WSL 2。例如，运行\"wsl --set-version Ubuntu-22.04 2\"",
  "Matched CUDA is not installed": "未安装匹配的CUDA",
  "Failed to convert data": "数据转换失败",
-  "Failed to merge model": "合并模型失败"
+  "Failed to merge model": "合并模型失败",
+  "The data path should be a directory or a file in jsonl format (more formats will be supported in the future).\n\nWhen you provide a directory path, all the txt files within that directory will be automatically converted into training data. This is commonly used for large-scale training in writing, code generation, or knowledge bases.\n\nThe jsonl format file can be referenced at https://github.com/Abel2076/json2binidx_tool/blob/main/sample.jsonl.\nYou can also write it similar to OpenAI's playground format, as shown in https://platform.openai.com/playground/p/default-chat.\nEven for multi-turn conversations, they must be written in a single line using `\\n` to indicate line breaks. If they are different dialogues or topics, they should be written in separate lines.": "数据路径必须是一个文件夹，或者jsonl格式文件 (未来会支持更多格式)\n\n当你填写的路径是一个文件夹时，该文件夹内的所有txt文件会被自动转换为训练数据，通常这用于大批量训练写作，代码生成或知识库\n\njsonl文件的格式参考 https://github.com/Abel2076/json2binidx_tool/blob/main/sample.jsonl\n你也可以仿照openai的playground编写，参考 https://platform.openai.com/playground/p/default-chat\n即使是多轮对话也必须写在一行，用`\\n`表示换行，如果是不同对话或主题，则另起一行"
 }
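
Note on the jsonl rule in the new string above: each conversation must occupy exactly one line, with line breaks inside it written as `\n`, and separate dialogues or topics go on separate lines. A minimal Python sketch of that rule follows. The `"text"` key, the `Speaker: message` layout, and both function names are assumptions made for illustration and are not taken from this commit; check sample.jsonl for the exact record shape.

```python
import json


def conversation_to_jsonl_line(turns):
    """Serialize one multi-turn conversation as a single JSON line.

    `turns` is a list of (speaker, message) pairs. The "text" key and the
    "Speaker: message" layout are assumptions for illustration only.
    """
    text = "\n\n".join(f"{speaker}: {message}" for speaker, message in turns)
    # json.dumps escapes real newlines as \n, so the record stays on one line.
    return json.dumps({"text": text}, ensure_ascii=False)


def conversations_to_jsonl(conversations):
    """Put each conversation (dialogue or topic) on its own line."""
    return "\n".join(conversation_to_jsonl_line(t) for t in conversations) + "\n"


if __name__ == "__main__":
    dialogues = [
        [("User", "Hi"), ("Assistant", "Hello, how can I help?")],
        [("User", "What is WSL 2?"), ("Assistant", "A Linux environment on Windows.")],
    ]
    print(conversations_to_jsonl(dialogues), end="")
```

Writing the returned string to a file gives one record per line, which matches the single-line requirement stated in the locale text.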