support openai batch api #423
Conversation
CONTEXT_PARAGRAPH_LIMIT = 3
BATCH_CONTEXT_UPDATE_INTERVAL = 50
let's make a config.py file for these?
@yihong0618
I created config.py and modified the code to use it. Since this is my first time creating config.py, I'm not sure if I'm using it correctly. Could you please review it?
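For reference, a minimal sketch of what the new config.py presumably contains, based only on the two constants visible in the diff above (the module location and any other settings in the real file are not shown here):

```python
# config.py -- sketch reconstructed from the constants shown in the diff;
# the actual file in this PR may hold additional settings.

# Maximum number of earlier paragraphs to attach as translation context
CONTEXT_PARAGRAPH_LIMIT = 3

# Rebuild the batch context messages once every N paragraphs
BATCH_CONTEXT_UPDATE_INTERVAL = 50
```

Callers would then import the values, e.g. `from config import CONTEXT_PARAGRAPH_LIMIT, BATCH_CONTEXT_UPDATE_INTERVAL` (the exact import path depends on where the file sits in the package).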
# Replace any characters that are not alphanumeric, underscore, hyphen, or dot with an underscore
sanitized_book_name = re.sub(r"[^\w\-_\.]", "_", book_name)
# Remove leading and trailing underscores and dots
sanitized_book_name = sanitized_book_name.strip("._")
nice
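For readers following along, here is the snippet above wrapped into a standalone function (the name matches the `sanitize_book_name` call that appears later in the diff), with a quick example of its effect:

```python
import re


def sanitize_book_name(book_name):
    # Replace any characters that are not alphanumeric, underscore, hyphen, or dot with an underscore
    sanitized_book_name = re.sub(r"[^\w\-_\.]", "_", book_name)
    # Remove leading and trailing underscores and dots
    return sanitized_book_name.strip("._")


print(sanitize_book_name("My Book: Part 1!.epub"))  # -> My_Book__Part_1_.epub
```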
@@ -388,3 +407,224 @@ def set_model_list(self, model_list):
        model_list = list(set(model_list))
        print(f"Using model list {model_list}")
        self.model_list = cycle(model_list)

    def batch_init(self, book_name):
        self.book_name = self.sanitize_book_name(book_name)
I think this name does not support Windows.
Can we use pathlib or os.path?
I've fixed it
#423 (comment)
return f"{os.getcwd()}/batch_files/{self.book_name}_info.json" | ||
|
||
def batch_dir(self): | ||
return f"{os.getcwd()}/batch_files/{self.book_name}" |
Ditto, this doesn't seem to support Windows file names.
I fixed it
f1e78eb
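For reference, a sketch of the pathlib-based construction the review asks for; the actual change is in f1e78eb, and the name of the info-file helper below is hypothetical since it is not visible in this hunk:

```python
from pathlib import Path


def batch_dir(self):
    # Path builds the separators correctly on Windows as well as POSIX systems
    return Path.cwd() / "batch_files" / self.book_name


def batch_metadata_file(self):  # hypothetical name for the *_info.json helper
    return Path.cwd() / "batch_files" / f"{self.book_name}_info.json"
```

Callers that expect a plain string can wrap the result in `str(...)`, or `os.path.join` can be used instead.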
Improving EPUB Translation Efficiency with ChatGPT Batch API Implementation
This PR implements functionality that significantly improves the efficiency of the EPUB translation process by utilizing ChatGPT's batch API. The main changes are as follows:
Major Feature Additions
Implementation of Batch Translation Feature
- `--batch` option: Batch process translations using ChatGPT's batch API
- `--batch-use` option: Create files using pre-generated batch translation results

Batch Processing Workflow
Batch Processing Mechanism
Pattern for Creating Batches (`--batch` option)
1. Initialization: `batch_init` method
2. Creation of Translation Queue: `add_to_batch_translate_queue` method
3. Generation of Batch Files: `BATCH_CONTEXT_UPDATE_INTERVAL`, `create_batch_context_messages` method, `create_batch_files` method
4. Batch Execution: `batch` method
5. Saving Batch Information

Pattern for Using Batches (`--batch-use` option)
1. Checking Batch Results: `is_completed_batch` method
2. Retrieval and Processing of Results: `batch_translate` method (see the code sketch after this list)
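As a reading aid, the sketch below shows the shape of that create/consume cycle. The method names follow the lists above and the JSONL layout follows OpenAI's documented Batch API input format, but the directory name, model, and message construction are placeholder assumptions rather than a copy of the PR's code:

```python
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()
batch_dir = Path.cwd() / "batch_files" / "my_book"  # placeholder book name


def create_batch_files(messages_per_paragraph, model="gpt-3.5-turbo"):
    """Write one Batch API request per paragraph into a JSONL file."""
    batch_dir.mkdir(parents=True, exist_ok=True)
    path = batch_dir / "1.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for i, messages in enumerate(messages_per_paragraph):
            request = {
                "custom_id": f"paragraph-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {"model": model, "messages": messages},
            }
            f.write(json.dumps(request, ensure_ascii=False) + "\n")
    return path


def batch(path):
    """Upload the JSONL file and start a batch job (--batch pattern)."""
    uploaded = client.files.create(file=path.open("rb"), purpose="batch")
    job = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    return job.id  # would be saved into the {book_name}_info.json file


def is_completed_batch(batch_id):
    """Check whether the batch has finished (--batch-use precondition)."""
    return client.batches.retrieve(batch_id).status == "completed"


def batch_translate(batch_id):
    """Map custom_id -> translated text from the batch output file."""
    job = client.batches.retrieve(batch_id)
    raw = client.files.content(job.output_file_id).text
    results = {}
    for line in raw.splitlines():
        item = json.loads(line)
        choice = item["response"]["body"]["choices"][0]
        results[item["custom_id"]] = choice["message"]["content"]
    return results
```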
Context Generation for Batch Processing
Batch processing implements a different context management method compared to normal sequential processing:
- Setting Context Update Interval: `BATCH_CONTEXT_UPDATE_INTERVAL` (default 50)
- Specificity of Context Generation
  - Context Generation Logic: up to `context_paragraph_limit` paragraphs of 100+ words, tracing back from the index of the text to be translated
  - Efficient Updates and Caching: `create_batch_context_messages` method (see the sketch below)
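A sketch of that selection rule, not the PR's actual code: the constant names are taken from config.py, while the import path, cache shape, and exact message format are assumptions:

```python
from config import BATCH_CONTEXT_UPDATE_INTERVAL, CONTEXT_PARAGRAPH_LIMIT

_context_cache = {}  # hypothetical cache keyed by the update-interval bucket


def create_batch_context_messages(index, paragraphs):
    """Collect up to CONTEXT_PARAGRAPH_LIMIT long paragraphs before `index`,
    recomputed only once per BATCH_CONTEXT_UPDATE_INTERVAL paragraphs."""
    bucket = index // BATCH_CONTEXT_UPDATE_INTERVAL
    if bucket in _context_cache:
        return _context_cache[bucket]

    context = []
    for text in reversed(paragraphs[:index]):
        if len(text.split()) >= 100:      # only paragraphs of 100+ words
            context.insert(0, text)       # keep the original reading order
            if len(context) == CONTEXT_PARAGRAPH_LIMIT:
                break

    messages = (
        [
            {"role": "user", "content": "\n\n".join(context)},
            {"role": "assistant", "content": "OK."},
        ]
        if context
        else []
    )
    _context_cache[bucket] = messages
    return messages
```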
Differences from Existing Behavior
This method optimizes processing efficiency while maintaining appropriate context information during batch processing. However, due to the nature of batch processing, there may be slight differences in translation quality compared to normal sequential processing as the context generation method differs.
Files and Directories Created by Batch Processing
Batch processing creates the following directory structure and files:
- Root Directory: the current working directory (`os.getcwd()`)
- batch_files Directory: `{current_working_directory}/batch_files/`
- `{book_name}_info.json`: `{current_working_directory}/batch_files/{book_name}_info.json`
- `{book_name}` Directory: `{current_working_directory}/batch_files/{book_name}/`
- Batch Request Files (`{number}.jsonl`): `{current_working_directory}/batch_files/{book_name}/{number}.jsonl` (an example layout follows)
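For a book whose sanitized name were, say, my_book (purely illustrative), the resulting layout would look like this:

```
{current_working_directory}/
└── batch_files/
    ├── my_book_info.json      # saved batch information
    └── my_book/
        ├── 1.jsonl            # batch request files
        ├── 2.jsonl
        └── ...
```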
Notes:
- `{book_name}` is a safe directory name generated from the original file name (special characters replaced with '_')

Technical Improvements
- `ChatGPTAPI` class: Addition of batch processing-related methods

Expected Effects
Important Notes
- Run the batch translation with the `--batch` option first, then use the results with the `--batch-use` option.

Usage Examples
1.a Executing batch translation (pattern without using context):
1.b Executing batch translation (pattern using context):
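A sketch of the corresponding commands. The entry point and common flags follow the project's usual CLI examples, and the flag used to enable context in pattern 1.b is an assumption; only `--batch` and `--batch-use` are the options added by this PR:

```sh
# 1.a  Submit a batch translation without context
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${OPENAI_API_KEY} --language zh-hans --batch

# 1.b  Submit a batch translation with context (assuming the existing --use_context flag)
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${OPENAI_API_KEY} --language zh-hans --use_context --batch

# Later, once the batch has completed, build the bilingual EPUB from the results
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${OPENAI_API_KEY} --language zh-hans --batch-use
```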
This implementation significantly improves processing efficiency, especially for EPUB files containing large amounts of text. Users can manage the translation process more flexibly, and overall performance is improved. By using batch processing, large-scale translation tasks can be executed efficiently and API requests can be optimized.