
[Old models] Gibberish text at the end of chat/completion - server #6847

Closed
QueryType opened this issue Apr 23, 2024 · 3 comments

Comments


QueryType commented Apr 23, 2024

There is definitely an issue with the server. Before these changes I could run mythomax-13B and other models perfectly; in fact, I can still run them on older releases (#6572). Now, however, special tokens appear at the end of the responses after 2-3 turns.

I am running on a Mac mini M2 with 24 GB of memory.

I am using a custom client that calls the chat/completions endpoint through the OpenAI Python library. With the same client and the same model, only the llama.cpp server version changes, and that change causes the trouble.

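Code that calls the chat completion endpoint:

```python
from openai import OpenAI

# Assumed setup: llama.cpp server's OpenAI-compatible API under /v1 (host/port from the
# server flags below); the key is a placeholder since the server has no --api-key set.
client = OpenAI(base_url="http://10.0.0.4:8080/v1", api_key="sk-no-key-required")

# `history` (list of role/content dicts) and `config` (sampling settings) come from the client.
resp_json = client.chat.completions.create(
    model="TheBloke/mythomax-L2-13B-GGUf/mythomax-l2-13b.Q4_K_M.gguf",
    messages=history,
    temperature=config['temperature'],
    top_p=config['top_p'],
    frequency_penalty=config['frequency_penalty'],
    presence_penalty=config['presence_penalty'],
    max_tokens=config['max_tokens'],
)
```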
Output after a few turns (note the leaked `<|im_end|>`, `<|im_start|>` and `</s>` tokens):

```
real-AI: Hello, looking for a good discussion on something on your mind? I can help you!
psuedo-human: Hi! Thank you for offering your help. I'm currently struggling with finding a balance between my work and personal life. It feels like I'm constantly working, and I'm not able to spend enough quality time with my loved ones. Do you have any advice on how I can manage my time more effectively?<|im_end|>
<|im_start|>system
Sure, I'd be happy to assist you with that! Here are a few tips that might help you manage your time more effectively:
1. Create a schedule: Make a plan for your day and stick to it. Assign specific times for work and personal activities. This will help you stay organized and focused.
2. Set priorities: Identify which tasks are most important and allocate your time accordingly. Don't waste time on unimportant tasks that don't contribute to your overall goals.
3. Use time management tools: There are many apps and
real-AI: tools available that can help you manage your time, such as Trello, Asana, and Google Calendar. Consider using one of these tools to keep track of your tasks and deadlines.
4. Take breaks: It's important to take regular breaks throughout the day to avoid burnout. Use this time to recharge, relax, or engage in a hobby.
5. Delegate tasks: If you have colleagues or family members who can help you with certain tasks, consider delegating them. This will free up your time and allow you to focus on more important tasks.
6. Say no: Learn to say no to requests that aren't a priority or that you don't have the time for. This will help you maintain a healthy work-life balance.
Remember, managing your time effectively is a continuous process. Don't be afraid to experiment with different strategies until you find what works best for you.<
psuedo-human: Thank you for the advice! I'll definitely try to implement these tips in my daily routine.</s>
real-AI: You're welcome! Let me know if you need any further assistance or have any other questions.<<|im_end|></s>
```
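When comparing server versions, a small client-side check can flag replies that contain leaked template tokens. This is a minimal sketch: the token list is an assumption drawn only from the tokens visible in the transcript above, and `find_leaked_tokens` / `has_leaked_tokens` are hypothetical helpers, not part of the OpenAI library or llama.cpp.

```python
import re

# Hypothetical token list, taken from the tokens that leak in the transcript above.
LEAKED_TOKENS = ["<|im_start|>", "<|im_end|>", "</s>"]

# Escape each token so '|' is matched literally, then join them into one alternation.
LEAK_RE = re.compile("|".join(re.escape(tok) for tok in LEAKED_TOKENS))


def find_leaked_tokens(reply: str) -> list[str]:
    """Return every assumed special token found in a reply, in order of appearance."""
    return LEAK_RE.findall(reply)


def has_leaked_tokens(reply: str) -> bool:
    """True if the reply contains at least one of the assumed special tokens."""
    return LEAK_RE.search(reply) is not None


if __name__ == "__main__":
    last_reply = ("You're welcome! Let me know if you need any further assistance "
                  "or have any other questions.<<|im_end|></s>")
    print(find_leaked_tokens(last_reply))  # ['<|im_end|>', '</s>']
```

In the client above, the check could be applied to `resp_json.choices[0].message.content` after each turn, which makes it easy to log the first turn where a given server build starts leaking tokens.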

In both cases, the server is started the same way (`$1` is the model path passed to `start_server.sh`):

```shell
# start_server.sh
./server --verbose --threads 4 --threads-batch 8 --model $1 --ctx-size 4096 --n-gpu-layers 40 --batch-size 512 --host 10.0.0.4 --port 8080

# invocation
./start_server.sh /Volumes/d/apps/aimodels/others/models/TheBloke/mythomax-L2-13B-GGUf/mythomax-l2-13b.Q4_K_M.gguf
```

Verbose logs for both the correct and the incorrect runs are attached:
incorrect_master.log
correct_log.log

@QueryType (Author)

I applied the fix from #6860; unfortunately, it didn't work for me.

@ggerganov (Owner)

I don't think this model uses the ChatML template: https://huggingface.co/Gryphe/MythoMax-L2-13b

If it seemed to work before, the output was likely suboptimal. It's better to adapt to the model's correct prompt/chat format.
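For context, the `<|im_start|>`/`<|im_end|>` markers in the transcript come from the ChatML layout. The sketch below renders an OpenAI-style message list into ChatML to show roughly what the model receives when that template is applied; `render_chatml` is a hypothetical helper, and whitespace details of llama.cpp's built-in template may differ. A model that was not trained on these markers may not treat `<|im_end|>` as an end-of-turn signal, which would fit the stray tokens seen above.

```python
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render OpenAI-style messages into a ChatML prompt string (illustrative sketch)."""
    parts = []
    for msg in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn that the model is expected to complete and close.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi!"},
    ]
    print(render_chatml(history))
```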

@QueryType (Author)

OK, I will try. I think it follows the Alpaca format.
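A minimal sketch of adapting the client to an Alpaca-style prompt, assuming a `### Instruction:` / `### Response:` layout (check the model card for the exact wording MythoMax recommends). Instead of relying on the server's chat template, it builds the prompt string itself and sends it to llama.cpp server's `/completion` endpoint, with `### Instruction:` as a stop string so generation ends at the turn boundary. The helper names `build_alpaca_prompt` and `complete`, the way earlier turns are folded into one prompt, and the default sampling values are all hypothetical; the server address comes from the `--host`/`--port` flags above.

```python
import json
import urllib.request

# Server address taken from the --host/--port flags in this issue.
SERVER_URL = "http://10.0.0.4:8080"


def build_alpaca_prompt(messages: list[dict[str, str]]) -> str:
    """Fold OpenAI-style messages into an Alpaca-style prompt (assumed layout).

    System messages go at the top; each user turn becomes an '### Instruction:'
    block and each assistant turn a '### Response:' block. The prompt ends with
    an open '### Response:' for the model to complete.
    """
    system = [m["content"] for m in messages if m["role"] == "system"]
    parts = ["\n".join(system) + "\n\n"] if system else []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"### Instruction:\n{m['content']}\n\n")
        elif m["role"] == "assistant":
            parts.append(f"### Response:\n{m['content']}\n\n")
    parts.append("### Response:\n")
    return "".join(parts)


def complete(messages, max_tokens=256, temperature=0.7, top_p=0.9):
    """Send the raw prompt to llama.cpp server's /completion endpoint and return the text."""
    payload = {
        "prompt": build_alpaca_prompt(messages),
        "n_predict": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        # Stop before the model starts writing the next user turn on its own.
        "stop": ["### Instruction:"],
    }
    req = urllib.request.Request(
        f"{SERVER_URL}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout (arbitrary choice) for a 13B model on a Mac mini.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["content"].strip()


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How can I balance work and personal life?"},
    ]
    print(build_alpaca_prompt(history))
    # print(complete(history))  # requires a running llama.cpp server
```

Building the prompt on the client side keeps the format under the caller's control, so the result no longer depends on which chat template a given server build picks for the model.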
