Description
This is a replication of the problem described in #711 (comment). I hit the same issue, so I'm opening this issue to track it.
Problem Statement
When chatting via the `llm.create_chat_completion` API, the chatbot keeps greeting me and repeating what I had typed earlier.
Expected Behavior
I'm trying to hold a continuous conversation with the chatbot, and I expect a smooth conversational flow.
Current Behavior
A conversation with a simple chat program goes like the following, using `llama-2-7b-chat.Q5_K_M.gguf` downloaded from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF; the chat code is adapted from PR #575:
Rob: Hello! It's nice to meet you. Is there something I can help you with or would you like to chat about something in particular?
User: Let's talk about football
Rob: Hello! Nice to meet you too! *adjusts avatar glasses* I'm just an AI, so I don't have personal preferences or interests, but I'd be happy to chat with you about anything you'd like. How about we talk about the latest football news? 🏈
User: I like to play football on shiny day.
Rob: Hello! *adjusts avatar hat* It's great to meet you too! I'm just an AI, so I don't have personal preferences or interests either, but I'd be happy to chat with you about anything you'd like. How about we talk about the latest football news? 🏈
Oh, you like to play football on shiny days? That's great! *adjusts avatar sunglasses* I'm just an AI, so I don't have personal experiences, but I can certainly chat with you about football. What's your favorite team or player? 🤔
The responses from 'Rob' are generated by llama-cpp-python, and we can clearly see that 'Rob' greets me again and again.
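For context, llama-2-chat models are trained on Meta's `[INST]`/`<<SYS>>` turn template, and my unverified guess is that the prompt assembled by `create_chat_completion` at this commit doesn't match it, so the model treats each completion as the start of a fresh conversation. A minimal sketch of the expected layout, assuming Meta's published format (the function name and structure here are illustrative, not llama-cpp-python internals):

```python
# Sketch only: Meta's published llama-2-chat prompt template.
def format_llama2_prompt(system: str, history: list[tuple[str, str]]) -> str:
    """history is a list of (user, assistant) pairs; the assistant slot of
    the last pair may be '' when requesting a new completion."""
    prompt = ""
    for i, (user, assistant) in enumerate(history):
        # The system prompt is folded into the first user turn only.
        sys_block = f"<<SYS>>\n{system}\n<</SYS>>\n\n" if i == 0 else ""
        prompt += f"<s>[INST] {sys_block}{user} [/INST] {assistant}"
        if assistant:
            prompt += " </s>"
    return prompt
```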
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ lscpu
12th Gen Intel(R) Core(TM) i7-12700H, with hyper-threading off
- Operating System, e.g. for Linux:
$ uname -a
Linux cortex 6.5.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 13 Sep 2023 08:37:40 +0000 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.3
$ make --version
GNU Make 4.4.1
$ g++ --version
g++ (GCC) 13.2.1 20230801
Failure Information (for bugs)
See the 'Current Behavior' section above.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Step 1: save the following as `simple_chat.py`:
```python
from llama_cpp import ChatCompletionMessage, Llama
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="<your path to your ggml-model.bin>")
args = parser.parse_args()

llm = Llama(model_path=args.model)

# Seed the conversation history
messages = [
    ChatCompletionMessage(role='system', content='start chat'),
    ChatCompletionMessage(role='user', content='Hello'),
]

while True:
    # Generate a response from the full history so far
    response = llm.create_chat_completion(
        messages,
        temperature=0.2,
        top_p=0.95,
        top_k=40,
        stream=False,
        stop=[],
        max_tokens=256,
        repeat_penalty=1.1,
    )
    output = response['choices'][0]['message']['content']
    print(f"Rob: {output}")

    # Append the assistant's reply to the conversation history
    messages.append(ChatCompletionMessage(role='assistant', content=output))

    # Read the next user turn and append it as well
    user_message = input("User: ")
    messages.append(ChatCompletionMessage(role='user', content=user_message))
```
- Step 2: download `llama-2-7b-chat.Q5_K_M.gguf` from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF
- Step 3: run `python simple_chat.py --model <path to llama-2-7b-chat.Q5_K_M.gguf>`
- Step 4: chat freely, or reuse the 'User' prompts from the 'Current Behavior' section above
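A possible workaround, untested against the commits pinned below: newer llama-cpp-python releases accept a `chat_format` argument on `Llama` to pin the prompt template. I'm assuming here that the keyword is available in your build; if it is, forcing the llama-2 format may stop the repeated greetings:

```python
from llama_cpp import Llama

# Assumption: the chat_format keyword exists in your llama-cpp-python build;
# "llama-2" selects the [INST]/<<SYS>> template llama-2-chat was trained on.
llm = Llama(
    model_path="llama-2-7b-chat.Q5_K_M.gguf",  # illustrative path
    chat_format="llama-2",
)
```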
- llama-cpp-python version: commit 43dfe1e
- llama.cpp version: commit 48edda3
Failure Logs
See the conversation transcript in the 'Current Behavior' section above.