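The output answer is identical to the reference answer, yet the BLEU score is always 0.

Demo:

    from ragas import evaluate
    from ragas.metrics import BleuScore

    if __name__ == '__main__':
        query = "今天天气怎么样"        # "How is the weather today?"
        reference = """今天天气很不错"""  # "The weather is very nice today"
        response = """今天天气很不错"""   # identical to the reference

        # dataset, llm, and embeddings are built elsewhere (not shown in this post)
        metrics = [BleuScore()]
        result = evaluate(
            dataset=dataset,
            metrics=metrics,
            llm=llm,
            embeddings=embeddings
        )
        print(result)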