I finally had the chance to run the comparison. Since the update to Q4_0 that was supposed to make these models more performant, they have become unusably slow. I compared Q4_0 against Q4_K_S:
8B Q4_0 (4.67 GB): freezes not just the app but the whole phone.
8B Q4_K_S (4.68 GB): runs as it always has. Note that its file is even slightly larger.
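A quick sanity check on the two file sizes, as a rough sketch only. It assumes "8B" means exactly 8e9 parameters and that 1 GB is 1e9 bytes; neither is confirmed here, and `bits_per_weight` is just a helper name made up for this example.

```python
def bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Approximate average bits stored per weight (assumes 1 GB = 1e9 bytes)."""
    return file_size_gb * 1e9 * 8 / n_params


N_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion parameters

# File sizes as reported in the comparison above.
sizes_gb = {"Q4_0": 4.67, "Q4_K_S": 4.68}

for name, size in sizes_gb.items():
    print(f"{name}: {bits_per_weight(size, N_PARAMS):.2f} bits/weight")

# Relative size difference between the two files.
relative_diff = (sizes_gb["Q4_K_S"] - sizes_gb["Q4_0"]) / sizes_gb["Q4_0"]
print(f"Q4_K_S is {relative_diff:.2%} larger than Q4_0")
```

Under those assumptions the two files differ by well under one percent, so file size alone seems unlikely to explain the freeze.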
Reminder: I was able to run Q4_0_4_4 in the past, and with this version Q4_0 should behave like Q4_0_4_4.
I also made sure the problem is not related to whether the weights are in internal or external storage. The behavior is the same in both cases.
I ran the test with version 0.8.4a.
If the problem is in llama.cpp and not ChatterUI, we should probably report it to them.