I've tried to optimize things (smaller inlinable methods, switches instead of if/elseif, cached parser instances).
So far, the only change that performed better was turning DataCharBuffer into an interface and adding an implementation that wraps a String instead of a char array, thus saving an array copy.
This improved performance on big JSON messages by ~15%.
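
To make the change concrete, here is a minimal sketch of the idea, not the actual code. Only the name DataCharBuffer comes from the library; the method names (`length`, `charAt`), the two implementation class names, and the `BufferDemo` helper are assumptions made for illustration.

```java
// Hypothetical sketch: the method names are assumptions, not the library's real API.
interface DataCharBuffer {
    int length();
    char charAt(int index);
}

// Array-backed variant: the caller has to supply a char[], which usually
// means copying the input first (e.g. via String.toCharArray()).
final class ArrayDataCharBuffer implements DataCharBuffer {
    private final char[] data;
    private final int length;

    ArrayDataCharBuffer(char[] data, int length) {
        this.data = data;
        this.length = length;
    }

    @Override public int length() { return length; }
    @Override public char charAt(int index) { return data[index]; }
}

// String-backed variant: reads characters in place, so no array copy is made.
final class StringDataCharBuffer implements DataCharBuffer {
    private final String data;

    StringDataCharBuffer(String data) {
        this.data = data;
    }

    @Override public int length() { return data.length(); }
    @Override public char charAt(int index) { return data.charAt(index); }
}

// Hypothetical consumer standing in for the parser: it only sees the interface.
final class BufferDemo {
    static int countOpenBraces(DataCharBuffer buffer) {
        int count = 0;
        for (int i = 0; i < buffer.length(); i++) {
            if (buffer.charAt(i) == '{') {
                count++;
            }
        }
        return count;
    }
}
```

With this shape, the parser can be handed `new StringDataCharBuffer(json)` directly, while existing callers that already hold a char array keep using the array-backed class. The trade-off is an interface call per character instead of a direct array read, so the gain depends on how well the JIT inlines those calls.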
I'm still puzzled about your implementation being beaten by gson and json-smart on smaller messages. I suspect the allocation of IndexBuffer's arrays is the cause.