I Started Building a Roguelike RPG — Powered by On-Device AI #3

Source: DEV Community
QNN Failed. LiteRT Failed. Then llama.cpp Delivered 42x Speedup.

I wanted to write a success story today. It turns out I can. But getting there was a bit rough.

What I Tried Today

| Attempt | Result |
|---|---|
| QNN HTP + libcdsprpc.so workaround | HTP initialized, but only 3 of 363 nodes ran on the NPU |
| LiteRT-LM GPU | GPU memory overflow / engine creation failed |
| llama.cpp + Adreno OpenCL | Success: 8.9 tok/s |

QNN HTP: 3 Out of 363 Nodes

I solved yesterday's libcdsprpc.so access problem. The fix was to use apktool to decompile the APK, inject uses-native-library directly into the manifest, and repackage it. Not elegant, but it worked.

HTP finally initialized:

QnnDsp <W> Initializing HtpProvider ✅
QnnDsp <W> PrepareLibLoader Loading libQnnHtpPrepare.so ✅

Then this log appeared:

number of nodes in the graph: 363
number of nodes supported by QNN: 3

Only 3 out of 363 nodes ran on the NPU. The INT4 block quantization operator (MatMulNBits) isn't supported by HTP. The remaining 360 nodes fell back to the CPU. Gen
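The manifest edit in the apktool workaround above can be scripted instead of done by hand. This is a minimal Python sketch, not the exact tooling used here: it assumes the APK was decoded with `apktool d app.apk -o app_src`, so that `app_src/AndroidManifest.xml` is plain XML, and the helper name `add_native_library` and all paths are hypothetical. It adds a `<uses-native-library>` entry inside `<application>`; as far as I understand the Android docs, apps targeting API level 31 (Android 12) or higher must declare vendor-provided native libraries such as libcdsprpc.so this way.

```python
import xml.etree.ElementTree as ET

# Android attribute namespace used in AndroidManifest.xml (apktool decodes it to plain XML).
ANDROID_NS = "http://schemas.android.com/apk/res/android"
# Keep the "android:" prefix when the tree is serialized back out.
ET.register_namespace("android", ANDROID_NS)


def add_native_library(manifest_xml: str, lib_name: str, required: bool = False) -> str:
    """Hypothetical helper: return manifest_xml with a <uses-native-library>
    entry for lib_name added under <application>.

    If an entry with the same android:name already exists, the manifest is
    returned unchanged (apart from ElementTree re-serialization).
    """
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        raise ValueError("manifest has no <application> element")

    name_attr = f"{{{ANDROID_NS}}}name"
    required_attr = f"{{{ANDROID_NS}}}required"

    # Avoid duplicate declarations if the script is run twice.
    for existing in app.findall("uses-native-library"):
        if existing.get(name_attr) == lib_name:
            return ET.tostring(root, encoding="unicode")

    lib = ET.SubElement(app, "uses-native-library")
    lib.set(name_attr, lib_name)
    # required="false" lets the app still install on devices without the library.
    lib.set(required_attr, "true" if required else "false")
    return ET.tostring(root, encoding="unicode")
```

Usage: read `app_src/AndroidManifest.xml`, pass its text through `add_native_library(text, "libcdsprpc.so")`, write the result back, then rebuild with `apktool b app_src -o app-patched.apk` and sign the output before installing. Note that ElementTree drops XML comments and the XML declaration, and renames any namespace prefix other than `android` unless that prefix is registered as well.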
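The "3 of 363" figure comes straight from two node-count lines in the log, so checking how much of a graph actually lands on the NPU after each model or quantization change is easy to automate. A minimal Python sketch, assuming the lines read exactly as quoted above (the regexes ignore any prefix a real log puts in front of them); the function name `npu_coverage` is hypothetical.

```python
import re

# Patterns for the two node-count lines quoted in the log above.
TOTAL_RE = re.compile(r"number of nodes in the graph:\s*(\d+)")
SUPPORTED_RE = re.compile(r"number of nodes supported by QNN:\s*(\d+)")


def npu_coverage(log_text: str) -> tuple[int, int, float]:
    """Return (total_nodes, npu_nodes, fraction_on_npu) parsed from log_text."""
    total = TOTAL_RE.search(log_text)
    supported = SUPPORTED_RE.search(log_text)
    if total is None or supported is None:
        raise ValueError("node-count lines not found in log")
    n_total = int(total.group(1))
    n_npu = int(supported.group(1))
    fraction = n_npu / n_total if n_total else 0.0
    return n_total, n_npu, fraction


if __name__ == "__main__":
    log = (
        "QnnDsp <W> Initializing HtpProvider\n"
        "number of nodes in the graph: 363\n"
        "number of nodes supported by QNN: 3\n"
    )
    total, npu, frac = npu_coverage(log)
    # prints "3/363 nodes on NPU (0.83%), 360 fall back to CPU"
    print(f"{npu}/{total} nodes on NPU ({frac:.2%}), {total - npu} fall back to CPU")
```

With the numbers from this run, NPU coverage is 3/363, or about 0.83%, which is why the HTP path gave no meaningful speedup.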