After implementing a path tracing routine in an OpenGL 4.3+ compute shader, I decided to write a counterpart implementation in CUDA, just to see what it is capable of and how much flexibility it gives developers. Apparently, you have much more room to do things, especially in memory access, since you can use references and pointers. But it also comes with the most fundamental problem for most developers: it is bound to NVIDIA cards (unless you have some magic).
Anyway, here are some renders from the new routine:

More content coming in the future.