At first it was mostly just a review of stuff we generally knew. I've already passed it, so I'm not going to go too in depth. But now I'm doing the coding exercises, and Rishabh's right: once I get cooking I fucking COOK.
I just have to put myself into positions to cook
Anyway so now we have the exercises.
There was a cudaDeviceReset() call, but once we removed it the print output from the GPU stopped. Why is that? Well, the powerful thing about this setup is that host and device code execute asynchronously (host -> CPU, device -> GPU, just in case I end up giving these notes to Rishabh or someone xD).
Soooo that means, I think, the CPU instructions finished and the program just exited, with the exit determined completely by the CPU, before the GPU's print output ever got flushed back? As far as I can tell the cudaDeviceReset() was forcing the host to wait on the device, which is why the prints showed up with it. Yeah, at least that's what GPT is saying.
Also this jank colab setup is hella funny lol.
OK, and this is actually, on a very simple scale, the same bug or issue that Edward Yang was talking about on the PyTorch podcast.
People run their code and the point where they get their error is all fucked up because the code is async. I'll have to listen to the exact issue again (it's on one of the CUDA episodes), but it throws them off course, and really it's a fundamental disconnect from understanding how execution is happening.
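To make the "error shows up at the wrong time" thing concrete, here's a minimal sketch. The kernel name bad_write is made up for illustration and the null pointer write is just my way of forcing a device-side fault; this is not the exact case from the podcast. The idea is that the launch returns right away, so the check right after the launch usually looks clean and the fault only gets reported once the host actually waits on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes through whatever pointer it is given.
// We pass nullptr below on purpose so it faults on the device.
__global__ void bad_write(int *p) {
    *p = 42;
}

int main() {
    // The launch is asynchronous: control returns to the host immediately.
    bad_write<<<1, 1>>>(nullptr);

    // This catches launch problems (e.g. a bad launch configuration).
    // It usually does not see the fault that happens while the kernel runs.
    cudaError_t launch_err = cudaGetLastError();
    printf("after launch: %s\n", cudaGetErrorString(launch_err));

    // Waiting for the device is where the runtime fault gets reported.
    cudaError_t sync_err = cudaDeviceSynchronize();
    printf("after sync:   %s\n", cudaGetErrorString(sync_err));
    return 0;
}
```

So if nothing ever syncs right after the bad kernel, the error gets pinned on whatever later call happens to sync first, which is the confusing part.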
So there's a function called cudaDeviceSynchronize which I guess just halts CPU execution until all the GPU calls issued before it catch up? Let me try it out.
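Roughly what I mean, as a minimal sketch (the kernel name hello is just mine, not from the exercise): launch a kernel that prints, then make the host wait before main returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Tiny kernel that prints from the device.
__global__ void hello() {
    printf("hello from the gpu\n");
}

int main() {
    // Asynchronous launch: the host does not wait for the kernel here.
    hello<<<1, 1>>>();

    // Block the host until all previously issued device work is done.
    // This is also a point where the device printf buffer gets flushed.
    cudaDeviceSynchronize();
    return 0;
}
```

Comment out the cudaDeviceSynchronize() line and the host can hit return 0 before the print ever makes it back, which matches what happened when the cudaDeviceReset() got removed.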
OK yeah, very interesting stuff, the synchronize func worked great. Now onto the really interesting stuff, which is going to be printing threads!
Each thread has its own thread index :0 So now we can do conditional stuff, which leads to interesting downstream implications.
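A small sketch of the conditional stuff, assuming a single block of 8 threads (the kernel name and the even/odd split are just picked for illustration): every thread runs the same code, but branches on its own threadIdx.x.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread runs this same function; threadIdx.x tells them apart.
__global__ void pick_threads() {
    if (threadIdx.x % 2 == 0) {
        printf("thread %u: even branch\n", threadIdx.x);
    } else {
        printf("thread %u: odd branch\n", threadIdx.x);
    }
}

int main() {
    // 1 block, 8 threads, so threadIdx.x runs from 0 to 7.
    pick_threads<<<1, 8>>>();
    cudaDeviceSynchronize();  // wait so the prints actually show up
    return 0;
}
```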
How would I get to threadIdx.y?
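One way, as far as I understand it: threadIdx.y is only ever nonzero if the block is launched with more than one dimension, which you do by passing a dim3 instead of a plain int. A minimal sketch, with a 4 x 2 block chosen arbitrarily:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print both coordinates of each thread inside the block.
__global__ void print_xy() {
    printf("thread (x=%u, y=%u)\n", threadIdx.x, threadIdx.y);
}

int main() {
    // 2D block: 4 threads along x, 2 along y, 8 threads total.
    dim3 block(4, 2);

    // 1 block with the 2D shape above.
    print_xy<<<1, block>>>();
    cudaDeviceSynchronize();  // wait so the prints actually show up
    return 0;
}
```

With a launch like <<<1, 8>>> the block is 1D, so threadIdx.y is just 0 for every thread.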
I experimented with blockIdx in the code as well, and it is 0-indexed, which is interesting.
Well, that was fucking awesome. And then I launched with 10 blocks and one thread and got basically random numbers in the output.
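My guess (an assumption, I haven't confirmed this is what the exercise was printing) is that the "random" numbers were the block indices 0 through 9 coming out in a scrambled order. Blocks are independent and CUDA makes no promise about the order they run in, so a sketch like this can print them in a different order than you'd expect:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block has one thread, which prints the index of its block.
__global__ void print_block() {
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

int main() {
    // 10 blocks, 1 thread per block: blockIdx.x is 0..9, threadIdx.x is always 0.
    print_block<<<10, 1>>>();
    cudaDeviceSynchronize();  // wait so the prints actually show up
    return 0;
}
```

All ten indices should show up exactly once; it's only the order that isn't guaranteed.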
