i've added in a link to the riscv-llvm "vector extension" RFC, which is particularly interesting. the RISC-V vector extension has some big money behind it (from supercomputing), so it would be extremely sensible to ride off the back of that.
whilst i am not *telling* people how it should go, i am used to... how can i put it... i am used to finding cross-project paths that minimise the amount of effort required to reach a specific goal, taking into account multiple factors along the way.
my feeling on this is therefore that the following approach is one which involves minimal work:
- investigate the ChiselGPU code to see if it can be leveraged (an
"image" added instead of straight ARGB colour)
- OR... add sufficient fixed-function 3D instructions (plus a memory
scratch area) to RISC-V to do the equivalent job
- implement the Simple-V RISC-V "parallelism" extension (which can
parallelise xBitManip *and* the above-suggested 3D fixed-function instructions)
- wait for RISC-V LLVM to have vectorisation support added to it
- MODIFY the resultant RISC-V LLVM code so that it supports Simple-V
- grab the gallium3d-llvm source code and hit the "compile" button
- grab the *standard* mesa3d library, tell it to use the
gallium3d-llvm library and hit the "compile" button
- see what happens.
now, interestingly, if spike is thrown into the mix there (as a RISC-V instruction-set simulator: it is functional rather than cycle-accurate, but it can still give instruction counts) it should be perfectly possible to get an idea of where performance of the above would need optimisation, just like jeff did with the nyuzi paper. he focussed on specific algorithms and checked the assembly code, and worked out how many instruction cycles per pixel were needed, which is an invaluable measure.
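to show why a cycles-per-pixel figure is so useful, here is a rough back-of-envelope sketch in python. everything in it is an assumption for illustration only: the function names are made up, and the clock rate, core count, resolution and cycles-per-pixel value are hypothetical, not measurements from nyuzi or from any RISC-V core. it also assumes every pixel costs the same number of cycles and that the work spreads perfectly across cores, so the result is an upper bound, not a prediction.

```python
def max_fps(clock_hz, cores, cycles_per_pixel, width, height):
    """upper bound on frames per second, assuming a fixed cost per pixel
    and perfect scaling across cores (both are simplifying assumptions)."""
    pixels_per_second = clock_hz * cores / cycles_per_pixel
    return pixels_per_second / (width * height)


def cycles_per_pixel_budget(clock_hz, cores, width, height, target_fps):
    """how many cycles each pixel may cost if target_fps is to be reached,
    under the same simplifying assumptions as max_fps."""
    return clock_hz * cores / (width * height * target_fps)


# hypothetical numbers, chosen only to make the arithmetic concrete
fps = max_fps(clock_hz=800e6, cores=4, cycles_per_pixel=50,
              width=1280, height=720)
budget = cycles_per_pixel_budget(clock_hz=800e6, cores=4,
                                 width=1280, height=720, target_fps=60)

print(f"upper bound: {fps:.1f} fps")
print(f"budget for 60 fps: {budget:.1f} cycles per pixel")
```

the second function is the more useful one in practice: pick a target resolution and frame rate, work out the per-pixel budget, then compare it against the instruction counts seen in the simulator for each algorithm.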
as i mention in the above page, one of the problems with doing a completely separate engine (Nyuzi is actually a general-purpose RISC-based vector processor) is that when it comes to using it, you need to transfer all the "state" data structures from the main core over to the GPU's core.
... but if the main core is RISC-V *and the GPU is RISC-V as well* and they are SMP cores then transferring the state is a simple matter of doing a context-switch... or if *all* cores have vector and 3D instruction extensions, a context-switch is not needed at all.
will that approach work? honestly i have absolutely no idea, but it would be a fascinating and extremely ambitious research project.
can we get people to fund it? yeah, i think so. there's a lot of buzz about RISC-V, and a lot of buzz can be created about a libre 3D GPU. if that same GPU happens to be good at doing crypto-currency mining there will be a LOT more attention paid, particularly given that people have noticed that relying on proprietary GPUs and CPUs to manage billions of dollars worth of crypto-currency, when the NSA is *known* to have blackmailed intel into putting a spying back-door co-processor in to x86, miiight not be a good idea: http://libreboot.org/faq#intelme
Would you mind telling me what you mean by the “state” data structures from the main core, and why you think transferring them is a problem (does it consume power, reduce performance, or both)? So, if I understood correctly, you are talking about a Larrabee-like RISC-V architecture?