Hi all,
After some work with Luke, we managed to create a feature/criteria list for the GPU. If your GPU IP core can pass these requirements, it will be chosen for the $250,000 USD sponsorship. The list may change over time, so stay tuned; however, the main requirements (performance, power consumption and license type) won't change. Here is the link:
http://libre-riscv.org/shakti/m_class/libre_3d_gpu/
Best regards, Manili
P.S. Luke please correct me if I’m missing something.
On Wed, Jun 27, 2018 at 6:05 PM, Mohammad Amin Nili manili.devteam@gmail.com wrote:
Hi all,
After some work with Luke, we managed to create a feature/criteria list for the GPU. If your GPU IP core can pass these requirements, it will be chosen for the $250,000 USD sponsorship.
please please let's be absolutely clear: i can put the *business case* to the anonymous sponsor to *consider* sponsoring a libre GPU, *only* and purely as a *commercial* decision based on cost and risk analysis, comparing against the best alternative option, which is USD $250,000 for a one-time proprietary license for the Vivante GC800 using etnaviv.
we need to be *really clear* that there *is* no "guaranteed sponsorship". this is a pure commercial *business* assessment.
now, it just so happens that there are quite a lot of people who are pissed at how things go in the 3D embedded space. that can be leveraged, by way of a crowd-funding campaign, to invite people to help and to put money behind this, money that has *nothing to do with the libre-riscv anonymous sponsor*.
The list may change over time, so stay tuned; however, the main requirements (performance, power consumption and license type) won't change.
correct. power consumption is a top priority for this particular project.
Here is the link:
i've added in a link to the riscv-llvm "vector extension" RFC, which is particularly interesting. the RISC-V vector extension has some big money behind it (from supercomputing), so it would be extremely sensible to ride off the back of that.
whilst i am not *telling* people how it should go, i am used to... how can i put it... i am used to finding cross-project paths that minimise the amount of effort required to reach a specific goal, taking into account multiple factors along the way.
my feeling on this is therefore that the following approach is one which involves minimal work:
- investigate the ChiselGPU code to see if it can be leveraged (an "image" added instead of straight ARGB colour)
- OR... add sufficient fixed-function 3D instructions (plus a memory scratch area) to RISC-V to do the equivalent job
- implement the Simple-V RISC-V "parallelism" extension (which can parallelise xBitManip *and* the above-suggested 3D fixed-function instructions)
- wait for RISC-V LLVM to have vectorisation support added to it
- MODIFY the resultant RISC-V LLVM code so that it supports Simple-V
- grab the gallium3d-llvm source code and hit the "compile" button
- grab the *standard* mesa3d library, tell it to use the gallium3d-llvm library and hit the "compile" button
- see what happens.
now, interestingly, if spike is thrown into the mix there (as a cycle-accurate RISC-V simulator) it should be perfectly possible to get an idea of where the performance of the above would need optimisation, just like jeff did with the nyuzi paper. he focussed on specific algorithms, checked the assembly code, and worked out how many instruction cycles per pixel were needed, which is an invaluable measure.
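to make that metric concrete, here's the arithmetic as a tiny C sketch -- the instruction counts are completely made-up placeholders; the real numbers would come from reading spike's retired-instruction counter before and after rendering one test frame:

    /* rough sketch only: the nyuzi-style "instructions per pixel"
     * figure, computed from a simulator run.  the counts below are
     * invented placeholders. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t instret_start = 1000000;    /* counter at start of frame (hypothetical) */
        uint64_t instret_end   = 181000000;  /* counter at end of frame (hypothetical)   */
        uint64_t pixels        = 640 * 480;  /* one VGA-sized test frame                 */

        double per_pixel = (double)(instret_end - instret_start) / (double)pixels;
        printf("instructions per pixel: %.1f\n", per_pixel);
        return 0;
    }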
as i mention in the above page, one of the problems with doing a completely separate engine (Nyuzi is actually a general-purpose RISC-based vector processor) is that when it comes to using it, you need to transfer all the "state" data structures from the main core over to the GPU's core.
... but if the main core is RISC-V *and the GPU is RISC-V as well* and they are SMP cores then transferring the state is a simple matter of doing a context-switch... or if *all* cores have vector and 3D instruction extensions, a context-switch is not needed at all.
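to illustrate what that buys (a sketch only -- gl_state_t and rasterise_tile are invented names, not mesa code): when the "GPU" cores run the same ISA in the same address space, handing them 3D work is just ordinary thread scheduling, and "sending the state" is nothing more than passing a pointer.

    /* illustrative sketch, not real driver code: on a homogeneous
     * SMP RISC-V system there is no marshalling format to design --
     * the renderer thread sees the very same data structures. */
    #include <pthread.h>
    #include <stddef.h>

    typedef struct gl_state gl_state_t;   /* whatever mesa keeps per-context (hypothetical) */

    /* hypothetical renderer entry point, built with the vector /
     * 3D instruction extensions enabled */
    static void *rasterise_tile(void *arg)
    {
        gl_state_t *state = arg;   /* same pointer, same address space */
        (void)state;               /* ... render here ... */
        return NULL;
    }

    int submit_to_gpu_core(gl_state_t *state)
    {
        pthread_t worker;
        /* "sending the state to the GPU" is just passing a pointer */
        if (pthread_create(&worker, NULL, rasterise_tile, state) != 0)
            return -1;
        return pthread_join(worker, NULL);
    }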
will that approach work? honestly i have absolutely no idea, but it would be a fascinating and extremely ambitious research project.
can we get people to fund it? yeah, i believe so. there's a lot of buzz about RISC-V, and a lot of buzz can be created about a libre 3D GPU. if that same GPU happens to be good at doing crypto-currency mining there will be a LOT more attention paid, particularly given that people have noticed that, with the NSA *known* to have blackmailed intel into putting a spying back-door co-processor into x86, it miiight not be a good idea to trust proprietary GPUs and CPUs with billions of dollars worth of crypto-currency: libreboot.org/faq#intelme
l.
Hello Luke,
please please let's be absolutely clear: i can put the *business case* to the anonymous sponsor to *consider* sponsoring a libre GPU, *only* and purely as a *commercial* decision based on cost and risk analysis, comparing against the best alternative option, which is USD $250,000 for a one-time proprietary license for the Vivante GC800 using etnaviv.
we need to be *really clear* that there *is* no "guaranteed sponsorship". this is a pure commercial *business* assessment.
now, it just so happens that there are quite a lot of people who are pissed at how things go in the 3D embedded space. that can be leveraged, by way of a crowd-funding campaign, to invite people to help and to put money behind this, money that has *nothing to do with the libre-riscv anonymous sponsor*.
I added this part to the page to make it more clear for other people.
i've added in a link to the riscv-llvm "vector extension" RFC, which is particularly interesting. the RISC-V vector extension has some big money behind it (from supercomputing), so it would be extremely sensible to ride off the back of that.
whilst i am not *telling* people how it should go, i am used to... how can i put it... i am used to finding cross-project paths that minimise the amount of effort required to reach a specific goal, taking into account multiple factors along the way.
my feeling on this is therefore that the following approach is one which involves minimal work:
- investigate the ChiselGPU code to see if it can be leveraged (an
"image" added instead of straight ARGB colour)
- OR... add sufficient fixed-function 3D instructions (plus a memory
scratch area) to RISC-V to do the equivalent job
- implement the Simple-V RISC-V "parallelism" extension (which can
parallelise xBitManip *and* the above-suggested 3D fixed-function instructions)
- wait for RISC-V LLVM to have vectorisation support added to it
- MODIFY the resultant RISC-V LLVM code so that it supports Simple-V
- grab the gallium3d-llvm source code and hit the "compile" button
- grab the *standard* mesa3d library, tell it to use the
gallium3d-llvm library and hit the "compile" button
- see what happens.
now, interestingly, if spike is thrown into the mix there (as a cycle-accurate RISC-V simulator) it should be perfectly possible to get an idea of where the performance of the above would need optimisation, just like jeff did with the nyuzi paper. he focussed on specific algorithms, checked the assembly code, and worked out how many instruction cycles per pixel were needed, which is an invaluable measure.
as i mention in the above page, one of the problems with doing a completely separate engine (Nyuzi is actually a general-purpose RISC-based vector processor) is that when it comes to using it, you need to transfer all the "state" data structures from the main core over to the GPU's core.
... but if the main core is RISC-V *and the GPU is RISC-V as well* and they are SMP cores then transferring the state is a simple matter of doing a context-switch... or if *all* cores have vector and 3D instruction extensions, a context-switch is not needed at all.
will that approach work? honestly i have absolutely no idea, but it would be a fascinating and extremely ambitious research project.
can we get people to fund it? yeah, i believe so. there's a lot of buzz about RISC-V, and a lot of buzz can be created about a libre 3D GPU. if that same GPU happens to be good at doing crypto-currency mining there will be a LOT more attention paid, particularly given that people have noticed that, with the NSA *known* to have blackmailed intel into putting a spying back-door co-processor into x86, it miiight not be a good idea to trust proprietary GPUs and CPUs with billions of dollars worth of crypto-currency: libreboot.org/faq#intelme
I also wanted to add this part, but I don't know where to add it, so please add it to the page if you like.
Best regards, Manili
On Wed, Jun 27, 2018 at 9:13 PM, Mohammad Amin Nili manili.devteam@gmail.com wrote:
I added this part to the page to make it more clear for other people.
cool.
I also wanted to add this part, but I don't know where to add it, so please add it to the page if you like.
done.
Would you mind telling me what you mean by "state" data structures from the main core, and why you think it is a problem (is transferring the data power-consuming, or does it reduce performance, or both)? So if I understood right, you are talking about a Larrabee-like RISC-V architecture?
--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Wed, Jun 27, 2018 at 10:03 PM, Mohammad Amin Nili manili.devteam@gmail.com wrote:
Would you mind telling me what you mean by "state" data structures from the main core, and why you think it is a problem
ok so you can't just execute GPU instructions on the main core, can you? because they're assembly code designed for... the GPU, right? not the CPU, yes?
so you have some OpenGL data structures - "state" - which are on the main CPU, right? and, obviously, they're f***-all use sitting there on the CPU: you have to "package" that up and get it to the GPU, yes?
but you can't just "throw it at the GPU and hope like hell it'll magically get there and do something", can you?
so you have to:
(a) "package" the OpenGL data structures - "state" - up into a format THAT THE GPU UNDERSTANDS. (b) *TELL* the GPU "here's your data, do something". (c) *** STOP *** the CPU from executing (or context-switch - do something else) whilst the GPU is working on it (d) *** RETURN*** or otherwise communicate with the GPU to tell you when the job is done.
that's damn complicated, isn't it? and how many data structures are there, and how much data to send over? you now need to design a hardware-software API, to deal with all that state, yes?
basically a CPU-GPU interface needs an IPC (Inter-Process Communication) mechanism that has to take into account TWO TOTALLY DIFFERENT ARCHITECTURES, doesn't it?
even if the only hardware was that ChiselGPU code, it would *still* be necessary to write some IPC system, packaging up the data on the CPU, telling the ChiselGPU engine "go", waiting for it to say "done", yes?
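to give a feel for the shape of that IPC layer, here's a bare-bones sketch; every name in it (gpu_cmd, the doorbell/status registers, the opcode value) is invented for illustration and is not any real GPU's interface:

    /* sketch of a minimal CPU->GPU hand-off for a *separate* engine.
     * all layouts and registers here are hypothetical. */
    #include <stdint.h>
    #include <string.h>

    struct gpu_cmd {               /* (a) a packed, GPU-defined format */
        uint32_t opcode;
        uint32_t state_paddr;      /* physical address of the packaged GL state */
        uint32_t state_len;
    };

    /* hypothetical MMIO registers, mapped elsewhere by platform code */
    volatile uint32_t *gpu_doorbell;   /* write 1 = "go"   */
    volatile uint32_t *gpu_status;     /* reads 1 = "done" */

    int run_on_gpu(void *cmd_ring, const void *packaged_state,
                   uint32_t len, uint32_t state_paddr)
    {
        /* (a) package the OpenGL "state" into the GPU's own format */
        memcpy(cmd_ring, packaged_state, len);

        struct gpu_cmd cmd = { 1 /* e.g. DRAW */, state_paddr, len };
        memcpy((char *)cmd_ring + len, &cmd, sizeof cmd);

        /* (b) tell the GPU "here's your data, do something" */
        *gpu_doorbell = 1;

        /* (c) + (d) stall (or context-switch) until the GPU signals
         * completion; a real driver would sleep on an interrupt
         * rather than spin */
        while (*gpu_status == 0)
            ;
        return 0;
    }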
now compare that to just... taking the MesaGL source code and hitting the "compile" button, and taking the gallium3d-llvm source code and hitting the "compile" button. does that sound a lot easier?
in the case where you keep all of the code, the state, and the data structures in *ONE* processor, the entire development process becomes drastically, drastically simpler, yes?
one thing: it may be possible to begin profiling the gallium3d-llvm code *right now* (even on x86) to assess the inner loops and see where most of the time is spent, in each of the different areas associated with 3D rendering. take a look at jeff's nyuzi2016 paper to see what i mean. it's *really* important to know how many cycles are spent (on average, per pixel) transferring data from memory into registers (and back). it's really important to know how many cycles per pixel are spent on rasterisation, and so on.
whilst gallium3d-llvm on x86 will be heavily-optimised for SSE, it will at least give a good indication.
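for a feel of what "the inner loops" actually are: the hot path of a software rasteriser boils down to a per-pixel edge-function test like the scalar sketch below (not mesa or nyuzi code, just a minimal example of the per-pixel work whose cycle count matters, and exactly the sort of loop that SIMD / vectorisation speeds up):

    /* minimal flat-colour triangle fill using edge functions over the
     * bounding box.  purely illustrative; accepts either winding. */
    #include <stdint.h>

    void fill_triangle(uint32_t *fb, int stride,
                       int x0, int y0, int x1, int y1, int x2, int y2,
                       uint32_t argb)
    {
        int minx = x0 < x1 ? (x0 < x2 ? x0 : x2) : (x1 < x2 ? x1 : x2);
        int maxx = x0 > x1 ? (x0 > x2 ? x0 : x2) : (x1 > x2 ? x1 : x2);
        int miny = y0 < y1 ? (y0 < y2 ? y0 : y2) : (y1 < y2 ? y1 : y2);
        int maxy = y0 > y1 ? (y0 > y2 ? y0 : y2) : (y1 > y2 ? y1 : y2);

        for (int y = miny; y <= maxy; y++) {
            for (int x = minx; x <= maxx; x++) {
                /* three edge functions; the pixel is inside the
                 * triangle when all three share the same sign */
                int e0 = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0);
                int e1 = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1);
                int e2 = (x0 - x2) * (y - y2) - (y0 - y2) * (x - x2);
                if ((e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                    (e0 <= 0 && e1 <= 0 && e2 <= 0))
                    fb[y * stride + x] = argb;
            }
        }
    }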
So if I understood right you are talking about a Larrabee-like RISC-V architecture?
yeah pretty much. with a focus on finding out *where* time is spent, then investigating if fixed-functions can be designed to speed that up (and reduce power consumption at the same time).
also, see what difference low-level data types (FP16, FP12) and different SIMD / vector widths would make.
also, what i would *really* like to know is: if extending RISC-V to 64 registers (it's currently 32), could the extra 32 registers (on a 64-bit system) be effectively used as a substitute for a tiling architecture's scratch-RAM area? 4x4 x 32bpp is basically 16 32-bit registers, which is only 8 64-bit SIMD registers. which really is not a lot.
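just to spell that arithmetic out (nothing clever, it's only the numbers from the paragraph above written as constants):

    /* 4x4 pixels at 32bpp = 64 bytes = 16 x 32-bit values
     *                     = 8 x 64-bit registers,
     * so an extra bank of 32 x 64-bit registers holds only 4 tiles. */
    enum {
        TILE_W           = 4,
        TILE_H           = 4,
        BYTES_PER_PIXEL  = 4,                                  /* 32bpp ARGB */
        TILE_BYTES       = TILE_W * TILE_H * BYTES_PER_PIXEL,  /* 64 bytes   */
        REGS_32BIT       = TILE_BYTES / 4,                     /* 16         */
        REGS_64BIT       = TILE_BYTES / 8,                     /* 8          */
        TILES_IN_32_REGS = 32 / REGS_64BIT                     /* 4          */
    };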
l.