Hi
The following status changes, some of them quite significant, have occurred, all in a positive way:
a) I've now passed the safety examination and should get my RFID access card to the clean room in a week or so.
From that point on, the only thing keeping us from running the tests is the
lack of a finalized test wafer layout. However, since the mask set costs around 1kUSD we shouldn't rush now, because rushing causes mistakes, and mistakes cost money.
b) I've now worked myself into the topic of creating a generic DRAM controller out of lightdram, as Luke has suggested. Turns out it's a bunch of Python scripts which produce a single mashed together Verilog module. It's a total mess and ugly as hell. I think I will just "translate" this Python code into Scala/Chisel so that it can be used directly as an IP core in our SauMauPing framework. This will be especially relevant as soon as we switch over to the Yuen Long series, which will be a full-featured desktop CPU, which means we will have DIMM/SODIMM slots soldered to a DDR3/DDR4 bus connected to the CPU. So it's already good if we have a nicely parameterizable version of a DDR3/DDR4 controller in Chisel.
@Manili: Wanna help with porting the lightdram code from Python to Chisel?
Cheers David
On Thu, Jul 5, 2018 at 4:38 PM, David Lanzendörfer david.lanzendoerfer@o2s.ch wrote:
b) I've now worked myself into the topic of creating a generic DRAM controller out of lightdram, as Luke has suggested. Turns out it's a bunch of Python scripts which produce a single mashed together Verilog module. It's a total mess and ugly as hell.
nice!
I think I will just "translate" this Python code into Scala/Chisel so that it can be used directly as an IP core in our SauMauPing framework.
mmm... you could save a lot of time and effort and import it as an external verilog module. one of the things that i do which annoys the crap out of people is jump from project to project, leveraging things from one project to augment another, saving vast amounts of time and completing work in a fraction of the time that *entire teams* of people previously tried to do. it's not a popular skill :)
it does however mean not passing judgement. python code is far easier to read and understand than any scala / chisel code i've ever seen. chisel is nowhere near as popular as python, and never will be. but that in some ways is beside the point: the point is that the python code has been used to generate *silicon proven* layouts.
unless you were prepared to write an auto-translator to translate python code into scala, it would be a huge amount of effort... to do what... *abandon* a silicon-proven design?? Does Not Compute :)
save some effort: use the auto-generated verilog as-is. plus... see below: a hyperram interface that i've found is in verilog as well.
btw if you're really concerned and want to test the openram auto-generated verilog, i've been working with cocotb successfully recently, and it means that you can write actual tests that... y'know... are actually readable? and understandable? and like... use standard python test frameworks? and like... ordinary programmers can help with?
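just to give a flavour, here's a rough sketch of what such a test could look like. the signal names (clk, rst, addr, wdata, rdata, we) are placeholders for whatever the generated top-level module actually exposes, and it assumes a reasonably recent cocotb:

# test_dram_ctrl.py -- minimal cocotb smoke-test sketch (signal names are
# placeholders, not the real ports of the generated module).
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge

@cocotb.test()
async def write_then_read(dut):
    # free-running 100mhz clock on the dut
    cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())

    # hold reset for a few cycles, then release it
    dut.rst.value = 1
    for _ in range(5):
        await RisingEdge(dut.clk)
    dut.rst.value = 0
    await RisingEdge(dut.clk)

    # drive a write, then read the same address back
    dut.addr.value = 0x10
    dut.wdata.value = 0xDEADBEEF
    dut.we.value = 1
    await RisingEdge(dut.clk)
    dut.we.value = 0

    # give the (hypothetical) controller a couple of cycles to respond
    await RisingEdge(dut.clk)
    await RisingEdge(dut.clk)
    assert int(dut.rdata.value) == 0xDEADBEEF

you drive it with the stock cocotb makefile flow (VERILOG_SOURCES pointing at the generated .v, SIM=icarus, MODULE=test_dram_ctrl) and out comes an ordinary python test that any programmer can read.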
This will be especially relevant as soon as we switch over to the Yuen Long series, which will be a full-featured desktop CPU, which means we will have DIMM/SODIMM slots soldered to a DDR3/DDR4 bus connected to the CPU.
honestly, for a first version, i feel either using SRAM (or multiple SRAM buses, they typically only run up to 133mhz), or better HyperRAM which is only 12 wires, would be quicker and easier. HyperRAM is basically octal SPI. taking any quad-SPI and turning it into octal SPI is... um... not hard (but see below: i did a quick google search and found one already)
that gets you 150mhz 8-bit data which is 150mbytes/sec for a budget of 12 wires. want 300mbytes/sec, put down 2x interfaces. want 600mbytes/sec? put down 4x interfaces. that's only 48 wires. 5x12=60 wires and you have something equal to a DDR3 800mhz data rate.... *and it's entirely digital*.
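(back-of-the-envelope, just writing that scaling out explicitly with the same numbers, 150 mbytes/sec and 12 wires per interface:)

# hyperram interface scaling: 150 mbytes/sec and 12 wires per interface,
# multiplied up linearly by the number of parallel interfaces.
MBYTES_PER_IF = 150
WIRES_PER_IF = 12

for n in (1, 2, 4, 5):
    print("%dx: %d mbytes/sec over %d wires"
          % (n, n * MBYTES_PER_IF, n * WIRES_PER_IF))
# 5x -> 750 mbytes/sec over 60 wires, and no analog anywhere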
whereas DDR3... *deep breath*.... the PHYs are analog. symbiotica eda quoted USD $300k and 8-12 *months* to do a DDR3 PHY layout. as in, *just* the layout.
later on you could do the DDR version of HyperRAM. it requires 13 wires (differential clock), dropping the voltage to 1.8v, and the data lines now need to be analog as you drive on the rising *and* falling edge. i'm told by experienced people that DDR (rising and falling edge detection) is analog, and so should be avoided if at all possible, to drastically simplify design and chances of first success.
60 wires to get a 750 mbyte/sec data transfer rate with 5 parallel 8-SPI buses and *no need to do analog* has got to be a lot better even than doing SRAM. SRAM is like... ridiculous numbers of wires by comparison. it's 133mhz but 133mhz @ 32 bit i think, but the address lines are totally separate from the data lines, whereas in 8-SPI the address and data encoding are a standard protocol, documented in the HyperRAM spec.
also, using hyperram, the guy behind blackmesalabs did a small PCB (libre licensed, CERN) so you could use one of the cypress off-the-shelf HyperRAM ICs as a first test chip, and have a second independent parallel project drop-in replacement, later.
l.
p.s. https://github.com/blackmesalabs/hyperram looks like it won't need to be implemented, you could just use that.
Hi
b) I've now worked myself into the topic of creating a generic DRAM controller out of lightdram, as Luke has suggested. Turns out it's a bunch of Python scripts which produce a single mashed together Verilog module. It's a total mess and ugly as hell.
nice!
I think I will just "translate" this Python code into Scala/Chisel so that it can be used directly as an IP core in our SauMauPing framework.
mmm... you could save a lot of time and effort and import it as an external verilog module. one of the things that i do which annoys the crap out of people is jump from project to project, leveraging things from one project to augment another, saving vast amounts of time and completing work in a fraction of the time that *entire teams* of people previously tried to do. it's not a popular skill
it does however mean not passing judgement. python code is far easier to read and understand than any scala / chisel code i've ever seen. chisel is nowhere near as popular as python, and never will be. but that in some ways is beside the point: the point is that the python code has been used to generate *silicon proven* layouts.
unless you were prepared to write an auto-translator to translate python code into scala, it would be a huge amount of effort... to do what... *abandon* a silicon-proven design?? Does Not Compute
save some effort: use the auto-generated verilog as-is. plus... see below: a hyperram interface that i've found is in verilog as well.
btw if you're really concerned and want to test the openram auto-generated verilog, i've been working with cocotb successfully recently, and it means that you can write actual tests that... y'know... are actually readable? and understandable? and like... use standard python test frameworks? and like... ordinary programmers can help with?
The thing with this auto-generated code is that it mashes everything into one Verilog module, while Chisel produces nice hierarchical blocks, which makes it way easier to debug and understand. Additionally it basically assumes that "you want" to do an SoC, so it kind of doesn't allow for easy extraction of the IP core... Additionally we would get stuck with an IP core with a hard-coded bus width and so on. Parameters are much nicer, wouldn't you say?
About the silicon verification: We have to do this anyway, because we are developing a new process here. And an iVerilog test bench seems good enough for me to start with.
This will be especially relevant as soon as we switch over to the Yuen Long series, which will be a full-featured desktop CPU, which means we will have DIMM/SODIMM slots soldered to a DDR3/DDR4 bus connected to the CPU.
honestly, for a first version, i feel either using SRAM (or multiple SRAM buses, they typically only run up to 133mhz), or better HyperRAM which is only 12 wires, would be quicker and easier. HyperRAM is basically octal SPI. taking any quad-SPI and turning it into octal SPI is... um... not hard (but see below: i did a quick google search and found one already)
that gets you 150mhz 8-bit data which is 150mbytes/sec for a budget of 12 wires. want 300mbytes/sec, put down 2x interfaces. want 600mbytes/sec? put down 4x interfaces. that's only 48 wires. 5x12=60 wires and you have something equal to a DDR3 800mhz data rate.... *and it's entirely digital*.
whereas DDR3... *deep breath*.... the PHYs are analog. symbiotica eda quoted USD $300k and 8-12 *months* to do a DDR3 PHY layout. as in, *just* the layout.
later on you could do the DDR version of HyperRAM. it requires 13 wires (differential clock), dropping the voltage to 1.8v, and the data lines now need to be analog as you drive on the rising *and* falling edge. i'm told by experienced people that DDR (rising and falling edge detection) is analog, and so should be avoided if at all possible, to drastically simplify design and chances of first success.
60 wires to get a 750 mbyte/sec data transfer rate with 5 parallel 8-SPI buses and *no need to do analog* has got to be a lot better even than doing SRAM. SRAM is like... ridiculous numbers of wires by comparison. it's 133mhz but 133mhz @ 32 bit i think, but the address lines are totally separate from the data lines, whereas in 8-SPI the address and data encoding are a standard protocol, documented in the HyperRAM spec.
also, using hyperram, the guy behind blackmesalabs did a small PCB (libre licensed, CERN) so you could use one of the cypress off-the-shelf HyperRAM ICs as a first test chip, and have a second independent parallel project drop-in replacement, later.
l.
p.s. https://github.com/blackmesalabs/hyperram looks like it won't need to be implemented, you could just use that.
The first revision ever will be CMOS-only; we most likely won't have SRAM cells before December. So everything more complicated than a NOR gate needs to be wired onto the dev board externally. That includes SRAM and Flash. I guess we will just look for the least expensive components with a standardized interface on TaoBao and use those for a demo board.
Cheers David
On Fri, Jul 6, 2018 at 6:43 AM, David Lanzendörfer david.lanzendoerfer@o2s.ch wrote:
The thing with this auto-generated code is that it mashes everything into one Verilog module, while Chisel produces nice hierarchical blocks, which makes it way easier to debug and understand. Additionally it basically assumes that "you want" to do an SoC, so it kind of doesn't allow for easy extraction of the IP core... Additionally we would get stuck with an IP core with a hard-coded bus width and so on. Parameters are much nicer, wouldn't you say?
why on earth didn't they put parameters into the python code?? oink??
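just to illustrate what i mean (entirely hypothetical names, this function does *not* exist in lightdram): a generator front-end only needs a couple of keyword arguments to stop being hard-coded.

# hypothetical sketch only -- generate_dram_controller() is not a real
# lightdram function, it just shows what a parameterised front-end looks like.
def generate_dram_controller(data_width=32, addr_width=26,
                             module_name="dram_ctrl"):
    """emit a verilog module with the bus widths taken from parameters
    instead of constants hard-coded throughout the scripts."""
    ports = "\n".join([
        "  input  wire clk,",
        "  input  wire rst,",
        "  input  wire [%d:0] addr," % (addr_width - 1),
        "  input  wire [%d:0] wdata," % (data_width - 1),
        "  output wire [%d:0] rdata," % (data_width - 1),
        "  input  wire we",
    ])
    return "module %s (\n%s\n);\n  // ... body elided ...\nendmodule\n" \
           % (module_name, ports)

if __name__ == "__main__":
    # a 64-bit variant is then a one-line change, no editing of the scripts
    print(generate_dram_controller(data_width=64))

that's the whole argument for parameters in one screen.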
About the silicon verification: We have to do this anyway, because we are developing a new process here. And an iVerilog test bench seems good enough for me to start with.
cocotb automatically generates and then wraps the iverilog test bench and gives a python interface which can be interacted with by the python cocotb application.
i just do not understand why the hell people have been developing test benches in verilog. it's basically stone-age technology.
l.
Hi
The thing with this auto-generated code is that it mashes everything into one Verilog module, while Chisel produces nice hierarchical blocks, which makes it way easier to debug and understand. Additionally it basically assumes that "you want" to do an SoC, so it kind of doesn't allow for easy extraction of the IP core... Additionally we would get stuck with an IP core with a hard-coded bus width and so on. Parameters are much nicer, wouldn't you say?
why on earth didn't they put parameters into the python code?? oink??
Exactly. Rocket-Chip generates a test bench for you right along with the core...
About the silicon verification: We have to do this anyway, because we are developing a new process here. And an iVerilog test bench seems good enough for me to start with.
cocotb automatically generates and then wraps the iverilog test bench and gives a python interface which can be interacted with by the python cocotb application.
i just do not understand why the hell people have been developing test benches in verilog. it's basically stone-age technology.
I guess that's Sebastian at his best once again. The stuff has been heavily influenced by M-Labs, which mainly means that guy. And from experience I know that his attitude is "everyone is stupid except me", so it makes perfect sense that the stuff he produces is in the fashion of "I know better than the customer what the customer needs".
Now take a good long look at this big stack of Python files and their output and start to understand why I'm against getting him into the team ;-)
-lev
libresilicon-developers@list.libresilicon.com