Hi
Well, the point is: if it's a mess, that indicates the subdivisions and data connections in the Verilog file are not good enough. It means "break the file down into smaller modules" and "Get S*** Together".
It's more that 32-bit buses are a bit of a mess when not using the bus wire from KiCad schematics. Graph node placement also has its own algorithmic theory.
Andreas did a really good job of factoring out ("ex-lining") subcells that were too big and grouping them so that they can be placed and routed with an acceptable delay, because the amount of asynchronous logic within the clock domains is reduced. Smaller propagation delay means higher clock frequencies.
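To put rough numbers on the delay argument, here is a minimal sketch in Python. The delay values and the function name `max_clock_mhz` are made up for illustration and are not taken from Andreas' design; the only real content is that the clock period has to be at least as long as the slowest register-to-register path.

```python
def max_clock_mhz(path_delays_ns, setup_ns=0.0):
    """Upper bound on the clock frequency in MHz.

    The clock period must cover the slowest register-to-register
    path (the critical path) plus the flip-flop setup time.
    """
    critical_ns = max(path_delays_ns) + setup_ns
    return 1000.0 / critical_ns  # 1 / ns = GHz, so 1000 / ns = MHz


# Hypothetical path delays in nanoseconds, for illustration only.
one_big_cell = [12.5, 9.0, 20.0]      # long asynchronous paths
split_into_subcells = [8.0, 6.5, 10.0]  # shorter paths after regrouping

print(max_clock_mhz(one_big_cell))         # 50.0
print(max_clock_mhz(split_into_subcells))  # 100.0
```

With these invented numbers, halving the worst path delay doubles the achievable clock frequency.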
Cheers David