Even if you register every output, you'll end up with a Tmin of 7 ns. Beyond that, I guess you need to either use faster library cells for the adder and multipliers, or decompose the adder or multiplier into smaller modules through logic duplication.
We can add another register between the mux and the sum. With it added we get a setup slack of -0.4, so the path is still violating. We can then swap the cells to LVT and increase the drive strength, and the violation may go away.
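To make the slack arithmetic concrete, here is a minimal sketch of a setup check. All delay values are hypothetical (the thread does not give them), picked only so the first case lands on a slack of -0.4. The 10% speed-up used to model the LVT / higher-drive swap is likewise an assumption, not a library figure.

```python
# Hypothetical numbers throughout: none of these delays come from the thread.
def setup_slack(clock_period, clk_to_q, comb_delay, setup_time, clock_skew=0.0):
    """Setup slack = required arrival time - actual arrival time."""
    required = clock_period + clock_skew - setup_time
    arrival = clk_to_q + comb_delay
    return required - arrival

# Path with the extra register in place, still missing timing.
before = setup_slack(clock_period=5.0, clk_to_q=0.3, comb_delay=4.9, setup_time=0.2)

# Same path after swapping to faster cells, modeled (as an assumption)
# as a 10% shorter combinational delay.
after = setup_slack(clock_period=5.0, clk_to_q=0.3, comb_delay=4.9 * 0.9, setup_time=0.2)

print(f"slack before cell swap: {before:+.2f}")
print(f"slack after cell swap:  {after:+.2f}")
```

A negative slack means the data arrives after the capture edge needs it. Faster cells only help if the speed-up covers the whole shortfall, which is why the violation "may" go away rather than "will".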
We will add a pipeline register in this design. Of course, adding a pipeline stage would increase the latency by 1 cycle, but it shortens the critical path, so the operating frequency would increase and throughput with it.
We can register the outputs of each multiplier to reduce the combinational path delay.
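The latency-versus-throughput trade-off can be sketched numerically. The delays below are hypothetical and were chosen only so that the pipelined case matches the 7 ns figure mentioned in this thread. They are not taken from the actual design or any cell library.

```python
def t_min(stage_delays, clk_to_q, setup_time):
    """Minimum clock period: slowest register-to-register stage plus register overhead."""
    return clk_to_q + max(stage_delays) + setup_time

# Hypothetical delays in ns (assumptions, not from the original problem).
T_MULT, T_ADD, T_CQ, T_SU = 6.0, 3.0, 0.5, 0.5

# Multiplier and adder in one combinational stage.
unpipelined = t_min([T_MULT + T_ADD], T_CQ, T_SU)
# Register placed on each multiplier output: two shorter stages.
pipelined = t_min([T_MULT, T_ADD], T_CQ, T_SU)

for name, period, stages in (("unpipelined", unpipelined, 1), ("pipelined", pipelined, 2)):
    fmax_mhz = 1000.0 / period      # period in ns -> frequency in MHz
    latency_ns = stages * period    # time for one result to come out
    print(f"{name}: Tmin={period} ns, fmax={fmax_mhz:.1f} MHz, latency={latency_ns} ns")
```

With these assumed numbers the multiplier becomes the limiting stage after pipelining, so further gains would need a faster or decomposed multiplier, as suggested above.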
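A, B, C, D are one-bit signals, so replace the multipliers with AND gates and the adder with an OR gate. (Strictly, a 1-bit sum is an XOR with an AND for the carry. OR differs only when both inputs are 1.)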
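A quick exhaustive check of the one-bit case, as a sketch: it compares arithmetic on 0/1 values against the candidate gates and lists the input pairs where they disagree. The helper name `mismatches` is made up for this illustration.

```python
from itertools import product

def mismatches(op, gate):
    """Return the 1-bit input pairs (a, b) where op and gate give different results."""
    return [(a, b) for a, b in product((0, 1), repeat=2) if op(a, b) != gate(a, b)]

# 1-bit multiply versus AND
mul_vs_and = mismatches(lambda a, b: a * b, lambda a, b: a & b)
# Full 1-bit add (result may be 2) versus OR
add_vs_or = mismatches(lambda a, b: a + b, lambda a, b: a | b)
# Sum bit of the add versus XOR, carry bit versus AND
sum_bit_vs_xor = mismatches(lambda a, b: (a + b) & 1, lambda a, b: a ^ b)
carry_vs_and = mismatches(lambda a, b: (a + b) >> 1, lambda a, b: a & b)

print("mul vs AND:", mul_vs_and)
print("add vs OR:", add_vs_or)
print("sum bit vs XOR:", sum_bit_vs_xor)
print("carry vs AND:", carry_vs_and)
```

So the AND substitution is exact, while OR matches the adder only if the 1 + 1 case cannot occur or the output is meant to saturate at 1.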