DVD - Lecture 8: Clock Tree Synthesis

  • Published 22 Sep 2024

COMMENTS • 52

  • @irvinlu
    @irvinlu 3 years ago +19

    Dear Professor Teman,
    As a new physical design engineer at TSMC, I find your lectures very helpful!
    Thank you for sharing your knowledge with the world 👍

    • @AdiTeman
      @AdiTeman  3 years ago +2

      You are very welcome!

  • @enjoynetsl
    @enjoynetsl 2 years ago +3

    Your videos saved my job interview. Thanks a lot!

  • @Matthew-tu2jq
    @Matthew-tu2jq 4 years ago +5

    This is fascinating! Thank you so much for this, super interesting.

    • @AdiTeman
      @AdiTeman  4 years ago +1

      Glad you enjoyed it!

  • @ayushkhare9420
    @ayushkhare9420 4 years ago +4

    Your lectures are very helpful. Thanks for sharing your knowledge!

    • @AdiTeman
      @AdiTeman  4 years ago +1

      You're welcome! Thanks for the kind words.

    • @ayushkhare9420
      @ayushkhare9420 4 years ago

      @@AdiTeman Why do we use a buffer instead of an inverter in the clock tree path?

    • @AdiTeman
      @AdiTeman  3 years ago

      Sorry for the late reply, I didn't see your question until now.
      Most of the tools, in fact, enable us to use inverters on the clock tree, as they cost less in terms of both area and power. We tend to call them "buffers", even if they are inverters, both to simplify the discussion and because they do buffer the signal (just with an inversion). There are some drawbacks to using inverters, however. For one, you need an even number to get the right polarity, and this could cause some unwanted skew. The other is that since you will have different rise and fall transitions along the clock tree, you may hit additional skew (vs. buffers, which would have the same transitions everywhere). A third point is that a buffer is actually able to present a low capacitance and drive a high load, since it has an intermediate stage, whereas an inverter is only a single stage in that sense.
      But the bottom line is that this is usually an option in the tools that you can choose to use or not to use and analyze the results.
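      For illustration, here is a minimal sketch of how this choice is typically exposed to the user, assuming Innovus-style CCOpt properties and hypothetical library cell names (exact property and cell names vary by tool, version and library):

        # Hypothetical: restrict CTS to dedicated clock buffers only
        set_ccopt_property buffer_cells {CLKBUF_X2 CLKBUF_X4 CLKBUF_X8}
        # Hypothetical: additionally allow clock inverters (cheaper in area/power,
        # but mind the polarity and rise/fall balance discussed above)
        set_ccopt_property inverter_cells {CLKINV_X2 CLKINV_X4 CLKINV_X8}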

  • @lalithsamanthapuri2055
    @lalithsamanthapuri2055 5 years ago +2

    Thanks a lot, sir, for teaching Clock Tree Synthesis (CTS) with proper slides, and I hope you will do more videos on Routing, ECO flow and sign-off checks!

    • @AdiTeman
      @AdiTeman  5 years ago +1

      Hi Lalith - you are welcome.
      Just to note - the routing lecture is online (ua-cam.com/video/kZ9EhWI8veU/v-deo.html).
      I "ran out of time" for now and haven't been able to record the sign-off lecture. I hope I will find the time to do it in the coming months.

    • @lalithsamanthapuri2055
      @lalithsamanthapuri2055 5 years ago +1

      @@AdiTeman Thanks for your concern, and I am learning through these YouTube videos, sir. Knowingly or unknowingly, you are inspiring young VLSI PD engineers like me by sharing your knowledge.
      Sir, did you know that there are a lot of PD engineers and trainees at present? In India, trained freshers now outnumber experienced VLSI PD engineers!

  • @workthamngan9407
    @workthamngan9407 2 years ago +1

    Hi Prof, it's truly a blessing to have such a fantastic channel on VLSI topics. In fact, I refreshed most of my VLSI knowledge through your videos. I have a question and hope you could shed some light on it.
    When you talk about the power consumption categories, I totally agree that the biggest chunk of power comes from the clock network built at the CTS stage. That said, here is my question.
    When you mentioned Caches, Execution Units and Control, do these categories come from the fetch, decode and execute process during accesses to the memory data register?
    Next, regarding the I/O Drivers (from the top-module perspective): does that power consumption come from the pads and pins of all the blocks, or from the interface?
    Please enlighten me on this part; a reply would be very helpful. Honestly, I love all your sharing so far!

    • @AdiTeman
      @AdiTeman  2 years ago +1

      Hi Work,
      I'm not sure I understood your question exactly, but let me try to answer what I think you are asking.
      The first question was about caches, execution units and control. I don't exactly remember where I mentioned these in the lecture, but if we look at a CPU, it is made up exactly of control and execution (ALU) and is connected to memory and I/O. When we break down a RISC-style processor into pipeline stages, it is easy to separate the datapath into (among others) the stages you mentioned, and indeed, the fetch stage will access the instruction memory, the decode stage will apply control logic and access the register file, and the execution stage will run through the ALU. The data memory is often accessed during an additional stage. The CPU is representative of most other types of computational units (often called "accelerators"), which have some execution logic and some control, and are almost always supplemented by some memory or at least registers. These can also often be pipelined. With regards to this specific lecture (CTS), each stage of computation includes registers that store/set the state of the operation, and these require a clock signal. In addition, the memories (SRAMs and register files) require a clock signal.
      The second question was about the I/O drivers. I am again not exactly sure what you are asking, but I/O cells are only needed at the chip interface, i.e., where the chip connects to the external world. Since we are driving wires that are of a different order of magnitude than on-chip wires, we need very large buffers (ultra-wide transistors), as well as some other circuitry (see the I/O and Packaging lecture ua-cam.com/video/O2Od1Tey-Jo/v-deo.html). Driving these wires can consume quite a bit of power.
      However, at the block level, the interface "pins" are almost virtual. They are just a piece of metal that another piece of metal from outside the block must connect to, but this can be a very small wire with small parasitics and therefore, doesn't necessarily require any special considerations for driving it. That being said, we often connect I/O buffers to these pins, but for a different reason than our I/O cells that are connected to the outside world (I'm not sure I discussed this in the lecture series). I/O buffers are just (rather small) buffers that do the following:
      - input buffers make sure that the input pin is connected to a short wire with a fan-in of 1, thereby reducing the chance that the transition on the input net will be horrible when connecting the block to its top level.
      - output buffers make sure that the output pin is driven by a substantially sized buffer so that if a long wire with a large fanout is connected at the top level, it has a chance not to be weakly driven.
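      As an aside, these boundary conditions are also commonly modeled in the block-level SDC. A rough sketch using standard SDC commands with hypothetical cell, port and value names (the real values come from the top-level integration context):

        # Assume the input port is driven by a reasonably sized cell at the top level
        set_driving_cell -lib_cell BUF_X4 [get_ports data_in]
        # Assume the output port sees some top-level wire/pin load (library capacitance units)
        set_load 0.05 [get_ports data_out]
        # Keep boundary transitions in check so the block is robust when integrated
        set_max_transition 0.3 [get_ports data_in]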
      I hope that helped. If not, please clarify the questions.
      Adi

    • @workthamngan9407
      @workthamngan9407 2 years ago

      @@AdiTeman In your lecture, you showed the power partitioning as a pie chart and noted that it is not general (you did clarify that). Hence, I was trying to relate where these categories come from. Since the basic cycle of a CPU is fetch, decode and execute, I was trying to map them onto the pie chart. Below are the details I would like to understand further, and I assume you have answered all of them.
      1) The whole 40% comes from the CTS stage, i.e., the clock network structure: clock generation, clock elements, wire and load.
      2) Next, the cache (20%), control (15%) and execution (15%) slices (I was trying to relate these to CPU execution). If the explanation above is what contributes to these segments, then I will take it as it is. Essentially, your explanation still comes back to the clock signal in the control and execution stages.
      3) Finally, the I/O drivers (10%). Indeed, packaging adds another layer of circuitry, and it doesn't stop at the GDSII layers.
      So if all the explanations above relate to the power-partitioning pie chart, then this was another fruitful discussion.
      Thanks for the added insight on buffering. As long as you buffer along the wire at a reasonable distance, you will not see transition problems, regardless of whether it is a pin, port or pad. I do check the LUT to see the transition value and create a BB if necessary to fit a bigger buffer. In fact, this requires a lot of planning at the early stage of top-level floorplanning, and different designers have different styles of working the floorplan.
      Anyway, Prof, I'm glad that you replied promptly to my question; it made my day. Thank you, and thank you again.

  • @vlsiupdates22
    @vlsiupdates22 1 year ago +1

    Complicated topics explained in a detailed way! Thanks for your effort! Could you please suggest the materials to refer to?

    • @AdiTeman
      @AdiTeman  1 year ago

      Yes, at the end of each slide deck, I have a list of main references.
      In addition, I try to write the source of each figure that I "borrowed" from some other place on the internet.
      To find my full slide decks, please refer to my faculty website: www.eng.biu.ac.il/temanad/teaching/

  • @skilambi
    @skilambi 11 months ago

    Dear Prof. Teman, thank you for this series. I am enjoying it. One question for you: do you recommend any good set of books that I can refer to while going along with your course?

    • @AdiTeman
      @AdiTeman  11 months ago

      Hi @skilambi,
      I cannot really. In all of the lectures, I have a partial list of important references that I used when compiling the lectures, but I cannot say that I have found anything that comprehensively covers the material of this course. I created a very large amount of the content from my personal knowledge, experience and intuition, and these parts are not documented anywhere but in these slide decks.
      Along with some friends and colleagues, we have a general idea that we should write a textbook. But this is a huge (like enormous) effort, and I do not see how I will have time to start it in the near future. If I do, I will let you know.

    • @skilambi
      @skilambi 11 months ago

      @@AdiTeman A book from you would be wonderful. Thank you for sharing these videos. They are amazing. I have done mostly FPGA design for 15 years, and though there are many things in common, there are still some differences with custom ASIC. Your lectures cover almost all of those and I am learning a lot.

  • @montyi8
    @montyi8 5 years ago +2

    Hi Adam, I was wondering when you will be uploading the next lecture on Routing? Thanks

    • @AdiTeman
      @AdiTeman  5 years ago +2

      Hi Viji. Thanks for your interest. Hopefully, I'll get it out within a week or two.

  • @arghyakarmakar8422
    @arghyakarmakar8422 2 years ago +1

    Can I get a PDF of the class?
    BTW great video... love from India

    • @AdiTeman
      @AdiTeman  2 years ago

      Of course.
      All of the PDFs are linked to from my faculty website at www.eng.biu.ac.il/temanad/teaching/
      Specifically, this PDF is at the following link: www.eng.biu.ac.il/temanad/files/2019/01/Lecture-8-CTS.pdf

  • @silencetravel8376
    @silencetravel8376 4 years ago

    Dear Adi,
    I am very grateful to learn from this lecture.
    My question is: what is the difference between a leaf pin and a clock sink?

    • @AdiTeman
      @AdiTeman  4 years ago

      Hi - good question. I guess you could say that they are synonyms. Depending on the literature or the tool, there can be slight differences in the definition, but at a high level, they are the same. Actually, CTS becomes quite complex in some designs - I would even say "mind-boggling". It is really a hard issue and in one lecture without some real-world experience, it is hard to grasp why there are all these options and what can be so problematic. But the clock network on SoCs can be really complex and that makes CTS a tough job. So, back to your question, I guess one answer could be that a "sink" is a skew balancing point of the clock tree, while a leaf may be the end of the clock graph but not a skew balancing point. That being said, it really depends on the fine print of the tool, but for the most part these two terms are the same.

    • @silencetravel8376
      @silencetravel8376 4 years ago +1

      @@AdiTeman Thanks for the detailed reply.
      I finally figured it out.

  • @ranveerdhawan744
    @ranveerdhawan744 3 years ago

    Sir, the content is superb. Can you please make more videos on it using the Synopsys ICC2 compiler and its commands?

    • @AdiTeman
      @AdiTeman  3 years ago

      Hi Ranveer. Thanks for the comments. Currently, I don't have plans to provide an alternative version with Synopsys commands. However, note that the commands are mainly provided to give some practical perspective of the more abstract concepts presented in the slides. In the end, the specific command for each tool and each vendor will be similar, albeit with many additional options and parameters, but you will need to use the tool user manual for the specific version in order to find the exact usage.

  • @anithasabhavat6064
    @anithasabhavat6064 3 years ago +1

    Hello sir,
    Thank you so much for this entire PD playlist.
    I have a question: how do we decide the insertion delay and skew? How much insertion delay and skew is acceptable?

    • @AdiTeman
      @AdiTeman  3 years ago +1

      Good question and no "right" answer. I guess that in general, target insertion delay is a problematic objective, as it is directly proportional to the number of clock sinks in your design. But "the lower the better" is probably the best that can be said. The same goes for skew. To get lower skew, you will have to pay with additional buffers and clock tree levels, which will result in both higher power/area and higher insertion delay, so in a lot of ways, these two objectives are contradictory. The general approach is to iterate and see what the design can do and then constrain it to be about that or a bit more constrained. BUT - that brings us to the discussion of the CCOpt approach, where the skew "doesn't matter" and you should focus on meeting timing... So whatever skew/insertion delay you arrive at that meets timing is fine...

    • @pankajdhingra9985
      @pankajdhingra9985 3 years ago +1

      @@AdiTeman
      I think it depends on the DRV constraints you have. The tool will try to ensure zero DRV violations and, for that, it will place buffers and inverters in the clock paths - this leads to insertion/propagation delay. Then, the tool will try to make the skew as small as possible - called balanced skew. Then, if timing is not met by balanced skew, the tool will try further to meet the setup/hold timing by adding buffers here and there (in the clock/data paths) - called useful skew.
      Adi, please feel free to correct me if I am wrong.

    • @AdiTeman
      @AdiTeman  3 years ago +1

      Indeed, thank you Pankaj.
      This is basically how the CTS tools work.
      I understood Anitha's question in a more abstract fashion, i.e., how do you define these DRV constraints. But once you have defined them, the tool works pretty much how you explained.
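      For a concrete flavor, these DRV and skew targets are usually given to the CTS engine up front. A minimal sketch, assuming Innovus-style CCOpt properties with hypothetical values (property names, units and defaults vary by tool and version):

        set_ccopt_property target_max_trans 0.10        ;# clock-net transition (DRV) target, ns
        set_ccopt_property target_skew 0.05             ;# balanced-skew target, ns
        set_ccopt_property target_insertion_delay 0.60  ;# often left for the tool to minimize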

  • @lalithsamanthapuri2055
    @lalithsamanthapuri2055 5 years ago +1

    What are the STA-related constraints for CTS?
    I mean, how do our constraints in the .sdc change/affect things at each stage up to the CTS stage?

    • @AdiTeman
      @AdiTeman  5 years ago +2

      Good question, but the answer is a bit complicated. In general, I am all for maintaining a single .sdc file across all steps of the design. In this way, if a basic constraint is changed at one stage, it is propagated across all stages without having to go into each file and update it, which often leads to bugs. This can be done with some simple TCL code and variables that represent the stage that the design is currently at.
      As for what changes - in a straightforward manner, the propagation of the clock is the main thing. So until CTS, the clock is treated as ideal (set_ideal_network) and afterwards, the clock should be propagated through the design and no longer treated as ideal. There are several sub-constraints that come out of this (for example, updating the I/O latency). These can be handled in many ways, starting from the EDA tools automatically changing settings according to the stage in the design and through particular methodologies that are applied at various companies. There is no "right answer", however, and each design and flow has to be fine tuned. The main thing is that you should have a basic understanding of the fundamental mechanisms going on and then you can adapt your methodology accordingly.
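      To make the "single SDC across stages" idea concrete, here is a minimal sketch. It assumes a flow-defined Tcl variable (DESIGN_STAGE) and a clock port named clk, both hypothetical; the SDC commands themselves are standard:

        # One SDC file, switched by a stage variable set in the flow scripts (hypothetical)
        create_clock -name core_clk -period 2.0 [get_ports clk]
        if { $DESIGN_STAGE in {cts route signoff} } {
            # After CTS, use real (propagated) clock-network delays
            set_propagated_clock [all_clocks]
        } else {
            # Before CTS, treat the clock network as ideal
            set_ideal_network [get_ports clk]
        }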

    • @lalithsamanthapuri2055
      @lalithsamanthapuri2055 5 years ago

      @@AdiTeman Sir, in my design I have a hold violation of 30 ps (i.e., hold slack = -3.0456) at the CTS stage. Should I change a few constraints in the .sdc file?
      My question is: which constraints in the .sdc should be modified that affect the setup slack at the placement stage and the hold slack at the CTS stage?
      Is changing my .sdc constraints a proper solution for my design?

    • @AdiTeman
      @AdiTeman  5 years ago +1

      Hi Lalith.
      First of all, -3.0456 is 3ns (usually, depending on the settings of the tool), which is a HUGE hold slack. This is something that is not usually fixable in a straightforward fashion and usually results from:
      1) incorrect SDC definitions (e.g., clock domain crossings)
      2) incorrect or incomplete CTS definitions
      Both (1) and (2) are design-specific and require a lot of planning, knowledge and work to get right. But to debug what is causing this extreme hold violation, what you must do is look at the timing path that causes this error and try to understand where it comes from. First, see whether the launch and capture paths are sourced from the same clock (if not, this is a clock domain crossing, which is, by definition, a violation unless synchronized). Second, see what causes the large skew between the launch and capture clock paths. That will give you a start in trying to debug and understand what happened.
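      In practice, the debug steps above map to a few report commands. A sketch assuming Innovus-style syntax (PrimeTime equivalents differ; options vary by version):

        report_clocks                        ;# list the defined clocks and their periods/sources
        report_timing -early -max_paths 10   ;# worst hold ("early") paths
        # In each report, compare the launch and capture clock sources and their
        # clock-network latencies to spot a CDC or an unbalanced clock branch.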

    • @lalithsamanthapuri2055
      @lalithsamanthapuri2055 5 years ago +1

      @@AdiTeman Thanks a lot sir.

  • @gopakumar6754
    @gopakumar6754 4 years ago

    Usually in CTS the global skew is optimized instead of the local skew. What's the significance of optimizing global skew?

    • @AdiTeman
      @AdiTeman  4 years ago +1

      If I understand your question correctly, I think the appropriate answer is that the basis for CTS is that since you have optimized for setup (max delay) during synthesis, assuming an ideal clock, then if you could keep this situation (i.e., ideal clock), you would meet your timing (based on Synthesis) and not have hold violations (which are caused by positive skew). So let's try to remove as much skew as possible and everything will be okay. This would be a global skew target and this is a much harder optimization goal than local skew, since local skew is going to be smaller than global skew. If there are hold violations due to local skew, these can be handled separately after taking care of the bigger problem.
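      As a concrete (made-up) illustration: suppose three sinks see clock latencies of 1.00 ns, 1.05 ns and 1.30 ns. The global skew is 1.30 - 1.00 = 0.30 ns, but if only the first two sinks actually exchange data, the local skew on that path is just 1.05 - 1.00 = 0.05 ns. This is why a global skew target is the harder, more conservative goal.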

  • @merrygo7189
    @merrygo7189 5 years ago +1

    Sir,
    What is the phase delay at a sink pin in CTS?

    • @AdiTeman
      @AdiTeman  5 years ago

      In the "old" Cadence timing reports jargon, a phase delay was basically the period length (T) of the clock. That is a simplification, since it can be a rising to falling or falling to rising (half period) or various other situations. I think they have changed the way they write it in newer (commonUI) versions to make it more clear. In any case, it is usually just the clock period.

  • @merrygo7189
    @merrygo7189 4 years ago

    Hi sir,
    My question is: what will happen if we add two ICGs or two latches in a single clock path?

    • @AdiTeman
      @AdiTeman  4 years ago +1

      In general, two ICGs are not a problem. Of course, any path to an ICG is categorized as a "clock gating path" and timing analysis has to make sure that the enable arrives during the correct cycle. But assuming both ICGs are enabled, the clock will pass through undisturbed.
      Regarding two latches in a single (non-clock) path - this is a different issue that I have decided not to dive into in this course. In general, positive and negative latches can be cascaded with logic in between to better distribute the delays in a master-slave type of approach, and this approach is, in fact, often used.

  • @abdelrahmansalah6800
    @abdelrahmansalah6800 2 years ago

    Can't wait for the Kahoot of this lecture!

    • @AdiTeman
      @AdiTeman  2 years ago

      It will be coming next week, I hope.

  • @simranbhaisare3434
    @simranbhaisare3434 4 years ago

    Sir, I have a question: if I apply one clock to drive 50 flip-flops and another clock to drive 100 flip-flops, which will consume more power?

    • @AdiTeman
      @AdiTeman  4 years ago +1

      I'm not sure I understand exactly, but I think the answer is, of course, that the clock driving 100 flip flops will consume more power than the one driving 50 flip flops, as the input capacitance of 100 flip flops (and the clock network to drive them) will be larger.
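      As a quick sanity check with the usual dynamic power relation, P = α·C·V²·f: at the same activity, voltage and frequency, the clock driving 100 flip-flops sees roughly twice the clock-pin capacitance (plus a larger clock network), so its clock power is roughly 2x or more that of the 50-flip-flop clock.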

    • @pankajdhingra9985
      @pankajdhingra9985 3 years ago +1

      I also think the 100 FFs would consume more power: more capacitance means more transition time and more power consumption.

  • @merrygo7189
    @merrygo7189 5 years ago

    Sir, please make a video on the SDC file in detail.

    • @AdiTeman
      @AdiTeman  5 years ago +1

      Dear Merrygo. Please take a look at the video on Static Timing Analysis, where I give some introduction to SDC. I do not have plans to make a full detailed video on this subject, but there are quite a few good books and websites that provide it. Getting the basics from the video should help you to read through that documentation.