z14 and z15 chip image

SMT: What utilization is real?

Simultaneous Multithreading (SMT) was introduced to improve the throughput per core. On IBM Z, if only one thread is running on a core, the performance is very close to that of a single-threaded environment. In your data center you most likely have a mixture of both. The question I get more and more often is: how much capacity is left, or what is the real CPU utilization with SMT?
The command from the post on hyptop in an LPAR provides the data to answer this question. Here is an abbreviated output:

system  #core #The    core     the  mgm      Core+       thE+     Mgm+  online
(str)     (#)  (#)     (%)     (%)  (%)        (s)        (s)      (s)   (dhm)
T35LP76     8   16  799.20 1198.90 0.02     658.05     872.88     2.66 0:00:16
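
To feed these numbers into a calculation, the data line can be split into fields. The following is a minimal Python sketch, assuming whitespace-separated columns in exactly the order of the abbreviated header above. The field names and the helper `parse_hyptop_line` are made up for illustration and are not part of hyptop.

```python
# Hypothetical field names; the order follows the abbreviated header shown above.
FIELDS = ("system", "cores", "threads", "core_pct", "thread_pct", "mgm_pct",
          "core_s", "thread_s", "mgm_s", "online")


def parse_hyptop_line(line):
    """Split one whitespace-separated hyptop data line into a dict."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}")
    row = dict(zip(FIELDS, parts))
    # Counts are integers, utilization and time columns are floats.
    for key in ("cores", "threads"):
        row[key] = int(row[key])
    for key in ("core_pct", "thread_pct", "mgm_pct",
                "core_s", "thread_s", "mgm_s"):
        row[key] = float(row[key])
    return row


sample = ("T35LP76     8   16  799.20 1198.90 0.02     "
          "658.05     872.88     2.66 0:00:16")
row = parse_hyptop_line(sample)
print(row["core_pct"], row["thread_pct"], row["mgm_pct"])
```

A full hyptop screen has more lines (title, units, totals), so a real script would have to pick out the data lines first; that part is left out here.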

If we denote the core utilization by u_c, the thread utilization by u_t, and the management utilization by u_m, then the real utilization u_r can be calculated as follows:

u_r = \frac{2u_c-u_t}{s} + (u_t-u_c) + u_m

where s denotes the SMT speedup factor. This factor depends on the workload and on the hardware generation. For example, a workload that waits a lot for caches to be filled probably benefits more from SMT than a purely L1-bound compute workload. Hardware-wise, the efficiency has improved quite a bit since z13. Without knowing anything about the workload, I start with a rule of thumb of s = 1.3 for z15. For the example above this gives

u_r = \frac{2 \cdot 799.2\% - 1198.9\%}{1.3} + (1198.9\%-799.2\%) + 0.02\% = 707.03\% 
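
The arithmetic can be reproduced with a few lines of Python. This is only a sketch of the formula as written in this post; the function name `real_smt_utilization` is hypothetical, and the default s = 1.3 is the z15 rule of thumb from the text, not a measured value.

```python
def real_smt_utilization(u_c, u_t, u_m, s=1.3):
    """Return (single, dual, mgm, total), all in percent like the inputs.

    u_c: core utilization, u_t: thread utilization,
    u_m: management utilization, s: assumed SMT speedup factor.
    """
    if s <= 0:
        raise ValueError("SMT speedup factor must be positive")
    single = (2 * u_c - u_t) / s   # single-thread part, scaled by 1/s
    dual = u_t - u_c               # part with both threads busy
    total = single + dual + u_m
    return single, dual, u_m, total


single, dual, mgm, total = real_smt_utilization(799.2, 1198.9, 0.02)
print(f"single={single:.2f}% dual={dual:.2f}% mgm={mgm:.2f}% total={total:.2f}%")
```

Passing a different `s` shows how sensitive the result is to the assumed speedup factor for a given workload.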

In total, the real SMT utilization in this example is a little more than seven IFLs. The three terms are:

  1. The single-thread component, adjusted by the SMT speedup factor
  2. The dual thread component
  3. The management overhead component

If two threads were running on every busy core all the time, the first term would be zero. If everything ran single-threaded, the second term would be zero.
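
Both limiting cases follow from a short derivation. It is a sketch under one assumption that the hyptop output itself does not spell out: the core utilization counts the time a core has at least one thread busy, and the thread utilization adds up the busy time of both threads. The symbols t_1 (time with exactly one thread busy) and t_2 (time with both threads busy) are introduced here for illustration only.

u_c = t_1 + t_2, \qquad u_t = t_1 + 2 t_2

t_2 = u_t - u_c, \qquad t_1 = 2u_c - u_t

u_r = \frac{t_1}{s} + t_2 + u_m

With both threads always busy, t_1 = 0, so u_t = 2u_c. With purely single-threaded work, t_2 = 0, so u_t = u_c.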
