This has to do with the difference between how we determine whether an intervention has evidence of being effective in a given outcome domain and how we calculate the size of its effects in that domain. To receive a supported rating, an intervention must have at least one statistically significant, favorable finding and no statistically significant unfavorable findings in the given outcome domain. Effect sizes, on the other hand, are an average of all findings for a given outcome domain, including those that are not statistically significant.
Take, for example, a study that finds three effects on earnings, one of which is statistically significant and favorable, and two of which are statistically insignificant but unfavorable. Because the study identified a statistically significant, favorable effect on earnings and no statistically significant unfavorable findings, the intervention receives an effectiveness rating of supported on earnings. In calculating its overall effect on earnings, however, we average all three findings in this domain: the two unfavorable but statistically insignificant findings along with the statistically significant, favorable effect. The average of these three findings might therefore result in an overall negative effect on earnings, even though the intervention is rated supported.
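To make the arithmetic concrete, here is a minimal sketch of the two calculations side by side. The effect sizes below are purely illustrative values chosen to reproduce the scenario described above; they do not come from any actual study, and the rating logic is simplified to the single rule stated here (at least one significant favorable finding, no significant unfavorable findings).

```python
# Hypothetical earnings findings (illustrative values, not real data).
# Each finding records its effect size and whether it was statistically significant.
findings = [
    {"effect": 0.25, "significant": True},   # favorable and statistically significant
    {"effect": -0.15, "significant": False}, # unfavorable but not significant
    {"effect": -0.20, "significant": False}, # unfavorable but not significant
]

# Effectiveness rating: needs at least one significant favorable finding
# and no significant unfavorable findings in the domain.
has_sig_favorable = any(f["significant"] and f["effect"] > 0 for f in findings)
has_sig_unfavorable = any(f["significant"] and f["effect"] < 0 for f in findings)
rating = "supported" if has_sig_favorable and not has_sig_unfavorable else "not supported"

# Effect size: the average of ALL findings, significant or not.
mean_effect = sum(f["effect"] for f in findings) / len(findings)

print(rating)       # supported
print(mean_effect)  # negative, despite the supported rating
```

With these illustrative numbers the intervention is rated supported on earnings, yet its average effect on earnings works out to roughly -0.03, showing how the two measures can point in opposite directions.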