During the 1990s, many investors sought to invest in securities that were “socially responsible.” There are, of course, many definitions of socially responsible investment (SRI). For example, in May 1990, the Domini 400 Social Index (DSI) was created to replicate the returns of the Standard & Poor’s 500 (S&P 500) Index. The DSI was created by Kinder, Lydenberg, and Domini (KLD) to select companies with positive social and environmental records: the criteria involved community relations, diversity, employee relations, human rights, safety, environment, and (new) corporate governance. The DSI’s social criteria led it to exclude (1) firms that derived more than 2% of gross revenues from military weapons; (2) firms with tobacco, gambling, and alcohol sales; and (3) firms that own or share in nuclear power plants. The returns of the DSI stocks have exceeded the returns of the S&P 500 stocks since 1990, although there is a beta bias (exceeding 1.0) and a growth bias to the DSI portfolio. The relative returns of the DSI and S&P 500 stocks, as of August 30, 2003, are reported in Table 4.
Time period returns
The relative returns on SRI portfolios can be volatile.
There can be additional social criteria. One may not want to invest in stocks with poor environmental records, or poor employment (unions, safety, pension concerns), or diversity (woman and/or minority CEO, or boards of directors, family or “gay” rights, hiring of disabled persons) records. There can be an almost infinite number of combinations of social criteria exclusions that can be used to create SRI portfolios. Do these social criteria affect portfolio performance and stock prices? The recent empirical evidence, as reported in studies winning the Moskowitz Prize for Research in Socially Responsible Investment, supports the notion that SRI portfolios do not perform statistically differently from (traditional) equity portfolios, and that stock prices of socially responsible stocks are not determined by different criteria than common stocks in general. The Moskowitz Prize winning studies are Guerard (1997b), Waddock and Graves (1997), and Dowell, Hart, and Yeung (2000). There are no statistically significant cost-of-capital differences between SRI stocks and non-SRI stocks.
The academic literature in the last decade has reported both the shadow cost and return enhancement views of ESG-screened investment portfolios. A number of literature reviews and meta-analyses found evidence on both sides of the SRI/ESG performance question. Friede, Busch, and Bassen (2015) reviewed more than 2200 studies and found that “The results show that the business case for ESG investing is empirically very well founded.” Sjöström (2011) reviewed 21 studies in a meta-analytic framework and found, contra Friede et al., that “results point in all directions, and … there is no clear link between SRI and financial performance.”
For stock portfolios, Evans and Peiris (2010) found evidence of a positive relationship between ESG rating criteria and accounting measures of performance, including return on assets and market-to-book ratios. Fulton, Kahn, and Sharples (2012) argue that a positive correlation exists between one aspect of ESG, sustainability in particular, and superior risk-adjusted returns. Park and Moon used KLD ratings and found that, for S&P 500 constituents, top quantile firms outperform by as much as 6.24% over the 1991–2006 period, controlling for standard factor model risks. From a global perspective and focusing on corporate social responsibility (CSR), Renneboog, ter Horst, and Zhang (2008) found that some CSR measures are associated with shareholder value, but others are associated with destruction of shareholder value.
Statman analyzes SRI/ESG performance including comparing SRI indexes to the S&P 500 historically, finding little difference in performance. This is echoed in part by Geczy, Stambaugh, and Levin (2005) who in a sophisticated framework create SRI/ESG mutual fund portfolios. They found that index-oriented portfolios under CAPM and zero alpha parameters seem to impose relatively small costs, as little as 1 or 2 basis points per month, compared to non-SRI portfolios. However, when informative alpha beliefs or beliefs in the importance of investment style are present, relative performance deteriorates. Barnett and Salomon argue that a nonlinear relationship exists between social and financial performance, finding that funds with intense screens outperform by weeding out “bad” firms, and funds that have weak screens outperform in risk-adjusted measures accounting for diversification, while funds in the middle underperform.
Corporate governance has been studied. Liang and Renneboog reported that countries with higher social responsibility (sustainability) ratings were more likely to be civil law, rather than common law, countries, with the Scandinavian countries having the highest scores. Corporate social responsibility is positively associated with lower stockholder litigation risk and strong labor regulations. Liang and Renneboog stress that corporate social responsibility reflects the supply of socially responsible behavior by firms and the demand for corporate social responsibility practices by society.
Bialkowski and Starks (2017) reported that SRI funds attracted more flows, on average, than conventional mutual funds between 1999 and 2011. SRI funds had positive net flow in all but two quarters of the 13-year period, and SRI funds had statistically significant inflow following environmental disasters, such as the BP oil spill and the Fukushima Daiichi nuclear meltdown, and accounting scandals (Enron, Tyco, and WorldCom). SRI funds had higher exposure to (MSCI) ESG values that persisted across time. Riedl and Smeets (2017) surveyed some 3382 socially responsible investors and 35,000 randomly selected investors using administrative individual investor data in the Netherlands. With an 8% response rate from conventional investors and a 12% response rate from SRI investors, Riedl and Smeets reported that, relative to conventional investors, SRI investors contributed more to charity, invested primarily in SRI equity funds, had longer holding periods, and expected (marginally) lower returns. Investors holding a university degree are more likely to be SRI investors. Our results are consistent with the recent findings on diversity reported in Manconi, Rizzo, and Spalt (2015) and Kim and Starks (2016). Manconi et al. (2015) reported that diversity enhanced returns more than the Fama-French five-factor components and that the market-weighted diversity portfolios are enhanced relative to the equal-weighted diversity portfolios for the 2001–2014 time period. Kim and Starks (2016) tested for unique skills of female members of corporate boards.
How have the costs of SRI changed since the initial studies and enhancements to the database? In this context, we re-examine the KLD database from its 1991 inception through 2015. Over time, the KLD database was subject to enhancements resulting from acquisitions and other methodology changes. For example, in 2000 the human rights category was added (Galema, Plantinga, & Scholtens, 2008), in 2002 governance was added (Statman & Glushkov, 2009), and in 2010 KLD decided to rank companies only on issues relevant for their industry as opposed to all issues. ESG scores are naturally persistent in the short run, as reported and confirmed by Wimmer (2012) in reviewing SRI mutual funds. However, Wimmer (2012) also found that persistence declines after approximately 2 years, making rebalancing crucial for SRI portfolios. Such instability of the dataset results in many versions of the final KLD scores constructed in academic literature, causing potential difficulties in comparing results across papers and time. It also highlights the potential dependency of results on the exact measurement methods used. As a result, recent literature has focused on ways to adjust and normalize the scores to account for the lack of stability in KLD dimensions over time.
From the early KLD studies (Sharfman, 1996) and continuing to the most recent ones (Statman and Glushkov, 2009), there has been an ongoing discussion about the challenges of creating a unique KLD-based score. The simplest way that sums all strengths and subtracts all weaknesses incurs its own set of biases and imbalances driven by data structure rather than companies’ ESG attributes. Dorfleitner, Utz, and Wimmer (2014) study the relation between ESG score performance and stock performance in various markets worldwide, reiterating global evidence of positive association between firm ESG ratings and subsequent returns; however, the bias remains. The earlier literature attempted to address the implicit bias arising from weighting each issue equally. For example, in order to avoid treating each ESG strength and weakness as equally important, Waddock and Graves (1997) rely on the issues weighting scheme developed. Because such weightings are highly subjective, they are no longer used in more recent studies. For example, employee relations strengths are evaluated on 10 individual variables, with a maximum score of 8, while human rights strengths are evaluated on 3 variables with a maximum score of 2. Hence, because of the uneven distribution, the raw score will be much more impacted by the employee relations strengths than the human rights strengths. The same issues affect the weights of strengths vs concerns. Depending on which area has a larger number of evaluated metrics, that area will get higher implied weightings in the overall raw calculations.
Another difficulty arises because of the changing nature of the KLD dataset over time. Specifically, as the number of strengths and weaknesses changes in each category, summing the raw strengths and weakness, such as was the earlier practice, creates score dynamics that are influenced by the dataset construction rather than only by the company’s changing ESG policies.
Kempf and Osthoff (2007) address this problem by normalizing the net scores within each of the six categories. In addition, they introduce a way to binary transform the weakness into the same direction as the strengths. Using decile spread portfolios, their paper documents statistically positive KLD score alphas for the 1992–2004 period. In addition, they introduce an important methodology to normalize the rankings by the sector to which the company belongs. This reduces the potential sector and resulting factor biases in the rankings. Finally, they create a version of the rank that excludes all companies that have at least one of the six controversial scores. Statman and Glushkov (2009) test a variation of the Kempf and Osthoff (2007) methodology by excluding companies that have a zero value in both the strength and the weakness fields in each category, claiming that those companies were not reviewed by KLD. They exclude the governance component because of short history. Over the 1992–2007 period, they found statistically significant positive alphas for KLD long-short portfolios.
Manescu (2011) adds an additional refinement that normalizes strengths and weaknesses separately. Since in the Kempf and Osthoff (2007) and Statman and Glushkov (2009) methodologies a lack of a weakness is considered a strength, Manescu recognizes that it is important to normalize strengths and weaknesses separately. Unlike prior studies, Manescu does not exclude any categories and uses all seven categories. She found that only community relations scores have statistically significant abnormal positive returns during the 1992–2008 period.
Let us take a new look at SRI/ESG investing with the CTEF variable, featured in chapters 14 and 15, as the stock selection model and the KLD social criteria for the 2000–2014 time period. The appendix draws heavily from Geczy, Guerard, and Samonov (2020).
From Raw to Normalized Score Definitions
Inside each of the seven subcategories (governance, community, diversity, employee relations, environment, human rights, and product safety), KLD provides binary ratings on multiple individual measures of strengths and concerns criteria. For each of the seven categories and for each company in each year, the category raw net score is the sum of category strengths minus sum of category weaknesses. The total raw net score is the sum of the strengths across all categories minus the sum of all the weaknesses across all categories. There is total net score only if both strength and weakness exist. If strengths or weaknesses are missing entirely, the net score is NaN for that company in that year.
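The raw net score definitions above can be illustrated with a minimal Python sketch. The data layout (lists of 0/1 flags, with `None` standing for an entirely missing side) and the function names are assumptions made for illustration; they are not KLD's file format.

```python
import math

def raw_net_score(strengths, concerns):
    """Category raw net score: sum of strength flags minus sum of concern flags.

    `strengths` and `concerns` are lists of 0/1 flags. None is a hypothetical
    marker meaning that side is missing entirely for the company-year.
    """
    if strengths is None or concerns is None:
        return math.nan  # no net score unless both sides exist
    return sum(strengths) - sum(concerns)

def total_raw_net_score(categories):
    """Total raw net score over a dict: category -> (strengths, concerns)."""
    s_lists = [s for s, _ in categories.values() if s is not None]
    c_lists = [c for _, c in categories.values() if c is not None]
    if not s_lists or not c_lists:
        return math.nan  # strengths or weaknesses missing entirely
    return sum(sum(s) for s in s_lists) - sum(sum(c) for c in c_lists)

# Illustrative company-year with two of the seven categories filled in.
company = {
    "diversity": ([1, 1, 0], [0, 1]),
    "human_rights": ([0, 1], [1, 0, 0]),
}
diversity_net = raw_net_score(*company["diversity"])
total_net = total_raw_net_score(company)
```

In this toy input, the diversity net score is 2 strengths minus 1 concern, and the total sums strengths and concerns across both categories before differencing.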
Category score: For each subcategory, for each company in each year, we first normalize strengths (weaknesses) by dividing the sum of strengths (weaknesses) Booleans by the concurrent dimension of strengths (weaknesses), where the dimension is defined as the number of evaluated variables in each category during each year. As the number of evaluated variables varies over time and differs across ESG categories, this adjustment normalizes the data and allows for a cleaner review of a company’s ESG information, rather than of KLD’s evaluation methodology.
For example, for the strengths of the diversity category, Company A in year 1 is measured by two variables (dimension is 2); its scores are 1 and 1, respectively; hence, its normalized strength score is (1 + 1)/2 = 1, while its raw strength score is 2. Then, let’s say that next year, Company B has three evaluated diversity strengths criteria with scores of 1, 1, and 0. While the raw score would be the same (2), the normalized score in year 2 is 2/3. This normalization captures the fact that in year 2, the company could have earned a maximum rating of 3, while in year 1, the maximum rating was 2. Manescu (2011) points out that this is important because the dimensions of each sub-strength and sub-weakness are different.
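The worked example can be checked with a few lines of Python. This is only a sketch of the dimension adjustment as described in the text; the function name is hypothetical, and exact fractions are used so that 2/3 is not rounded.

```python
from fractions import Fraction

def normalized_score(flags):
    """Sum of 0/1 flags divided by the dimension (number of evaluated variables)."""
    if not flags:
        return None  # nothing evaluated in this category-year
    return Fraction(sum(flags), len(flags))

year1_flags = [1, 1]      # two evaluated diversity strengths, both met
year2_flags = [1, 1, 0]   # three evaluated diversity strengths, two met

year1_raw, year2_raw = sum(year1_flags), sum(year2_flags)   # both equal 2
year1_norm = normalized_score(year1_flags)                  # 2/2
year2_norm = normalized_score(year2_flags)                  # 2/3
```

The raw scores are identical across the two years, while the normalized scores differ because the maximum attainable rating changed from 2 to 3.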
Next, we deduct normalized weaknesses from the normalized strengths to get the category-normalized net scores. We then subtract the corresponding industry average-normalized net score from each company’s normalized score, making scores industry-neutralized. Industries are defined by the ten Fama-French sectors (from Kenneth French’s website). By subtracting the average industry score, we neutralize large industry biases present in ESG rankings, which remain even after the normalization. The energy sector has the lowest average scores, while consumer non-durables have the highest.
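A minimal sketch of the industry-neutralization step follows, assuming the normalized net scores have already been computed. The firm names, industry labels, and score values are made up for illustration and are not taken from the KLD data or the Fama-French files.

```python
from collections import defaultdict

def industry_neutralize(net_scores, industry_of):
    """Subtract each industry's average normalized net score from its members."""
    groups = defaultdict(list)
    for firm, score in net_scores.items():
        groups[industry_of[firm]].append(score)
    means = {ind: sum(vals) / len(vals) for ind, vals in groups.items()}
    return {firm: score - means[industry_of[firm]]
            for firm, score in net_scores.items()}

# Hypothetical normalized net scores (normalized strengths minus normalized weaknesses).
net_scores = {"A": 0.5, "B": 0.1, "C": -0.2, "D": -0.4}
industry_of = {"A": "NoDur", "B": "NoDur", "C": "Enrgy", "D": "Enrgy"}

neutral = industry_neutralize(net_scores, industry_of)
```

After the adjustment, scores within each industry average to zero, so a firm in a low-scoring sector is compared only with its sector peers.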
Total score: Each subcategory-normalized and industry-neutralized score is further ranked from 1 to 100, and then the total score is calculated as the average of all subcategory scores. If one or more subcategory scores is missing, the total score equals the average of all other subcategory scores. Creating a percentile rank in order to combine across subcategories is important, because combining even the normalized scores across seven subcategories is not balanced: as can be seen in panel D of Table 1, the min/max distributions of each score are not even, which would result in varying weightings of each category. Ordinal rankings create an equal contribution to the final score from each subcategory. We suggest that this adjustment process should be the new default, a more robust starting point, for reviewing the ESG information content of the KLD data, making it more comparable over time and across categories, with equal representation of various ESG issues. If materiality or other reasons are required for varying weightings across ESG issues, these methodologies can be described and tested explicitly and compared against the robust equally weighted normalized benchmark.
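The ranking and averaging step might be sketched as follows. The text does not spell out the exact percentile formula or tie handling, so the rule used here (share of firms with a score at or below the firm's score, rounded up to an integer in 1..100) is an assumption, as are the function names.

```python
import math
from bisect import bisect_right

def percentile_ranks(scores):
    """Map firm -> score (None = missing) to firm -> rank in 1..100.

    Assumed rule: ceil(100 * share of firms with score <= this firm's score).
    """
    values = sorted(s for s in scores.values() if s is not None)
    n = len(values)
    ranks = {}
    for firm, s in scores.items():
        if s is None:
            ranks[firm] = None
        else:
            ranks[firm] = math.ceil(100 * bisect_right(values, s) / n)
    return ranks

def total_score(category_ranks):
    """Average of the available subcategory ranks; missing ones are skipped."""
    present = [r for r in category_ranks if r is not None]
    return sum(present) / len(present) if present else None

# Hypothetical industry-neutralized scores for one subcategory.
env = {"A": 0.2, "B": -0.1, "C": 0.05, "D": None}
env_ranks = percentile_ranks(env)

# A firm with one missing subcategory rank still receives a total score.
firm_total = total_score([100, None, 40])
```

Because every subcategory is mapped to the same 1 to 100 scale before averaging, no category dominates the total through a wider raw range.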
Fama- and French-Weighted Portfolios
For what we term “simply weighted portfolios,” we construct both equally weighted and capitalization-weighted portfolios from the normalized KLD rankings by splitting up all the stocks in the KLD universe into high and low groups every year. As the normalized KLD scores range from 1 to 100, the high group holds all the companies with scores equal to or greater than 50, and the low group contains the rest. Long-short portfolios are also formed by investing in the high vs low group, and the spread return is analyzed. In addition, we create a version of the overall portfolio that excludes controversial companies, those traditionally labeled as “sin stocks.” Portfolios are rebalanced annually. The net score is lagged by 3 months to accommodate KLD’s update timing after year-end. Although portfolios are constructed annually, we measure returns on a monthly frequency. The monthly spread portfolio returns and associated factor regressions are presented in Table 1.
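The high/low split and the long-short spread for a single month can be sketched as below. The tickers, scores, returns, and market capitalizations are invented for illustration, and the 3-month score lag and annual rebalancing are deliberately left out of this sketch.

```python
def split_high_low(scores, cutoff=50):
    """High group: score >= cutoff; low group: everything else."""
    high = [f for f, s in scores.items() if s >= cutoff]
    low = [f for f, s in scores.items() if s < cutoff]
    return high, low

def portfolio_return(firms, returns, caps=None):
    """Equally weighted return if caps is None, else capitalization-weighted."""
    if caps is None:
        return sum(returns[f] for f in firms) / len(firms)
    total_cap = sum(caps[f] for f in firms)
    return sum(caps[f] * returns[f] for f in firms) / total_cap

def long_short_spread(scores, returns, caps=None):
    """Return of the high group minus return of the low group."""
    high, low = split_high_low(scores)
    return portfolio_return(high, returns, caps) - portfolio_return(low, returns, caps)

# Hypothetical one-month inputs.
scores = {"A": 80, "B": 50, "C": 49, "D": 10}
returns = {"A": 0.02, "B": 0.01, "C": 0.00, "D": -0.01}
caps = {"A": 300.0, "B": 100.0, "C": 100.0, "D": 100.0}

ew_spread = long_short_spread(scores, returns)        # equally weighted
cw_spread = long_short_spread(scores, returns, caps)  # capitalization-weighted
```

Repeating this calculation each month, with group membership fixed between annual rebalances, would give the monthly spread series that feeds the factor regressions.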
The total score long-short portfolio, as well as several subcategory long-short portfolios, has a positive, although economically small, return. For example, the capitalization-weighted long-short portfolio that is based on the total score and excludes the controversial companies returns 2.9% per year, statistically significant at the 5% significance level. The return remains positive in the more recent sample from 2004 to 2015, although lower than in the early history. The human rights capitalization-weighted long-short portfolio has the highest spread of 4.68% per year, which is also statistically significant. The environment long-short portfolio has the lowest return of −0.42%, which is not statistically significant. The factor loadings of these normalized and neutralized scores appear understandably different from, and more neutral than, those of the traditionally reported raw scores because many of the database biases have been removed, taking out implied factor tilts. For example, the overall spread portfolio that excludes controversial companies has a positive loading on the value, momentum, and size Fama and French (1992, 1995, 2008, 2016) factors, with a positive intercept, although the R-squared is less than 4%. Importantly, these factor loadings do not appear stable across portfolio weighting schemes and time and hence should not be interpreted as definitive. Instead, these regressions validate the effectiveness of normalization and support a weak positive intercept. In summary, our simple portfolio return analysis strongly supports the no-cost argument associated with the ESG rankings. In addition, it weakly supports some positive association with return in the simple portfolio settings.
The monthly spread portfolio returns and associated factor regressions are presented in Table 5.
KLD ratings ranked long-short portfolios’ different periods
A Return to Optimized Portfolio Construction and Management
The majority of academic work that measures the costs of ESG investing focuses on the simply weighted portfolios, yet in practice, ESG criteria are often applied alongside expected return and risk models as well as some form of optimized settings. By introducing a realistic multifactor expected return and risk model, along with optimization settings, we can gain deeper insights into the potential costs of ESG investing as the various elements of portfolio construction interact in a more practical setting.
In an optimized setting, we build optimal portfolios using a multifactor expected return model and the APT risk model and optimizer and the Axioma statistical risk model and optimizer as we discussed in chapter “Multifactor Risk Models and Portfolio Construction and Management.” We introduce the underlying stock selection model in the coming section, and combined with a KLD-based social score forecasted return, we can build mean-variance tracking error at risk (MVTaR) portfolios such that ESG/SRI values can be effectively brought into the portfolio construction analysis.
In an updated SRI/ESG analysis, Geczy, Guerard, and Samonov (2020) constructed expected returns using the earnings forecasting variable CTEF, the I/B/E/S consensus-based composite variable composed of forecasted earnings yield, earnings revisions, and earnings breadth that we discussed in chapter “Risk and Return of Equity and the Capital Asset Pricing Model.” CTEF is the public form of the McKinley Capital Management (MCM) variable representing forecasted earnings acceleration, a key variable in our stock selection model. CTEF passes the robustness tests of portfolio construction and transaction cost management tests of statistical significance.
One can create US portfolios with the CTEF model for the Russell 3000 stocks. Guerard Jr. (1997a, 1997b) found that portfolios of US stocks using KLD screens for sin, nuclear, military, and environmental criteria did not significantly underperform unscreened portfolios of US stocks using a statistically based stock selection model. In this section, we replicate the “no cost to being socially responsible in investing” test. Guerard Jr. (1997a, 1997b) treated all KLD criteria as being equally weighted. The portfolio construction uses an 8% monthly turnover constraint, a 4% upper bound on stock weight, and a 35 basis point threshold (minimum) stock weight upon initiation; it builds upon Markowitz (1952) and Guerard (1997a) and is consistent with Guerard Jr. et al. (2014). The CTEF is simulated for the Russell 3000 stocks using the APT MVTaR portfolio optimization procedure, and its results are reported in Table 6. We replicate Guerard Jr. (1997a) by creating portfolios using a composite score composed of 80% CTEF and 20% KLD raw concerns, for January 2000 to December 2014. The CTEF model produces statistically significant active returns for the 2000–2014 period. The majority of the 924 basis points of total active returns (t = 3.33) comes from asset selection: 501 basis points (t = 2.84). The APT MVTaR CTEF model total active returns and asset selection are highly statistically significant. If one creates a composite score composed of 80% CTEF and 20% KLD raw concerns (the KLD concern score = 0 if there is a concern in any of the subcategories), then specific returns (stock selection) fall for the corporate governance (ECGOV), environmental (EENV), community rights (ECOM), diversity (EDIV), and product (EPRO) KLD criteria from the CTEF specific return of 5.01%. We found that all KLD concerns have costs, except human rights and the total KLD criteria.
The introduction of the human rights (EHUM) concern, composed primarily of indigenous people relations and labor strength, and total KLD criteria (ETOTAL) enhances stock selection. Corporate governance, primarily corporate compensation, reduces active returns by 130 basis points; the environment (ENV) concern costs 159 active return basis points; and the product (PRO) concern costs 110 basis points. The CTEF variable produces highly statistically significant active returns and specific returns in the Axioma attribution model with and without the SRI concerns (see Table 2). There continues to be no statistically significant cost to being an SRI investor in the United States associated with the total KLD concern variables for January 2000 to December 2014, if one tests for equally weighted KLD criteria in an MVTaR model.
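The 80/20 composite described above can be sketched in a few lines. The text states only that the KLD concern score is 0 when any concern is present; the value assigned to a concern-free stock (100 here) and the 0 to 100 scale assumed for the CTEF score are illustrative assumptions, not the authors' specification.

```python
def kld_concern_score(concern_flags, clean_value=100.0):
    """0 if any concern flag is set; otherwise `clean_value` (an assumed scale)."""
    return 0.0 if any(concern_flags) else clean_value

def composite_score(ctef, kld, w_ctef=0.8):
    """Weighted composite: 80% CTEF and 20% KLD concern score by default."""
    return w_ctef * ctef + (1.0 - w_ctef) * kld

# Hypothetical stock with a CTEF score of 70 on an assumed 0..100 scale.
clean = composite_score(70.0, kld_concern_score([0, 0, 0]))
flagged = composite_score(70.0, kld_concern_score([0, 1, 0]))
```

Under these assumptions, a single concern lowers the composite expected return input by the full 20% weight, which is what tilts the optimizer away from flagged stocks.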
CTEF MVTaR with Normalized KLD Criteria
We use the normalized KLD criteria inside the 80/20 CTEF/KLD composite expected return model as an input into the Markowitz MVTaR optimization system, for the 2000–2014 time period. The CTEF model combined with total KLD variables and the KLD human rights criteria in the Russell 3000 analysis produces higher portfolio stock selection, as measured by Axioma specific returns, confirming the simply weighted portfolio results (see Table 6). We produce a Zephyr report that uses the ITG transaction cost model, and we report several major results for Russell 3000 stocks. First, the composite models of CTEF and KLD social criteria substantially reduce tracking errors during the 2000–2014 time period and increase the portfolio Information Ratios. Second, human rights and total KLD criteria enhance annualized excess returns after ITG transaction costs. Third, the composite models of CTEF and KLD social criteria substantially reduce portfolio standard deviations during the 2000–2014 time period and increase the portfolio Sharpe Ratios (see Table 6). It is absolutely necessary to have a portfolio expected return model that is statistically significant for SRI/ESG integration to enhance the risk-return tradeoff. Thus, a socially responsible investor need not bear a lower Information Ratio and Sharpe Ratio simply to “do well while doing good.” An SRI investor can have his/her cake and eat it too!
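For readers who want to reproduce the two summary statistics from a monthly return series, a minimal sketch using the standard textbook definitions follows. The annualization convention (multiplying the mean by 12 and the standard deviation by the square root of 12) and the sample returns are assumptions; the Zephyr report may use different conventions.

```python
import math
import statistics

def information_ratio(portfolio, benchmark, periods_per_year=12):
    """Annualized mean active return divided by annualized tracking error."""
    active = [p - b for p, b in zip(portfolio, benchmark)]
    tracking_error = statistics.stdev(active) * math.sqrt(periods_per_year)
    if tracking_error == 0:
        return math.nan
    return statistics.mean(active) * periods_per_year / tracking_error

def sharpe_ratio(portfolio, risk_free, periods_per_year=12):
    """Annualized mean excess return divided by annualized excess-return volatility."""
    excess = [p - r for p, r in zip(portfolio, risk_free)]
    volatility = statistics.stdev(excess) * math.sqrt(periods_per_year)
    if volatility == 0:
        return math.nan
    return statistics.mean(excess) * periods_per_year / volatility

# Hypothetical monthly returns.
portfolio = [0.02, 0.01, 0.03, -0.01]
benchmark = [0.01, 0.01, 0.02, 0.00]
risk_free = [0.001, 0.001, 0.001, 0.001]

ir = information_ratio(portfolio, benchmark)
sr = sharpe_ratio(portfolio, risk_free)
```

A lower tracking error with unchanged active return raises the Information Ratio, and a lower standard deviation with unchanged excess return raises the Sharpe Ratio, which is the mechanism behind the first and third results above.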
APT risk model and optimizer and ITG transaction cost
Analysis of socially responsible investment portfolios, since the introduction of the social investing discipline, has focused on the expected costs of the constraint, but also on the possibility of portfolio design, incorporating social responsibility factors, within which the expected return could be the same as or greater than that of a portfolio without the SRI factors. We found that a number of important investment screens based on KLD social investment variables do not cost investors in our portfolio analyses. Stocks with good KLD human rights, diversity, and total KLD concern scores do not cost investors, holding risk constant, and the KLD total social criteria, human rights, and diversity complement CTEF to enhance active returns and specific returns in the Russell 3000 and KLD-only universes for human rights and total KLD criteria. Apparently, it may be possible to have one’s cake and eat it too in several KLD-based universes.
KLD STATS (Statistical Tool for Analyzing Trends in Social and Environmental Performance) is a dataset with annual snapshots of the environmental, social, and governance performance of companies rated by KLD Research and Analytics, Inc. KLD STATS is now sold and serviced by RiskMetrics Group (RMG). KLD covered 650 stocks in annual spreadsheets from 1991 to 2000; 1100 stocks in 2001–2002; and 3100 stocks from 2003.
Strength and Concern (Positive and Negative Indicator) Ratings
RMG covers approximately 80 indicators in seven major qualitative issue areas including community, corporate governance, diversity, employee relations, environment, human rights, and product.
RMG also provides information for involvement in the following controversial business issues: alcohol, gambling, firearms, military, nuclear power, and tobacco. KLD STATS presents a binary summary of positive and negative ESG ratings. In each case, if RMG assigned a rating in a particular issue (either positive or negative), this is indicated with a 1 in the corresponding cell. If the company did not have a strength or concern in that issue, this is indicated with a 0.
KLD STATS data is organized by year. Each year, RiskMetrics takes a snapshot of its ratings and index membership to reflect the data at the calendar year-end. Each spreadsheet contains identifying information about the company, index membership, a listing of positive and negative ratings, involvement in controversial business issues, and total counts for each area.
Additionally, at the end of each spreadsheet is a summary count of all strengths and concerns the company received in a general category (either qualitative issue area or controversial business issue) in that year.
ESG Ratings’ Definitions
Qualitative Issue Areas
The company’s chief executive officer is a woman or a member of a minority group.
The company has made notable progress in the promotion of women and minorities, particularly to line positions with profit-and-loss responsibilities in the corporation.
Board of Directors (DIV-str-C)
Women, minorities, and/or the disabled hold four seats or more (with no double counting) on the board of directors, or one-third or more of the board seats if the board numbers are less than 12.
Work/Life Benefits (DIV-str-D)
The company has outstanding employee benefits or other programs addressing work/life concerns, e.g., childcare, elder care, or flextime. In 2005, KLD renamed this strength from family benefits strength.
Women and Minority Contracting (DIV-str-E)
The company does at least 5% of its subcontracting, or otherwise has a demonstrably strong record on purchasing or contracting, with women- and/or minority-owned businesses.
Employment of the Disabled (DIV-str-F)
The company has implemented innovative hiring programs or other innovative human resource programs for the disabled, or otherwise has a superior reputation as an employer of the disabled.
Gay and Lesbian Policies (DIV-str-G)
The company has implemented notably progressive policies toward its gay and lesbian employees. In particular, it provides benefits to the domestic partners of its employees.
The company has either paid substantial fines or civil penalties as a result of affirmative action controversies or has otherwise been involved in major controversies related to affirmative action issues.
The company has no women on its board of directors or among its senior line managers.
Human Rights (HUM-)
Positive Record in South Africa (HUM-str-A)
The company’s social record in South Africa is noteworthy.
Indigenous Peoples Relations Strength (HUM-str-D)
The company has established relations with indigenous peoples near its proposed or current operations (either in or outside the United States) that respect the sovereignty, land, culture, human rights, and intellectual property of indigenous peoples.
Labor Rights Strength (HUM-str-G)
The company has outstanding transparency on overseas sourcing disclosure and monitoring, or has particularly good union relations outside the United States, or has undertaken labor rights-related initiatives that KLD considers outstanding or innovative.
Other Strength (HUM-str-X)
The company has undertaken exceptional human rights initiatives, including outstanding transparency or disclosure on human rights issues or has otherwise shown industry leadership on human rights issues not covered by other KLD human rights ratings.
South Africa (HUM-con-A)
The company faced controversies over its operations in South Africa.
Northern Ireland (HUM-con-B)
The company has operations in Northern Ireland.
Burma Concern (HUM-con-C)
The company has operations or direct investment in, or sourcing from, Burma.
The company’s operations in Mexico have had major recent controversies, especially those related to the treatment of employees or degradation of the environment.
Labor Rights Concern (HUM-con-F)
The company’s operations have had major recent controversies primarily related to labor standards in its supply chain.
Indigenous Peoples Relations Concern (HUM-con-G)
The company has been involved in serious controversies with indigenous peoples (either in or outside the United States) that indicate the company has not respected the sovereignty, land, culture, human rights, and intellectual property of indigenous peoples.