P. Jonathon Phillips
Carina A. Hahn
Peter C. Fontana
Information Access Division
Information Technology Laboratory

David A. Broniatowski
Information Technology Laboratory

Mark A. Przybocki
Information Access Division
Information Technology Laboratory

This draft publication is available free of charge from:
https://doi.org/10.6028/NIST.IR.8312-draft

August 2020

U.S. Department of Commerce
Wilbur L. Ross, Jr., Secretary

National Institute of Standards and Technology
Walter Copan, NIST Director and Undersecretary of Commerce for Standards and Technology

National Institute of Standards and Technology Interagency or Internal Report 8312
24 pages (August 2020)

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

Public comment period: August 17, 2020 through October 15, 2020

National Institute of Standards and Technology
100 Bureau Drive (Mail Stop 8940) Gaithersburg, Maryland 20899-2000
Email: explainable-AI@nist.gov

All comments will be made public and are subject to release under the Freedom of Information Act (FOIA).

Additional information on submitting comments can be found at https://www.nist.gov/topics/artificial-intelligence/ai-foundational-research-explainability.

Trademark Information

All trademarks and registered trademarks belong to their respective organizations.

Call for Patent Claims

This public review includes a call for information on essential patent claims (claims whose use would be required for compliance with the guidance or requirements in this Information Technology Laboratory (ITL) draft publication). Such guidance and/or requirements may be directly stated in this ITL Publication or by reference to another publication. This call also includes disclosure, where known, of the existence of pending U.S. or foreign patent applications relating to this ITL draft publication and of any relevant unexpired U.S. or foreign patents.

ITL may require from the patent holder, or a party authorized to make assurances on its behalf, in written or electronic form, either:

a) assurance in the form of a general disclaimer to the effect that such party does not hold and does not currently intend holding any essential patent claim(s); or
b) assurance that a license to such essential patent claim(s) will be made available to applicants desiring to utilize the license for the purpose of complying with the guidance or requirements in this ITL draft publication either:
   i. under reasonable terms and conditions that are demonstrably free of any unfair discrimination; or
   ii. without compensation and under reasonable terms and conditions that are demonstrably free of any unfair discrimination.

Such assurance shall indicate that the patent holder (or third party authorized to make assurances on its behalf) will include in any documents transferring ownership of patents subject to the assurance, provisions sufficient to ensure that the commitments in the assurance are binding on the transferee, and that the transferee will similarly include appropriate provisions in the event of future transfers with the goal of binding each successor-in-interest.
The assurance shall also indicate that it is intended to be binding on successors-in-interest regardless of whether such provisions are included in the relevant transfer documents.

Such statements should be addressed to: explainable-AI@nist.gov

Abstract

We introduce four principles for explainable artificial intelligence (AI) that comprise the fundamental properties for explainable AI systems. They were developed to encompass the multidisciplinary nature of explainable AI, including the fields of computer science, engineering, and psychology. Because one-size-fits-all explanations do not exist, different users will require different types of explanations. We present five categories of explanation and summarize theories of explainable AI. We give an overview of the algorithms in the field that cover the major classes of explainable algorithms. As a baseline comparison, we assess how well explanations provided by people follow our four principles. This assessment provides insights into the challenges of designing explainable AI systems.

Key words

Artificial Intelligence (AI); explainable AI; trustworthy AI.

Table of Contents

1 Introduction
2 Four Principles of Explainable AI
  2.1 Explanation
  2.2 Meaningful
  2.3 Explanation Accuracy
  2.4 Knowledge Limits
3 Types of Explanations
4 Overview of principles in the literature
5 Overview of Explainable AI Algorithms
  5.1 Self-Explainable Models
  5.2 Global Explainable AI Algorithms
  5.3 Per-Decision Explainable AI Algorithms
  5.4 Adversarial Attacks on Explainability
6 Humans as a Comparison Group for Explainable AI
  6.1 Explanation
  6.2 Meaningful
  6.3 Explanation Accuracy
  6.4 Knowledge Limits
7 Discussion and Conclusions
References

List of Figures

Fig. 1 This figure shows length of response time versus explanation detail.
We populate the figure with four illustrative cases: emergency weather alert, loan application, audit of a system, and debugging a system.

1. Introduction

With recent advances in artificial intelligence (AI), AI systems have become components of high-stakes decision processes. The nature of these decisions has spurred a drive to create algorithms, methods, and techniques to accompany outputs from AI systems with explanations. This drive is motivated in part by laws and regulations which state that decisions, including those from automated systems, provide information about the logic behind those decisions¹ and the desire to create trustworthy AI [30, 76, 89].

Based on these calls for explainable systems, it can be assumed that the failure to articulate the rationale for an answer can affect the level of trust users will grant that system. Suspicions that the system is biased or unfair can raise concerns about harm to oneself and to society. This may slow societal acceptance and adoption of the technology, as members of the general public oftentimes place the burden of meeting societal goals on manufacturers and programmers themselves [27, 102]. Therefore, in terms of societal acceptance and trust, developers of AI systems may need to consider that multiple attributes of an AI system can influence public perception of the system.

Explainable AI is one of several properties that characterize trust in AI systems [83, 92]. Other properties include resiliency, reliability, bias, and accountability. Usually, these terms are not defined in isolation, but as part of a set of principles or pillars. The definitions vary by author, and they focus on the norms that society expects AI systems to follow. For this paper, we state four principles encompassing the core concepts of explainable AI.
These are informed by research from the fields of computer science, engineering, and psychology. In considering aspects across these fields, this report provides a set of contributions. First, we articulate the four principles of explainable AI. From a computer science perspective, we place existing explainable AI algorithms and systems into the context of these four principles. From a psychological perspective, we investigate how well people’s explanations follow our four principles. This provides a baseline comparison for progress in explainable AI.

Although these principles may affect the methods in which algorithms operate to meet explainable AI goals, the focus of the concepts is not algorithmic methods or computations themselves. Rather, we outline a set of principles that organize and review existing work in explainable AI and guide future research directions for the field. These principles support the foundation of policy considerations, safety, acceptance by society, and other aspects of AI technology.

¹ The Fair Credit Reporting Act (FCRA) and the European Union (E.U.) General Data Protection Regulation (GDPR) Article 13.

2. Four Principles of Explainable AI

We present four fundamental principles for explainable AI systems. These principles are heavily influenced by considering the AI system’s interaction with the human recipient of the information. The requirements of the given situation, the task at hand, and the consumer will all influence the type of explanation deemed appropriate for the situation. These situations can include, but are not limited to, regulatory and legal requirements, quality control of an AI system, and customer relations. Our four principles are intended to capture a broad set of motivations, reasons, and perspectives.

Before proceeding with the principles, we need to define a key term, the output of an AI system.
The output is the result of a query to an AI system. The output of a system varies by task. A loan application is an example where the output is a decision: approved or denied. For a recommendation system, the output could be a list of recommended movies. For a grammar checking system, the output is grammatical errors and recommended corrections.

Briefly, our four principles of explainable AI are:

Explanation: Systems deliver accompanying evidence or reason(s) for all outputs.
Meaningful: Systems provide explanations that are understandable to individual users.
Explanation Accuracy: The explanation correctly reflects the system’s process for generating the output.
Knowledge Limits: The system only operates under conditions for which it was designed or when the system reaches a sufficient confidence in its output.

These are defined and contextualized in more detail below.

2.1 Explanation

The Explanation principle obligates AI systems to supply evidence, support, or reasoning for each output. By itself, this principle does not require that the evidence be correct, informative, or intelligible; it merely states that a system is capable of providing an explanation. A body of ongoing work currently seeks to develop and validate explainable AI methods. An overview of these efforts is provided in Section 5. A variety of strategies and tools are currently being deployed and developed. This principle does not impose any metric of quality on those explanations. The Meaningful and Explanation Accuracy principles provide a framework for evaluating explanations.

2.2 Meaningful

A system fulfills the Meaningful principle if the recipient understands the system’s explanations. Generally, this principle is fulfilled if a user can understand the explanation, and/or it is useful to complete a task. This principle does not imply that the explanation is one size fits all.
Multiple groups of users for a system may require different explanations. The Meaningful principle allows for explanations which are tailored to each of the user groups. Groups may be defined broadly as the developers of a system vs. end-users of a system; lawyers/judges vs. juries; etc. The goals and desiderata for these groups may vary. For example, what is meaningful to a forensic practitioner may be different from what is meaningful to a juror.

This principle also allows for tailored explanations at the level of the individual. Two humans viewing the same AI system’s output will not necessarily interpret it the same way for a variety of reasons. One reason is that a person’s prior knowledge and experiences influence their decisions. Another reason is that psychological differences among people may influence how they interpret an explanation and what type of explanations they find meaningful [10, 61]. Thus, different users may take different meanings from identical AI explanations. The tailoring of an explanation to user groups and individuals may not be static over time. As people gain experience with a task, what they consider a meaningful explanation will likely change [10, 35, 57, 72, 73]. Therefore, meaningfulness is influenced by a combination of the AI system’s explanation and a person’s prior knowledge, experiences, and mental processes.

All of the factors that influence meaningfulness contribute to the difficulty in modeling the interface between AI and humans. Developing systems that produce meaningful explanations requires accounting for both computational and human factors [22, 58].

2.3 Explanation Accuracy

Together, the Explanation and Meaningful principles only call for a system to produce explanations that are meaningful to a user community.
These two principles do not require that a system deliver an explanation that correctly reflects the system’s process for generating its output. The Explanation Accuracy principle imposes accuracy on a system’s explanations.

Explanation accuracy is a distinct concept from decision accuracy. For decision tasks, decision accuracy refers to whether the system’s judgment is correct or incorrect. Regardless of the system’s decision accuracy, the corresponding explanation may or may not accurately describe how the system came to its conclusion. Researchers in AI have developed standard measures of algorithm and system accuracy [13, 18, 33, 64–66, 71, 79]. While these established decision accuracy metrics exist, researchers are still in the process of developing performance metrics for explanation accuracy [2, 16, 97].

Similar to the Meaningful principle, this principle allows for different explanation accuracy metrics for different groups and individuals. Some users will require simple explanations that succinctly focus on the critical point(s) but lack nuances that are necessary to completely characterize the algorithm’s process for generating its output. However, these nuances may only be meaningful to experts. This highlights the point that explanation accuracy and meaningfulness need not overlap. A detailed explanation may be highly accurate but sacrifice how meaningful it is to certain audiences. Overall, a system may be considered more explainable if it can generate more than one type of explanation. Because of these different levels of explanation, the metrics used to evaluate the accuracy of an explanation may not be universal or absolute.

2.4 Knowledge Limits

The previous principles implicitly assume that a system is operating within its knowledge limits.
This Knowledge Limits principle states that systems identify cases in which they were not designed or approved to operate, or in which their answers are not reliable. By identifying and declaring knowledge limits, this practice safeguards answers so that a judgment is not provided when it may be inappropriate to do so. The Knowledge Limits principle can increase trust in a system by preventing misleading, dangerous, or unjust decisions or outputs.

There are two ways a system can reach its knowledge limits. First, the question can be outside the domain of the system. For example, in a system built to classify bird species, a user may input an image of an apple. The system could return an answer to indicate that it could not find any birds in the input image; therefore, the system cannot provide an answer. This is both an answer and an explanation. In the second way a knowledge limit can be reached, the confidence of the most likely answer may be too low, depending on an internal confidence threshold. For example, for a bird classification system, the input image of a bird may be too blurry to determine its species. In this case, the system may recognize that the image is of a bird, but that the image is of low quality. An example output may be: “I found a bird in the image, but the image quality is too low to identify it.”

3. Types of Explanations

Explanations will vary depending on their consumer. Some explanations will be simple, while others will be detailed and could require training or expertise to fully understand. To illustrate the range of explanations, we describe five categories of explanations that build on the work in the literature [6, 26, 98]. The categories described below were not designed to be exhaustive.

User benefit: This type of explanation is designed to inform a user about an output.
For example, the explanation could provide the reason a loan application was approved or denied to the applicant.

Societal acceptance: This type of explanation is designed to generate trust and acceptance by society. For example, if an unexpected output is provided by the system, the explanation may help users understand why this output was generated. It may also provide an increased sense of comfort in the system if the rationale can be provided.

Regulatory and compliance: This type of explanation assists with audits for compliance with regulations, safety standards, etc. The audience of the explanation may include a user who requires significant detail (e.g., a safety regulator) and the user interacting with the system (e.g., a developer). Examples may include the developer or auditor of a self-driving car. This may also include explanations to evaluate the output of a forensic examination after an airplane crash.

System development: This type of explanation assists or facilitates developing, improving, debugging, and maintaining an AI algorithm or system. Consumers of this category include technical staff, product managers, and executives. This category includes the users requiring significant detail and users interacting with the system. For example, this may include the technical staff debugging a vision algorithm with a Gradient-Weighted Class Activation Mapping (GRAD-CAM) based tool.

Owner benefit: This type of explanation benefits the operator of a system. An example is a recommendation system that lists movies or videos to watch and explains the selection based on previously viewed items. A system recommends a movie and explains this choice by stating “here is a movie to watch because you liked these other movies.” If the user trusts the explanation, the owner benefits because that person continues watching movies on their service.

Categories of this nature are also discussed in more detail in Bhatt et al., Hall et al., and Weller. Bhatt et al. mention in their use cases that the explanations are usually used by the algorithm developers to debug the models. Bhatt et al. interview 30 individuals on how their organizations use explainable AI. They use explainable AI in a variety of applications, including object detection and sentiment analysis. Hall et al. propose best practices on how to use explainable AI algorithms. They summarize their recommendations into implementation guidelines: design explanations to enable understanding; learn how explainable AI can be exploited for nefarious purposes; augment surrogate models with direct explanations; and, for high-stakes decisions, provide explanations that are highly interpretable. In Caruana et al.
, the authors developed an explainable AI model and used it to both determine and explain pneumonia risk in a patient data set and 30-day readmission risk in another patient data set.

From a practical perspective, explanations can be characterized by the amount of time the consumer of the explanation has to respond to the information and the level of detail in an explanation. Figure 1 captures the relationship between time requirements and explanation detail. The horizontal axis represents the time requirement a user has to respond to a situation. The time requirement axis addresses situations ranging from those that require immediate responses to those that permit a longer evaluation. The vertical axis represents the level of detail in the explanation. This axis addresses situations related to the level of detail the consumer or user will require. At one end of this axis, an explanation is not required or a simple explanation will be sufficient. For example, in response to an emergency weather alert, the consumer must act immediately, and the explanation needs to be simple and straightforward. A current weather alert from the National Weather Service, “Tornado Warning: Take Action!” (https://www.weather.gov/safety/tornado-ww), operates as both an alert and a simple explanation. The alert is to “Take Action” with the simple explanation of “Tornado Warning.” Explanations for debugging could fall at the other end of the time requirement and level of detail spectrum. The explanation could include information on the internal steps of a system, and it could take the audience time to examine the explanation and decide on their next actions. Two additional examples were placed on Figure 1: loan applications and audit of a system. The response to a loan application is generally quick and the explanation provides greater detail than a weather alert.
The response time and explanation detail for an audit of a system could be similar to debugging a system.

[Figure 1. Horizontal axis: time requirement, from immediate response to longer term response. Vertical axis: level of detail, from simple explanation to detailed explanation. Plotted cases: (1) emergency weather alert, (2) loan applicant, (3) audit of a system, (4) debugging performance of a system.]

Fig. 1. This figure shows length of response time versus explanation detail. We populate the figure with four illustrative cases: emergency weather alert, loan application, audit of a system, and debugging a system.

Explanations will need to fulfill a variety of requirements and needs, which will depend on the tasks and users. The five categories of explanations illustrate the range and types of explanations and point to the need for flexibility in addressing the scope of systems that require explanations.

4. Overview of principles in the literature

Theories and properties of explainable AI have been discussed from different perspectives, with commonalities and differences across these points of view [16, 22, 53, 77, 78, 98].

Lipton divides explainable techniques into two broad categories: transparent and post-hoc interpretability. Lipton defines a transparent explanation as reflecting to some degree how a system came to its output. A subclass is simulatability, which requires that a person can grasp the entire model. This implies that explanations will reflect the inner workings of a system. Their post-hoc explanations “often do not elucidate precisely how a model works, they may nonetheless confer useful information for practitioners and end users of machine learning.” For example, the bird is a cardinal because it is similar to cardinals in the training set.

Rudin and Rudin and Radin argue that models for high-stakes decisions must provide explanations that reveal their inner workings.
They claim that deep neural networks are inherently black boxes and should be avoided for high-stakes decisions.

Wachter et al. argue that explanations do not need to meet the explanation accuracy property. They claim that counterfactual explanations are sufficient. “A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output;” e.g., if you had arrived at the platform 15 minutes earlier, you would have caught the train. Counterfactual explanations do not necessarily reveal the inner workings of a system. This property allows counterfactual explanations to protect intellectual property.

Gilpin et al. define a set of concepts for explainable AI and provide an outline of current approaches. In their survey, Gilpin et al. take a similar stance to Rudin and Rudin and Radin in their set of “foundational concepts” for explainability. Similar to the Meaningful and Explanation Accuracy principles in our current work, Gilpin et al. propose that explanations should allow for a trade-off between their interpretability and completeness. However, they state that trade-offs must not obscure key limitations of a system.

Doshi-Velez and Kim address a critical question: how to measure whether explanations are meaningful for users or consumers. They present a framework for a science to measure the efficiency of explanations. This paper discusses factors that are required to begin testing interpretability of explainable systems. This highlights the gap between these principles as a concept and creating metrics and evaluation methods.

Across these viewpoints, there exist both commonalities and disagreement. Similar to our four principles, commonalities include concepts which distinguish between the existence of an explanation, how meaningful it is, and how accurate or complete it is.
Although disagreements remain, these perspectives provide guidance for development of explainable systems. A key disagreement between philosophies is the relative importance of explanation meaningfulness and accuracy. These disagreements highlight the difficulty in balancing multiple principles simultaneously. Context of the application, community and user requirements, and the specific task will drive the importance of each principle.

5. Overview of Explainable AI Algorithms

Researchers have developed different algorithms to explain AI systems. Sometimes, the algorithms themselves provide the explanation (satisfying Principle 1). The most common of these explanations are self-explainable models, where the models themselves are the provided explanation. These models are self-explaining algorithms, where viewing and querying the models provide an explanation. We describe these algorithms in Section 5.1. There are also algorithms that provide explanations for themselves without directly providing the model details. One such example is Class Activation Mappings (CAM), which are system-specific explanations that can explain some convolutional neural networks. However, researchers generalized these algorithms so that they can explain not only the original system but also other systems. These generalized algorithms form the next two types of explanations: global explainable AI algorithms and per-decision explainable AI algorithms. For instance, GRAD-CAM is a generalization of CAM that can provide the same kind of explanation as CAM for any convolutional neural network.

A global explanation produces a model that approximates the non-interpretable model. We describe these algorithms in Section 5.2. Per-decision explanations provide a separate explanation for each decision. Per-decision explanations are considered local explanations. We describe per-decision explanations in Section 5.3.
A particular type of per-decision explanation is a counterfactual, which is an explanation saying “if the input were this new input instead, the system would have made a different decision.” In these explanations, although there are often many widely differing instances that all are counterfactuals, a counterfactual explanation usually provides a single instance. This means that even if there are many different possible ways that the instance could be changed to result in the system making a different decision, only one of those instances is provided as the explanation. The hope is that the instance is as similar as possible to the input, with the exception that the system makes a different decision. Because counterfactual explanations are per-decision explanations, they are also described in Section 5.3.

Self-explainable models of machine learning systems themselves can be used as global explanations (since the models explain themselves). Likewise, many global explanations (including self-explainable models) can also be used to generate per-decision explanations. The coefficient weights of the features of an input in a regression model and the flow of a decision through a decision tree both serve as per-decision explanations. When a model does not provide an explanation, or provides an explanation that a user does not consider meaningful enough, users will sometimes seek an explanation from an alternate algorithm, thus encouraging the development of global and per-decision explanations. Furthermore, global explanations are harder to generate than per-decision explanations because per-decision explanations only require an understanding of a single decision.

With these explainable algorithms, developers wish for the explanations to be meaningful to users (Principle 2). In the computer science literature this is often labelled as interpretable.
Often, developers self-proclaim their algorithm explanations to be meaningful. However, others will use measurements such as human simulatability, which measures whether a human can take an input and, with the model, correctly identify the model’s prediction.

Although explanation accuracy is important (Principle 3), it is often only measured for self-explainable models. For these types of models, the model’s decision accuracy (see Section 2.3) is the measure of the explanation accuracy. However, there is limited research measuring explanation accuracy. Adebayo et al. evaluate explanation accuracy of saliency pixel explanations for deep neural networks by measuring the amount the explanation changes relative to how the trained models differ.

To our knowledge there is limited work on developing algorithms that understand their knowledge limits (Principle 4) and declare when a validly formatted data input is out of the system’s scope. However, algorithms often give real-valued outputs rather than hard decisions, which reflect the algorithms’ confidences in their predictions.

5.1 Self-Explainable Models

Self-explainable machine learning algorithms include decision trees and linear and logistic regression. Although these simple models are explanations themselves, they are often not accurate, especially if used without much pre-processing. Consequently, there has been work in developing more accurate models that themselves are explanations. Authors developing models will often label these models as interpretable, which we refer to as meaningful. Rudin argues that using meaningful models that explain themselves is the best way to produce explainable models, arguing that separately produced explanations of black-box models (or even single decisions of black-box models) may not be faithful to what the original model computes.
The claim is that explanations often have low explanation accuracy if those explanations are not the models themselves. Although many sources discuss an accuracy-interpretability trade-off, Rudin and Radin disagree, with the belief that no such trade-off exists for high-stakes decisions.

One line of research works on producing improvements on the standard decision trees, sometimes represented as a nested sequence of “if-then-else” rules, called decision lists. Lakkaraju et al. claim that, in addition to being inaccurate, decision lists have nesting that makes the rules hard to interpret, and they develop Decision Sets, which are a sequence of “if-then” rules with one default “else” at the end, where each clause is a conjunction of conditions. However, Lakkaraju and Rudin produce decision lists with improved accuracy. Lakkaraju et al. measure the interpretability of the decision sets by measuring metrics on the model: the number of rules, the number of the largest rules, and the overlap of the rules (how many instances are classified in more than one if-then rule). The last “else” guarantees that every instance is classified. Other work explores decision lists with at most one customized nesting to further improve accuracy while still being meaningful according to their measures. Bertsimas and Dunn produce a variant of decision trees, called optimal classification trees, that split on mixed integer constraints involving multiple variables. These trees focus on preserving the meaningfulness of decision trees but greatly improving their classification accuracy. Other authors produce another variant of a more accurate decision tree, called an additive tree, that combines elements of decision trees and gradient boosting to produce more accurate trees. A Bayesian variant of decision lists that was studied for meaningfulness is Bayesian Rule Lists, which add a Bayesian credible interval estimate to each decision rule.
Bayesian credible intervals are the Bayesian analog to confidence intervals. Kuhn et al. produce a model that tries to find combinations of features that either exclude a class or specifically identify a particular class. Each set of combinations could be viewed as a clause of a decision set rule.

Models, including linear models such as linear and logistic regression, are considered to be explanations of system decisions. One interpretation uses the weights of the coefficients to indicate the importance of features. These models are sometimes considered inaccurate when the data is not believed to be linear. One measure of the ease of understanding of a regression model is the number of non-zero coefficients. One way to encourage a regression model to limit the number of features is to regularize it with the lasso, which penalizes the model for using more features, incorporating a trade-off between accuracy and meaningfulness in the training objective function. Although this and other regularization strategies are also used to prevent overfitting in many models including deep neural networks, regularization is one technique to make models sparser, and thus believed to be more understandable. Poursabzi-Sangdeh et al. consider regression models meaningful and aim to measure the value the model coefficients provide to human users trying to use the model. Caruana et al. also treat the more general class of these models, Generalized Additive Models with Pairwise Interactions (GA2M), as understandable models and apply them to a healthcare case study.

Another self-explainable algorithm involves learning prototypes, or representative samples of each class, to better understand the algorithm. Models learn and produce prototypes. With these prototypes, the model outputs the class as a weighted combination of the prototypes. Although these prototypes do work on tabular data, Kim et al.
, Li et al. use this approach for classification on image data sets.

5.2 Global Explainable AI Algorithms

Global explainable AI algorithms are an approach that treats the AI algorithm as a black box that can be queried and produces a model that explains the algorithm. Depending on what the global model is, it can then be used to produce per-decision explanations.

One such global explainable AI algorithm is SHAP (SHapley Additive exPlanations). SHAP provides a global per-feature importance for a regression problem by converting it to a coalitional game from game theory. In coalitional games, there are n players that can team up in different ways to form coalitions and share a payoff depending on which players team up (often the total payoff is largest when all players team up). After players receive a payoff, they must divide the payoff between themselves. One way to divide payoffs with desirable mathematical properties is to give each player their Shapley value as their individual payoff. SHAP treats the regression outputs of a system as a coalitional game where the target is the payoff and each feature is a player that either participates in or does not participate in the coalition with the other features for each row. SHAP then computes the Shapley values for each feature, and uses those values as the feature importance values. See  for more information on Shapley values and coalitional games.

In deep neural networks, one such global algorithm is TCAV (Testing with Concept Activation Vectors). TCAV aims to explain a neural network in a more user-friendly way by representing the neural network state as a linear weighting of human-friendly concepts, called Concept Activation Vectors (CAVs). TCAV was applied to explain image classification algorithms through learning CAVs including color, to see how colors influenced the image classifier’s decisions.
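The Shapley value division described above for SHAP can be made concrete with a small sketch. It computes exact Shapley values for a three-feature coalitional game by averaging each player's marginal contribution over all orderings. The payoff function `toy_payoff` and the feature names are hypothetical, chosen only for illustration; this is not the SHAP library's implementation, which relies on approximations because exact enumeration grows factorially with the number of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, payoff):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += payoff(with_p) - payoff(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_payoff(coalition):
    """Hypothetical payoff: x1 and x2 act additively, x2 and x3 interact."""
    value = 0.0
    if "x1" in coalition:
        value += 10.0
    if "x2" in coalition:
        value += 20.0
    if "x2" in coalition and "x3" in coalition:
        value += 6.0
    return value


features = ["x1", "x2", "x3"]
values = shapley_values(features, toy_payoff)
for name in features:
    print(name, values[name])
```

In this toy game the interaction payoff is split evenly between the two interacting features, and the values sum to the payoff of the full coalition, which is one of the desirable properties mentioned above.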
Two visualizations used to provide global explanations are Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) [60, 104]. The partial dependence plot shows the marginal change of the predicted response when the feature (value of that specific data column or component) changes. PDPs are useful for determining if a relationship between a feature and the response is linear or more complex. The ICE curves are finer-grained and show the marginal effect of the change in one feature for each instance of the data. ICE curves are useful to check if the relationship visualized in the PDP is the same across all ICE curves, and can help identify potential interactions.

5.3 Per-Decision Explainable AI Algorithms

Per-decision explainable AI algorithms take both a black-box model that can be queried and a single decision of that model, and explain why the model made that particular decision. These explanations differ from global explanations in that the explanation is not required to generalize to other decisions.

One such algorithm is LIME (Local Interpretable Model-agnostic Explanations). LIME takes a decision, and by querying nearby points, builds an interpretable model that represents the local decision, and then uses that model to provide per-feature explanations. The default model chosen is logistic regression. For images, LIME breaks each image into superpixels, and then queries the model with a random search space where it varies which superpixels are omitted and replaced with all black (or a color of the user’s choice).

Another popular type of local explanation is the counterfactual. A counterfactual explanation is an alternate system input where the system’s decision on that input differs from the decision on the provided input.
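As a minimal sketch of this idea, the code below selects, from a set of candidate inputs, the one that flips a model's decision while staying closest to the original input. Closeness is measured as a Manhattan distance with each feature normalized by its median absolute deviation (MAD). The training rows, the `predict` rule, and the candidate list are hypothetical and exist only to make the example runnable; a practical method would search for candidates rather than take them as given.

```python
from statistics import median


def median_absolute_deviation(column):
    """MAD of one feature column (assumed non-zero for the data used here)."""
    center = median(column)
    return median(abs(value - center) for value in column)


def mad_normalized_l1(x, x_cf, mads):
    """Manhattan distance with each feature scaled by its MAD."""
    return sum(abs(a - b) / mad for a, b, mad in zip(x, x_cf, mads))


def nearest_counterfactual(x, candidates, predict, mads):
    """Closest candidate whose decision differs from the decision on x."""
    original_decision = predict(x)
    flipped = [c for c in candidates if predict(c) != original_decision]
    if not flipped:
        return None
    return min(flipped, key=lambda c: mad_normalized_l1(x, c, mads))


# Hypothetical data: (income, debt) rows, used only to estimate each MAD.
training_rows = [(30, 10), (40, 20), (50, 5), (60, 25), (70, 15)]
mads = [median_absolute_deviation(col) for col in zip(*training_rows)]


def predict(row):
    """Hypothetical decision rule standing in for a black-box model."""
    income, debt = row
    return "approve" if income - debt >= 30 else "deny"


x = (40, 20)  # the input whose decision we want to explain
candidates = [(42, 20), (50, 20), (40, 10), (45, 15)]
best = nearest_counterfactual(x, candidates, predict, mads)
print(best, mad_normalized_l1(x, best, mads))
```

Normalizing by the MAD keeps a feature with a wide natural spread from dominating the distance, so changes are compared on a per-feature scale.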
Good counterfactuals answer the question “what is the minimum amount an input would need to change for the system to change its decision on that input?” Wachter et al. measure how good counterfactual explanations are by measuring how far away the counterfactual is from the original data point, measuring this distance as the Manhattan distance of features after normalizing each feature by its median absolute deviation. Ustun et al. develop a counterfactual explanation of logistic (or linear) regression models. Counterfactuals are represented as the amounts of specific features to change. They further refine their counterfactual explanations by distinguishing which features can be changed, which ones cannot, and which ones can only be changed under certain conditions.

An additional local explanation in Koh and Liang takes a decision and produces an estimate of the influence of each training data point on that particular decision.

Another popular type of local explanation for problems on image data is saliency pixels. Saliency pixels color each pixel depending on how much that pixel contributes to the classification decision. One of the first saliency algorithms is Class Activation Maps (CAM). A popular saliency pixel algorithm that enhanced CAM is GRAD-CAM. GRAD-CAM generalized CAM so that it can explain any convolutional network. A variety of saliency pixel explanation algorithms are compared for their explanation accuracy in Adebayo et al.

5.4 Adversarial Attacks on Explainability

Explanation accuracy (Principle 3) is an important component of explanations. Sometimes, if an explanation does not have 100 percent explanation accuracy, it can be exploited by adversaries who manipulate a classifier’s output on small perturbations of an input to hide the biases of a system.
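The kind of manipulation just described can be illustrated with a deliberately simplified sketch. A wrapper routes inputs that look like real data to a biased model and routes everything else, such as the perturbed points an explainer generates, to an innocuous model. All three functions and the data rows are hypothetical toy stand-ins; in particular, the detector here is a trivial rule, whereas a realistic attack would have to learn to distinguish real from perturbed inputs.

```python
def make_scaffold(biased_model, innocuous_model, looks_like_real_data):
    """Wrap two models so that only real-looking inputs see the biased one."""
    def scaffolded(x):
        if looks_like_real_data(x):
            return biased_model(x)
        return innocuous_model(x)
    return scaffolded


def biased_model(x):
    """Toy model that decides using only the sensitive feature x[0]."""
    return 1 if x[0] == 1 else 0


def innocuous_model(x):
    """Toy model that ignores the sensitive feature and uses x[1]."""
    return 1 if x[1] >= 0.5 else 0


def looks_like_real_data(x):
    """Toy detector: in this example, real rows have a binary x[0]."""
    return x[0] in (0, 1)


model = make_scaffold(biased_model, innocuous_model, looks_like_real_data)

# On real rows the scaffold reproduces the biased decisions exactly.
real_rows = [(1, 0.2), (0, 0.9), (1, 0.7), (0, 0.1)]
on_data = [model(row) for row in real_rows]

# A perturbation-based probe sees no effect from the sensitive feature...
same_when_sensitive_changes = model((0.1, 0.2)) == model((0.9, 0.2))
# ...but does see an effect from the innocuous feature.
differs_when_other_changes = model((0.9, 0.2)) != model((0.9, 0.8))
print(on_data, same_when_sensitive_changes, differs_when_other_changes)
```

An explainer that only queries perturbed points would attribute the decisions to the second feature, even though every decision on the real rows is driven by the first.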
First, Lakkaraju and Bastani observe that even if an explanation can mimic the predictions of the black box, this is insufficient for explanation accuracy, and such systems can produce explanations that may mislead users. An approach to generate misleading explanations is demonstrated in Slack et al. They do this by producing a scaffolding around a given classifier that matches the classification on all input data instances but changes outputs for small perturbations of input points, which can obfuscate global system behavior when only queried locally. This means that if the system is anticipating being explained by a tool such as LIME that gives similar instances to training set instances as inputs, the system will develop an alternative protocol to decide those instances that differs from how it will classify trials in the training and test sets. This can mislead the explainer by anticipating which trials the system might be asked to classify. Another similar approach is demonstrated in Aivodji et al. They fairwash a model by taking a black-box model and producing an ensemble of interpretable models that approximate the original model but are much fairer, which then hide the unfairness of the original model. Another example of slightly perturbing a model to manipulate explanations is demonstrated in Dimanov et al. The ability for developers to cover up unfairness in black-box models is one of the several vulnerabilities of explainable AI discussed in Hall et al.

6. Humans as a Comparison Group for Explainable AI

Up to this point, we have outlined core concepts of explainable AI and related work in the field of computer science. However, an explainable AI system consists of both an AI system and a human recipient.
To effectively understand both components, and to provide a benchmark for explainable AI systems, we next overview the explainability of human-produced judgments and decisions. Independent of AI, humans operating alone also make high stakes decisions with the expectation that they be explainable. For example, physicians, judges, lawyers, and forensic scientists make decisions that can affect large populations. In these cases, a human makes the decision and provides their conclusion along with the evidence supporting that conclusion as an explanation. How do these proffered explanations adhere to our four principles? We focused strictly on human explanations of their own judgments and decisions (e.g., “why did you arrive at this conclusion or choice?”), not of external events (e.g., “why is the sky blue?” or “why did an event occur?”). External events accompanied by explanations can be helpful for human reasoning and formulating predictions. This is consistent with a desire for explainable AI. However, as outlined in what follows, human-produced explanations for their own judgments, decisions, and conclusions are largely unreliable. Humans as a comparison group for explainable AI can inform the development of benchmark metrics for explainable AI systems and lead to a better understanding of the dynamics of human-machine collaboration.

6.1 Explanation

This principle requires only that the system provides an explanation. In this section, we will focus on whether humans produce explanations of their own judgments and decisions and whether doing so is beneficial for the decision makers themselves. In Section 6.2, we will discuss whether human explanations are meaningful, and in Section 6.3, we will discuss the accuracy of those explanations.

Humans are able to produce a variety of explanation types [37, 53, 58].
However, producing verbal explanations can interfere with decision and reasoning processes [80, 81, 100]. It is thought that as one gains expertise, the underlying processes become more automatic, outside of conscious awareness, and therefore, more difficult to explain verbally [17, 19, 44, 80]. This produces a tension similar to the one that exists for AI itself: the desire for high accuracy is often thought to come with reductions in explainability (however, cf. ).

More generally, processes which occur with limited conscious awareness can be harmed by requiring the decision itself to be expressed explicitly. An example of this comes from lie detection. Lie detection based on explicitly judging whether or not a person is telling the truth or a lie is typically inaccurate [9, 88]. However, when judgments are provided via implicit categorization tasks, therefore bypassing an explicit judgment, lie detection accuracy can be improved [87, 88]. This suggests that lie detection may be a nonconscious process which is interrupted when forced to be made a conscious one.

Together these findings suggest that some assessments from humans may be more accurate when left automatic and implicit, compared to requiring an explicit judgment or explanation. Human judgments and decision making can oftentimes operate as a black box, and interfering with this black-box process can be deleterious to the accuracy of a decision.

6.2 Meaningful

To meet this principle, the system provides explanations that are intelligible and understandable. For this, we focused on the ability of humans to interpret how another human arrived at a conclusion. This concept can be defined operationally as: 1) whether the audience reaches the same conclusion as intended by the person providing the explanation and 2) whether audience members agree with each other on what the conclusion is, based on an explanation.
One analogous case to explainable AI for human-to-human interaction is that of a forensic scientist explaining forensic evidence to laypeople (e.g., members of a jury). Currently, there is a gap between the ways forensic scientists report results and the understanding of those results by laypeople (see Edmond et al. and Jackson et al. for reviews). Jackson et al. extensively studied the types of evidence presented to juries and the ability of juries to understand that evidence. They found that most types of explanations from forensic scientists are misleading or prone to confusion. Therefore, they do not meet our internal criteria for being “meaningful.” A challenge for the field is learning how to improve explanations, and the proposed solutions do not always have consistent outcomes.

Complications for producing meaningful explanations for others include people expecting different explanation types, depending on the question at hand, context driving the formation of opinions, and individual differences in what is considered to be a satisfactory explanation. Therefore, what is considered meaningful varies by context and across people.
6.3 Explanation Accuracy

This principle states that a system provides explanations which are faithful to the system’s process for generating the output. For humans, this is analogous to an explanation of one’s decision processes truly reflecting the mental processes behind that decision. In this section, we focused on this aspect only. An evaluation of the quality or coherence of the explanation falls outside of the scope of this principle.

For the type of introspection related to explanation accuracy, it is well-documented that although people often report their reasoning for decisions, this does not reliably reflect accurate or meaningful introspection [62, 70, 99]. This has been coined the “introspection illusion”: a term to indicate that information gained by looking inward to one’s mental contents is based on mistaken notions that doing so has value. People fabricate reasons for their decisions, even those thought to be immutable, such as personally held opinions [24, 34, 99]. In fact, people’s conscious reasoning that is able to be verbalized does not seem to always occur before the expressed decision. Instead, evidence suggests that people make their decision and then apply reasons for those decisions after the fact. From a neuroscience perspective, neural markers of a decision can occur up to 10 seconds before a person’s conscious awareness. This finding suggests that decision making processes begin long before our conscious awareness.

People are largely unaware of their inability to introspect accurately. This is documented through studies of “choice blindness” in which people do not accurately recall their prior decisions. Despite this inaccurate recollection, participants will provide reasons for making selections they never, in fact, made [24, 25, 34].
For studies that do not involve long-term memory, participants have also been shown to be unaware of the ways they evaluate perceptual judgments. For example, people are inaccurate when reporting which facial features they use to determine someone’s identity [75, 93].

Based on our definition of explanation accuracy, these findings do not support the idea that humans reliably meet this criterion. As is the case with algorithms, human decision accuracy and explanation accuracy are distinct. For numerous tasks, humans can be highly accurate but cannot verbalize their decision process.

6.4 Knowledge Limits

This principle states that the system only operates under the conditions it was designed for, or indicates that a provided output may not be reliable. For this principle, we narrowed down the broad field of metacognition, or thinking about one’s own thinking. Here, we focused on whether humans correctly assess their own ability and accuracy, and whether they know when to report that they do not know an answer. There are several ways to test whether people can evaluate their own knowledge limits. One method is to ask participants to predict how well they believe they performed or will perform on a task, relative to others (e.g., in what percentile will their scores fall relative to other task-takers). Another way to test the awareness of knowledge limits is to obtain a measure of their response confidence, with higher confidence indicating that a person believes with greater likelihood that they are correct.

As demonstrated by the well known Dunning-Kruger Effect, most people inaccurately estimate their own ability relative to others. A similar finding is that people, including experts, generally do not predict their own accuracy and ability well when asked to explicitly estimate performance [7, 8, 12, 28, 63].
However, a recent replication of the Dunning-Kruger Effect for face perception showed that, although people did not reliably predict their accuracy, their ability estimates varied accordingly with the task difficulty. This suggests that although the exact value (e.g., predicted performance percentile relative to others, or predicted percent correct) may be erroneous, people can modulate the direction of their predicted performance appropriately (e.g., knowing a task was more or less difficult for them).

For a variety of judgments and decisions, people often know when they have made errors, even in the absence of feedback. To use eyewitness testimony as a relevant example: although confidence and accuracy have repeatedly been shown to be weakly related, a person’s confidence does predict their accuracy in the absence of “contamination” through the interrogation process and extended time between the event and the time of recollection. Therefore, human shortcomings in assessing their knowledge limits are similar to those of producing explanations themselves. When people are asked explicitly to produce an explanation, doing so can interfere with more automatic processes gained by expertise, and the explanations often do not accurately reflect the true cognitive processes. Likewise, as outlined in this section, when people are asked to explicitly predict or estimate their ability level relative to others, they are often inaccurate. However, when asked to assess their confidence for a given decision rather than to make this explicit judgment, people can gauge their accuracy at levels above chance. This suggests people do have insight into their own knowledge limits, although this insight can be limited or weak in some cases.

7. Discussion and Conclusions

We introduced four principles to encapsulate the fundamental elements for explainable AI systems.
The principles provide a framework with which to address different components of an explainable system. These four principles are that the system produce an explanation, that the explanation be meaningful to humans, that the explanation reflect the system’s processes accurately, and that the system express its knowledge limits. There are different approaches and philosophies for developing and evaluating explainable AI. Computer science approaches tackle the problem of explainable AI with a variety of computational and graphical techniques and perspectives, which may lead to promising breakthroughs. A blossoming field puts humans at the forefront when considering the effectiveness of AI explanations and the human factors behind their effectiveness. Our four principles provide a multidisciplinary framework with which to explore this type of human-machine interaction.

The practical needs of the system will influence how these principles are addressed (or dismissed). With these needs in mind, the community will ultimately adapt and apply the four principles to capture a wide scope of applications. One example of adapting to meet practical requirements is illustrated by the trade-off between explanation detail and time constraints. These constraints highlight that certain scenarios require a brief, meaningful explanation to take priority over an accurate, detailed explanation. For example, emergency weather alerts need to be meaningful to the public but can lack an accurate explanation of how the system arrived at its conclusion. Other scenarios may require more detailed explanations but restrict meaningfulness to a specific user group; e.g., when auditing a model.

The focus of explainable AI has been to advance the capability of the systems to produce a quality explanation. Here, we addressed whether humans can meet the same set of principles we set forth for AI.
We showed that humans demonstrate only limited ability to meet the principles outlined here. This provides a benchmark with which to compare AI systems. In reflection of societal expectations, recent regulations have imposed a degree of accountability on AI systems via the requirement for explainable AI. As advances are made in explainable AI, we may find that certain parts of AI systems are better able to meet societal expectations and goals compared to humans. Understanding the explainability of both the AI system and the human in the human-machine collaboration opens the door to pursue implementations which incorporate the strengths of each, potentially improving explainability beyond the capability of either the human or AI system in isolation.

In this paper, we focused on a limited set of human factors related to explainable decisions. Much is to be learned and studied regarding the interaction between humans and explainable machines. Although beyond the scope of the current paper, in considering the interface between AI and humans, understanding general principles that drive human reasoning and decision making may prove to be highly informative for the field of explainable AI. For humans, there are general tendencies for preferring simpler and more general explanations. However, as described earlier, there are individual differences in which explanations are considered high quality. The context of the decision and the type of decision being made can influence this as well. Humans do not make decisions in isolation from other factors. Without conscious awareness, people incorporate irrelevant information into a variety of decisions such as first impressions, personality trait judgments, and jury decisions [21, 29, 90, 91].
Even when provided identical information, the context, a person’s biases, and the way in which information is presented influence decisions [4, 15, 17, 23, 36, 43, 68, 94]. Considering these human factors within the context of explainable AI has only just begun.

To succeed in explainable AI, the community needs to study the interface between humans and AI systems. Human-machine collaborations have been shown to be highly effective in terms of accuracy. There may be similar breakthroughs for AI explainability in human-machine collaborations. The principles defined here provide guidance and a philosophy for driving explainable AI toward a safer world by giving users a deeper understanding of a system’s output. Meaningful and accurate explanations empower users to apply this information to adapt their behavior and/or appeal decisions. For developers and auditors, explanations equip them with the ability to improve, maintain, and deploy systems as appropriate. Explainable AI contributes to the safe operation and trust of multiple facets of complex AI systems. The common framework and definitions under the four principles facilitate the evolution of explainable AI methods necessary for complex, real-world systems.

Acknowledgments

The authors thank Kristen Greene, Reva Schwartz, Brian Stanton, Amy Yates, and Jesse Zhang for their insightful comments and discussions.