1 Introduction

This paper presents a stress test of experimenter demand effects, which refer to changes in behaviour by experimental subjects due to cues about what constitutes appropriate behaviour (Zizzo 2010). The idea behind experimenter demand effects is that subjects try to make sense of an unfamiliar experimental environment in order to decide on an appropriate response, and in doing so they may be particularly sensitive to whatever cues that environment provides. Our stress test is of particular relevance to the interpretation of results in settings that are characterized by variable surplus, such as public good games, trust games or bargaining games. That is, how the overall amount is split across the subjects depends on the subjects’ own actions.

We run an experiment where subjects can physically destroy money-equivalent coupons awarded to them (Footnote 1) as well as, in a different task, explicitly return money to the experimenter. Subjects affected by experimenter demand effects will treat the two tasks as equivalent: given the stylized nature of the tasks and the absence of any simple alternative schema to make sense of them, there is an expectation that subjects should physically destroy or return some of the assets given to them. The distinctive feature of the destruction task is that the coupons’ destruction could not directly benefit the experimenter. This is in contrast to the cash return task or, indeed, any experiment where a money transfer to the budget of the experimenter implicitly takes place if experimental surplus is destroyed or not obtained. In the cash return task, unlike the destruction task, money can be transferred to the experimenter, so altruism towards the experimenter may affect behaviour and provide an alternative explanation to experimenter demand effects.

We are aware of two papers that have tested the possibility of altruism towards the experimenter. In an insightful contribution, Frank (1998) did so in the context of ultimatum games: in the experimental treatment, if the proposer’s offer was rejected, the money-equivalent currency (stamps) was physically burned. Receivers were found not to behave differently in this treatment relative to a control where the money implicitly went back to the experimenter. However, the strategic nature of ultimatum games may have meant that anger at perceived unfairness reduced the relevance of altruism towards the experimenter in this context; furthermore, this experiment did not control for the possibility that some subjects would gain utility from seeing stamps physically burned in the laboratory. Harrison and Johnson (2006) considered the effect of changing the recipients in standard dictator games, and found that giving to the experimenter was intermediate between giving to a charity and giving to another subject (Footnote 2). This could be interpreted as evidence of altruism towards the experimenter, but it is equally possible that returning money to the experimenter, through the implied transfer of unexploited experimental surplus, is itself due to experimenter demand, i.e. to the demands of the experimental decision environment, rather than to altruism towards the experimenter (Footnote 3).

We use an extremely simple setup with minimal strategic concerns or potential for misunderstanding to identify experimenter demand effects against alternative explanations. By providing an equivalent physical task to all subjects regardless of whether coupons are destroyed, we control for the pleasure of physical activity. Our experiment also controls for the potential benefits of coupon destruction for the coupons provider and for the clarity of the instructions. We are then able to separate out experimenter demand effects from altruism towards the experimenter as possible explanations of behaviour. If subjects are driven by altruism towards the experimenter in deciding what to do in experiments with variable surplus tasks, we would expect subjects to return money to the experimenter but not to destroy the coupons as this could not benefit the experimenter. Conversely, if experimenter demand effects drive both physical coupon destruction and returning money to the experimenter, both should be positive and should be positively correlated to each other.

We have a social information treatment manipulation where we provide summary information about how subjects behaved in a pilot. We also have a partial, albeit imperfect, measure of sensitivity to social pressure by using the Stöber (2001) social desirability scale (Footnote 4). Intuitively, under social information the experimental norm (what is expected of subjects in the experimental environment) should be clearer and, therefore, subjects who are more responsive to social pressure should destroy more (conversely, subjects who are resistant to social pressure may destroy less). This provides a further stress test of experimenter demand effects (Footnote 5).

Section 2 presents the experimental design, Section 3 presents the results and Section 4 briefly concludes. The experimental instructions are available in the appendix.

2 Experimental design

2.1 Design

Sixty-four participants completed the social desirability measure online. About one week later, participants were invited back to an experimental laboratory session. Participants completed two tasks in counterbalanced order, followed by debriefing questions. Questions to check understanding were given ahead of each task, and clarifications were given to subjects who answered any question incorrectly.

The two tasks were a (coupons) destruction task and a (cash) return task. In the destruction task participants were given six 50 pence paper coupons, worth \({\pounds }3\) in total, and six paper blanks. The coupons were redeemable at university cafés (Footnote 6). The participants were told ‘you need to decide how many vouchers to destroy by shredding them’. For each coupon not destroyed a paper blank was destroyed instead, using the same shredder, so that the same physical activity took place for all participants, thus controlling for any pleasure from the physical activity (Footnote 7). Participants were discreetly observed during this task. Participants kept any non-destroyed coupons for use after the experiment. In the return task participants were given six 50 pence pieces, worth \({\pounds }3\) in total. The participants were told ‘you need to decide how much cash to return to the experimenters’ and were provided with an envelope into which cash could be returned. Any remaining cash and coupons could be kept as participant payment (Footnote 8).
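The way the destruction task holds physical activity constant can be sketched in a few lines of Python. This is only an illustration of the design described above: the function name and the returned fields are hypothetical, and amounts are expressed in pence for convenience.

```python
COUPON_VALUE_PENCE = 50  # face value of each coupon, as stated in the design
N_COUPONS = 6            # each participant receives six coupons and six blanks


def destruction_task(coupons_destroyed: int) -> dict:
    """Summarise one participant's destruction task (hypothetical helper).

    For each coupon not destroyed, a paper blank is shredded instead,
    so the number of sheets shredded does not depend on the choice.
    """
    if not 0 <= coupons_destroyed <= N_COUPONS:
        raise ValueError("coupons_destroyed must be between 0 and 6")
    blanks_shredded = N_COUPONS - coupons_destroyed
    return {
        "blanks_shredded": blanks_shredded,
        "sheets_shredded": coupons_destroyed + blanks_shredded,
        "value_destroyed_pence": coupons_destroyed * COUPON_VALUE_PENCE,
        "value_kept_pence": blanks_shredded * COUPON_VALUE_PENCE,
    }


# Every possible choice involves shredding exactly six sheets.
for k in range(N_COUPONS + 1):
    print(k, destruction_task(k))
```

Because `sheets_shredded` is constant across choices, any pleasure from shredding cannot by itself explain differences in the number of coupons destroyed.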

There were two experimental treatments. In a NoInfo treatment (\(n=31\)), there were no further instructions. In the SocialInfo treatment (\(n=33\)), subjects were told, truthfully (as we relied on pilot data), the following social information: ‘In test sessions... some people destroyed vouchers/returned cash - of those that did, on average (rounded to the nearest 50p) they destroyed/returned \({\pounds }1\) of vouchers/cash’.
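The social information statement is a conditional average: the mean amount among those who destroyed (or returned) something, rounded to the nearest 50p. A minimal sketch of that calculation follows. The pilot amounts below are invented for illustration and are not the pilot data, the function name is hypothetical, and rounding halves upwards is an assumption, since the text does not say how ties were handled.

```python
import math

STEP_PENCE = 50  # amounts move in 50p steps in both tasks


def conditional_mean_rounded(amounts_pence):
    """Mean of the strictly positive amounts, rounded to the nearest 50p.

    Returns None when nobody destroyed or returned anything.
    Halves are rounded upwards (an assumption made for this sketch).
    """
    positive = [a for a in amounts_pence if a > 0]
    if not positive:
        return None
    mean = sum(positive) / len(positive)
    return math.floor(mean / STEP_PENCE + 0.5) * STEP_PENCE


# Hypothetical pilot amounts in pence (NOT the paper's pilot data).
hypothetical_pilot = [0, 0, 50, 100, 150, 0]
print(conditional_mean_rounded(hypothetical_pilot))  # 100, i.e. one pound
```

Conditioning on positive amounts matters: the unconditional mean of the same hypothetical list is 50p, which would communicate a different norm.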

After the decision tasks participants were asked about their perceived altruism in voucher destruction, ‘Do you think destroying vouchers would be beneficial to [campus cafés]?’ on a Likert scale from 1 not at all beneficial to 7 extremely beneficial. As a further check, subjects were also asked about how clear they had found the instructions for each of the two tasks, again on a 1–7 Likert scale.

3 Experimental results

Figure 1 presents histograms of the amounts destroyed and returned. 30 % of participants in the NoInfo treatment and 35 % of participants in the SocialInfo treatment destroyed some coupons. In the NoInfo treatment, 6.6 % of endowments were returned and 11.1 % destroyed, vs. 9.7 % and 11.8 %, respectively, in the SocialInfo treatment. The amounts destroyed and returned are comparable (Wilcoxon \(p=0.392\)) (Footnote 9), implying that altruism towards the experimenter (as separate from experimenter demand) is not the primary driver of behaviour, for it cannot explain the destruction activity. While there was more destruction of coupons and more returning of cash under SocialInfo, neither difference achieves statistical significance (Mann–Whitney \(p=0.753\) and \(p=0.258\) for destruction and returning, respectively).
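The between-treatment comparisons rely on the Mann–Whitney test, which compares the ranks of two independent samples. As a rough sketch of what the test statistic measures, the snippet below computes the Mann–Whitney U statistics with midranks for ties, which matter here because amounts are multiples of 50p and many are zero. The text does not say which software was used, the function names are hypothetical, the data are invented, and no p-value is computed.

```python
def midranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical amounts destroyed in pence (NOT the experimental data).
no_info = [0, 0, 50, 100]
social_info = [0, 50, 150, 300]
print(mann_whitney_u(no_info, social_info))
```

The two statistics always sum to the product of the sample sizes, so values close to half that product indicate that neither treatment tends to produce larger amounts than the other.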

Fig. 1 Histograms of destruction and return choices in the NoInfo and SocialInfo treatments. a Destruction task. b Return task. Percentages are rounded to the second decimal place

Table 1 presents a regression analysis on the amount destroyed. As destruction can only take place in blocks of 50 pence (the value of each coupon), the amount destroyed can take 7 levels, from 0 coupons destroyed to 6 coupons destroyed, and so an ordered probit model is appropriate; Tobit estimates are also reported as robustness checks. As independent variables we have CashReturned (the amount returned in the return task) in two of the regressions, SocialInfo (\(=1\) in SocialInfo, else 0), SocDes (the Stöber measure of social desirability) (Footnote 10), SocDesxSocialInfo (an interaction term), Beneficial (stated extent to which the coupons destruction is seen as beneficial towards the university café provider), ClearIDestruction (stated clarity of the destruction task instructions) and ClearIReturn (stated clarity of the return task instructions). There is a clear and strong positive correlation between amount returned and amount destroyed, which is consistent with an experimenter demand effect explanation rather than one based on altruism towards the experimenter. There is also a positive coefficient on Beneficial, which is to be expected; however, the clarity of the instructions on destruction seems, if anything, to increase rather than decrease destruction (Footnote 11), suggesting that confusion was not a problem in our experiment.
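For readers who want the model written out, a standard latent-variable formulation of an ordered probit with these regressors is sketched below. The specification is not spelled out in the text, so the symbols \(y_i^*\), \(\beta_j\), \(\mu_k\) and \(\varepsilon_i\) are assumptions introduced here for illustration; \(y_i\) denotes the number of coupons destroyed by subject \(i\), and the CashReturned term appears only in the regressions that include it.

\[
y_i^* = \beta_1\,\mathrm{CashReturned}_i + \beta_2\,\mathrm{SocialInfo}_i + \beta_3\,\mathrm{SocDes}_i + \beta_4\,(\mathrm{SocDes}_i \times \mathrm{SocialInfo}_i) + \beta_5\,\mathrm{Beneficial}_i + \beta_6\,\mathrm{ClearIDestruction}_i + \beta_7\,\mathrm{ClearIReturn}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0,1)
\]

\[
y_i = k \quad \text{if} \quad \mu_k < y_i^* \le \mu_{k+1}, \qquad k = 0, 1, \ldots, 6, \qquad \mu_0 = -\infty, \quad \mu_7 = +\infty
\]

Under this formulation the seven destruction levels are treated as ordered categories separated by the estimated cut points \(\mu_1, \ldots, \mu_6\), rather than as a continuous amount.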

Table 1 Regressions on amount destroyed

The interaction term SDS17xSocialInfo (the Stöber social desirability score, SDS17, interacted with SocialInfo) has a statistically significant positive coefficient. Subjects with a higher value of SDS17 are more sensitive to social pressure, both vertical (i.e. experimenter demand) and horizontal (i.e. peer pressure). Social information provides greater clarity on the social demands of the situation, and subjects who are more sensitive to social pressure then destroy more. However, as there is no average effect of social information, this clarity would also seem to yield less destruction from subjects who are less sensitive to social pressure.
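The reasoning in this paragraph can be made explicit with a short derivation. As an illustrative assumption (the symbols are not taken from the paper), write \(s_i\) for subject \(i\)'s social desirability score, \(D_i\) for the SocialInfo dummy, and keep only the terms of the latent destruction index \(y_i^*\) that involve either of them:

\[
y_i^* = \cdots + \beta_D D_i + \beta_s s_i + \beta_{sD}\, s_i D_i + \cdots
\]

The effect of providing social information on the latent index is then

\[
y_i^*\big|_{D_i=1} - y_i^*\big|_{D_i=0} = \beta_D + \beta_{sD}\, s_i .
\]

With \(\beta_{sD} > 0\), this effect is increasing in \(s_i\): it is positive for subjects with \(s_i > -\beta_D/\beta_{sD}\) and negative for those below that threshold. If the effect averaged over subjects is close to zero, the threshold lies near the mean score, so social information raises destruction for subjects with above-average sensitivity to social pressure and lowers it for the rest.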

4 Conclusions

We considered a stress test of experimenter demand effects. Altruism towards the experimenter is unable to explain the key finding of destruction of coupons by roughly one subject out of three, the same as the fraction of subjects who returned money to the experimenter. While the latter can be explained by altruism, the former cannot; the strong positive correlation between coupons destroyed and money returned to the experimenter is also left unexplained by an explanation based on altruism towards the experimenter. Confusion is unlikely to be behind this as the tasks were simple, subjects’ understanding was checked, and variables for clarity of the instructions on the destruction task were either not statistically significant or positively rather than negatively correlated with destruction. Subjects valued the coupons (see footnote 9). Our analysis also controlled for the pleasure of physical destruction, by ensuring that the same physical destruction activity took place regardless of how many coupons were destroyed, and for the possibility of benefiting the café coupon providers, by including a relevant question in our regression analysis.

Social information provided clarity on the experimental norm and, where present, our measure of sensitivity to social pressure predicted the amount of destructive activity, which is consistent with experimenter demand effects. The heterogeneity in degrees to which subjects are sensitive to social pressure is still largely a neglected variable in economic research, and it should not be.