Social network analysis methods have made it possible to test whether novel behaviors in animals spread through individual or social learning. To date, however, social network analysis of wild populations has been limited to static models that cannot precisely reflect the dynamics of learning, for instance, the impact of multiple observations across time. Here, we present a novel dynamic version of network analysis that is capable of capturing temporal aspects of acquisition—that is, how successive observations by an individual influence its acquisition of the novel behavior. We apply this model to studying the spread of two novel tool-use variants, “moss-sponging” and “leaf-sponge re-use,” in the Sonso chimpanzee community of Budongo Forest, Uganda. Chimpanzees are widely considered the most “cultural” of all animal species, with 39 behaviors suspected as socially acquired, most of them in the domain of tool-use. The cultural hypothesis is supported by experimental data from captive chimpanzees and a range of observational data. However, for wild groups, there is still no direct experimental evidence for social learning, nor has there been any direct observation of social diffusion of behavioral innovations. Here, we tested both a static and a dynamic network model and found strong evidence that diffusion patterns of moss-sponging, but not leaf-sponge re-use, were significantly better explained by social than individual learning. The most conservative estimate of social transmission accounted for 85% of observed events, with an estimated 15-fold increase in learning rate for each time a novice observed an informed individual moss-sponging. We conclude that group-specific behavioral variants in wild chimpanzees can be socially learned, adding to the evidence that this prerequisite for culture originated in a common ancestor of great apes and humans, long before the advent of modern humans.
Permission to conduct this research was given by the Uganda Wildlife Authority (UWA), the Ugandan National Council for Science and Technology (UNCST), and the National Forestry Authority (NFA). The Budongo Conservation Field Station was established in 1990 in the Budongo Forest Reserve, which lies in the western Rift Valley in Uganda (1°35′–1°55′ N, 31°18′–31°42′ E) at a mean altitude of 1,050 m. The 793 km² Reserve includes 482 km² of continuous medium-altitude semideciduous forest cover. The Sonso community has been under continuous observation since the early 1990s, with all members individually known and habituated to human observers for about 20 y [39]. During data collection in November 2011, the Sonso study community of chimpanzees consisted of 68 named individuals. Following Reynolds [39], we defined age groups as infants (0–4 y), juveniles (5–9 y), subadults (m, 10–15 y; f, 10–14 y), and adults (m, 16+ y; f, 15+ y). Using these categories, the group composition was 30 adults (10 males and 20 females), 15 subadults (4 males and 11 females), 13 juveniles (4 males and 9 females), and 10 infants (3 males and 7 females). Data were collected on November 14–19, 2011, between 7 a.m. and 5 p.m., at a socially contested waterhole between the roots of two trees (Cynometra alexandrii and Mimusops bagshawei) located in an area of recently flooded swamp forest approximately 5 m from a seasonal river (Figure S1). The hole contained high mineral levels compared with other nearby water sources, such as the river (Na, K, Ca, Mn, Cl) (Reynolds V, Lloyd AW, English CJ, Lyons P, Dodd H, et al., Budongo Forest chimpanzees’ sodium resources: New adaptations, unpublished manuscript). All observed cases of leaf-tool fabrication and use were recorded using a hand-held high-definition camcorder (Panasonic HD60) [73].
Although leaf-sponging was focused on the waterhole, there were a number of additional stagnant puddles within a 3-m radius where individuals used LS tools and drank directly (Figure S2). Leaves used to manufacture sponges were identified as Lasiodiscus mildbraedii, Lychnodiscus cerospermus, and Agromolera subspecies. Mosses were collected in the waterhole area when chimpanzees were absent. Species were identified as Pilotrichella cuspidate, Racopilum africanum (Mitt), and Pinnatella minuta (Mitt). Additionally, two liverwort species, Plagiochila strictifolia (Steph) and Plagiochila pinniflora (Steph), were identified. These primitive plants looked similar to flattened mosses and may have been part of the moss-sponges. Following Whiten et al. [11], LS is “wad of leaves/vegetation chewed and used to collect water, then squeezed in mouth.” Moss-sponge, following Lanjouw [40], is defined as follows: “chimpanzees collected moss off the bark of the trees, loosely rolled it into a bundle, generally not bigger than a few centimeters wide.” Moss-sponge was inserted into the mouth at least once before sponging. In both previous cases, the sponges appeared exclusively composed of moss despite leaves being freely available. In Sonso, moss may be combined, but not necessarily, with leaves in the initial fabrication or added to an existing LS (Videos S1 and S2). Fabrication is the removal/collection of leaves or moss and fabrication of sponge in mouth, but sponge is not subsequently dipped into water, for example, as access to the sponging location is blocked by another individual. Use is defined as dipping of sponge into water and insertion at least once into mouth to suck the water. Re-use (type 1 and 2) is defined as follows: We coded as re-use type 1 (Video S3) the recovery of a used sponge that had been fabricated by another individual (or possibly by the same individual on a previous visit to the sponging location) and discarded. 
We distinguished this from re-use 2, a commonly observed behavior in which infants beg or scrounge for sponges made by their mother or older maternal siblings, as this is done while the older relative is using the sponge, as opposed to after they have discarded it (Video S4). In Sonso, RU2 appears limited to immature individuals and has never been recorded in mature individuals. Similarly, in West African chimpanzees (P. t. verus), both RU1 and RU2 are observed, but the behavior is only displayed by infants and juveniles [38]. Drinking is defined as drinking directly with the mouth from the water source. Video files were uploaded to an Apple MacBook Pro using iMovie and edited into discrete clips for analysis. We coded the following variables for all occurrences of leaf-tool fabrication, (re-)use, and direct drinking: date, individual identity, party composition, specific audience (individuals within 1 m), fabrication of sponge (removal of material and fabrication of sponge in mouth, collection of discarded sponge from the ground), use of sponge for drinking (sponge dipped in water and back to mouth at least once), sponge material (leaf or moss), and location (sponging-hole or puddle). Individuals within 1 m of the model while the model was fabricating the sponge, but excluding individuals with either their head turned fully away or with their view obstructed by the environment (for example, sitting behind a tree-buttress or with their head inside the waterhole), were considered to be “potential observers.” A second more restrictive definition was also applied for the “specific audience” in which individuals had to be within 1 m of the model and were considered to have actively looked at the model while the sponge was fabricated. This specific audience included individuals who were seen to shift their eye gaze to the model or to track the model’s movements with their head movements or who had their head facing the model ±45° (as per [74]). 
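The two observation criteria described above (the broader "potential observers" and the stricter "specific audience") can be written as simple predicates. The sketch below is illustrative only: the `Bystander` record and its field names are hypothetical, and treating "within 1 m" as an inclusive bound is an assumption, not something the coding scheme states.

```python
from dataclasses import dataclass


@dataclass
class Bystander:
    """Hypothetical coding record for one individual near the model."""
    distance_m: float           # distance from the model during fabrication
    head_turned_away: bool      # head turned fully away from the model
    view_obstructed: bool       # e.g. behind a tree buttress, head in the waterhole
    gaze_shift_to_model: bool   # seen to shift eye gaze to the model
    head_tracks_model: bool     # head movements tracked the model
    head_angle_deg: float       # angle between head direction and the model


def is_potential_observer(b, radius_m=1.0):
    """Broad criterion: close enough, not turned away, view not obstructed."""
    in_range = b.distance_m <= radius_m  # inclusive bound is an assumption
    return in_range and not (b.head_turned_away or b.view_obstructed)


def is_specific_audience(b, radius_m=1.0, max_angle_deg=45.0):
    """Strict criterion: close enough and judged to have actively looked."""
    looked = (
        b.gaze_shift_to_model
        or b.head_tracks_model
        or abs(b.head_angle_deg) <= max_angle_deg
    )
    return b.distance_m <= radius_m and looked
```

Under this sketch, an individual 0.8 m away with its head 30° off the model satisfies both criteria, whereas one sitting behind a buttress satisfies neither unless it was otherwise seen to look at the model.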
A separate network was constructed for M and RU1. In each case, a directed edge was considered to exist from X to Y if there was at least one registered occurrence of X observing Y performing the RU1 or M behavior prior to X acquiring the relevant behavior themselves. The latter criterion was included because behavior can only be transmitted by observations that occur prior to acquisition, and so that a positive result could not be indicative of homophily, that is, individuals who acquire a behavior being subsequently attracted to one another and thus observing each other more. The weight of the directed edge, $a_{YX}$, was equal to the number of such occurrences. For the dynamic social network, the edges were allowed to vary over time. Here, $a_{YX}(t)$ was taken to be the number of times X had observed Y performing the target behavior prior to time t. We also considered a binary dynamic network, where $a_{YX}(t)$ was taken to be 1 if X had observed Y performing the target behavior prior to time t, and 0 otherwise. We included this to allow for the possibility that a single observation of the target behavior may be sufficient for a maximal social transmission effect to occur. To analyze the spread of the behaviors, we entered information about all individuals who used at least one tool at the tree-hole in network-based diffusion analysis (NBDA) models (N = 30). We ran an order of acquisition diffusion analysis (OADA) [2], treating M and RU1 as independent diffusions included in the same model, allowing us to test for a difference in the social transmission effect. We used the R script model for NBDA Version 1.2.11 available at http://lalandlab.st-andrews.ac.uk/freeware.html. NBDA is based on survival analysis models and so assumes that the spread of the behavior is a stochastic process and that a naïve individual, i, has at any time a given learning rate, $\lambda_i(t)$, for each behavior pattern in question.
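The three observation networks described above (static counts, dynamic counts up to time t, and the binary dynamic version) can be derived from a single log of observation events. This is a minimal sketch in Python rather than the R script the analysis used. The event format `(time, observer, demonstrator)` and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_network(observations, acquisition_time, t=None):
    """Count observations per (observer, demonstrator) pair.

    observations: iterable of (time, observer, demonstrator) tuples
        (hypothetical format).
    acquisition_time: dict mapping an individual to the time it acquired the
        behavior; individuals that never acquired it are simply absent.
    t: if None, build the static network; otherwise count only observations
        made strictly before time t (dynamic network).
    """
    counts = defaultdict(int)
    for time, observer, demonstrator in observations:
        # Only observations before the observer's own acquisition count.
        if time >= acquisition_time.get(observer, float("inf")):
            continue
        if t is not None and time >= t:
            continue
        counts[(observer, demonstrator)] += 1
    return dict(counts)


def binarize(network):
    """Binary network: 1 if at least one qualifying observation was recorded."""
    return {edge: 1 for edge, n in network.items() if n > 0}
```

Calling `build_network(events, acquired)` gives the static weights, `build_network(events, acquired, t=...)` gives the dynamic weights at a chosen time, and `binarize` converts either into the binary form.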
We included a number of potentially confounding variables: $x_1$, age (in years); $x_2$, time spent in the community (in years); $x_3$, sex (0/1 for female/male, respectively). These data were extracted from the Sonso community official list of individuals downloaded at http://www.budongo.org/. There was little support for an important effect of any individual-level variable (see Table S2). We considered both conventional NBDA models with the static social network and expanded the approach to include the dynamic network described above. For the static network NBDA, there are two functional forms for inclusion of individual-level variables in an NBDA [2]: a model in which the interaction between social transmission and the individual-level variables is taken to be additive,

$$\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)\left(s\sum_j a_{ij}\,z_j(t) + \exp\Bigl(\sum_k \beta_k x_{k,i}\Bigr)\right),$$

and one in which it is taken to be multiplicative,

$$\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)\left(s\sum_j a_{ij}\,z_j(t) + 1\right)\exp\Bigl(\sum_k \beta_k x_{k,i}\Bigr),$$

where $\lambda_0(t)$ is a baseline rate function, which in OADA remains unspecified; s is the effect of social transmission per occasion i observed j; $\beta_k$ is the multiplicative effect of individual-level variable k on the log scale; and $z_i(t)$ is an indicator variable that takes the value 1 if i has acquired the behavior by time t and 0 otherwise. Both additive and multiplicative models were fitted. Findings were similar for each, but the multiplicative model had slightly better support (see Table S1), as reported in the main text. The log-likelihood for acquisition event l, occurring at time $t_l$, at which individual m acquired the behavior is:

$$\log L_l = \log \lambda_m(t_l) - \log\Bigl(\sum_i \lambda_i(t_l)\Bigr),$$

in which the unspecified baseline $\lambda_0(t_l)$ cancels. The log-likelihood for the whole diffusion is calculated by summing across all acquisition events. In a reanalysis, we excluded the M acquisition event for KW (see main text) by simply excluding this acquisition event from the likelihood function.
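To make the order-of-acquisition likelihood concrete, the following sketch evaluates it for a static network. It assumes the standard NBDA rate forms from [2]: a social term equal to s times the summed connections to informed individuals, combined with exp(Σ β_k x_k) either additively or multiplicatively. It is written in Python for illustration and is not the NBDA R script used in the analysis. All function and argument names are hypothetical.

```python
import math


def relative_rate(i, informed, a, s, beta, x, form="multiplicative"):
    """Learning rate of individual i relative to the baseline rate.

    a: dict mapping (observer, observed) to edge weight.
    beta, x[i]: coefficients and values of the individual-level variables.
    """
    if i in informed:
        return 0.0  # informed individuals can no longer acquire the behavior
    social = s * sum(a.get((i, j), 0) for j in informed)
    ilv = math.exp(sum(b * v for b, v in zip(beta, x[i])))
    if form == "additive":
        return social + ilv
    return (social + 1.0) * ilv


def oada_log_likelihood(order, individuals, a, s, beta, x, form="multiplicative"):
    """Sum over events of log(rate of the learner) - log(sum of all rates)."""
    informed = set()
    total = 0.0
    for m in order:
        rates = {i: relative_rate(i, informed, a, s, beta, x, form)
                 for i in individuals}
        total += math.log(rates[m]) - math.log(sum(rates.values()))
        informed.add(m)
    return total
```

Because only ratios of rates enter each term, the baseline rate never needs to be specified, which is why OADA can leave it unspecified.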
The proportion of acquisitions that were by social transmission was estimated for the best model (with no individual-level variables) by calculating, for each acquisition event l > 1:

$$p_l = \frac{s\sum_j a_{mj}\,z_j(t_l)}{s\sum_j a_{mj}\,z_j(t_l) + 1},$$

where m is the individual that acquired the behavior at event l. Here, the numerator is the rate of social transmission relative to the rate of asocial learning at the time of the l-th acquisition event, and the denominator is the total rate of learning relative to the rate of asocial learning. Therefore, the whole equation gives the probability that event l occurred by social transmission, as predicted by the model. By averaging across all acquisition events except the initial acquisition, we obtain the estimated proportion of events (excluding the innovation) that occurred by social transmission. A static network based on observations does not fully allow for the time course of observations. To illustrate, one can imagine a group of three individuals: A, B, and C. A learns the behavior first. Next, B observes A performing the behavior three times and then learns the behavior. Finally, C observes A performing the behavior three times and subsequently learns the behavior last. A static network would represent the network as having links of strength 3 from A to both B and C, so an NBDA model based on such a network would predict that B and C were equally likely to learn second. In fact, we would expect B to be more likely to learn second, because B observed A performing the behavior first. A dynamic network allows us to incorporate this information into the NBDA. We considered a number of different functional forms for the dynamic network. First, we considered a model in which each successive observation of the target behavior had a linear relationship with the rate of learning. As with the static network NBDA, we considered models in which the interaction with individual-level variables was taken either to be additive or to be multiplicative. These models are identical to those given above, except $a_{ij}$ is replaced with $a_{ij}(t)$.
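The three-individual illustration above can be worked through numerically. The sketch below assumes the linear NBDA form with no individual-level variables, so that a naïve individual's relative rate is s × (number of observations) + 1. The value s = 1 is purely illustrative and is not an estimate from the data.

```python
def prob_learns_next(candidate, observations, s):
    """Probability that `candidate` is the next learner among naive individuals.

    observations: dict mapping each naive individual to the number of
    observations of informed individuals credited to it by the network.
    """
    rates = {i: s * n + 1.0 for i, n in observations.items()}
    return rates[candidate] / sum(rates.values())


def proportion_social(n_obs, s):
    """Probability that an acquisition was social under the linear model."""
    return (s * n_obs) / (s * n_obs + 1.0)


s = 1.0  # illustrative value only

# Static network: both B and C are credited with 3 observations of A,
# regardless of when those observations happened.
p_static = prob_learns_next("B", {"B": 3, "C": 3}, s)

# Dynamic network at the moment B learns: C has not yet observed A.
p_dynamic = prob_learns_next("B", {"B": 3, "C": 0}, s)
```

With these illustrative numbers, the static network gives B and C equal chances of learning second (0.5 each), whereas the dynamic network gives B a probability of 0.8. The dynamic model therefore uses the timing information that the static model discards.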
We also considered a form in which each successive observation of the target behavior had a linear effect on the rate of learning on the log scale, that is, each successive observation multiplied the rate of learning by exp(s):

$$\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)\exp\Bigl(s\sum_j a_{ij}(t)\,z_j(t) + \sum_k \beta_k x_{k,i}\Bigr).$$

We refer to this as the log-linear model. Here a single observation adds s to the linear predictor [inside the exp() term], having the effect of multiplying the rate of learning by a factor of exp(s). We also considered a version of the log-linear model in which the interaction with individual-level variables was additive, but this had less support than the multiplicative version (see Table S1). For our dynamic network, the log-linear model is equivalent to including the number of observations of the target behavior prior to time t as a time-varying covariate in a Cox model [75]. This allowed us to use the survival package [76] to fit the models in the R statistical environment [77] and to include a random (or frailty) effect to account for the fact that each diffusion included the same individuals. However, the random effect was estimated to be negligible and had no effect on the results, corresponding to the fact that each behavior diffused through a different subset of the group (with the exception of KW). Consequently, we dropped the random effect from the analysis. The model using the binary dynamic network is specified using the same equation as the log-linear model. The likelihood function given above for the static network NBDA is valid for all models given here. Analogously to the linear model, the proportion of acquisitions that were by social transmission was estimated for the best log-linear model (with time in population included) using the dynamic network by calculating, for each acquisition event l > 1:

$$p_l = \frac{\exp\Bigl(s\sum_j a_{mj}(t_l)\,z_j(t_l) + \beta_2 x_{2,m}\Bigr) - \exp\bigl(\beta_2 x_{2,m}\bigr)}{\exp\Bigl(s\sum_j a_{mj}(t_l)\,z_j(t_l) + \beta_2 x_{2,m}\Bigr)} = 1 - \exp\Bigl(-s\sum_j a_{mj}(t_l)\,z_j(t_l)\Bigr),$$

where m is the individual that acquired the behavior at event l. Here the numerator is the estimated rate of learning at the time of acquisition of the behavior minus the rate that would be expected under asocial conditions, and so can be thought of as the rate of social transmission.
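The log-linear model has two consequences that are easy to check numerically. Each observation multiplies the learning rate by exp(s), and the share of the rate attributable to social transmission reduces to 1 − exp(−s·n) after n observations, because the individual-level term cancels. In the sketch below, s = ln 15 is chosen only to match the 15-fold increase per observation quoted in the abstract. It is an illustrative assumption, not the fitted parameter.

```python
import math


def rate_multiplier(n_obs, s):
    """Factor by which n_obs observations scale the asocial learning rate."""
    return math.exp(s * n_obs)


def proportion_social_loglinear(n_obs, s):
    """(total rate - asocial rate) / total rate under the log-linear model.

    The exp(sum of beta_k * x_k) factor appears in both numerator and
    denominator and cancels, so it does not need to be supplied.
    """
    return 1.0 - math.exp(-s * n_obs)


s = math.log(15.0)  # illustrative: a 15-fold increase per observation

one_obs = rate_multiplier(1, s)    # about 15
two_obs = rate_multiplier(2, s)    # about 225: the effect compounds
```

Under this assumption a single prior observation already implies that roughly 14 in 15 of the learning rate is social, which shows why even sparse observation histories can yield high estimated proportions of social transmission.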
The denominator is the total rate of learning at the time of acquisition, so the fraction gives the probability that the event occurred by social transmission. Averaging across all acquisition events except the initial innovation gives the estimated proportion of acquisitions that were by social transmission, excluding the innovation, which is known not to have occurred by social transmission. We used an information-theoretic approach based on Akaike’s Information Criterion corrected for sample size (AICc) to allow for model selection uncertainty. This allowed us to estimate the support for each variable/model of social transmission, calculate model-averaged estimates of effects, and construct unconditional confidence intervals using profile likelihood methods [78]. Because time of acquisition diffusion analysis (TADA) can have more statistical power than OADA [2], we fitted TADA models to check the robustness of our findings. The times of learning entered into the models were the cumulative time across days, including only times at which the group was present at the waterhole, to allow for the fact that the rate of learning would be zero when the group was not present at the waterhole. We fitted models assuming a constant baseline function, $\lambda_0(t) = \lambda_0$, and models allowing for the possibility that $\lambda_0(t)$ might systematically increase or decrease over time [79]. We also fitted models in which the baseline rate differed between M and RU1, to allow for differences in the asocial rate of learning. For the TADA analysis, the best model was the standard linear form of NBDA:

$$\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)\left(s\sum_j a_{ij}\,z_j(t) + 1\right).$$

Here we report the results of this set of models, though other functional forms gave similar results. For many models, the estimated Hessian matrix could not be inverted, so we could not reliably extract standard errors, meaning we could not calculate confidence intervals allowing for model selection uncertainty [78]. Consequently, traditional confidence intervals are reported for TADA, that is, conditional on the best model containing the relevant parameter.
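The model-comparison quantities used here follow standard definitions: AICc = −2 log L + 2k + 2k(k + 1)/(n − k − 1), and Akaike weights proportional to exp(−Δ/2), where Δ is each model's AICc minus the smallest AICc in the set. The sketch below shows how summed weights (Σw_i) for a hypothesis are obtained from a set of fitted models. The helper names are hypothetical, and any numbers passed in would be the user's own fitted values.

```python
import math


def aicc(log_lik, k, n):
    """AIC corrected for small samples; k parameters, n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values):
    """Relative support for each model in the candidate set (sums to 1)."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (v - best)) for v in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def summed_weight(weights, includes_term):
    """Total support for all models that contain a given term or hypothesis."""
    return sum(w for w, flag in zip(weights, includes_term) if flag)
```

Support ratios of the kind quoted in the text ("x times more support") correspond to ratios of these weights between two models or two sets of models.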
There was stronger evidence for social transmission of RU1 (same social effect as for M, Σwi = 0.289; different social effects, Σwi = 0.268), though still more support for social transmission of moss-sponging only (Σwi = 0.443). For moss-sponging, s was estimated at 42.5 (95% C.I. = 6.74–814), corresponding to 84.3% (77.5%–85.6%) of acquisition events occurring by social transmission, excluding the innovator. For leaf-sponge re-use, s was estimated to be 1.18 (95% C.I. = 0–6.78), corresponding to 22.3% (0%–36.4%) of acquisition events occurring by social transmission. The difference in s parameters (M – RU1) was estimated to be 41.3 (95% C.I. = 5.16–800). Therefore, the results of the TADA are qualitatively similar to the results of the OADA. In the main text, we present the results of the OADA as it makes fewer assumptions about the underlying baseline rate: although we can allow for a systematically increasing baseline rate using TADA, it is difficult to allow for a fluctuating rate caused by changing conditions in the environment, for example, temperature changes affecting motivation to drink [33]. Consequently, we suspect OADA is likely to be more reliable in uncontrolled conditions. To assess the robustness of our findings to the judgments we made about who observed whom, we repeated both OADA and TADA analyses using static and dynamic networks based on a stricter criterion of recording observation (see above). Overall the strict network had 0.43× less support than the less strict network for OADA, and slightly more support for TADA (1.2×). In both cases, the Akaike weights showed a similar pattern of support using each observation criterion (see Table S1 and Figures S4 and S5). Note that both (a) recording of nonobserving individuals as observers and (b) failure to record observers will obscure any existing relationship between the observation network and the pattern of diffusion.
This has two consequences: First, a stricter observation criterion does not necessarily mean a more accurate estimate of s parameters, as it may reduce cases of (a) but at the potential cost of increasing cases of (b). Second, in either case, the effect of such errors in recording will be a tendency to underestimate social transmission effects, so the reported social transmission of M could not be the result of a bias arising from errors in recording who observed whom. A potentially confounding variable is the different level of exposure each chimpanzee had to the waterhole. A priori, it seemed possible that chimpanzees that interacted with the waterhole more frequently would be more likely to acquire both behavior patterns than chimpanzees that interacted with the waterhole less frequently. If this exposure was correlated with observation of others performing M, this could create a spurious social transmission effect. To an extent, the different level of social transmission for M and RU1 weakens the case for this explanation, as we would expect an exposure effect to operate similarly on both behavior patterns. Nonetheless, we ran additional analyses to allow for the potential effects. We calculated an exposure score for each chimpanzee for each behavior pattern as the rate at which the chimpanzee interacted with the waterhole, that is, initiated bouts of normal leaf-sponging behavior. If a chimpanzee did not acquire the behavior pattern in question (M or RU1), exposure was calculated over the whole period for which we observed the chimpanzees at the waterhole (exposure = number of interactions/total observation time). For chimpanzees that acquired a behavior pattern, the corresponding exposure score was calculated over the time preceding acquisition of that behavior (e.g., exposure = number of interactions prior to acquiring M/time at which M was acquired), as exposures experienced after acquiring M (for example) cannot exert a causal effect on the acquisition of M.
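The exposure score defined above is a rate whose denominator depends on whether the individual acquired the behavior. A minimal sketch follows, assuming that interaction times and acquisition times are measured on the same cumulative clock of observation time at the waterhole. The function and argument names are hypothetical.

```python
def exposure_score(interaction_times, total_observation_time, acquisition_time=None):
    """Rate of waterhole interactions for one chimpanzee and one behavior.

    interaction_times: times at which the individual initiated a bout of
        normal leaf-sponging.
    total_observation_time: length of the whole observation period.
    acquisition_time: time at which the behavior (M or RU1) was acquired,
        or None if the individual never acquired it. Must be positive.
    """
    if acquisition_time is None:
        # Non-acquirers: rate over the whole observation period.
        return len(interaction_times) / total_observation_time
    # Acquirers: only interactions before acquisition can be causal.
    prior = [t for t in interaction_times if t < acquisition_time]
    return len(prior) / acquisition_time
```

For example, an individual with interactions at times 1, 2, 3, and 8 over 10 units of observation scores 0.4 if it never acquired the behavior, but 0.75 if it acquired the behavior at time 4.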
We first added exposure score as a predictor to the best model for the OADA reported in the main text, with exposure constrained to have the same effect on both M and RU1. This model had 0.43× less support, the effect of exposure was estimated to be small, and the estimate of the social transmission parameter remained very similar (s = 2.79). We then wished to allow for the possibility that exposure might affect only M, thus resulting in a spurious social transmission effect for M. This model had 3.92× more support than the previous best OADA model. However, contrary to expectations, the effect of exposure was estimated to be negative with a 9.3% reduction in rate of acquisition for one standard deviation difference in exposure score (see Figure S6). Most importantly, the effect of social transmission was estimated to be slightly higher in this model (s = 3.00), suggesting that differential exposure to the waterhole is unlikely to have resulted in a spurious social transmission effect for M.