The study of psychology has been handicapped by the difficulty of measuring how individual traits affect interactions with the surrounding social structures and how this interaction affects both individual life outcomes and group characteristics. With the advent of continuous, fine-grain data from cell phones, credit cards, and online interactions, the field of human psychology can become better at understanding the role of social context by combining these new data sources with standard experimental methods. This article will examine how these new tools can shed light on the influence individual psychological traits have on life outcomes, as well as on social properties such as inequality. Use of these new data sources requires special care to uphold ethical standards, and so new methodologies have been developed.
Keywords: technology and psychology, computational social science, life outcomes, child development, social networks
Disclosure: The author declares no conflict of interest.
Open Access License: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND). This license permits copying and redistributing the work in any medium or format for noncommercial use provided the original authors and source are credited and a link to the license is included in attribution. No derivative works are permitted under this license.
Disclaimer: Interactive content is included in the online version of this article.
Correspondence concerning this article should be addressed to Alex Pentland, Department of Connection Science, Massachusetts Institute of Technology, E15-387, 20 Ames St, MIT, Cambridge, MA 02139, USA. Email: email@example.com
It is often said that no person is an island and it takes a village to raise a child, but psychology has largely lacked the scientific evidence to quantify and characterize these aphorisms. As a result, experimental focus is usually on more easily quantifiable individual traits and behaviors. In the last decade, however, digital data from online interactions, cell phones, and credit cards have allowed us to precisely quantify large-scale social behavior at a very fine level of detail. When properly regulated to protect personal privacy, these data have enormous power to refine and clarify the traditional questions posed by the social sciences.
The term “computational social science,” describing the use of such “digital breadcrumbs” in the social sciences, was popularized by the 2009 Science paper of the same name (Lazer et al., 2009). Since then more than 10,000 journal papers have referenced the term and more than 100 academic groups have begun to describe themselves using it. There is great enthusiasm that this new approach to social science is enabling new ways of understanding human behavior; for the background to these new tools, refer to overview articles in Nature (Buchanan, 2009) and Proceedings of the National Academy of Sciences of the United States of America (Mann, 2016).
This article will propose how these new tools can help relate individual traits to the surrounding social context and thus better explain life outcomes and societal characteristics. Beginning with an example where psychology theory has been ineffective—the problem of predicting children’s life outcomes—this article will show what evidence computational social science brings to the problem and then explore how to extend psychological theory using this new computational perspective. The final sections of this article will outline the new practical and ethical problems that this new type of “computational psychology” hybrid presents.
In 2017, my MIT research group analyzed a uniquely large and complete database describing the life trajectories of at-risk children and used these data to build predictive models for life outcomes ranging from eviction from home to “grit” to school grade-point average. These data were generated by the Fragile Families Study (see https://fragilefamilies.princeton.edu/), which examined the development of 4,242 children, interviewing primary caregivers at birth and again when the children were aged 1, 3, 5, 9, and 15 years, together with in-home assessments of the children. Several collaborative studies provided additional information on parents’ medical, employment, and incarceration histories, religion, child care, and early childhood education. In total, 12,943 measurements were made of each child and their family, including scores on an extremely wide variety of standardized tests.
A total of 160 academic teams competed to use these data to predict life outcomes of these children (Salganik et al., 2020). My MIT team produced the most accurate models for half of the life outcome prediction tasks (see http://news.mit.edu/2017/mit-human-dynamics-team-tops-fragile-families-challenge-1004). Despite the rich data set and state-of-the-art statistical methods, however, our best predictions for these life outcomes were not very accurate and in fact were only slightly better than those from a simple benchmark. The uncomfortable conclusion is that you cannot predict children’s life outcomes from any of the standard tests or interview methods applied to either the children or their families.
It turns out, however, that you can predict at least some life outcomes from data about the neighborhood in which the children and their families live. For example, consider these findings about intergenerational financial mobility, a life outcome that is highly correlated with many life outcomes considered in the Fragile Families Study. To examine the “American dream” of intergenerational mobility, a group of economists obtained access to 30 years of longitudinal data from the U.S. Internal Revenue Service (see http://www.equality-of-opportunity.org/). From these data, they could compute the rate of intergenerational financial mobility across all U.S. Census Blocks. These economists found that 71% of the variation in financial life outcome could be predicted by characteristics of the surrounding neighborhood, specifically, the roughly four block area surrounding the child’s home.
Moreover, approximately one-quarter of this neighborhood effect is “locked in” by the time the child enters kindergarten, and approximately half of the neighborhood effect is in place by the fifth grade. They could also analyze the outcomes of children who moved from one Census Block to another Census Block as part of a randomized lottery, thus establishing that the neighborhood effect is causal. Why don’t interviews with parents provide similar predictive power? Perhaps it is because the significant variables here are ones that people generally do not have quantitative knowledge about (e.g., income distribution of people in adjoining city blocks), or are not even aware of (e.g., proportion of census forms returned, a proxy for social capital). Nor do people suspect the predictive power of these variables. Indeed, the relationships were unknown until this large-scale longitudinal computational social science analysis became available.
The failure of the Fragile Families analyses in predicting life outcomes despite incorporating thousands of standard individual and family measurements over a period of 20 years, when contrasted with the surprising predictive power of neighborhood characteristics, is perplexing and points to a blind spot within the discipline of psychology. Many psychology studies focus on short-term cognitive (linguistic) behaviors of humans older than 5 years. It is quick and easy to ask questions in a laboratory experiment, but extremely difficult to track behaviors over years (and especially the earliest years). The Internal Revenue Service data, however, suggest that the life trajectory of these children can be predicted far better by their neighborhood social context in the years before they were linguistically mature than by cognitive or emotional characteristics during their entire childhood. Very few psychological studies speak quantitatively and systematically to the effects of this sort of social context, especially for very young children.
What can computational social science, leveraging behavioral data from cell phones, online interactions, and elsewhere, say about the connection between psychological traits, the surrounding social context, and life outcomes? To explore this question further, consider which behavioral characteristics predict wealth creation and wealth inequality. Many life outcomes are strongly correlated with wealth and inequality and now there are data available from large numbers of people in societies across the world to help explore these relationships.
For instance, Figure 1 illustrates data from a study in which my research group examined a sample of 100,000 randomly selected people in a mid-income Asian country and compared their ability to hear about new opportunities (measured by how closed or open their social networks are) to their income (Jahani et al., 2017). As can be readily seen, people who have more open networks make more money. Moreover, this is not likely to be an artifact of the way access to opportunities was measured because you obtain the same result looking at the diversity of the jobs of the people with whom they interact, or the diversity of locations of the people with whom they interact. Surprisingly, if you examine only people who have a sixth grade education or less, this curve moves only a little downward. If you look at people with college educations, the curve moves only a little bit upward. The variation that has to do with education is small when compared with the variation that has to do with diversity of interaction.
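One way to make “diversity of interaction” concrete is as a normalized Shannon entropy over the categories of a person’s contacts (for example, their jobs or locations). The sketch below is illustrative only: the entropy measure, the function name `interaction_diversity`, the `n_possible` parameter, and the toy contact lists are all assumptions introduced here, not necessarily the measure used by Jahani et al. (2017).

```python
import math
from collections import Counter

def interaction_diversity(contact_categories, n_possible):
    """Shannon entropy of contact categories, scaled to [0, 1].

    contact_categories: one label per interaction (hypothetical input format).
    n_possible: number of categories a contact could belong to (assumed known).
    """
    if n_possible < 2:
        raise ValueError("need at least two possible categories")
    counts = Counter(contact_categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Entropy is 0 when all contacts share one category and log(n_possible)
    # when contacts are spread evenly over every possible category.
    entropy = sum(-(c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_possible)

# Toy data: a closed network dominated by one job type, and an open network
# spread evenly over four job types.
closed_network = ["farmer"] * 9 + ["teacher"]
open_network = ["farmer", "teacher", "driver", "nurse"] * 3

closed_score = interaction_diversity(closed_network, n_possible=4)
open_score = interaction_diversity(open_network, n_possible=4)
```

Under this assumed measure the open network scores higher than the closed one, which is the ordering the income comparison above relies on.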
It is natural to ask if greater network diversity causes greater income or whether it is the other way around. The answer seems to be both: greater network diversity causes greater income on average (this is the idea of weak ties bringing new opportunities), but it is also true that greater income allows people to make their social networks more diverse. This focus on the structure of social networks and their ability to relay new opportunities to individuals is quite different from the normal psychological focus on individual traits and behaviors and is critical in modeling life outcomes. Psychology can benefit from this type of contextualization, that is, incorporating characteristics of the social network in which individuals are embedded and how their individual traits interact with the opportunities and barriers those social connections provide.
Many of the large-scale data analyses using the tools of computational social science provide evidence that when seeking to understand how behavior traits affect life outcomes, it is best to conceive of humans as a species who are on a continual search for new opportunities and ideas and that the surrounding social networks serve as a major, and perhaps the greatest, resource for finding opportunities. Humans are like every other social species: our lives consist of a balance between the habits that allow us to make a living by exploiting our environment and exploration to find new opportunities.
In the animal literature this is known as foraging behavior. For instance, if you watch rabbits, they will come out of their burrows, go get some berries, then come back every day at the same time—except that on some days they will scout around for other berry bushes. This is the tension between exploring, in case your berry bush goes away, and eating the berries while they are there.
And this is also the character of normal human life. When my research group examined credit card purchase data for 100 million people in the United States, the primary insight was that people are immensely predictable (Krumme et al., 2013). Both credit card data and mobile phone mobility data demonstrate that people are largely creatures of habit. By observing where you go or what you purchase in the morning, there is a better than 90% chance of accurately predicting where you will go and what you will purchase in the evening. But once in a while people break loose and explore people and places that they visit only occasionally, and this type of behavior is extremely unpredictable. It is this unpredictable exploration that gives us the impression of personal freedom.
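The contrast between predictable habit and unpredictable exploration can be illustrated with a toy predictor that guesses, for each morning location, the evening location that most often followed it in the past. This is a minimal sketch under stated assumptions: the function names `fit_habits` and `accuracy`, the place labels, and the data are hypothetical, and this is not the model used by Krumme et al. (2013).

```python
from collections import Counter, defaultdict

def fit_habits(days):
    """Learn the most common evening place for each morning place.

    days: list of (morning_place, evening_place) pairs (hypothetical format).
    """
    table = defaultdict(Counter)
    for morning, evening in days:
        table[morning][evening] += 1
    return {m: c.most_common(1)[0][0] for m, c in table.items()}

def accuracy(habits, days):
    """Fraction of days on which the habitual guess matches the evening place."""
    hits = sum(1 for morning, evening in days if habits.get(morning) == evening)
    return hits / len(days)

# Toy history used to learn habits, and later days used only for evaluation.
past_days = [("office", "gym")] * 9 + [("cafe", "home")]
later_days = [("office", "gym")] * 9 + [("office", "street market")]

habits = fit_habits(past_days)
score = accuracy(habits, later_days)
```

In this toy data the habitual guess is right on nine of the ten later days; the one miss is the exploratory trip to a place never visited before, which no habit table can anticipate.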
Moreover, when you find individuals who do not show this pattern, they are usually sick or stressed in some way (Blumenstock et al., 2015; Madan et al., 2011; Singh et al., 2015). You can tell whether a person’s life is healthy in a general sense—both mentally and physically—by whether they show this most basic biological rhythm or not. This tendency is regular enough that one of the largest health services in the United States is now using this to keep track of at-risk patients (for additional details, see http://ginger.io). Once again, the finding is that individual traits alone do not effectively predict life outcomes, but rather it is how those traits act in concert with the surrounding social network to facilitate or hinder available opportunities.
Returning to the data shown in Figure 1, people in this South Asian country with open, diverse social networks make more money across all social castes; however, the amount of money earned for each “unit” of diversity is less for lower caste individuals. Talking to diverse lower caste people is not as profitable as talking to diverse upper caste people, regardless of their other individual characteristics. Similar segregation by social network happens everywhere, however, and is generally invisible and independent of individual traits. For an example closer to the United States, consider how schools and universities are designed. Across universities in several different countries, analysis of mobility and communication data shows that social connections are dramatically better predictors of student outcome than personality, study patterns, previous training, or grades and other individual traits. Performance in school and on tests is better predicted by the community of people you interact with than by almost all standard measures of individual characteristics (Kassarnig et al., 2017; de Montjoye et al., 2014).
Large-scale data from tax returns, purchase records, and phone mobility and communication records demonstrate that the characteristics of neighborhoods are extremely important in predicting life outcomes for children and that wealth is strongly correlated with social network diversity. But what is it about the neighborhoods that predict children’s life outcomes? The same methodology used to look at wealth creation in countries and cities can also be used for individual neighborhoods.
As illustrated in Figure 2, when data from individual neighborhoods are examined it becomes clear that diversity is a strong predictor of both income and (more importantly) income growth, even after controlling for factors such as location of the neighborhood in the city and density of the neighborhood. In other words, the idea flow via the social bridges that connect them predicts wealth creation. By comparing the explanatory strength of interaction diversity with other variables such as average age or percentage of residents who received a tertiary education, it becomes clear that these traditional demographic measures of social context are much weaker at explaining economic growth than social interaction diversity. This means that models and civil systems that depend only on demographic variables such as population, education, and so forth are missing critical information.
In fact, our research shows that this neighborhood-by-neighborhood correlation accounts for roughly half of the variance in GDP growth across several thousands of neighborhoods in the United States, in Asia, and in Europe. When combined with the similar regularity for individuals (Figure 1), this result suggests that diversity of social connections within a neighborhood, and the “weak tie” opportunities they bring with them, may be the principal underlying driver of financial growth. In this case, the psychological and social factors that are usually thought of as affecting life outcomes may be secondary factors. Instead of being the core causal traits, they may make a difference primarily because they help or hinder the search for new opportunities. The main driver of progress in society may be the search for new opportunities, that is, foraging behavior, as opposed to individual skills or capital investment.
What does the existence of strong predictive contextual factors mean for causal processes and for psychological theory? Seeing lots of people carrying umbrellas does predict rain quite well, but umbrella carrying does not cause rain; the chain of causal factors is complex. Moreover, if we measure many, many different behaviors we will find some that predict rain within our measurement database not by causal connections but by pure chance. We must be careful to have both training data and independent test data, and it is best to pick contextual factors suggested by psychological or sociological theory. If we find a contextual factor that has strong predictive power (e.g., greater than 0.5) over independent test data, then this is a good indication that there are causal factors at work, but as with umbrellas, the causal chain may not be obvious.
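The warning about chance predictors can be demonstrated with purely synthetic data. The sketch below generates an outcome and many candidate “behaviors” that are all independent random noise, picks the candidate that correlates best with the outcome on the training rows, and then scores that same candidate on held-out rows. Everything here (function names, sample sizes, the use of Pearson correlation as the measure of predictive power) is an assumption chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def best_spurious_predictor(n_train=20, n_test=20, n_features=200, seed=0):
    """Select the best of many noise features on training data only.

    Returns (|r| on training rows, |r| on held-out rows) for that feature.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]
    # Selection sees only the first n_train rows; the rest stay untouched.
    best = max(
        features,
        key=lambda f: abs(pearson(f[:n_train], outcome[:n_train])),
    )
    train_r = abs(pearson(best[:n_train], outcome[:n_train]))
    test_r = abs(pearson(best[n_train:], outcome[n_train:]))
    return train_r, test_r

train_r, test_r = best_spurious_predictor()
```

Because the winning feature was selected from 200 noise candidates, its training correlation is inflated by the selection itself, whereas its held-out correlation carries no such inflation and would be expected to be much weaker. That gap is why predictive power should be judged on independent test data.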
As an illustration let us examine the results showing that diversity of interaction is strongly predictive of both individual income and GDP growth within a neighborhood (Figures 1 and 2). In the psychological literature, diversity of interaction is known to be a very strong factor in the “collective intelligence” of groups (Woolley et al., 2010). Similarly, in the sociology literature, the “theory of weak ties” predicts that diverse social network connections present better opportunities for employment and other desirable outcomes than do social ties within your homophilous core social network. These psychological and sociological facts suggest that diversity of interaction will be important in personal wealth creation and neighborhood GDP growth.
However, the mere fact of greater opportunity does not guarantee better personal or neighborhood outcomes. Better outcomes also depend on psychological factors such as curiosity, grit, and intelligence, interpersonal skills to find people to help exploit opportunities, and a physical, cultural, and economic environment that enables better outcomes. Moreover, these factors interact and their importance depends on the specifics of the opportunity and progress toward achieving desired goals.
Despite all these complexities in understanding causation, we must not lose sight of the strong predictive power of diversity of interaction in enabling opportunity. This strong relationship suggests that the flow of ideas and innovations between individuals and communities is the proper frame for understanding the role of psychological factors in enabling personal and community prosperity.
One interpretation of the Fragile Families results that is consistent with these and similar results in the computational social science literature is that very early social learning establishes children’s “foraging pattern.” It is useful to think of this type of “social programming” in relation to fast and slow thinking, as proposed by psychologist Daniel Kahneman. Kahneman embraces a model of a human mind with two ways of thinking. In his formulation, fast thinking is largely automatic and unconscious and used for almost all the regular activities of daily life. The second way that humans think is a slow, rule-based, and largely conscious mode. A thumbnail sketch of fast thinking is that it drives habits and categorical perceptions. In contrast, the slow mode of thinking uses reasoning, combining beliefs to reach new conclusions.
This picture of the human mind, however, is missing a critical piece. Computational social science suggests that the fast mind is the repository of cultural norms, a sort of “tribal mind” constructed largely unconsciously by integrating observations about how other people behave with biological constraints and tendencies (Madan et al., 2011; Pentland, 2015). Observing the experiences of others provides us with an easy way to decide whether a new idea will be successful for ourselves—we do not have to eat a new type of fruit to find out if it is poisonous, instead we can just watch what happens to others that eat that type of fruit.
In contrast, slow thinking is built on beliefs gained by individual reasoning and observations that seem interesting: facts and behaviors that might someday prove useful. Because slow thinking is rule-based and reflective, it provides a safe way to conjecture new ideas and norms without direct evidence (e.g., if you observe that eating dark berries makes a person sick, then it is reasonable to ask whether perhaps all dark berries are poisonous). Language and slow thinking are tightly coupled and so memorable stories can act as a sort of social “virtual reality” that allows us to learn useful facts and behaviors without having to observe them directly.
Language and logic seem to have little direct impact upon our fast thinking repertoire. Without social consensus reinforcing an idea or action, our slow-thinking rational mind is very poor at influencing our fast-thinking habits (Pentland, 2015). Language, however, permits the belief structures of slow thinking to be spread through a population. Such widely shared beliefs can potentially be incorporated into the fast-thinking repertoire by social pressure, thus becoming a behavioral norm.
In the Fragile Families example, it seems that very early experience sets the basic structure for the children’s fast-thinking norms and habits. Characteristics such as the tendency to explore versus hide, to persevere versus give up, and to assume personal agency seem to be established very early, by observation of and interaction with both other children and adults. Slow-thinking faculties mature on top of this foundation and have only limited ability to modify it. Habits are hard to break even when they obviously cause harm and changing social foraging habits is even more difficult because the disadvantages of a flawed fast-thinking repertoire are usually quite subtle and difficult to focus upon.
Standard models of human thought are mostly variations on the rational individual model. Fast thinking is seen as a compiled “look-up table” of beliefs and actions. It is quite limited in that it is inflexible, automatic, and imperfect in other ways, but still an approximation to rational thought. Slow thinking is seen as more generally rational, although still limited.
What computational social science suggests is that the “rational individual” model refers mostly to our slow-thinking mind and is a poor description of how people incorporate new actions and habits into their everyday, fast-thinking behavior. The key failure is not limitations on rationality; it is that the fast-thinking mind does not maximize for the needs of the individual. Instead, our fast-thinking mind, which is responsible for most of our everyday behaviors, is culture-bound, maximizing according to social norms, group benefit, and biological constraint, often against the interests of the individual.
The idea that fast thinking is primarily culture bound, instead of being driven by individual thought and reflection, means that fast thinking is collectively rational rather than individually rational (Pentland, 2015). Humans continually engage in exploratory behavior to find new adaptive behaviors and most of these new behaviors come from mimicry of other people. As the Fragile Families, diversity, and similar studies illustrate, it seems to be the breadth of a person’s exploratory behavior, and not their individual cognitive traits, that usually dominates life outcomes and the evolution of social characteristics.
Understanding that humans have two systems of thinking that work quite differently and which are primarily based on social observation transforms many of the classic disputes in philosophy, anthropology, and sociology. On one side of this academic battle are anthropologists such as Claude Lévi-Strauss, philosopher–economists such as Karl Marx and Adam Smith, and many social psychologists. Thinkers on this side of the dispute emphasize how the structure of society shapes the behavior of the individual. On the other side of the battle are philosophers such as Jean-Paul Sartre, game theorists, and cognitive scientists, who emphasize free will and how individual cognitive processes shape individual behavior.
The modern discovery that the human mind has two types of socially driven thinking yields this conclusion: It tells us that both sides of the free will versus social context debate are right, but that neither is right about all of human behavior all of the time. The majority of our behavior is habitual rather than reasoned, which runs counter to how many of us would like to view ourselves. As Kahneman put it, most of our behavior is based on the fast judgments of intuition and habit, not the slow process of reasoning. But, as the free will side would point out, it is likely that the majority of our most important decisions—what communities to be part of, who to pay attention to—are due to the slow process of reasoning.
What practical steps can society take to promote better life outcomes? The previous examples illustrate the importance of face-to-face interaction within the surrounding neighborhood. While there has been a sharp increase in remote, digital communications in modern times, the computational social science literature shows that physical interactions between people remain the key medium of information exchange, accounting for a major portion of the variance in a wide variety of outcomes (Pentland, 2015).
If you combine this idea of foraging for novel opportunities with the idea that diverse networks bring greater opportunities, you might hypothesize that cities with better transportation infrastructure would be better at facilitating face-to-face connection between diverse people and thus would enable better life outcomes. To test this hypothesis, data from 150 cities in the United States and 150 cities in the European Union were examined and the patterns of physical interactions between people inferred (Pan et al., 2013). What this study demonstrated is that if a city’s infrastructure facilitates more diverse face-to-face interactions, then over the long term the population has better life outcomes in areas ranging from health to crime to personal wealth. Consequently, one of the key ways to promote better life outcomes for Fragile Families may be simply to improve transportation to and from their neighborhoods.
How can measurements of neighborhood social characteristics be made more generally available to researchers? In 2014, a group of data scientists (including myself), along with representatives of communications companies and the heads of National Statistical Offices from nations in both the North and South, met within the UN headquarters and proposed what the UN Secretary General called the “data revolution” (UN, 2014). The proposal was that the nations of the world produce psychologically and sociologically relevant measures of human behavior within each census block of every country in the world. These measurements of the human condition would then be used to address poverty, inequality, injustice, and sustainability in a scientific, transparent, accountable, and comparable manner. Perhaps surprisingly, this proposal was approved by the UN General Assembly in 2015, as part of the 2030 Sustainable Development Goals. What this means is that rich, neighborhood-level (census block) data are an aspirational goal for all the 196 national signatories. While poorer nations will have difficulties meeting these data goals, the developed countries will have openly available, current data about all their neighborhoods. The wide availability of large-scale, multiple-perspective neighborhood data such as those used in computational social science studies will enable construction of links between individual psychological traits, life outcomes, and societal conditions.
But what about privacy? And won’t this place too much power in too few hands? Because of concern about these issues, in 2007 I proposed the “New Deal on Data” (Pentland, 2009), putting citizens in control of data that are about them and also creating a data commons to improve both government and private industry. This article resulted in the formation of a World Economic Forum discussion group in which world leaders and scientists were able to productively explore the risks, rewards, and cures for these big data problems. Members of this group went on to shape both the U.S. Consumer Privacy Bill of Rights and the EU Data Protection laws. While privacy and concentration of power will always be a concern, there are good solutions available through a combination of technology standards (e.g., “Open Algorithms” or OPAL, as described below) and policy (e.g., open data and a “data tax” on data held by private companies that forces them to release an aggregated, low-granularity version of the data they collect).
The OPAL project is a sociotechnological system that leverages private sector data for public good purposes (Hardjono et al., 2016). It does this by “sending the code to the data” in a privacy preserving, predictable, participatory, scalable, and sustainable manner. It has two main objectives: providing a far better picture of human reality to official statisticians, scientists, policymakers, planners, businesses, and citizens, while enabling greater inclusion and control for all citizens on the kinds and uses of analyses performed on data about themselves. OPAL is currently being deployed through pilot projects in Senegal and Colombia, where it has been endorsed by and benefits from the support of their National Statistical Offices and major local telecom operators (see http://opalproject.org). Local engagement and empowerment will be central to the development of OPAL: needs, feedback, and priorities have been collected and identified through local workshops and discussions, and their results will feed into the design of algorithms.
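The idea of “sending the code to the data” can be sketched in a few lines: the raw records never leave the data holder, only pre-approved algorithms may run against them, and only aggregates over sufficiently large groups are returned. The sketch below is a hypothetical illustration of that pattern, not OPAL’s actual interface; the class `DataHolder`, the function `mean_by_block`, the record format, and the threshold `MIN_GROUP_SIZE` are all assumptions introduced here.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 3  # hypothetical threshold below which results are suppressed

def mean_by_block(records):
    """A vetted analysis: (count, mean value) for each census block."""
    groups = defaultdict(list)
    for record in records:
        groups[record["block"]].append(record["value"])
    return {block: (len(vals), sum(vals) / len(vals)) for block, vals in groups.items()}

class DataHolder:
    """Keeps raw records private and runs only approved algorithms on them."""

    def __init__(self, records, approved_algorithms):
        self._records = records
        self._approved = set(approved_algorithms)

    def run(self, algorithm):
        if algorithm not in self._approved:
            raise PermissionError("algorithm has not been vetted")
        result = algorithm(self._records)
        # Release only aggregates, and only for groups that are large enough.
        return {
            block: mean
            for block, (count, mean) in result.items()
            if count >= MIN_GROUP_SIZE
        }

# Toy records: block "A" has three people, block "B" has only one.
records = [
    {"block": "A", "value": 1.0},
    {"block": "A", "value": 2.0},
    {"block": "A", "value": 3.0},
    {"block": "B", "value": 10.0},
]
holder = DataHolder(records, approved_algorithms=[mean_by_block])
released = holder.run(mean_by_block)
```

In this toy run the researcher receives the mean for block "A" but nothing for block "B", whose single resident could otherwise be identified from the aggregate.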
Initiatives such as OPAL have the potential to enable more human-centric, accountable, and transparent data-driven decision-making and governance. These same data resources are what is required to contextualize psychology and thus make psychological theories far more predictive, connecting individual psychological traits to life outcomes and societal characteristics.
Computational social science provides longitudinal descriptions of behavior, and particularly of fine-grain interactions between people. When combined with standard psychological measurements, there is the potential to contextualize psychology, by determining the connections between individual psychological traits and how they influence interactions with the surrounding social network. This will allow more accurate predictions of life outcomes and of how individual behavioral traits shape societal characteristics.
Today most of our social institutions are based on the idea of “rational individuals” and the idea that Adam Smith’s “invisible hand” is due to market mechanisms. But Adam Smith actually stated something quite different: “It is human nature to exchange not only goods but also ideas, assistance, and favors…it is these exchanges that guide men to create solutions for the good of the community.” Interestingly, Karl Marx said something similar, namely, that society is the sum of all our social relationships. Computational social science affirms their intuitions.
Societies need to make choices about what the future should look like. Those choices should be made with the guidance of the best available data and methodology. There needs to be a way to understand, for example, the effects of individuals’ social network diversity, of social network segregation, of tolerance of novelty versus holding fast to norms, and to understand the effects of how new digital tools shape human behavior. The challenge to the APA readership is clear: the psychology community needs to supply the evidence that will allow government programs to be more effective and fair. Psychology needs to become more contextual, better integrating theories and experiments with the surrounding social environment so that psychological science can predict concretely and accurately how individual psychology relates to overall social conditions.
Copyright © the Author(s) 2020
Received January 13, 2020
Revision received May 05, 2020
Accepted May 07, 2020