Inside Social Scientists’ New Data Dilemma
Dalton Conley, a sociologist, dean and professor at New York University, is one of a growing group of social scientists pushing large, for-profit companies that collect social information to share it with researchers.
Conley and several of his colleagues recently penned a joint opinion piece in The Chronicle of Higher Education in which they criticize social media companies for collecting historic amounts of social data, only to legally maintain these data behind proprietary firewalls. He discussed these issues — and their possible resolution — in depth with EdTech: Focus on Higher Education Managing Editor Tara E. Buck.
EDTECH: You are part of a group that has called for companies to open their data to social scientists for research purposes. Why is this so important?
CONLEY: We freely give these Internet companies an incredible amount of social information that they keep in silos, releasing it only to those they see fit. Given that they have become near monopolies on social life, there is room for regulation that would require them to take part in some sort of de-identifying, data-sharing mechanism in the name of the public good.
We have the tools to analyze these data, but we did not create these platforms into which people are voluntarily pouring their data. All we are saying is, please let us in.
EDTECH: Do you have any thoughts on how something like that would be structured?
CONLEY: One solution is to create an infrastructure that allows researchers to request data samples for specific hypotheses. Researchers would apply to receive access to de-identified data, just as they do now with government entities such as the IRS.
Researchers are accustomed to working with restricted data sets, so gaining access to the huge samples available on places like Facebook or Twitter would be great.
EDTECH: These companies already are under pressure from the government to share data for national security purposes. Why would they bend to the social sciences?
CONLEY: We definitely don’t have the kind of intimidating presence that the NSA has. We are talking about transparency and openness and the sharing of data, not using data in ways that will make people feel uncomfortable.
Just like Bell Telephone was too big and was broken up, when you are the size of many of these social networks, you exit the realm of the free market. You now are in the realm of monopoly, where in order to maintain your size, you need to begin to cut some deals with society, with the government, and I think this could be one of them.
EDTECH: Is there any worry that social science as a data-driven discipline is in jeopardy?
CONLEY: In my worst fear, yes, but that’s not to say it’s something that is highly possible. It is something I think about. The fear is that, given how people communicate today with technology, if you are not on the inside — if you don’t have access to these data — then you are shut out. In the future, it’s not going to be seen as acceptable science to survey even 300 people. It’s going to be considered biased and too small a sample size compared with the billions of data points available through something like Facebook messaging.
That’s a universe that covers nearly everyone between the ages of 18 and 35 in North America, so the way we previously gathered information will become irrelevant.
To the extent that these for-profit private entities are not regulated in terms of data sharing requirements, the reality is that if you are not on the inside you are not going to be able to research. That’s what I worry about.
EDTECH: How do you see your group’s work moving forward?
CONLEY: We hope to show examples of productive new forms of data linkage and collection that produce social science results that would not have been achieved without access to these data.
We hear their concerns about sharing data and what their motivations would or would not be, but I think we have some stuff to offer them. There has been a causal revolution in social sciences over the past 30 years that gets beyond correlations; we understand how to estimate cause and effect in observational data without formal experiments.
That kind of work has not completely penetrated the Big Data computer science world, so I think there is room on both sides to learn from one another, for the betterment of everyone.