Big Data: The new frontier or a methodological nightmare?
Big Data refers to the enormous amount of information now possessed by companies, which we offer up in our day-to-day lives. Through Google searches, Facebook wall posts, or any purchase, we contribute to this vast pool of data, allowing companies to make predictions about how we will behave. The use of pattern recognition, statistical analysis and algorithms gives these companies a perceived ability to ‘predict the future’, ranging from suggesting future purchases to tracking the flu virus through Internet searches. Cheerleaders of this phenomenon (for example Mayer-Schonberger and Cukier 2013) see it as an extremely useful tool that will revolutionise our lives, unequivocally for the better. An opposing view comes from Evgeny Morozov (amongst others), who criticises ‘technological solutionism’ (Morozov 2013), arguing that these benefits are over-stated and blind us to embedded structural issues that cannot be solved by ‘more’ data. Whilst there is not space in this blog post to explore these views in full, the debate raises a consideration for Sociology: are there methodological issues with using Big Data, and what are the implications for the social sciences?
Historically, much technological development has been accompanied by claims that it will make the working day shorter, more compact, and more mobile. Techno-utopianism has become second nature: we praise technological research and development and the ways in which it makes our lives ‘easier’. The oft-cited (and perhaps over-used) “work/life balance” is supposedly improved by workers being able to access the office from anywhere, at any time. However, ‘mobile’ working in many cases creates a way to track when and where work is done. This creates a panopticon of surveillance which, it has been argued, often does not alleviate problems of overwork, instead reproducing a culture of presenteeism in a disembodied form (Gregg 2011).
Google goes one step beyond this, applying the principles of Big Data throughout its business. A recent online article claims that Google has ‘reinvented HR’ through its internal use of Big Data. It quantifies a huge range of people-related concepts, such as ‘retention likelihood’, ‘happiness’, ‘productivity’, ‘diversity’, ‘performance value’, ‘health’ and ‘talent’ (Sullivan 2013). For one study, “Project Oxygen”, Google sought to discover what makes an effective manager. By gathering data across more than a hundred variables, it established a ranked list of desirable traits, the most significant being that a manager’s technical expertise matters less than being approachable and accessible. A simple, and rather obvious, critique is that this finding is not nearly as new as Google seems to think: the importance of approachability over technical skill in management was argued, for example, by Kanter in 1977, and in countless studies since. Google goes on to assert the importance of research like this, boasting that rather than dictating to the company it can back up its decisions with data (see Sullivan 2013). This, however, seems to conflate the word ‘data’ with ‘evidence’, or even ‘logical reasoning’, something one would naturally expect of any business decision. Companies should be able to justify their decisions with reasons or ‘evidence’; otherwise, what exactly are they basing them on?
What this seems to be, then, is an over-quantification of a ‘problem’ and its ‘solution’. Coming from a qualitative background, I am admittedly sceptical of the quantification of what are clearly qualitative issues. Equally problematic is the assumption that research on an enormous scale is bound to find something that has not been revealed previously, or has only been argued ‘without data’, as if that made it ‘bad’ research. In a recent article Chris Anderson claims that Big Data means we can do away with the need for hypotheses and theory. He states: “Correlation is enough…we can analyze the data without hypotheses about what it might show…statistical algorithms [can] find patterns where science cannot.” This raises significant problems, as it places a great deal of power in the design of the algorithm without awareness of the potential pitfalls. As pointed out in a recent piece in the New York Times, the problem with large data sets is that when analysis ‘fails’ to deliver what the researcher wants, it is often the data that is blamed, rather than the model, the algorithm, or the human manipulating it: “You didn’t have enough data, there was too much noise, you measured the wrong things. The list of excuses can be long.”
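The danger in “correlation is enough” can be made concrete with a small simulation (a hypothetical sketch of my own, not from Anderson or the sources above): if an algorithm is let loose on enough variables without any hypothesis, it will reliably “discover” strong correlations in pure random noise. All variable names here are illustrative.

```python
import random

random.seed(42)
n_points = 20       # a small sample, as in many celebrated "found" patterns
n_variables = 500   # mine enough variables and coincidences will appear

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A target series and 500 candidate "predictors" -- all pure noise.
target = [random.random() for _ in range(n_points)]
noise_vars = [[random.random() for _ in range(n_points)]
              for _ in range(n_variables)]

# Hypothesis-free search: keep whichever variable correlates best.
best = max(abs(correlation(target, v)) for v in noise_vars)
print(f"strongest correlation found among pure noise: {best:.2f}")
```

The “pattern” the search returns is an artefact of looking at many variables with a small sample, which is precisely why the design of the algorithm, and the theory behind the search, cannot be waved away.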
Social research by its nature relies on two things: data (whether qualitative or quantitative) and the person collecting and manipulating it. Consequently, the validity of any study depends on both. This makes research fallible, and a great deal of research methods training is given over to being aware of the weaknesses of any methodology. This reflexivity and self-critique does not, however, seem to be applied to Big Data. Perhaps the problem with Big Data is that we treat it as something new, without applying the same scrutiny. We must ask how Sociology can engage with its usage, and ensure we are not so enamoured by the sheer volume of data that we forget to be critical of the study design or the conclusions drawn.