Following the traces of research in society
Gustaf Nelhans, Senior Lecturer in Library and Information Science, spent the earlier years of his research career investigating the distribution of research funding. He pointed out problems with the system used at the national level, in which institutions received grants based on how often their researchers' work was cited. The system took no account of why a work was cited, so a researcher's individual ranking could even rise when someone cited them only to point out that they were wrong.
Gustaf Nelhans often works with digital methods for analysing data. Most of the methods are still relatively new, and how best to use them is itself still a subject of research. Examples of techniques and methods that he and his colleagues use are data mining and topic modelling.
Data mining means that a computerised review is made of large amounts of text with the purpose of finding patterns, links, and trends in the text material. An example might be to look for evaluative words that appear in texts associated with particular phenomena. Data mining has been used for a long time in a number of fields, for example in searching medical data, in IT forensics, and in so-called "business intelligence", where it is used to create better business models based on customer behaviour.
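As a rough illustration of the kind of search described above, the following Python sketch counts evaluative ("valuing") words in sentences that mention a given phenomenon. The word list, the sample texts and the function name `evaluative_words_near` are all invented for this example; it is a minimal sketch, not the method used in the research.

```python
import re
from collections import Counter

# Hypothetical list of evaluative words, chosen only for this illustration.
EVALUATIVE_WORDS = {"good", "bad", "promising", "flawed"}

def evaluative_words_near(texts, phenomenon):
    """Count evaluative words in sentences that mention the given phenomenon."""
    counts = Counter()
    for text in texts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = re.findall(r"[a-z]+", sentence.lower())
            if phenomenon in words:
                counts.update(w for w in words if w in EVALUATIVE_WORDS)
    return counts

# Invented sample texts.
texts = [
    "The vaccine study was promising. The weather was bad.",
    "Critics called the vaccine trial flawed, but the vaccine itself is good.",
]
result = evaluative_words_near(texts, "vaccine")
print(result)
```

Only sentences that mention the phenomenon are counted, so the "bad" in the sentence about the weather is ignored.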
Topic modelling is a form of automated text analysis that could be said to be a subcategory of data mining. The computer finds underlying variables in a text that point the researcher in the right direction regarding the theme of the text. For example, the computer may find that terms such as trees, barrels and foxes often appear in a text; then the researcher could categorise this as "life in the forest."
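Real topic models, such as latent Dirichlet allocation, are statistical and considerably more involved. The Python sketch below is a deliberately crude stand-in that only surfaces the most frequent content words in a text, leaving the researcher to supply the label, as in the forest example above. The stop-word list, the sample text and the function name `top_terms` are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny hand-picked stop-word list for illustration; real analyses use fuller lists.
STOP_WORDS = {"the", "a", "an", "and", "in", "of", "to", "is", "are",
              "by", "under", "near"}

def top_terms(text, n=3):
    """Return the n most frequent words in the text, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(n)]

# Invented sample text.
text = ("Foxes hide under the trees. The trees are tall, and foxes hunt near "
        "the trees. Barrels stand by the trees, and foxes sniff the barrels.")
terms = top_terms(text)
print(terms)  # the researcher might label this list "life in the forest"
```

The computer only produces the list of terms; deciding what theme they add up to remains the researcher's job.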
"In addition, this is a system that only takes into account the dissemination of different researchers' work within the research community, and not on the impact of the research on society as a whole. At the same time, there's a lot of talk about the third task: that researchers need to share their results so that they benefit society at large. But the method that the government tested in this year's allocation of resources is inadequate, as it only values the plans of the institutions and not the outcomes," he says.
A couple of years ago, he participated in an EU project that looked at the impact of research on guidelines for the treatment of diseases in five different areas. His task was to seek out all references to scientific articles that were contained in a number of healthcare guidelines.
"Then I thought that my elbow will never let me do this manually, with all the clicking. Instead, I began to think about what automated solutions were available."
Around the same time, he happened to come into contact with a company in Borås: Minso Solutions AB. The company, like himself, had begun to consider the benefits of looking for research references in clinical guidelines. They saw that both sides stood to benefit, and a collaboration began on a new project. The company was able to code algorithms that allowed research references to be mapped automatically. Suddenly, it did not matter whether there were 10,000 or 100,000 references to go through; the computers did the job of compiling the data. When the compilations were complete, Gustaf Nelhans took over and examined how the computerised compilations could be used. He asked what happens when higher education institutions are compared at this level: can you gather enough data, and what can you really see? Was it possible to see whether the research led to any form of impact in professional practice? Against the background of his results, the Swedish Research Council (Vetenskapsrådet, or VR in Swedish) decided to go ahead and try a new model as part of the distribution of so-called ALF funds, which constitute the major part of funding for clinical research in Sweden.
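The article does not describe how the company's algorithms work. Purely as an illustration of what automatically mapping references can mean, the Python sketch below pulls DOI-style identifiers out of guideline text and counts how many guidelines cite each article. The DOI pattern is simplified, and the guideline snippets, DOIs and function names are invented for this example.

```python
import re
from collections import Counter

# Simplified DOI pattern; real reference lists are messier and often lack DOIs.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def extract_dois(guideline_text):
    """Return DOIs found in one guideline, lower-cased, trailing punctuation removed."""
    return [m.rstrip(".,;)").lower() for m in DOI_PATTERN.findall(guideline_text)]

def count_cited_articles(guidelines):
    """Count how many guidelines cite each DOI (at most once per guideline)."""
    counts = Counter()
    for text in guidelines.values():
        counts.update(set(extract_dois(text)))
    return counts

# Invented guideline snippets with made-up DOIs.
guidelines = {
    "guideline_a": "See Smith 2015 (doi:10.1234/abc.001) and Lee 2017, doi:10.1234/xyz.777.",
    "guideline_b": "Treatment follows Smith 2015, https://doi.org/10.1234/ABC.001.",
}
cited = count_cited_articles(guidelines)
print(cited)
```

Once the extraction is automated, the same loop runs over 10 guidelines or 10,000; the harder questions are the ones that follow, about what the resulting counts actually show.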
"The new model taking into account the impact of medical research on clinical practice was used as evidence this year when VR panels made evaluations regarding ALF. So far, everyone seems happy and satisfied, but we will see if the pilot project is made permanent for future ALF evaluations.”
While his previous projects have had a significant influence, he is now participating in the newly begun EU project "Data for Impact" along with colleagues from several European countries. He explains that the area of data analysis has grown and that he and his colleagues are therefore investigating whether a more comprehensive and deeper analysis can be carried out than before.
"We see a variety of trends in data analysis. For example, automated techniques can be used to analyse text, but also there are digital arenas where we can find people discussing different research results; this may concern both the classic digital channels of the digital news media, but also social media and debate forums. At the moment, we are working extensively with our lawyers to ensure that what we do is correct in relation to GDPR (the new EU directive that deals with, among other things, the processing of personal data, Editor's Note).”
By not just allowing the algorithms to calculate the prevalence of different references, but also putting different words and expressions in relation to each other, he hopes to be able to address the problem that the statistics do not take into account content. This is an example of where his colourful pictures come into play.
"Often it can help to visualise the data we receive in different ways. Using colour codes, for example, we can categorise different terms and put them together in clusters, or we can produce an image like nodes and arcs showing how often different words occur together."
He says that the visualisations can be used in an exploratory fashion to identify themes and relationships, rather than to look at rankings and competition between different researchers. An example of what can be seen is whether the name on a particular set of articles appears together with descriptive terms that categorise them or with evaluating terms such as "good" or "bad" that can show how the research has been used.
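The data behind a node-and-arc picture of this kind can be pictured as a table of word pairs and how often they appear together. The Python sketch below builds such a table from a few invented sentences; the vocabulary, the sample documents and the function name `cooccurrence_edges` are assumptions for illustration, and drawing the actual image is left to a visualisation tool.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents, vocabulary):
    """Count in how many documents each pair of vocabulary terms appears together."""
    edges = Counter()
    for doc in documents:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        present = sorted(words & vocabulary)
        # Each pair of terms in the same document becomes one arc between two nodes.
        edges.update(combinations(present, 2))
    return edges

# Invented vocabulary mixing descriptive and evaluating terms.
vocabulary = {"trial", "treatment", "guideline", "good", "bad"}
documents = [
    "The trial showed a good effect of the treatment.",
    "The treatment guideline cites the trial.",
    "A bad trial design.",
]
edges = cooccurrence_edges(documents, vocabulary)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each term is a node, and the count on a pair can set the thickness of the arc between them, which is what makes frequent combinations stand out at a glance.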
"With regard to these newer data analysis methods that we are applying, it is exciting to see to what extent different research activities show such a systematic comparability that it is possible to combine text, reference practices and, for example, attendance at conferences to create rich sets of data to analyse. It is much more rewarding than just evaluating research with an individual parameter, such as citation frequency, which also risks moving or shifting the target away from producing important research that makes a large knowledge contribution."
"On the other hand, this type of interaction means that we are more likely to compromise personal integrity when every activity that a researcher performs can contribute to an evaluation system. There is a great responsibility here to be humble about and responsive to the challenging situations that may arise," he concluded.
Text Helen Rosenberg
Illustration Johan Lindh
Photo Ulf Nilsson
Translation Eva Medin