The challenge: Find the best research

Within medical research, there is currently a range of different measures used to assess which research is best. But critics argue that the measurement has become an end in itself. Alternative methods are available such as those that highlight the research's impact on society.

Thomas Perlmann. Photo: Ulf Sirborn

“In our case, the assessment is both easier and more difficult than for others. More difficult because it absolutely cannot be wrong, easier as we have time on our side. A scientific breakthrough leaves an impression that can be assessed retrospectively. It is no accident that it can take 20 years before a discovery is rewarded,” says Thomas Perlmann, professor of molecular developmental biology at the Department of Cell and Molecular Biology, recently appointed secretary of the Nobel Committee and one of those who will decide who receives this year's Nobel Prize in Physiology or Medicine.

There are many possible answers to the question of what makes good research. One is that the question at issue must be relevant and that the study set up to answer it must be designed correctly. The design of the study affects how well the results can be interpreted, for example whether there are enough patients to draw statistically significant conclusions. Another is that the research is original and pioneering, something that is emphasised in the context of the Nobel Prize. Alfred Nobel's will stipulates that the research that is rewarded with the Nobel Prize in Physiology or Medicine has to be a discovery.

“Such a breakthrough will change ways of thinking from their very foundations and there has to be a clear before and after in the field, or an entirely new field has to be opened up,” says Thomas Perlmann.

Good research also has to be ethical and well conducted. This means avoiding problems such as a large number of participants for whom the treatment has had no effect leaving the study, samples being mixed up or contaminated, or outright fraud that means the results cannot be trusted. In addition, it can be argued that the research has to be of benefit; in other words, that it is possible to make use of the results.

Accordingly, there are many factors that have an impact on whether research is good, and thus much that can go wrong. The Nobel Committee is able to avoid these pitfalls, primarily by examining the research in great depth. In other cases, for example when someone is to be appointed to a post or a funding body is to allocate research funding, other assessment criteria are used for reasons of time and resources. What is measured is primarily citations, that is, how often a scientific article is used as a reference in subsequent articles (see the glossary). The articles cited most frequently are often those published in highly ranked journals with a high impact factor. Accordingly, it is considered important to be published in these journals.

But this can cause problems. Within the research community, we often talk about something called “the rush to publish”, i.e. that it is becoming more important to publish in the right journal than to do good research. This can result in a lack of quality. Thomas Perlmann provides an example:

“Imagine that the editor of the very highly regarded journal Nature says that the article you have submitted was interesting, but that complementary experiments are required in order to get the article published. It is easy then to think that 'I've got my foot in the door of Nature, no silly experiment is going to prevent me from publishing!'”.

This increases the risk that the individual researcher fudges their data, perhaps by choosing only the experiment that supports the thesis. This sort of slippage means publishing results that are not as robust as they could be. It also undermines another important quality criterion, namely that it should be possible for other researchers to repeat the results. Reproducibility means that other researchers and research groups obtain the same results if they repeat the study, which is the test that shows that the results are reliable.

A lack of reproducibility has received a great deal of attention in recent years. For example, it caused a considerable stir a few years ago when researchers at the pharmaceutical company Amgen attempted to reproduce 53 studies that had had a major impact in the field of cancer research. It was only possible to reproduce six of these. Thomas Perlmann also describes how an entire “story” is required in order to be able to publish in the highly ranked journals.

“The article has to contain everything! An exciting hypothesis, a series of experiments in several different model systems – from the cellular level to experiments on humans – and preferably a breakthrough,” he says and continues:

“What happens then is that what is presented sometimes appears worryingly complete, with all the pieces of the puzzle sitting a little too perfectly in their place. Those of us who are looking for real breakthroughs know that these are very rare. There are definitely not scores of these in each issue of the established journals, even though the articles are often presented in this way.”

Critics of the current system have therefore questioned whether the journals in which a researcher publishes and the number of citations the publications receive are actually good measures of the quality of their research. Pauline Mattsson, researcher at the Department of Learning Informatics, Management and Ethics, has interviewed 34 Nobel laureates in physiology or medicine and collated information about publications, patents and collaboration with industry as part of a current project.

“When we look at all the articles they have published, in half of the cases, their most cited article is also the one for which they were awarded the Nobel Prize. But in the other cases, it was other articles that were the most cited. This may, for example, involve methodology that has been refined and developed,” she says.

Nor do many citations necessarily mean that the research is “better”. The number of citations is also influenced by other factors, such as when an article was published. If it was published a long time ago, it has had more time to accumulate citations. The type of article also has an effect: review articles attract more citations. So does the field the article concerns – articles within medicine and biochemistry traditionally reference many articles, while those in ethics or physics have few references. The size of the field also has a role to play, whether there are 100 or 10,000 researchers in the field. Henrik Schmidt works as a librarian providing services to researchers at Karolinska Institutet's library in Solna. He emphasises the problem of individual measures.

“What you have to bear in mind is that these quantitative measurement methods are not a good way of measuring individual researchers' performance, but they can give an indication of the quality of the research conducted when they are applied to a university or large institution. At the very least, there is often a correlation between high values in quantitative measures and high scores from qualitative analyses,” he says.

That is why new ways of measuring research have been developed in recent years. The Research Excellence Framework (REF2014) was launched in the United Kingdom as a new way to measure research quality in order to allow research funding to be allocated better. An important aspect was to incorporate societal benefit into the equation. This made an impression on the previous Swedish Government and it tasked the major research councils with developing an equivalent Swedish model. The FOCUS model for research quality evaluation in Sweden was presented in December 2014. According to this model, 70 per cent of the weight is placed on the quality of the research as assessed by the opinions of panels of experts and analysis of citations, 15 per cent on development potential (e.g. how many doctoral students there are in the field or the number of collaborations) and 15 per cent on societal benefit. Nothing has happened since then.
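
The weighting described above can be written out as a simple weighted sum. The following Python sketch is only an illustration: the 70/15/15 weights come from the model as described here, while the common scoring scale, the function name focus_score and the example values are assumptions made for this example and are not part of the FOCUS proposal.

```python
# Weights as described in the text: 70 % research quality,
# 15 % development potential, 15 % societal benefit.
FOCUS_WEIGHTS = {
    "quality": 0.70,
    "development_potential": 0.15,
    "societal_benefit": 0.15,
}


def focus_score(component_scores):
    """Return the weighted sum of the three component scores.

    Assumption: every component is scored on the same scale
    (for example 0 to 10); the text does not specify a scale.
    """
    missing = set(FOCUS_WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(
        weight * component_scores[name]
        for name, weight in FOCUS_WEIGHTS.items()
    )


# Hypothetical example values, invented purely for illustration.
example = {"quality": 8.0, "development_potential": 6.0, "societal_benefit": 4.0}
print(round(focus_score(example), 2))
```

With these invented values, research quality contributes 5.6 of a total of about 7.1, which shows how strongly the quality component dominates the other two under this weighting.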

“I personally believe that this model would be good, but it is doubtful whether it will be applied. On the one hand, it can be difficult to appoint independent experts, and on the other, there is a risk of it becoming overly bureaucratic and costly. Researchers don't want to spend even more time on administration,” says Henrik Schmidt.

Various commercial enterprises have seen that a need exists and now supply what are known as altmetric services. Using various algorithms, they calculate how great an impact a certain scientific article has and present this compiled data as a figure. This takes into account how many people have read the article, downloaded it, commented on it and also whether it has been mentioned in a news context or been written about in Wikipedia. In contrast to the traditional measures, whether the article is referred to in social media has an impact. One benefit of these measures is speed – you can get clicks on your article as soon as it is published online, while it may take up to a year before you receive your first citation. Another benefit is breadth. But the disadvantages are obvious.

“If your research subject is sexy, like diet and health, you get a lot more attention, regardless of the quality of the research. In addition, it is easy to manipulate by asking people to spread the article on social media,” points out Henrik Schmidt.

Something that has been introduced in recent years is that it is now possible to comment on individual scientific articles published on the research database PubMed. However, this requires you to be registered and have published articles there yourself. This does not involve counting the number of comments, rather it is about gaining valuable responses from colleagues.

“It is one way to talk about the pros and cons of the published research in a manner that is similar to how people discuss published articles in so-called journal clubs in an individual institution. The difference is that it takes place online,” says Henrik Schmidt.

He also highlights the Leiden Manifesto, which was launched in April 2015. In this, researchers have compiled ten points for improved evaluation of research quality. One important aspect here is to see various quantitative indicators for what they are – just indicators – and combine them with a careful review of the research that is being assessed. The measurement methods should also be suited to the aim of the research, whether it is about immediate societal benefit or the development of a field of research. Not least, the various evaluation methods should be transparent and also constantly evaluated and updated.

Gert Helgesson. Photo: Stefan Zimmerman

Gert Helgesson, professor of medical ethics at the Stockholm Centre for Healthcare Ethics at Karolinska Institutet, also agrees with much of the criticism levelled against the use of individual indicators as measures of research quality. He emphasises the differences between different fields of research.

“In medical ethics, for example, it is extremely rare for our results to be published in the journals that have the highest impact factor, although it does happen. I said to my students recently that it was previously a case of 'publish or perish'. Instead the situation is now 'publish AND perish' – getting published doesn't help if it is not in the best journals,” he says and continues:

“But having said this, I must emphasise that attempting to measure the quality of the research in a formalised and objective way is, in principle, a good idea as it combats nepotism, in other words preferential treatment not based on a person's actual merits. This is something not to forget.”

Facts: Glossary

Citation: When researchers refer to other researchers' results in their papers, the article they refer to is said to be cited. The number of citations (the citation rate) can be used as a measure of how significant a certain article is.

Impact factor: The impact factor is calculated by annually compiling how often the scientific articles published in a journal are cited. It is the average number of citations across all articles published in the journal over the course of the period. Accordingly, it says nothing about how often an individual researcher's article is cited.

Peer review: This is the system whereby every study sent into a journal is reviewed by other researchers in the same field or one that is closely related.
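
As a rough illustration of why a journal-level average, like the impact factor described above, says little about any single article, the Python sketch below computes the mean number of citations per article for a made-up journal. The citation counts and the function name average_citations are hypothetical, and the calculation is simplified: the impact factor as usually published is based on citations in one year to articles from the two preceding years.

```python
from statistics import mean, median


def average_citations(citations_per_article):
    """Mean number of citations per article in the period."""
    if not citations_per_article:
        raise ValueError("need at least one article")
    return mean(citations_per_article)


# Hypothetical journal with ten articles; one is cited very often.
citations = [0, 1, 1, 2, 2, 3, 3, 4, 4, 80]

print(average_citations(citations))  # journal-level average
print(median(citations))             # the typical article
```

In this invented example the average is 10 citations per article, yet nine of the ten articles have four citations or fewer.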

Facts: How results are published

An article written by a researcher is sent in to a scientific journal in the pertinent field, for example stem cell research or clinical oncology.
Once the journal has received it, the study and its results are reviewed by other researchers who assess how the study has been conducted and how the data has been interpreted. This then forms the basis of whether or not the article is accepted.
The researcher then receives a response. If the article is not accepted for publication, they can try sending it to another journal. A journal with a high impact generally accepts a lower proportion of the articles submitted to it, in some cases only five per cent.
If the article is accepted for publication, complementary experiments or reasoning are often required. All in all, this is a time-consuming process and it can take six months to a year before a submitted article is published.
When the article is finally published, others can read and cite it. Medical articles are collected in a database, PubMed, in which it is possible to search by author, journal or keywords.

Text: Lotta Fredholm, first published in the magazine Medical Science no 2, 2016.