Project: Using text mining to aid cancer risk assessment and research

CRAB, CHAT and LION are three text mining-based tools for literature review and knowledge discovery. They were developed for cancer risk assessment and research in collaboration with the Language Technology Lab at the University of Cambridge (UK). The tools build on text mining and literature-based discovery, a growing field of computer science that discovers new knowledge by automatically extracting information from written texts. They help risk assessors and researchers manage large volumes of textual data and can contribute to more effective management of health risks in the future. The LION project is a collaboration between the Language Technology Lab at the University of Cambridge (UK) and the Cancer Research UK Cambridge Institute, Narita Group (UK).

Links to the tools:

  • CRAB – classifies the scientific literature according to carcinogenic mode of action.
  • CHAT – classifies the scientific literature according to the Cancer Hallmarks.
  • LION-LBD – a literature-based discovery system for cancer biology.

Contact persons


Ulla Stenius

Phone: 08-524 878 72
Unit: Biochemical Toxicology


  • Medical Research Councils, UK


LION LBD: a Literature-Based Discovery System for Cancer Biology.
Pyysalo S, Baker S, Ali I, Haselwimmer S, Shah T, Young A, et al
Bioinformatics 2018 Oct

AI system accelerates search for cancer discoveries.
Technology Networks, November 28, 2018.

Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer.
Baker S, Ali I, Silins I, Pyysalo S, Guo Y, Högberg J, et al
Bioinformatics 2017 Dec;33(24):3973-3981


Grouping chemicals for health risk assessment: A text mining-based case study of polychlorinated biphenyls (PCBs).
Ali I, Guo Y, Silins I, Högberg J, Stenius U, Korhonen A
Toxicol. Lett. 2016 Jan;241:32-7

Automatic semantic classification of scientific literature according to the hallmarks of cancer.
Baker S, Silins I, Guo Y, Ali I, Högberg J, Stenius U, et al
Bioinformatics 2016 Feb;32(3):432-40

Evaluation of carcinogenic modes of action for pesticides in fruit on the Swedish market using a text-mining tool.
Silins I, Korhonen A, Stenius U
Front Pharmacol 2014;5:145

Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review.
Guo Y, Silins I, Stenius U, Korhonen A
Bioinformatics 2013 Jun;29(11):1440-7

Text mining for literature review and knowledge discovery in cancer risk assessment and research.
Korhonen A, Séaghdha D, Silins I, Sun L, Högberg J, Stenius U
PLoS ONE 2012;7(4):e33427

Data and literature gathering in chemical cancer risk assessment.
Silins I, Korhonen A, Högberg J, Stenius U
Integr Environ Assess Manag 2012 Jul;8(3):412-7

Weakly supervised learning of information structure of scientific abstracts--is it accurate enough to benefit real-world tasks in biomedicine?
Guo Y, Korhonen A, Silins I, Stenius U
Bioinformatics 2011 Nov;27(22):3179-85

A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment.
Guo Y, Korhonen A, Liakata M, Silins I, Högberg J, Stenius U
BMC Bioinformatics 2011 Mar;12:69

Identifying the information structure of scientific abstracts: An investigation of three different schemes.
Guo Y, Korhonen A, Liakata M, Silins I, Sun L and Stenius U (2010).
In Proceedings of Bio-Natural Language Processing (BioNLP) Uppsala, Sweden.

The first step in the development of Text Mining technology for Cancer Risk Assessment: identifying and organizing scientific evidence in risk assessment literature.
Korhonen A, Silins I, Sun L, Stenius U
BMC Bioinformatics 2009 Sep;10:303

User-Driven Development of Text Mining Resources for Cancer Risk Assessment.
Sun L, Korhonen A, Silins I, and Stenius U. (2009).
In Proceedings of the Natural Language Processing in Biomedicine (BioNLP) 2009. Boulder, Colorado.

A New Challenge for Text Mining: Cancer Risk Assessment.
Lewin I, Silins I, Korhonen A, Högberg J and Stenius U. (2008).
In Proceedings of the ISMB BioLINK Special Interest Group on Text Data Mining. Toronto, Canada.