University of Cambridge - natural language processing /taxonomy/subjects/natural-language-processing en AI system may accelerate search for cancer discoveries /research/news/ai-system-may-accelerate-search-for-cancer-discoveries <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/crop_98.jpg?itok=o2UdvOBH" alt="Skin cancer cells from a mouse show how cells attach at contact points" title="Skin cancer cells from a mouse show how cells attach at contact points, Credit: NIH Image Gallery" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>The system, called <a href="https://lbd.lionproject.net/">LION LBD</a> and developed by computer scientists and cancer researchers at the University of Cambridge, has been designed to assist scientists in the search for cancer-related discoveries. It is the first literature-based discovery system aimed at supporting cancer research. The <a href="https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty845/5124276">results</a> are reported in the journal <em>Bioinformatics</em>.</p> <p>Cancer research attracts massive amounts of funding worldwide, and the scientific literature is now so huge that researchers are struggling to keep up with it: critical hypothesis-generating evidence is now often discovered long after it was published.</p> <p>Cancer is a complex class of diseases that are not completely understood, and it is the second-leading cause of death worldwide. 
Cancer development involves changes in numerous chemical and biochemical molecules, reactions and pathways, and cancer research is being conducted across a wide variety of scientific fields, which vary in the way that they describe similar concepts.</p> <p>“As a cancer researcher, even if you knew what you were looking for, there are literally thousands of papers appearing every day,” said Professor Anna Korhonen, Co-Director of Cambridge’s Language Technology Lab, who led the development of LION LBD in collaboration with Dr Masashi Narita at Cancer Research UK Cambridge Institute and Professor Ulla Stenius at Karolinska Institutet in Sweden. “LION LBD uses AI to help scientists keep up-to-date with published discoveries in their field, but could also help them make new discoveries by combining what is already known in the literature by making connections between sources that may appear to be unrelated.”</p> <p>The ‘LBD’ in LION LBD stands for Literature-Based Discovery, a concept developed in the 1980s which seeks to make new discoveries by combining pieces of information from disconnected sources. The key idea behind the original version of LBD is that concepts that are never explicitly linked in the literature may be indirectly linked through intermediate concepts.</p> <p>The design of the LION LBD system allows real-time search to discover indirect associations between entities in a database of tens of millions of publications while preserving the ability of users to explore each mention in its original context.</p> <p>“For example, you may know that a cancer drug affects the behaviour of a certain pathway, but with LION LBD, you may find that a drug developed for a totally different disease affects the same pathway,” said Korhonen.</p> <p>LION LBD is the first system developed specifically for the needs of cancer research. 
It has a particular focus on the molecular biology of cancer and uses state-of-the-art machine learning and natural language processing techniques to detect references to the hallmarks of cancer in the text. Evaluations of the system have demonstrated its ability to identify undiscovered links and to rank relevant concepts highly among potential connections.</p> <p>The system is built using open data, open source and open standards, and is available as an interactive web-based interface or a programmable API.</p> <p>The researchers are currently working on extending the scope of LION LBD to include further concepts and relations. They are also working closely with cancer researchers to help improve the technology for end users.</p> <p>The system was developed in collaboration with the University of Cambridge Language Technology Lab, Cancer Research UK Cambridge Institute, and Karolinska Institutet in Sweden, and was funded by the Medical Research Council.</p> <p><strong><em>Reference:</em></strong><br /><em>Sampo Pyysalo et al. ‘<a href="https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty845/5124276">LION LBD: a Literature-Based Discovery System for Cancer Biology</a>.’ Bioinformatics (2018). 
DOI: 10.1093/bioinformatics/bty845</em></p> </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>Searching through the mountains of published cancer research could be made easier for scientists, thanks to a new AI system.</p> </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">As a cancer researcher, even if you knew what you were looking for, there are literally thousands of papers appearing every day</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Anna Korhonen</div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="https://www.flickr.com/photos/nihgov/26192443504/in/photolist-9osuSC-GkY3ES-24pHjEE-4QaBa-4QaC6-FUx7Vw-wyPJtV-HJpd72-H4YPGs-KKkpaU-EPwcbP-27Le5Du" target="_blank">NIH Image Gallery</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Skin cancer cells from a mouse show how cells attach at contact points</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png" style="border-width:0" /></a><br />The text in this work is licensed under a <a href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>. 
Images, including our videos, are Copyright © University of Cambridge and licensors/contributors as identified. All rights reserved. We make our image and video content available in a number of ways: as here, on our <a href="/">main website</a> under its <a href="/about-this-site/terms-and-conditions">Terms and conditions</a>, and on a <a href="/about-this-site/connect-with-us">range of channels including social media</a> that permit your use and sharing of our content under their respective Terms.</p> </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div><div class="field field-name-field-license-type field-type-taxonomy-term-reference field-label-above"><div class="field-label">Licence type:&nbsp;</div><div class="field-items"><div class="field-item even"><a href="/taxonomy/imagecredit/attribution-noncommerical">Attribution-Noncommercial</a></div></div></div> Tue, 27 Nov 2018 12:09:25 +0000 sc604 201522 at Mining the language of science /research/news/mining-the-language-of-science <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/111114-copyright-istockphotoenot-poluskun.jpg?itok=Z9IOnV6Z" alt="Categorising textual information" title="Categorising textual information, Credit: ©iStockphoto/Enot Poluskun" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>Ask any biomedical scientist whether they manage to keep on top of reading all of the publications in their field, let alone an adjacent field, and few will say yes. 
New publications are appearing at a double-exponential rate, as measured by MEDLINE, the US National Library of Medicine’s biomedical bibliographic database, which now lists over 19 million records and adds up to 4,000 new records daily.</p> <p>For a prolific field such as cancer research, the number of publications could quickly become unmanageable and important hypothesis-generating evidence may be missed. But what if scientists could instruct a computer to help them?</p> <p>To be useful, a computer would need to trawl through the literature in the same way that a scientist would: reading the literature to uncover new knowledge, evaluating the quality of the information, looking for patterns and connections between facts, and then generating hypotheses to test. Not only might such a program speed up the progress of scientific discovery but, with the capacity to consider vast numbers of factors, it might even discover information that could be missed by the human brain.</p> <p>The aim of Dr Anna Korhonen and researchers in the Natural Language and Information Processing Group in the University of Cambridge's Computer Laboratory is to develop computers that can understand written language in the same way that humans do. One of the projects she is involved in has recently developed a method of ‘text mining’ one of the most literature-dependent areas of biomedicine: cancer risk assessment of chemicals.</p> <p>Every year, thousands of new chemicals are developed, any one of which might pose a potential risk to human health. Complex risk assessment procedures are in place to determine the relationship between exposure and the likelihood of developing cancer, but it’s a lengthy process, as Royal Society University Research Fellow Dr Korhonen explained: “The first stage of any risk assessment is a literature review. It’s a major bottleneck. There could be tens of thousands of articles for a single chemical. 
Performed manually, it’s expensive and, because of the rising number of publications, it’s becoming too challenging to manage.”</p> <p>CRAB, the tool her team has developed in collaboration with Professor Ulla Stenius’ group at the Institute of Environmental Medicine at Sweden’s Karolinska Institutet, is a novel approach to cancer risk assessment that could help risk assessors move beyond manual literature review.</p> <p>The approach is based on text-mining technology, which has been pioneered by computer scientists, and involves developing programs that can analyse natural language texts, despite their complexity, inconsistency and ambiguity. The tool Dr Korhonen has developed with her colleagues is the first text-mining tool aimed at aiding literature review in chemical risk assessment.</p> <p>At the heart of CRAB, the development of which was funded by the Medical Research Council and the Swedish Research Council among others, is a taxonomy that specifies scientific evidence used in cancer risk assessment, including key events that may result in cancer formation. The system takes the textual content of each relevant MEDLINE abstract and classifies it according to the taxonomy. At the press of a button, a profile is rapidly built for any particular chemical using all of the available literature, describing highly specific patterns of connections between chemicals and toxicity.</p> <p>“Although still under development, the system can be used to make connections that would be difficult to find, even if it had been possible to read all the documents,” added Dr Korhonen. “In a recent experiment, we studied a group of chemicals with unknown mode of action and used the CRAB tool to suggest a new hypothesis that might explain their male-specific carcinogenicity in the pancreas.”</p> <p>The tool will be available for end-users via an online web interface. However, research into improving text mining will continue. 
One of the biggest current challenges is to develop adaptive technology that can be ported easily between different text types, tasks and scientific fields.</p> <p>One day, rather than being at the mercy of the flourishing rate of publication, scientists will have at their fingertips a system to work alongside them that will not only point them towards those references that are relevant to their search, but will also tell them why.</p> </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>Scientists are developing a computer that can read vast amounts of scientific literature, make connections between facts and develop hypotheses.</p> </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">Although still under development, the system can be used to make connections that would be difficult to find, even if it had been possible to read all the documents.</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Dr Anna Korhonen</div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/" target="_blank">©iStockphoto/Enot Poluskun</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Categorising textual information</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><img alt="" src="/sites/www.cam.ac.uk/files/80x15.png" 
style="width: 80px; height: 15px;" /></a></p> <p>This work is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Licence</a>. If you use this content on your site please link back to this page.</p> </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Fri, 18 Nov 2011 09:00:55 +0000 lw355 26480 at