Natural Language Processing to Extract Meaningful Information from a Corpus of Written Knowledge in Breast Cancer: Transforming Books into Data
Catanuto G.
2023-01-01
Abstract
Introduction: Books and papers are the most relevant source of theoretical knowledge for medical education. Artificial intelligence technologies can be designed to assist in selected educational tasks, such as reading a corpus made up of multiple documents and extracting relevant information quantitatively.

Methods: Thirty experts were selected transparently through an online public call on the website of the sponsor organization and on its social media. Six books edited or co-edited by members of this panel, covering general knowledge of breast cancer or specific surgical knowledge, were acquired. A team of computer scientists used this collection to train an artificial neural network based on a technique called Word2Vec.

Results: The corpus of six books contained about 2.2 billion words, represented as 300-dimensional vectors. A few tests were performed, in which we evaluated the cosine similarity between different words.

Discussion: This work represents an initial attempt to derive formal information from a textual corpus. It can be used to perform an augmented reading of the body of knowledge available in the books and papers of a discipline. This can generate new hypotheses and provide an actual estimate of their association within expert opinion. Word embedding can also be a good tool for accruing narrative information from clinical notes, reports, etc., and for producing predictions about outcomes. More work is expected in this promising field to generate "real-world evidence."
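The cosine-similarity test mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' code: the function names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the 4-dimensional vectors are invented toy values standing in for the 300-dimensional Word2Vec vectors, chosen only to show how two words are compared.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Return (word, similarity) pairs for every other word, most similar first."""
    target = embeddings[query]
    scores = [
        (word, cosine_similarity(target, vector))
        for word, vector in embeddings.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Assumption: invented toy vectors, NOT values from the trained model.
toy_embeddings = {
    "mastectomy": [0.9, 0.1, 0.3, 0.0],
    "reconstruction": [0.8, 0.2, 0.4, 0.1],
    "chemotherapy": [0.1, 0.9, 0.0, 0.3],
}

ranking = rank_by_similarity("mastectomy", toy_embeddings)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

A value near 1 means the two vectors point in almost the same direction, which in a trained embedding suggests that the words occur in similar contexts across the corpus. A value near 0 suggests little association.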