Abstract
One potential application of neural networks (NNs) is the early-stage detection of oral cancer. This systematic review aimed to determine the level of evidence on the sensitivity and specificity of NNs for the detection of oral cancer, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Cochrane guidelines. Literature sources included PubMed, ClinicalTrials, Scopus, Google Scholar, and Web of Science. In addition, the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess the risk of bias and the quality of the studies. Only 9 studies fully met the eligibility criteria. In most studies, NNs showed accuracy greater than 85%, though all of the studies presented a high risk of bias, and 33% showed high applicability concerns. Nonetheless, the included studies demonstrated that NNs were useful in the detection of oral cancer. Studies of higher quality, with an adequate methodology, a low risk of bias and no applicability concerns, are required so that more robust conclusions can be reached.
Keywords: oral cancer, oral neoplasms, medical informatics applications, computer neural networks, cancer early detection
Introduction
Artificial intelligence (AI) has significantly impacted the field of medicine,1 and much AI research focuses on the diagnosis and prognosis of cancer,2 neurological disorders3 and cardiovascular diseases,4 among others.5 Neural networks (NNs) constitute an area of AI. They contain sets of artificial neurons organized in superimposed layers – an input layer, n intermediate layers for data processing and a result layer.6 Deep learning (DL) is a combination of NN and machine learning; it enables the creation of computational models composed of multiple processing layers, able to learn the representations of data with multi-level abstraction.7 In DL, convolutional, recursive and recurrent NNs have been applied.8 Convolutional neural networks (CNNs) are a class of DL algorithms applied to medical image classification,9 including those used for cancer detection.10, 11, 12, 13
Oral cancer ranks sixth among the most common high-risk malignancies in middle-income countries globally.14 The most common type of oral cancer is oral squamous cell carcinoma (OSCC).15 Early diagnosis and treatment are crucial to improve patient survival. The histopathological examination of biopsy samples is the gold standard in diagnosing oral cancer. However, this approach is invasive and the samples require complex processing.16 The detection of oral cancer in situ results in survival rates as high as 82%, though these rates can decrease to 32% if metastases are detected.17 Therefore, an early diagnosis is essential, and recommendations state that any suspicious lesion that does not heal within 15 days after detection and the removal of the local causes of irritation should be biopsied.18, 19 Although the histopathological examination of biopsy specimens is the current reference method,20 there are still discrepancies (12%) between the initial diagnosis from the incisional biopsy and the final histopathology results following the excision of the lesion.21 Moreover, many patients are reluctant to have a suspicious lesion biopsied by a clinician, for various reasons, including cost, fear of the procedure, concerns about healing, and esthetics. As a result, patients often postpone the biopsy to get a second opinion on its necessity. Therefore, research groups have proposed other diagnostic methods that are logistically more accessible. One such approach is the use of NNs for the early diagnosis of oral cancer through the analysis of risk factors, laboratory tests and the images of the lesion.22, 23
The detection of oral cancer through imaging proceeds in phases. Ideally, during the training phase, a set of images classified into different types, such as the normal region, the cancerous region and the precancerous region, is presented to the NN. The classified images allow the NN to learn the characteristics of each set of images. Subsequently, in the testing phase, the image of a suspicious oral lesion is provided and the system outputs the predicted result.24 Because the system requires only an image as input, NN-based detection of oral cancer could be used by clinicians working in remote areas, where biopsy processing is complicated.
For the reasons outlined above, this systematic review aimed to determine the level of evidence on the sensitivity and specificity of NNs for detecting oral cancer.
Material and methods
Study protocol registration
This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement25 and the Cochrane guidelines.26 The protocol was registered in the International Prospective Registry of Systematic Reviews (PROSPERO) (CRD42021256938). The articles included in the present systematic review were studies with an observational design (cohort and case–control studies). Case reports, case series, animal studies, pilot studies, short communications, and systematic reviews were excluded.
Eligibility criteria, information sources and search strategy
The eligibility of the studies was determined using the modified PICO strategy (Patient/Population, Intervention, Comparison, and Outcome). Searches with no restriction on the publication date were carried out in PubMed, ClinicalTrials, Scopus, Google Scholar, and Web of Science in April 2021, and were updated in July 2022. The search strategies used for each database are shown in Table 1. A manual search was performed by reading the reference sections of the included studies.
To meet the eligibility criteria, studies needed to use NN for the analysis of images for the detection of oral cancer in humans, and assess the sensitivity, specificity, precision, and accuracy of NN in comparison with the histopathological examination. All studies that used NN for the prognosis of oral cancer, to determine the efficacy of oral cancer treatment or to classify the stages of oral cancer, as well as studies that used other methods of data collection (not imaging), were excluded.
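The four outcome measures named above can all be derived from a 2×2 table that cross-tabulates the NN output against the histopathological result. The following is a minimal sketch of those definitions; the class and function names are hypothetical, the counts are illustrative only and are not taken from any included study, and the sketch assumes that no denominator is zero.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """NN result cross-tabulated against histopathology (hypothetical container)."""
    tp: int  # NN positive, histopathology positive
    fp: int  # NN positive, histopathology negative
    fn: int  # NN negative, histopathology positive
    tn: int  # NN negative, histopathology negative


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return the standard diagnostic accuracy measures as fractions in [0, 1]."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # also called recall
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),    # positive predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Illustrative counts only; not data from the review.
example = diagnostic_metrics(ConfusionCounts(tp=90, fp=5, fn=10, tn=95))
```

Note that sensitivity and recall are the same quantity, which is why some of the included studies report precision and recall while others report sensitivity and specificity.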
Study selection
For the selection of studies, the title and abstract of each paper were read. Those which answered the research question were reviewed in full text to determine if they met the eligibility criteria. If the eligibility criteria were not met, the articles were eliminated with reasons, as shown in Figure 1.
Data collection and data extraction
The relevant data from the selected articles were extracted, processed and tabulated using a Microsoft Excel spreadsheet. Data extraction was performed independently by 2 reviewers (M.P.B.-C. and M.E.M.C.-G.).
Data synthesis
The results were formally synthesized by grouping the data according to the type of images used for cancer detection, which included photographic images, confocal laser endomicroscopy (CLE), hyperspectral imaging (HSI), optical coherence tomography (OCT), and high-resolution microendoscopy (HRME). A summary of the individual studies, with details of the relevant data, such as the type of images, the NN computing technique, comparators, and outcomes, is presented in the result tables.
Synthesis of the results
If the results of the studies showed high heterogeneity in methodological or population characteristics, a synthesis without a meta-analysis (SWiM) was performed using the qualitative synthesis27 and a representative graph.
Risk of bias and applicability
Two reviewers (R.T.-R. and L.A.-F.) assessed the risk of bias and the applicability of each study, using the modified Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool,28 which includes the patient selection, index test, reference standard, and flow and timing domains (Table 2). Any disagreement in the assessment of the risk of bias was resolved by the consensus of the research group.
Results
Study selection and study characteristics
The search of electronic databases and registries generated 1,248 records, of which 40 duplicate records were eliminated. After the titles and abstracts were read, 30 articles were found to answer the research question, and their full texts were retrieved. These articles were then assessed against the eligibility criteria, which resulted in the exclusion of 21 articles with reasons. For the qualitative analysis, 9 articles were included (Figure 1). The characteristics of each of the studies and the extracted data are shown in Table 3. The synthesis of the results without a meta-analysis is shown in Figure 2.
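The selection counts can be checked with simple arithmetic. The sketch below uses only the four figures reported in this paragraph; the intermediate values (records screened and records excluded at screening) are derived here under the assumption that every non-duplicate record was screened by title and abstract, and the flow diagram may itemize them differently.

```python
# Counts reported in the study-selection paragraph.
records_identified = 1248
duplicates_removed = 40
full_texts_assessed = 30
excluded_with_reasons = 21

# Derived values (assumption: all non-duplicate records were screened).
records_screened = records_identified - duplicates_removed
excluded_at_screening = records_screened - full_texts_assessed
studies_included = full_texts_assessed - excluded_with_reasons

print(f"Screened: {records_screened}")
print(f"Excluded at title/abstract screening: {excluded_at_screening}")
print(f"Included in qualitative analysis: {studies_included}")
```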
Synthesis of the results
Studies detecting oral cancer used various image types, including photographic images, CLE, HSI, OCT, and HRME.
Photographic images were used most frequently for the detection of oral cancer. Welikala et al. acquired images with a cell phone camera at the primary clinical care level22; Jubair et al. used various types of digital cameras and smartphones.29 Welikala et al., who used ResNet-101 for image classification and Faster R-CNN for object detection, reported a precision of 84.77% and a recall of 89.51%.22 Jubair et al. used a pre-trained EfficientNet-B0 as a lightweight transfer learning model for oral cancer detection, and reported a sensitivity of 86.7% and a specificity of 84.5%.29 On the other hand, Tanriver et al. collected photographic images of oral lesions with histopathological results from the archive of the Department of Tumor Pathology of the Oncology Institute at Istanbul University, Turkey; the rest of the images were collected from publicly available sources by using search engines (https://images.google.com and https://yandex.com/images).30 The dataset comprised a diverse set of lesions coming from a wide range of oral diseases and anatomical regions. The authors reported a precision of 87% and a recall of 86%.30 Likewise, Warin et al. retrospectively collected clinical oral photographs obtained between 2009 and 2018 at an oral and maxillofacial surgery center.31 They used DenseNet-121 for classification, and reported a sensitivity of 98.75% and a specificity of 100%.31 Finally, Fu et al. used biopsy-confirmed OSCC photographs from 11 hospitals in China, and reported a sensitivity of 91.0% and specificity of 93.5%.32
Confocal laser endomicroscopy is an adaptation of the conventional optical microscopy technique, in which the light from a laser source directed at a pinhole geometrically removes information from the outside of the focal plane and generates an optical plane at a specific depth from the surface.33 Aubreville et al. used 16-bit grayscale CLE images to analyze 4 regions of interest, including the inner lower lip, the upper alveolar ridge and the hard palate.34 The images acquired from suspicious lesions and 3 other areas that were assumed to be healthy resulted in a sensitivity of 86.6% and a specificity of 90.0%.34
Hyperspectral imaging acquires a three-dimensional (3D) data set called a hypercube, formed by 2 spatial dimensions and 1 spectral dimension. Using HSI provides information on tissue physiology, morphology and composition. One field of application for HSI is image classification for detecting tissues at risk of cancer.35 Jeyaraj et al. applied a novel CNN with 2 partitioned layers to label and classify the region of interest in multidimensional HSI, and reported a sensitivity of 94% and a specificity of 98%.36
Optical coherence tomography is a non-invasive high-resolution optical imaging technology that produces real-time cross-sectional images in two-dimensional (2D) space (a lateral coordinate and an axial coordinate).37 It is analogous to ultrasound imaging, except it uses light instead of sound, and is a powerful imaging technology for medical diagnosis, acting as a type of optical biopsy. However, unlike the conventional histopathological examination, which requires the extraction and processing of a tissue sample for microscopic evaluation, OCT can generate real-time images of the tissue.38 James et al. used OCT images to classify non-dysplasia, dysplasia and malignancy through artificial NN/machine learning, and reported a sensitivity of 93% and a specificity of 74% for OSCC identification.23
High-resolution microendoscopy enables real-time epithelial imaging with subcellular resolution. Numerous research studies on gastrointestinal neoplasms have indicated that HRME is a modality that provides high specificity and precision for diagnosing different diseases.39, 40 Yang et al. developed an algorithm to determine whether HRME images show enough oral epithelial nuclei to differentiate between oral cancer and benign tissue.41 Their study used 811 HRME images from 169 patients and demonstrated that HRME images were suitable for classifying oral cancer. The researchers reported a sensitivity of 75% and a specificity of 89%.41
Assessment of the risk of bias and applicability
Regarding domain 1 (patient selection), all studies exhibited a high risk of bias, with the main issues being an inadequate selection of patients and a lack of investigator blinding. Furthermore, 1 study (11%) had a high risk of bias with regard to domain 2 (index test), as it used a small number of images.34 As many as 33% of the articles showed high applicability concerns (Figure 3).
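The percentages reported here are tallies over the 9 included studies. The sketch below shows how such a tally could be computed; the function name is hypothetical, the label "other" is a placeholder for any rating that is not high, and the count of 3 studies with high applicability concerns is inferred from the reported 33% rather than stated directly.

```python
from typing import List


def percent_high(ratings: List[str]) -> int:
    """Percentage of studies rated 'high', rounded to the nearest integer."""
    return round(100 * ratings.count("high") / len(ratings))


N_STUDIES = 9

# Ratings are anonymous placeholders; "other" stands for any non-high rating.
patient_selection = ["high"] * N_STUDIES              # all studies rated high
index_test = ["high"] * 1 + ["other"] * 8             # 1 study rated high
applicability = ["high"] * 3 + ["other"] * 6          # inferred from 33%

summary = {
    "patient_selection": percent_high(patient_selection),
    "index_test": percent_high(index_test),
    "applicability": percent_high(applicability),
}
```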
Discussion
In this systematic review, the type of NN applied for the detection of oral cancer was analyzed. All studies used CNN, probably due to the ease of working with images. The comprehensive search aimed to identify whether the studies used an additional type of NN to support the oral cancer detection process. In this regard, Sharma and Om developed a probabilistic NN and general regression model for the early detection and prevention of oral cancer, using various indicators, such as clinical symptoms, medical history and personal history.42 This review identified an area of opportunity, which involves using CNN and other types of NN in the analysis of risk factors to provide a more reliable diagnosis of oral cancer, as this combination of data has not been assessed so far.
When verifying the accuracy of the algorithms used for oral cancer diagnosis, the images used for training must come from patients with the diagnosis confirmed through the histopathological examination. Studies were excluded if they did not report the gold standard for the validation of diagnosis, since the absence of an adequate comparator invalidates the results of such studies.
In the study by Tanriver et al., the training, validation and testing dataset was inadequate.30 They obtained some of the images from a hospital (validated by a histopathologist), but as the sample was insufficient, they sourced other images through searching publicly accessible repositories.30 However, such images do not provide the certainty of histopathological diagnostic validation.
Several studies tested the effect of the sample size during the training phase. Narayana et al. determined that a sample size of at least 50 was necessary.43 Fang et al. conducted a study that aimed to investigate the impact of the training sample size on the performance of organ self-segmentation (Eye L, Eye R, Lens L, Lens R, Optic nerve L, Optic nerve R, Parotid L, Parotid R, Spinal cord, Larynx, and Body) in computed tomography (CT) based on DL for head and neck cancer patients.44 They found that 200 samples were required to obtain a 90% yield for lenses and optic nerves, whereas the remaining organs needed at least 40 images for their detection.44 However, according to Narayana et al., the minimum training sample size depends on a number of factors, such as the acquisition protocol, the type of tissue to be segmented, and others.43 The results are not only associated with the dataset, but also with the specific CNN configuration.43 According to Samala et al., assessing the precision and accuracy of CNN architecture by using a test set may be overly optimistic.45 Therefore, validating the training process with unknown and independent cases derived from actual clinical practice is crucial. So far, no studies have tested the algorithms developed in this way.
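One way to reduce the optimism described above is to ensure that no patient contributes images to both the training and the test set. The following is a minimal sketch of such a patient-level split; it is not the method of any included study, the function name and image identifiers are hypothetical, and a split of this kind is only an approximation of validation on independent cases from clinical practice.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_patient(
    image_to_patient: Dict[str, str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split image IDs so that each patient appears in only one partition."""
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [img for p in patients if p not in test_patients for img in by_patient[p]]
    test = [img for p in patients if p in test_patients for img in by_patient[p]]
    return train, test


# Hypothetical dataset: 10 patients with 3 images each.
images = {f"p{p}_img{i}": f"p{p}" for p in range(10) for i in range(3)}
train_images, test_images = split_by_patient(images)
```

Splitting at the image level instead would allow near-identical photographs of the same lesion to appear on both sides of the split, which tends to inflate the reported accuracy.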
Artificial intelligence can support the detection of cancer in its early stages. The evidence on the efficacy of CNN in image-based oral cancer detection demonstrated that NN could be used in daily clinical practice using photographs. This could be particularly helpful for clinicians in remote locations, where access to specialist oral pathology advice is limited.
Conclusions
Convolutional neural networks can potentially detect oral cancer in its early stages, though the results need to be verified by the corresponding histopathological examination. Most of the analyzed studies showed an accuracy greater than 85%. However, several studies encountered training problems due to the reduced number of images or because the testing process was performed on the same samples and not in clinical practice. In addition, the analysis of patient-specific risk factors and habits should complement these applications to formulate a more accurate diagnosis.
Ethics approval and consent to participate
Not applicable.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Consent for publication
Not applicable.