Published online Jun 28, 2022. doi: 10.35711/aimi.v3.i3.55
Peer-review started: January 19, 2022
First decision: March 12, 2022
Revised: April 12, 2022
Accepted: June 16, 2022
Article in press: June 16, 2022
Published online: June 28, 2022
Processing time: 159 Days and 22 Hours
Much of the published literature in Radiology-related Artificial Intelligence (AI) focuses on single tasks, such as identifying the presence or absence or severity of specific lesions. Progress comparable to that achieved for general-purpose computer vision has been hampered by the unavailability of large and diverse radiology datasets containing different types of lesions with possibly multiple kinds of abnormalities in the same image. Also, since a diagnosis is rarely achieved through an image alone, radiology AI must be able to employ diverse strategies that consider all available evidence, not just imaging information. Using key imaging and clinical signs will greatly improve the accuracy and utility of such applications. Employing strategies that consider all available evidence will be a formidable task; we believe that the combination of human and computer intelligence will be superior to either one alone. Further, unless an AI application is explainable, radiologists will not trust it to be either reliable or bias-free; we discuss some approaches aimed at providing better explanations, as well as regulatory concerns regarding explainability (“transparency”). Finally, we look at federated learning, which allows pooling data from multiple locales while maintaining data privacy to create more generalizable and reliable models, and quantum computing, still prototypical but potentially revolutionary in its computing impact.
Core Tip: It is necessary to understand the principles of how different artificial intelligence (AI) approaches work to appreciate their respective strengths and limitations. While advances in deep neural net research in Radiology are impressive, their focus must shift from applications that perform only a single recognition task to those that perform the realistic multi-recognition tasks that radiologists perform daily. Humans use multiple problem-solving strategies, applying each as needed. Similarly, realistic AI solutions must combine multiple approaches. Good radiologists are also good clinicians. AI must similarly be able to use all available evidence, not imaging information alone, and not just one or a few limited aspects of imaging. Both humans and computer algorithms (including AI) can be biased. A way to reduce bias, as well as prevent failure, is better explainability: the ability to clearly describe the workings of a particular application to a subject-matter expert unfamiliar with AI technology. Federated learning allows more generalizable and accurate machine-learning models to be created while preserving data privacy, concerns about which form a major barrier to large-scale collaboration. While the physical hurdles to implementing quantum computing at a commercial level are formidable, this technology has the potential to revolutionize all of computing.
- Citation: Nadkarni P, Merchant SA. Enhancing medical-imaging artificial intelligence through holistic use of time-tested key imaging and clinical parameters: Future insights. Artif Intell Med Imaging 2022; 3(3): 55-69
- URL: https://www.wjgnet.com/2644-3260/full/v3/i3/55.htm
- DOI: https://dx.doi.org/10.35711/aimi.v3.i3.55
As the volume and complexity of medical knowledge advance, electronic clinical decision support will become increasingly important in healthcare delivery, and increasingly likely to use Artificial Intelligence (AI). Historically, AI approaches have been diverse. However, even senior radiologists (e.g.,[1]) have inaccurately considered AI, machine learning, and deep learning as synonymous. We therefore summarize these approaches, considering their strengths and weaknesses.
Symbolic approaches: These, the focus of “classical” AI (1950s-1990s), embody the use of high-level abstractions (“symbols”) that represent the concepts that humans (often experts) use in solving non-numerical problems. They are most closely related to traditional computer science/software development. In fact, they are mainstream enough that specific terms (instead of “AI”) are preferred to describe a given approach. Among the successes:
Business-rule systems (BRS or “Expert Systems”)[2]: These allow human experts, working either with software developers or with graphical user interfaces, to embody their knowledge of a particular area to offer domain-specific advice/diagnosis. Robust open-source tools such as Drools[3] are available for building BRS.
Constraint programming systems[4]: Constraint satisfaction involves finding a solution to a multivariate problem given a set of constraints on those variables. When the constraints are numeric, techniques such as linear programming[5] (which preceded symbolic AI and is applied in numerous business-operations problems) work better. Some software, such as Frontline Solver(TM)[6] (of which Microsoft Excel’s “Solver” add-in is a lightweight version) handles both numerical and symbolic constraints.
Data-driven approaches (also called “machine learning” or ML): These are used to make predictions, or decisions based on those predictions, by manipulating numbers, or entities transformed into numbers, rather than symbols. They are most useful in domains where human experts have not formulated problem-solving strategies, but data is available that, if analyzed to discover patterns, can guide such formulation.
Understandably, ML approaches have received a major boost in today’s “big data” era. Approaches that employ probabilities, such as Bayesian inferencing[7], have become viable: prior probabilities that could only be guessed at previously (using highly subjective “expert judgment”) can now be computed directly from data (e.g., EHRs/public-health registries), with the caveat that these reflect local conditions – e.g., incidence of specific infectious diseases – and will vary with the data source.
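As a minimal sketch of how a data-derived prior feeds Bayesian inferencing, the snippet below applies Bayes' theorem to a single positive finding. The function name and all numbers (prevalence, sensitivity, specificity) are hypothetical and chosen only to illustrate how the same finding can yield very different posterior probabilities in locales with different disease prevalence.

```python
def posterior_probability(prior, sensitivity, specificity):
    """Probability of disease given a positive finding (Bayes' theorem)."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical values: the same test applied in two locales that differ
# only in disease prevalence (the prior probability).
low_prevalence = posterior_probability(prior=0.01, sensitivity=0.90, specificity=0.95)
high_prevalence = posterior_probability(prior=0.20, sensitivity=0.90, specificity=0.95)

print(round(low_prevalence, 3), round(high_prevalence, 3))
```

With these assumed inputs, the posterior in the high-prevalence setting is several times larger than in the low-prevalence setting, which is why priors computed from one data source may not transfer to another.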
All data-driven approaches use iterative mathematical optimization techniques (originally pioneered by Isaac Newton and his contemporaries) to converge onto solutions. In ML parlance, the optimization process is called “training”.
Statistical learning: The use of statistical methods to discover patterns or fit predictive models to data. These techniques originated in the late 19th century (linear regression/correlation), though they have advanced to tackling vast numbers of input variables (also called “features” in ML) and vastly more diverse problems. Human expertise is involved in identifying the features (numeric or categorical) relevant to the problem, and in transforming them to a form suitable for analysis. (For example, a variable comprising N categories, e.g., gender/race, can be transformed into N one-or-zero variables using a simple technique called “one-hot encoding”[8]; one of these is often dropped as a reference category, leaving N-1 variables.) Almost all statistical learning (SL) methods have been developed by researchers with an applied math/statistics background. Individual methods might make specific assumptions about the nature of the variables (e.g., that they have a Gaussian distribution, or that their effects are additive).
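The transformation of a categorical variable into one-or-zero variables can be sketched as follows. This is an illustrative implementation only: the function name, the `drop_first` option and the example modality codes are our assumptions, and production work would normally rely on an established library. With `drop_first=True`, the first category serves as the reference level, so N categories yield N-1 columns.

```python
def one_hot_encode(values, drop_first=False):
    """Return (column_names, rows) of one-or-zero indicators for a categorical variable."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category becomes the reference level (all zeros).
        categories = categories[1:]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical feature: imaging modality for four studies.
modalities = ["CT", "MRI", "XR", "CT"]

full_columns, full_rows = one_hot_encode(modalities)
reduced_columns, reduced_rows = one_hot_encode(modalities, drop_first=True)

print(full_columns, full_rows)
print(reduced_columns, reduced_rows)
```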
Artificial neural networks (ANNs): The term “artificial” is typically implied and therefore usually dropped in both the full phrase and the abbreviation. This family of approaches, which began in the 1950s, also results in the creation of predictive models. It is now prominent enough to deserve its own subsection, below.
Neural networks: Deep learning: Neural Networks (NNs) are inspired by the microstructural anatomy and functioning of animals’ central nervous systems: software that simulates two or more layers of “neuron”-like computational units (“cells”). Each layer’s cells send their output to cells in the next – and in approaches called “recurrent NNs”, provide “feedback” to earlier layers as well. However, NNs employ mathematical techniques under the hood, notably mathematical “activation functions” for individual cells. The activation function for a neuron typically transforms inputs of large positive or negative numbers into outputs with a smaller range (e.g., zero to one, or ± 1). An activation function may also incorporate a threshold, i.e., the output is zero unless the input exceeds a particular value.
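The behavior of activation functions described above can be illustrated with three commonly used examples. This is a sketch for intuition, not a description of any particular network: the sigmoid squashes inputs into the range zero to one, the hyperbolic tangent into ±1, and the thresholded rectifier (our illustrative variant, with a hypothetical `threshold` parameter) outputs zero unless the input exceeds a given value.

```python
import math


def sigmoid(x):
    """Squash any real input into the range (0, 1), computed in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Squash any real input into the range (-1, 1)."""
    return math.tanh(x)


def thresholded_relu(x, threshold=0.0):
    """Output zero unless the input exceeds the threshold; otherwise pass it through."""
    return x if x > threshold else 0.0


for value in (-50.0, -1.0, 0.0, 1.0, 50.0):
    print(value, sigmoid(value), tanh(value), thresholded_relu(value))
```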
“Deep” NNs, their modern incarnation, have many more layers than older (“shallow”) NNs. (“Deep learning” is ML performed by DNNs). NNs differ from Statistical learning in two ways.
NNs make few or no assumptions about variables’ characteristics: their statistical distributions don’t matter, and their inter-relationships may be non-linear (typically, unknown). Consequently, NNs may sometimes yield accurate predictive models where traditional SL fails.
While NNs can use human-expert-supplied features, they don’t have to. For image input, DNNs can discover features directly from the raw pixels/voxels. The initial layer discovers basic features such as regional lines, subsequent layers assemble these into shapes, and so on: LeCun et al’s classic Nature paper describes this process[9], which parallels the cat visual cortex’s operation, as discovered by Nobelists David Hubel and Torsten Wiesel[10]. After training, the initial layers can be reused for other image-recognition problems, a phenomenon called Transfer Learning (TL)[11]: Starting training with layers that recognize basic features is faster than starting from scratch.
TL is also widely used in DNN-based natural language processing (NLP) for medical text: BERT[12], a giant DNN trained by a Google team on English Wikipedia and a large corpus of books, was used to bootstrap the training of BioBERT, which was further trained on PubMed abstracts and PubMed Central full-text articles[13]. Choudhary et al[14] review medical-imaging applications of Domain adaptation, a special case of TL, where a DNN trained on a set of labeled images (e.g., relating to a particular medical condition) is reused for images of a different, but related, condition, either as-is or after an accelerated training process.
This gain in power isn’t free. The number of computations involved goes up non-linearly with the number of layers[15], and so much more compute power is required: Notably, abundant random-access-memory (RAM) and the use of general-purpose Graphics Processing Units (GPUs)[16], which perform mathematical operations on sequences of numbers in parallel. (In fact, the theoretical advances embodied in diverse modern DNN architectures would be infeasible without powerful hardware).
DNNs require vastly more data than SL to discover reliable features which human experts may find obvious. Data volume isn’t enough: One must also try to eliminate bias by using diverse data. (We address bias in section 3).
Certain arithmetic-based issues manifest when the number of layers becomes large (production DNNs can have hundreds of layers) and the outputs of each layer pass to the next. Under the hood, numbers are being multiplied. When a long sequence of numbers that are all greater than 1, or all less than 1, is multiplied together, the product tends to infinity or to zero, respectively: For example, 2 raised to the 64th power is approximately 1.84 × 10^19.
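The arithmetic behind this issue can be checked with a few lines of code. The sketch below simply multiplies a constant factor repeatedly; the factors 2.0 and 0.5 and the count of 64 steps are illustrative stand-ins for the per-layer multipliers in a deep network, not values taken from any real model.

```python
def repeated_product(factor, steps):
    """Multiply `factor` together `steps` times, as happens across successive layers."""
    product = 1.0
    for _ in range(steps):
        product *= factor
    return product


# A factor above 1 makes the product grow without bound ("exploding").
print(repeated_product(2.0, 64))

# A factor below 1 makes the product shrink toward zero ("vanishing").
print(repeated_product(0.5, 64))
```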
In DNNs, the consequences of repeated multiplication, called the “Exploding Gradient” or “Vanishing Gradient” problems, can thwart the training process. Both are mitigated by batch normalization (BN), which re-adjusts the numerical values of all the outputs of each hidden layer during each iteration of training, so that the average of the outputs is zero and their standard deviation is one. Apart from speeding learning, BN allows more layers to be added to the DNN, and hence one can tackle harder problems.
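The core re-scaling step of batch normalization can be sketched as below. This is a simplified illustration under stated assumptions: it normalizes one list of outputs to zero mean and unit standard deviation, uses a small `epsilon` (our choice of value) to avoid division by zero, and omits the learned scale and shift parameters that practical implementations add afterwards.

```python
import math


def batch_normalize(outputs, epsilon=1e-5):
    """Re-scale a batch of outputs to mean zero and (approximately) unit standard deviation."""
    count = len(outputs)
    mean = sum(outputs) / count
    variance = sum((value - mean) ** 2 for value in outputs) / count
    scale = math.sqrt(variance + epsilon)
    return [(value - mean) / scale for value in outputs]


# Hypothetical hidden-layer outputs with a wide numeric range.
raw_outputs = [120.0, -40.0, 300.0, 15.0]
normalized = batch_normalize(raw_outputs)
print(normalized)
```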
Because of their performance characteristics - DNNs have achieved better accuracy than previous methods, on numerous benchmarks, in a variety of domains - most current AI research focuses on DNNs.
Table 1 summarizes the differences between the symbolic, statistical and DNN approaches.
Characteristic | Symbolic AI | Statistical learning (SL) | Deep learning (DL) |
Entities manipulated | Both symbols and numbers | Numbers (most representing interval data, but some representing categories) | Same as SL; can be applied to the same problems |
Algorithm design | Requires computer-science knowledge and traditional software skills, including user-interface design | Less customization needed, but problem-specific pre-processing of data (e.g., statistical standardization) is necessary | Same as SL |
Domain expert role | Works closely and extensively with the software developer; evaluates the algorithm’s output for a set of test cases against the desired output | Identifies variables/features of interest, annotates training data, and evaluates results and individual features’ relative importance. Must evaluate results for novelty | Same as SL, but features can be discovered from raw data, so may not need designation. Annotation is more burdensome because much more data is typically needed |
Data inputs | Expert and software developer work closely to design software and create test cases | Rows of data, annotated text, or images. For supervised learning, the output variable’s value for each instance is also supplied | Same as SL; in some forms of DL, notably for image processing, features are discovered from raw data |
Partitioning of input data | (Not applicable) | Divided into training data and test data | Same as SL |
Generalizability | Limited to modest: Typically requires tailored solutions, especially for the user interface | More generalizable than symbolic AI, but success depends on careful feature selection, choice of method and whether the data matches the method’s assumptions (e.g., Gaussian distribution, additive effects) | DL methods are “non-parametric” and rely on few or no assumptions about the variables/features in the data |
Training in machine learning: ML models can be trained in the following ways: Supervised Learning: The objective here is to predict a category (presence/absence or severity of a lesion/disease) or a numeric (interval) value. Category prediction is also called “classification”. The training data contains the answers: Either in the output variable/s for tabular data, or for images, human annotation/labeling that identifies specific object categories (including their region of interest, if multiple categories coexist within an image).
Unsupervised Learning: Here, the objective is to discover patterns in the data, thereby achieving dimension reduction (i.e., a more compact, parsimonious representation of the data).
Semi-supervised learning: The drawback of supervised learning is that for unstructured data (narrative text, images) annotation/labeling is human-intensive, as well as costly if it involves human expertise that must be paid for. Semi-supervised learning uses a combination of (some) labeled and (mostly) unlabeled data, under the assumption that unlabeled data points close to (or in the same cluster as) labeled data points are likely to share the same category/class.
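The assumption underlying semi-supervised learning, that unlabeled points near labeled ones probably share their class, can be sketched with a simple nearest-labeled-neighbor rule. This is an illustration only, not a specific published method: the function name, the two-dimensional feature vectors and the class names are hypothetical, and practical methods typically retrain the model on the resulting provisional labels.

```python
import math


def propagate_labels(labeled, unlabeled):
    """Give each unlabeled point the label of its nearest labeled point (Euclidean distance)."""
    provisional = []
    for point in unlabeled:
        _, nearest_label = min(labeled, key=lambda item: math.dist(item[0], point))
        provisional.append((point, nearest_label))
    return provisional


# Hypothetical feature vectors: a few labeled examples and more unlabeled ones.
labeled_points = [((1.0, 1.0), "benign"), ((8.0, 9.0), "malignant")]
unlabeled_points = [(1.5, 0.5), (7.0, 8.5), (2.0, 2.0)]

provisional_labels = propagate_labels(labeled_points, unlabeled_points)
print(provisional_labels)
```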
Statistical learning techniques can be either supervised or unsupervised. Examples of supervised techniques are: Multivariate linear regression/general linear models, which predict interval values; logistic regression and support vector machines, which predict categories; K-nearest neighbor and Classification and Regression Trees (CART), which predict either. Unsupervised SL methods include clustering algorithms, principal components/factor analysis and Latent Dirichlet Allocation.
DNNs, which need very large amounts of data, have motivated the development of semi-supervised methods. They are intrinsically suited for classification. For interval-value prediction with image data, they typically perform or assist in segmentation (which can work with/without supervision), after which numeric volumes can be computed from the demarcated voxels.
Preprocessing: Before training, the data is typically pre-processed with one or more steps. Pre-processing makes the training (and hence predictions) more reliable. The strategies used depend on the kind of data (numeric vs image). Some strategies are general, while others are problem specific (we occasionally refer to the latter). Among these steps are: Detecting suspected erroneous values including unrealistic outliers (e.g., non-physiological clinical-parameter values). The adage “Garbage In, Garbage Out” applies to all facets of computing.
Replacing missing/erroneous values (“imputing”): An entire subfield of applied statistics is devoted to this problem. Strategies include picking the average value across all data points, average value for the individual patient, interpolated values (for time-series data), etc. In general, SL algorithms, many of which mandate either imputing all missing values or dropping the data point/s in question, are more vulnerable to missing values than DL.
Standardizing: Adjusting numeric values so that disparate variables are represented on the same scale. For variables with a Gaussian (“Normal”) distribution, the variable’s mean is subtracted from each value and the result is divided by the variable’s standard deviation, with the sign preserved. For non-Gaussian variables, the median is subtracted from each value and the result is divided by the inter-quartile range. (Batch normalization, discussed earlier, was inspired by standardizing).
For images, editing out artefacts extraneous to the content to be analyzed - e.g., superimposed text labels or rulers to indicate object size. We come back to this issue later.
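The numeric pre-processing steps above can be sketched in sequence as follows. This is a minimal illustration under stated assumptions: the function names, the plausible heart-rate range of 20 to 300 and the sample values are hypothetical, missing values are represented as `None`, and mean imputation is only one of the strategies mentioned.

```python
import statistics


def mark_implausible(values, low, high):
    """Replace values outside a plausible range with None (treated as missing)."""
    return [v if v is not None and low <= v <= high else None for v in values]


def impute_with_mean(values):
    """Fill missing values with the mean of the observed values."""
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Subtract the mean and divide by the standard deviation (for Gaussian variables)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]


def robust_scale(values):
    """Subtract the median and divide by the inter-quartile range (for non-Gaussian variables)."""
    first_quartile, _, third_quartile = statistics.quantiles(values, n=4)
    median = statistics.median(values)
    return [(v - median) / (third_quartile - first_quartile) for v in values]


# Hypothetical heart rates: one non-physiological entry and one missing entry.
heart_rates = [72.0, 650.0, None, 88.0]
cleaned = mark_implausible(heart_rates, low=20.0, high=300.0)
imputed = impute_with_mean(cleaned)
z_scores = standardize(imputed)
robust_scores = robust_scale(imputed)
print(cleaned, imputed, z_scores, robust_scores)
```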
Sources of error: Overfitting and hidden stratification: A strength of DNNs, stated earlier, is their ability to discover features from raw data. Sometimes, this can also be a weakness: Overfitting occurs when any ML model is led astray by incidental but irrelevant features in the input. Apart from working unreliably with a new dataset, an overfitted model often makes mistakes that humans never would. A DNN for diagnosing skin malignancies used a ruler/scale’s presence to infer cancerous lesions, whose dimensions are usually recorded diligently[17]. Similarly, textual labels on plain musculoskeletal radiographs were confused with internal-fixation implants, lowering accuracy[18].
Several strategies minimize the risk of overfitting, in addition to making reporting of results more honest: Cross-validation: The training data is partitioned into a certain number, N (e.g., 10), of approximately equal slices. The training is conducted N times, each time sequentially withholding 1 slice (i.e., only the remaining N-1 slices are used), and the results are averaged.
Withholding of test data from training: A portion of the data is completely withheld from the training process. After the ML model is fully trained with the training data, it is evaluated with the test data, and results are (or should be) reported against the test data only.
Regularization: This is a general term for computational techniques that reduce the likelihood of overfitting during the training algorithm’s optimization phase. The most well-known and general approach is to penalize model complexity: the fewer the variables that remain in the final trained model (and the smaller their coefficients), the lower the complexity. It was originally applied to linear and logistic regression[19], where Lasso and Ridge Regression respectively add penalties proportional to the sum of the absolute values and the sum of the squares of the model’s coefficients, and it is also used for DL.
A regularization approach specific to DL is Dropout: disabling a certain fraction of neurons in hidden layers of a multilayer network during each cycle of training. Li et al[20] provide theoretical reasons why dropout can interfere with batch normalization, discussed above, resulting in performance degradation. They recommend that dropout be employed only after the last hidden layer where BN is used, and that the proportion of disabled neurons not exceed 50% (and should usually be much smaller).
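The two data-partitioning strategies described above, withholding test data and N-fold cross-validation, can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names, the 20% test fraction, the fixed random seed and the interleaved fold assignment are our choices, and the actual training and scoring of a model on each split is left out.

```python
import random


def split_holdout(items, test_fraction=0.2, seed=0):
    """Shuffle the data and withhold a fraction as test data never seen in training."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    test_count = int(len(shuffled) * test_fraction)
    return shuffled[test_count:], shuffled[:test_count]


def k_fold_indices(item_count, fold_count):
    """Yield (training_indices, validation_indices), withholding each fold once."""
    folds = [list(range(start, item_count, fold_count)) for start in range(fold_count)]
    for held_out in range(fold_count):
        validation = folds[held_out]
        training = [index
                    for fold_number, fold in enumerate(folds) if fold_number != held_out
                    for index in fold]
        yield training, validation


# Hypothetical dataset of 100 cases, identified by number.
training_cases, test_cases = split_holdout(range(100))
splits = list(k_fold_indices(len(training_cases), fold_count=10))
print(len(training_cases), len(test_cases), len(splits))
```

In use, a model would be trained on each `training` index set and scored on the matching `validation` set, the ten scores averaged, and the final model evaluated once against `test_cases`.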
A related problem, Hidden Stratification[21], occurs when a category contains sub-categories (“strata”) unrecognized during problem analysis: here, performance on some strata may be poor. Thus, Rueckel et al[22] cite an example of severe pneumothorax being recognized accurately only in those images where a chest tube (inserted to provide an outlet for trapped air) is present[23]. While mild pneumothorax is treated conservatively without a tube, misdiagnosing a yet-to-be-treated, severe pneumothorax has serious consequences.
Nakkiran et al[24] had earlier observed the phenomenon of “double descent.” For some problems, when a DNN classifier is trained on increasingly larger datasets, performance initially gets worse. Later, when the training dataset has become much larger, performance gets better. This could be explained by hidden stratification. The somewhat-larger dataset is heterogeneous in unconsidered ways, but the instances of minority sub-categories are too few to learn from, so they only serve to degrade performance. With much larger datasets, these instances become numerous enough to yield a signal that the DNN can use to discriminate more accurately.
Most recent research in radiology AI has focused on DNNs: The following is just a brief list of DL applications. (This list is not intended to be comprehensive).
Binary (Yes/no) classification: Elbow fractures[25], rib fractures[26], orthopedic implants[27], pneumothorax[28], pulmonary embolism[29], lung cancer[30], pulmonary tuberculosis (where several commercial applications exist)[31].
Multi-category classification (grading/staging): Anterior cruciate ligament injuries[32], hip fracture[33].
Segmentation with quantitation: Pulmonary edema[34], epicardial fat[35,36], gliomas[37,38], liver metastases[39,40], spleen[41], and brain infarcts[42].
While impressive, much more is needed to apply AI to realistic problems, especially when intended for deployment in teleradiology scenarios where onsite skill/experience is often lacking. We summarize the issues here before discussing each issue in detail.
The focus on DNN applications that perform only a single task, while proliferating the number of publications in the literature, does little to advance the likelihood of practical deployment.
Depending on the problem, humans use multiple problem-solving strategies. Similarly, realistic solutions must combine multiple AI approaches, in addition to old-fashioned software engineering (such as intuitive and robust user interfaces).
Good radiologists are also good clinicians. AI must be able to use all available evidence, including collective wisdom gained over decades of experience.
Both humans and AI can be biased; this susceptibility must be recognized. Among the numerous ways to reduce bias, one must consider explainability – the ability to clearly describe the workings of a particular application to a subject-matter expert unfamiliar with AI technology.
The Limitations of Uni-tasking: As Krupinski notes[1], most DNNs in radiology uni-task. Thus, a DNN specialized for rib-fracture recognition will, even if outperforming radiologists, ignore concurrent tuberculosis, pneumothorax, or Flail Chest, unless trained for the same. For that matter, DNN tuberculosis (TB) diagnosis considering only consolidation/cavitation/mediastinal lymph nodes may miss TB in children. In one series of pediatric patients with pleural effusions, 22% had TB; in 41% of these, effusion was the only radiologic TB sign[43]. We have noticed that these effusions may be lamellar and track upwards, akin to pleural thickening, without being overtly visible, unlike the usual pleural effusions. In fact, in our experience, a lamellar effusion in a child is a good pointer towards the presence of a Primary Complex of TB.
No clinical radiologist uni-tasks: “Savant Syndrome” describes humans with exceptional skill in one area who are mentally challenged otherwise. Overspecialized DNNs suffer, in effect, from perceptual blindness. This phenomenon can be induced experimentally in normal humans by overwhelming their cognitive abilities: in a famous experiment, where subjects had to watch a basketball-game video and count the number of passes one team made, half the subjects failed to notice an intermingling gorilla-suited actor in the center of several scenes[44].
Based on general-purpose vision (GPV) studies, features learned in one specialized uni-tasking recognition problem (e.g., cats) transfer poorly to a related problem (e.g., recognizing horses). GPV has advanced because of the public availability of datasets, most notably ImageNet[45], which contain a vast number of object categories, often with multiple categories per image. The images are annotated by crowdsourcing: each object is indicated with a bounding box. Any DL approach expecting to perform well in a challenge to identify these objects cannot be over-specialized. (Unfortunately, DNNs trained on ImageNet perform very poorly with radiology images: Transfer learning is not guaranteed to work).
We believe that focusing short-term on research publications addressing relatively simple problems (with much research being PhD-thesis-driven) retards overall progress. Historically, symbolic AI’s notorious addiction to this approach, accompanied by hype that greatly outpaced actual achievement, led to several “AI Winters”[46,47], steep funding drops following disillusionment. McDermott (a symbolic AI researcher) raised such concerns in a famous 1976 paper, “Artificial Intelligence Meets Natural Stupidity”[48].
Moving toward multi-tasking: There is no reason (besides the costs of compensating radiologists for their time) why radiographic modality-specific ImageNet equivalents cannot be created. Collections of images for trauma patients where multiple lesions are likely to be present may be a good starting point. One could also reuse the vast amount of existing annotated images for uni-tasking-DL research: Federated DL (see section 5.1) may help to test new, broader, lesion-recognition algorithms.
While DNNs excel at the important subtask of pattern recognition, they alone would not suffice to move radiology AI into the clinic, as now discussed.
The right strategy for the right subtask: Decades of research in cognitive psychology, especially observations of human expertise, have shown that humans apply different strategies to different problems. In his classic “Conceptual Blockbusting”, Adams[49] identifies strategies as varied as: General-purpose critical thinking; knowledge of science and mathematics (including calculus); visualization; and applying ethical constraints.
The psychologists Daniel Kahneman and Amos Tversky, founders of “behavioral economics” (Kahneman received a Nobel Prize; Tversky had died by then), postulated two modes of thinking. These are “System 1” (“lower level”: rapid, intuitive, and reflexive, or “short-cut”) and “System 2” (“higher level”: slow, deliberate, considering multiple sources of information, and requiring concentration). (We return to this work later.) As noted by Lawton[50], DNNs embody System 1 thinking, while statistical and symbolic approaches embody System 2. Both must be used together.
What applies to humans also applies to electronic systems. Symbolic, statistical and NN approaches have been combined in several ways: In new domains where little practical human experience has accumulated, statistical learning has led to discovery of patterns that can then be encoded as rules or in decision trees, which originated in symbolic AI.
While symbolic AI can identify differential diagnoses for a given clinical presentation, statistical AI, using data from local sources or from the literature, can compute probabilities to rank these diagnoses, as well as sensitivity/positive predictive value of individual findings (including test results) to suggest the way forward.
Symbolic approaches are easier for human experts to understand (because they parallel deliberative human problem-solving approaches), and so are often used to “explain” patterns discovered by DNNs. (We discuss explainability in Section 4).
In radiology AI, Rudie et al[51] combine DNN with symbolic/statistical AI (Bayesian networks) for differential diagnosis of brain lesions. Doing this on a large scale across multiple radiology domains has the potential to improve clinical decision making.
Using all available evidence: In sufficiently diverse patient populations, attribution of diagnoses to detected radiographic lesions requires evidence from history, physical exam, non-radiology investigations, plus knowledge of prevalence. Our recommendation to combine all such information to make better decisions is not unique: Kwon et al[52] also suggest a radiology AI approach that combines multiple evidence sources (imaging plus clinical variables) for COVID-19 prognostication, while Jamshidi et al[53] also recommend a combined approach for COVID-19 diagnosis and treatment.
We provide examples below. An upper-lobe cavity on a chest X-ray could suggest neoplastic processes, mycobacterial infection, intracellular fungal infection (histoplasmosis, coccidioidomycosis), etc. Serological confirmation plus newer technologies (e.g., GeneXpert for tuberculosis[54]) assist diagnosis.
The failure to elicit a proper history can be expensive and traumatizing. One of us (S.A.M.) encountered a young girl who had been repeatedly evaluated under general anesthesia for possible ectopic ureter localization, because of failure to make one simple observation on the plain radiograph. A subsequent Multidetector CT exam concluded erroneously that the incontinence was due to a vesicovaginal fistula, which is extremely rare in children, more so if acquired. This erroneous diagnosis could have been avoided by a simple observation (a slight gap in the pubic symphysis) and one simple question: When did symptoms start? (From birth). This suggested the correct diagnosis: female epispadias, which a pediatric surgeon confirmed.
Recognizing midline shift (MLS), plus trans-tentorial and other herniations, allows better triaging for intracranial bleeds or head trauma[55,56]. Xiao et al[57] describe an algorithm to detect MLS of the brain on CT, with a sensitivity of 94% and specificity of 100%, comparable to radiologists.
In head injury, ear-nose-throat bleeds/pneumocephalus suggest basilar skull fractures[58], which are non-displaced and difficult to detect unless looked for diligently.
Pneumothorax diagnosis by DNNs[59], while useful, could become more accurate for Tension Pneumothorax by additionally looking for simple radiological signs such as inversion of the diaphragm and shift of the trachea or mediastinal structures to the opposite side (Figure 1).
AI for rib-fracture recognition[60] can be complemented by the clinical finding of “Flail Chest”, which seriously impairs respiratory physiology[61] and may occur when three or more ribs are broken in at least two places.
Combining AI with other technologies: A major thrust of medical AI is in making other technologies, both existing and novel, much “smarter”, reducing error by assisting manual tasks and decision-making performed by the radiologist or operator.
Applications in Interventional Radiology: The Royal Free Hospital in London employs an AI-backed keyhole procedure for stenting, coupled with optical coherence tomography (OCT). While OCT allows viewing the inside of a blood vessel, the AI software automatically measures vessel diameter to enhance decision-making by the interventionist[62]. Similar roles are possible in interventions such as robotic intussusception, where visualization of the ileocecal junction and reflux into the terminal ileum could be taken as end points of the procedure.
AI-assisted 3-D Printing of biological tissue such as heart valves, blood vessel grafts and possibly complete organs is discussed in[63].
Artificial Intelligence needs real Intelligence to guide it. Truly intelligent humans are distinguished from the merely smart by intellectual humility and flexibility: as noted in Robson’s “The Intelligence Trap”[64], they constantly consider the possibility of being wrong, and abandon long-held beliefs when these are invalidated by new evidence. Tetlock’s work on human expertise also emphasizes flexibility’s importance, both in adapting to reality and in problem-solving strategies. As discussed in section 2.2, AI approaches must be flexible too.
Tversky and Kahneman emphasize that, because of its reflex nature, System 1 thinking is prone to bias. Also, because System 2 requires sustained mental effort (which can cause fatigue), System 1 often contaminates System 2 thought, leading to errors or bias. Busby et al[65] cite this work in their excellent article on bias in radiology. An early paper by Egglin and Feinstein considers context bias in radiology[66], where certain aspects of patients’ initial presentation to their clinicians led radiologists to give less weight to alternative diagnoses.
Electronic applications can be biased just as humans are. The sources of bias are several. Symbolic approaches may reflect the biases of their human creators. Machine-learning approaches that rely on humans to specify relevant features/input variables may be biased if the features chosen are inappropriate, or if relevant features are omitted.
If features are discovered entirely by DL, the data itself may be biased or non-representative. An early version of Facebook’s artificial-vision system misidentified bare-chested black males as “primates”[67] because of too few samples in the training data.
Explainability is the ability to describe the internal workings of a particular AI model (which may apply one or more techniques to a practical problem) to a human expert who intimately knows the problem domain but not AI technology. Molnar’s book on Interpretable ML[68] is an excellent reference. From this perspective, ML techniques are classified into “white-box” models (explainable in terms resembling ordinary language) and “black-box” models, which cannot be readily explained because they rely on complex mathematical functions/concepts.
Explainability is determined by the following factors: The choice of technique. In general, Symbolic AI (and techniques that display output as symbols, such as decision trees) are most understandable/explainable.
Statistical techniques are less explainable. Tversky and Kahneman found in their studies of cognitive errors that people find statistical concepts, such as the phenomenon of regression to the mean due to random processes, more difficult to understand than symbols. In the real-life example of the “Monty Hall problem”[69], at least 1000 PhDs, including the great mathematician Paul Erdős, had difficulty believing the correct answer, which is an application of Bayesian reasoning, in which prior probabilities are revised into posterior probabilities when new evidence arrives. Therefore, the explainer must often educate the human expert in statistics before addressing the specifics of the application.
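To make the Monty Hall result concrete, the following minimal sketch enumerates every equally likely combination of car position and first pick for the standard three-door game, using exact fractions rather than simulation. The function name is ours, chosen for illustration; the only assumption is the usual rule that the host always opens a door hiding a goat.

```python
from fractions import Fraction
from itertools import product

def monty_hall_win_probabilities():
    """Exact win probabilities for 'stay' and 'switch' in the 3-door game."""
    stay_wins = Fraction(0)
    switch_wins = Fraction(0)
    cases = list(product(range(3), repeat=2))  # (car position, first pick)
    weight = Fraction(1, len(cases))           # each case is equally likely
    for car, pick in cases:
        if pick == car:
            # The first pick was right, so only staying wins.
            stay_wins += weight
        else:
            # The host must open the one remaining goat door,
            # so the only door left to switch to hides the car.
            switch_wins += weight
    return stay_wins, switch_wins

stay, switch = monty_hall_win_probabilities()
print("stay:", stay, "switch:", switch)
```

The host's action is the "new evidence": it leaves the probability of the first pick unchanged but concentrates the remaining probability on the single unopened door.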
In DNNs, the “explanation” is actually a large set of numbers, corresponding to the weights of the inputs of each “neuron” to the neurons to which it connects, along with descriptions of the mathematical transformation/s involved. This is so far removed from everyday experience as to be practically incomprehensible (though there is active research in converting this information into explanatory visuals).
The classification of a particular technique as “black-box” or “white-box” is somewhat arbitrary, depending on the beholder, and on the domain expert’s background knowledge. For example, Loyola-González[70] classifies Support Vector Machines (SVMs) as “black-box”. However, SVMs, developed by applied statistician Vladimir Vapnik’s group at Bell Labs[71], are mathematically very closely related to regression[72], but try to optimize a different mathematical function (maximized separation between instances of different classes vs minimized sum of squared deviations between observed and predicted values). Multivariate regression (linear, logistic, etc.) is taught in enough practically oriented college-level statistics courses for non-statisticians (e.g., business majors, life scientists, medical researchers) to be widely understood.
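The kinship between the two techniques can be seen by writing their objectives side by side. The formulas below are the standard textbook forms of ordinary least squares and of the soft-margin linear SVM; the notation (weights $w$, intercept $b$, penalty $C$, inputs $x_i$, outputs $y_i$) is chosen here for illustration and is not taken from the cited sources.

```latex
% Linear regression: minimize the sum of squared deviations
\min_{w,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (w^{\top} x_i + b) \bigr)^{2}

% Soft-margin linear SVM, with class labels y_i \in \{-1, +1\}:
% the margin width is 2 / \lVert w \rVert, so minimizing \lVert w \rVert^2
% maximizes the separation between classes
\min_{w,\,b} \; \tfrac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \max\bigl( 0,\; 1 - y_i \,(w^{\top} x_i + b) \bigr)
```

Both fit the same linear function $w^{\top} x + b$; they differ only in the penalty applied to each training instance and in the extra term that rewards a wide margin.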
The complexity of individual problems: Any model with hundreds of input variables (such as the regression models used by macro-economists) will be intrinsically hard to comprehend.
Business-Rule systems are naturally expressed in ordinary language, and so are, in principle, highly explainable. However, R1, devised by McDermott[73] to configure Digital Equipment Corporation’s VAX minicomputers based on a customer’s needs, eventually used 2500 rules. Proving that a BRS is internally consistent (that is, no rule contradicts any other rule in the system) is known to be combinatorially hard. “Understanding” the principles of a large BRS does not make it any easier to debug if its output is incorrect.
Whether human-understandable input needs to be modified into an unfamiliar form to make it amenable to computation. This is the case with SVMs when employed for optical character recognition: the image of each letter is converted to a set of numeric features. In the extreme case, radiographic images are transformed by DNNs from individual pixels into hundreds of features that are “discovered” from the raw data, with each subsequent layer in the DNN representing composite features of increasing complexity.
The concerns about explainability are closely tied to two risks: Bias: If you cannot explain the application (to a human expert, or to a jury if the application’s use is challenged legally), how can you show that it is not biased? “Because the computer says so” is unpersuasive.
Failure: DNNs that process images often make unexplained, bizarre mistakes (misidentifications or failure to identify), as noted by Heaven[74]. Explanations for such mistakes’ origins are not obvious in “post-mortems” even to DNN experts. One approach to forestalling such errors is to deliberately attempt to fool image-classification DNNs by generating “fakes” using another “adversary” DNN to make tweaks (minor or not-so-minor) to authentic images, which are then supplied as training input to the classification-DNN[75]. However, while adversarial networks have reduced misidentifications, they do not offer cast-iron guarantees that a mistake will never be made. As in the cliché, absence of evidence (of defects) is not evidence of absence.
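The idea of a small, deliberate tweak that fools a classifier can be illustrated on a toy scale. The sketch below is not the adversary-DNN setup described above: it substitutes the simpler fast gradient sign method, applied to a three-input logistic model whose weights and inputs are hypothetical values chosen by us. Each input is nudged by at most `epsilon` in the direction that increases the classification loss.

```python
import math

def predict(weights, bias, x):
    """Probability of the positive class from a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(value):
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift every input by +/- epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to input i is (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

weights, bias = [2.0, -1.5, 0.5], 0.1   # hypothetical trained model
x, label = [0.4, -0.2, 0.3], 1          # hypothetical authentic input, true class 1
epsilon = 0.4                           # hypothetical perturbation budget

x_adv = fgsm_perturb(weights, bias, x, label, epsilon)
print("original :", round(predict(weights, bias, x), 3))
print("perturbed:", round(predict(weights, bias, x_adv), 3))
```

In adversarial training, perturbed inputs such as `x_adv` would be added to the training set with their correct labels, so that the classifier learns to resist this particular kind of tweak.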
Failure can have consequences ranging from the merely frustrating to the near-apocalyptic. A famous example of the latter was the Soviets’ satellite-based Early-Missile-Warning System, which, in 1983, flagged 5 missiles from US sites heading toward the USSR[76]. A retaliatory nuclear strike, which would have started World War 3, was averted by Lt. Col. Stanislav Petrov, who reasoned that this was a false alarm – an intentional US attack would need many more missiles – and disobeyed standing orders (to relay the warning up the command-chain) by deciding to wait for confirming evidence, which never arrived.
In general, such approaches are specific to the problem being addressed, as Molnar makes clear. One can show the impact of the values of individual input variables/features on the output variable (e.g., categorization, risk score) using a technique called Deep Taylor Decomposition (DTD)[77], based on the Taylor series taught in intermediate-level Calculus. Lauritsen et al[78] use DTD as part of an explanation module for predicting four categories of acute critical illness in inpatients based on EHR data. DTD works when the number of input variables is modest (this paper used 33 clinical parameters), and the variables correspond to concepts in the domain. It would not be useful for very numerous, transformed, or automatically discovered variables.
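The principle behind Taylor-based attribution can be shown with a deliberately simplified sketch. This is a single first-order Taylor expansion of a linear score around a reference input, not Deep Taylor Decomposition itself (which applies the idea layer by layer through a DNN); the weights, the reference point and the input values are hypothetical.

```python
def risk_score(weights, bias, x):
    """A linear score standing in for a model's output."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def first_order_relevance(weights, x, reference):
    """Taylor expansion around `reference`: R_i = (df/dx_i) * (x_i - reference_i)."""
    # For a linear model df/dx_i is simply w_i, so the expansion is exact.
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]

weights, bias = [0.8, -0.3, 1.2], 0.5   # hypothetical model parameters
reference = [0.0, 0.0, 0.0]             # hypothetical baseline input
patient = [1.5, 2.0, 0.5]               # hypothetical standardized measurements

relevance = first_order_relevance(weights, patient, reference)
change = risk_score(weights, bias, patient) - risk_score(weights, bias, reference)
print("per-variable relevance:", relevance)
print("total change in score :", change)
```

The per-variable relevances add up to the change in the score relative to the reference, which is what lets a clinician read them as each variable's contribution.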
Sometimes, a detailed technical explanation may not be necessary: one can simply test with enough cases to show that the system’s output matches that of human experts. For images, delineating areas of interest with highlight boxes can draw the user’s attention. (This is a standard technique employed by object-recognition systems on benchmark datasets such as ImageNet). This technique has the drawback that, in case of an erroneous diagnosis, merely drawing the user’s attention to regions of interest may not suffice.
Also, “absence of evidence is not evidence of absence”. For a “black-box” system with a critical bug that manifests under uncommon circumstances, you will discover the problem only when it happens. In a complex-system (non-AI) context, Jon Bentley, in his classic work “Programming Pearls”[79] cites a colleague who implemented what he thought was a performance optimization in a FORTRAN compiler. Two years later, the compiler crashed during use. The colleague traced the crash to his “optimization”, which had never been invoked in the interim and crashed the very first time it was activated in production.
Loyola-González[70] suggests combining a white-box and black-box approach (the order depending on the problem) in a pipeline, so that the output of the first is processed into a more human-understandable form by the second.
Certain software applications for tasks previously requiring specialized human skills have already received FDA approval and are in wide use. For example, smartphone-deployable electrocardiogram (EKG)-interpretation programs report standard EKG parameters as well as a few abnormal signals such as Ventricular Premature Beats. Given the increasing deployment of Software as a Medical Device (SaMD), and the possibility of catastrophic medical error when operated (semi-) autonomously, national regulatory bodies are naturally concerned about standardizing the processes of development and testing of SaMD to prevent such errors.
The FDA has specified an action plan, including guidelines for best ML practices, version control when the algorithm is changed, and protection of patient data[80]. The European Commission’s proposal for regulation is much wider, encompassing uses of AI across all of society[81]: Human Rights Watch has criticized this proposal[82] on the grounds that it currently does not offer sufficient protection for the social safety net when such software functions autonomously to make decisions concerning, for example, eligibility of individuals for benefits.
ML in general, and DL specifically, need lots of data to achieve desired accuracy. Volume alone does not suffice: the data must also be sufficiently diverse (i.e., coming from multiple locales) to minimize bias. The obvious solution, physical pooling of data, faces the following barriers: Data privacy, which is less of an issue with digital radiography, where DICOM metadata containing identifiable information can be removed; and mistrust, a formidable hurdle when academic or commercial consortia bring rivals together. The technique of Federated Learning (FL), originally pioneered by Google as an application of their well-known MapReduce algorithm[83], allows iteratively training an ML model across geographically separated hardware: The ML algorithm is distributed, while data remains local, thereby ensuring data privacy. It can be employed for both statistical and deep learning.
Typically, a central server coordinates computations across multiple distributed clients. At start-up, the server sends the clients initialization information. The clients commence computation. When each client is done, it sends its results back to the server, which collates all clients’ results. For the next iteration, the server sends updates to each client, which then computes again. The process continues until training converges.
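The server-client loop described above can be sketched in a few lines. The example below is a minimal illustration under our own assumptions, not the procedure of any cited study: the model is a one-variable linear regression, the collation step is a sample-size-weighted average of the clients' parameters (one common choice), and the three "sites" hold small hypothetical datasets. Only model parameters and sample counts travel between client and server; the data points never leave their site.

```python
def local_update(weight, bias, data, lr=0.05, epochs=5):
    """Client side: refine the received model using local data only."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        weight -= lr * grad_w
        bias -= lr * grad_b
    return weight, bias, n

def federated_average(client_results):
    """Server side: average client models, weighted by local sample count."""
    total = sum(n for _, _, n in client_results)
    weight = sum(w * n for w, _, n in client_results) / total
    bias = sum(b * n for _, b, n in client_results) / total
    return weight, bias

def train(clients, rounds=200):
    weight, bias = 0.0, 0.0  # initialization sent by the server
    for _ in range(rounds):
        results = [local_update(weight, bias, data) for data in clients]
        weight, bias = federated_average(results)
    return weight, bias

# Three hypothetical sites; every (x, y) point lies on the line y = 3x + 1.
clients = [
    [(0.0, 1.0), (1.0, 4.0)],
    [(2.0, 7.0), (3.0, 10.0), (4.0, 13.0)],
    [(-1.0, -2.0), (0.5, 2.5)],
]
weight, bias = train(clients)
print("learned weight and bias:", round(weight, 4), round(bias, 4))
```

A fixed number of rounds stands in for a convergence test here; a real deployment would also need secure transport and would typically stop when the averaged model stops changing.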
FL’s drawbacks are Internet-based communication overhead, which limits training speed, and greater difficulty of analysis of any detected residual bias. Ng et al[84] provide a detailed technology overview. Sheller et al[85] use FL to replicate prior analysis of a 10-institution brain-tumor-image-dataset derived from The Cancer Genome Atlas (TCGA). Sarma et al[86] describe 3-institution FL-based training on whole-prostate segmentation from MRIs, while Navia-Vazquez et al[87] describe an approach for Federated Logistic Regression.
On balance, by finessing data privacy issues, FL enables addressing problems at scales not previously possible, with the greater data volume and diversity promising better accuracy and generalizability.
See our previous work, Merchant et al[88], for an exploration of this rapidly progressing and revolutionary field. Here, we only provide a basic introduction and address some issues not covered in that paper.
Quantum mechanics describes the rules governing the properties and behavior of matter at the molecular and subatomic levels. Established technologies such as digital photography and nuclear radiography (based on the photoelectric effect), the integrated circuit (based on semi-conduction of electricity by certain materials), and the laser (based on coherent emission of photons) are all applications of quantum mechanics.
Quantum computing (QC) uses the phenomenon of quantum superposition, in which matter at the atomic/subatomic level can exist (briefly) in two different states simultaneously, as the basis for computing hardware design. Unlike the bit in an ordinary computer, which can be either 1 or 0, the quantum bit (“qubit”) can be both 1 and 0 simultaneously, so that an array of N qubits could represent 2^N states simultaneously.
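The exponential growth in the number of states can be made tangible by writing out, on an ordinary computer, the list of amplitudes needed to describe N qubits in an equal superposition. This is only a classical bookkeeping sketch with a function name of our choosing, not a quantum program: it shows that each added qubit doubles the length of the description.

```python
import math

def uniform_superposition(num_qubits):
    """Amplitudes of an equal superposition over all 2**num_qubits basis states."""
    dimension = 2 ** num_qubits
    amplitude = 1 / math.sqrt(dimension)
    return [amplitude] * dimension

for n in (1, 2, 3, 10):
    state = uniform_superposition(n)
    # The squared amplitudes are probabilities, so they sum to 1.
    total_probability = sum(a * a for a in state)
    print(n, "qubits:", len(state), "amplitudes, total probability", round(total_probability, 6))
```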
QC can, in theory, help solve certain computational problems that are intractable for ordinary computers. (Such problems are often loosely called NP-hard, where NP = “non-deterministic polynomial”[89], although integer factoring, discussed below, belongs to NP but is not known to be NP-hard.) The time taken to solve such a problem by brute force (i.e., trying out every possible solution) increases exponentially as the problem size grows linearly. For example, cracking the widely used Advanced Encryption Standard-256 (with 256-bit keys) by brute force would take all the world’s (non-quantum) computers, working together, longer than the age of the Universe. In 1994, Peter Shor’s theoretical work[90] showed that a “quantum computer” with enough qubits could factor the product of 2 large prime numbers (the problem underlying RSA public-key encryption, rather than AES-256) in polynomial time, making cryptographic attacks on such schemes feasible.
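The scale of the brute-force search can be checked with simple arithmetic. In the sketch below, the search rate of 10^18 keys per second is a hypothetical, deliberately generous figure chosen by us, and the age of the Universe is rounded to 1.38 × 10^10 years; neither number comes from the cited references.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10   # approximate value
HYPOTHETICAL_RATE = 1e18          # keys tried per second (assumed)

def years_to_try_all_keys(key_bits, keys_per_second):
    """Time to enumerate every possible key of the given length."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR

for bits in (64, 128, 256):
    years = years_to_try_all_keys(bits, HYPOTHETICAL_RATE)
    print(bits, "bit key:", f"{years:.3g}", "years")
```

Each additional key bit doubles the search time, which is the exponential growth referred to above.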
The physical challenge is to keep the qubits stable long enough to accomplish some computation (thus far, such stability has been achieved at temperatures close to absolute zero). In addition, for a computer based on qubits, prototypical work suggests that replacing the electrically conducting elements (the interconnecting wires in an integrated circuit) with light-conducting elements (so-called optical computing[91]) may be the way forward[92].
There are also theoretical considerations as to the kinds of problems for which QC will offer benefits. Thus, Aaronson[93] points out that we don’t yet know if the class of problems involved in the optimization (training) phase of DNNs will benefit: while we can hope that they do, the simulations must still be performed to show that this will be the case. Similar concerns are echoed by Sarma[94], who expresses uncertainty about the timeline for QC to become commercially feasible.
Despite the risks of hype and disillusion, it may be worth remembering Arthur C. Clarke’s dictum about the future: “If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong”[95]. If quantum computing becomes commercially viable, almost every aspect of computing (and therefore, every technology that depends on computing) will benefit vastly. The Quantum Internet, Intelligent Edge devices, Edge Computing, Quantum Artificial Intelligence and its algorithms, and their applications in Augmented Reality/Virtual Reality and a more immersive Metaverse experience (for teaching/simulations, actual interactions, etc.) are some of the exciting future developments/enhancements based on Quantum Computing that we have discussed in our previous paper.
Combining the wisdom (of both knowledge and meta-knowledge – i.e., problem-solving strategies) gained over the years, with the tremendous versatility of AI algorithms will maximize the utility of AI applications in medical imaging for everyday clinical care. However, scaling up the use of multiple algorithmic strategies and sources of evidence is challenging. Because of its sheer diversity and volume, radiologists’ experiential knowledge is very hard to encode in a form that allows instant retrieval. This difficulty applies even to its subset, “artificial general intelligence” (AGI), also known as “common sense”. Common sense, apart from being not so common across humans, turns out to be surprisingly hard to implement, because of the sheer breadth of information that must be encoded into computable form.
We see two ways forward, the first long-term and less feasible, the second possible today: allocating massive effort and resources to create medical/radiology AGI; or using software technology (including AI) to extend the human mind, much as Web search engines have vastly democratized access to considerable specialized knowledge.
In the latter approach, AI technology can be ubiquitous, integrated, and often functioning behind the scenes for tedious, monotonous and time-consuming tasks (as suggested by Krupinski[1]), but still leaving humans in control of critical decisions.
Provenance and peer review: Invited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Radiology, nuclear medicine and medical imaging
Country/Territory of origin: India
Peer-review report’s scientific quality classification
Grade A (Excellent): 0
Grade B (Very good): B
Grade C (Good): C
Grade D (Fair): D
Grade E (Poor): 0
P-Reviewer: Jamshidi M, Czech Republic; Sato H, Japan; Singh M, United States; A-Editor: Yao QG, China; S-Editor: Liu JH; L-Editor: A; P-Editor: Liu JH
1. | Krupinski EA. Artificial Intelligence and Teleradiology: Like It or Leave It? 2019. Southwest Telehealth Resource Center Blog. Last Accessed: Dec 9, 2021. Available from: https://southwesttrc.org/blog/2019/artificial-intelligence-teleradiology-it-or-leave-it. [Cited in This Article: ] |
2. | Ross R. Principles of the Business Rule Approach. Pearson Education Inc: Boston; 2003. 372 p. ISBN: 020-178-8934. [Cited in This Article: ] |
3. | Browne P. JBoss Drools Business Rules. Packt Publishing: Birmingham, UK; 2009 ISBN: 9781847196064. [Cited in This Article: ] |
4. | Apt KR. Principles of Constraint Programming. Cambridge University Press: Cambridge, UK; 2003. ISBN: 052-182-5830. [Cited in This Article: ] |
5. | Stevenson WJ. Operations Management. 14th ed. McGraw-Hill: New York, NY; 2020. ISBN: 978-1260238891. [Cited in This Article: ] |
6. | Frontline Solvers. Solver Technology: Optimization. 2021. Last Accessed: Dec 10, 2021. Available from: https://www.solver.com/solver-technology. [Cited in This Article: ] |
7. | Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press: Cambridge, UK; 2000. ISBN: 978-0-521-77362-1. [Cited in This Article: ] |
8. | Brownlee J. Ordinal and One-Hot Encodings for Categorical Data. 2020. Last Accessed: Dec 1, 2021. Available from: https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/. [Cited in This Article: ] |
9. | LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 36149] [Cited by in F6Publishing: 18336] [Article Influence: 2037.3] [Reference Citation Analysis (0)] |
10. | Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J Physiol. 1962;160:106-154. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 8487] [Cited by in F6Publishing: 6899] [Article Influence: 265.3] [Reference Citation Analysis (0)] |
11. | Brownlee J. A Gentle Introduction to Transfer Learning for Deep Learning. 2019. Last Accessed: Dec 1, 2021. Available from: https://machinelearningmastery.com/transfer-learning-for-deep-learning/. [Cited in This Article: ] |
12. | Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. Available from: https://arxiv.org/abs/1810.04805v2. [Cited in This Article: ] |
13. | Lee J, Yoon W, Kim S, Kim D, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36:1234-1240. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 560] [Cited by in F6Publishing: 1127] [Article Influence: 281.8] [Reference Citation Analysis (0)] |
14. | Choudhary A, Tong L, Zhu Y, Wang MD. Advancing Medical Imaging Informatics by Deep Learning-Based Domain Adaptation. Yearb Med Inform. 2020;29:129-138. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 51] [Cited by in F6Publishing: 25] [Article Influence: 6.3] [Reference Citation Analysis (0)] |
15. | Sze V, Chen Y-H, Yang T-J, Emer J. Efficient Processing of Deep Neural Networks. Morgan & Claypool Publishers: New York, NY; 2020. 342 p. ISBN: 978-1681738352. [Cited in This Article: ] |
16. | Wikipedia. Graphical Processing Unit. 2021. Available from: https://en.wikipedia.org/wiki/Graphics_processing_unit. [Cited in This Article: ] |
17. | Winkler JK, Fink C, Toberer F, Enk A, Deinlein T, Hofmann-Wellenhof R, Thomas L, Lallas A, Blum A, Stolz W, Haenssle HA. Association Between Surgical Skin Markings in Dermoscopic Images and Diagnostic Performance of a Deep Learning Convolutional Neural Network for Melanoma Recognition. JAMA Dermatol. 2019;155:1135-1141. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 125] [Cited by in F6Publishing: 146] [Article Influence: 29.2] [Reference Citation Analysis (0)] |
18. | Yi PH, Malone PS, Lin CT, Filice RW. Deep Learning Algorithms for Interpretation of Upper Extremity Radiographs: Laterality and Technologist Initial Labels as Confounding Factors. AJR Am J Roentgenol. 2022;218:714-715. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 5] [Cited by in F6Publishing: 7] [Article Influence: 2.3] [Reference Citation Analysis (0)] |
19. | Harrell FE. Regression Modeling Strategies with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer; 2015. ISBN: 978-3319194240. [Cited in This Article: ] |
20. | Li X, Chen S, Hu X, Yang J. Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019; 2682-2690. Available from: https://openaccess.thecvf.com/content_CVPR_2019/html/Li_Understanding_the_Disharmony_Between_Dropout_and_Batch_Normalization_by_Variance_CVPR_2019_paper.html. [Cited in This Article: ] |
21. | Oakden-Rayner L, Dunnmon J, Carneiro G, Ré C. Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging. Proc ACM Conf Health Inference Learn (2020). 2020;2020:151-159. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 73] [Cited by in F6Publishing: 113] [Article Influence: 28.3] [Reference Citation Analysis (0)] |
22. | Rueckel J, Trappmann L, Schachtner B, Wesp P, Hoppe BF, Fink N, Ricke J, Dinkel J, Ingrisch M, Sabel BO. Impact of Confounding Thoracic Tubes and Pleural Dehiscence Extent on Artificial Intelligence Pneumothorax Detection in Chest Radiographs. Invest Radiol. 2020;55:792-798. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 11] [Cited by in F6Publishing: 11] [Article Influence: 2.8] [Reference Citation Analysis (0)] |
23. | Oakden-Rayner L, Dunnmon J, Carneiro G, Ré C. Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging. Proc ACM Conf Health Inference Learn (2020). 2020;2020:151-159. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 73] [Cited by in F6Publishing: 113] [Article Influence: 28.3] [Reference Citation Analysis (0)] |
24. | Nakkiran P, Kaplun G, Bansal Y, Yang T, Barak B, Sutskever I. Deep Double Descent: Where Bigger Models and More Data Hurt. 2019. [DOI] [Cited in This Article: ] |
25. | Rayan JC, Reddy N, Kan JH, Zhang W, Annapragada A. Binomial Classification of Pediatric Elbow Fractures Using a Deep Learning Multiview Approach Emulating Radiologist Decision Making. Radiol Artif Intell. 2019;1:e180015. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 57] [Cited by in F6Publishing: 38] [Article Influence: 7.6] [Reference Citation Analysis (0)] |
26. | Wu M, Chai Z, Qian G, Lin H, Wang Q, Wang L, Chen H. Development and Evaluation of a Deep Learning Algorithm for Rib Segmentation and Fracture Detection from Multicenter Chest CT Images. Radiol Artif Intell. 2021;3:e200248. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 5] [Cited by in F6Publishing: 13] [Article Influence: 4.3] [Reference Citation Analysis (0)] |
27. | Patel R, Thong EHE, Batta V, Bharath AA, Francis D, Howard J. Automated Identification of Orthopedic Implants on Radiographs Using Deep Learning. Radiol Artif Intell. 2021;3:e200183. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 7] [Cited by in F6Publishing: 15] [Article Influence: 5.0] [Reference Citation Analysis (0)] |
28. | Thian YL, Ng D, Hallinan JTPD, Jagmohan P, Sia SY, Tan CH, Ting YH, Kei PL, Pulickal GG, Tiong VTY, Quek ST, Feng M. Deep Learning Systems for Pneumothorax Detection on Chest Radiographs: A Multicenter External Validation Study. Radiol Artif Intell. 2021;3:e200190. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 20] [Cited by in F6Publishing: 18] [Article Influence: 6.0] [Reference Citation Analysis (0)] |
29. | Pan I. Deep Learning for Pulmonary Embolism Detection: Tackling the RSNA 2020 AI Challenge. Radiol Artif Intell. 2021;3:e210068. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 2] [Cited by in F6Publishing: 3] [Article Influence: 1.0] [Reference Citation Analysis (0)] |
30. | Jacobs C, Setio AAA, Scholten ET, Gerke PK, Bhattacharya H, M Hoesein FA, Brink M, Ranschaert E, de Jong PA, Silva M, Geurts B, Chung K, Schalekamp S, Meersschaert J, Devaraj A, Pinsky PF, Lam SC, van Ginneken B, Farahani K. Deep Learning for Lung Cancer Detection on Screening CT Scans: Results of a Large-Scale Public Competition and an Observer Study with 11 Radiologists. Radiol Artif Intell. 2021;3:e210027. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 4] [Cited by in F6Publishing: 21] [Article Influence: 7.0] [Reference Citation Analysis (0)] |
31. | Tavaziva G, Harris M, Abidi SK, Geric C, Breuninger M, Dheda K, Esmail A, Muyoyeta M, Reither K, Majidulla A, Khan AJ, Campbell JR, David PM, Denkinger C, Miller C, Nathavitharana R, Pai M, Benedetti A, Ahmad Khan F. Chest X-ray Analysis With Deep Learning-Based Software as a Triage Test for Pulmonary Tuberculosis: An Individual Patient Data Meta-Analysis of Diagnostic Accuracy. Clin Infect Dis. 2022;74:1390-1400. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 10] [Cited by in F6Publishing: 28] [Article Influence: 9.3] [Reference Citation Analysis (0)] |
32. | Namiri NK, Flament I, Astuto B, Shah R, Tibrewala R, Caliva F, Link TM, Pedoia V, Majumdar S. Deep Learning for Hierarchical Severity Staging of Anterior Cruciate Ligament Injuries from MRI. Radiol Artif Intell. 2020;2:e190207. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 16] [Cited by in F6Publishing: 27] [Article Influence: 6.8] [Reference Citation Analysis (0)] |
33. | Krogue JD, Cheng KV, Hwang KM, Toogood P, Meinberg EG, Geiger EJ, Zaid M, McGill KC, Patel R, Sohn JH, Wright A, Darger BF, Padrez KA, Ozhinsky E, Majumdar S, Pedoia V. Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning. Radiol Artif Intell. 2020;2:e190023. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 83] [Cited by in F6Publishing: 68] [Article Influence: 17.0] [Reference Citation Analysis (0)] |
34. | Horng S, Liao R, Wang X, Dalal S, Golland P, Berkowitz SJ. Deep Learning to Quantify Pulmonary Edema in Chest Radiographs. Radiol Artif Intell. 2021;3:e190228. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 11] [Cited by in F6Publishing: 13] [Article Influence: 4.3] [Reference Citation Analysis (0)] |
35. | Commandeur F, Goeller M, Razipour A, Cadet S, Hell MM, Kwiecinski J, Chen X, Chang HJ, Marwan M, Achenbach S, Berman DS, Slomka PJ, Tamarappoo BK, Dey D. Fully Automated CT Quantification of Epicardial Adipose Tissue by Deep Learning: A Multicenter Study. Radiol Artif Intell. 2019;1:e190045. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 49] [Cited by in F6Publishing: 73] [Article Influence: 14.6] [Reference Citation Analysis (0)] |
36. | Schoepf UJ, Abadia AF. Greasing the Skids: Deep Learning for Fully Automated Quantification of Epicardial Fat. Radiol Artif Intell. 2019;1:e190140. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 1] [Cited by in F6Publishing: 1] [Article Influence: 0.2] [Reference Citation Analysis (0)] |
37. | Eijgelaar RS, Visser M, Müller DMJ, Barkhof F, Vrenken H, van Herk M, Bello L, Conti Nibali M, Rossi M, Sciortino T, Berger MS, Hervey-Jumper S, Kiesel B, Widhalm G, Furtner J, Robe PAJT, Mandonnet E, De Witt Hamer PC, de Munck JC, Witte MG. Robust Deep Learning-based Segmentation of Glioblastoma on Routine Clinical MRI Scans Using Sparsified Training. Radiol Artif Intell. 2020;2:e190103. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 14] [Cited by in F6Publishing: 7] [Article Influence: 1.8] [Reference Citation Analysis (0)] |
38. | Wu S, Li H, Quang D, Guan Y. Three-Plane-assembled Deep Learning Segmentation of Gliomas. Radiol Artif Intell. 2020;2:e190011. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 17] [Cited by in F6Publishing: 14] [Article Influence: 3.5] [Reference Citation Analysis (0)] |
39. | Nakamura Y, Higaki T, Tatsugami F, Zhou J, Yu Z, Akino N, Ito Y, Iida M, Awai K. Deep Learning-based CT Image Reconstruction: Initial Evaluation Targeting Hypovascular Hepatic Metastases. Radiol Artif Intell. 2019;1:e180011. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 29] [Cited by in F6Publishing: 53] [Article Influence: 10.6] [Reference Citation Analysis (0)] |
40. | Vorontsov E, Cerny M, Régnier P, Di Jorio L, Pal CJ, Lapointe R, Vandenbroucke-Menu F, Turcotte S, Kadoury S, Tang A. Deep Learning for Automated Segmentation of Liver Lesions at CT in Patients with Colorectal Cancer Liver Metastases. Radiol Artif Intell. 2019;1:180014. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 50] [Cited by in F6Publishing: 57] [Article Influence: 11.4] [Reference Citation Analysis (1)] |
41. | Humpire-Mamani GE, Bukala J, Scholten ET, Prokop M, van Ginneken B, Jacobs C. Fully Automatic Volume Measurement of the Spleen at CT Using Deep Learning. Radiol Artif Intell. 2020;2:e190102. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 9] [Cited by in F6Publishing: 18] [Article Influence: 6.0] [Reference Citation Analysis (0)] |
42. | Christensen S, Mlynash M, MacLaren J, Federau C, Albers GW, Lansberg MG. Optimizing Deep Learning Algorithms for Segmentation of Acute Infarcts on Non-Contrast Material-enhanced CT Scans of the Brain Using Simulated Lesions. Radiol Artif Intell. 2021;3:e200127. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 1] [Cited by in F6Publishing: 1] [Article Influence: 0.3] [Reference Citation Analysis (0)] |
43. | Merino JM, Carpintero I, Alvarez T, Rodrigo J, Sánchez J, Coello JM. Tuberculous pleural effusion in children. Chest. 1999;115:26-30. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 47] [Cited by in F6Publishing: 49] [Article Influence: 2.0] [Reference Citation Analysis (0)] |
44. | Most SB, Simons DJ, Scholl BJ, Jimenez R, Clifford E, Chabris CF. How not to be seen: the contribution of similarity and selective ignoring to sustained inattentional blindness. Psychol Sci. 2001;12:9-17. [PubMed] [DOI] [Cited in This Article: ] [Cited by in Crossref: 231] [Cited by in F6Publishing: 199] [Article Influence: 8.7] [Reference Citation Analysis (0)] |
45. | ImageNet.org. ImageNet: About. 2021. Last Accessed: Dec 10, 2021. Available from: https://image-net.org/about.php. |
46. | Shead S. Researchers: Are we on the cusp of an ‘AI winter’? BBC News. 2020. Available from: https://www.bbc.com/news/technology-51064369. |
47. | Wikipedia. AI Winter. 2020. Available from: https://en.wikipedia.org/wiki/AI_winter. |
48. | McDermott D. Artificial Intelligence meets Natural Stupidity. ACM SIGART Bulletin. 1976;57:4-9. |
49. | Adams JL. Conceptual Blockbusting (4th Ed). Basic Books: San Francisco, CA; 2001. ISBN: 978-0738205373. |
50. | Lawton G. Neuro-symbolic AI emerges as powerful new approach. 2020. Last Accessed: Dec 10, 2021. Available from: https://searchenterpriseai.techtarget.com/feature/Neuro-symbolic-AI-seen-as-evolution-of-artificial-intelligence. |
51. | Rudie JD, Rauschecker AM, Xie L, Wang J, Duong MT, Botzolakis EJ, Kovalovich A, Egan JM, Cook T, Bryan RN, Nasrallah IM, Mohan S, Gee JC. Subspecialty-Level Deep Gray Matter Differential Diagnoses with Deep Learning and Bayesian Networks on Clinical Brain MRI: A Pilot Study. Radiol Artif Intell. 2020;2:e190146. |
52. | Kwon YJF, Toussie D, Finkelstein M, Cedillo MA, Maron SZ, Manna S, Voutsinas N, Eber C, Jacobi A, Bernheim A, Gupta YS, Chung MS, Fayad ZA, Glicksberg BS, Oermann EK, Costa AB. Combining Initial Radiographs and Clinical Variables Improves Deep Learning Prognostication in Patients with COVID-19 from the Emergency Department. Radiol Artif Intell. 2021;3:e200098. |
53. | Jamshidi MB, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, Spada L, Mirmozafari M, Dehghani M, Sabet A, Roshani S, Bayat-Makou N, Mohamadzade B, Malek Z, Jamshidi A, Kiani S, Hashemi-Dezaki H, Mohyuddin W. Artificial Intelligence and COVID-19: Deep Learning Approaches for Diagnosis and Treatment. IEEE Access. 2020;8:109581-109595. |
54. | Helb D, Jones M, Story E, Boehme C, Wallace E, Ho K, Kop J, Owens MR, Rodgers R, Banada P, Safi H, Blakemore R, Lan NT, Jones-López EC, Levi M, Burday M, Ayakaka I, Mugerwa RD, McMillan B, Winn-Deen E, Christel L, Dailey P, Perkins MD, Persing DH, Alland D. Rapid detection of Mycobacterium tuberculosis and rifampin resistance by use of on-demand, near-patient technology. J Clin Microbiol. 2010;48:229-237. |
55. | Bartels RH, Meijer FJ, van der Hoeven H, Edwards M, Prokop M. Midline shift in relation to thickness of traumatic acute subdural hematoma predicts mortality. BMC Neurol. 2015;15:220. |
56. | Chiewvit P, Tritakarn SO, Nanta-aree S, Suthipongchai S. Degree of midline shift from CT scan predicted outcome in patients with head injuries. J Med Assoc Thai. 2010;93:99-107. |
57. | Xiao F, Liao CC, Huang KC, Chiang IJ, Wong JM. Automated assessment of midline shift in head injury patients. Clin Neurol Neurosurg. 2010;112:785-790. |
58. | Simon LV, Newton EJ. Basilar Skull Fractures. StatPearls. Treasure Island (FL) 2022. Available from: https://www.ncbi.nlm.nih.gov/pubmed/29261908. |
59. | Rueckel J, Huemmer C, Fieselmann A, Ghesu FC, Mansoor A, Schachtner B, Wesp P, Trappmann L, Munawwar B, Ricke J, Ingrisch M, Sabel BO. Pneumothorax detection in chest radiographs: optimizing artificial intelligence system for accuracy and confounding bias reduction using in-image annotations in algorithm training. Eur Radiol. 2021;31:7888-7900. |
60. | Meng XH, Wu DJ, Wang Z, Ma XL, Dong XM, Liu AE, Chen L. A fully automated rib fracture detection system on chest CT images and its impact on radiologist performance. Skeletal Radiol. 2021;50:1821-1828. |
61. | Perera TB, King KC. Flail Chest. StatPearls. Treasure Island (FL) 2022. Available from: https://www.ncbi.nlm.nih.gov/pubmed/30475563. |
62. | Mageit S. Artificial intelligence delivers boost for heart attack patients. NHS Royal Free London NHS Foundation Trust 2021. Last Accessed: May 18, 2021. Available from: https://www.royalfree.nhs.uk/news-media/news/artificial-intelligence-delivers-boost-for-heart-attack-patients/. |
63. | Ewumi O. AI in 3-D Bioprinting. 2021. Last Accessed: Jan 18, 2022. Available from: https://dataconomy.com/2021/04/ai-in-3d-bioprinting/. |
64. | Robson D. The Intellect Trap. W. W. Norton & Company; 2019. ISBN: 978-0393651423. |
65. | Busby LP, Courtier JL, Glastonbury CM. Bias in Radiology: The How and Why of Misses and Misinterpretations. Radiographics. 2018;38:236-247. |
66. | Egglin TK, Feinstein AR. Context bias. A problem in diagnostic radiology. JAMA. 1996;276:1752-1755. |
67. | Mac R. Facebook Apologizes After A.I. Puts ‘Primates’ Label on Video of Black Men. New York Times. 2021. Sept 3, 2021. Available from: https://www.nytimes.com/2021/09/03/technology/facebook-ai-race-primates.html. |
68. | Molnar C. Interpretable machine learning: A Guide for Making Black Box Models Explainable; 2019. |
69. | Wikipedia. Monty Hall Problem. 2018. Available from: https://en.wikipedia.org/wiki/Monty_Hall_problem. |
70. | Loyola-Gonzalez O. Black-Box vs. White-Box: Understanding Their Advantages and Weaknesses From a Practical Point of View. IEEE Access. 2019;7:154096-154113. |
71. | Cortes C, Vapnik V. Support-Vector Networks. Machine Learning. 1995;20(3):273-297. |
72. | Hastie T, Tibshirani R, Friedman J. Elements of Statistical Learning: Data Mining, Inference and Prediction (2nd Ed). Springer: Stanford, CA; 2008. |
73. | McDermott J. R1: An Expert in the Computer Systems Domain. Proceedings of the First AAAI Conference on Artificial Intelligence; Stanford, California 1980. Available from: https://aaai.org/Papers/AAAI/1980/AAAI80-076.pdf. |
74. | Heaven D. Why deep-learning AIs are so easy to fool. Nature. 2019;574:163-166. |
75. | Brownlee J. A Gentle Introduction to Generative Adversarial Networks (GANs). 2019. Last Accessed: Dec 1, 2021. Available from: https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/. |
76. | Wikipedia. 1983 Soviet nuclear false alarm incident. 2021. Available from: https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alarm_incident. |
77. | Montavon G, Bach S, Binder A, Samek W, Muller K-R. Explaining NonLinear Classification Decisions with Deep Taylor Decomposition. Pattern Recognition. 2017;65:211-222. Available from: https://arxiv.org/pdf/1512.02479.pdf. |
78. | Lauritsen SM, Kristensen M, Olsen MV, Larsen MS, Lauritsen KM, Jørgensen MJ, Lange J, Thiesson B. Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nat Commun. 2020;11:3852. |
79. | Bentley JL. Programming Pearls (2nd edition). Addison-Wesley: Reading, MA; 1999. ISBN: 0-201-11889-0. |
80. | US Food and Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device: Action Plan. 2019 (updated 2021). Last Accessed: 4/1/2022. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device. |
81. | European Commission. Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts. April 21 2021. Last Accessed: 4/1/2022. Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021PC0206. |
82. | Human Rights Watch. How the EU’s Flawed Artificial Intelligence Regulation Endangers the Social Safety Net: Questions and Answers. 2022. Last Accessed: 4/1/2022. Available from: https://www.hrw.org/news/2021/11/10/how-eus-flawed-artificial-intelligence-regulation-endangers-social-safety-net. |
83. | Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters. Sixth Symposium on Operating System Design and Implementation; 2004. Available from: http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf. |
84. | Ng D, Lan X, Yao MM, Chan WP, Feng M. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quant Imaging Med Surg. 2021;11:852-857. |
85. | Sheller MJ, Edwards B, Reina GA, Martin J, Pati S, Kotrotsou A, Milchenko M, Xu W, Marcus D, Colen RR, Bakas S. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep. 2020;10:12598. |
86. | Sarma KV, Harmon S, Sanford T, Roth HR, Xu Z, Tetreault J, Xu D, Flores MG, Raman AG, Kulkarni R, Wood BJ, Choyke PL, Priester AM, Marks LS, Raman SS, Enzmann D, Turkbey B, Speier W, Arnold CW. Federated learning improves site performance in multicenter deep learning without data sharing. J Am Med Inform Assoc. 2021;28:1259-1264. |
87. | Navia-Vazquez A, Vazquez-Lopez M, Cid-Sueiro J. Double Confidential Federated Machine Learning Logistic Regression for Industrial Data Platforms. Proceedings of the 37th International Conference on Machine Learning; Vienna, Austria 2020. Available from: http://www.tsc.uc3m.es/~navia/FL-ICML2020/DCFML-FL_ICML2020-A_Navia_Vazquez_et_al.pdf. |
88. | Merchant SA, Shaikh MJ, Nadkarni PM. Tuberculosis Conundrum - Current and Future Scenarios: A proposed comprehensive approach combining Laboratory, Imaging and Computing Advances. World J Radiol. 2022. |
89. | Wikipedia. NP-hardness. Wikimedia Foundation; 2022. Last Accessed: 8/3/2015. Available from: https://en.wikipedia.org/wiki/NP-hardness. |
90. | Shor PW. Algorithms for quantum computation: discrete logarithms and factoring. Proceedings 35th Annual Symposium on Foundations of Computer Science. IEEE Comput. Soc.; 1994. p. 124-134. |
91. | Wikipedia. Optical Computing. 2022. Last Accessed: 4/1/2022. Available from: https://en.wikipedia.org/wiki/Optical_computing. |
92. | Pires F. Research Opens the Door to Fully Light-Based Quantum Computing. 2020. Last Accessed: 4/1/2022. Available from: https://www.tomshardware.com/news/research-opens-the-door-to-fully-light-based-quantum-computing. |
93. | Aaronson S. The Limits of Quantum Computers. Scientific American. 62-69. |
94. | Sarma SD. Quantum computing has a hype problem. MIT Technology Review. March 28, 2022. Available from: https://www.technologyreview.com/2022/03/28/1048355/quantum-computing-has-a-hype-problem/. |
95. | Clarke AC. Arthur C. Clarke Quotes. 2022. Last Accessed: 4/1/2022. Available from: https://www.brainyquote.com/quotes/arthur_c_clarke_100793. |