Content-based Image Retrieval: Portal Venous CT of Liver Lesions
 
Authors:
Sandy Napel, PhD, Stanford University School of Medicine; Christopher F. Beaulieu, MD, PhD; Jessica S. Faruque, MS; Cesar Rodriguez, MD; Daniel Korenblum, MS; Jing-Yu Cui, MS; Jiajing Xu, MS; Ankit Gupta, MS; Hayit Greenspan, PhD; Grace Tye, MD; Daniel L. Rubin, MD, MS
 
Hypothesis:
Semantic and computer-derived image features can be combined to retrieve images demonstrating similarly appearing liver lesions from a database of portal venous CT scans.
 
Introduction:
Although the volume of digital radiological image data stored on PACS worldwide is already measured in petabytes, it is currently impossible for a radiologist or researcher to search for images that are similar to a given image. Radiologists rely on training, experience, and memory for their image interpretation tasks and, as a result, exhibit variable performance in terms of accuracy and efficiency. The ability to access similar images together with their associated metadata, including biopsy results, successful therapies, and survival, could result in improved performance across the spectrum of practicing radiologists, and could also facilitate knowledge discovery regarding relationships between disease and imaging phenotype. We built a prototype system that allows us to explore these concepts and, as a first step, applied it to a database of portal venous liver CT scans exhibiting various types of liver lesions.
 
Methods:
We used a combination of semantic features described by radiologists and pixel-based features derived by computer algorithms to create rich feature vectors describing each image, as described next:

Semantic Features: We used OsiriX[1] to review CT studies and to identify lesions with a Region-of-Interest (ROI), and a custom-developed plug-in, called image Physician Annotation Device (iPAD),[2] to annotate them with semantic features. Users are prompted to describe lesions according to 12 categories of semantic description, enabling a thorough description of each lesion. As the user types, iPAD checks on-the-fly for direct mapping to 161 terms from an augmented list of RadLex terms. The resulting coded annotation is stored in a file compliant with the AIM (Annotation and Image Markup) standard, established by the National Cancer Institute’s cancer Biomedical Informatics Grid (caBIG),[3,4] and uploaded to a database.

Texture Features: We computed multiple features for each lesion based on its pixels, including (a) gray-level histogram-based: 14 features including the histogram itself, the low frequency coefficients of its 3-level Haar wavelet transform, the abscissa of its peak, and its variance,[5] and (b) Gabor features:[6] mean of the energy in the frequency domain over 4 scales and 8 orientations in each of 32 bins, for a total of 46 features.
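
As an illustration only, the histogram-based quantities named above (the histogram, the abscissa of its peak, its variance, and the low-frequency coefficients of a 3-level Haar transform) might be computed along the following lines. The bin count, the intensity range, and all function names are assumptions made for this sketch, and it is not intended to reproduce the exact 14-feature set used in the study.

```python
import math


def gray_level_histogram(pixels, n_bins, lo, hi):
    """Normalized histogram of pixel values over [lo, hi); out-of-range values are clamped."""
    counts = [0] * n_bins
    scale = n_bins / (hi - lo)
    for p in pixels:
        idx = min(max(int((p - lo) * scale), 0), n_bins - 1)
        counts[idx] += 1
    total = len(pixels)
    return [c / total for c in counts]


def haar_lowpass(signal, levels):
    """Approximation (low-frequency) coefficients after `levels` Haar steps."""
    approx = list(signal)
    for _ in range(levels):
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(approx) - 1, 2)
        ]
    return approx


def histogram_features(pixels, n_bins=32, lo=-100.0, hi=300.0):
    """Peak abscissa, variance, and 3-level Haar low-pass coefficients of the histogram.

    n_bins, lo, and hi are hypothetical defaults, not values from the study.
    """
    hist = gray_level_histogram(pixels, n_bins, lo, hi)
    bin_width = (hi - lo) / n_bins
    centers = [lo + (i + 0.5) * bin_width for i in range(n_bins)]
    peak = centers[max(range(n_bins), key=lambda i: hist[i])]
    mean = sum(c * h for c, h in zip(centers, hist))
    variance = sum(h * (c - mean) ** 2 for c, h in zip(centers, hist))
    return {
        "histogram": hist,
        "peak_abscissa": peak,
        "variance": variance,
        "haar_lowpass": haar_lowpass(hist, 3),
    }
```

With 32 bins, three Haar steps reduce the histogram to 4 low-frequency coefficients; a different bin count would change that number.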

Boundary Features: We computed features for boundary sharpness by bi-linearly interpolating profiles along radial line segments that are automatically drawn at many angles from the center outwards towards a dilated version of the ROI boundary. We characterized the difference in intensity between the lesion and the surrounding liver and the sharpness of the margin along this profile using parameters derived by fitting a sigmoid function to each intensity profile, and averaging the parameters across all profiles.
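
The profile-fitting step can be sketched as follows, under stated assumptions. The four-parameter sigmoid form, the grid search over center and width (with base and contrast obtained by linear least squares), and all names are illustrative choices for this example; the text does not specify the fitting procedure actually used.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_profile(radii, intensities, centers, widths):
    """Fit I(r) = base + contrast * logistic((r - center) / width) to one radial profile.

    For each candidate (center, width) on the supplied grids, base and contrast
    are solved in closed form by linear least squares; the best grid point wins.
    """
    n = len(radii)
    mean_y = sum(intensities) / n
    best = None
    for c in centers:
        for w in widths:
            s = [logistic((r - c) / w) for r in radii]
            mean_s = sum(s) / n
            var_s = sum((si - mean_s) ** 2 for si in s)
            if var_s == 0.0:
                continue  # degenerate candidate: sigmoid is flat over the profile
            contrast = sum(
                (si - mean_s) * (yi - mean_y) for si, yi in zip(s, intensities)
            ) / var_s
            base = mean_y - contrast * mean_s
            err = sum(
                (base + contrast * si - yi) ** 2 for si, yi in zip(s, intensities)
            )
            if best is None or err < best[0]:
                best = (err, base, contrast, c, w)
    if best is None:
        raise ValueError("no usable (center, width) candidate")
    _, base, contrast, c, w = best
    return {"base": base, "contrast": contrast, "center": c, "width": w}


def average_fits(fits):
    """Average each fitted parameter across all radial profiles of a lesion."""
    return {key: sum(f[key] for f in fits) / len(fits) for key in fits[0]}
```

In this parameterization, the contrast term corresponds to the intensity difference between lesion and surrounding liver, and a smaller width corresponds to a sharper margin.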

Similarity Metric: All image features described above were used to create a feature vector describing each lesion. Similarity between any pair of images was computed as the negation of a weighted sum of differences between corresponding elements of two feature vectors. Weights were derived using a machine-learning method that maximizes the retrieval performance, as measured by Normalized Discounted Cumulative Gain (NDCG: see Evaluation Metrics, below) compared to a separate Similarity Standard (see below).
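
A minimal sketch of this similarity computation is given below. It assumes that "differences" means absolute element-wise differences and that the weights have already been learned; the function names are hypothetical and the weight-learning step is not shown.

```python
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Negated weighted sum of absolute differences between two feature vectors.

    Identical vectors score 0; increasingly different vectors score more negative.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    return -sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def rank_by_similarity(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[int]:
    """Return database indices ordered from most to least similar to the query."""
    scores = [similarity(query, vec, weights) for vec in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
```

Because the score is a negated distance, sorting in descending order places the closest lesions first.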

Patient Data: We selected 81 portal venous phase CT images from patients demonstrating liver lesions, including 25 cysts, 24 metastases, 14 hemangiomas, 7 hepatocellular carcinomas (HCCs), 6 focal nodular hyperplasias, 3 abscesses, 1 laceration, and 1 focal fat deposit. One radiologist used OsiriX and iPAD to annotate each lesion as described above, followed by completion of each feature vector using automatically derived texture and boundary features.

Similarity Standard: We created a Web-based tool that randomly selects an image from the database and asks the user to rate 5 lesion characteristics (heterogeneity, boundary shape, margin, density, and rim) on a 5-point scale, and computes visual similarity based on the difference between ratings of each characteristic. Five readers used this tool to rate the 81 CT images of lesions. Characteristic-combining weights were derived using a bootstrap approach, optimizing the weights to match results in a subset of the database for which a degree of general visual similarity was agreed upon by a consensus of two radiologists.

Evaluation Metrics: To evaluate performance, we used Precision-Recall (P-R) analysis, which plots precision (the number of similar images retrieved divided by the total number of images retrieved) against recall (the number of similar images retrieved divided by the total number of similar images in the database), where similar images are defined as those having an average similarity score in the Similarity Standard of 3.0 or greater.[7] Because P-R accounts neither for graded truth, as in our 5-point characteristic similarity scale, nor for the order of retrieved images in a ranked list, we also used Normalized Discounted Cumulative Gain (NDCG) as a function of the number of images, K, desired and retrieved.[8] At a given K, higher NDCG(K) means more lesions similar to the query image are ranked ahead of dissimilar ones, with NDCG(K)=1 implying perfect retrieval of K images. Both analyses were conducted by withholding each lesion in turn from the database as the query image, training the weights used in the Similarity Metric described above using the remaining 80 images together with the Similarity Standard, and retrieving images in order of their similarity to the query image using the same metric with the trained weights.
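
For concreteness, both metrics can be sketched for a single ranked retrieval list as follows. The input is the list of Similarity Standard scores of the retrieved images, in ranked order. The exponential gain (2^rel - 1) and the log2 rank discount are one common NDCG formulation and are assumed here, since the text does not state which variant was used; the function names are likewise hypothetical. The 3.0 threshold for "similar" follows the definition above.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (assumed gain: 2**rel - 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the discount is log2(position + 1)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG at k divided by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def precision_recall_at_k(relevances, k, threshold=3.0):
    """Precision and recall after retrieving the top k items (k >= 1).

    An item counts as similar when its score is at or above the threshold.
    """
    similar = [rel >= threshold for rel in relevances]
    total_similar = sum(similar)
    hits = sum(similar[:k])
    precision = hits / k
    recall = hits / total_similar if total_similar else 0.0
    return precision, recall
```

In a leave-one-out evaluation such as the one described, these functions would be applied to the ranked list obtained for each withheld query lesion and the results averaged over all queries.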

 
Results:
Figure 1 shows a precision-recall plot with average precision above 90% at all values of recall, which we consider excellent performance. Figure 2 shows our NDCG(K) results, which corroborate excellent retrieval performance, with average gain above 90% when retrieving 3 or more images. Figure 3 shows an example for a single query image, including the 11 top-ranked and 12 bottom-ranked images, which is representative of the results in general: images that appear most similar to the query image were ranked higher than those that were less so, with a small number of exceptions.

Figure 1: Precision-recall plot, showing average performance (circles; error bars indicate 1 s.d.) in our set of 81 portal venous liver CT images containing various types of lesions. Note excellent performance with average precision > 90% at all values of recall.

Figure 2: Normalized Discounted Cumulative Gain plot showing average gain (circles; error bars indicate 1 s.d.) as a function of the number, K, of top-ranked retrieved images using our 81-lesion database. Note that the average gain is greater than 90% for all queries retrieving 3 or more images.

Figure 3: Retrieval example with a small cyst as the query lesion, with integer rankings and similarity scores in parentheses above each ranked image. The image matrix shows the 11 images with the most similar lesions (top two rows) and the 12 least similar ones (bottom two rows) from our 81-lesion database.

 
Discussion:
Content-based retrieval of medical images is not new,[7,9] but most results to date center on retrieving images of certain types (e.g., CT vs. MR) and/or of particular anatomical regions (e.g., hand vs. head).[10] Retrieval of similarly appearing lesions has the potential to support diagnostic decision-making by providing additional data associated with the retrieved images. While diagnosis of liver lesions from CT scans is a common and important clinical problem,[11,12] our method is quite general: it uses an evolving standard terminology (RadLex) and a standard information model (AIM), making it amenable to other modalities and applications in the future. Despite the excellent results, our study does have limitations. We have not yet included lesion boundary shape, another important imaging feature of lesions, as this requires fairly accurate segmentation. Inclusion of this feature is expected to improve performance further. Other limitations of the study include a small database containing few representatives of several lesion types, and a non-objective similarity standard against which to validate performance.
 
Conclusion:
We have demonstrated a content-based retrieval system for lesions seen on portal venous CT scans that incorporates semantic features observed by radiologists, as well as features computationally extracted from the images themselves, and have shown it to be capable of excellent retrieval results. Our preliminary results encourage development and evaluation in this and other clinical areas. Ultimately, this approach could provide real-time decision support to practicing radiologists by showing them similar images with associated diagnoses and, where available, results of tissue analyses, responses to various therapies, and outcomes.
 
References:
1. Rosset A, Spadola L, Ratib O. OsiriX: An open-source software for navigating in multidimensional DICOM images. J Digit Imaging. 2004;17:205-216. (PMID:15534753)
2. Rubin DL, Rodriguez C, Shah P, Beaulieu C. iPad: Semantic Annotation and Markup of Radiological Images. AMIA Annu Symp Proc. 2008;626-630. (PMID:18999144)
3. Channin DS, Mongkolwat P, Kleper V, Supekar K, Rubin DL. The caBIG Annotation and Image Markup Project. J Digit Imaging. 2009;[Epub ahead of print]. (PMID:19294468)
4. Rubin DL, Mongkolwat P, Kleper V, Supekar K, Channin DS. Annotation and Image Markup: Accessing and Interoperating with the Semantic Content in Medical Imaging. IEEE Intelligent Systems. 2009;24:47-56.
5. Strela V, Heller PN, Strang G, Topiwala P, Heil C. The application of multiwavelet filterbanks to image processing. IEEE Trans Image Process. 1999;8:548-563. (PMID:18262898)
6. Zhao CG, Cheng HY, Huo YL, Zhuang TG. Liver CT-image retrieval based on Gabor texture. In: IEMBS ’04. 26th Annual International Conference of the IEEE. 2004;1491-1494.
7. Muller H, Michoux N, Bandon D, Geissbuhler A. A review of content-based image retrieval systems in medical applications-clinical benefits and future directions. Int J Med Inform. 2004;73:1-23. (PMID:15036075)
8. Jarvelin K, Kekalainen J. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 2002;20:422-446.
9. Kahn CE Jr, Thao C. GoldMiner: A radiology image search engine. AJR Am J Roentgenol. 2007;188:1475-1478. (PMID:17515364)
10. Hersh W, Muller H, Kalpathy-Cramer J. The ImageCLEFmed Medical Image Retrieval Task Test Collection. J Digit Imaging. 2008;[Epub ahead of print]. (PMID:18769965)
11. Kamel IR, Liapi E, Fishman EK. Liver and biliary system: Evaluation by multidetector CT. Radiol Clin North Am. 2005;43:977-997;vii. (PMID:16253658)
12. Marin D, Brancatelli G, Federle MP, Midiri M, Furlan A. Imaging Approach for Evaluation of Focal Liver Lesions. Clin Gastroenterol Hepatol. 2009. (PMID:19348962)