Creation and Management of a Lung Cancer Patient Database
Through a Campus Enterprise Data Warehouse |
| |
| Authors: |
| Vivek V. Dave, MD, Northwestern University; David S. Channin, MD; Mathew G. Blum, MD |
| |
| Background: |
| Lung cancer evaluation and treatment are based on a variety of clinical data. Optimal treatment requires integration and analysis of data from clinical, imaging, surgical, and pathology sources. While this data is, for the most part, stored electronically, it is often fragmented and not integrated in a manner that facilitates retrospective trend analysis, clinical decision making, or research[1,2]. Creation of a problem-centric database requires knowledge of available campus information system resources, the methods used to access and integrate them, and anticipation of future information requirements[1]. We describe a streamlined process of database creation through the use of a campus enterprise data warehouse (EDW).
The EDW exists within the Northwestern University Biomedical Informatics Center (NUBIC), the computing arm of the Northwestern Clinical and Translational Sciences Institute (NUCATS). Oversight comes from the EDW Board of Trustees, composed of senior management of the participating institutions. This group, in conjunction with the institutional review board (IRB) and the director of the EDW, has developed a set of policies and procedures to govern access to and use of the EDW. Within each institution, a data steward, identified for each data source, is responsible for the supervision of those policies and procedures. The EDW comprises approximately 45 different schemas, representing a variety of inpatient and outpatient electronic medical record, billing, and research systems. The total storage used by the EDW exceeds 7 TB and is growing at a rate greater than 1 GB/day. Loading data from transactional systems in an unadulterated fashion provides many benefits, not the least of which is that the data warehouse team can provide early value to stakeholders by enabling greater access to their data without degrading front-end system performance or burdening centralized technology resources. By querying the EDW, we were able to populate a database that could be used for clinical and research decision support. Database creation was supplemented by other electronic and offline resources when necessary. |
| |
| Evaluation: |
| We began by extracting billing data for one of our institution’s cardiothoracic surgeons, which generated a list of 724 lung cancer patients. This was accomplished by searching for patients from 2004–2009 whose invoice diagnosis codes (ICD-9) indicated partial or complete lung resections for presumed malignancy. Clinical notes, radiology imaging, laboratory results, pathology results, and surgical procedure data were then mined from the EDW. Specific data we accessed include patient demographics and social history, dates and results of computed tomography (CT) and positron emission tomography (PET) scans, laboratory and histology data, presence of nodal disease, tumor immunohistochemistry (IHC), stage of tumor, and type of surgical procedure. Initially, we gathered individual data manually by accessing specific systems on campus, but we migrated to EDW access to gather data faster and more efficiently. Wherever possible, data was derived from the EDW using Structured Query Language (SQL) queries via a dedicated research portal.
Imaging data for these patients was extracted from the picture archiving and communication system (GE Centricity PACS). Tumor measurements were determined after images were processed by computer-assisted manual volumetric segmentation. This produced annotations in the Annotation and Image Markup (AIM) standard format, which were added to the database. Some pathology results were extracted from our hospital information system (HIS) and campus tumor registry. Several surgical procedure notes were accessed using the hospital electronic medical record (EMR), and this information often led to more focused data mining from the EDW. Offline sources were used for some surgical and pathology results because some data may have been shared or evaluated at other institutions. |
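The cohort-identification step described above can be illustrated with a minimal sketch. It is not the query the authors ran: the `invoice` table, its column names, the surgeon identifier, and the ICD-9 code prefix `162.` (malignant neoplasm of trachea, bronchus, and lung) are all assumptions chosen for illustration, and an in-memory SQLite database with made-up rows stands in for the EDW research portal.

```python
import sqlite3

# Hypothetical billing table; the real EDW schema is not described in the abstract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice ("
    "patient_id TEXT, surgeon_id TEXT, service_date TEXT, icd9_code TEXT)"
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?)",
    [
        ("P001", "S1", "2005-03-14", "162.3"),
        ("P001", "S1", "2005-04-02", "162.3"),  # same patient, second invoice
        ("P002", "S1", "2003-11-20", "162.9"),  # before the study window
        ("P003", "S1", "2008-07-09", "401.9"),  # unrelated diagnosis
        ("P004", "S2", "2007-01-30", "162.5"),  # different surgeon
        ("P005", "S1", "2009-12-31", "162.5"),  # last day of the window
    ],
)


def find_cohort(conn, surgeon_id, code_prefix, start, end):
    """Return distinct patient IDs billed by one surgeon within a date window
    whose ICD-9 code starts with the given prefix."""
    rows = conn.execute(
        """
        SELECT DISTINCT patient_id
        FROM invoice
        WHERE surgeon_id = ?
          AND icd9_code LIKE ? || '%'
          AND service_date BETWEEN ? AND ?
        ORDER BY patient_id
        """,
        (surgeon_id, code_prefix, start, end),
    ).fetchall()
    return [row[0] for row in rows]


cohort = find_cohort(conn, "S1", "162.", "2004-01-01", "2009-12-31")
print(cohort)
```

Storing dates as ISO 8601 strings lets `BETWEEN` compare them lexicographically, and `DISTINCT` collapses multiple invoices for the same patient into one cohort entry. Passing values as bound parameters rather than formatting them into the SQL string keeps the query reusable for other surgeons, code sets, or date ranges.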
| |
| Discussion: |
| Clinical research today requires a multidisciplinary approach. As such, an organized and efficient method for data integration needs to be in place to elucidate relationships between clinical, imaging, pathology, surgical, and eventually genomic and proteomic data. To analyze this data, a common integrated database needs to be present. We created such a database by utilizing our campus EDW. When the EDW was not available, or did not contain the data or records we needed, alternate electronic sources were accessed. Creation of a clinical database requires an understanding of the electronic resources available on a campus, the format of that data, and the specific methods used to access these resources[1]. Once these are understood, a lung cancer database can be created quickly and subsequently used to analyze a multitude of variables, trends, and clinical data efficiently. |
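The fallback pattern described above, in which the EDW is consulted first and alternate sources fill in what it lacks, might be sketched as follows. The source names, field names, and values are hypothetical and are not taken from the authors' database; the function simply keeps the first non-missing value for each field, in priority order, and records which source supplied it.

```python
def merge_patient_record(patient_id, sources):
    """Merge one patient's fields from several sources.

    `sources` is an ordered list of (source_name, table) pairs, where each
    table maps patient_id -> {field: value}. Earlier sources take priority;
    later ones only fill fields that are still missing.
    """
    record = {"patient_id": patient_id}
    provenance = {}
    for name, table in sources:
        for field, value in table.get(patient_id, {}).items():
            if value is not None and field not in record:
                record[field] = value
                provenance[field] = name
    return record, provenance


# Made-up example data for illustration only.
edw = {"P001": {"stage": "IB", "pet_date": "2005-02-20", "histology": None}}
tumor_registry = {"P001": {"histology": "adenocarcinoma", "stage": "IIA"}}

record, provenance = merge_patient_record(
    "P001", [("edw", edw), ("tumor_registry", tumor_registry)]
)
print(record)
print(provenance)
```

Keeping a provenance map alongside the merged record is one possible design choice: it makes it easy to see later which fields came from the warehouse and which required a supplementary source.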
| |
| Conclusion: |
| Hospital databases and medical records offer an expansive array of resources for clinical and research use. However, these systems are often maintained in a decentralized and fragmented manner. By creating a lung cancer database, mainly from our EDW, we have shown by example how to streamline creation of a problem-centric database. We built this database to facilitate research and clinical decisions for lung cancer patients. Clinical trends can be determined rapidly from this retrospective dataset. With minimal planning, the database can be made portable and extensible in anticipation of future research and clinical needs. Such rapid integration of data could likewise be extended to other areas, such as supplementing clinical data at the time of radiology exam interpretation[3] or patient risk stratification during a clinic visit. |
| |
| References: |
| [1] Dewitt JG, Hampton PM. Development of a Data Warehouse at an Academic Health System: Knowing a Place for the First Time. Acad Med. November 2005;80(11):1019-25.
[2] Grant A, et al. Integrating feedback from a clinical data warehouse into practice organisation. Int J Med Inform. March-April 2006;75(3-4):232-9.
[3] Rubin DL, Desser TS. A Data Warehouse for Integrating Radiology and Pathologic Data. J Am Coll Radiol. March 2008;5(3):210-7. |