Grant Details

Grant Number: 5R21CA202130-02
Primary Investigator: Egleston, Brian
Organization: Research Inst Of Fox Chase Can Ctr
Project Title: Deep Learning for Representation of Codes Used for Seer-Medicare Claims Research
Fiscal Year: 2017


Abstract

DESCRIPTION (provided by applicant): We propose developing an algorithm and user-friendly software to better identify treatments using Medicare claims data. We will validate our approach using procedures listed in the Surveillance, Epidemiology, and End Results (SEER) database as a gold standard. In this way, we hope to better match procedures identified using Medicare claims data with SEER-listed procedures. The focus of this research is observational (i.e., non-randomized) data. Well-run randomized clinical trials can provide the best level of evidence of treatment effects. However, randomized trials in the United States have suffered from poor accrual for many interventions. Although well-designed randomized clinical trials should be the gold standard, well-designed observational studies might be the only method of obtaining inferences concerning comparative effectiveness for some cancer interventions. In cancer research, one of the most commonly used databases for observational research is the linked SEER-Medicare database. SEER-Medicare data have provided useful measurements of the effectiveness of a number of cancer therapies. Algorithms for identifying relevant treatment and diagnosis codes using Medicare data are often based on clinical reasoning and scientific evidence. One group of researchers, for example, developed an algorithm for identifying laparoscopic surgery among kidney cancer cases before claims codes for laparoscopic surgery were well developed. While such algorithms are useful for others pursuing similar investigations, there may still be substantial mismatch between treatment identified by the SEER cancer registry and treatment identified through Medicare claims. In this work, we propose developing a rigorous machine learning algorithm that can help researchers better identify treatments in Medicare claims data.
Specifically, we will design a neural language modeling algorithm and implement a software system that finds vector representations of diagnosis and procedure codes. We plan to use the neural language modeling algorithm to learn vector representations from SEER-Medicare claims data in which related procedure and diagnosis codes are "neighbors" (i.e., closely related). We will investigate whether the codes we identify within neighborhoods correspond to the procedure codes used in published SEER-Medicare studies. We will then design a software assistant interface that will allow an investigator to explore which codes are related to a given seed of diagnosis or procedure codes. Finally, we will investigate the sensitivity and specificity of the algorithm by comparing procedures identified using Medicare claims with procedures listed in the SEER database. We will replicate analyses from a published SEER-Medicare paper to investigate whether estimated treatment effects differ when using our novel algorithm compared with the algorithm in the published paper.
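
As a rough illustration of the "neighbors" idea and of the sensitivity and specificity check described above, the following Python sketch may help. It is not the project's algorithm: it swaps in simple co-occurrence count vectors with cosine similarity in place of a neural language model, and every code string, record, and function name in it is a hypothetical placeholder rather than a real billing code or an interface from the proposed software.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy claims: each inner list holds the codes on one patient's claims.
# These strings are made-up placeholders, not real diagnosis or procedure codes.
claims = [
    ["DX_KIDNEY", "PX_LAP_NEPH", "PX_ANESTH"],
    ["DX_KIDNEY", "PX_OPEN_NEPH", "PX_ANESTH"],
    ["DX_KIDNEY", "PX_LAP_NEPH", "PX_IMAGING"],
    ["DX_BREAST", "PX_MASTECTOMY", "PX_ANESTH"],
    ["DX_BREAST", "PX_MASTECTOMY", "PX_IMAGING"],
]


def cooccurrence_vectors(records):
    """Represent each code by how often it co-occurs with every other code.

    This is a simple stand-in for learned embeddings (assumption of this sketch).
    """
    vocab = sorted({code for record in records for code in record})
    counts = {code: Counter() for code in vocab}
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {code: [counts[code][other] for other in vocab] for code in vocab}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbors(seed, vectors, k=3):
    """Return the k codes whose vectors are most similar to the seed code's vector."""
    scored = [
        (cosine(vectors[seed], vec), code)
        for code, vec in vectors.items()
        if code != seed
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [code for _, code in scored[:k]]


def sensitivity_specificity(predicted, gold):
    """Compare claims-based flags (predicted) with registry flags (gold), per patient."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    tn = sum(1 for p, g in zip(predicted, gold) if not p and not g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


vectors = cooccurrence_vectors(claims)
print(neighbors("PX_LAP_NEPH", vectors, k=3))
print(sensitivity_specificity([True, True, False, False], [True, False, False, True]))
```

In this sketch an investigator would pass a seed code to `neighbors` to list candidate related codes, then treat the SEER-listed procedure as the gold standard when calling `sensitivity_specificity`. A real implementation would learn the vectors from claims sequences rather than from raw co-occurrence counts.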


