Classification Title: |
Data Scientist III
|
Classification Minimum Requirements: |
A Bachelor's Degree in data science, statistics, bioinformatics, analytics, or a similar field and five years of experience; a Master's Degree in data science, statistics, bioinformatics, analytics, or a similar field and three years of experience; or a Doctoral Degree in data science, statistics, bioinformatics, analytics, or a similar field and one year of experience.
|
Job Description: |
The Department of Medicine, Division of Nephrology Quantitative Health is seeking a full-time Data Scientist III. This position supports a federally funded, interdisciplinary research initiative based in the Computational Microscopy Imaging Lab (CMIL). The project unifies clinical, imaging, and molecular data to develop predictive models of disease progression. The Data Scientist III – Clinical Text & EHR Data Lead is responsible for developing pipelines that extract and harmonize structured and unstructured patient data, including NLP-based processing of clinical notes. This role supports data preparation, temporal alignment, and integration of clinical features into a multimodal AI research platform. The position reports to Dr. Pinaki Sarder, Principal Investigator.
Essential Functions:
NLP Pipeline Development and Text Analysis
- Build and maintain pipelines using MedSpaCy, cTAKES, or similar tools to extract structured variables from clinical notes.
- Tune entity recognition, concept mapping, and negation detection to support patient-level feature generation.
- Document pipeline logic and validation metrics.

Structured EHR Feature Engineering
- Develop tools to extract, clean, and organize structured EHR variables (e.g., labs, medications, diagnoses).
- Apply clinical standards (e.g., OMOP, FHIR) to support semantic consistency and cross-site interoperability.
- Transform EHR data into research-ready formats aligned with modeling needs.

Temporal Alignment and Multimodal Integration
- Align clinical events with imaging and biopsy timelines to enable time-resolved analysis.
- Support the construction of longitudinal patient records for AI model training and validation.
- Troubleshoot data conflicts, gaps, and synchronization issues.

Collaboration and Data Stewardship
- Liaise with institutional data providers to ensure accurate, secure data transfers.
- Contribute to protocol development and maintain clear analytic traceability.
- Communicate updates and collaborate across project teams and external sites.

Mentorship and Innovation Support
- Provide informal guidance to student researchers or junior analysts.
- Recommend new tools or analytic methods to improve pipeline performance.
|
Expected Salary: |
$82,000 - $95,000
|
Required Qualifications: |
A Bachelor's Degree in data science, statistics, bioinformatics, analytics, or a similar field and five years of experience; a Master's Degree in data science, statistics, bioinformatics, analytics, or a similar field and three years of experience; or a Doctoral Degree in data science, statistics, bioinformatics, analytics, or a similar field and one year of experience.
|
Preferred: |
- Experience with NLP in the clinical domain using libraries such as MedSpaCy or cTAKES
- Knowledge of EHR data structures, standards, and interoperability frameworks (e.g., OMOP, FHIR)
- Familiarity with Python and clinical data integration tools
- Strong organizational skills and attention to reproducibility and versioning
- Experience collaborating with clinical, data science, or research stakeholders

Additional technical certifications (e.g., AWS, Security+) are encouraged but not required.
|
Special Instructions to Applicants: |
To be considered, you must upload a cover letter and resume.
This is a time-limited position.
Applications must be submitted by 11:55 p.m. (ET) on the posting end date.
|
Health Assessment Required: |
No |