Graduate Internship: Human-in-the-Loop Named Entity Recognition for Biological Data Curation

Job no: 539700
Work type: Student Assistant
Location: Main Campus (Gainesville, FL)
Categories: Biology/Life Science, Libraries/Museums, Artificial Intelligence
Department: 55010900 - LB-ACADEMIC RESEARCH CONS-SERV

Classification Title:

Graduate Internship: Human-in-the-Loop Named Entity Recognition for Biological Data Curation

Classification Minimum Requirements: Currently enrolled in good standing in a UF graduate program. 
Job Description:

The George A. Smathers Libraries is offering a graduate internship supervised by Dr. Borui Zhang in collaboration with Dr. Jonathan Nations (Florida Museum of Natural History). The project focuses on building an open-access, machine-readable database of mammalian dietary information extracted from decades of unstructured scientific literature. It integrates text data mining, NLP pipeline engineering, and biodiversity science. The graduate intern will construct and validate ground-truth datasets for an AI-driven data extraction pipeline supporting this effort. This internship is funded through the Smathers Graduate Internship Program.

RESPONSIBILITIES

The intern will:

  1. Use authorized APIs to identify and retrieve relevant mammalian dietary literature, with emphasis on Mammalian Species Accounts.
  2. Apply the project’s NER (Named Entity Recognition) pipeline to extract consumer (mammal species) and food class entities from full-text publications.
  3. Systematically validate AI-generated annotations using professional annotation tools, building a rigorously reviewed ground-truth reference dataset.
  4. Meet weekly with the Internship Director and faculty partner to resolve ambiguous cases and refine prompt strategies.
  5. Assist in organizing and cleaning metadata fields, standardizing ecological data for integration into the project’s GitHub repository.
  6. Prepare technical documentation summarizing inter-annotator agreement metrics, common extraction error patterns, and recommendations for prompt and pipeline refinement.
  7. Deliver a final presentation to UF Libraries and Florida Museum research teams summarizing methods, findings, and lessons learned.

SCHEDULE

10 hours a week.  

Expected Salary:

$22/hour

Required Qualifications:
  • Currently enrolled in good standing in a UF graduate program in Computer Science, Computational Linguistics, Information Science, Data Science, Biology/Ecology, or a closely related field.
  • Proficiency in Python for data processing, file I/O, JSON and XML parsing, pandas-style tabular manipulation, and basic scripting against REST APIs (e.g., NCBI Entrez) for literature retrieval.
  • Experience with at least one NLP task: sequence labeling, entity recognition, text classification, or annotation of linguistic phenomena in real, noisy text.
  • Demonstrated ability to work independently on sustained research tasks with attention to detail and consistent documentation habits.
  • Comfort reasoning about linguistic ambiguity: you should be able to explain why a given span of text is or is not a valid entity mention and articulate the decision rule you applied.
Preferred:
  • Background or coursework in biology, ecology, or environmental sciences.
  • Prior experience with a text annotation platform (e.g., Label Studio, Prodigy, INCEpTION, or similar).
  • Familiarity with biomedical or biodiversity literature retrieval (PubMed/Entrez API, GBIF, BHL, iDigBio) or other large-scale scholarly text corpora.
  • Experience working with version control systems or structured research data repositories; comfort with JSON schema validation.
  • Exposure to transformer-based models (BERT, BioBERT, or similar): fine-tuning, inference, or attention analysis.
Special Instructions to Applicants:

In order to be considered, you must upload: 

  1. A current CV or résumé.
  2. A brief statement (300–500 words) describing: (a) your experience with Python; (b) any prior computational annotation work on real data; and (c) why the intersection of NLP and biodiversity informatics interests you.
  3. Contact information for one academic or professional reference.

Applications must be submitted by 11:55 p.m. (ET) on the posting end date.

Health Assessment Required: No

 

Advertised: Eastern Daylight Time
Applications close: Eastern Daylight Time
