Datasets

Privileged Datasets

Massachusetts Eye and Ear Dataset: We have demographic and clinical information for 1.67 million patients, as well as 1.91 million fundus photos from 67,000 patients, 824,000 optical coherence tomography scans from 81,000 patients, and 30,000 visual fields from 74,000 patients.

Glaucoma Research Network Dataset: We have 602,000 24-2 visual fields from 129,000 patients and 54,000 10-2 visual fields from 12,000 patients. The Glaucoma Research Network comprises Massachusetts Eye and Ear at Harvard Medical School, the Wilmer Eye Institute at Johns Hopkins University, New York Eye and Ear Infirmary at the Icahn School of Medicine at Mount Sinai, Bascom Palmer Eye Institute at the University of Miami, Wills Eye Hospital at Thomas Jefferson University, the Edward S. Harkness Eye Institute at Columbia University, and the Hamilton Eye Institute at the University of Tennessee Health Science Center.

LIFE Dataset: We have data from 10,000 patients at baseline and 2,000 patients at 5-year follow-up from the LIFE-Adult Study. This is a population-based cohort study conducted by the Leipzig Research Centre for Civilization Diseases (LIFE) at the University of Leipzig, Germany, and its participants were randomly selected from 550,000 residents of Leipzig. All participants underwent fundus photography and optical coherence tomography. They also completed an extensive core assessment that included physical examinations, cognitive function tests, genetic data, biospecimen tests, structured interviews, questionnaires, and brain magnetic resonance imaging scans, among other measures.

American Academy of Ophthalmology IRIS Registry Dataset: The American Academy of Ophthalmology IRIS Registry (Intelligent Research in Sight) is the first comprehensive, EHR-based eye disease and condition clinical registry in the United States. It is also the world's largest clinical data registry. As of May 2021, the IRIS Registry had participation from over 14,000 ophthalmologists and 3,300 optometrists employed in their practices. It included approximately 400 million patient encounters from about 70 million unique patients. Its reach continues to grow, and it provides ophthalmologists with clinical benchmarks and practice patterns. The Academy developed the registry as part of the profession's shared goal of continually improving the delivery of eye care. Massachusetts Eye and Ear is one of the four institutions with full access to the entire dataset.

UK Biobank Dataset: We have data from 68,000 patients at baseline and 19,000 patients at follow-up. The data include fundus photos, optical coherence tomography scans, physical examinations, cognitive function tests, genetic data, biospecimen tests, self-reported health conditions, and ICD codes. They also include brain, cardiac, and abdominal magnetic resonance imaging scans, dual-energy X-ray absorptiometry scans, and carotid ultrasound scans, among other measures. We obtained the UK Biobank data by paying an access fee.

Public Datasets

Harvard Glaucoma Detection with 500 Samples (Harvard-GD500): The Harvard-GD500 dataset includes 500 samples from 500 patients for glaucoma detection. It was released to support the results in our paper “Artifact-Tolerant Clustering-Guided Contrastive Embedding Learning for Ophthalmic Images in Glaucoma,” published in the IEEE Journal of Biomedical and Health Informatics. Here is the data download link for Harvard-GD500. The data use license is CC BY-NC-ND 4.0.

Harvard Glaucoma Detection and Progression with 1000 Samples (Harvard-GDP1000): The Harvard-GDP1000 dataset includes 1,000 samples from 1,000 patients for glaucoma detection and progression forecasting. It was released to support the results in our paper “Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning,” published at the 2023 International Conference on Computer Vision (ICCV). The corresponding code is available in our GitHub repository Harvard-GDP. Here is the data download link for Harvard-GDP1000. The data use license is CC BY-NC-ND 4.0.

Harvard Glaucoma Fairness with 3300 Samples (Harvard-GF3300): The Harvard-GF3300 dataset includes 3,300 samples from 3,300 patients for fairness learning in glaucoma. It is used in our manuscript “Harvard Glaucoma Fairness (Harvard-GF): A Retinal Nerve Disease Dataset for Fairness Learning and Fair Identity Normalization,” submitted to ICCV 2023. The corresponding code is available in our GitHub repository Harvard-GF. Here is the data download link for Harvard-GF3300. The data use license is CC BY-NC-ND 4.0.

Harvard Diabetic Retinopathy Fairness with 3300 Samples (Harvard-DRF3300): The Harvard-DRF3300 dataset includes 3,300 samples from 3,300 patients for fairness learning in diabetic retinopathy. It is used in our manuscript “Harvard Diabetic Retinopathy Fairness (Harvard-DRF): A 3D Imaging Dataset of Diabetic Retinopathy for Fairness Learning,” submitted to the NeurIPS 2023 Datasets and Benchmarks Track. The corresponding code is available in our GitHub repository Harvard-DRF. Here is the data download link for Harvard-DRF3300. The data use license is CC BY-NC-ND 4.0.