Dataset
We collected lumbar spine MRI studies from 257 patients with a history of low back pain (LBP). Each study consisted of up to three MRI series, for a total of 544 series. The studies were acquired between January 2019 and March 2022 at four hospitals: one academic center, two regional hospitals, and one orthopedic hospital. The resolution of the standard sagittal T1 and T2 images ranged from 3.3 × 0.33 × 0.33 mm to 4.8 × 0.90 × 0.90 mm. The sagittal T2 SPACE sequence images had a near-isotropic spatial resolution with a voxel size of 0.90 × 0.47 × 0.47 mm.
In each series, all visible vertebrae (excluding the sacrum), intervertebral discs, and the spinal canal were segmented. The segmentation was performed by a medical trainee supervised by a medical imaging expert and an experienced MSK radiologist. First, an automatic baseline segmentation algorithm was trained on a small dataset and used to segment unseen images. The predicted segmentations were reviewed, manually corrected, and added to the training data. This cycle of retraining the automatic segmentation model and correcting its predictions was repeated several times until the entire dataset was annotated. Twenty high-resolution T2 (SPACE) series were randomly selected and annotated fully manually; all other segmentations were created with the iterative annotation strategy described above. All annotations and corrections were made in 3D Slicer version 5.0.3.
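The iterative strategy above is a form of model-assisted (human-in-the-loop) annotation. The sketch below shows only the control flow under stated assumptions: `train`, `predict`, and `correct` are hypothetical callables standing in for model training, automatic inference, and expert correction in 3D Slicer, and `batch_size` (how many series are pre-segmented per round) is an assumption, since the number per round is not specified.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Mask = TypeVar("Mask")
Model = TypeVar("Model")


def iterative_annotation(
    seed_labeled: Sequence[Tuple[Image, Mask]],
    unlabeled: Sequence[Image],
    train: Callable[[List[Tuple[Image, Mask]]], Model],
    predict: Callable[[Model, Image], Mask],
    correct: Callable[[Image, Mask], Mask],
    batch_size: int,
) -> Tuple[List[Tuple[Image, Mask]], int]:
    """Grow a labeled set by alternating model retraining and manual correction.

    Returns the final labeled set and the number of retraining rounds.
    All callables are hypothetical placeholders, not a real API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    labeled = list(seed_labeled)
    pool = list(unlabeled)
    rounds = 0
    while pool:
        # Retrain on everything annotated so far.
        model = train(labeled)
        batch, pool = pool[:batch_size], pool[batch_size:]
        for image in batch:
            proposal = predict(model, image)  # automatic pre-segmentation
            mask = correct(image, proposal)   # expert review and manual fix
            labeled.append((image, mask))
        rounds += 1
    return labeled, rounds
```

Each round retrains on the growing labeled set, so later proposals should need fewer corrections; this is the motivation for repeating the cycle rather than training once.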
The reference segmentations in this dataset are labeled from the bottom up. In this scheme, the lowest lumbar vertebra, which is usually L5 but can also be L4 or L6, is assigned label 1. The vertebra directly above it is labeled 2, and labels increase by one for each subsequent vertebra up the series. Each intervertebral disc (IVD) is labeled according to the vertebra immediately above it, i.e. 200 plus that vertebra's label: the lowest IVD is labeled 201, the one above it 202, and so on. The spinal canal is assigned label 100.
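To make the labeling scheme concrete, here is a minimal sketch that maps a label value to an anatomical name. It assumes that the sacrum lies directly below vertebra 1 (so disc 201 is, for example, L5-S1), that vertebrae above L1 are thoracic (T12, T11, ...), and that the caller supplies which lumbar vertebra is lowest through the hypothetical `lowest_lumbar` parameter, because the labels themselves do not encode whether vertebra 1 is L4, L5, or L6.

```python
def vertebra_name(k: int, lowest_lumbar: int = 5) -> str:
    """Anatomical name of vertebra label k (1 = lowest lumbar vertebra)."""
    idx = lowest_lumbar - (k - 1)
    if idx >= 1:
        return f"L{idx}"
    # Assumption: above L1 the spine continues with T12, T11, ...
    return f"T{12 + idx}"


def describe_label(label: int, lowest_lumbar: int = 5) -> str:
    """Translate a dataset label value into an anatomical description."""
    if label == 100:
        return "spinal canal"
    if 1 <= label < 100:
        return vertebra_name(label, lowest_lumbar)
    if 200 < label < 300:
        k = label - 200  # label of the vertebra immediately above the disc
        above = vertebra_name(k, lowest_lumbar)
        # Assumption: the sacrum (not segmented) lies below vertebra 1.
        below = "S1" if k == 1 else vertebra_name(k - 1, lowest_lumbar)
        return f"{above}-{below} disc"
    raise ValueError(f"unexpected label {label}")
```

For example, `describe_label(202)` returns `"L4-L5 disc"` with the default `lowest_lumbar=5`.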
The training data can be found here: https://zenodo.org/doi/10.5281/zenodo.8009679. The paper on the dataset and the baseline algorithms can be found here: https://www.nature.com/articles/s41597-024-03090-w. The dataset is published under a CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/legalcode.
Data split
- Public training and validation set (218 out of 257 studies, 85%).
- Hidden test set (39 out of 257 studies, 15%).
The twenty studies that were annotated fully manually were divided between the hidden test set (15 studies) and the training set (5 studies). All remaining studies were randomly assigned to the two sets according to the split above. Series belonging to the same patient were always placed in the same set.
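A patient-level split of this kind can be sketched as follows. The function name, the series-to-patient mapping, and the fixed seed are illustrative assumptions, not the organizers' actual procedure; the `forced_test` and `forced_train` arguments stand in for placing the fully manually annotated studies in a predetermined set before the remaining patients are shuffled.

```python
import random
from typing import Dict, Iterable


def split_by_patient(
    series_patient: Dict[str, str],
    n_test: int,
    forced_test: Iterable[str] = (),
    forced_train: Iterable[str] = (),
    seed: int = 0,
) -> Dict[str, str]:
    """Assign each series to 'train' or 'test' at the patient level."""
    test = set(forced_test)
    train = set(forced_train)
    if test & train:
        raise ValueError("a patient cannot be forced into both sets")
    patients = sorted(set(series_patient.values()))
    remaining = [p for p in patients if p not in test and p not in train]
    random.Random(seed).shuffle(remaining)
    n_extra = n_test - len(test)
    if not 0 <= n_extra <= len(remaining):
        raise ValueError("n_test is incompatible with the forced assignments")
    test.update(remaining[:n_extra])
    train.update(remaining[n_extra:])
    # Every series inherits its patient's set, so no patient spans both sets.
    return {s: ("test" if p in test else "train") for s, p in series_patient.items()}
```

Splitting on patient IDs rather than on individual series prevents near-duplicate images of one patient from leaking between the training and test sets.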