Data Access

CATE 2024 will maintain an open data policy for its eclipse data; all data products generated at the 2023 and 2024 total solar eclipses (TSEs) will be made publicly available following calibration and integration.

We acknowledge the data sovereignty of the communities with which we will be working during this project. While not every community will be Indigenous, we will work with all communities to develop a plan for equitable data sharing according to the CARE Principles for Indigenous Data Governance. All reprocessed CATE 2017 data will be released following the same protocols and procedures as described for CATE 2024.

1. Data products to be produced
The primary data to be produced by CATE 2024 are images of the 2023 Australian TSE from each of our training stations; images of the 2024 U.S. TSE from the official volunteer stations; and the calibration measurements (e.g., darks and flats) required to produce calibrated observations from each of these events. Additional data from performance testing and characterization of the observing system that are not required for analysis of the observational data will be made public via publications and reports, but are not considered data products.

The full CATE 2024 observing system will generate the following data products, in Flexible Image Transport System (FITS) files:
Level-0: Raw, uncalibrated camera frames and the associated metadata required for calibration
Level-1: Linearized, background-subtracted polarized and unpolarized brightness images in calibrated brightness units, including full metadata
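
To illustrate how polarized and unpolarized brightness can both come from the same frames, here is a minimal sketch of one common approach. It assumes a sensor that samples four linear polarizer orientations (0, 45, 90, and 135 degrees); this plan does not specify the sensor layout, the function names are hypothetical, and this is not necessarily the method used in the CATE pipeline.

```python
import math


def stokes_from_four_angles(i0, i45, i90, i135):
    """Return Stokes (I, Q, U) for one pixel.

    ASSUMPTION: intensities were measured behind ideal linear polarizers
    oriented at 0, 45, 90, and 135 degrees.
    """
    total = 0.5 * (i0 + i45 + i90 + i135)  # Stokes I: total (unpolarized-sense) brightness
    q = i0 - i90                           # Stokes Q
    u = i45 - i135                         # Stokes U
    return total, q, u


def polarized_brightness(q, u):
    """Linearly polarized brightness, taken here as sqrt(Q^2 + U^2)."""
    return math.hypot(q, u)


# Example pixel: fully polarized light aligned with the 0-degree polarizer,
# so the four channels follow Malus's law (1, 0.5, 0, 0.5).
total, q, u = stokes_from_four_angles(1.0, 0.5, 0.0, 0.5)
pb = polarized_brightness(q, u)
print(total, q, u, pb)
```

For the fully polarized example pixel, the polarized brightness equals the total brightness; for unpolarized light all four channels are equal and Q and U vanish.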

Quicklook products serve as a real-time data quality monitor at individual eclipse observing sites and can be used to generate high-dynamic-range images for rapid dissemination (see Penn et al. 2020). Scientific data products comply with the FITS v4.0 and World Coordinate System (WCS) standards for interoperability with the variety of existing software tools used in the solar physics community.

The data generated for TSE 2023, including preliminary camera tests and observations of the event (12 MP polarimetric images at a cadence of roughly 5–8 images per second) across our several training setups, are relatively small in volume: approximately 50 GB of raw data across all sites, and roughly the same amount again for calibrated and compressed Level-1 products, for the roughly one minute of totality in Australia. Because all sites are co-located, data can be uploaded to a centrally shared disk, backed up on site, and transported by hand from the eclipse before being processed on SwRI’s servers in Boulder. Individual datasets are integrated into a central set to develop methodology and software, but are distributed separately, organized by team. These data are made available on the SwRI-hosted project website.

The full CATE 2024 dataset (40 sites producing data at the same rate as for TSE 2023, for a four-minute eclipse) will amount to roughly 2 TB of raw data, which doubles to roughly 4 TB when calibrated Level-1 files are included. Reprocessed CATE 2017 Level-0 data are expected to require approximately the same space as the original raw data, about 1 TB, doubling to 2 TB when calibrated Level-1 files are included.
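
The raw-data figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below uses the numbers quoted above (12 MP frames, 5–8 frames per second, 40 sites, four minutes of totality) and assumes each pixel occupies 2 bytes in the raw files; that storage depth is an assumption, not a figure from this plan, and the function name is hypothetical.

```python
# Rough raw-data volume estimate for the 2024 eclipse.
# ASSUMPTION: each pixel is stored in 2 bytes; the plan does not state this.
BYTES_PER_PIXEL = 2          # assumed, not from the plan
PIXELS_PER_FRAME = 12e6      # 12 MP polarimetric images
N_SITES = 40                 # number of observing sites
TOTALITY_SECONDS = 4 * 60    # four-minute eclipse


def raw_volume_tb(frames_per_second):
    """Return the estimated raw data volume in terabytes (1 TB = 1e12 bytes)."""
    bytes_per_site = (
        PIXELS_PER_FRAME * BYTES_PER_PIXEL * frames_per_second * TOTALITY_SECONDS
    )
    return bytes_per_site * N_SITES / 1e12


low = raw_volume_tb(5)   # slow end of the quoted cadence
high = raw_volume_tb(8)  # fast end of the quoted cadence
print(f"Estimated raw volume: {low:.2f}-{high:.2f} TB")
```

Under these assumptions the estimate falls between roughly 1.2 and 1.8 TB of science frames, which is the same order as the roughly 2 TB quoted above; calibration frames and metadata would add to that total.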

Products are hosted on servers at SwRI and will be archived for long-term accessibility in community repositories like the NASA Solar Data Analysis Center (SDAC).

We make data products available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which allows open use of the data for any purpose with appropriate attribution. For TSE 2023, raw products will be made available by July 2023, with calibrated products expected to be released by March 2024 (contingent upon appropriate funding). For the 2024 TSE, products will be made available after six months: one month for data aggregation from individual sites, two months for calibration and integration, and a further three months for initial analysis by the science team and affiliated students. We acknowledge and accommodate data sovereignty as appropriate.
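
As an illustration of how the six-month schedule for the 2024 TSE adds up, the sketch below accumulates the three phases from the eclipse date of 8 April 2024. Starting the clock on eclipse day is an assumption, as the plan does not state when the six months begin, and `add_months` and `milestones` are hypothetical helpers.

```python
from datetime import date

# Date of the 2024 total solar eclipse.
ECLIPSE_2024 = date(2024, 4, 8)

# (phase, duration in months), as listed in the plan.
PHASES = [
    ("data aggregation from individual sites", 1),
    ("calibration and integration", 2),
    ("initial analysis by the science team", 3),
]


def add_months(start, months):
    """Shift a date by whole months, keeping the day of the month.

    Only safe for days that exist in every month (true for day 8 used here).
    """
    month_index = start.month - 1 + months
    return start.replace(
        year=start.year + month_index // 12,
        month=month_index % 12 + 1,
    )


def milestones(start):
    """Return (phase, end date) pairs; each phase begins when the previous one ends.

    ASSUMPTION: the schedule starts on the eclipse date itself.
    """
    elapsed = 0
    result = []
    for name, months in PHASES:
        elapsed += months
        result.append((name, add_months(start, elapsed)))
    return result


for name, end in milestones(ECLIPSE_2024):
    print(f"{end.isoformat()}  end of {name}")
```

With that starting assumption, the three phases sum to six months and the final milestone falls in early October 2024.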

2. Data processing
Data reduction for CATE 2024 follows standard image processing procedures, to be developed and refined as a result of the Phase 1 activities. We leverage the lessons learned from CATE 2017 and adapt a calibration procedure following Penn et al. (2020), which can be straightforwardly updated to accommodate the polarization data we acquire in CATE 2024. Data calibration and analysis tools are being developed in Python and are hosted in an open-source GitHub repository. Software development and documentation follow the standards of Python in the Heliophysics Community (PyHC). Data products are self-documented via internal metadata and embedded links/DOIs to additional documentation on Zenodo. Publications funded by this effort will follow the latest guidelines for public data release, including the release of code to reproduce analyses from one of the available levels of science data. Where possible, these will be included with published papers via the journal or an appropriate preprint archive. Otherwise, they will be made available on a CATE website at SwRI.
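
For readers unfamiliar with how darks and flats enter a calibration, here is a minimal sketch of generic dark subtraction and flat-fielding. It is not the Penn et al. (2020) procedure or the CATE pipeline: it assumes a linear detector and dark frames matched in exposure to both the raw and flat frames, uses plain nested lists for self-containment, and `calibrate_frame` is a hypothetical name.

```python
def calibrate_frame(raw, dark, flat):
    """Dark-subtract and flat-field one frame.

    All inputs are nested lists (rows of pixel values) of identical shape.
    ASSUMPTIONS: linear detector response; the same dark frame is valid
    for both the raw frame and the flat frame.
    """
    # Per-pixel response: the flat with its dark level removed.
    gain = [
        [f - d for f, d in zip(flat_row, dark_row)]
        for flat_row, dark_row in zip(flat, dark)
    ]
    # Normalize the response to unit mean so the overall signal level is preserved.
    n_pixels = sum(len(row) for row in gain)
    mean_gain = sum(sum(row) for row in gain) / n_pixels
    # Remove the dark level, then divide out the normalized response.
    return [
        [(r - d) / (g / mean_gain) for r, d, g in zip(raw_row, dark_row, gain_row)]
        for raw_row, dark_row, gain_row in zip(raw, dark, gain)
    ]


# Synthetic 2x2 example: a uniform scene seen through uneven pixel response.
dark = [[10, 10], [10, 10]]
flat = [[60, 160], [160, 60]]
raw = [[30, 70], [70, 30]]
calibrated = calibrate_frame(raw, dark, flat)
print(calibrated)
```

In the synthetic example the uneven response is divided out and every calibrated pixel has the same value, as expected for a uniform scene. A production pipeline would typically operate on whole arrays and also handle linearization and conversion to calibrated brightness units, which this sketch omits.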

3. Roles and responsibilities of team members for data management
For the 2023 TSE, initial data acquisition and backup are the responsibility of the project leads at the observing site in Australia. For the 2024 TSE, initial data acquisition and backup are the responsibility of the individual site leads at each observing station. Site teams mail data to the CATE project on removable backup media and retain backups locally for redundancy. Aggregation, calibration, and archiving are overseen by the PI and Project Manager (PM). Software development is shared by the CATE instrument team and coordinated by the PM. Specific collection and dissemination procedures will be further developed during preparation for TSE 2023.