Data processing is the phase of a project in which data are converted into the desired format and prepared for analysis.
What characteristics must data formats have to ensure long-term usability?
Format requirements vary with the phase of the data lifecycle.
Data processing
This phase calls for formats that are simple and efficient to edit. Editable formats, some of them proprietary, are often used because they offer functions and tools that make the work easier. Examples include Microsoft Excel (.xlsx) and certain database formats.
Reuse
For the reuse of data, for example by other researchers or in future projects, editable formats are needed that are preferably non-proprietary, so that the data remain readable in the long term. Open formats such as CSV (.csv) or OpenDocument Spreadsheet (.ods) enable greater interoperability and independence from specific software, improving the long-term usability and exchangeability of data.
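As a minimal sketch of preparing tabular data for reuse, the following Python snippet writes records to a UTF-8 encoded CSV file using only the standard library. The function name `export_to_csv` and the sample records are hypothetical and serve purely as illustration; they are not part of any recommended toolchain.

```python
import csv
from pathlib import Path


def export_to_csv(rows, fieldnames, path):
    """Write a list of dict records to a UTF-8 CSV file with a header row."""
    path = Path(path)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical example records (illustrative only).
measurements = [
    {"sample_id": "A1", "temperature_c": 21.5},
    {"sample_id": "A2", "temperature_c": 22.1},
]
```

Calling `export_to_csv(measurements, ["sample_id", "temperature_c"], "measurements.csv")` would produce a plain-text file that can be opened without any specific spreadsheet software. Stating the encoding explicitly is a design choice that avoids depending on platform defaults.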
Archiving
For long-term archiving, it is particularly important to choose formats suitable for permanent storage. These formats should preferably be non-proprietary so that data remain accessible in the future, regardless of specific software providers. Suitable formats include PDF/A, which was designed specifically for long-term preservation, as well as XML and TIFF, which are openly documented and widely supported.
It can be advisable to store and publish data in multiple formats, for example an editable working version alongside an archival one.
Which data formats are recommended?
File formats that are suitable for data processing are not necessarily appropriate for long-term archiving.
| Media Type | Editing & Saving | Archiving |
|---|---|---|
| Text | MS Word, Open/Libre Office, MS PowerPoint | PDF/A-1b |
| Table | Excel | CSV |
| Image | JPEG, PNG, GIF | TIFF, PDF/A-1b |
| Audio/Video Material | Windows Player | MPEG-4 AVC, WAV |
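The recommendations in the table above could be encoded as a small lookup, sketched here in Python. The archival targets come from the table; the mapping from file extensions to media types, and the names `ARCHIVAL_FORMATS`, `MEDIA_TYPE_BY_EXTENSION`, and `suggest_archival_formats`, are illustrative assumptions and not part of the source recommendations.

```python
from pathlib import Path

# Archival targets per media type, as listed in the table.
ARCHIVAL_FORMATS = {
    "text": ["PDF/A-1b"],
    "table": ["CSV"],
    "image": ["TIFF", "PDF/A-1b"],
    "audio_video": ["MPEG-4 AVC", "WAV"],
}

# Assumption: common extensions for the working formats named in the table.
# No audio/video extensions are mapped, because the table names no specific
# working file format for that row.
MEDIA_TYPE_BY_EXTENSION = {
    ".doc": "text", ".docx": "text", ".odt": "text",
    ".ppt": "text", ".pptx": "text",
    ".xls": "table", ".xlsx": "table",
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image",
}


def suggest_archival_formats(filename):
    """Return the archival formats suggested for a file, or [] if unknown."""
    extension = Path(filename).suffix.lower()
    media_type = MEDIA_TYPE_BY_EXTENSION.get(extension)
    if media_type is None:
        return []
    return ARCHIVAL_FORMATS[media_type]
```

For instance, `suggest_archival_formats("results.xlsx")` would look up the "table" row and return its archival targets. Such a helper only flags candidates; the actual conversion, and the check that nothing is lost in it, still has to be done per format.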
Archivable File Formats (ETH Digital Curation Office): https://documentation.library.ethz.ch/display/DD/Archivtaugliche+Dateiformate
Catalogue of Archival File Formats (KOST): kost-ceco | Dateiformate (KaD) | Katalog archivischer Dateiformate
