Fedora repository file system structure

Base path for datastreams and digital objects

The base paths for datastreams and digital objects can be configured separately in fedora.fcfg:
 <param name="object_store_base" value="data/objects" isFilePath="true"/>
 <param name="datastream_store_base" value="data/datastreams" isFilePath="true"/>

File and directory encoding algorithm

The file system structure depends on the algorithm configured in the server configuration file (fedora.fcfg):
<param name="path_algorithm" value="fedora.server.storage.lowlevel.TimestampPathAlgorithm">
  <comment>The java class used to determine the path algorithm; 
	   default is fedora.server.storage.lowlevel.TimestampPathAlgorithm.
  </comment>
</param>
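Because the path_algorithm parameter names a Java class, the server presumably instantiates that class by reflection. The sketch below illustrates only this general mechanism. PathAlgorithm, FlatPathAlgorithm and PathAlgorithmLoader are hypothetical names chosen for illustration, not Fedora's actual API, and Fedora's real loader may pass configuration to the constructor.

```java
// Hypothetical interface for illustration; Fedora's own path algorithm type may differ.
interface PathAlgorithm {
    String format(String pid);
}

// Hypothetical toy implementation: places every object directly under one directory.
class FlatPathAlgorithm implements PathAlgorithm {
    @Override
    public String format(String pid) {
        return "data/objects/" + pid;
    }
}

// Hypothetical loader: turns the configured class name into an instance.
final class PathAlgorithmLoader {
    private PathAlgorithmLoader() {}

    static PathAlgorithm load(String className) {
        final Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("path_algorithm class not found: " + className, e);
        }
        // Reject classes that do not implement the expected interface.
        if (!PathAlgorithm.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " does not implement PathAlgorithm");
        }
        try {
            // Requires an accessible no-argument constructor.
            return (PathAlgorithm) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

Loading by name is what makes the algorithm pluggable: a different implementation can be selected by changing the configuration value, without recompiling the server.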
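Currently, TimestampPathAlgorithm is the only implementation available. It derives the directory path from the current date and time (one directory level each for the year, the month and day combined, the hour and the minute), zero-padding every field except the year to two digits, and uses the PID as the file name. The relevant code is: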
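// Excerpt from TimestampPathAlgorithm (uses java.util.Calendar and java.util.GregorianCalendar).
// storeBase, SEP and PADDING are fields defined elsewhere in the class.
public String format(String pid) throws LowlevelStorageException {
  GregorianCalendar calendar = new GregorianCalendar(); // current date and time
  String year = Integer.toString(calendar.get(Calendar.YEAR)); // full year, e.g. "2008"
  String month = leftPadded(1 + calendar.get(Calendar.MONTH), 2); // Calendar.MONTH is zero-based
  String dayOfMonth = leftPadded(calendar.get(Calendar.DAY_OF_MONTH), 2);
  String hourOfDay = leftPadded(calendar.get(Calendar.HOUR_OF_DAY), 2);
  String minute = leftPadded(calendar.get(Calendar.MINUTE), 2);
  // <storeBase>/<year>/<month><day>/<hour>/<minute>/<pid>; seconds are not used
  return storeBase + SEP + year + SEP + month + dayOfMonth + SEP + hourOfDay
      + SEP + minute + SEP + pid;
}

// Left-pads i with zeros to n characters; PADDING[p] is presumably a string of p zeros.
private final String leftPadded(int i, int n) throws LowlevelStorageException {
  if ((n > 3) || (n < 0) || (i < 0) || (i > 999)) {
    throw new LowlevelStorageException(true, getClass().getName() + ": faulty date padding");
  }
  int m = (i > 99) ? 3 : (i > 9) ? 2 : 1; // number of digits in i
  int p = n - m;                          // number of zeros to prepend
  return PADDING[p] + Integer.toString(i);
}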
Suppose an object has the PID escidoc:1234, the current date and time are 2008-04-29 16:08, and the storage base for digital objects is /usr/local/fedora/data. The object is then stored under the following path (the example assumes a Unix-based operating system; because Java is platform-independent, the same layout applies on other operating systems with their own path separator):
/usr/local/fedora/data/2008/0429/16/08/escidoc:1234
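To make the layout easy to reproduce, here is a minimal sketch (not Fedora's code) that computes the same path for a given timestamp instead of reading the clock. It swaps java.util.Calendar for java.time.LocalDateTime and assumes '/' as the separator; the class name TimestampPathSketch is hypothetical. Note that because the original code formats the year with Integer.toString, the year appears with all four digits.

```java
import java.time.LocalDateTime;
import java.util.Locale;

// Sketch of the TimestampPathAlgorithm layout:
// <storeBase>/<yyyy>/<MMdd>/<HH>/<mm>/<pid>, using '/' as the separator.
final class TimestampPathSketch {
    private TimestampPathSketch() {}

    static String format(String storeBase, String pid, LocalDateTime timestamp) {
        // The year is written in full; month, day, hour and minute are zero-padded
        // to two digits. LocalDateTime months are 1-based, so no "+ 1" is needed
        // (unlike Calendar.MONTH in the original code).
        return String.format(Locale.ROOT, "%s/%d/%02d%02d/%02d/%02d/%s",
                storeBase,
                timestamp.getYear(),
                timestamp.getMonthValue(),
                timestamp.getDayOfMonth(),
                timestamp.getHour(),
                timestamp.getMinute(),
                pid);
    }

    public static void main(String[] args) {
        // PID, storage base and date/time from the example above.
        LocalDateTime ts = LocalDateTime.of(2008, 4, 29, 16, 8);
        System.out.println(format("/usr/local/fedora/data", "escidoc:1234", ts));
    }
}
```

Running main prints /usr/local/fedora/data/2008/0429/16/08/escidoc:1234. Passing the timestamp in as a parameter keeps the function deterministic and therefore easy to test.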

The naming convention for datastream files is slightly different; it follows the pattern outlined in this pseudo-code:

filename = object.getPid() + "+" + datastream.getId() + "+" + datastream.getVersionId();
The following XML fragment (PID escidoc:1234)
<foxml:datastream ID="escidocDs1" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
  <foxml:datastreamVersion ID="escidocDsv1" LABEL="" CREATED="2008-05-20T15:36:51.441Z" MIMETYPE="text/xml">
  ...
  </foxml:datastreamVersion>
</foxml:datastream>
results in the filename: escidoc:1234+escidocDs1+escidocDsv1
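The pseudo-code above translates directly into Java. The following sketch builds the file name and, for inspecting a datastream directory by hand, splits it back into its parts. The class name DatastreamFileName is hypothetical, and the parser assumes that none of the three parts contains a '+' character.

```java
// Sketch of the datastream file naming pattern: <PID>+<datastream ID>+<version ID>.
// Assumes none of the three parts contains '+'.
final class DatastreamFileName {
    private DatastreamFileName() {}

    static String of(String pid, String datastreamId, String versionId) {
        return String.join("+", pid, datastreamId, versionId);
    }

    // Reverse direction: split a file name into PID, datastream ID and version ID.
    static String[] parse(String fileName) {
        String[] parts = fileName.split("\\+", -1); // -1 keeps empty trailing parts
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected PID+DSID+DSVID, got: " + fileName);
        }
        return parts;
    }

    public static void main(String[] args) {
        String name = of("escidoc:1234", "escidocDs1", "escidocDsv1");
        System.out.println(name);           // escidoc:1234+escidocDs1+escidocDsv1
        System.out.println(parse(name)[1]); // escidocDs1
    }
}
```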

Conclusion

Fedora's file system storage structure can be adapted to individual needs. If a different directory structure is required, a corresponding path algorithm has to be implemented and configured in Fedora's server configuration file, fedora.fcfg. This can also help when ingesting a repository with the Fedora rebuild mechanism: files from an existing structure would not have to be moved at all or, depending on the datastream storage location, only partially. This in turn might help reduce I/O.
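As an illustration of what such a custom algorithm might compute, here is a hypothetical layout that depends only on the PID: a directory per PID namespace followed by two levels of hash buckets taken from the SHA-256 digest of the PID. NamespaceHashPathAlgorithm is an invented name, and the class is a standalone sketch; it is not wired into Fedora's actual plug-in interface.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical alternative layout: <storeBase>/<namespace>/<b0>/<b1>/<pid>,
// where b0 and b1 are the first two bytes of SHA-256(pid) in hex.
final class NamespaceHashPathAlgorithm {
    private final String storeBase;

    NamespaceHashPathAlgorithm(String storeBase) {
        this.storeBase = storeBase;
    }

    String format(String pid) {
        int colon = pid.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("PID lacks a namespace: " + pid);
        }
        String namespace = pid.substring(0, colon);
        byte[] digest = sha256(pid);
        // Two hex bucket levels spread objects over up to 65,536 leaf directories.
        String bucket1 = String.format(Locale.ROOT, "%02x", digest[0] & 0xff);
        String bucket2 = String.format(Locale.ROOT, "%02x", digest[1] & 0xff);
        return storeBase + "/" + namespace + "/" + bucket1 + "/" + bucket2 + "/" + pid;
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike the timestamp layout, this path can be recomputed from the PID alone, so the location of a file does not depend on when it was ingested.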

This page (revision-1) was last changed on 21-May-2008 12:09 by unknown.