Separate Measurement Of Ingest Parts

Setup

  • 50,000 digital objects representing a random sample of the full repository
  • Java 6 (1.6.0_04, 32 bit)
  • Tomcat 5.5
  • Fedora Commons 3.0b1 (standard installation). The results also apply to Fedora Commons 3.0b2 and 3.0
  • MPT Triplestore (same host)
  • Ingest from same machine
  • Buffered remote logging via SimpleSocketServer and Logging-Singleton
  • Empty repository before start

After breaking down the ingest conceptually here, measurement points were placed in the corresponding areas of the code. The image below shows five measurement points, which represent the most expensive operations during the ingest. The red line shows the ingest times measured in the client code. The green line is the first measurement point within the server code, namely the SOAP endpoint of the API-M. Hence, the difference between the red and the green line is the time taken by SOAP. In this test run the client was on the same host as the server, so no additional network latency widened the gap between the two lines. The green line represents the total ingest time on the server; the blue, cyan and lavender lines are individual parts of the ingest whose sum equals the green line. Each of these three lines is broken down individually below.
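The relationship between the lines can be written down as simple arithmetic. The following is a minimal sketch, not code from Fedora Commons: the class `IngestBreakdown`, its method names, and the numbers in `main` are hypothetical and serve only to illustrate how the SOAP share and any unaccounted server time could be derived from the measured values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Fedora Commons. */
final class IngestBreakdown {

    /** Time attributed to SOAP: client-side total (red) minus server-side total (green). */
    static long soapOverheadMs(long clientTotalMs, long serverTotalMs) {
        return clientTotalMs - serverTotalMs;
    }

    /** Server time not covered by the measured parts (0 if the parts add up exactly). */
    static long unaccountedMs(long serverTotalMs, Map<String, Long> partsMs) {
        long sum = 0;
        for (long ms : partsMs.values()) {
            sum += ms;
        }
        return serverTotalMs - sum;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only, NOT measured results.
        Map<String, Long> parts = new LinkedHashMap<>();
        parts.put("blue (ingest writer)", 40L);
        parts.put("cyan (commit)", 50L);
        parts.put("lavender", 10L);

        System.out.println("SOAP overhead: " + soapOverheadMs(110, 100) + " ms");
        System.out.println("Unaccounted:   " + unaccountedMs(100, parts) + " ms");
    }
}
```

If the three parts really cover the whole server-side ingest, `unaccountedMs` should stay close to zero for every object.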

The following image (Ingest Writer Breakdown) breaks down the blue line from the previous image, which is the time it takes to obtain the ingest writer. The red line here is the same as the blue line in the image above. It is broken down into five distinct parts:

  • XML Validation: time it takes to validate the digital object against XSD and Schematron.
  • Rels-Ext Validation: nothing was recorded, as there is no rels-ext in the test data.
  • getPid: the time it takes for the object to obtain a PID, provided it does not already have one (see here for details).
  • RegisterObject: the time it takes to register the object in the database.
  • Set DS Props: time it takes to set the properties of the datastreams. This part is negligible.

Essentially, two factors dominate the time spent in the ingest writer part: getPid and RegisterObject.

To see how the two XML validation processes add up, two additional measurement points were introduced. The red line in this image is the same as the green line in the image above. XML validation takes about 5 ms in total: Schematron validation takes about 3 ms and XSD validation about 2 ms.
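As a worked example, the two validation figures can be turned into shares of the total. The sketch below uses the approximate values stated above (3 ms and 2 ms); the class `ValidationShares` and its method are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Fedora Commons. */
final class ValidationShares {

    /** Returns each part's share of the summed time, in percent. */
    static Map<String, Double> sharesPercent(Map<String, Double> partsMs) {
        double total = 0.0;
        for (double ms : partsMs.values()) {
            total += ms;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : partsMs.entrySet()) {
            // Guard against an empty or all-zero breakdown.
            shares.put(e.getKey(), total == 0.0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Approximate values taken from the text above.
        Map<String, Double> parts = new LinkedHashMap<>();
        parts.put("Schematron", 3.0);
        parts.put("XSD", 2.0);
        System.out.println(sharesPercent(parts)); // prints {Schematron=60.0, XSD=40.0}
    }
}
```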

The next image shows the breakdown of the cyan line of the first image, labeled "commit". This is the time it takes to persist the digital object. Again, the red line is equal to the cyan line of the first image and is broken down into its most expensive parts, namely:

  • Retrieval of managed content: this is the major part of the commit process.
  • Resource Index: the time it takes to persist triples to the triplestore (MPT in this test run).
  • Permanent Store: again a costly IO-bound process. Here the digital object gets written to the underlying storage.
  • Registry Add: increment version for this particular PID.
  • Field Search: indexing and storage of DC fields for the API-A search methods.
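A measurement point of the kind used for such a breakdown can be sketched as a small timer that wraps each step and accumulates its wall-clock time under a label. This is a minimal sketch under stated assumptions, not the instrumentation actually used in this test: `PhaseTimer` is a hypothetical class, the empty lambdas stand in for the real commit steps, and the labels are simply taken from the list above without implying an execution order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical timer for illustration; not the instrumentation used in the test run. */
final class PhaseTimer {

    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    /** Runs one phase and adds its wall-clock time to the total recorded for the label. */
    void time(String label, Runnable phase) {
        long start = System.nanoTime();
        try {
            phase.run();
        } finally {
            // Record the time even if the phase throws.
            elapsedNanos.merge(label, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated time per label, in nanoseconds, in order of first use. */
    Map<String, Long> elapsedNanos() {
        return elapsedNanos;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Empty lambdas are placeholders for the real commit steps.
        timer.time("Retrieval of managed content", () -> { });
        timer.time("Resource Index", () -> { });
        timer.time("Permanent Store", () -> { });
        timer.time("Registry Add", () -> { });
        timer.time("Field Search", () -> { });

        timer.elapsedNanos().forEach((label, ns) ->
                System.out.println(label + ": " + ns / 1_000_000.0 + " ms"));
    }
}
```

`System.nanoTime()` is used rather than `System.currentTimeMillis()` because it is intended for measuring elapsed time and is not affected by wall-clock adjustments.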

Conclusion

This measurement breaks down the individual parts of the ingest. From this test run the following can be concluded:
  • Database- and filesystem-based operations are the most expensive, namely the retrieval of managed content, the object registry and the storage of objects in the file system.
  • XML validation, SOAP, serialization and deserialization of digital objects, and the runtime costs of most of the Fedora code are negligible in this context.
=> In order to improve performance, efforts should focus on reducing/tuning IO and improving database performance.

Note: Another, otherwise identical test run was conducted without the measurement points in order to investigate whether they influence the outcome. In that run only the total ingest times were measured. There was no noticeable difference.
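One way to make such a comparison concrete is to compare the mean total ingest time of both runs. The following is a minimal sketch, assuming the totals of each run are available as arrays of milliseconds; `OverheadCheck`, its methods, and the sample values are hypothetical and do not reproduce the measured data.

```java
/** Hypothetical helper for illustration; not part of Fedora Commons. */
final class OverheadCheck {

    /** Arithmetic mean of the total ingest times of one run. */
    static double mean(long[] totalsMs) {
        if (totalsMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0.0;
        for (long t : totalsMs) {
            sum += t;
        }
        return sum / totalsMs.length;
    }

    /** Relative difference of the instrumented mean to the plain mean, in percent. */
    static double overheadPercent(long[] instrumentedMs, long[] plainMs) {
        double plain = mean(plainMs);
        if (plain == 0.0) {
            throw new IllegalArgumentException("plain mean is zero");
        }
        return 100.0 * (mean(instrumentedMs) - plain) / plain;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only, NOT measured results.
        long[] withPoints = {101, 99, 100};
        long[] withoutPoints = {100, 100, 100};
        System.out.println("Overhead: " + overheadPercent(withPoints, withoutPoints) + " %");
    }
}
```

A value close to zero would correspond to the observation that the measurement points make no noticeable difference.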

This page (revision-5) was last changed on 17-Jul-2012 15:53 by KST