Database Servers

The ingest consists of many small, mostly write-based transactions. This scenario can be optimized fairly well as most major RDBMS allow special tuning for this kind of situation. For a detailed overview of Fedora's underlying data model and the composition of SQL transactions regarding the ingest, see this page.

Postgresql

The image below shows several superimposed test runs: one test run without any tuning (red line), one testrun with fsync=off (purple), one test run with asynchronous commit with a database running on the same host and disk (green) and one test run with asynchronous commit, using a dedicated host as database system (blue).

The performance benefit is essentially a result of a markedly decreased IO activity. Turning off fsync leaves the choice about when to persist data to the buffer mechanisms of the operating system instead of explicitly controlling it. This is however a very dangerous tuning strategy because it turns off the Write Ahead Log. Hence, in the event of a system crash, data may be unrecoverably lost.

Asynchronous commit leaves the WAL intact, will however prevent the database writer waiting for the result of a commit to be synchronized with the disk. There is also a small risk involved but it is generally not as severe as with fsync=off because the database can always be restored to a consistent state afterwards using the WAL.

The test runs were conducted using the following setup:

  • Java 6 (1.6.0_04-b12) 64bit, tuned as described here
  • Fedora 3.0b1
  • Tomcat 5.5
  • approx. 50,000 digital objects, managed content
  • Triplestore MPT
  • Database Postgresql 8.3, remote or local depending on test

These are the test results:

Pg 8.3 untuned,mpt,db remote:

#total(ms)avg(ms)min(ms)max(ms)median
49,9515,728,308114471,492n/a

Pg 8.3 async,mpt,db local:

#total(ms)avg(ms)min(ms)max(ms)median
49,9512,108,80142142,75127

Pg 8.3 async,mpt,db remote:

#total(ms)avg(ms)min(ms)max(ms)median
49,9512,333,06046251,62646

Pg 8.3 fsync=off,mpt,db remote:

#total(ms)avg(ms)min(ms)max(ms)median
49,9512,176,25243241,05936

The test results essentially reveal:

  • Using asynchronous commit / turning off fsync (and full page write, not shown here) increases performance markedly. It shows that at least for the test data set the database is the major bottleneck for the ingest and therefore is definately worth tuning.
  • Running the database on the local host results in huge spikes, as a result of high IO activity though overall performance is slighly better. However, these spikes get bigger and occur more often over time, so the benefit evens out.
  • Turning off fsync does not have a significant benefit over asynchronous commit and should consequently be abandoned in favor of asynchronous commit.

The above test runs do only show a general overview of what is possible. No additional parameters were changed for these test runs as they are highly dependent on the environment and thus not generally applicable. However, to provide some additional help, the following table is shown which describes parameters that were found to generally have the highest impact on ingest times:

NameConfiguration ParameterImpact On PerformanceRisk Of Data LossComment
Asynchronous Commitsynchronous_commit=offhighlowsince postgres 8.3, window for failure = 3x WAL writer delay
Full Page Writesfull_page_writes=offhighhighonly advisable for filesystems that can handle "dirty pages" (e.g ufs,zfs)
fsync offfsync=offhighhighgenerally not advisable, use asynchronous commit instead.
Shared Buffersshared_buffersmedium-see this page
Effective Cache Sizeeffective_cache_sizemedium-see this page
Checkpoint Segmentscheckpoint_segmentsmedium-if WAL is active only. More checkpoint segments generally lessen disk write activity.
WAL Synchronization Methodwal_sync_methodlow-depends on capability of the operating system, usually pg selects the fastest option automatically

Additional Tips

  • As the image above and this test run indicate, it is generally advisable to run Postgres either on the same host using a separate hard disk or preferably on a dedicated machine with possibly at least two separate disks (one for the Database / OS, one for WAL). And raid 0+1 or raid 5 with a battery backed controller.
  • The database should be created using the POSIX locale, if another locale is chosen the LIKE operator will not use the index which in turn will have a negative impact on performance (test with: explain analyze select ... LIKE)
  • Turning off writeback and noatime for the WAL drive/partition will also improve performance. Note: this should only be done with caution.
  • In some situations it can be helpful to turn the indexes off during bulk loads. This strategy will not work in this case because the indexes are used during the ingest (get new PID, check if PID exists, etc.). If MPT is used, it might help (has not been tested yet).
  • Turning off auto-commit is also not a viable option because Fedora Commons currently does not support explicit transactions.

For comprehensive material on Postgres tuning see

MySQL

The image below shows a test run using the MyISAM storage engine, superimposed with two InnoDB test runs. Like Postgres, InnoDB can be tuned and will as a result perform equal or even better than MyISAM for our ingest scenario.

Test setup:

  • Java 6 (1.6.0_04-b12) 64bit, tuned as described here
  • Fedora 3.0b1
  • Tomcat 5.5
  • No Triplestore
  • 50,000 digital objects
  • Managed content
  • Database and Fedora Commons on same host

The MyISAM storage engine performed much better than the untuned InnoDB. However, MyISAM does lack several crucial features such as ACID compliance, row level locking etc. and might therefore not be considered a suitable database engine for Fedora Commons repositories.
InnoDB provides acid compliance, row level locking and multi-version concurrency control and might therefore be much more suited for Fedora Commons.

For the tuned InnoDB test run the following parameters were changed:

...
innodb_buffer_pool_size=1024M
innodb_flush_log_at_trx_commit=0
...

Setting the innodb_flush_log_at_trx_commit parameter to 0 (or 2) involves the risk of data loss in the event of a server/os crash. Therefore, as with Postgres, the use of this parameter should be considered carefully.

More on performance tuning see:

Conclusion

As already concluded here a huge part of the time it takes to ingest a digital object is consumed by database related operations. It is possible to tune most relational database engines to an extent that ingest times decrease by a factor of 2-3 (using described hardware). However, most tuning strategies involve certain risks of data loss and should therefore only applied after careful consideration. The risk of data loss can be reduced in certain circumstances by the use of suitable file systems and hardware like battery-backed raid controllers.

Note: this article pertains to Fedora Commons, v 3.0b1 and 3.0b2

Add new attachment

In order to upload a new attachment to this page, please use the following box to find the file, then click on “Upload”.

List of attachments

Kind Attachment Name Size Version Date Modified Author Change note
png
fedrep5_mysqlengines_avgs_1200... 21.118 kB 1 Fri Jul 04 10:47:49 CEST 2008 KST MySQL Storage Engines large
png
fedrep5_mysqlengines_avgs_600x... 9.106 kB 1 Fri Jul 04 10:47:25 CEST 2008 KST MySQL Storage Engines
png
fedrep5_postgrestuning_avgs_12... 28.617 kB 1 Thu Jun 19 14:34:51 CEST 2008 KST postgres tuning large
png
fedrep5_postgrestuning_avgs_60... 11.683 kB 1 Thu Jun 19 14:34:35 CEST 2008 KST postgres tuning
« This page (revision-5) was last changed on 04-Jul-2008 10:47 by KST [RSS]