Hi
We've recently introduced disk overflow for our GemFire cluster, and I'm at a loss as to how our overflow disk consumption has gotten so high.
Environment :-
GemFire 6.5.1.42
Red Hat Enterprise Linux (Linux 2.6.18-164.10.1.el5 #1 SMP Wed Dec 30 18:35:28 EST 2009 x86_64 x86_64 x86_64 GNU/Linux)
Java 1.6.0_24-b07
Cluster :-
10 JVMs running at 2.5gb
GemFire Config :-
<disk-store name="diRegionDiskStore" allow-force-compaction="true" compaction-threshold="25">
<resource-manager critical-heap-percentage="90" eviction-heap-percentage="80"/>
Scenario :-
With the eviction settings we have roughly 1.7gb of memory before we should start overflowing to disk.
We've noticed that overflow on each node is using around 3gb of disk: 3 crf files at 1gb each. We don't think this is possible based on our current data loads. Our JVMs sit at the eviction level pretty much constantly, which means each node holds 1.7gb in memory and 3gb on disk.
For example...
-rw-r--r-- 1 xxxxx xxxxx 2048 Jan 25 13:21 BACKUPdiRegionDiskStore.if
-rw-r--r-- 1 xxxxx xxxxx 0 Mar 27 09:01 DRLK_IFdiRegionDiskStore.lk
-rw-rw-r-- 1 xxxxx xxxxx 1073741824 Apr 4 11:43 OVERFLOWdiRegionDiskStore_12.crf
-rw-rw-r-- 1 xxxxx xxxxx 1073741824 Apr 4 16:22 OVERFLOWdiRegionDiskStore_17.crf
-rw-rw-r-- 1 xxxxx xxxxx 1073741824 Mar 28 22:03 OVERFLOWdiRegionDiskStore_1.crf
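To track how this grows over time, the listing above could be totalled with something like the following. This is only a minimal sketch: the class name OverflowUsage and its method are made up for illustration, it assumes the overflow oplogs are the files named OVERFLOW*.crf directly inside the disk store directory, and it uses only java.io so it should run on Java 6.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical helper, not part of GemFire: sums the size of overflow oplog files.
public class OverflowUsage {

    /** Returns the total size in bytes of all OVERFLOW*.crf files directly under dir. */
    static long totalOverflowBytes(File dir) {
        File[] files = dir.listFiles(new FilenameFilter() {
            public boolean accept(File d, String name) {
                // Assumption: overflow oplogs follow the naming seen in the listing.
                return name.startsWith("OVERFLOW") && name.endsWith(".crf");
            }
        });
        if (files == null) {
            return 0L; // dir does not exist or is not a directory
        }
        long total = 0L;
        for (File f : files) {
            total += f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Pass the disk store directory as the first argument; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        long bytes = totalOverflowBytes(dir);
        System.out.printf("%s: %d bytes (%.2f GiB)%n",
                dir.getPath(), bytes, bytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

For the three files listed, the total would be 3 x 1073741824 = 3221225472 bytes, i.e. exactly 3 GiB per node.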
We've tried running on-line compaction and off-line compaction of individual disk stores, but the crf files do not shrink.
In fact, when we try to run off-line compaction we get the following error:
1020 xxx@xxx bin> ./gemfire -debug compact-disk-store diRegionDiskStore /local/0/sw/xxxxx/overflow/ldnuat/diRegion/data-server10
ERROR: Operation "compact-disk-store" failed because: disk-store=diRegionDiskStore: java.lang.NullPointerException.
com.gemstone.gemfire.GemFireIOException: disk-store=diRegionDiskStore: java.lang.NullPointerException
at com.gemstone.gemfire.internal.SystemAdmin.compactDiskStore(SystemAdmin.java:404)
at com.gemstone.gemfire.internal.SystemAdmin.invoke(SystemAdmin.java:1965)
at com.gemstone.gemfire.internal.SystemAdmin.main(SystemAdmin.java:1772)
Caused by: java.lang.NullPointerException
at com.gemstone.gemfire.internal.cache.DiskStoreImpl.offlineCompact(DiskStoreImpl.java:4109)
at com.gemstone.gemfire.internal.cache.DiskStoreImpl.offlineCompact(DiskStoreImpl.java:4416)
at com.gemstone.gemfire.internal.SystemAdmin.compactDiskStore(SystemAdmin.java:402)
Question :-
We do not believe that the overflow file consumption is an accurate representation of our actual data load. Our production machines also have very small disks (78gb). We're currently consuming 30gb of disk, and this is continuing to rise. It has caused production issues for us previously, when overflow completely exhausted our disk.
We believe we may be cycling data into overflow prematurely because garbage in the old generation pushes us above the eviction threshold. However, even if this is the case and eventually every value is overflowed, this should still not result in 3gb and rising of overflow files per node.