-
1. Re: RHQ Storage; Compaction failure
john.sanda Nov 6, 2013 1:43 AM (in response to genman)
Can you search rhq-storage.log for rhq-six_hour_metrics-ic-1-Data.db to see what happened to the file prior to the error?
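A minimal sketch of that log search, for anyone following along. The helper name sstable_history is made up for illustration, and the location of rhq-storage.log on your install is not given in this thread, so pass whatever path applies.

```shell
#!/bin/sh
# sstable_history LOGFILE SSTABLE_NAME
# Prints every log line that mentions the SSTable file, with line numbers.
# -F treats the name as a fixed string, so the dots are not regex wildcards.
sstable_history() {
    grep -n -F -- "$2" "$1"
}
```

For example: sstable_history rhq-storage.log rhq-six_hour_metrics-ic-1-Data.db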
-
2. Re: Re: RHQ Storage; Compaction failure
genman Nov 6, 2013 10:52 AM (in response to john.sanda)
Looks like the data was getting streamed. Maybe it was deleted after that? The cleanup was the job that failed.
INFO [Streaming to /17.172.21.186:371] 2013-11-06 02:39:57,730 StreamReplyVerbHandler.java (line 44) Successfully sent /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db to /17.172.21.186
INFO [AntiEntropyStage:1] 2013-11-06 02:40:43,352 StreamOut.java (line 184) Stream context metadata [/data06/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db sections=87 progress=0/138804 - 0%, /data03/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-88-Data.db sections=282 progress=0/134589 - 0%, /data02/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-78-Data.db sections=282 progress=0/509733 - 0%, /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Data.db sections=271 progress=0/1220679 - 0%, /data03/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-7-Data.db sections=117 progress=0/2133486 - 0%, /data04/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-83-Data.db sections=281 progress=0/134118 - 0%, /data06/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Data.db sections=271 progress=0/1220679 - 0%, /data03/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-411-Data.db sections=2 progress=0/354 - 0%, /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db sections=87 progress=0/138804 - 0%], 13 sstables.
INFO [Streaming to /17.172.21.186:377] 2013-11-06 02:40:44,180 StreamReplyVerbHandler.java (line 44) Successfully sent /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db to /17.172.21.186
INFO [AntiEntropyStage:1] 2013-11-06 02:41:25,988 StreamOut.java (line 184) Stream context metadata [/data06/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db sections=506 progress=0/163692 - 0%, /data04/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-83-Data.db sections=515 progress=0/244773 - 0%, /data06/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Data.db sections=479 progress=0/4496757 - 0%, /data03/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-88-Data.db sections=519 progress=0/246657 - 0%, /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Data.db sections=479 progress=0/4496757 - 0%, /data02/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-78-Data.db sections=520 progress=0/926121 - 0%, /data03/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-7-Data.db sections=245 progress=0/4379049 - 0%, /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db sections=506 progress=0/163692 - 0%, /data03/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-411-Data.db sections=6 progress=0/1062 - 0%], 12 sstables.
INFO [Streaming to /17.172.21.187:196] 2013-11-06 02:41:26,510 StreamReplyVerbHandler.java (line 44) Successfully sent /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db to /17.172.21.187
INFO [CompactionExecutor:3945] 2013-11-06 03:48:36,689 CompactionManager.java (line 587) Cleaning up SSTableReader(path='/data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db')
java.lang.RuntimeException: java.io.FileNotFoundException: /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db (No such file or directory)
Caused by: java.io.FileNotFoundException: /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db (No such file or directory)
-
3. Re: RHQ Storage; Compaction failure
genman Nov 6, 2013 1:44 PM (in response to genman)
This may be a case of https://issues.apache.org/jira/browse/CASSANDRA-4857
The mailing list suggests a create-drop-create sequence may have happened. Is it possible that RHQ would have dropped my existing data, even if by mistake?
I have been seeing fairly quirky behavior where some queries are turning up no data, but after a few tries data comes back.
Is there a way to basically 'refresh' a node, meaning rebuild the data directory area from scratch?
-
4. Re: RHQ Storage; Compaction failure
john.sanda Nov 6, 2013 2:39 PM (in response to genman)
Elias Ross wrote:
Is it possible that RHQ would have dropped my existing data, even if by mistake?
If you mean dropped as in dropped the keyspace as described in CASSANDRA-4857, then no, that would not happen. It is possible, however, for a replica to miss data. If, for example, you have 3 replicas for a given key (i.e., schedule id) and one of the replicas goes down when data is written for that key, then that node will be inconsistent when it comes back up.
On the node where the cleanup error occurs, try running,
nodetool -p 7299 scrub rhq
That will rebuild the data files and should remove anything that is broken.
-
5. Re: RHQ Storage; Compaction failure
genman Nov 6, 2013 9:43 PM (in response to john.sanda)
$ ./nodetool -p 7299 scrub rhq
Exception in thread "main" java.lang.RuntimeException: Tried to create duplicate hard link to /data05/rhq/data/rhq/six_hour_metrics/snapshots/pre-scrub-1383787540489/rhq-six_hour_metrics-ic-6-Summary.db
No such luck. Is it possible to simply rm -rf it all and do the scrub?
-
6. Re: RHQ Storage; Compaction failure
john.sanda Nov 7, 2013 8:02 AM (in response to genman)
There is an offline scrub that you can try.
- Shut down the node.
- cd <rhq-server-home>/rhq-storage/bin
- ./sstablescrub rhq six_hour_metrics
- Restart the storage node.
- ./nodetool -p 7299 repair -pr rhq six_hour_metrics
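The offline scrub steps above can be sketched as two shell helpers, one per phase. The function names and the SSTABLESCRUB, NODETOOL, and PORT variables are assumptions added for illustration; the commands and arguments themselves are the ones listed in this post. Shutting down and restarting the storage node stay manual, since the thread does not say how that is done on a given install.

```shell
#!/bin/sh
# Assumes you are in <rhq-server-home>/rhq-storage/bin, as in the steps above.
# Override these variables if the tools live elsewhere.
SSTABLESCRUB=${SSTABLESCRUB:-./sstablescrub}
NODETOOL=${NODETOOL:-./nodetool}
PORT=${PORT:-7299}   # value passed to nodetool -p in this thread

# Phase 1: run ONLY while the storage node is shut down.
# Usage: scrub_offline KEYSPACE TABLE
scrub_offline() {
    "$SSTABLESCRUB" "$1" "$2"
}

# Phase 2: run ONLY after the storage node has been restarted.
# Usage: repair_primary_range KEYSPACE TABLE
repair_primary_range() {
    "$NODETOOL" -p "$PORT" repair -pr "$1" "$2"
}
```

With the node down, call scrub_offline rhq six_hour_metrics; after restarting it, call repair_primary_range rhq six_hour_metrics. Keeping the phases as separate functions is deliberate, so the repair cannot run before the node is back up.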
If that does not work you can try an rm -rf approach. Here is how I would do it.
- nodetool -p 7299 disablebinary
- nodetool -p 7299 flush rhq six_hour_metrics
- On each of the other nodes in the cluster, run nodetool -p 7299 repair rhq
- Shut down the node
- rm -rf <rhq-data-dir>/data/rhq/six_hour_metrics
- Restart the node
- nodetool -p 7299 repair -pr rhq
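Because the rm -rf sequence above is destructive and spans several machines, here is a hedged sketch that wraps each phase in a function and only prints the commands by default. The function names, the DRY_RUN switch, and the NODETOOL and PORT variables are assumptions added for illustration; the underlying commands are the ones from the list. Shutting down and restarting the node remain manual steps between phases.

```shell
#!/bin/sh
NODETOOL=${NODETOOL:-nodetool}
PORT=${PORT:-7299}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands (safety default added here)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Phase 1: on the broken node, while it is still running.
# Usage: drain_table KEYSPACE TABLE
drain_table() {
    run "$NODETOOL" -p "$PORT" disablebinary
    run "$NODETOOL" -p "$PORT" flush "$1" "$2"
}

# Phase 2: on each of the OTHER nodes in the cluster.
# Usage: repair_peer KEYSPACE
repair_peer() {
    run "$NODETOOL" -p "$PORT" repair "$1"
}

# Phase 3: on the broken node, only after it has been shut down.
# Usage: wipe_table RHQ_DATA_DIR KEYSPACE TABLE
wipe_table() {
    # Refuse to build a path from empty arguments.
    data_dir=${1:?data dir required}
    keyspace=${2:?keyspace required}
    table=${3:?table required}
    run rm -rf "$data_dir/data/$keyspace/$table"
}

# Phase 4: on the broken node, after it has been restarted.
# Usage: repair_primary_range KEYSPACE
repair_primary_range() {
    run "$NODETOOL" -p "$PORT" repair -pr "$1"
}
```

If the node uses several data directories, as the log paths earlier in this thread suggest, wipe_table would need to be called once per directory. Set DRY_RUN=0 only after checking the printed commands.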
-
7. Re: RHQ Storage; Compaction failure
mazz Nov 7, 2013 8:04 AM (in response to john.sanda)
John - that smells like a good FAQ entry - hint hint
-
8. Re: RHQ Storage; Compaction failure
genman Nov 7, 2013 5:21 PM (in response to john.sanda)
Thanks, it seems to be ignoring errors and powering through, which I like. I was thinking I might have to patch the server to keep grinding through.
-
9. Re: RHQ Storage; Compaction failure
genman Nov 8, 2013 11:11 PM (in response to genman)
The issue was I had somehow symlinked two of the data directories to the same physical drive. User error. Luckily only took me a week to figure out.
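For anyone who wants to rule out the same mistake, here is a rough sketch that reports data directories backed by the same filesystem. The function name shared_devices is made up for illustration. It assumes each data directory is meant to sit on its own device and that paths contain no whitespace. It compares what df reports, so two partitions on one physical drive would not be flagged.

```shell
#!/bin/sh
# shared_devices DIR...
# Prints "device: dir dir ..." for every filesystem that backs more than
# one of the given directories. df follows symlinks, so a directory that
# is really a link onto another drive shows up under that drive.
shared_devices() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # In df -P output, line 2 column 1 is the backing filesystem.
        dev=$(df -P "$dir" | awk 'NR == 2 { print $1 }')
        printf '%s %s\n' "$dev" "$dir"
    done | sort | awk '
        { count[$1]++; dirs[$1] = dirs[$1] " " $2 }
        END { for (d in count) if (count[d] > 1) print d ":" dirs[d] }'
}
```

For example: shared_devices /data02/rhq /data03/rhq /data04/rhq /data05/rhq /data06/rhq. No output means every directory is on a different filesystem.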
-
10. Re: RHQ Storage; Compaction failure
jayshaughnessy Nov 9, 2013 12:45 PM (in response to genman)
Elias, ugh. Thanks for following up, I'm sure John will appreciate it when he sees it. If nothing else, he came up with a potential scrubbing FAQ entry.