4 Replies Latest reply on Aug 3, 2010 1:38 PM by clebert.suconic

    Two low hanging fruit improvements on persistence

    clebert.suconic

      There are a few improvements we could make that would improve performance on persistence:

       

      I - The delete/sync change that I already made a post about: http://community.jboss.org/thread/154733?tstart=0

       

      To explain it better, say:

       

      - you have a producer sending transacted messages, with batches of 100 messages on each commit.

      - you have a consumer, also ACKing batches of 100 messages per commit.

       

      Say the disk is capable of doing 200 syncs / second. You will be able to send and ACK up to 20K messages / second (200 syncs / second * 100 messages per sync).

       

      However, you will only be able to delete 200 messages / second, since each delete currently pays for its own sync.

       

      As a result, the journal will create many more files than it actually needs. If we stop syncing deletes, journal files would be released for reuse much earlier, which would increase the throughput of persistent messages.
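To make the arithmetic above concrete, here is a minimal sketch. The numbers are just the example's assumptions, not measurements, and the class and method names are illustrative:

```java
// Illustrative sketch of the throughput asymmetry described above.
public class JournalThroughputEstimate {

    // One fsync covers a whole transacted batch of sends or ACKs.
    static int batchedThroughput(int syncsPerSecond, int batchSize) {
        return syncsPerSecond * batchSize;
    }

    // Each delete pays for its own fsync, so throughput equals sync rate.
    static int perRecordThroughput(int syncsPerSecond) {
        return syncsPerSecond;
    }

    public static void main(String[] args) {
        int syncsPerSecond = 200; // physical syncs the disk can do
        int batchSize = 100;      // messages per transacted commit

        System.out.println("send/ACK: "
                + batchedThroughput(syncsPerSecond, batchSize) + " msgs/sec");
        System.out.println("delete:   "
                + perRecordThroughput(syncsPerSecond) + " records/sec");
    }
}
```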

       

       

      II - Use of DuplicateIDs on Paging.

      We should send a single ID per page read. Currently we send several (one for each transaction).

       

      This makes the journal compact *very* frequently.

       

      Actually, I'm thinking about changing the Page code to use the DuplicateCache directly: inject an ID and delete it as soon as the page is deleted. (I would need to add a delete operation on the DuplicateCache, but that's an easy change.) That would also avoid fragmentation of the journal caused purely by paging.

       

      (We could still have fragmentation, of course, if the user chooses to use DuplicateIDs, but then it would be acceptable IMO.)
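A minimal sketch of what a duplicate-ID cache with the proposed delete operation could look like; the class and method names here are illustrative, not HornetQ's actual API:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: a duplicate-ID cache with the proposed
// delete operation. Not the actual HornetQ DuplicateCache.
public class DuplicateCacheSketch {

    private final Set<String> ids = new HashSet<>();

    // Returns false if the ID was already seen (a duplicate).
    public boolean addIfAbsent(String id) {
        return ids.add(id);
    }

    // The proposed new operation: drop the cached ID once the page
    // that produced it has been deleted, so the cache (and the journal
    // records backing it) does not grow just because of paging.
    public void delete(String id) {
        ids.remove(id);
    }

    public boolean contains(String id) {
        return ids.contains(id);
    }
}
```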

       

       

      I will be making these changes unless anyone sees anything against them.

        • 1. Re: Two low hanging fruit improvements on persistence
          timfox

          Clebert Suconic wrote:

           

          There are a few improvements we could make that would improve performance on persistence:

           

          I - The delete/sync stuff that I already made a post about it: http://community.jboss.org/thread/154733?tstart=0

           

          To explain it better, say:

           

          - you have a producer sending transacted messages, with batches of 100 messages on each commit.

          - you have a consumer, also ACKing batches of 100 messages per commit.

           

          Say the disk is capable of doing 200 syncs / second. You will be able to send and ACK up to 20K messages / second (200 syncs / second * 100 messages per sync).

           

          However, you will only be able to delete 200 messages / second, since each delete currently pays for its own sync.

           

          As a result, the journal will create many more files than it actually needs. If we stop syncing deletes, journal files would be released for reuse much earlier, which would increase the throughput of persistent messages.

           

           


          Deleting without sync is fine, but there's always the possibility of the server crashing between the ack and the delete.

           

          On journal recovery you'll need to delete messages that are completely acked but not deleted, or you'll have messages hanging around in the journal forever.
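A sketch of that recovery rule, assuming a hypothetical per-message record with reference and ACK counts; this is not HornetQ's actual journal model:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative sketch only: on journal load, prune messages that are
// fully ACKed but whose delete record was lost in a crash.
public class RecoverySketch {

    static class Record {
        int refCount; // queues the message was routed to
        int ackCount; // ACKs replayed from the journal
    }

    // Removes fully-acked-but-undeleted messages; returns how many were pruned.
    static int pruneFullyAcked(Map<Long, Record> loaded) {
        int pruned = 0;
        Iterator<Map.Entry<Long, Record>> it = loaded.entrySet().iterator();
        while (it.hasNext()) {
            Record r = it.next().getValue();
            if (r.ackCount >= r.refCount) {
                it.remove(); // same effect as the delete that never hit disk
                pruned++;
            }
        }
        return pruned;
    }

    // Self-contained demo: one fully acked message, one still pending.
    static int demo() {
        Map<Long, Record> loaded = new HashMap<>();
        Record done = new Record();
        done.refCount = 1;
        done.ackCount = 1;
        loaded.put(1L, done);
        Record pending = new Record();
        pending.refCount = 2;
        pending.ackCount = 1;
        loaded.put(2L, pending);
        return pruneFullyAcked(loaded);
    }

    public static void main(String[] args) {
        System.out.println("pruned on recovery: " + demo());
    }
}
```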

          • 2. Re: Two low hanging fruit improvements on persistence
            clebert.suconic
            On journal recovery you'll need to delete messages that are completely acked but not deleted, or you'll have messages hanging around in the journal forever.

             

            Yes, but we would need to do that anyway.

             

            When you sync the delete on post-ACK, you are actually increasing the chances of losing deletes.

             

            Say you ACKed 10K messages. The post-ACK will start performing those deletes. Since each delete is synced, the system will need about 40 seconds to complete them all (with a disk capable of 250 physical writes / second). If you crash before completion, you will lose all the deletes still waiting on the executor that is called after completion on the TimedBuffer / AIO.
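The backlog arithmetic as a small sketch (example numbers only, and the method name is illustrative):

```java
// Illustrative sketch: how long a synced-delete backlog takes to drain.
public class DeleteBacklogEstimate {

    // With one physical sync per delete, draining the backlog takes
    // pendingDeletes / physicalWritesPerSecond seconds.
    static int drainSeconds(int pendingDeletes, int physicalWritesPerSecond) {
        return pendingDeletes / physicalWritesPerSecond;
    }

    public static void main(String[] args) {
        int pendingDeletes = 10_000;       // ACKed messages awaiting delete
        int physicalWritesPerSecond = 250; // syncs the disk can perform

        System.out.println("time to drain delete backlog: "
                + drainSeconds(pendingDeletes, physicalWritesPerSecond) + "s");
    }
}
```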

             

             

            So, you can lose deletes either way: either the sync was not performed in time, or the delete was never executed because the system could not keep up.

             

             

            It's already done in my workspace, BTW. I'm just writing tests for it, then I will commit it.

            • 3. Re: Two low hanging fruit improvements on persistence
              timfox

              Clebert Suconic wrote:

               

              On journal recovery you'll need to delete messages that are completely acked but not deleted or you'll have messages hanging around in the journal forever.

               

              Yes, but we would need to do that anyway.

               

              Sure, I was just detailing what would need to be done to ensure the task is completed properly.

              • 4. Re: Two low hanging fruit improvements on persistence
                clebert.suconic

                Ah, ok...

                 

                BTW: I did a few performance tests with sync=false. I had a single producer / consumer with a commit batch of 1000 messages per TX.  Having sync=false increased the throughput by about 10%.

                 

                I'm about to commit it and close the task.

                 

                 

                Also,

                 

                 

                I still want to change the use of DuplicateID on Paging as I said earlier.

                 

                I would:

                - have depage "talk" to the DuplicateCache directly,

                - delete the cached duplicate ID whenever the page file is deleted,

                - have only one duplicate ID per depaged transaction.

                 

                 

                With this I would reduce how often depage causes the journal to compact, increasing throughput after depaging.
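The proposed depage flow could be sketched like this: one duplicate ID guards each depaged page, and the ID is dropped when the page file goes away. All names here are hypothetical, not the actual HornetQ classes:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: one duplicate ID per depaged page, deleted
// together with the page file, so the duplicate cache stays small.
public class DepageSketch {

    private final Set<Long> duplicateCache = new HashSet<>();

    // Depage one page: a single duplicate ID covers the whole page,
    // instead of one ID per transaction inside it.
    public boolean depage(long pageId) {
        if (!duplicateCache.add(pageId)) {
            return false; // page already depaged; skip it
        }
        // ... route the page's messages to their queues ...
        return true;
    }

    // When the page file is deleted, drop its cached ID so the cache
    // (and the journal records backing it) do not grow unbounded.
    public void onPageDeleted(long pageId) {
        duplicateCache.remove(pageId);
    }

    public int cacheSize() {
        return duplicateCache.size();
    }
}
```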