Hi! I've been dealing with the same problem as others, with Hazelcast reporting that it failed to execute cluster tasks within the configured timeout period. Like others, I have also seen that the expected timeout had in fact not elapsed, and that the cluster nodes sometimes appear as unavailable on the clustering page of the admin console. Finally, I have also seen messages being lost and synchronization failing for clients connected to the servers, and I am assuming this last part is just a symptom of the same problem.
A colleague of mine spotted this in the Hazelcast plugin source code.
Class: ClusteredCacheFactory
public Collection<Object> doSynchronousClusterTask(ClusterTask task, boolean includeLocalMember) {
    if (cluster == null) { return Collections.emptyList(); }
    Set<Member> members = new HashSet<Member>();
    Member current = cluster.getLocalMember();
    for(Member member : cluster.getMembers()) {
        if (includeLocalMember || (!member.getUuid().equals(current.getUuid()))) {
            members.add(member);
        }
    }
    Collection<Object> result = new ArrayList<Object>();
    if (members.size() > 0) {
        // Asynchronously execute the task on the other cluster members
        try {
            logger.debug("Executing MultiTask: " + task.getClass().getName());
            Map<Member, Future<Object>> futures = hazelcast.getExecutorService(HAZELCAST_EXECUTOR_SERVICE_NAME)
                .submitToMembers(new CallableTask<Object>(task), members);
            long nanosLeft = TimeUnit.SECONDS.toNanos(MAX_CLUSTER_EXECUTION_TIME*members.size());
            for (Future<Object> future : futures.values()) {
                long start = System.nanoTime();
                result.add(future.get(nanosLeft, TimeUnit.NANOSECONDS));
                nanosLeft = (System.nanoTime() - start);
            }
        } catch (TimeoutException te) {
            logger.error("Failed to execute cluster task within " + MAX_CLUSTER_EXECUTION_TIME + " seconds", te);
        } catch (Exception e) {
            logger.error("Failed to execute cluster task", e);
        }
    } else {
        logger.warn("No cluster members selected for cluster task " + task.getClass().getName());
    }
    return result;
}
This is the implementation the plugin uses to execute cluster tasks synchronously across the members of the cluster.
Pay attention to lines 17 and 21 of the snippet I pasted above. The nanosLeft variable is supposed to keep track of how many nanoseconds remain until the timeout period expires, and that value is used as the timeout when getting each future result (line 20). But on line 21, the current value of nanosLeft is not taken into account at all: the new value of the time left becomes System.nanoTime() - start, which is simply how long the last future took to return (lines 19 and 21), not the remaining budget.
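For what it's worth, I would have expected the loop to decrement the remaining budget rather than replace it, something along these lines (my own sketch against the variables in the snippet above, not code taken from the plugin):

// Hypothetical correction: subtract the time spent waiting on each future
// from the remaining budget instead of overwriting the budget entirely.
for (Future<Object> future : futures.values()) {
    long start = System.nanoTime();
    result.add(future.get(nanosLeft, TimeUnit.NANOSECONDS));
    nanosLeft -= (System.nanoTime() - start);
}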
Given this, I would expect some tasks to fail whenever there are multiple nodes, and the more nodes there are, the higher the chance that these tasks fail.
My setup has two nodes, and the tasks sometimes fail and sometimes don't. Considering what I just described, I think the tasks succeed when the second member executes its task faster than the first node did, and fail otherwise.
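To illustrate with a standalone simulation (hypothetical class and variable names, just reproducing the same bookkeeping outside of Openfire, and assuming a per-member budget of 30 seconds for the sake of the numbers): two tasks are submitted together, the first taking about 50 ms and the second about 200 ms, and the second get() times out because nanosLeft has collapsed to roughly the 50 ms the first get() took.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NanosLeftSimulation {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Fake "members": the first task finishes quickly, the second one is slower.
        List<Long> taskMillis = List.of(50L, 200L);
        List<Future<Long>> futures = new ArrayList<>();
        for (long millis : taskMillis) {
            futures.add(pool.submit(() -> {
                Thread.sleep(millis);
                return millis;
            }));
        }
        // Same budget calculation as the plugin: seconds per member times number of members.
        long nanosLeft = TimeUnit.SECONDS.toNanos(30 * taskMillis.size());
        try {
            for (Future<Long> future : futures) {
                long start = System.nanoTime();
                System.out.println("task took " + future.get(nanosLeft, TimeUnit.NANOSECONDS) + " ms");
                // The problematic assignment: the budget becomes the time spent on the last get().
                nanosLeft = (System.nanoTime() - start);
            }
        } catch (TimeoutException te) {
            // Reached whenever the second task needs more time than the first get() took,
            // because nanosLeft collapsed to roughly the first task's duration (~50 ms).
            System.out.println("TimeoutException with nanosLeft = " + nanosLeft + " ns");
        } finally {
            pool.shutdownNow();
        }
    }
}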
My question for the Openfire community is: is the behavior I'm describing correct? Have we detected a bug in the Hazelcast clustering plugin?
Note that the Hazelcast clustering plugin was published with these changes about 7 months ago (according to GitHub), and if this really were the case, I would expect the community to have been in an uproar over an implementation that fails 50% of the time. So, I believe that I may be missing part of the picture.