Jikes RVM GSOC 2011: Version 1 SortCommutative RegisterUses Multithreading 2x speed up achieved

To help create a starting point for my work I started with profiling the execution times of each optimization in Simple.java[1] as seen below, and I started mutlithreading SortCommutativeRegisterUses as seen below and currently it is in a Version 1 state in terms of performance and it sees a 2x speedup for a certain method size.

Optimizations (in Simple.java)

// Compute defList, useList, useCount fields for each register.
DefUse.computeDU(ir);
// Recompute isSSA flags
DefUse.recomputeSSA(ir);
/ Simple copy propagation.
// This pass incrementally updates the register list.
copyPropagation(ir);
// Simple type propagation.
    // This pass uses the register list, but doesn't modify it.
   if (typeProp) {
     typePropagation(ir);
    }
// Perform simple bounds-check and arraylength elimination.
    // This pass incrementally updates the register list
    if (foldChecks) {
      arrayPropagation(ir);
    }
// Simple dead code elimination.
// This pass incrementally updates the register list
eliminateDeadInstructions(ir);
// constant folding
    // This pass usually doesn't modify the DU, but
    // if it does it will recompute it.
    foldConstants(ir);
    // Simple local expression folding respecting DU
    if (ir.options.LOCAL_EXPRESSION_FOLDING && ExpressionFolding.performLocal(ir)) {
      // constant folding again
      foldConstants(ir);
    }
// Try to remove conditional branches with constant operands
    // If it actually constant folds a branch,
    // this pass will recompute the DU
    if (foldBranches) {
      simplifyConstantBranches(ir);
    }
// Should we sort commutative use operand
if (parallel) {
    if (sortRegisters) {
      parallelsortCommutativeRegisterUses(ir);
    }
} else {
sortCommutativeRegisterUses(ir);
}

Algorithm 1:

for all threads do // i = 0, i =1, i=3, i=4, etc....

N = NumberofInstructions; //number of Instructions for Current Method

for (j = ((N/4)*(i) + 1); j < ((N/4)*(i+1) + 1); j++) {

ExecuteOptimizationCode();

}

end

What Does this Mean??????

Here is an explanation

In our parallel sortCommutativeRegisterUses we need chop our work up into N sub problems where N = NumofThreads

For example lets say the user specified 4 threads in a command Line argument

We need to take each methodSize/4 for each thread.

EG. Method “getValue” has 100 Instructions

So we have:

Thread 0 : Iterate on Instruction number (100/4 * (0) + 1) increment by 1 where Instruction number is less than (100./4 * (1) + 1)

which is saying for(i = 1; i < 26; i++) in plain English

Thread 1 : Iterate on Instruction number (100/4 * (1) + 1) increment by 1 where Instruction number is less than (100./4 * (2) + 1)

which is saying for(i = 26; i < 51; i++) in plain English

Thread 2 : Iterate on Instruction number (100/4 * (2) + 1) increment by 1 where Instruction number is less than (100./4 * (3) + 1)

which is saying for(i = 51; i < 76; i++) in plain English

Thread 3 : Iterate on Instruction number (100/4 * (3) + 1) increment by 1 where Instruction number is less than (100./4 * (4) + 1)

which is saying for(i = 76; i < 101; i++) in plain English

IN Code:

public static Runnable parallelsortCommutativeRegisterUsesrunPass(final int threadId, final IR ir) {

return new Runnable() {

// Pass over instructions

public void run() {

int nInstructions = ir.numberInstructions(); //Total number Instructions in current Method

int i = (int)(((double)nInstructions/4)*((double)threadId) + 1); //Calculate starting position in each Thread

//starts with floating point then casts back down to Integer

int end = (int)(((double)nInstructions/4)*((double)threadId + 1) + 1);//Calculate ending position in each Thread

//starts with floating point then casts back down to Integer

for (Enumeration<Instruction> e = ir.forwardInstrEnumerator(i); i < end; i++) {

Instruction s = e.nextElement();

// Sort most frequently defined operands onto lhs

if (Binary.conforms(s) && s.operator.isCommutative() &&

Binary.getVal1(s).isRegister() && Binary.getVal2(s).isRegister()) {

RegisterOperand rop1 = Binary.getVal1(s).asRegister();

RegisterOperand rop2 = Binary.getVal2(s).asRegister();

// Simple SSA based test

if (rop1.register.isSSA()) {

if (rop2.register.isSSA()) {

// ordering is arbitrary, ignore

} else {

// swap

Binary.setVal1(s, rop2);

Binary.setVal2(s, rop1);

}

} else if (rop2.register.isSSA()) {

// already have prefered ordering

} else {

// neither registers are SSA so place registers used more on the RHS

// (we don't have easy access to a count of the number of definitions)

if (rop1.register.useCount > rop2.register.useCount) {

// swap

Binary.setVal1(s, rop2);

Binary.setVal2(s, rop1);

}

};

}

/**

* Parallel Sort commutative use operands so that those defined most are on the lhs

* @param ir the IR to work on

private static void parallelsortCommutativeRegisterUses(IR ir) {

OptCompilerThread thread = new OptCompilerThread(4);

for (int i = 0; i < 4; i++) {

try {

thread.execute(parallelsortCommutativeRegisterUsesrunPass(i,ir));

} catch (InterruptedException e) {

e.printStackTrace();

}

thread.shutdown();

}

Jikes RVM GSOC 2011

Sunday, July 3, 2011

Version 1 SortCommutative RegisterUses Multithreading 2x speed up achieved

No comments:

Post a Comment