To help create a starting point for my work I started with profiling the execution times of each optimization in Simple.java[1] as seen below, and I started mutlithreading SortCommutativeRegisterUses as seen below and currently it is in a Version 1 state in terms of performance and it sees a 2x speedup for a certain method size.
Optimizations (in Simple.java)
- // Compute defList, useList, useCount fields for each register.
DefUse.computeDU(ir); - // Recompute isSSA flags
DefUse.recomputeSSA(ir); - / Simple copy propagation.
// This pass incrementally updates the register list.
copyPropagation(ir); - // Simple type propagation.
// This pass uses the register list, but doesn't modify it.
if (typeProp) {
typePropagation(ir);
} - // Perform simple bounds-check and arraylength elimination.
// This pass incrementally updates the register list
if (foldChecks) {
arrayPropagation(ir);
} - // Simple dead code elimination.
// This pass incrementally updates the register list
eliminateDeadInstructions(ir); - // constant folding
// This pass usually doesn't modify the DU, but
// if it does it will recompute it.
foldConstants(ir);
// Simple local expression folding respecting DU
if (ir.options.LOCAL_EXPRESSION_FOLDING && ExpressionFolding.performLocal(ir)) {
// constant folding again
foldConstants(ir);
} - // Try to remove conditional branches with constant operands
// If it actually constant folds a branch,
// this pass will recompute the DU
if (foldBranches) {
simplifyConstantBranches(ir);
} - // Should we sort commutative use operand
if (parallel) {
if (sortRegisters) {
parallelsortCommutativeRegisterUses(ir);
}
} else {
sortCommutativeRegisterUses(ir);
}
Algorithm 1:
for all threads do // i = 0, i =1, i=3, i=4, etc....
N = NumberofInstructions; //number of Instructions for Current Method
for (j = ((N/4)*(i) + 1); j < ((N/4)*(i+1) + 1); j++) {
ExecuteOptimizationCode();
}
end
What Does this Mean??????
Here is an explanation
In our parallel sortCommutativeRegisterUses we need chop our work up into N sub problems where N = NumofThreads
For example lets say the user specified 4 threads in a command Line argument
We need to take each methodSize/4 for each thread.
EG. Method “getValue” has 100 Instructions
So we have:
Thread 0 : Iterate on Instruction number (100/4 * (0) + 1) increment by 1 where Instruction number is less than (100./4 * (1) + 1)
which is saying for(i = 1; i < 26; i++) in plain English
Thread 1 : Iterate on Instruction number (100/4 * (1) + 1) increment by 1 where Instruction number is less than (100./4 * (2) + 1)
which is saying for(i = 26; i < 51; i++) in plain English
Thread 2 : Iterate on Instruction number (100/4 * (2) + 1) increment by 1 where Instruction number is less than (100./4 * (3) + 1)
which is saying for(i = 51; i < 76; i++) in plain English
Thread 3 : Iterate on Instruction number (100/4 * (3) + 1) increment by 1 where Instruction number is less than (100./4 * (4) + 1)
which is saying for(i = 76; i < 101; i++) in plain English
IN Code:
public static Runnable parallelsortCommutativeRegisterUsesrunPass(final int threadId, final IR ir) {
return new Runnable() {
// Pass over instructions
public void run() {
int nInstructions = ir.numberInstructions(); //Total number Instructions in current Method
int i = (int)(((double)nInstructions/4)*((double)threadId) + 1); //Calculate starting position in each Thread
//starts with floating point then casts back down to Integer
int end = (int)(((double)nInstructions/4)*((double)threadId + 1) + 1);//Calculate ending position in each Thread
//starts with floating point then casts back down to Integer
for (Enumeration<Instruction> e = ir.forwardInstrEnumerator(i); i < end; i++) {
Instruction s = e.nextElement();
// Sort most frequently defined operands onto lhs
if (Binary.conforms(s) && s.operator.isCommutative() &&
Binary.getVal1(s).isRegister() && Binary.getVal2(s).isRegister()) {
RegisterOperand rop1 = Binary.getVal1(s).asRegister();
RegisterOperand rop2 = Binary.getVal2(s).asRegister();
// Simple SSA based test
if (rop1.register.isSSA()) {
if (rop2.register.isSSA()) {
// ordering is arbitrary, ignore
} else {
// swap
Binary.setVal1(s, rop2);
Binary.setVal2(s, rop1);
}
} else if (rop2.register.isSSA()) {
// already have prefered ordering
} else {
// neither registers are SSA so place registers used more on the RHS
// (we don't have easy access to a count of the number of definitions)
if (rop1.register.useCount > rop2.register.useCount) {
// swap
Binary.setVal1(s, rop2);
Binary.setVal2(s, rop1);
}
}
}
}
}
};
}
/**
* Parallel Sort commutative use operands so that those defined most are on the lhs
*
* @param ir the IR to work on
*/
private static void parallelsortCommutativeRegisterUses(IR ir) {
OptCompilerThread thread = new OptCompilerThread(4);
for (int i = 0; i < 4; i++) {
try {
thread.execute(parallelsortCommutativeRegisterUsesrunPass(i,ir));
} catch (InterruptedException e) {
e.printStackTrace();
}
}
thread.shutdown();
}
No comments:
Post a Comment