Saturday, August 20, 2011
Parallel Boot
Boot[1] – The main advantage of multi-threading the boot is that in the Jikes RVM the boot code compiles around 100 methods every time the VM boots. The beauty of this multi-threaded optimization is that the speedup it achieves is independent of the benchmark or program the user is running. Unfortunately this optimization is still in a pre-version-0 phase, but I hope to polish it up in the near future as continued post-GSOC work. Hopefully we can get speedup from the multithreading with no overhead from the threading; a rough sketch of the idea follows.
Speedup graphs will be posted in the future.
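For illustration only, here is a minimal sketch of how the boot-time compilations could be farmed out to a small worker pool, in the same style the boot image writer already uses (see the May 7 post below); the bootMethods list and compileAtBoot() call are placeholders, not the real runtime-compiler API:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBootSketch {
  /** Compile all of the boot methods on nThreads workers and wait for them to finish. */
  public static void compileBootMethods(List<String> bootMethods, int nThreads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    for (final String method : bootMethods) {
      pool.execute(new Runnable() {
        public void run() {
          compileAtBoot(method); // placeholder for the real compile call
        }
      });
    }
    pool.shutdown(); // accept no new work, finish what is already queued
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
  }

  private static void compileAtBoot(String method) {
    // placeholder: in the real VM this would invoke the compiler on the method
  }
}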
Parallel Precompile
Pre Compile[1] – PreCompile's core method that controls compilation is embarrassingly parallel, because it uses a for loop to iterate over each method. Due to the time constraints of Google Summer of Code, the multi-threaded PreCompile method is currently in a non-optimized version 0 state. It does incur threading overhead, but the overhead is minimal: it causes neither a speedup nor a slowdown; in other words, the running time is the same. That can easily be fixed by sending a group of methods to each thread at once, preferably half the methods to thread one and half to thread two on a two-thread system, which is the same optimization that was used to help improve the multi-threaded Single-Method pass[2,3,4,5,6]; see the sketch after the commands below.
Command to run and generate advice file: ./rvm -X:aos:enable_advice_generation=true -cp dacapo-2006-10-MR2.jar Harness xalan
Command to run and precompile advice file with N threads: ./rvm -X:aos:enable_precompile=true -X:aos:pre_compile_threads=2 -X:aos:cafi=aosadvice.ca -cp dacapo-2006-10-MR2.jar Harness xalan
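The chunking fix mentioned above might look roughly like the sketch below: one contiguous block of methods per thread, so each thread is handed its work once instead of once per method. The methods list and compile() call are placeholders for the real PreCompile plumbing:

import java.util.List;

public class ChunkedPreCompileSketch {
  /** Start nThreads workers, each precompiling a contiguous chunk of the advice-file methods. */
  public static Thread[] startChunkedPreCompile(final List<String> methods, int nThreads) {
    Thread[] workers = new Thread[nThreads];
    final int n = methods.size();
    for (int t = 0; t < nThreads; t++) {
      final int start = (n * t) / nThreads;     // inclusive
      final int end = (n * (t + 1)) / nThreads; // exclusive
      workers[t] = new Thread(new Runnable() {
        public void run() {
          for (int j = start; j < end; j++) {
            compile(methods.get(j)); // placeholder for the real precompile call
          }
        }
      });
      workers[t].start();
    }
    return workers;
  }

  private static void compile(String method) {
    // placeholder: the real code recompiles the method described by an advice entry
  }
}

With 2 threads this gives thread one the first half of the methods and thread two the second half, matching the half-and-half split described above.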
One task I am going to work on post-GSOC is optimizing this code to achieve speedup and polishing it up to make it easy for the Jikes RVM community to use if they choose to.
Saturday, July 30, 2011
New Design: Multi-Method Hot Method Recompilation
As part of my GSOC I proposed 3 parallel compilation strategies: single-method, multi-method, and multi-phase.
Single-method is done; however, it is only relevant for large methods and would easily be overshadowed by multi-phase in a head-to-head comparison. Multi-method, on the other hand, is relevant in all cases, and even our worst-case benchmarks show speedup (a sketch of the dispatch idea follows the results below).
Here are the results numerically (SciMark2 LU scores).
2 threads:
Best case LU: 816.5870
Average case LU: 795.67
Original:
Best case LU: 805.98
Average case LU: 793.05
Command used for the 2-thread runs:
for I in {1..90}; do ./rvm -X:aos:num_threads=2 -cp scimark2lib.jar jnt.scimark2.commandline; done
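Conceptually, multi-method recompilation just hands each hot method's recompilation to the thread pool from the thread-pool posts below instead of compiling the methods one after another. This is only a sketch; each Runnable is assumed to wrap one hot method's compilation plan:

// Sketch only: OptCompilerThread is the pool class shown in the May 26 post;
// each Runnable in recompilePlans stands in for one hot method's recompilation.
public class MultiMethodRecompileSketch {
  public static void recompileHotMethods(java.util.List<Runnable> recompilePlans, int nThreads)
      throws InterruptedException {
    OptCompilerThread pool = new OptCompilerThread(nThreads);
    for (Runnable plan : recompilePlans) {
      pool.execute(plan); // hot methods compile concurrently instead of back to back
    }
    pool.shutdown();
  }
}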
Thursday, July 21, 2011
New and Improved Thread Pool
I took the liberty this weekend of improving my thread pool before I move on to the next parallel compiler optimization strategies (multi-method and multi-phase). Due to time constraints, this will most likely be my last major revision to the thread pool before the summer ends. However, do not be surprised if you still see me working on this project after the summer ends, since I have a hard time letting projects go unfinished or unpolished.
Here is the inheritance hierarchy of my thread pool (an easier-to-read version is at http://pastebin.com/EfqSuDqF):
   SystemThread              OptExecutor                 OptQueue
        |                         |                          |
OptCompilerThread  <-->  OptCompilerThreadPool  <-->  OptCompilerBlockingQueue
My next blog post will showcase this thread pool's new speedups with the existing multithreaded optimizations.
Sunday, July 3, 2011
Version 1 SortCommutativeRegisterUses Multithreading: 2x Speedup Achieved
To help create a starting point for my work, I started by profiling the execution times of each optimization in Simple.java[1], as seen below, and I started multithreading sortCommutativeRegisterUses, also shown below. It is currently in a version 1 state in terms of performance, and it sees a 2x speedup for a certain method size.
Optimizations (in Simple.java)
// Compute defList, useList, useCount fields for each register.
DefUse.computeDU(ir);
// Recompute isSSA flags
DefUse.recomputeSSA(ir);
// Simple copy propagation.
// This pass incrementally updates the register list.
copyPropagation(ir);
// Simple type propagation.
// This pass uses the register list, but doesn't modify it.
if (typeProp) {
  typePropagation(ir);
}
// Perform simple bounds-check and arraylength elimination.
// This pass incrementally updates the register list
if (foldChecks) {
  arrayPropagation(ir);
}
// Simple dead code elimination.
// This pass incrementally updates the register list
eliminateDeadInstructions(ir);
// constant folding
// This pass usually doesn't modify the DU, but
// if it does it will recompute it.
foldConstants(ir);
// Simple local expression folding respecting DU
if (ir.options.LOCAL_EXPRESSION_FOLDING && ExpressionFolding.performLocal(ir)) {
  // constant folding again
  foldConstants(ir);
}
// Try to remove conditional branches with constant operands
// If it actually constant folds a branch,
// this pass will recompute the DU
if (foldBranches) {
  simplifyConstantBranches(ir);
}
// Should we sort commutative use operands
if (parallel) {
  if (sortRegisters) {
    parallelsortCommutativeRegisterUses(ir);
  }
} else {
  sortCommutativeRegisterUses(ir);
}
Algorithm 1:
for all threads do  // i = 0, 1, 2, 3, ...
  N = NumberOfInstructions;  // number of instructions in the current method
  for (j = (N/4)*i + 1; j < (N/4)*(i+1) + 1; j++) {
    ExecuteOptimizationCode();
  }
end
What does this mean? Here is an explanation.
In our parallel sortCommutativeRegisterUses we need to chop the work up into N subproblems, where N = number of threads.
For example, let's say the user specified 4 threads as a command-line argument.
We then need to give each thread methodSize/4 instructions.
E.g., method "getValue" has 100 instructions.
So we have:
Thread 0: iterate from instruction number (100/4 * 0 + 1), incrementing by 1, while the instruction number is less than (100/4 * 1 + 1),
which is saying for (i = 1; i < 26; i++) in plain English.
Thread 1: iterate from instruction number (100/4 * 1 + 1), incrementing by 1, while the instruction number is less than (100/4 * 2 + 1),
which is saying for (i = 26; i < 51; i++) in plain English.
Thread 2: iterate from instruction number (100/4 * 2 + 1), incrementing by 1, while the instruction number is less than (100/4 * 3 + 1),
which is saying for (i = 51; i < 76; i++) in plain English.
Thread 3: iterate from instruction number (100/4 * 3 + 1), incrementing by 1, while the instruction number is less than (100/4 * 4 + 1),
which is saying for (i = 76; i < 101; i++) in plain English.
In code:
public static Runnable parallelsortCommutativeRegisterUsesrunPass(final int threadId, final IR ir) {
  return new Runnable() {
    // Pass over instructions
    public void run() {
      int nInstructions = ir.numberInstructions(); // total number of instructions in the current method
      // Starting and ending positions for this thread, computed in floating point
      // and then cast back down to int.
      int i = (int) (((double) nInstructions / 4) * ((double) threadId) + 1);
      int end = (int) (((double) nInstructions / 4) * ((double) threadId + 1) + 1);
      for (Enumeration<Instruction> e = ir.forwardInstrEnumerator(i); i < end; i++) {
        Instruction s = e.nextElement();
        // Sort most frequently defined operands onto lhs
        if (Binary.conforms(s) && s.operator.isCommutative() &&
            Binary.getVal1(s).isRegister() && Binary.getVal2(s).isRegister()) {
          RegisterOperand rop1 = Binary.getVal1(s).asRegister();
          RegisterOperand rop2 = Binary.getVal2(s).asRegister();
          // Simple SSA based test
          if (rop1.register.isSSA()) {
            if (rop2.register.isSSA()) {
              // ordering is arbitrary, ignore
            } else {
              // swap
              Binary.setVal1(s, rop2);
              Binary.setVal2(s, rop1);
            }
          } else if (rop2.register.isSSA()) {
            // already have preferred ordering
          } else {
            // neither register is SSA, so place registers used more on the RHS
            // (we don't have easy access to a count of the number of definitions)
            if (rop1.register.useCount > rop2.register.useCount) {
              // swap
              Binary.setVal1(s, rop2);
              Binary.setVal2(s, rop1);
            }
          }
        }
      }
    }
  };
}
/**
 * Parallel sort of commutative use operands so that those defined most are on the lhs
 *
 * @param ir the IR to work on
 */
private static void parallelsortCommutativeRegisterUses(IR ir) {
  OptCompilerThread thread = new OptCompilerThread(4);
  for (int i = 0; i < 4; i++) {
    try {
      thread.execute(parallelsortCommutativeRegisterUsesrunPass(i, ir));
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }
  thread.shutdown();
}
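The run pass above hardcodes a 4-way split; a generalized version of the same index arithmetic for an arbitrary thread count (just a sketch, not in my current code) would be:

public final class ChunkBounds {
  /** Start index (inclusive, 1-based like the code above) of threadId's chunk. */
  public static int chunkStart(int nInstructions, int nThreads, int threadId) {
    return (nInstructions * threadId) / nThreads + 1;
  }

  /** End index (exclusive) of threadId's chunk. */
  public static int chunkEnd(int nInstructions, int nThreads, int threadId) {
    return (nInstructions * (threadId + 1)) / nThreads + 1;
  }
}

With 100 instructions and 4 threads this reproduces the ranges worked out above (1-25, 26-50, 51-75, 76-100), and it also spreads the leftover instructions evenly when the method size is not a multiple of the thread count.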
Thursday, June 16, 2011
Version 0 SortCommutativeRegisterUses Optimization Running Time / Speedups
To help create a starting point for my work, I started by profiling the execution times of each optimization in Simple.java[1], as seen below, and I started multithreading sortCommutativeRegisterUses, described below.
Optimizations (in Simple.java)
// Compute defList, useList, useCount fields for each register.
DefUse.computeDU(ir);
// Recompute isSSA flags
DefUse.recomputeSSA(ir);
// Simple copy propagation.
// This pass incrementally updates the register list.
copyPropagation(ir);
// Simple type propagation.
// This pass uses the register list, but doesn't modify it.
if (typeProp) {
  typePropagation(ir);
}
// Perform simple bounds-check and arraylength elimination.
// This pass incrementally updates the register list
if (foldChecks) {
  arrayPropagation(ir);
}
// Simple dead code elimination.
// This pass incrementally updates the register list
eliminateDeadInstructions(ir);
// constant folding
// This pass usually doesn't modify the DU, but
// if it does it will recompute it.
foldConstants(ir);
// Simple local expression folding respecting DU
if (ir.options.LOCAL_EXPRESSION_FOLDING && ExpressionFolding.performLocal(ir)) {
  // constant folding again
  foldConstants(ir);
}
// Try to remove conditional branches with constant operands
// If it actually constant folds a branch,
// this pass will recompute the DU
if (foldBranches) {
  simplifyConstantBranches(ir);
}
// Should we sort commutative use operands
if (sortRegisters) {
  sortCommutativeRegisterUses(ir);
}
In this code we multithread the sortCommutativeRegisterUses optimization by forking a thread per instruction; clearly the granularity is too fine, and we need to optimize the algorithm (a sketch of this per-instruction version appears at the end of this post).
./rvm -X:aos:initial_compiler=opt -classpath dacapo-2006-10-MR2.jar Harness fop
Running Time
SpeedUps
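For context, the version 0 approach amounts to roughly the following: one task per instruction, which is where the per-task overhead comes from. This is a reconstruction with placeholder types, not the exact code:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerInstructionSketch {
  /** Placeholder for an IR instruction that can sort its own commutative operands. */
  interface Instr {
    void sortCommutativeUses();
  }

  /** Version 0 idea: fork one task per instruction -- far too fine-grained in practice. */
  static void sortAll(List<Instr> instructions, int nThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    for (final Instr s : instructions) {
      pool.execute(new Runnable() {
        public void run() {
          s.sortCommutativeUses(); // each swap is tiny compared to the cost of the task itself
        }
      });
    }
    pool.shutdown();
  }
}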
Tuesday, June 14, 2011
ComputeDU: Original vs. Multithreaded
To help create a starting point for my work, I started by profiling the execution times of each optimization in Simple.java[1].
I started multithreading DefUse.computeDU.
Optimizations (in Simple.java)
// Compute defList, useList, useCount fields for each register.
DefUse.computeDU(ir);
// Recompute isSSA flags
DefUse.recomputeSSA(ir);
// Simple copy propagation.
// This pass incrementally updates the register list.
copyPropagation(ir);
// Simple type propagation.
// This pass uses the register list, but doesn't modify it.
if (typeProp) {
  typePropagation(ir);
}
// Perform simple bounds-check and arraylength elimination.
// This pass incrementally updates the register list
if (foldChecks) {
  arrayPropagation(ir);
}
// Simple dead code elimination.
// This pass incrementally updates the register list
eliminateDeadInstructions(ir);
// constant folding
// This pass usually doesn't modify the DU, but
// if it does it will recompute it.
foldConstants(ir);
// Simple local expression folding respecting DU
if (ir.options.LOCAL_EXPRESSION_FOLDING && ExpressionFolding.performLocal(ir)) {
  // constant folding again
  foldConstants(ir);
}
// Try to remove conditional branches with constant operands
// If it actually constant folds a branch,
// this pass will recompute the DU
if (foldBranches) {
  simplifyConstantBranches(ir);
}
// Should we sort commutative use operands
if (sortRegisters) {
  sortCommutativeRegisterUses(ir);
}
Our computeDU optimization multithreads at a register level. Unfortunately that granularity is way too fine, and we see overhead from the Runnable objects/threads (see the sketch at the end of this post for a coarser-grained alternative).
We do hit a few cases where we actually benefit (sort of); see the 3-thread to 4-thread transition.
Original Execution Time: 108715 nano Seconds Instructs: 821 Method: deleteTree
Execution Time: 556187 nano Seconds Instructs: 821 Method: deleteTree 2 Threads
Execution Time: 24728977 nano Seconds Instructs: 821 Method: deleteTree 3 Threads
Execution Time: 14231161 nano Seconds Instructs: 821 Method: deleteTree 4 Threads
for I in {1..3}; do ./rvm -X:aos:initial_compiler=opt -classpath dacapo-2006-10-MR2.jar Harness fop; done
Graph Axes will be fixed and dimensions will be lined up
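The coarser-grained alternative mentioned above would batch the registers into one contiguous chunk per thread, so the task-creation cost is paid nThreads times instead of once per register. A sketch with placeholder types (the real computeDU works on Jikes RVM Register objects):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedComputeDUSketch {
  /** Placeholder for an IR register whose def/use information can be computed independently. */
  interface Reg {
    void computeDefUse();
  }

  /** One task per chunk of registers rather than one task per register. */
  static void computeDU(final List<Reg> regs, int nThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(nThreads);
    final int n = regs.size();
    for (int t = 0; t < nThreads; t++) {
      final int start = (n * t) / nThreads;
      final int end = (n * (t + 1)) / nThreads;
      pool.execute(new Runnable() {
        public void run() {
          for (int i = start; i < end; i++) {
            regs.get(i).computeDefUse();
          }
        }
      });
    }
    pool.shutdown();
  }
}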
Thursday, May 26, 2011
RVM Thread Pool
package org.jikesrvm.compilers.opt;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import org.jikesrvm.scheduler.SystemThread;
import org.vmmagic.pragma.NonMoving;
@NonMoving
public class OptCompilerThread
{
private final int nThreads;
private final PoolWorker[] threads;
private final OptCompilerBlockingQueue queue;
public OptCompilerThread(int nThreads)
{
this.nThreads = nThreads;
queue = new OptCompilerBlockingQueue();
threads = new PoolWorker[nThreads];
for (int i=0; i<nThreads; i++) {
threads[i] = new PoolWorker(i);
threads[i].start();
}
}
public void execute(Runnable r) throws InterruptedException {
synchronized(queue) {
queue.add(r);
queue.notify();
}
}
public synchronized void shutdown(){
for(int i=0; i<nThreads; i++){
threads[i].stop(new IllegalStateException("ThreadPool is stopped"));
}
}
@NonMoving
private class PoolWorker extends SystemThread {
private final ArrayBlockingQueue<Runnable> handoffBox = new ArrayBlockingQueue<Runnable>(1);
protected PoolWorker(int ThreadCountN) {
super("CompilerThread" + ThreadCountN);
}
@Override
public void run() {
Runnable r;
while (true) {
synchronized(queue) {
while (queue.isEmpty()) {
try
{
queue.wait();
}
catch (InterruptedException ignored)
{
}
}
try {
r = (Runnable) queue.remove();
handoffBox.add(r);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
synchronized(handoffBox)
{
while(handoffBox.isEmpty()){
try
{
handoffBox.wait();
}
catch (InterruptedException ignored)
{
}
}
r = (Runnable) handoffBox.remove();
}
// If we don't catch RuntimeException,
// the pool could leak threads
try {
r.run();
}
catch (RuntimeException e) {
// You might want to log something here
}
}
}
}
}
class OptCompilerBlockingQueue {
private List<Object> queue = new LinkedList<Object>();
private int limit;
private boolean Dynamic = false;
public OptCompilerBlockingQueue(int limit){
this.limit = limit;
}
public synchronized boolean isEmpty()
{
if (queue.isEmpty())
return true;
else
return false;
}
public OptCompilerBlockingQueue()
{
Dynamic = true;
}
public synchronized void add(Object item)
throws InterruptedException {
if(Dynamic == false)
{
while(queue.size() == limit) {
wait();
}
}
queue.add(item);
}
public synchronized Object remove()
throws InterruptedException{
while(queue.size() == 0){
wait();
}
return queue.remove(0);
}
}
------------------------------------------------------------------------------------------------
package org.jikesrvm.compilers.opt;
import org.jikesrvm.compilers.opt.ir.Register;
import org.vmmagic.pragma.NonMoving;
@NonMoving
public class ParallelOptCompiler implements ParallelOptCompilerPass {
public Runnable runPassOnSingleRegister(final Register reg) {
return new Runnable(){
public void run() {
reg.putSSA((reg.defList != null && reg.defList.getNext() == null));
}
};
}
}
-----------------------------------------------------------------------------------------
package org.jikesrvm.compilers.opt;
import org.jikesrvm.compilers.opt.ir.Register;
public interface ParallelOptCompilerPass {
Runnable runPassOnSingleRegister(Register reg);
}
---------------------------------------------------------------------------------------
OptCompilerThread thread = new OptCompilerThread(2);
ParallelOptCompiler pass = new ParallelOptCompiler();
for (Register reg = ir.regpool.getFirstSymbolicRegister(); reg != null; reg = reg.getNext()) {
try {
thread.execute(pass.runPassOnSingleRegister(reg));
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
thread.shutdown();
Monday, May 16, 2011
Non-Multithreaded Vanilla Compiler Optimization Times
To help create a starting point for my work, I started by profiling the execution times of each optimization in Simple.java[1].
Optimizations (in Simple.java)
// Compute defList, useList, useCount fields for each register.
DefUse.computeDU(ir);
// Recompute isSSA flags
DefUse.recomputeSSA(ir);
// Simple copy propagation.
// This pass incrementally updates the register list.
copyPropagation(ir);
// Simple type propagation.
// This pass uses the register list, but doesn't modify it.
if (typeProp) {
  typePropagation(ir);
}
// Perform simple bounds-check and arraylength elimination.
// This pass incrementally updates the register list
if (foldChecks) {
  arrayPropagation(ir);
}
// Simple dead code elimination.
// This pass incrementally updates the register list
eliminateDeadInstructions(ir);
// constant folding
// This pass usually doesn't modify the DU, but
// if it does it will recompute it.
foldConstants(ir);
// Simple local expression folding respecting DU
if (ir.options.LOCAL_EXPRESSION_FOLDING && ExpressionFolding.performLocal(ir)) {
  // constant folding again
  foldConstants(ir);
}
// Try to remove conditional branches with constant operands
// If it actually constant folds a branch,
// this pass will recompute the DU
if (foldBranches) {
  simplifyConstantBranches(ir);
}
// Should we sort commutative use operands
if (sortRegisters) {
  sortCommutativeRegisterUses(ir);
}
Next we add timer code, make a method-size API call, and make a method-name API call.
Timer:
startTime = System.nanoTime();
DefUse.computeDU(ir);
EndTime = System.nanoTime();
Method Name and Method Size:
final int methodinstructionSize = ir.numberInstructions();
try { // Debug execution-time output
  // Create the file in append mode
  FileWriter fstream = new FileWriter("computeDUmethodExecutionTime.txt", true);
  BufferedWriter out = new BufferedWriter(fstream);
  out.write("\nExecution Time: " + (EndTime - startTime) + " nano Seconds" +
            " Instructs: " + methodinstructionSize +
            " Method: " + ir.getMethod().getName().toString());
  // Close the output stream
  out.close();
} catch (Exception e) { // Catch exception if any
  System.err.println("Error: " + e.getMessage());
}
./rvm -classpath dacapo-2006-10-MR2.jar Harness fop
We are left with ASCII files for each individual optimization containing data on method size, method execution time, and method name.
Execution Time: 21200 nano Seconds Instructs: 37 Method: stringToGlyph
Execution Time: 13520 nano Seconds Instructs: 53 Method: scanDigits
Execution Time: 17480 nano Seconds Instructs: 72 Method: getExplicit
Execution Time: 108801 nano Seconds Instructs: 508 Method: getShorthand
Graphs (size vs. time): Array Propagation, computeDU, copyPropagation, eliminateDeadInstructions, recomputeSSA, sortRegisters, typePropagation
[1] http://jikesrvm.svn.sourceforge.net/viewvc/jikesrvm/rvmroot/trunk/rvm/src/org/jikesrvm/compilers/opt/Simple.java?revision=16061&view=markup
Saturday, May 7, 2011
Jikes RVM parallel boot image creation execution time results
There are two types of problems: "compiling for a multiprocessor" and "compiling on a multiprocessor"[4]. I am focusing on compiling on a multiprocessor, which can theoretically cut compile time down.
More specifically, for the Jikes RVM there are two areas of the compiler that need to be multithreaded to solve the "compiling on a multiprocessor" problem: the runtime compiler and the boot image writer.[5]
Multithreading of the boot image compilation process has already been done; here are the before and after results on a dual-core processor (two hardware threads).
System INFO
CPU: Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
Memory: 2GB
OS: Fedora 12 32bit
Here are some before and after test results utilizing Jikes RVM's existing multithreaded boot image compilation[5]:
ant -Dconfig.name=production
BUILD SUCCESSFUL
Total time: 2 minutes 55 seconds
---------------------------------
ant -Dconfig.name=production
BUILD SUCCESSFUL
Total time: 2 minutes 57 seconds
---------------------------------
ant -Dconfig.name=production -Dbootimage.threads=2
BUILD SUCCESSFUL
Total time: 2 minutes 39 seconds
---------------------------------
ant -Dconfig.name=production -Dbootimage.threads=2
BUILD SUCCESSFUL
Total time: 2 minutes 38 seconds
As you can see, with one thread we have a compile time of 2 minutes and 55 seconds, and with two threads we have a compile time of 2 minutes and 39 seconds.
My research goal is to cut down the execution time of Jikes RVM compilation.
Here is how it works in the code:
First we start with our command line option of:
ant -Dconfig.name=production -Dbootimage.threads=2
These command line options are pulled in with the help of CommandLineArgs.java[1] and possibly a few other source files.
This changes the value of <property name="bootimage.threads" value="1"/> in the build.xml file[2], which allows for multithreaded (parallel) boot image compilation.
Delving further into the source code, these options set a value in BootImageWriter.java[3], allowing for multithreaded compilation of the boot image.
CODE:
The flag -Dbootimage.threads=2 sets -numThreads=2 in BootImageWriter.java[3].
Command: ant -v -Dconfig.name=production -Dbootimage.threads=2
Output:
[java] '-X:bc:O2'
[java] '-littleEndian'
[java] '-da'
[java] '0x60000000'
[java] '-ca'
[java] '0x64000000'
[java] '-ra'
[java] '0x67000000'
[java] '-numThreads=2'
[java] '-classlib'
[java] 'Classpath'
[java]
[java] The ' characters around the executable and arguments are
[java] not part of the command.
[java] BootImageWriter: compiler arg: O2
Note: See the correlation between -Dbootimage.threads=N and -numThreads=N
-Dbootimage.threads=N sets the value of -numThreads=N
The code below is contained in the source file BootImageWriter.java[3]:
* -numThreads=N number of parallel compilation threads we should create
if (args[i].startsWith("-numThreads=")) {
numThreads = Integer.parseInt(args[i].substring(12));
if (numThreads < 1) {
fail("numThreads must be a positive number, value supplied: "+ numThreads);
}
continue;
}
if (verbose >= 1) say(" compiling with " + numThreads + " threads");
ExecutorService threadPool = Executors.newFixedThreadPool(numThreads);
for (RVMType type: bootImageTypes.values()) {
threadPool.execute(new BootImageWorker(type));
}
threadPool.shutdown();
try {
while(!threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS)) {
say("Compilation really shouldn't take this long");
}
} catch (InterruptedException e){
throw new Error("Build interrupted", e);
}
if (BootImageWorker.instantiationFailed) {
throw new Error("Error during instantiaion");
}
Full System Information:
command: cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
stepping : 6
cpu MHz : 2124.000
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips : 5666.99
clflush size : 64
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
stepping : 6
cpu MHz : 2124.000
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips : 5666.33
clflush size : 64
power management:
command: cat /proc/meminfo
MemTotal: 2060128 kB
MemFree: 357516 kB
Buffers: 60876 kB
Cached: 501892 kB
SwapCached: 4344 kB
Active: 841384 kB
Inactive: 620044 kB
Active(anon): 478480 kB
Inactive(anon): 433076 kB
Active(file): 362904 kB
Inactive(file): 186968 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 1188744 kB
HighFree: 59396 kB
LowTotal: 871384 kB
LowFree: 298120 kB
SwapTotal: 4128760 kB
SwapFree: 4064148 kB
Dirty: 128 kB
Writeback: 0 kB
AnonPages: 894768 kB
Mapped: 55872 kB
Slab: 114072 kB
SReclaimable: 83344 kB
SUnreclaim: 30728 kB
PageTables: 9800 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5158824 kB
Committed_AS: 2058564 kB
VmallocTotal: 122880 kB
VmallocUsed: 78384 kB
VmallocChunk: 28260 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8184 kB
DirectMap2M: 899072 kB
Command: uname -a
Linux localhost.localdomain 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686 i686 i386 GNU/Linux
[5] ftp://ftp.cs.man.ac.uk/pub/apt/theses/ChristosKotselidis_MSc.pdf