Statistical Analysis Of Parallel Code
Team Number: 033
School Name: Manzano High School
 
Area of Science: Statistical Analysis
     Our Problem: 
  The problem that we chose is the statistical analysis of parallel code running on different systems. Our main goal is to compare communication: to show how the systems differ and what ideal communication would look like.  We will monitor MPI (Message Passing Interface) traffic to see how the nodes communicate. MPI is a standard way to pass messages back and forth between nodes[6].  The metric against which our results will be compared is parallel efficiency.  The test code that we are going to use is a set of parallel codes called the “NAS Parallel Benchmarks” (NAS PB). The clusters we run on will most likely be Sandia’s supercomputers C-Plant and Feynman. Both of these supercomputers use a specialized interconnect called Myrinet. C-Plant is a scalable system with Alpha processors, and Feynman is a Linux cluster with Xeon processors. The statistical analysis will be performed on runtimes, mainly their minimum, mean, and standard deviation.  To do this, we will use formulas from Lilja's text[7] in the mathematical model.
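  A minimal sketch of the runtime statistics and the efficiency metric described above. It is written in Python for brevity and is not the team's program. It assumes the common definition of parallel efficiency, E = T(1) / (p * T(p)), and it assumes the minimum runtime is the value fed into that formula; the source specifies neither. The function names and all runtimes are hypothetical.

```python
import statistics


def summarize_runtimes(runtimes):
    """Return the minimum, mean, and sample standard deviation of repeated runs."""
    return {
        "min": min(runtimes),
        "mean": statistics.mean(runtimes),
        # statistics.stdev uses the n - 1 (sample) denominator.
        "stdev": statistics.stdev(runtimes),
    }


def parallel_efficiency(t_serial, t_parallel, num_procs):
    """Assumed definition: efficiency = speedup / p = T(1) / (p * T(p))."""
    return t_serial / (num_procs * t_parallel)


# Hypothetical runtimes in seconds, made up for illustration only.
serial_runs = [100.0, 102.0, 101.0]    # one process
parallel_runs = [30.0, 32.0, 31.0]     # four processes

serial = summarize_runtimes(serial_runs)
parallel = summarize_runtimes(parallel_runs)

# Assumption: use the minimum runtime of each set as the representative time.
efficiency = parallel_efficiency(serial["min"], parallel["min"], 4)

print(parallel)
print(f"parallel efficiency on 4 processes: {efficiency:.3f}")
```

  An efficiency of 1.0 would mean perfect scaling. Values below 1.0 reflect overhead such as communication between nodes.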
     The Solution:
  By statistically analyzing the communication between nodes, we may be able to find a better way for nodes to communicate.  We plan to run the selected parallel code on at least two different large clusters. Our math model will require LU decomposition, in which an N * N matrix is decomposed into the product of a lower and an upper triangular matrix.[2]  Another NAS PB code that we are testing is Conjugate Gradient (CG).  CG is an iterative method for solving large, sparse systems of linear equations.  It was originally developed as a faster alternative to the method of steepest descent.  Because the work grows with every iteration, CG was designed to find the constants needed in each iteration quickly.[9]  The last code that we hope to test is Embarrassingly Parallel (EP).  EP generates pseudo-random numbers, with all of the nodes doing the same thing at the same time.  We hope to compare the parallel efficiency of at least two supercomputing platforms on these benchmarks.
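  To illustrate the LU decomposition defined above, here is a small sketch in Python. It uses the Doolittle form without pivoting, which is an assumption made for simplicity; it is not the NAS PB code and is not taken from the source. The function names and the 3 * 3 matrix are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting, so that A = L * U.

    L is lower triangular with ones on the diagonal; U is upper triangular.
    Assumes no zero pivot occurs, because no row exchanges are performed.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of U.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Fill column i of L below the diagonal.
        for r in range(i + 1, n):
            partial = sum(lower[r][k] * upper[k][i] for k in range(i))
            lower[r][i] = (a[r][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 3 * 3 example matrix.
a = [[4.0, 3.0, 2.0],
     [8.0, 7.0, 9.0],
     [4.0, 6.0, 5.0]]

lower, upper = lu_decompose(a)
product = matmul(lower, upper)  # should reproduce the original matrix
print(lower)
print(upper)
```

  Multiplying the two factors back together is a quick check that the decomposition is correct. A production solver would normally add pivoting for numerical stability.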
     Progress: 
  We have started to learn C++ programming in more depth, including classes and structs.  We have also started working with our mentor to develop the math model.  We have begun writing the Perl code that pulls the variables out of the output files generated by the NAS PB code.  The math model is under development and will eventually be put into a C++ program that computes the statistical data.
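  The extraction step described above can be sketched as follows. This is a Python stand-in for illustration, not the team's Perl code. It assumes the output files contain summary lines of the form "name = value", such as "Time in seconds = 45.12"; the exact field names should be checked against real output files. The sample text and the function name are hypothetical.

```python
import re

# Assumed line format, for example: " Time in seconds =        45.12"
FIELD_PATTERN = re.compile(
    r"^\s*(Time in seconds|Total processes|Class)\s*=\s*(\S+)\s*$"
)


def parse_benchmark_output(text):
    """Collect the selected 'name = value' fields from one output file's text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Hypothetical excerpt for illustration; not real measured output.
sample = """
 Class           =                        A
 Time in seconds =                    45.12
 Total processes =                        4
"""

fields = parse_benchmark_output(sample)
runtime = float(fields["Time in seconds"])
num_procs = int(fields["Total processes"])
print(fields["Class"], runtime, num_procs)
```

  Runtimes collected this way from repeated runs could then be passed to the statistics program.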
Team Members: Stephanie McAllister, Matthew Bailey 
Sponsoring Teacher: Stephen Schum
Project Mentor: Sue Goudy
Cited References:
1. http://citeseer.ist.psu.edu/234289.html
2. http://mathworld.wolfram.com/LUDecomposition.html
3. http://www.nacad.ufrj.br/guide/Misc/ezmath.html 
4. http://citeseer.ist.psu.edu/160727.html 
5. http://www.netlib.org/mpi/ 
6. http://www.mpi-forum.org/
7. http://info.sjc.ox.ac.uk/scr/sobey/iblt/chapter1/notes/node20.html
8. http://sc-admin/perlcd/index.htm
9. http://www-2.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf