Commit 49b4cf9a authored by Axel Kohlmeyer

remove references to Make.py and USER-CUDA

parent 49e6c2eb
These are input scripts used to run versions of several of the
benchmarks in the top-level bench directory using the GPU and
USER-CUDA accelerator packages. The results of running these scripts
on two different machines (a desktop with 2 Tesla GPUs and the ORNL
Titan supercomputer) are shown in the "GPU (Fermi)" section of the
Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.

Examples are shown below of how to run these scripts. This assumes
you have built 3 executables with both the GPU and USER-CUDA packages
installed, e.g.

lmp_linux_single
lmp_linux_mixed
lmp_linux_double
The precision (single, mixed, double) refers to the GPU and USER-CUDA
package precision. See the README files in the lib/gpu and lib/cuda
directories for instructions on how to build the packages with
different precisions. The GPU and USER-CUDA sub-sections of the
doc/Section_accelerate.html file also describe this process. The
Make.py commands below illustrate how a full set of accelerator
variants can be built:
Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-gpu mode=double arch=20 -o gpu_double -a libs exe
Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-gpu mode=single arch=20 -o gpu_single -a libs exe
Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-cuda mode=double arch=20 -o cuda_double -a libs exe
Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-cuda mode=single arch=20 -o cuda_single -a libs exe
Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
-m cuda -o kokkos_cuda -a exe
Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
-o all -a libs exe
Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-kokkos cuda arch=20 -gpu mode=double arch=20 \
-cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
------------------------------------------------------------------------

To run on just CPUs (without using the GPU or USER-CUDA styles),
do something like the following:

mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
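A larger CPU-only run on one node follows the same pattern; the task
count and problem size below are only an illustration:

mpirun -np 12 lmp_linux_double -v x 16 -v y 16 -v z 16 -v t 100 < in.eam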
...@@ -81,23 +47,5 @@ node via a "-ppn" setting.
------------------------------------------------------------------------
To run with the USER-CUDA package, do something like the following:
mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
The "xyz" settings determine the problem size. The "t" setting
determines the number of timesteps. The "np" setting determines how
many MPI tasks (per node) the problem will run on. The numeric
argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
is the default. Note that the number of MPI tasks must equal the
number of GPUs (both per node) with the USER-CUDA package.
These mpirun commands run on a single node. To run on multiple nodes,
scale up the "-np" setting, and control the number of MPI tasks per
node via a "-ppn" setting.
------------------------------------------------------------------------
If the script has "titan" in its name, it was run on the Titan If the script has "titan" in its name, it was run on the Titan
supercomputer at ORNL. supercomputer at ORNL.
...@@ -71,49 +71,33 @@ integration
----------------------------------------------------------------------
Here is a src/Make.py command which will perform a parallel build of a
LAMMPS executable "lmp_mpi" with all the packages needed by all the
examples. This assumes you have an MPI installed on your machine so
that "mpicxx" can be used as the wrapper compiler. It also assumes
you have an Intel compiler to use as the base compiler. You can leave
off the "-cc mpi wrap=icc" switch if that is not the case. You can
also leave off the "-fft fftw3" switch if you do not have the FFTW
(v3) installed as an FFT package, in which case the default KISS FFT
library will be used.
cd src
Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
-cc mpi wrap=icc -fft fftw3 -a file mpi
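If you have neither the Intel compiler nor FFTW3 available, the same
build simply drops those two optional switches, e.g. (a sketch based
on the description above):

cd src
Make.py -j 16 -p none molecule manybody kspace granular rigid orig -a file mpi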
----------------------------------------------------------------------
Here is how to run each problem, assuming the LAMMPS executable is
named lmp_mpi, and you are using the mpirun command to launch parallel
runs:

Serial (one processor runs):

lmp_mpi < in.lj
lmp_mpi < in.chain
lmp_mpi < in.eam
lmp_mpi < in.chute
lmp_mpi < in.rhodo

Parallel fixed-size runs (on 8 procs in this case):

mpirun -np 8 lmp_mpi < in.lj
mpirun -np 8 lmp_mpi < in.chain
mpirun -np 8 lmp_mpi < in.eam
mpirun -np 8 lmp_mpi < in.chute
mpirun -np 8 lmp_mpi < in.rhodo

Parallel scaled-size runs (on 16 procs in this case):

mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled

For each of the scaled-size runs you must set 3 variables as -var
command line switches. The variables x,y,z are used in the input
......
...@@ -106,20 +106,11 @@ tad: temperature-accelerated dynamics of vacancy diffusion in bulk Si
vashishta: models using the Vashishta potential
voronoi: Voronoi tessellation via compute voronoi/atom command
Here is a src/Make.py command which will perform a parallel build of a
LAMMPS executable "lmp_mpi" with all the packages needed by all the
examples, with the exception of the accelerate sub-directory. See the
accelerate/README for Make.py commands suitable for its example
scripts.
cd src
Make.py -j 16 -p none std no-lib reax meam poems reaxc orig -a lib-all mpi
Here is how you might run and visualize one of the sample problems:

cd indent
cp ../../src/lmp_mpi .         # copy LAMMPS executable to this dir
lmp_mpi < in.indent            # run the problem

Running the simulation produces the files {dump.indent} and
{log.lammps}. You can visualize the dump file as follows:
......
These are example scripts that can be run with any of
the accelerator packages in LAMMPS:
USER-CUDA, GPU, USER-INTEL, KOKKOS, USER-OMP, OPT

The easiest way to build LAMMPS with these packages
is via the src/Make.py tool described in Section 2.4
of the manual. You can also type "Make.py -h" to see
its options. The easiest way to run these scripts
is by using the appropriate ...

Details on the individual accelerator packages
can be found in doc/Section_accelerate.html.
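For instance, a run of the in.lj script with the USER-OMP package
enabled could look like the following (the executable name lmp_omp is
only a placeholder for whatever your build produced):

mpirun -np 4 lmp_omp -sf omp -pk omp 2 < in.lj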
...@@ -16,21 +13,6 @@ can be found in doc/Section_accelerate.html.
Build LAMMPS with one or more of the accelerator packages
The following command will invoke the src/Make.py tool with one of the
command-lines from the Make.list file:
../../src/Make.py -r Make.list target
target = one or more of the following:
cpu, omp, opt
cuda_double, cuda_mixed, cuda_single
gpu_double, gpu_mixed, gpu_single
intel_cpu, intel_phi
kokkos_omp, kokkos_cuda, kokkos_phi
If successful, the build will produce the file lmp_target in this
directory.
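For example, to build the double-precision GPU executable from the
target list above (assuming the GPU library prerequisites are in
place), the invocation follows the same pattern:

../../src/Make.py -r Make.list gpu_double

If successful, this produces lmp_gpu_double in this directory.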
Note that in addition to any accelerator packages, these packages also
need to be installed to run all of the example scripts: ASPHERE,
MOLECULE, KSPACE, RIGID.
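One way to make sure those four packages are present before building
(a sketch using the standard package targets in the src directory; the
Make.list entries may already install them for you) is:

cd ../../src
make yes-asphere yes-molecule yes-kspace yes-rigid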
...@@ -38,39 +20,11 @@ MOLECULE, KSPACE, RIGID.
These two targets will build a single LAMMPS executable with all the
CPU accelerator packages installed (USER-INTEL for CPU, KOKKOS for
OMP, USER-OMP, OPT) or all the GPU accelerator packages installed
(USER-CUDA, GPU, KOKKOS for CUDA):
target = all_cpu, all_gpu
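Following the same pattern as the individual targets, e.g.:

../../src/Make.py -r Make.list all_gpu

which should produce lmp_all_gpu in this directory.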
Note that the Make.py commands in Make.list assume an MPI environment
exists on your machine and use mpicxx as the wrapper compiler with
whatever underlying compiler it wraps by default. If you add "-cc mpi
wrap=g++" or "-cc mpi wrap=icc" after the target, you can choose the
underlying compiler for mpicxx to invoke. E.g.
../../src/Make.py -r Make.list intel_cpu -cc mpi wrap=icc
You should do this for any build that includes the USER-INTEL
package, since it will perform best with the Intel compilers.

Note that for kokkos_cuda, it needs to be "-cc nvcc" instead of "mpi",
since a KOKKOS for CUDA build requires NVIDIA nvcc as the wrapper
compiler.

Also note that the Make.py commands in Make.list use the default FFT
support, which is via the KISS library. If you want to build with
another FFT library, e.g. FFTW3, then you can add "-fft fftw3" after
the target, e.g.

../../src/Make.py -r Make.list gpu -fft fftw3

For any build with USER-CUDA, GPU, or KOKKOS for CUDA, be sure to set
the arch=XX setting to the appropriate value for the GPUs and CUDA
environment on your system. What is defined in the Make.list file is
arch=21 for older Fermi GPUs. This can be overridden as follows,
e.g. for Kepler GPUs:

../../src/Make.py -r Make.list gpu_double -gpu mode=double arch=35
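A corresponding override for a KOKKOS/CUDA build would use the -kokkos
switch in the same way (a sketch; adjust arch to your hardware):

../../src/Make.py -r Make.list kokkos_cuda -kokkos cuda arch=35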
---------------------
...@@ -118,12 +72,6 @@ Note that when running in.lj.5.0 (which has a long cutoff) with the
GPU package, the "-pk tpa" setting should be > 1 (e.g. 8) for best
performance.
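For example (lmp_gpu_double and the GPU count are placeholders for
your own build and hardware), such a run could be launched as:

mpirun -np 4 lmp_gpu_double -sf gpu -pk gpu 2 tpa 8 < in.lj.5.0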
** USER-CUDA package
lmp_machine -c on -sf cuda < in.lj
mpirun -np 1 lmp_machine -c on -sf cuda < in.lj # 1 MPI, 1 MPI/GPU
mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 < in.lj # 2 MPI, 1 MPI/GPU
** KOKKOS package for OMP
lmp_kokkos_omp -k on t 1 -sf kk -pk kokkos neigh half < in.lj
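A hybrid MPI + OpenMP variant of the same command (the task and thread
counts here are only an illustration) would be:

mpirun -np 2 lmp_kokkos_omp -k on t 8 -sf kk -pk kokkos neigh half < in.lj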
......