multiscale / lammps

Commit 64e152bc
authored 6 years ago by Axel Kohlmeyer
add some notes about GPU-direct support requirements to the manual
parent 5d87e0c6
Showing 2 changed files with 26 additions and 9 deletions

doc/src/Speed_kokkos.txt: 19 additions, 3 deletions
doc/src/package.txt: 7 additions, 6 deletions
doc/src/Speed_kokkos.txt (+19, -3)

@@ -96,6 +96,19 @@ software version 7.5 or later must be installed on your system. See
the discussion for the "GPU package"_Speed_gpu.html for details of how
to check and do this.

NOTE: Kokkos with CUDA currently implicitly assumes that the MPI
library is CUDA-aware and has support for GPU-direct. This is not always
the case, especially when using pre-compiled MPI libraries provided by
a Linux distribution. This is not a problem when using only a single
GPU and a single MPI rank on a desktop. When running with multiple
MPI ranks, you may see segmentation faults without GPU-direct support.
Many of those can be avoided by adding the flag '-pk kokkos comm no'
to the LAMMPS command line or by using the "package kokkos comm no"_package.html
command in the input file. However, for some KOKKOS-enabled styles like
"EAM"_pair_eam.html or "PPPM"_kspace_style.html this is not the case,
and a GPU-direct enabled MPI library is REQUIRED.
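For illustration only (the executable name lmp_kokkos_cuda_mpi, the two MPI ranks, and the input file in.lj are assumptions for this sketch, not part of the manual), a desktop run on a single GPU that avoids most GPU-direct communication could be launched as:

mpirun -np 2 lmp_kokkos_cuda_mpi -k on g 1 -sf kk -pk kokkos comm no -in in.lj :pre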
Use a C++11 compatible compiler and set the KOKKOS_ARCH variable in
/src/MAKE/OPTIONS/Makefile.kokkos_cuda_mpi for both the GPU and CPU as
described above. Then do the following:
...
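As a sketch of the KOKKOS_ARCH setting mentioned in the hunk above (the Pascal60 and BDW values, for a Pascal GPU and a Broadwell CPU, are assumptions chosen for illustration), the relevant lines of Makefile.kokkos_cuda_mpi could look like:

KOKKOS_DEVICES = Cuda, OpenMP
KOKKOS_ARCH = Pascal60,BDW :pre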
@@ -262,9 +275,12 @@ the # of physical GPUs on the node. You can assign multiple MPI tasks
to the same GPU with the KOKKOS package, but this is usually only
faster if significant portions of the input script have not been
ported to use Kokkos. Using CUDA MPS is recommended in this
scenario. Using a CUDA-aware MPI library with support for GPU-direct
is highly recommended, and for some KOKKOS-enabled styles it is even required.
Most GPU-direct use can be avoided by using "-pk kokkos comm no".
As above for multi-core CPUs (and no GPU), if N is the number of
physical cores/node, then the number of MPI tasks/node should not
exceed N.
-k on g Ng :pre
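For example (a sketch; the node with 4 physical GPUs and 16 cores, the executable name lmp_kokkos_cuda_mpi, and the input file in.lj are assumed), assigning 16 MPI tasks to 4 GPUs under CUDA MPS could be launched as:

mpirun -np 16 lmp_kokkos_cuda_mpi -k on g 4 -sf kk -in in.lj :pre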
...
doc/src/package.txt (+7, -6)

@@ -480,15 +480,16 @@ The value options for all 3 keywords are {no} or {host} or {device}.
A value of {no} means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of {host} means
to use the host, typically a multi-core CPU, and perform the
packing/unpacking in parallel with threads. A value of {device} means
to use the device, typically a GPU, to perform the packing/unpacking
operation.
The optimal choice for these keywords depends on the input script and
the hardware used. The {no} value is useful for verifying that the
Kokkos-based {host} and {device} values are working correctly. The {no}
value should also be used when the MPI library in use does not support
GPU-direct. It may also be the fastest choice when using Kokkos styles
in MPI-only mode (i.e. with a thread count of 1).
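As a brief, hedged example (placement near the top of the input script is an assumption; see the package documentation for ordering requirements), selecting the non-KOKKOS packing path when the MPI library lacks GPU-direct support would read:

package kokkos comm no :pre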
When running on CPUs or Xeon Phi, the {host} and {device} values work
identically. When using GPUs, the {device} value will typically be
...