Force fields in molecular dynamics (MD) simulations have been the subject of extensive research over the past decades, since they play a key role in capturing the physical and chemical properties of materials. Artificial neural network potentials (ANNPs) trained on density functional theory (DFT) datasets have been shown to reproduce these properties well compared to conventional force fields, especially physically informed ANNPs. However, the low computational performance caused by the complex force and energy computation is a significant limitation on the wide application of ANNPs. The problem can of course be worked around by spending more CPU hours or using more cores, but only by ignoring cost efficiency.
An artificial neural network (ANN), often abbreviated to neural network (NN), is based on the basic principles of biological neural networks. By abstracting the structure of the human brain and its response mechanism to external stimuli, and drawing on network topology from the graph theory of discrete mathematics, a mathematical model is constructed that estimates or approximates a function by simulating how the human nervous system processes complex input information. The model is characterized by parallel distributed processing, high fault tolerance, and self-learning capability; it combines the processing and storage of information, can solve problems that traditional rule-based programming cannot, and has therefore attracted attention across many disciplines.
LAMMPS, developed at Sandia National Laboratories (USA), is a classical MD simulation code that is widely used to simulate solid-state, soft-matter, and other systems. It is parallelized with Message Passing Interface (MPI) technology and supports several accelerator packages. Each MPI rank (processor) updates the positions, velocities, forces, and energies of the local atoms in the non-overlapping subdomain it owns, with the subdomains defined by a spatial decomposition technique. Updating these data for "ghost atoms", copies of atoms that lie in neighboring subdomains, however, requires communication with the neighboring ranks.
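As a minimal host-side sketch of this spatial decomposition (a 1-D slab decomposition with hypothetical types and names, not LAMMPS's internal API), each rank keeps the atoms inside its slab and flags border atoms that neighboring ranks will need as ghosts:

```cuda
// Sketch of 1-D spatial decomposition (hypothetical types/names, not
// LAMMPS's internal API). Each rank owns a slab of the box; atoms within
// one cutoff of a slab boundary must be sent to the neighboring rank,
// where they become ghost atoms.
#include <vector>

struct Atom { double x, y, z; };

struct Subdomain {
    double lo, hi;   // slab bounds along x for this rank
};

// Classify one rank's atoms: "owned" atoms it updates itself, plus
// border atoms that must be communicated to the left/right neighbor.
void classify(const std::vector<Atom>& atoms, const Subdomain& dom,
              double cutoff,
              std::vector<int>& owned, std::vector<int>& send_left,
              std::vector<int>& send_right)
{
    for (int i = 0; i < (int)atoms.size(); ++i) {
        const double x = atoms[i].x;
        if (x < dom.lo || x >= dom.hi) continue;            // not ours
        owned.push_back(i);
        if (x < dom.lo + cutoff)  send_left.push_back(i);   // ghost on left rank
        if (x >= dom.hi - cutoff) send_right.push_back(i);  // ghost on right rank
    }
    // In the real code, send_left/send_right would be exchanged via MPI
    // each step so neighbors can rebuild their ghost-atom coordinates.
}
```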
The GPU package supports a wide range of hardware, including NVIDIA, AMD, and Intel GPUs, and can be compiled in different modes, such as CUDA and OpenCL, depending on the hardware and the installed toolkit. Performance in the different modes is nearly identical; for the Tersoff potential, for example, CUDA and OpenCL differ by about 5%. Three floating-point precisions can be selected when building the package: single precision, double precision, and mixed precision.
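To illustrate what mixed precision means in practice (a hypothetical kernel, not the package's actual code), per-pair arithmetic can run in single precision for speed while the per-atom accumulator stays in double precision to limit round-off error over many neighbors:

```cuda
// Hypothetical mixed-precision kernel: single-precision pair terms,
// double-precision accumulation.
__global__ void pair_energy_mixed(const float* __restrict__ r2,  // squared pair distances
                                  int npairs, double* energy)
{
    double local = 0.0;                       // double-precision accumulator
    for (int p = blockIdx.x * blockDim.x + threadIdx.x;
         p < npairs; p += blockDim.x * gridDim.x) {
        float inv2 = 1.0f / r2[p];            // single-precision arithmetic
        float inv6 = inv2 * inv2 * inv2;
        local += (double)(4.0f * (inv6 * inv6 - inv6));  // LJ-style placeholder term
    }
    atomicAdd(energy, local);                 // double atomicAdd (needs sm_60+)
}
```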
Artificial neural networks, which closely resemble biological neurons, have input and output procedures. Implementing an ANNP therefore requires a pass from input to output, which is the feedforward procedure of the ANN. As shown in the figure, a four-layer feedforward back-propagation (BP) network is used to calculate atomic energies and forces: its structure consists of an input layer, two hidden layers, and an output layer.
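As a minimal sketch of this feedforward pass (the layer widths NIN and NH, the tanh activation, and all weight names are assumptions for illustration, not the network actually used), the symmetry-function values of one atom are propagated through two hidden layers to a single atomic energy:

```cuda
// Feedforward pass: input layer -> two tanh hidden layers -> atomic energy.
// Sizes and weights are placeholders; in practice they come from training
// against DFT data.
#include <cmath>

const int NIN = 8, NH = 16;  // assumed layer widths

double atomic_energy(const double G[NIN],                          // symmetry-function inputs
                     const double w1[NH][NIN], const double b1[NH],
                     const double w2[NH][NH],  const double b2[NH],
                     const double w3[NH],      double b3)
{
    double h1[NH], h2[NH];
    for (int j = 0; j < NH; ++j) {            // input -> hidden layer 1
        double s = b1[j];
        for (int i = 0; i < NIN; ++i) s += w1[j][i] * G[i];
        h1[j] = std::tanh(s);
    }
    for (int k = 0; k < NH; ++k) {            // hidden layer 1 -> hidden layer 2
        double s = b2[k];
        for (int j = 0; j < NH; ++j) s += w2[k][j] * h1[j];
        h2[k] = std::tanh(s);
    }
    double E = b3;                            // hidden layer 2 -> output energy
    for (int k = 0; k < NH; ++k) E += w3[k] * h2[k];
    return E;
}
```

Forces then follow by the chain rule, differentiating this atomic energy with respect to the symmetry functions and the symmetry functions with respect to atomic coordinates.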
As mentioned above, the process of calculating atomic energies and forces is complex, so reducing the total number of cycles per atom is a good implementation strategy. If atomic operations are used to store the derivatives of the symmetry functions with respect to the coordinates, and the resulting forces on atoms j and k, the global-memory access problem becomes worse; because of the high latency of atomic operations, this is usually undesirable. To avoid atomic operations, the researchers instead use two large matrices to store, for each computed atom i, the derivative values and the forces on its nearest neighbors, as sketched below. This method can also be used to accelerate other complex machine-learning potentials, such as the physically informed artificial neural network potential (PINN).
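A possible shape of this atomics-free scheme (the data layout, the reverse-index array, and the per-slot derivative weighting are hypothetical simplifications, not the authors' exact data structures): each atom i writes its force contributions on its neighbors into its own private row of a large matrix, and a second kernel gathers those rows so that no two threads ever write the same address:

```cuda
// Kernel 1: scatter. Row i of dF is private to atom i, so no atomicAdd
// is needed when storing per-neighbor force contributions.
__global__ void scatter_neighbor_forces(const double3* dGdr,  // dG/dr per (i, slot)
                                        const double* dEdG,   // simplified: one scalar per slot
                                        int nlocal, int maxneigh,
                                        double3* dF)          // nlocal x maxneigh matrix
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nlocal) return;
    for (int jj = 0; jj < maxneigh; ++jj) {
        int s = i * maxneigh + jj;
        dF[s].x = -dEdG[s] * dGdr[s].x;       // chain rule; row is private to i
        dF[s].y = -dEdG[s] * dGdr[s].y;
        dF[s].z = -dEdG[s] * dGdr[s].z;
    }
}

// Kernel 2: gather. A precomputed reverse index tells atom j which slots
// of dF act on it; each thread sums its own atom's contributions serially.
__global__ void gather_forces(const double3* dF, const int* revidx,
                              const int* numrev, int nlocal, int maxneigh,
                              double3* f)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= nlocal) return;
    double3 fj = f[j];
    for (int kk = 0; kk < numrev[j]; ++kk) {
        double3 c = dF[revidx[j * maxneigh + kk]];
        fj.x += c.x; fj.y += c.y; fj.z += c.z;
    }
    f[j] = fj;
}
```

The trade-off is extra global memory for the matrix in exchange for removing high-latency atomic operations from the inner loops.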
It is well known that the global-memory bandwidth of most GPU devices is lower than the bandwidth of on-chip memory such as registers and shared memory. Reducing the number of global-memory accesses is therefore a desirable strategy when implementing a three-body potential. However, this trade usually costs extra computation cycles, which is detrimental for complex three-body potentials such as ANNP and PINN, since they already spend more time on computation than on global-memory access. The main purpose of the Flexible Computation Approach (FCA) is to reduce the total number of cycles in the current implementation; the use of the memory hierarchy and coalesced access to global memory play a key role in reducing the computation time.
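A minimal sketch of these two memory tactics (a hypothetical kernel; the structure-of-arrays layout and the parameter-table size NPARAM are assumptions): consecutive threads read consecutive addresses so global loads coalesce, and a small parameter table is staged once into shared memory instead of being re-read from global memory per pair:

```cuda
const int NPARAM = 32;  // assumed size of the parameter table

__global__ void coalesced_kernel(const double* __restrict__ x,  // SoA: x[0..n), y[n..2n), z[2n..3n)
                                 int n, const double* __restrict__ params,
                                 double* out)
{
    // Stage the parameter table into shared memory once per block.
    __shared__ double p[NPARAM];
    for (int t = threadIdx.x; t < NPARAM; t += blockDim.x)
        p[t] = params[t];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Coalesced loads: threads i, i+1, ... touch consecutive addresses.
    double xi = x[i], yi = x[n + i], zi = x[2 * n + i];

    double acc = 0.0;
    for (int k = 0; k < NPARAM; ++k)          // parameters served from shared memory
        acc += p[k] * (xi + yi + zi);         // placeholder arithmetic
    out[i] = acc;                             // coalesced store
}
```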