Cluster analysis, or clustering, is a main task of exploratory data mining and a common technique for statistical data analysis. It is used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning. Cluster analysis is not one specific algorithm but a general task; solving it usually means adjusting the data preprocessing and model parameters until the result has the desired properties.
The background needed to write a GROMACS analysis program was explained in the previous two articles, but readers unfamiliar with C++ may still find it hard to follow. The concrete steps are therefore listed below for reference.
1. Open src\gromacs\gmxana\gmx_ana.h, where the analysis programs are declared.
2. Add a declaration for the new program:
int gmx_mdcluster(int argc, char *argv[]);
3. In src\programs\legacymodules.cpp, register the module below the comment // Modules from gmx_ana.h. so that it can be called from the gmx main program:
registerModule(manager, &gmx_mdcluster, "mdcluster", "md Cluster structures");
4. Merge the two files src\gromacs\gmxana\gmx_mdmat.cpp and src\gromacs\gmxana\gmx_cluster.cpp into a single file named mdcluster.cpp.
5. Tidy up the #include directives at the top of mdcluster.cpp, and rename the main function to gmx_mdcluster so that it matches the declaration in gmx_ana.h.
6. Try to compile. Merging the two files may cause function redefinition errors; if so, rename the conflicting functions. Repeat until the build succeeds and gmx mdcluster runs.
7. Write the functions you need. The key is to understand the distance matrix computed by the mdmat code and to pass it to the clustering code.
8. If necessary, add other functions, using the analysis programs that ship with GROMACS as references.
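Roughly speaking, steps 1 to 3 hook the new program into the gmx driver, which looks up a tool by name and calls its C-style main function with the signature int (int argc, char *argv[]). The sketch below is a hypothetical toy that imitates that idea in plain C++. It is not GROMACS code: toyRegistry, toyRegisterModule, and toyRunModule are invented names, and only the function signature and the "mdcluster" name come from the steps above.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical toy, NOT the GROMACS implementation: it only mimics the idea
// that each legacy tool is stored as a pointer to a C-style main function.
using CMainFunction = int (*)(int argc, char* argv[]);

struct ToyModule
{
    CMainFunction mainFunction;
    std::string   shortDescription;
};

// Name -> module table, filled by toyRegisterModule().
std::map<std::string, ToyModule>& toyRegistry()
{
    static std::map<std::string, ToyModule> registry;
    return registry;
}

void toyRegisterModule(CMainFunction mainFunction, const char* name, const char* shortDescription)
{
    toyRegistry()[name] = ToyModule{ mainFunction, shortDescription };
}

// Dispatches "gmx <name> args..." to the registered main function.
// Returns 1 if the module name is unknown.
int toyRunModule(int argc, char* argv[])
{
    assert(argc >= 2);
    auto it = toyRegistry().find(argv[1]);
    if (it == toyRegistry().end())
    {
        std::fprintf(stderr, "Unknown module: %s\n", argv[1]);
        return 1;
    }
    // In this toy, the module sees its own name as argv[0].
    return it->second.mainFunction(argc - 1, argv + 1);
}

// Stand-in for the real gmx_mdcluster entry point declared in step 2.
int gmx_mdcluster(int argc, char* argv[])
{
    std::printf("%s called with %d argument(s)\n", argv[0], argc - 1);
    return 0;
}
```

In this toy, gmx mdcluster -f would look up "mdcluster" and call gmx_mdcluster with the remaining arguments, which is why the declaration, the definition, and the registration must all use exactly the same function name.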
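Step 7 builds on the matrix produced by gmx mdmat, which holds the smallest distance between each pair of residues. The following is a minimal sketch of that quantity, not the mdmat source. It assumes coordinates in nm, no periodic boundary conditions, and groups (residues or molecules) given as lists of atom indices; Vec3 and minDistanceMatrix are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3
{
    double x, y, z; // nm
};

// For every pair of groups, the smallest distance between any atom of one
// group and any atom of the other. The diagonal is left at 0.
// Assumption: no periodic boundary conditions are applied.
std::vector<std::vector<double>> minDistanceMatrix(const std::vector<Vec3>&                      x,
                                                   const std::vector<std::vector<std::size_t>>& groups)
{
    const std::size_t                n = groups.size();
    std::vector<std::vector<double>> d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j)
        {
            double best = std::numeric_limits<double>::infinity();
            for (std::size_t a : groups[i])
            {
                for (std::size_t b : groups[j])
                {
                    assert(a < x.size() && b < x.size());
                    const double dx = x[a].x - x[b].x;
                    const double dy = x[a].y - x[b].y;
                    const double dz = x[a].z - x[b].z;
                    best            = std::min(best, std::sqrt(dx * dx + dy * dy + dz * dz));
                }
            }
            d[i][j] = d[j][i] = best; // the matrix is symmetric
        }
    }
    return d;
}
```

A production version would normally also need periodic boundary handling, which this sketch deliberately leaves out to keep the idea visible.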
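The distance matrix can then be handed to the clustering step. One simple choice, assumed here, is single linkage with a distance cutoff, the criterion behind the linkage method of gmx cluster: two items share a cluster if a chain of pairs closer than the cutoff connects them. This sketch implements it with a union-find structure; DisjointSet and clusterByCutoff are hypothetical names, and the number of clusters in a frame is the largest returned label plus one.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Disjoint-set (union-find) with path halving.
class DisjointSet
{
public:
    explicit DisjointSet(std::size_t n) : parent_(n)
    {
        std::iota(parent_.begin(), parent_.end(), std::size_t{ 0 });
    }

    std::size_t find(std::size_t i)
    {
        while (parent_[i] != i)
        {
            parent_[i] = parent_[parent_[i]]; // path halving
            i          = parent_[i];
        }
        return i;
    }

    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// Single-linkage clustering with a cutoff on a square distance matrix.
// Returns a label 0, 1, 2, ... for each item, numbered by first appearance.
std::vector<int> clusterByCutoff(const std::vector<std::vector<double>>& dist, double cutoff)
{
    const std::size_t n = dist.size();
    DisjointSet       sets(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(dist[i].size() == n); // square matrix expected
        for (std::size_t j = i + 1; j < n; ++j)
        {
            if (dist[i][j] < cutoff)
            {
                sets.unite(i, j);
            }
        }
    }
    std::vector<int> label(n, -1), rootLabel(n, -1);
    int              nextLabel = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t root = sets.find(i);
        if (rootLabel[root] < 0)
        {
            rootLabel[root] = nextLabel++;
        }
        label[i] = rootLabel[root];
    }
    return label;
}
```

Keep the chaining effect of single linkage in mind: if A is within the cutoff of B and B is within the cutoff of C, all three end up in one cluster even when A and C are far apart.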
Usage
gmx mdcluster -f -s -n
Options
-f: trajectory file to analyze (default: traj.trr)
-s: run input file (default: topol.tpr)
-n: optional index file (default: index.ndx)
-g: output log file with the cluster information for each frame (default: cluster.log)
-unm: optional output xvg file with the number of clusters in each frame (default: num.xvg)
-xyz: optional output file in xyz format listing the coordinates of each cluster (default: cluster-xyz.pdb). The extension is pdb because GROMACS does not allow specifying an output file with an xyz extension, so pdb is used instead.
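The -unm output is an .xvg file, which is plain text that Grace (xmgrace) can read: lines beginning with # are comments, lines beginning with @ are plotting directives, and the remaining lines are data columns. A minimal sketch that formats one time and cluster count per frame; formatClusterCountXvg is a hypothetical helper, and inside GROMACS you would more likely use its own xvg routines such as xvgropen.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Formats "time  number-of-clusters" pairs as a minimal .xvg file.
std::string formatClusterCountXvg(const std::vector<double>& timePs, const std::vector<int>& nClusters)
{
    assert(timePs.size() == nClusters.size()); // one count per frame
    std::ostringstream out;
    out << "# Number of clusters in each frame\n";
    out << "@    title \"Number of clusters\"\n";
    out << "@    xaxis  label \"Time (ps)\"\n";
    out << "@    yaxis  label \"Clusters\"\n";
    out << "@TYPE xy\n";
    out << std::fixed << std::setprecision(3); // affects the time column only
    for (std::size_t i = 0; i < timePs.size(); ++i)
    {
        out << timePs[i] << ' ' << nClusters[i] << '\n';
    }
    return out.str();
}
```

The returned string can be written to num.xvg with an std::ofstream; building it in memory first keeps the formatting easy to check.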
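For -xyz, the file keeps a .pdb extension, but its contents are XYZ text: each block holds an atom count, a comment line, and one "element x y z" line per atom. The sketch below writes one block per cluster. It assumes each atom's element symbol is known and that the reading program expects Angstrom, while GROMACS stores nm (hence the factor of 10). Atom and formatClustersXyz are hypothetical names, and the comment-line text is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

struct Atom
{
    std::string element; // e.g. "C", "O"
    double      x, y, z; // nm
};

// One XYZ block per cluster; coordinates converted from nm to Angstrom.
std::string formatClustersXyz(const std::vector<std::vector<Atom>>& clusters, double timePs)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(3);
    for (std::size_t c = 0; c < clusters.size(); ++c)
    {
        assert(!clusters[c].empty()); // every cluster has at least one member
        out << clusters[c].size() << '\n';
        out << "cluster " << c << " t= " << timePs << " ps\n";
        for (const Atom& a : clusters[c])
        {
            out << a.element << ' ' << a.x * 10.0 << ' ' << a.y * 10.0 << ' ' << a.z * 10.0 << '\n';
        }
    }
    return out.str();
}
```

Writing all clusters of a frame as consecutive blocks lets viewers that understand multi-block XYZ files step through them one cluster at a time.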
The remaining clustering options are the same as in gmx cluster; see the gmx cluster documentation.