Instructions for using SuperStructure
See also the paper: https://rupress.org/jcb/article/220/5/e202010003/211893/Parameter-free-molecular-super-structures
-
Open a terminal (Linux/macOS)
-
Clone the git repository
-
Compile "mydbscan.c": gcc mydbscan.c -o mydbscan -lm
-
Run the script "SuperStructure_curves_generator.sh" on the file "data.dat" in the folder "Data" as follows:
- ./SuperStructure_curves_generator.sh data.dat
- This can be repeated for any number of data files in the folder "Data", each with a different name.
- The file "data.dat" must contain the following columns:
- #id = integer
- #frame = integer
- #x = position x
- #y = position y
- #z = position z [note: SuperStructure sets z to 0, i.e. it projects the data onto 2D]
A. To change the range of epsilon (DBSCAN radius) over which the calculation is performed, open SuperStructure_curves_generator.sh and look for "# USER DEFINED EPSILON". Set epsi (initial value), epsf (final value) and inc (increment) as needed.
B. To change the number of processors used for the calculation, look for "# USER DEFINED PROCESSORS" in the same script and set n_proc to the number of processors available for the parallel calculation.
-
The results of the calculation are:
A. "Data/analysis_superstructure/CLUSTER.eps_X.data.dat", where X is the value of epsilon used in the calculation. These are standard, full DBSCAN output files; their columns (the coordinates and the cluster_id) are described in the file header.
B. "SuperStructure.data.dat", which contains the following columns:
- #Epsilon
- #Number_of_Clusters
- #Number_of_Clusters normalised by total_number_of_points
In gnuplot, the curves in the paper can be reproduced by plotting column 3 against column 1: p "SuperStructure.data.dat" u 1:3