Instructions to use SuperStructure
See also the paper: https://rupress.org/jcb/article/220/5/e202010003/211893/Parameter-free-molecular-super-structures


Open a terminal (Linux/macOS).


Clone the git repository.


Compile "mydbscan.c" (gcc mydbscan.c -o mydbscan -lm).


Execute the script "SuperStructure_curves_generator.sh" on the file "data.dat" in the folder "Data" as

./SuperStructure_curves_generator.sh data.dat
This can be repeated for any number of data files in the folder "Data", provided they have different names.
The format in which "data.dat" must be passed to the calculation is

#id = integer
#frame = integer
#x = position x
#y = position y
#z = position z [note: SuperStructure sets this to 0 and projects the data to 2D]
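
Before running the calculation, it can help to check that a data file matches the format above. The following is a minimal sketch, not part of SuperStructure: the function name check_data_file is hypothetical, and it assumes that the five fields are whitespace-separated columns with one point per line, and that lines starting with "#" are comments. Neither assumption is stated in these instructions.

```shell
#!/bin/sh
# Hypothetical helper: verify that each data line has the five columns
# id (integer), frame (integer), x, y, z.
# Assumptions: whitespace-separated columns; blank lines and lines whose
# first field starts with '#' are skipped.
check_data_file() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }
        NF != 5 {
            printf "line %d: expected 5 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $1 !~ /^-?[0-9]+$/ || $2 !~ /^-?[0-9]+$/ {
            printf "line %d: id and frame must be integers\n", NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
# Usage: check_data_file Data/data.dat && echo "format looks OK"
```

The function returns a non-zero exit status and prints the offending line numbers if any line does not match.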


A. To modify the interval of epsilon (the DBSCAN radius) over which the calculation is performed, open SuperStructure_curves_generator.sh and look for "# USER DEFINED EPSILON". Modify epsi (initial value), epsf (final value) and inc (increment) as appropriate.
B. To modify the number of processors over which the calculation is performed, look for "# USER DEFINED PROCESSORS" and set n_proc to the number of processors available for the parallel calculation.
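
To get a feeling for how many DBSCAN runs a given choice of epsi, epsf and inc implies, the epsilon values can be listed beforehand. This is only a sketch: the function name list_epsilons is hypothetical, the numbers in the example call are placeholders rather than the script's defaults, and it assumes that the sweep includes both end points, which these instructions do not state.

```shell
#!/bin/sh
# Hypothetical helper: print the epsilon values from epsi to epsf in steps of inc.
# Assumption: both end points are included in the sweep.
list_epsilons() {
    awk -v epsi="$1" -v epsf="$2" -v inc="$3" 'BEGIN {
        if (inc <= 0) exit 1
        # Number of steps; the small offset guards against rounding errors.
        n = int((epsf - epsi) / inc + 1e-9)
        for (i = 0; i <= n; i++) print epsi + i * inc
    }'
}
# Example with placeholder values: list_epsilons 10 50 10
```

Piping the output through "wc -l" gives the number of epsilon values, which may help when choosing n_proc.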


The results of the calculation are
A. "Data/analysis_superstructure/CLUSTER.eps_X.data.dat". These are standard full DBSCAN output files, where X is the value of epsilon at which the calculation is performed. The content of each file is described in its header and consists of coordinates and cluster_id.
B. "SuperStructure.data.dat", which contains the following information

#Epsilon
#Number_of_Clusters
#Number_of_Clusters normalised by total_number_of_points

In gnuplot, this can be plotted with the command: p "SuperStructure.data.dat" u 1:3 (column 3 against column 1), which reproduces the curves in the paper.
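
The same plot can also be produced non-interactively from the shell. The sketch below is an illustration only: the function name superstructure_plot_commands and the axis labels are assumptions, and gnuplot must be installed for the usage line to work.

```shell
#!/bin/sh
# Hypothetical helper: emit gnuplot commands that plot column 3
# (number of clusters normalised by the total number of points)
# against column 1 (epsilon) for one SuperStructure output file.
superstructure_plot_commands() {
    printf 'set xlabel "epsilon"\n'
    printf 'set ylabel "clusters / total points"\n'
    printf 'plot "%s" using 1:3 with linespoints title "%s"\n' "$1" "$1"
}
# Usage (requires gnuplot):
#   superstructure_plot_commands SuperStructure.data.dat | gnuplot -persist
```

Writing the commands to standard output keeps the helper easy to inspect, and the output can be redirected to a file to obtain a reusable gnuplot script.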