
Instructions to use SuperStructure

See also the paper: https://rupress.org/jcb/article/220/5/e202010003/211893/Parameter-free-molecular-super-structures

  1. Open a terminal (Linux/macOS).

  2. Clone the git repository.

  3. Compile "mydbscan.c" (gcc mydbscan.c -o mydbscan -lm).

  4. Run the script "SuperStructure_curves_generator.sh" on the file "data.dat" in the folder "Data" as follows:

    • ./SuperStructure_curves_generator.sh data.dat
    • This can be repeated for any number of differently named data files in the folder "Data".
    • The format in which "data.dat" is passed to the calculation is:
      • #id = integer
      • #frame = integer
      • #x = position x
      • #y = position y
      • #z = position z [note: SuperStructure sets this to 0 and projects the data to 2D]
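
As an illustration of the input layout, the Python sketch below parses such a file and applies the 2D projection. It assumes whitespace-separated columns in the order listed above and that lines starting with "#" are header comments; neither detail is stated here, so check it against your own "data.dat". The names Point and read_points are hypothetical and not part of SuperStructure.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    id: int
    frame: int
    x: float
    y: float
    z: float


def read_points(lines) -> List[Point]:
    """Parse 'id frame x y z' rows (assumed whitespace-separated)."""
    points = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#'-prefixed lines carry no data.
        if not line or line.startswith("#"):
            continue
        pid, frame, x, y, _z = line.split()[:5]
        # Mirror the documented behaviour: z is discarded and set to 0.
        points.append(Point(int(pid), int(frame), float(x), float(y), 0.0))
    return points


# Made-up sample rows, for illustration only.
sample = ["#id frame x y z", "1 0 10.5 20.0 3.2", "2 0 11.0 19.5 -1.0"]
for p in read_points(sample):
    print(p)
```

For a real file, pass an open file object instead of the sample list, for example read_points(open("Data/data.dat")).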

    A. To modify the interval of epsilon (the DBSCAN radius) over which the calculation is performed, open SuperStructure_curves_generator.sh and look for "# USER DEFINED EPSILON". Modify epsi (initial value), epsf (final value) and inc (increment) as appropriate.

    B. To modify the number of processors over which the calculation is performed, look for "# USER DEFINED PROCESSORS" and set n_proc to the number of processors available for the parallel calculation.
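
To see which epsilon values a given choice of epsi, epsf and inc covers, and how they might be shared among n_proc workers, the grid can be sketched in Python. Whether the script includes epsf itself and how it distributes the work are not stated here: the sketch assumes an inclusive upper bound and a simple round-robin split. The functions epsilon_grid and split_round_robin are hypothetical helpers, not part of the script.

```python
def epsilon_grid(epsi, epsf, inc):
    """Return epsi, epsi + inc, ... up to and including epsf (assumed inclusive)."""
    if inc <= 0:
        raise ValueError("inc must be positive")
    # Compute the step count once to avoid accumulating floating-point error.
    n_steps = int((epsf - epsi) / inc + 1e-9)
    return [round(epsi + k * inc, 10) for k in range(n_steps + 1)]


def split_round_robin(values, n_proc):
    """Deal the values out to n_proc workers in turn (one possible scheme)."""
    return [values[i::n_proc] for i in range(n_proc)]


# Example values chosen for illustration only.
grid = epsilon_grid(10, 50, 10)
for worker, chunk in enumerate(split_round_robin(grid, 2)):
    print("worker", worker, "handles epsilon values", chunk)
```

Each epsilon value corresponds to one independent DBSCAN run, which is why the calculation parallelises naturally over the grid.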

  5. The results of the calculation are:

    A. "Data/analysis_superstructure/CLUSTER.eps_X.data.dat". These are standard full DBSCAN output files, where X is the value of epsilon at which the calculation is performed. The contents of each file are described in its header and consist of the coordinates and the cluster_id.

    B. "SuperStructure.data.dat", which contains the following information:

    • #Epsilon
    • #Number_of_Clusters
    • #Number_of_Clusters normalised by total_number_of_points

    In gnuplot, the curves in the paper can be reproduced by plotting the third entry against the first:

    p "SuperStructure.data.dat" u 1:3
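
How the three quantities relate can be illustrated with a small, self-contained Python sketch: run DBSCAN for each epsilon and record the number of clusters and that number divided by the total number of points. This is not the code in mydbscan.c. The min_pts value, the exclusion of noise points from the cluster count and the function names are assumptions made for illustration.

```python
import math


def dbscan_labels(points, eps, min_pts=1):
    """Plain O(n^2) DBSCAN on 2D points; one label per point, -1 = noise."""
    n = len(points)
    # Neighbour lists include the point itself, as in the usual definition.
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neigh[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neigh[j]) >= min_pts:
                stack.extend(neigh[j])  # only core points keep expanding
    return labels


def superstructure_curve(points, eps_values, min_pts=1):
    """Rows of (epsilon, number of clusters, clusters / number of points)."""
    rows = []
    for eps in eps_values:
        labels = dbscan_labels(points, eps, min_pts)
        n_clusters = len({c for c in labels if c != -1})
        rows.append((eps, n_clusters, n_clusters / len(points)))
    return rows


# Made-up points: two pairs that merge into one cluster at large epsilon.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
for eps, n_clusters, fraction in superstructure_curve(pts, [0.5, 2.0, 20.0]):
    print(eps, n_clusters, fraction)
```

As epsilon grows, nearby clusters merge and the normalised cluster count decreases, which gives the shape of the curve plotted above.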