diff --git a/docs/SOP_reanalysis_preparation.md b/docs/SOP_reanalysis_preparation.md
index fb09be7c24d86caf4fc55d9e1489e2c00326b313..7a2a0c1416d1633782938a99d9f6a90149a94e4d 100644
--- a/docs/SOP_reanalysis_preparation.md
+++ b/docs/SOP_reanalysis_preparation.md
@@ -130,14 +130,10 @@ cut -f 5 families.txt | sort | uniq -d > families_multiple_entries.txt
 rm family_entries_removed_from_reanalysis.txt 2> /dev/null
 for family in `cat families_multiple_entries.txt`
 do
-  grep $family$ families.txt | sort | sed -e 's/\t/\_/g' > $family.txt
+  grep $family$ families.txt | sort -k4 -n | sed -e 's/\t/\_/g' > $family.txt
   count=`wc -l $family.txt | awk '{ print $1 - 1 }'`
-  for ((i = 1; i <= $count; i = i + 1))
-  do
-    family_entry=`head -n $i $family.txt | tail -n 1`
-    echo $family_entry >> family_entries_removed_from_reanalysis.txt
-  done
-  rm $family.txt
+  head -n $count $family.txt >> family_entries_removed_from_reanalysis.txt
+# rm $family.txt
 done
 
 cd ../families
@@ -147,6 +143,7 @@ ls | grep -v trio | grep -v duo | sed -e 's/\_/\t/g' | cut -f 2-5 | sed -e 's/\t
 
 5. Pull in all the relevant PED files into the params folder
 
+```
 cd ../params
 mkdir all
 count=`wc -l all.txt | awk '{ print $1 }'`
@@ -157,6 +154,7 @@ do
   cp ../../$project/params/*$family*.ped ./all/
 done
 
+```
 
 6. Make folders for the relevant groups and sort the families based on their PED files
 
@@ -186,3 +184,21 @@ do
   done
 done
 ```
+
+7. Set the paternal and maternal ID fields (PED columns 3 and 4) to 0 for the shared-affected families
+
+```
+cd shared_affected
+for file in *.ped
+do
+  awk '{ print $1 "\t" $2 "\t0\t0\t" $5 "\t" $6 }' $file > temp.out
+  mv temp.out $file
+done
+```
+
+8. Copy the DECIPHER upload files from previous analyses. If more than one file is available for an individual, use them in this order of preference:
+
+   1. `<INDI>_<FAM>_DECIPHER_v11.xlsx` (preferred)
+   2. `<INDI>_<FAM>_DECIPHER_v10.xlsx` (if no v11 version is available)
+   3. `<INDI>_<FAM>_DEC_FLT.csv` (least preferred; use only if no .xlsx version exists, as was the case at the start of the NHS Trio WES service)
+
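The preference order in step 8 can be applied mechanically rather than by eye. Below is a minimal sketch, assuming all three candidate files for an individual sit in a single previous-analysis directory and follow the `<INDI>_<FAM>_...` naming shown above. The function name `pick_decipher_file` and its directory argument are hypothetical and not part of the existing SOP scripts.

```shell
# Hypothetical helper (not part of the SOP): print the path of the preferred
# DECIPHER upload file for one individual/family, checking candidates in the
# order v11 .xlsx, then v10 .xlsx, then DEC_FLT .csv.
# Assumes all candidates live in the same directory ($1).
# Returns 1 (and prints nothing) if none of the candidates exists.
pick_decipher_file() {
  local dir="$1" indi="$2" fam="$3"
  local candidate
  for candidate in \
    "${indi}_${fam}_DECIPHER_v11.xlsx" \
    "${indi}_${fam}_DECIPHER_v10.xlsx" \
    "${indi}_${fam}_DEC_FLT.csv"
  do
    if [ -f "$dir/$candidate" ]; then
      printf '%s\n' "$dir/$candidate"
      return 0
    fi
  done
  return 1
}
```

A call such as `pick_decipher_file "$prev_dir" "$indi" "$fam"` prints the chosen path, which can then be passed to `cp`. Here `$prev_dir` is a placeholder for wherever the previous analysis kept its DECIPHER files, which this SOP does not specify. Because the function returns non-zero when no candidate exists, individuals with no usable file can be listed and followed up separately.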