MAF extraction, authored Jun 04, 2020 by ameyner2
Intersection-between-UK-Biobank-and-Illumina-GSA-arrays.md
View page @
25938f2c
@@ -78,3 +78,21 @@ cut -f 6 compare_alleles.txt | sort | uniq -c
| UKB | Both | 279300 / 34.0% | 267308 / 32.4% |
| GSA | Unique | 435910 / 60.9% | 450562 / 62.8% |
| GSA | Both | 279300 / 39.1% | 267308 / 37.2% |
### Extract the rsids and get the MAFs
**NB**
For any GSA-only sites that are imputed in UKB, could retry the mapping to the MAF files by position and/or alleles.
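```
# Take field 5 of compare_alleles.txt (the rsIDs) and split comma-separated
# entries to one rsID per line (\n in the replacement needs GNU sed).
# All UKB sites, i.e. everything not labelled gsa_only:
grep -v gsa_only compare_alleles.txt | cut -f 5 | sed -e 's/,/\n/g' > rsids.ukb_all.txt
# Sites labelled ukb_only:
grep ukb_only compare_alleles.txt | cut -f 5 | sed -e 's/,/\n/g' > rsids.ukb_only.txt
# Sites labelled both:
grep both compare_alleles.txt | cut -f 5 | sed -e 's/,/\n/g' > rsids.both.txt

# Count the rsIDs per list; both + ukb_only = ukb_all (269399 + 576086 = 845485).
wc -l rsids.*
#   269399 rsids.both.txt
#   845485 rsids.ukb_all.txt
#   576086 rsids.ukb_only.txt

# Extract the MAF records for each rsID list from the UKB MAF files.
cat UKB_MAFs/* | perl scripts/extract_UKB_mafs_by_rsid.pl rsids.ukb_all.txt > mafs.ukb_all.txt
cat UKB_MAFs/* | perl scripts/extract_UKB_mafs_by_rsid.pl rsids.ukb_only.txt > mafs.ukb_only.txt
cat UKB_MAFs/* | perl scripts/extract_UKB_mafs_by_rsid.pl rsids.both.txt > mafs.both.txt
```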
\ No newline at end of file
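
`scripts/extract_UKB_mafs_by_rsid.pl` is not shown on this page, so the following is only a rough sketch of the kind of filter it presumably implements: read a list of rsIDs, then keep the MAF records on stdin whose rsID is in that list. The function name `extract_by_rsid`, the tab-delimited layout, and the default rsID column (2) are all assumptions made for illustration, not details taken from the original script or the UKB files.

```shell
#!/bin/sh
# Hypothetical stand-in for scripts/extract_UKB_mafs_by_rsid.pl (NOT the original).
# Usage: cat UKB_MAFs/* | extract_by_rsid rsids.txt [rsid_column]
# Assumptions: MAF records are tab-delimited; the rsID sits in column 2
# unless another column number is given as the second argument.
extract_by_rsid() {
    awk -F '\t' -v ids="$1" -v col="${2:-2}" '
        BEGIN {
            # Load the wanted rsIDs (one per line) into a lookup table.
            while ((getline id < ids) > 0)
                if (id != "") wanted[id] = 1
            close(ids)
        }
        # Print every record from stdin whose rsID column is in the table.
        ($col in wanted)
    '
}
```

Under those assumptions it would be called the same way as the Perl script, e.g. `cat UKB_MAFs/* | extract_by_rsid rsids.both.txt > mafs.both.txt`. Loading the rsID list into a hash first keeps the pass over the much larger MAF files to a single stream.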