Update demo06

134f566c · Long Wang · 46bc9b8d · 134f566c · 134f566c
Commit 134f566c authored 1 year ago by Long Wang
--- a/README.md
+++ b/README.md
@@ -121,7 +121,20 @@ Please refer to the source links for up-to-date version and detailed description
    conda install -c bioconda iqtree 
    ```

+- #### pggb
+- [ ] [Official](https://github.com/pangenome/pggb)

+
+- #### panacus
+- [ ] [Official](https://github.com/marschall-lab/panacus)
+
+- #### Also install pggb and panacus via bioconda
+
+    ```bash
+    conda install -c bioconda -c conda-forge pggb
+    conda install -c bioconda -c conda-forge panacus
+    ```
+    
 </details>

 ## Disclaimer

--- a/workshop3/demo06.md
+++ b/workshop3/demo06.md
+# Example codes for demo06
+
+- ### Activating conda enviroment
+
+    ```bash
+    cd workshop3/
+
+    conda activate ws3
+    ```
+
+- ### Step1: Prepare input fasta file for PGGB
+    pggb runs on each chromosome and require a single fasta,  here we use chromosome chrIX as an example
+    
+    ### 1) Extract chrIX sequences for 18 genomes
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    mkdir chroms
+    
+    for genome_file in `ls ../data/ScRAP_data/nuclear/*.nuclear_genome.tidy.fa.gz`;
+    do
+        filename=`basename ${genome_file}`
+        out_prefix=${filename/.asm01.HP0.nuclear_genome.tidy.fa.gz}
+        
+        echo ${out_prefix}
+        
+        zcat ${genome_file} > ./chroms/${out_prefix}.genome.fa
+        
+        seqkit faidx ./chroms/${out_prefix}.genome.fa chrIX \
+            > ./chroms/${out_prefix}.chrIX.fa && \
+            rm ./chroms/${out_prefix}.genome.fa ./chroms/${out_prefix}.genome.fa.fai
+    done
+    ```
+    </details>
+
+    ### 2) Cat all 18 chrIX sequences into one fasta file, and index the fasta file as pggb need it
+    Before merge all files, we need to change the IDs as they are all identical.
+    We rename the sequence ID following PanSN-spec naming pattern (https://github.com/pangenome/PanSN-spec),the naming scheme for PanSN-spec is [sample_name][delim][haplotype_id][delim][contig_or_scaffold_name], so we will rename each ID as "Strain_ID#1#chrIX" where "1" represents haplotype id, and "#" is used as the delimiter
+    
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    > ./chroms/ScRAP_chrIX.fa
+    
+    for chrom_file in `ls ./chroms/*.chrIX.fa`;
+    do
+        filename=`basename ${chrom_file}`
+        new_id=${filename/.chrIX.fa/#1#chrIX}
+        
+        echo ${new_id}
+        
+        sed "s/>chrIX/>${new_id}/" ${chrom_file} \
+            >> ./chroms/ScRAP_chrIX.fa
+    done
+    
+    # pggb need the fasta file to be indexed by samtools faidx or seqkit faidx
+    seqkit faidx ./chroms/ScRAP_chrIX.fa
+    ```
+    </details>
+
+- ### Step2: It's now ready to run pggb
+
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    mkdir pggb
+    
+    # here we set -n 18, which is equal to the number of haplotypes
+    pggb -i ./chroms/ScRAP_chrIX.fa -o pggb -n 18 -t 4
+    ```
+    </details>
+    
+- ### Step3: Interpret the results from pggb
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    # first let's visualize the wfmash alignments
+    cd ./pggb
+    
+    ../wfmash/scripts/paf2dotplot png large ScRAP_chrIX.fa.92da240.alignments.wfmash.paf
+    
+    cd ..
+    
+    # next we generate some basic information on the graph using odgi stats
+    odgi stats -i ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.og -S \
+        > ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.og.stats
+    ```
+
+
+- ### Step4: Compare pggb results across different nucleotide identity threshold
+    The default minimum average nucleotide identity for segments of pggb is 90, which is suitable for genomes with a divergence <= 10%, let's try different threshold to see how this would affect pangraph
+    
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    # use a lower percent identity of 80 to allow more divergent alignment
+    pggb -i ./chroms/ScRAP_chrIX.fa -o pggb_p80 -n 18 -t 4 -p 80
+    
+    odgi stats -i ./pggb_p80/ScRAP_chrIX.fa.925a0a8.417fcdf.0e1b05c.smooth.final.og -S \
+        > ./pggb_p80/ScRAP_chrIX.fa.925a0a8.417fcdf.0e1b05c.smooth.final.og.stats
+    
+    
+    # use a higher percent identity of 95 to consider more conserved alignment 
+    pggb -i ./chroms/ScRAP_chrIX.fa -o pggb_p95 -n 18 -t 4 -p 95
+    
+    odgi stats -i ./pggb_p95/ScRAP_chrIX.fa.83ec06e.417fcdf.4cea6f5.smooth.final.og -S \
+        > ./pggb_p95/ScRAP_chrIX.fa.83ec06e.417fcdf.4cea6f5.smooth.final.og.stats
+    ```
+    </details>
+
+- ### Step5: Visualize pggb pangenome graph
+    PGGB by default generates ODGI 1D (./pggb*/*.og.viz_*.png) and 2D graphs (./pggb*/*.og.lay.*.png), let's try to view the graphs with vg view
+    
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    # first we convert gfa format to vg format as gfa seems to be not so well supported by vg in some cases
+    vg convert -t 4 -v -g ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.gfa \
+        > ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.vg
+    
+    # then we visualize a small region with vg view
+    vg chunk -x ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.vg \
+        -p UWOPS052272#1#chrIX:450-500 -c 1 | vg view -pd - | dot -Tpng \
+        > ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.UWOPS052272_chrIX_450-500.png
+    ```
+    </details>
+
+- ### Step6: Calculate pangenome openness with panacus (https://github.com/marschall-lab/panacus)
+    panacus is a tool for calculating statistics for GFA files, it supports the following calculations:
+
+    - coverage histogram
+    - pangenome growth statistics
+    - path-/group-resolved coverage table
+    
+    here we mainly use it to calculate the pangenome growth statistics
+    
+    <details><summary>(Click to show possible answers)</summary>
+
+    ```bash
+    # 1) run panacus histgrowth to calculate coverage and pangenome growth for nodes (default) with coverage/quorum thresholds 1/0, 2/0, 1/1, 1/0.5, and 1/0.1
+    RUST_LOG=info panacus histgrowth -t4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -S -a \
+        ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.gfa \
+        > ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.histgrowth.node.tsv
+    
+    2) visualize the growth statistics
+    panacus-visualize \
+        -e ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.histgrowth.node.tsv \
+        > ./pggb/ScRAP_chrIX.fa.92da240.417fcdf.b17acf2.smooth.final.histgrowth.node.pdf
+    ```
+    </details>