Hi,
I have tested a set of genes for positive selection, and I would like to run an enrichment analysis on the genes under selection, using all the genes tested for selection as the background.
I first tried GOfuncR and got one enriched biological process. When I then tried enricher in clusterProfiler, I was surprised to get no enriched biological process or molecular function at all. Because I did my own direct functional annotation, I used buildGOmap, as the manual suggests. My gene universe is smaller than the total number of genes, so I filtered the annotation to retain only the genes tested for selection and only the GO terms present among them.
Since enricher gave me no significant pathway/process, I read that I should perhaps run a gene set enrichment analysis with GSEA instead.
GSEA needs a ranked list of all the genes, and my problem is what to use as the ranking metric. I did the selection analysis with HyPhy aBSREL, which tests for selection on each branch and reports a p-value per branch, so I thought the corrected p-values might be sufficient to rank the genes. It also outputs the LRT statistic for each branch.
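To make the ranking question concrete, here is a minimal base R sketch of the kind of input I understand GSEA in clusterProfiler expects: a named numeric vector sorted in decreasing order. The data frame, its column names, and its values are made up for illustration, and collapsing branches by taking the largest LRT per gene is only one possible choice, not something I know to be correct.

```r
# Hypothetical per-branch results: one row per gene/branch pair.
# Gene names, column names, and values are invented for illustration.
absrel <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  lrt  = c(12.4, 3.1, 0.8, 25.0, 6.2),
  stringsAsFactors = FALSE
)

# Collapse to one score per gene. Assumption: use the largest LRT
# observed on any branch of that gene.
per_gene <- vapply(split(absrel$lrt, absrel$gene), max, numeric(1))

# GSEA-style input: named numeric vector, sorted in decreasing order.
gene_list <- sort(per_gene, decreasing = TRUE)
print(gene_list)
```

A vector like `gene_list` would then go into the `geneList` argument of `GSEA()`, together with the same `TERM2GENE` table used for `enricher`. A score such as `-log10(p)` would be built the same way, but it produces many ties when most p-values are close to 1.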
Or perhaps I should tune the parameters instead? For instance, the gene universe has 13445 genes and the gene set has 92, of which only 86 have annotations.
As a test, I ran the analysis with the p-value and q-value cutoffs set to 1, and I saw that some terms have a low gene ratio (2-3/92):
> head(top_hits_Hydlep_cluster_Sel_TERM, 10)
ID Description GeneRatio BgRatio RichFactor FoldEnrichment zScore
GO:0035608 GO:0035608 protein deglutamylation 3/92 7/13445 0.42857143 62.631988 13.538066
GO:0035609 GO:0035609 C-terminal protein deglutamylation 3/92 7/13445 0.42857143 62.631988 13.538066
GO:0071499 GO:0071499 cellular response to laminar fluid shear stress 3/92 17/13445 0.17647059 25.789642 8.489025
GO:0018410 GO:0018410 C-terminal protein amino acid modification 3/92 19/13445 0.15789474 23.074943 7.992301
GO:0120222 GO:0120222 regulation of blastocyst development 2/92 5/13445 0.40000000 58.456522 10.665800
GO:0018200 GO:0018200 peptidyl-glutamic acid modification 3/92 24/13445 0.12500000 18.267663 7.027737
GO:1904375 GO:1904375 regulation of protein localization to cell periphery 10/92 439/13445 0.02277904 3.328959 4.118044
GO:0002536 GO:0002536 respiratory burst involved in inflammatory response 3/92 31/13445 0.09677419 14.142707 6.080724
GO:0004181 GO:0004181 metallocarboxypeptidase activity 3/92 31/13445 0.09677419 14.142707 6.080724
GO:0006276 GO:0006276 plasmid maintenance 2/92 8/13445 0.25000000 36.535326 8.344934
pvalue p.adjust qvalue
GO:0035608 1.063922e-05 0.09103983 0.09103983
GO:0035609 1.063922e-05 0.09103983 0.09103983
GO:0071499 1.966897e-04 1.00000000 1.00000000
GO:0018410 2.775172e-04 1.00000000 1.00000000
GO:0120222 4.569984e-04 1.00000000 1.00000000
GO:0018200 5.654846e-04 1.00000000 1.00000000
GO:1904375 8.199401e-04 1.00000000 1.00000000
GO:0002536 1.213131e-03 1.00000000 1.00000000
GO:0004181 1.213131e-03 1.00000000 1.00000000
GO:0006276 1.262573e-03 1.00000000 1.00000000
geneID Count
GO:0035608 GNX-029059/GNX-037418/GNX-034376 3
GO:0035609 GNX-029059/GNX-037418/GNX-034376 3
GO:0071499 GNX-037622/GNX-025087/GNX-037126 3
GO:0018410 GNX-029059/GNX-037418/GNX-034376 3
GO:0120222 GNX-029059/GNX-037418 2
GO:0018200 GNX-029059/GNX-037418/GNX-034376 3
GO:1904375 GNX-030347/GNX-022077/GNX-025087/GNX-025668/GNX-024960/GNX-035899/GNX-022108/GNX-024787/GNX-011804/GNX-016902 10
GO:0002536 GNX-019781/GNX-022624/GNX-023678 3
GO:0004181 GNX-029059/GNX-037418/GNX-034376 3
GO:0006276 GNX-022108/GNX-023602 2
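As a sanity check on these numbers, the top row can be reproduced by hand. The sketch below assumes the test is a one-sided hypergeometric test, which is how I understand over-representation analysis to work, and plugs in the counts from the GO:0035608 row.

```r
# Counts taken from the GO:0035608 row above.
N <- 13445  # genes in the background (universe)
n <- 92     # genes in the selected set
K <- 7      # background genes annotated to the term (BgRatio numerator)
k <- 3      # selected genes annotated to the term (GeneRatio numerator)

# One-sided hypergeometric test: probability of seeing k or more
# annotated genes when drawing n genes from the background.
p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Fold enrichment: GeneRatio divided by BgRatio.
fold <- (k / n) / (K / N)

print(c(pvalue = p, FoldEnrichment = fold))
```

If this agrees with the pvalue and FoldEnrichment columns, then the raw test behaves as expected and the loss of significance comes from the multiple-testing correction over all the terms tested.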
Thanks.