Article

Size distribution of function-based human gene sets and the split-merge model

Journal

ROYAL SOCIETY OPEN SCIENCE
Volume 3, Issue 8

Publisher

ROYAL SOC
DOI: 10.1098/rsos.160275

Keywords

gene family sizes; gene set sizes; power-law; beta rank function

Funding

  1. Robert S. Boas Center for Genomics and Human Genetics
  2. PAPIIT/UNAM [IN107414]
  3. PASPA/UNAM
  4. CONACYT Mexico


The sizes of paralogue families (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families in which genes are grouped by similar function or share a common property. The size distribution of HUGO Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and is fitted much better by a beta rank function. We propose a simple mechanism that breaks a power-law size distribution through a combination of splitting and merging operations. The largest gene sets are split into two to account for subfunctional categories, and a small proportion of the other gene sets are merged into larger sets as new common themes are recognized. These operations are not uncommon for a curator of gene sets. A simulation shows that iterating these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function with the distribution of transcription factors and drug target genes among HGNC gene families.
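The split-merge mechanism described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation: the starting sizes are a synthetic Zipf-like sequence rather than Ensembl paralogue data, one split and at most one merge happen per iteration, and the names `power_law_sizes`, `split_merge_step`, `simulate`, and the parameter `merge_prob` are hypothetical. The beta rank function is written in the commonly used form f(r) = C (N + 1 - r)^b / r^a, which is assumed here because the abstract does not state the formula.

```python
import random


def power_law_sizes(n_sets, largest=500, alpha=1.0):
    """Synthetic starting point (assumption): size at rank r is about largest / r**alpha."""
    return [max(1, round(largest / rank ** alpha)) for rank in range(1, n_sets + 1)]


def split_merge_step(sizes, rng, merge_prob=0.5):
    """Apply one split and, with probability merge_prob, one merge."""
    sizes = sorted(sizes, reverse=True)

    # Split: cut the largest set into two non-empty parts at a random point.
    biggest = sizes[0]
    if biggest >= 2:
        cut = rng.randint(1, biggest - 1)
        sizes[0:1] = [cut, biggest - cut]

    # Merge: occasionally fuse two randomly chosen sets into one larger set.
    if len(sizes) >= 2 and rng.random() < merge_prob:
        i, j = rng.sample(range(len(sizes)), 2)
        merged = sizes[i] + sizes[j]
        sizes = [s for k, s in enumerate(sizes) if k not in (i, j)]
        sizes.append(merged)

    return sorted(sizes, reverse=True)


def simulate(sizes, steps, seed=0, merge_prob=0.5):
    """Iterate the split-merge step; returns sizes sorted from largest to smallest."""
    rng = random.Random(seed)
    for _ in range(steps):
        sizes = split_merge_step(sizes, rng, merge_prob)
    return sizes


def beta_rank(rank, n, a, b, c):
    """Beta rank function (assumed form): c * (n + 1 - rank)**b / rank**a."""
    return c * (n + 1 - rank) ** b / rank ** a


if __name__ == "__main__":
    start = power_law_sizes(200)
    end = simulate(start, steps=100)
    print("largest sizes before:", start[:5])
    print("largest sizes after: ", end[:5])
```

With b = 0 the beta rank function reduces to a pure power law in rank, so the exponent b measures how strongly the tail of the rank-size curve bends away from the power law. The sketch conserves the total number of genes, a simplification, since in curated gene sets a gene may belong to more than one set.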

