4.0 Article

Approximation algorithm for rearrangement distances considering repeated genes and intergenic regions

Journal

ALGORITHMS FOR MOLECULAR BIOLOGY
Volume 16, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s13015-021-00200-w

Keywords

Genome rearrangement; Intergenic regions; Reversal

Funding

  1. National Council of Technological and Scientific Development, CNPq [425340/2016-3]
  2. Coordenacao de Aperfeiaoamento de Pessoal de Nivel Superior - Brasil (CAPES) [001]
  3. Sao Paulo Research Foundation, FAPESP [2013/08293-7, 2015/11937-9, 2017/12646-3, 2019/27331-3]

Ask authors/readers for more resources

The rearrangement distance is a method for comparing genomes of different species by determining the number of rearrangement events needed to transform one genome into another. This study explores the effects of transposition and reversal events on genome representation, considering gene repetition and intergenic regions. Practical experiments on simulated genomes show that using partitions can improve distance estimates.
The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverts a block of the genome. When dealing with such problems, seminal works represented genomes as sequences of genes without repetition. More realistic models started to consider gene repetition or the presence of intergenic regions, sequences of nucleotides between genes and in the extremities of the genome. This work explores the transposition and reversal events applied in a genome representation considering both gene repetition and intergenic regions. We define two problems called Minimum Common Intergenic String Partition and Reverse Minimum Common Intergenic String Partition. Using a relation with these two problems, we show a Theta(k)-approximation for the Intergenic Transposition Distance, the Intergenic Reversal Distance, and the Intergenic Reversal and Transposition Distance problems, where k is the maximum number of copies of a gene in the genomes. Our practical experiments on simulated genomes show that the use of partitions improves the estimates for the distances.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.0
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

Article Computer Science, Interdisciplinary Applications

Approximation algorithms for sorting by k-cuts on signed permutations

Andre Rodrigues Oliveira, Alexsandro Oliveira Alexandrino, Geraldine Jean, Guillaume Fertin, Ulisses Dias, Zanoni Dias

Summary: Genome Rearrangements is a classic problem in Computational Biology, with different models and rearrangement methods. A new problem called Sorting by Multi-Cut Rearrangements uses k-cut rearrangement to generate new permutations or strings. This paper extends previous work on unsigned permutations and strings to signed permutations, showing complexity and approximation algorithms for different values of k.

JOURNAL OF COMBINATORIAL OPTIMIZATION (2023)

Article Biochemical Research Methods

Reversal and Indel Distance With Intergenic Region Information

Alexsandro Oliveira Alexandrino, Klairton Lima Brito, Andre Rodrigues Oliveira, Ulisses Dias, Zanoni Dias

Summary: Recent research has found that incorporating intergenic region information in models along with gene order can provide more accurate estimations for genome rearrangement distance compared to using gene order alone. The reversal distance, a key problem in genome rearrangements, has polynomial time algorithms when only gene order is considered, but becomes NP-hard when intergenic regions are included. In this study, we present a 2.5-approximation algorithm using a labeled intergenic breakpoint graph, and the algorithm's practical approximation factor is found to be significantly less than 2.5 through experimental analysis with simulated data. Additionally, the algorithm is applied to real genomes to construct a phylogenetic tree.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Biochemical Research Methods

Genome Rearrangement Distance With a Flexible Intergenic Regions Aspect

Klairton Lima Brito, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Ulisses Dias, Zanoni Dias

Summary: Most existing mathematical models for genome rearrangement problems consider only gene order, but recent research has started to incorporate more information such as intergenic region sizes. This study investigates a new variation where the target intergenic region sizes have a range of acceptable values, allowing flexibility in transforming gene order while considering variations in intergenic regions. The authors present approximation algorithms and an NP-hardness proof for the problem, relying on the Flexible Weighted Cycle Graph.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Biochemical Research Methods

Rearrangement distance with reversals, indels, and moves in intergenic regions on signed and unsigned permutations

Klairton Lima Brito, Andre Rodrigues Oliveira, Alexsandro Oliveira Alexandrino, Ulisses Dias, Zanoni Dias

Summary: This article investigates the solution methods for the gene sorting problem in genome rearrangement field, including cases with known and unknown gene orientations, and considering the non-coding regions between genes. The research results show that the problem is NP-hard regardless of whether gene orientations are known. For the case with known gene orientations, we propose a 2-approximation algorithm, and for the case with unknown gene orientations, we propose a 4-approximation algorithm.

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (2023)

Article Computer Science, Interdisciplinary Applications

Signed rearrangement distances considering repeated genes, intergenic regions, and indels

Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Zanoni Dias

Summary: Genome rearrangement distance problems estimate the evolutionary distance between genomes. This study introduces a new model considering intergenic regions and multiple copies of genes. It proposes a series of problems and approximation algorithms, and demonstrates their effectiveness through experimental tests.

JOURNAL OF COMBINATORIAL OPTIMIZATION (2023)

Article Biochemical Research Methods

Reversal and Transposition Distance on Unbalanced Genomes Using Intergenic Information

Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Geraldine Jean, Guillaume Fertin, Ulisses Dias, Zanoni Dias

Summary: The most common method to calculate the rearrangement distance between genomes is based on the size of a minimum length sequence of rearrangements. Advances in genome rearrangements have led to the consideration of unbalanced genomes and additional genomic characteristics. This study focuses on the Reversal, Transposition, and Indel Distance using intergenic information, including indels in the rearrangement model. An improved 4-approximation algorithm is presented for transpositions and indels on unbalanced genomes, which can also handle gene orientation. Experimental evaluation is conducted using simulated data.

JOURNAL OF COMPUTATIONAL BIOLOGY (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Prediction of Protein Molecular Functions Using Transformers

Felipe Lopes de Mello, Gabriel Bianchin de Oliveira, Helio Pedrini, Zanoni Dias

Summary: By the end of 2021, over 200 million proteins with unknown molecular functions were evaluated using Transformer architectures. Machine learning was applied to predict the functions, and the proposed model achieved superior results compared to existing literature.

ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2022, PT II (2023)

No Data Available