Jump to content

Subsequence

From Wikipedia, the free encyclopedia

Inmathematics,asubsequenceof a givensequenceis a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequenceis a subsequence ofobtained after removal of elementsandThe relation of one sequence being the subsequence of another is apreorder.

Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such asfromis asubstring.The substring is a refinement of the subsequence.

The list of all subsequences for the word "apple"would be"a","ap","al","ae","app","apl","ape","ale","appl","appe","aple","apple","p","pp","pl","pe","ppl","ppe","ple","pple","l","le","e",""(empty string).

Common subsequence

[edit]

Given two sequencesanda sequenceis said to be acommon subsequenceofandifis a subsequence of bothandFor example, if thenis said to be a common subsequence ofand

This wouldnotbe thelongest common subsequence,sinceonly has length 3, and the common subsequencehas length 4. The longest common subsequence ofandis

Applications

[edit]

Subsequences have applications tocomputer science,[1]especially in the discipline ofbioinformatics,where computers are used to compare, analyze, and storeDNA,RNA,andproteinsequences.

Take two sequences of DNA containing 37 elements, say:

SEQ1= ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
SEQ2= CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:

LCS(SEQ1,SEQ2)=CGTTCGGCTATGCTTCTACTTATTCTA

This can be illustrated by highlighting the 27 elements of the longest common subsequence into the initial sequences:

SEQ1= ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
SEQ2=CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

Another way to show this is toalignthe two sequences, that is, to position elements of the longest common subsequence in a same column (indicated by the vertical bar) and to introduce a special character (here, a dash) for padding of arisen empty subsequences:

SEQ1= ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
| || ||| ||||| | | | | || | || | || | |||
SEQ2= -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases:adenine,guanine,cytosineandthymine.

Theorems

[edit]
  • Every infinite sequence ofreal numbershas an infinitemonotonesubsequence (This is a lemma used in theproof of the Bolzano–Weierstrass theorem).
  • Every infinitebounded sequenceinhas aconvergentsubsequence (This is theBolzano–Weierstrass theorem).
  • For allintegersandevery finite sequence of length at leastcontains a monotonically increasing subsequence of lengthora monotonically decreasing subsequence of length(This is theErdős–Szekeres theorem).
  • A metric spaceis compact if every sequence inhas a convergent subsequence whose limit is in.

See also

[edit]

Notes

[edit]
  1. ^In computer science,stringis often used as a synonym forsequence,but it is important to note thatsubstringandsubsequenceare not synonyms. Substrings areconsecutiveparts of a string, while subsequences need not be. This means that a substring of a string is always a subsequence of the string, but a subsequence of a string is not always a substring of the string, see:Gusfield, Dan (1999) [1997].Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology.USA: Cambridge University Press. p. 4.ISBN0-521-58519-8.

This article incorporates material from subsequence onPlanetMath,which is licensed under theCreative Commons Attribution/Share-Alike License.