开发者

specific pattern search from given fasta file of genome sequence

开发者 https://www.devze.com 2023-04-11 01:48 出处:网络
i\'m trying to find pattern search with specific constraints : input pattern to be searched will be entered from std. input

i'm trying to find pattern search with specific constraints :

input pattern to be searched will be entered from std. input

R,W and Y will be anywhere in the pattern which should be replaced with below values before searching and so every combination possible for the pattern are to be searched in a sequence.

R = C or G

W = A or T

Y = A or G

[e.g, pattern may be AWTYCR and sequence may be ATCGATGAT....]

we have to find starting positions

Also one misma开发者_如何学编程tch is allowed.

i.e., one character of the pattern may or maynot match while matching with the sequence.

output:

should be written into a file in a table format:

for example,

finding ATCR (R = C or G as above mentioned) in sequence ATCGAT will result in

S.no position the_matched_pattern

1 1 ATCG

2 1 ATCC

the match and mismatch may come in any position (i.e, in same position or in different position)


The following simple code is a starting point. It can be easily generalized to read the patterns from the file, read the sequence from the Fasta file; unique the results, etc.

sub trans_pat {
   my $pat=shift;
   $pat=~s/R/\[CG\]/g;
   $pat=~s/W/\[AT\]/g;
   $pat=~s/Y/\[AG\]/g;
   return $pat;
}
sub find_pat {
   my ($pat,$seq) = (@_);
   print "Looking for pattern $pat\n";
   while ($seq=~m/$pat/g) {
      print "... match at $-[0]: $&\n";
   };
}

my $read_pat="ATCR";
my $seq="ATCGATATCGAT";

# Looking for a perfect match
find_pat (trans_pat($read_pat),$seq);

# Allowing for a single mismatch
foreach $i (1..length $read_pat) {
   my $mis_pat = $read_pat;
   substr($mis_pat,$i-1,1) = ".";
   find_pat (trans_pat($mis_pat),$seq);
}

A sample implementation:

perl -e 'sub trans {$pat=shift;$pat=~s/R/\[CG\]/g;return $pat};$read_pat="ATCR";$seq="ATCGATATCGAT";$pat=trans($read_pat);print "Looking for pattern $pat\n";while ($seq=~m/$pat/g) {print "... match at $-[0]: $&\n"};foreach $i (1..length $read_pat) {$mis_pat = $read_pat;substr($mis_pat,$i-1,1)=".";$pat=trans($mis_pat);print "Looking for pattern $pat\n"; while ($seq=~m/$pat/g) {print "... match at $-[0]: $&\n"}}'

Yields

Looking for pattern ATC[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern .TC[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern A.C[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern AT.[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern ATC.
... match at 0: ATCG
... match at 6: ATCG
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号