Problem
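A matrix is a rectangular table of values divided into rows and columns. An m × n matrix has m rows and n columns.
Say that we have a collection of DNA strings, all having the same length n. Their profile matrix is a 4 × n matrix whose rows count, for each position, how many of the strings have 'A', 'C', 'G', and 'T' there (rows 1 to 4, respectively); see the table below.
A consensus string is a string of length n formed by taking the most common symbol at each position of the collection; its jth symbol is therefore the symbol with the maximum value in the jth column of the profile matrix. A position may have more than one most common symbol, so several consensus strings can exist.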
DNA Strings     A T C C A G C T
                G G G C A A C T
                A T G G A T C T
                A A G C A A C C
                T T G G A A C T
                A T G C C A T T
                A T G G C A C T

Profile      A  5 1 0 0 5 5 0 0
             C  0 0 1 4 2 0 6 1
             G  1 1 6 3 0 1 0 0
             T  1 5 0 0 0 1 1 6

Consensus       A T G C A A C T
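The table above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `profile_and_consensus` is made up for illustration, the strings are assumed to contain only A, C, G, and T, and ties are broken in favor of the first symbol in A, C, G, T order.

```python
def profile_and_consensus(strings):
    """Return (profile, consensus) for a list of equal-length DNA strings.

    profile maps each of 'A', 'C', 'G', 'T' to a list of per-position counts.
    (Hypothetical helper, written for illustration only.)
    """
    symbols = "ACGT"
    profile = {s: [] for s in symbols}
    consensus = []
    # zip(*strings) yields one tuple per column (position).
    for column in zip(*strings):
        counts = [column.count(s) for s in symbols]
        for s, n in zip(symbols, counts):
            profile[s].append(n)
        # index(max(...)) returns the first maximum, so a tie goes to the
        # earliest symbol in "ACGT" order.
        consensus.append(symbols[counts.index(max(counts))])
    return profile, "".join(consensus)


# The seven sample strings from the table.
sample = [
    "ATCCAGCT",
    "GGGCAACT",
    "ATGGATCT",
    "AAGCAACC",
    "TTGGAACT",
    "ATGCCATT",
    "ATGGCACT",
]
profile, consensus = profile_and_consensus(sample)
print(consensus)
for s in "ACGT":
    print(s + ":", " ".join(map(str, profile[s])))
```

Iterating over `zip(*strings)` walks the collection column by column, which is exactly how the profile matrix is defined, so no explicit index arithmetic is needed.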
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
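The problem accepts any one consensus string, but it can help to see where the alternatives come from. The sketch below lists every valid consensus string by collecting all tied symbols at each position; `all_consensus` is a hypothetical helper, not part of the problem statement, and the number of results grows multiplicatively with the number of tied positions, so it is meant for small inputs only.

```python
from itertools import product


def all_consensus(strings):
    """Return every consensus string for equal-length DNA strings.

    (Hypothetical helper for illustration; assumes only A, C, G, T occur.)
    """
    symbols = "ACGT"
    choices = []
    for column in zip(*strings):
        counts = {s: column.count(s) for s in symbols}
        best = max(counts.values())
        # Keep every symbol that reaches the maximum count in this column.
        choices.append([s for s in symbols if counts[s] == best])
    # One consensus string per combination of tied symbols.
    return ["".join(p) for p in product(*choices)]


# Position 2 is a tie between C and G, so two consensus strings exist.
result = all_consensus(["AC", "AG"])
print(result)
```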
# Read the FASTA file. A record's sequence may be wrapped over several lines,
# so lines are joined until the next ">" header.
with open("D:/gs/rosalind/rosalind_cons1.txt", "r") as f:
    lines = f.read().splitlines()

seq = []
for line in lines:
    if line.startswith(">"):
        seq.append("")            # header line: start a new record
    elif seq:
        seq[-1] += line.strip()   # sequence line: extend the current record

# Profile rows, one count per position.
A = []
C = []
G = []
T = []
Max = []
for i in range(len(seq[0])):
    # Collect the i-th symbol of every string and count each nucleotide.
    column = [s[i] for s in seq]
    a = column.count("A")
    c = column.count("C")
    g = column.count("G")
    t = column.count("T")
    A.append(a)
    C.append(c)
    G.append(g)
    T.append(t)
    # Most common symbol at this position; ties go to the first of A, C, G, T.
    temp = [a, c, g, t]
    Max.append("ACGT"[temp.index(max(temp))])

# Consensus string first, then the profile matrix.
print("".join(Max))
print("A: " + " ".join(str(n) for n in A))
print("C: " + " ".join(str(n) for n in C))
print("G: " + " ".join(str(n) for n in G))
print("T: " + " ".join(str(n) for n in T))
Finding a Most Likely Common Ancestor
In “Counting Point Mutations”, we calculated the minimum number of symbol mismatches between two strings of equal length to model the problem of finding the minimum number of point mutations occurring on the evolutionary path between two homologous strands of DNA. If we instead have several homologous strands that we wish to analyze simultaneously, then the natural problem is to find an average-case strand to represent the most likely common ancestor of the given strands.