Problem
The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
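The definition above can be checked with a few lines of Python. This is a minimal sketch: the helper names `gc_content` and `reverse_complement` are chosen here for illustration and are not part of the Rosalind problem.

```python
def gc_content(dna):
    """Return the percentage of symbols in dna that are 'C' or 'G'."""
    return (dna.count("C") + dna.count("G")) / len(dna) * 100


def reverse_complement(dna):
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing swaps every 'C' with a 'G' and vice versa, so the combined count of the two symbols does not change, and reversing the string does not change any counts either.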
DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the next line to begin with '>' indicates the label of the next string.
In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
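The FASTA layout described above could be read roughly as follows. This is only a sketch: the function name `parse_fasta` and the two sample records (`Rosalind_0001`, `Rosalind_0002`) are made up for illustration, and the whole label line after '>' is used as the ID.

```python
def parse_fasta(text):
    """Map each label (the text after '>') to its concatenated sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A '>' line starts a new record.
            current = line[1:]
            records[current] = []
        elif current is not None:
            # Sequence lines belong to the most recent label.
            records[current].append(line)
    return {label: "".join(parts) for label, parts in records.items()}


# Hypothetical two-record input, not taken from the real dataset.
sample = """>Rosalind_0001
ACGT
ACG
>Rosalind_0002
GGCC
"""
print(parse_fasta(sample))
```

Collecting the pieces in a list and joining once per record avoids having to treat the last record of the file as a special case.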
Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default absolute error of 0.001 in all decimal answers unless otherwise stated.
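As a rough illustration of the error allowance, assuming it means the absolute difference between the submitted and the expected value may be at most 0.001 (the helper `within_tolerance` is hypothetical, not something Rosalind provides):

```python
def within_tolerance(answer, expected, tol=0.001):
    """Return True if answer differs from expected by at most tol."""
    return abs(answer - expected) <= tol


print(within_tolerance(37.5004, 37.5))  # True: off by about 0.0004
print(within_tolerance(37.502, 37.5))   # False: off by about 0.002
```

Under that reading, printing the percentage rounded to six decimal places is well inside the allowed error.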
Code
# Read the FASTA file; the path points to the author's local copy of the dataset.
with open("D:/gs/rosalind/rosalind_gc.txt", "r") as f:
    lines = f.readlines()

name = []   # record IDs, in file order
score = []  # GC-content (%) of each record
seq = []    # full sequence of each record
temp = []   # sequence lines of the record currently being read

for line in lines:
    line = line.strip()  # drop the trailing newline
    if line.startswith(">"):
        # A header starts a new record, so store the previous sequence first.
        name.append(line[1:])
        if temp:
            seq.append("".join(temp))
            temp = []
    elif line:
        temp.append(line)

# Store the last record, which has no header after it.
if temp:
    seq.append("".join(temp))

for s in seq:
    score.append((s.count("C") + s.count("G")) / len(s) * 100)

maxvalue = score.index(max(score))  # index of the highest GC-content
print(name[maxvalue])
print(round(score[maxvalue], 6))
DataSet
>Rosalind_1856
TTGAGTTCGGCGCTAACACGTGCAGAGCGCTCCGAAGGCCAAGGGTTAGGCACAGATTAG
CCGTGAAGTGCTGTATGGACGCAACGGTTGTCGATTTCACAGATTCGAATCTGACTGTAC
TACGTTAGATCAGCCCCAGAACTAATTCGAGTTTTATGCTCTAAAATCTTGCGGCTAGTG
ACTAAGTGCGATGACGGGCCTAAGGACGAGCCATCGCGATCATAAGTTACTGCTCGTCAC
TTTATGCGGGACAAGAGCCCCCTACAGTTACCGGGAAATCACCTATCAGCCTCTCTAATG
GGAAGCGCCAATGGAACGGACCAAGCGCCATTGATAACGATTGGCGCTGATTACTATATA
CCGACTCAGCCTCCCAATCTTTCCTCCGTGCAAGGTAGGCAAGGCGTTTATGAAAAGGGC
TGGCGCATGCTCAGTTAATGTGTGTGTTTATGGCTCAACAGAAACATCACCCGGACCAGA
ACTACTGCAAGGACAGATATATTACACTGCGAATCCAAGGCCTTGTGGCGACCTTTTCCT
ACCCTCAAGACCAGCATAGATTCGGCTGGGTTGGAGCTCGCTTTAGAATGGTTACAGTAG
TGTTACTACCAAAGGATTCCGAGATGTGTACCTCGAAGAGAGATTCTTTCTGATGCAAGT
GTTGTCGTATAGAAGGTCGAAATCTTATGATTTAGGCCACGTGTGTTAGGGCTCCACACC
CCCGATTGTCTACATACGGACTAGGTGGATATGCCGACTTAGTGTGTAATGGGGTCCGCG
TCCCTAACCTTTTTTTCCACCATTTTTCCTTTCGGTCTGTACTTCCTCCAATCAGCGTTG
TTGCAATACTATACTGCAAGATAAAGTAACGCGACCCTGGACTGCAATCACGCCTCTGGC
CAA
>Rosalind_4787
ACACCAAGAGGGAGCGTATTGTGACGTCGTCCCGGGATATCTGTTGGTAGCGCATTATCG
ATTACTCTCATCGACTAGGCAGTAAGGTCCACCGACGACTATGGAATTCGAGCTCTGCTA
TCAGGTAGGCGATACTCGCTTGGTTTAATCGAAGATTTGGCGACCTAGACAAAGGTAAAC
GCTCAGGGGGGCCTACTCGGCCTTATTCCTGACTACGTCTCAGGTGATTTCGAACGCACG
AAAGGGGGAAGTTCTGTATTTCGGTTGATTTTGCGTACACTGTGGAATTAACATACGAGT
TCCCCCTTTTCATCACTCTATCTTTACTTGCATAGGCCCGTTCATCTCACACCTCACAGT
GAACTGTGTTGGATTGGCGTCTCGTCGCTAACCTTGGATATCAGGGGGGCATGGATTCGG
AAGAAGGCGAGCGTCGCGAGGATGTTGTTCAGAGGCTACAAGTATATCGCACTCGCGTCA
CTCGCGGCTGATGAGAGTAATACCACAAAACTTATCCGTCCTAAAGGGACGAGAGGCGGC
ACTCTCTACAATCGAATACACGAGATGTTATTTCGAAAATTATTATTTTCTTATGTCGCA
TATAAGTCCCACCAGCAGACGCGCGTGAAGATAAGCGCCGTAAAGCCTCCCTGGTGATCC
GCTCGCTGCTGTAATAAACAGTTCAGCCCCATCTCACGTCTCGACGCTTTACCTGGGTGA
GGTCGCGAGTTTCCTAGTAGTGGCCCGGCTTCGATCATGTATGTTAATAACGATGTAGTG
CAATGGGTACTGTAACTAGGACTGGGCAGGCTTGGATCAGCCTGTATACGGATGTTTTCC
TCGGCACGCCCAGAAGATACGTATTAAGTTTCCACGGTCACAGTGCCCCCTTGAGACGTG
TCTTTGTTTCTTTGTGGTATGGAGGTGACAAGTGTCCACCGCAAGATATGTGGAT
>Rosalind_0276
CTTAAGTATCAGACAGAACCGTTTGTGGGGTATCCAGGTTATGGATCCGACACCTTAGCT
AGACTGACTCATAACGAGCGTTATCTTCTTTCGGCTTCGACTCAACTTCGAGGGAATTGA
TCCAACTAAGTTACGGGGCTAGCGCGCACTGCATTATCGTTAGTCCCAAGCGCATATATG
TTACACTGTTCTAGTAATCGGTCAACACTAGAGATGTCAATCATGCCCTTCCGTGAACGC
GTCGAAATTAAGCGTACCTTAATTCCTATCGCATTACGCTATGCAGAGCCCGTTTAGCAG
TACGCGAAGTCCCGTACCCAGTAGAGGATTTGTGTGATGGAGCCACAGCTGGGTACCAGG
GGATTAGAGTATGGATGCTTATCTATATGAGCACTGTCAAAATTGTCGCCGCCGGCCTGG
GTCGCTACCGTTATGGCAGTCGATGTAATTGTATGCGCAAGCTCACGCCTCGCAGAGTAA
TTAAGCGGTTGGGATTCGGCTGACCTCTTTGGTAGTCATCGAGATGTGTCCCCTCAATGG
TTAATTGATGCACGGACTTTTACATCTGTCCCGTCGACTCTTTGACAGTTTCACTTGTGC
TACAAGTTGCTTCTAAATATGCGCGAATGTGCTAATACATGTAGTAGATACGCCTCGTAG
TATTGCTTGAAACATGAACTAGTGGCCCCCAACTACCATAGGCGCAACTCATTGTGCACA
CCGGTAGCAGTATAACAACATCGTGCAGCGAGTTCCGTAGCCGTCCTGTCTCTTAGAGCG
GAAAAACATAGCTCGCTATTTTTATGTTGTGTCGGTTATACCGCCGGTTGCGTAGAGATC
TGTTGCCAGTGAAATAAGCCAAGTATTTGGTAGTTCTATTGGGACGGTGGGCTTCGGACT
CGTAGT
>Rosalind_0058
GCTGCATGACTTGTTACCTTACAAGCCAGCGGTTTCATGCATGTTACAGCGGCGGAGTAT
ACTTTGTAGTTGGAGTCGTACTGGAGAAAGCTAATATACTAGTATGCCATACCGGCCGTC
TTACTCCGTTGTCGGAGAGGCCACCTGTATCAGGGTATAGAAACAGCACACTGCTAGCTT
CGACTTGTTGTGGGGTCGGTAGCCTAGGTGCGTCTTGATTTCCGGCGCTTGAACGGATAT
ACACCGTTCACTCCGTCCAGGTTCCTCGCATGCCCGGTCGTCCCCACGTACTCTATACAC
CGATCACTCCTGATTGTAAAGCGTAGGAGGACATATAGCAACTTACCGGCCCATGGATTG
ATTGCCCGAAGTCGTGTCCGTGCCGGTAGTGTCAGGTTTCCGCGATGGACAGCTTCCACT
ACTTGAATAGGCCGACACTGGGTCACTCGCCCTTTTGGCCCGTCGTGGAGACATTAGCTT
GCGATACAATTCCGCAAGCGCGCATCGCTCTACAGCGCAGAACAAAGAGAAGACGCTGAG
GCGTCGGAAACACCCCTAAGATATTGGACACAGCGGACACTAGGTTAAAGACCCTGTTGT
TCCCACATCGAACATGCCTAGATGCCTGGTGGTTAAGGATCTTCAGGCTGCCTCTAGAAC
TCGCTCACAAAAAGTAGGTGGACGACTCCTATTGTCGTCCAAGTAGAACCAAGCCAACGC
TTGATTTCGGTGTTTCCGAAACTGAGTATAGTTAAACCATTCCCGAGCTCACAGCTGGAT
AGCCCGCCTCGCTGGCTCCTAAGTCCAACACCTAATTAAGACTCTATGGTCACGTGGGAT
CACGGTACGGGAGTTTTGTAGAATACGGCTGCCACGTCCTGTTGTCCGATAGCATTACTC
GGCCTCCTTAGGAGGGCTTTGCGAGGATGTTTGGTCAAACCAGGTTCCGCAGCGGACATT
ACCAACTTCGCGGTCCGCTAACTTCGGAACT
>Rosalind_9108
AACATTGCATACGGCGCAGGGTTTGTTAGGAGAATAGCCCTCTGAATTTTGCACCTGTAG
TTGGCATTCATACCGTTCACGGCTTCTTACATACTTTCGCCTCGACCATGCGAATCGACC
ACGCGAGGCCGTGCATGGTATCAAACGTGACGAGACGGATCACCGAGCTGCCAGCGCATT
AACGTCTGGTGTGACTTTACTTTAGTCTTTGTAGAGCCAACAGATTCTGTAATGGATGCG
AATCGGTGACCGGTGACACCGCTGTAAACGGTCTCTCTATACGATGGTAGAGCCAAGCGT
TGACGGATGTAAGTACCTAACGGTTAAGGAGCGCAGGGCTATACGCGCTAGGCGGCATGT
TTTGACGCGCCTGAGATATGCCTGATACGCGACCCTCTTAATAAGTAAATGACATATGTC
CATTGCCAGTAGTCTGAAAGGACAGAACCTGACGTGAGCCAACAGACATTACTACTTAGA
GTCTGCCGGTACTTGCATATGTCTTCAAGGTACGGACATCATTCCTGATGATCTGAACGA
AGACATTCAGGGCAACAAAAAATTTTGCCCACCTTGTGACACAACTAACATGTAGCCCTC
ACCATAGGACTGGAGGATATAAGGCCTTGATCTTCTTTCGTGTGGTCTCAGTTAGGGATG
GGGCTACGGGGTGCAACACCGTGTCACATTGATTAGCTGTTCGACTTAAGCTTCATTTTG
GACGCTAATTTCTTCTGCGCTGAAAAGTACGACATGTATGGATACTAACGTCACTAACCT
TAGCCTATAGGCGATGTAAGCATTCAGAAACGGGCGTAAGCGCGACTGACTTGGGACTAG
TGAATTCCCCATGAGCAATTACAGTTTCATGATATGACCAGTGACCCCTCTACGGTAGCG
TTTGCAAAACTTATTTCTGGATTACTCCATCGGGCTACACGAAAGCTGTGCGACATTTTA
TCTCCATGTCTTGATGCTGTGAGGTGAGCTAACCG
>Rosalind_8703
TCTAAGTATCCGAAAACAGACTGATCACGGGAGGGGCCACAGTTACACAAGTCATGAGAT
TAGAAGCAAAAACCCGTGGTGCGGGTCATTACGAATGGTTGACCCAATGTTCTATCCAGG
CCGTGAGATACGACCCATCCAGCCCTACGAGTACGCGGACGGAGTCCAGGGCTGGGTGCC
AGACGCAGTTACCGATGATTCTGTAGCTCCAACCTCTGCCTGTTCGCCCATTCCAGTACA
TCCACACGCCCGTTAAGATGTAATTCGAGTCCCGGACTAAAAAGTTGCGCAAGCCTTGGA
GTGCCGATGTCGAGGTGCCCGATCTTCAACCCCCCGGCTTTGGACTCATTACGGGGTCCA
CGTGTGAACGGAAACTTTACTATGGTTTCCTAACCACAAAGCCTAAGAGGAAGTCGGATT
CGGTCTGGGATAGGTCGTAATGCGCCTCTTTGCCGAATGGATGGGACGTACCATACTAGC
AGGGCGTAAATTACCCTTCGTGAGAAGTCGGATGTTCCGCTACTATGTAAATGGACTGGT
TATCGACGTATGACTTTTGCGACTTAGGGGCCGTAGTCCAATCATTGGGCGCAGATGTTG
CCAGGATAGTGTGTTTGACCCCGCCGATCATCGTGCTGGGCGCTATGGGGCGTCTCAACA
TCTAACCGTTCGGAACAGACCTGACCGCACCGGTTTCGAGTTACGCGGGGTCAGGGGAAC
CCCTTCAAGCTCCTTCCCTTATCTCGTACTACGTATAGTATGTAGTGGCTGTCCACCTTG
AAAAATAGTAAGAGCACCGACCTACAGTTGATCGGCACTGTCTCCTAACTTG
>Rosalind_1385
GGCACCATCCACGGCGTAGCGCGGGTGACTTGCTCCAATGTGTCCCATCGTGGTTGATGG
GACTGCGTGCCTAGCGCGCTAATGGGCTTCAGGAGGGAATCGACACCCTGCCGGCCCGGC
AAGACAGACGATGCCCACGAAGTAGGGTAACGCGACCAAGGGCAGTAACGGACGGGTCGT
AGCCAGGAAGGTCTAACGGGAAACCGTGTCAGGCATTGCATCAGTATGCCGCAAGAGATA
CCTCGAACTGCTCGAGACATCAATCAACCGCGCCCGCTGAGATAGGCCTGCAGCACCCTT
TAGTTCTGCTTATGCGATTAATTGCGCTTATGAGTCCCCGGGGCCGAGTGCCGGTTTCCA
ACCTAGTCATTCAGTTGCCGGCGAGGTCACCTCTGACTTTTAACGTATCATACTCACAGG
GCGGGCACAGTCTGCGACGACCTATGCGCCAGGCTCTAAGACTTCCAGTGACTCCGTTGG
TTCTTCAGCTTTCATATTTCAGTGAAGTTGTAACCCTCTGTTATAGTATGGGAGGCCCTG
TACACCAGTGCTATTCCGGTATTAGAATGCTAGACTAGTCACTTCATGCGAAGGATCGCA
TACATATACGCCTCCTTAGGCAGTGCAAAAGGCATTTAGTAGCGTACCTAATTCCGAATT
CACACAGGATTGGCACGACAGGCCAGACATACCATTCGTGTATAGGGGGAAGCGTATTGT
TGCAGCCGAAACTGTTACTATCCATGAGCGGAAATGCAATACACTTTAAGTCCTAATACT
TTCTCCTTTCTGTAGGCCGCTACGGGGAAATCTACCATCACGTAAGGGGGACCATCGGAA
AAGTCTTACTGACATCCCGAGTGCTGCCGAATGGGAGTGGTAGGCCATTTTCTTTTAACG
ATTCCGTGTACTGATATGAGAATCACGGACGGATCAACAGTGAAGGAGCAGTACTTCGAC
GAACTCGATTTGGGCCTAGATTGATGAG
>Rosalind_4521
TTCGGAGTACCATGCCGAGCGGACCTTGTAATGCGAACTTGCTAGGATTTGGTGCACTTA
TCCGAAAGAGTTAAGATCGGGGCTAGTGTGACAAGGTTCAGGGCGCAGAGTGCTAATTCC
AGGGAGCCCTTACGATGAAGCGAAGGGTTAAAGTCGTGTCTATTTTTAATTTGGTTATGG
GAGTACTGCGCATTCTTGAAGGTGTCTCGTGTCTTTTCAGATGATGCTCTATTTCAGATG
TCGCGTTCCTATACATCTCGCTACCGTGATTACAGGCGGTTGCCCGTTCTGCGTCGTAAT
CGCCACCATCGTCCTCCAGCCATTAGGGTCTCTCACAAAGTATTTCAAGGCATGCCATAG
AAAAGAGGGCATTTCTAGTGAGCATTCCGAAGTAGTTAATGGCTTGCGTTTCACGTATGG
TATAGTCAAAAGTGGGGACGAGGGTAATATCAGAAGGATGTCTCTGTCCAGGTTGCGGGG
CAGCCATTCGATACGCTGAAAGGGATACGTCATCATGGAAGAGACCAGGGTTTGCGTAAA
CGCTATTTAACCGATGCTGTATAAATTCTACTTGCCAGGGATGATCTGAATAATGCTGGT
TGCGAGCCCTAGGAATACGCTCAGCAGTTACTGTGCTAAGGGCCTGGTGCTGAAAGACTC
GAAAGCTAACGGACCCCCGACCTTCGGTACTCAGGGCAATGGAGGAGAACCCGTGATGTA
AGTTATGGCAAGGCTTTGCCCACTAATCGGTATTACGCATAGGTTAGTTATTTCGTATTT
AGGGCTATGTCCTTTCCTCGGGCGTCAGGCATGGGGTAGTAACGCCCTCGGTGCGAACGT
CCGGAGATTCACTTGAAATGAAGTGAAGACAGGCCCCCTTTTGGCGACGACGAAAACCAA
GGCCAAACATCTGACATTAACACCATTACGTCCAGTATCTCATTCCGGGACTCGAGCCAC
CGCATCATA
>Rosalind_2518
CGCTGGCGTTTTGCCTGCACGCAAGTGGGGTGCCGAGACTAACCGGTGGGGCTGCGAATA
CCCTAGGGGCATCCGGTCACTATTTGCTTCATTGTAGGTCCGGCCCTTTTATGCACGGCC
CACGTTACTCCGTCAGGGTAGGGGACATCCCCTTATGATCCGACGCTATGACGCCAGTCT
AACCGCCTTCCCAGCGCGGGGCCGCGTCAACATATGAAGGCATCGAGATTATCGACTTCG
ACATTGAGACCGACGGGTTTTACTGATTTTGTCACAGGATCCAGCCCTCTTTACCTGCAC
CGTCGACCACCCTGTAATCAATCAAACTTAAGATGGCAGATCCAGGATTTTATATGTCAT
TGAACCCGAGTCAGTACTCCCTCAGTACGGAGCGGTTAGACTATGTAATCGACTGCTTGA
GTACCAAATTGGCTGATCCTAGGATTAACATTTATCAATTAAATGGTCTAATCCATCGTT
CGAGCAGAAGCCTGCAGGGGTACTTAACATGTAAAATGCGGCGAGGGGTTAGAAGATTCT
TAAATTCGCTGTTGCTGACTCCGGGCGAGTTTCTTGAAATGGCTCTGGGGTCCGGTGGAT
CACGGGCTTACTACGCGGCGGCTGGGCAAGCATATAGTATCAAACACTCGAAATTTCAGG
CACCATCGAACGTGCACCTCCGAATGGCAGTTGTTCTCTGCGCAGCGCACCCGCTGGGCC
GACCTGCCATCCTCCATCGAGACCCGAATCCACATCGGAAGGGTCGTGACGCGTGACTCG
CTCTTCAGGCAGAATATTCATGCGGCTTTCTTACGGATTGATCGATCGACAGCTCGAAGC
AACCCGTCCACGGTCCTGACTGGAACACGCATCAGTTGACATGATGGGAGGATGTCAGCA
ATATAG