Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
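The definition above can be checked with a minimal Python sketch. The function names gc_content and reverse_complement are illustrative and not part of the problem statement; the input is assumed to contain only the uppercase symbols A, C, G, and T.

```python
def gc_content(dna: str) -> float:
    """Return the percentage of symbols in dna that are 'C' or 'G'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    return (dna.count("C") + dna.count("G")) / len(dna) * 100


def reverse_complement(dna: str) -> str:
    """Reverse the string and swap A<->T and C<->G."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing swaps each C with a G and vice versa, so the combined count of the two symbols is unchanged, which is why both calls print the same value.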

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with '>' indicates the label of the next string.
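As a rough illustration of the format, the sketch below reads FASTA lines into a dictionary that maps each label to its string. The name parse_fasta and the sample labels are made up for this example, and the parser assumes every label is unique.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each label (the text after '>') to its concatenated string."""
    records: Dict[str, str] = {}
    label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A '>' line closes the previous record and opens a new one.
            label = line[1:]
            records[label] = ""
        elif label is not None:
            # Sequence lines are appended to the record currently open.
            records[label] += line
    return records


# Hypothetical two-record input, for illustration only.
sample = """>Rosalind_0001
ACGT
GGCC
>Rosalind_0002
TTTT
""".splitlines()

print(parse_fasta(sample))
# {'Rosalind_0001': 'ACGTGGCC', 'Rosalind_0002': 'TTTT'}
```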

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
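The selection step and the error tolerance can be sketched as follows. The IDs and GC values here are invented for illustration, and within_tolerance is a hypothetical helper that shows one way to read "a default error of 0.001" as an absolute difference.

```python
import math

# Hypothetical GC-content values (percent) keyed by ID, for illustration only.
gc_by_id = {
    "Rosalind_0001": 50.0,
    "Rosalind_0002": 62.5,
    "Rosalind_0003": 25.0,
}

# Pick the ID whose GC-content is largest, then print ID and value.
best_id = max(gc_by_id, key=gc_by_id.get)
print(best_id)
print(f"{gc_by_id[best_id]:.6f}")


def within_tolerance(answer: float, expected: float, tol: float = 0.001) -> bool:
    """True if answer differs from expected by at most tol in absolute terms."""
    return math.isclose(answer, expected, rel_tol=0.0, abs_tol=tol)
```

Printing six decimal places leaves a comfortable margin under the stated tolerance.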


Code

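# Path to the downloaded Rosalind dataset (adjust to your own location).
with open("D:/gs/rosalind/rosalind_gc.txt", "r") as f:
    lines = f.readlines()

name = []   # FASTA IDs, in file order
seq = []    # full DNA string for each ID
score = []  # GC-content (percent) for each DNA string
temp = []   # sequence lines of the record currently being read

for line in lines:
    line = line.strip()  # drop the trailing newline and stray spaces
    if line.startswith(">"):
        # A header starts a new record, so store the previous one first.
        name.append(line[1:])
        if temp:
            seq.append("".join(temp))
            temp = []
    elif line:
        # Non-empty, non-header lines belong to the current record.
        temp.append(line)

# The last record has no following header, so store it after the loop.
if temp:
    seq.append("".join(temp))

for s in seq:
    score.append((s.count("C") + s.count("G")) / len(s) * 100)

maxvalue = score.index(max(score))  # index of the highest GC-content
print(name[maxvalue])
print(round(score[maxvalue], 6))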





Dataset


>Rosalind_1856

TTGAGTTCGGCGCTAACACGTGCAGAGCGCTCCGAAGGCCAAGGGTTAGGCACAGATTAG

CCGTGAAGTGCTGTATGGACGCAACGGTTGTCGATTTCACAGATTCGAATCTGACTGTAC

TACGTTAGATCAGCCCCAGAACTAATTCGAGTTTTATGCTCTAAAATCTTGCGGCTAGTG

ACTAAGTGCGATGACGGGCCTAAGGACGAGCCATCGCGATCATAAGTTACTGCTCGTCAC

TTTATGCGGGACAAGAGCCCCCTACAGTTACCGGGAAATCACCTATCAGCCTCTCTAATG

GGAAGCGCCAATGGAACGGACCAAGCGCCATTGATAACGATTGGCGCTGATTACTATATA

CCGACTCAGCCTCCCAATCTTTCCTCCGTGCAAGGTAGGCAAGGCGTTTATGAAAAGGGC

TGGCGCATGCTCAGTTAATGTGTGTGTTTATGGCTCAACAGAAACATCACCCGGACCAGA

ACTACTGCAAGGACAGATATATTACACTGCGAATCCAAGGCCTTGTGGCGACCTTTTCCT

ACCCTCAAGACCAGCATAGATTCGGCTGGGTTGGAGCTCGCTTTAGAATGGTTACAGTAG

TGTTACTACCAAAGGATTCCGAGATGTGTACCTCGAAGAGAGATTCTTTCTGATGCAAGT

GTTGTCGTATAGAAGGTCGAAATCTTATGATTTAGGCCACGTGTGTTAGGGCTCCACACC

CCCGATTGTCTACATACGGACTAGGTGGATATGCCGACTTAGTGTGTAATGGGGTCCGCG

TCCCTAACCTTTTTTTCCACCATTTTTCCTTTCGGTCTGTACTTCCTCCAATCAGCGTTG

TTGCAATACTATACTGCAAGATAAAGTAACGCGACCCTGGACTGCAATCACGCCTCTGGC

CAA

>Rosalind_4787

ACACCAAGAGGGAGCGTATTGTGACGTCGTCCCGGGATATCTGTTGGTAGCGCATTATCG

ATTACTCTCATCGACTAGGCAGTAAGGTCCACCGACGACTATGGAATTCGAGCTCTGCTA

TCAGGTAGGCGATACTCGCTTGGTTTAATCGAAGATTTGGCGACCTAGACAAAGGTAAAC

GCTCAGGGGGGCCTACTCGGCCTTATTCCTGACTACGTCTCAGGTGATTTCGAACGCACG

AAAGGGGGAAGTTCTGTATTTCGGTTGATTTTGCGTACACTGTGGAATTAACATACGAGT

TCCCCCTTTTCATCACTCTATCTTTACTTGCATAGGCCCGTTCATCTCACACCTCACAGT

GAACTGTGTTGGATTGGCGTCTCGTCGCTAACCTTGGATATCAGGGGGGCATGGATTCGG

AAGAAGGCGAGCGTCGCGAGGATGTTGTTCAGAGGCTACAAGTATATCGCACTCGCGTCA

CTCGCGGCTGATGAGAGTAATACCACAAAACTTATCCGTCCTAAAGGGACGAGAGGCGGC

ACTCTCTACAATCGAATACACGAGATGTTATTTCGAAAATTATTATTTTCTTATGTCGCA

TATAAGTCCCACCAGCAGACGCGCGTGAAGATAAGCGCCGTAAAGCCTCCCTGGTGATCC

GCTCGCTGCTGTAATAAACAGTTCAGCCCCATCTCACGTCTCGACGCTTTACCTGGGTGA

GGTCGCGAGTTTCCTAGTAGTGGCCCGGCTTCGATCATGTATGTTAATAACGATGTAGTG

CAATGGGTACTGTAACTAGGACTGGGCAGGCTTGGATCAGCCTGTATACGGATGTTTTCC

TCGGCACGCCCAGAAGATACGTATTAAGTTTCCACGGTCACAGTGCCCCCTTGAGACGTG

TCTTTGTTTCTTTGTGGTATGGAGGTGACAAGTGTCCACCGCAAGATATGTGGAT

>Rosalind_0276

CTTAAGTATCAGACAGAACCGTTTGTGGGGTATCCAGGTTATGGATCCGACACCTTAGCT

AGACTGACTCATAACGAGCGTTATCTTCTTTCGGCTTCGACTCAACTTCGAGGGAATTGA

TCCAACTAAGTTACGGGGCTAGCGCGCACTGCATTATCGTTAGTCCCAAGCGCATATATG

TTACACTGTTCTAGTAATCGGTCAACACTAGAGATGTCAATCATGCCCTTCCGTGAACGC

GTCGAAATTAAGCGTACCTTAATTCCTATCGCATTACGCTATGCAGAGCCCGTTTAGCAG

TACGCGAAGTCCCGTACCCAGTAGAGGATTTGTGTGATGGAGCCACAGCTGGGTACCAGG

GGATTAGAGTATGGATGCTTATCTATATGAGCACTGTCAAAATTGTCGCCGCCGGCCTGG

GTCGCTACCGTTATGGCAGTCGATGTAATTGTATGCGCAAGCTCACGCCTCGCAGAGTAA

TTAAGCGGTTGGGATTCGGCTGACCTCTTTGGTAGTCATCGAGATGTGTCCCCTCAATGG

TTAATTGATGCACGGACTTTTACATCTGTCCCGTCGACTCTTTGACAGTTTCACTTGTGC

TACAAGTTGCTTCTAAATATGCGCGAATGTGCTAATACATGTAGTAGATACGCCTCGTAG

TATTGCTTGAAACATGAACTAGTGGCCCCCAACTACCATAGGCGCAACTCATTGTGCACA

CCGGTAGCAGTATAACAACATCGTGCAGCGAGTTCCGTAGCCGTCCTGTCTCTTAGAGCG

GAAAAACATAGCTCGCTATTTTTATGTTGTGTCGGTTATACCGCCGGTTGCGTAGAGATC

TGTTGCCAGTGAAATAAGCCAAGTATTTGGTAGTTCTATTGGGACGGTGGGCTTCGGACT

CGTAGT

>Rosalind_0058

GCTGCATGACTTGTTACCTTACAAGCCAGCGGTTTCATGCATGTTACAGCGGCGGAGTAT

ACTTTGTAGTTGGAGTCGTACTGGAGAAAGCTAATATACTAGTATGCCATACCGGCCGTC

TTACTCCGTTGTCGGAGAGGCCACCTGTATCAGGGTATAGAAACAGCACACTGCTAGCTT

CGACTTGTTGTGGGGTCGGTAGCCTAGGTGCGTCTTGATTTCCGGCGCTTGAACGGATAT

ACACCGTTCACTCCGTCCAGGTTCCTCGCATGCCCGGTCGTCCCCACGTACTCTATACAC

CGATCACTCCTGATTGTAAAGCGTAGGAGGACATATAGCAACTTACCGGCCCATGGATTG

ATTGCCCGAAGTCGTGTCCGTGCCGGTAGTGTCAGGTTTCCGCGATGGACAGCTTCCACT

ACTTGAATAGGCCGACACTGGGTCACTCGCCCTTTTGGCCCGTCGTGGAGACATTAGCTT

GCGATACAATTCCGCAAGCGCGCATCGCTCTACAGCGCAGAACAAAGAGAAGACGCTGAG

GCGTCGGAAACACCCCTAAGATATTGGACACAGCGGACACTAGGTTAAAGACCCTGTTGT

TCCCACATCGAACATGCCTAGATGCCTGGTGGTTAAGGATCTTCAGGCTGCCTCTAGAAC

TCGCTCACAAAAAGTAGGTGGACGACTCCTATTGTCGTCCAAGTAGAACCAAGCCAACGC

TTGATTTCGGTGTTTCCGAAACTGAGTATAGTTAAACCATTCCCGAGCTCACAGCTGGAT

AGCCCGCCTCGCTGGCTCCTAAGTCCAACACCTAATTAAGACTCTATGGTCACGTGGGAT

CACGGTACGGGAGTTTTGTAGAATACGGCTGCCACGTCCTGTTGTCCGATAGCATTACTC

GGCCTCCTTAGGAGGGCTTTGCGAGGATGTTTGGTCAAACCAGGTTCCGCAGCGGACATT

ACCAACTTCGCGGTCCGCTAACTTCGGAACT

>Rosalind_9108

AACATTGCATACGGCGCAGGGTTTGTTAGGAGAATAGCCCTCTGAATTTTGCACCTGTAG

TTGGCATTCATACCGTTCACGGCTTCTTACATACTTTCGCCTCGACCATGCGAATCGACC

ACGCGAGGCCGTGCATGGTATCAAACGTGACGAGACGGATCACCGAGCTGCCAGCGCATT

AACGTCTGGTGTGACTTTACTTTAGTCTTTGTAGAGCCAACAGATTCTGTAATGGATGCG

AATCGGTGACCGGTGACACCGCTGTAAACGGTCTCTCTATACGATGGTAGAGCCAAGCGT

TGACGGATGTAAGTACCTAACGGTTAAGGAGCGCAGGGCTATACGCGCTAGGCGGCATGT

TTTGACGCGCCTGAGATATGCCTGATACGCGACCCTCTTAATAAGTAAATGACATATGTC

CATTGCCAGTAGTCTGAAAGGACAGAACCTGACGTGAGCCAACAGACATTACTACTTAGA

GTCTGCCGGTACTTGCATATGTCTTCAAGGTACGGACATCATTCCTGATGATCTGAACGA

AGACATTCAGGGCAACAAAAAATTTTGCCCACCTTGTGACACAACTAACATGTAGCCCTC

ACCATAGGACTGGAGGATATAAGGCCTTGATCTTCTTTCGTGTGGTCTCAGTTAGGGATG

GGGCTACGGGGTGCAACACCGTGTCACATTGATTAGCTGTTCGACTTAAGCTTCATTTTG

GACGCTAATTTCTTCTGCGCTGAAAAGTACGACATGTATGGATACTAACGTCACTAACCT

TAGCCTATAGGCGATGTAAGCATTCAGAAACGGGCGTAAGCGCGACTGACTTGGGACTAG

TGAATTCCCCATGAGCAATTACAGTTTCATGATATGACCAGTGACCCCTCTACGGTAGCG

TTTGCAAAACTTATTTCTGGATTACTCCATCGGGCTACACGAAAGCTGTGCGACATTTTA

TCTCCATGTCTTGATGCTGTGAGGTGAGCTAACCG

>Rosalind_8703

TCTAAGTATCCGAAAACAGACTGATCACGGGAGGGGCCACAGTTACACAAGTCATGAGAT

TAGAAGCAAAAACCCGTGGTGCGGGTCATTACGAATGGTTGACCCAATGTTCTATCCAGG

CCGTGAGATACGACCCATCCAGCCCTACGAGTACGCGGACGGAGTCCAGGGCTGGGTGCC

AGACGCAGTTACCGATGATTCTGTAGCTCCAACCTCTGCCTGTTCGCCCATTCCAGTACA

TCCACACGCCCGTTAAGATGTAATTCGAGTCCCGGACTAAAAAGTTGCGCAAGCCTTGGA

GTGCCGATGTCGAGGTGCCCGATCTTCAACCCCCCGGCTTTGGACTCATTACGGGGTCCA

CGTGTGAACGGAAACTTTACTATGGTTTCCTAACCACAAAGCCTAAGAGGAAGTCGGATT

CGGTCTGGGATAGGTCGTAATGCGCCTCTTTGCCGAATGGATGGGACGTACCATACTAGC

AGGGCGTAAATTACCCTTCGTGAGAAGTCGGATGTTCCGCTACTATGTAAATGGACTGGT

TATCGACGTATGACTTTTGCGACTTAGGGGCCGTAGTCCAATCATTGGGCGCAGATGTTG

CCAGGATAGTGTGTTTGACCCCGCCGATCATCGTGCTGGGCGCTATGGGGCGTCTCAACA

TCTAACCGTTCGGAACAGACCTGACCGCACCGGTTTCGAGTTACGCGGGGTCAGGGGAAC

CCCTTCAAGCTCCTTCCCTTATCTCGTACTACGTATAGTATGTAGTGGCTGTCCACCTTG

AAAAATAGTAAGAGCACCGACCTACAGTTGATCGGCACTGTCTCCTAACTTG

>Rosalind_1385

GGCACCATCCACGGCGTAGCGCGGGTGACTTGCTCCAATGTGTCCCATCGTGGTTGATGG

GACTGCGTGCCTAGCGCGCTAATGGGCTTCAGGAGGGAATCGACACCCTGCCGGCCCGGC

AAGACAGACGATGCCCACGAAGTAGGGTAACGCGACCAAGGGCAGTAACGGACGGGTCGT

AGCCAGGAAGGTCTAACGGGAAACCGTGTCAGGCATTGCATCAGTATGCCGCAAGAGATA

CCTCGAACTGCTCGAGACATCAATCAACCGCGCCCGCTGAGATAGGCCTGCAGCACCCTT

TAGTTCTGCTTATGCGATTAATTGCGCTTATGAGTCCCCGGGGCCGAGTGCCGGTTTCCA

ACCTAGTCATTCAGTTGCCGGCGAGGTCACCTCTGACTTTTAACGTATCATACTCACAGG

GCGGGCACAGTCTGCGACGACCTATGCGCCAGGCTCTAAGACTTCCAGTGACTCCGTTGG

TTCTTCAGCTTTCATATTTCAGTGAAGTTGTAACCCTCTGTTATAGTATGGGAGGCCCTG

TACACCAGTGCTATTCCGGTATTAGAATGCTAGACTAGTCACTTCATGCGAAGGATCGCA

TACATATACGCCTCCTTAGGCAGTGCAAAAGGCATTTAGTAGCGTACCTAATTCCGAATT

CACACAGGATTGGCACGACAGGCCAGACATACCATTCGTGTATAGGGGGAAGCGTATTGT

TGCAGCCGAAACTGTTACTATCCATGAGCGGAAATGCAATACACTTTAAGTCCTAATACT

TTCTCCTTTCTGTAGGCCGCTACGGGGAAATCTACCATCACGTAAGGGGGACCATCGGAA

AAGTCTTACTGACATCCCGAGTGCTGCCGAATGGGAGTGGTAGGCCATTTTCTTTTAACG

ATTCCGTGTACTGATATGAGAATCACGGACGGATCAACAGTGAAGGAGCAGTACTTCGAC

GAACTCGATTTGGGCCTAGATTGATGAG

>Rosalind_4521

TTCGGAGTACCATGCCGAGCGGACCTTGTAATGCGAACTTGCTAGGATTTGGTGCACTTA

TCCGAAAGAGTTAAGATCGGGGCTAGTGTGACAAGGTTCAGGGCGCAGAGTGCTAATTCC

AGGGAGCCCTTACGATGAAGCGAAGGGTTAAAGTCGTGTCTATTTTTAATTTGGTTATGG

GAGTACTGCGCATTCTTGAAGGTGTCTCGTGTCTTTTCAGATGATGCTCTATTTCAGATG

TCGCGTTCCTATACATCTCGCTACCGTGATTACAGGCGGTTGCCCGTTCTGCGTCGTAAT

CGCCACCATCGTCCTCCAGCCATTAGGGTCTCTCACAAAGTATTTCAAGGCATGCCATAG

AAAAGAGGGCATTTCTAGTGAGCATTCCGAAGTAGTTAATGGCTTGCGTTTCACGTATGG

TATAGTCAAAAGTGGGGACGAGGGTAATATCAGAAGGATGTCTCTGTCCAGGTTGCGGGG

CAGCCATTCGATACGCTGAAAGGGATACGTCATCATGGAAGAGACCAGGGTTTGCGTAAA

CGCTATTTAACCGATGCTGTATAAATTCTACTTGCCAGGGATGATCTGAATAATGCTGGT

TGCGAGCCCTAGGAATACGCTCAGCAGTTACTGTGCTAAGGGCCTGGTGCTGAAAGACTC

GAAAGCTAACGGACCCCCGACCTTCGGTACTCAGGGCAATGGAGGAGAACCCGTGATGTA

AGTTATGGCAAGGCTTTGCCCACTAATCGGTATTACGCATAGGTTAGTTATTTCGTATTT

AGGGCTATGTCCTTTCCTCGGGCGTCAGGCATGGGGTAGTAACGCCCTCGGTGCGAACGT

CCGGAGATTCACTTGAAATGAAGTGAAGACAGGCCCCCTTTTGGCGACGACGAAAACCAA

GGCCAAACATCTGACATTAACACCATTACGTCCAGTATCTCATTCCGGGACTCGAGCCAC

CGCATCATA

>Rosalind_2518

CGCTGGCGTTTTGCCTGCACGCAAGTGGGGTGCCGAGACTAACCGGTGGGGCTGCGAATA

CCCTAGGGGCATCCGGTCACTATTTGCTTCATTGTAGGTCCGGCCCTTTTATGCACGGCC

CACGTTACTCCGTCAGGGTAGGGGACATCCCCTTATGATCCGACGCTATGACGCCAGTCT

AACCGCCTTCCCAGCGCGGGGCCGCGTCAACATATGAAGGCATCGAGATTATCGACTTCG

ACATTGAGACCGACGGGTTTTACTGATTTTGTCACAGGATCCAGCCCTCTTTACCTGCAC

CGTCGACCACCCTGTAATCAATCAAACTTAAGATGGCAGATCCAGGATTTTATATGTCAT

TGAACCCGAGTCAGTACTCCCTCAGTACGGAGCGGTTAGACTATGTAATCGACTGCTTGA

GTACCAAATTGGCTGATCCTAGGATTAACATTTATCAATTAAATGGTCTAATCCATCGTT

CGAGCAGAAGCCTGCAGGGGTACTTAACATGTAAAATGCGGCGAGGGGTTAGAAGATTCT

TAAATTCGCTGTTGCTGACTCCGGGCGAGTTTCTTGAAATGGCTCTGGGGTCCGGTGGAT

CACGGGCTTACTACGCGGCGGCTGGGCAAGCATATAGTATCAAACACTCGAAATTTCAGG

CACCATCGAACGTGCACCTCCGAATGGCAGTTGTTCTCTGCGCAGCGCACCCGCTGGGCC

GACCTGCCATCCTCCATCGAGACCCGAATCCACATCGGAAGGGTCGTGACGCGTGACTCG

CTCTTCAGGCAGAATATTCATGCGGCTTTCTTACGGATTGATCGATCGACAGCTCGAAGC

AACCCGTCCACGGTCCTGACTGGAACACGCATCAGTTGACATGATGGGAGGATGTCAGCA

ATATAG


