Bioinformatics

Clearing the Linux Cache Memory

2018. 12. 11. 15:40

One way to improve system efficiency on a server running Linux is to clear the cache memory.

The cache can be cleared from the terminal with a single command.

First, check the current memory usage with the following command.

$ free -m

The command for clearing the cache memory is as follows. Writing 3 drops the page cache as well as dentries and inodes.

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
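To see how much memory the cache is holding before and after the command above, the figures can also be read from /proc/meminfo. The following is a minimal sketch, not part of the original post: the helper names `parse_meminfo` and `cache_mib` are hypothetical, the sample values are made up, and summing `Buffers` and `Cached` is only a rough stand-in for what `free -m` reports.

```python
def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_mib(fields):
    """Approximate cache size in MiB as Buffers + Cached (an assumption)."""
    return (fields.get("Buffers", 0) + fields.get("Cached", 0)) / 1024


# Made-up sample in the /proc/meminfo layout, for illustration only.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         2048000 kB
Buffers:          102400 kB
Cached:          4096000 kB
"""

print(cache_mib(parse_meminfo(SAMPLE)))
```

On a real server, the text would come from `open("/proc/meminfo").read()`; running the helper once before and once after dropping the caches shows how much was released.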

References:

[Simple memory management for Linux servers (checking usage / clearing the cache / scheduled maintenance)]

http://osasf.net/discussion/587/%EB%A6%AC%EB%88%85%EC%8A%A4-%EC%84%9C%EB%B2%84%EC%9D%98-%EB%A9%94%EB%AA%A8%EB%A6%AC-%EA%B0%84%EB%8B%A8-%EA%B4%80%EB%A6%AC-%EB%B0%A9%EB%B2%95-%EC%82%AC%EC%9A%A9%EB%9F%89-%ED%99%95%EC%9D%B8-%EC%BA%90%EC%8B%9C%EC%82%AD%EC%A0%9C-%EC%A0%95%EA%B8%B0%EA%B4%80%EB%A6%AC

["echo 3 > /proc/sys/vm/drop_caches" - Permission denied as root]

https://unix.stackexchange.com/questions/109496/echo-3-proc-sys-vm-drop-caches-permission-denied-as-root


Finding a Shared Motif

2018. 12. 11. 09:54

Problem

A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".

Given: A collection of $k$ ( $k \leq 100$ ) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Sample Output

AC

Problem explanation

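This code finds the longest common substring of a set of sequences given in FASTA format. I first tried to solve it with the LCS algorithm, but this problem is a slight variant, so that approach was hard to apply. My second attempt generated a list of every possible substring of each string and compared them by set intersection, but it was far too heavy: the function took more than 120 seconds for 100 reads of 1000 bp each, so I gave up on it. The last solution I found ran very quickly and produced the correct answer.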
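The set-intersection idea described above can be sketched as follows. This is an illustrative reconstruction, not the original code from that attempt: the function names `all_substrings` and `longest_common_by_intersection` are hypothetical, and ties are broken alphabetically only to make the result deterministic.

```python
def all_substrings(s):
    """Return the set of every non-empty substring of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def longest_common_by_intersection(seqs):
    """Intersect the substring sets of all sequences and pick a longest one."""
    common = all_substrings(seqs[0])
    for s in seqs[1:]:
        common &= all_substrings(s)
    if not common:
        return ""
    # Longest first; alphabetical order breaks ties between equal lengths.
    return min(common, key=lambda t: (-len(t), t))


# The sample dataset from the problem statement.
sample = ["GATTACA", "TAGACCA", "ATACA"]
print(longest_common_by_intersection(sample))
```

A string of length n has on the order of n²/2 substrings, so for 1 kbp reads each set holds hundreds of thousands of strings. That is consistent with this approach being too slow on the full dataset.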

Source

import time

start_time = time.time()


def FindCommonString(seqs):
    """Return a longest substring shared by every sequence in seqs."""
    # Every common substring must occur in the shortest sequence, so take
    # candidates from it, longest first, and return the first one that is
    # found in all sequences.
    shortest = min(seqs, key=len)
    n = len(shortest)
    for i in range(n):
        for j in range(i + 1):
            token = shortest[j: n - i + j]
            if all(token in s for s in seqs):
                return token
    return ""  # the sequences share no substring at all


def reform(lines):
    """Parse FASTA lines into parallel lists of IDs and sequences."""
    ids = []
    seqs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            ids.append(line[1:])
            seqs.append("")
        else:
            # A sequence may wrap over several lines; append to the current record.
            seqs[-1] += line
    return ids, seqs


# Path to the downloaded Rosalind dataset.
with open("D:/gs/rosalind/rosalind_lcsm.txt", "r") as f:
    ids, seqs = reform(f.readlines())

print(FindCommonString(seqs))
print("--- %s seconds ---" % (time.time() - start_time))

Searching Through the Haystack

In “Finding a Motif in DNA”, we searched a given genetic string for a motif; however, this problem assumed that we know the motif in advance. In practice, biologists often do not know exactly what they are looking for. Rather, they must hunt through several different genomes at the same time to identify regions of similarity that may indicate genes shared by different organisms or species.

The simplest such region of similarity is a motif occurring without mutation in every one of a collection of genetic strings taken from a database; such a motif corresponds to a substring shared by all the strings. We want to search for long shared substrings, as a longer motif will likely indicate a greater shared function.


Calculating Expected Offspring

2018. 12. 7. 13:35

Problem

For a random variable $X$ taking integer values between 1 and $n$, the expected value of $X$ is $E(X) = \sum_{k=1}^{n} k \times \Pr(X = k)$. The expected value offers us a way of taking the long-term average of a random variable over a large number of trials.

As a motivating example, let $X$ be the number on a six-sided die. Over a large number of rolls, we should expect to obtain an average of 3.5 on the die (even though it's not possible to roll a 3.5). The formula for expected value confirms that $E(X) = \sum_{k=1}^{6} k \times \Pr(X = k) = 3.5$.

More generally, a random variable for which every one of a number of equally spaced outcomes has the same probability is called a uniform random variable (in the die example, this "equal spacing" is equal to 1). We can generalize our die example to find that if $X$ is a uniform random variable with minimum possible value $a$ and maximum possible value $b$, then $E(X) = \frac{a + b}{2}$. You may also wish to verify that for the dice example, if $Y$ is the random variable associated with the outcome of a second die roll, then $E(X + Y) = 7$.
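The two checks suggested above can be done with a few lines of Python. This is a small sketch using exact fractions; the helper name `expected_value` and the dictionary representation of a distribution are choices made here for illustration.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of value * probability over a {value: probability} mapping."""
    return sum(value * prob for value, prob in distribution.items())


# A fair six-sided die: each face has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
e_x = expected_value(die)

# Distribution of the sum of two independent dice.
two_dice = {}
for x, px in die.items():
    for y, py in die.items():
        two_dice[x + y] = two_dice.get(x + y, 0) + px * py
e_sum = expected_value(two_dice)

print(e_x, e_sum)
```

Here `e_x` equals 7/2, which matches $(a + b)/2$ with $a = 1$ and $b = 6$, and `e_sum` equals 7.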

Given: Six nonnegative integers, each of which does not exceed 20,000. The integers correspond to the number of couples in a population possessing each genotype pairing for a given factor. In order, the six given integers represent the number of couples having the following genotypes:

AA-AA
AA-Aa
AA-aa
Aa-Aa
Aa-aa
aa-aa

Return: The expected number of offspring displaying the dominant phenotype in the next generation, under the assumption that every couple has exactly two offspring.

Sample Dataset

1 0 0 1 0 1

Sample Output

3.5

Problem explanation

This problem asks for the expected number of offspring showing the dominant phenotype in the next generation, following Mendel's laws.
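The probability that a child of each pairing shows the dominant phenotype can be derived by enumerating the four equally likely allele combinations, as in a Punnett square. The sketch below does this and checks the result against the sample dataset; the names `PAIRINGS`, `dominant_probability` and `expected_dominant_offspring` are hypothetical and introduced only for this example.

```python
from itertools import product

# Genotype pairings in the order given by the problem statement.
PAIRINGS = [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
            ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]


def dominant_probability(parent1, parent2):
    """Fraction of allele combinations that contain a dominant 'A'."""
    children = list(product(parent1, parent2))  # one allele from each parent
    dominant = sum(1 for alleles in children if "A" in alleles)
    return dominant / len(children)


def expected_dominant_offspring(counts, children_per_couple=2):
    """Expected number of dominant-phenotype children over all couples."""
    return sum(children_per_couple * n * dominant_probability(p1, p2)
               for n, (p1, p2) in zip(counts, PAIRINGS))


# Sample dataset from the problem statement: 1 0 0 1 0 1
print(expected_dominant_offspring([1, 0, 0, 1, 0, 1]))
```

The enumeration gives probabilities 1, 1, 1, 3/4, 1/2 and 0 for the six pairings, so the sample works out to 2 × (1 + 3/4) = 3.5.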

Source

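# Number of couples for each genotype pairing, in the order
# AA-AA, AA-Aa, AA-aa, Aa-Aa, Aa-aa, aa-aa.
couple = [18855, 19614, 16897, 18945, 16056, 16489]
# Probability that a single child shows the dominant phenotype, per pairing.
p_dominant = [1, 1, 1, 3 / 4, 1 / 2, 0]
# Every couple has exactly two offspring.
ng = [2 * n * p for n, p in zip(couple, p_dominant)]

print(sum(ng))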

The Need for Averages

Averages arise everywhere. In sports, we want to project the average number of games that a team is expected to win; in gambling, we want to project the average losses incurred playing blackjack; in business, companies want to calculate their average expected sales for the next quarter.

Molecular biology is not immune from the need for averages. Researchers need to predict the expected number of antibiotic-resistant pathogenic bacteria in a future outbreak, estimate the predicted number of locations in the genome that will match a given motif, and study the distribution of alleles throughout an evolving population. In this problem, we will begin discussing the third issue; first, we need to have a better understanding of what it means to average a random process.
