Biopython Tutorial

Biopython Tutorial for Python 3

Course material by Dr. Kristian Rother

with contributions by Allegra Via, Magdalena Rother and Olga Sheshukova.

What is Biopython?

Biopython is a Python library for reading and writing many common biological data formats. It contains some functionality to perform calculations, in particular on 3D structures.

The library and documentation can be found at

1. Getting started

import Bio
from Bio.Seq import Seq
dna = Seq("ACGTTGCA")


# Bio.Alphabet was removed in Biopython 1.78, so Seq no longer takes an alphabet argument.
dna = Seq("AGTACACTGGT")
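A Seq object behaves much like a Python string. The following minimal sketch shows a few common operations; the variable names are chosen here for illustration only.

```python
from Bio.Seq import Seq

dna = Seq("ACGTTGCA")

# A Seq supports the usual string-style operations.
length = len(dna)          # number of bases
first_three = dna[:3]      # slicing returns a new Seq
g_count = dna.count("G")   # non-overlapping occurrences of "G"
as_text = str(dna)         # convert back to a plain Python string

print(length, first_three, g_count, as_text)
```

Slices stay Seq objects, so methods such as transcribe() remain available on them.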

2. Reverse complement, transcribing & translating

rna = dna.transcribe()


from Bio.Seq import reverse_complement, transcribe, translate
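The same operations are available as methods on a Seq object and as module-level functions that accept plain strings. The sketch below uses a short made-up sequence (ATGAAATAG, not taken from the course material) to show both forms.

```python
from Bio.Seq import Seq, reverse_complement, transcribe, translate

dna = Seq("ATGAAATAG")  # illustrative sequence: ATG AAA TAG

# Method form: each call returns a new Seq object.
revcomp = dna.reverse_complement()   # CTATTTCAT
rna = dna.transcribe()               # AUGAAAUAG
protein = dna.translate()            # MK* (the * marks the stop codon)
trimmed = dna.translate(to_stop=True)  # MK, translation ends before the stop

# Function form: works directly on plain strings.
revcomp_str = reverse_complement("ATGAAATAG")
rna_str = transcribe("ATGAAATAG")
protein_str = translate("ATGAAATAG")

print(revcomp, rna, protein, trimmed)
```

The function form is convenient when the sequence is already a string and no Seq object is needed afterwards.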

3. Calculating GC-content

from Bio.SeqUtils import gc_fraction  # replaces GC, which was deprecated in Biopython 1.80
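Depending on the installed Biopython release, the helper is called GC (returns a percentage) or gc_fraction (returns a fraction between 0 and 1). The sketch below tries the newer name first and falls back to the older one; the fallback wrapper is an assumption made here so the example runs on either version.

```python
from Bio.Seq import Seq

try:
    from Bio.SeqUtils import gc_fraction  # Biopython 1.80 and later
except ImportError:
    from Bio.SeqUtils import GC  # older releases; returns a percentage

    def gc_fraction(seq):
        """Fallback wrapper: convert the old percentage into a fraction."""
        return GC(seq) / 100.0

dna = Seq("ACGTTGCA")

gc = gc_fraction(dna)

# The same value computed by hand, for comparison.
manual = (dna.count("G") + dna.count("C")) / len(dna)

print(f"GC content: {gc:.1%} (manual: {manual:.1%})")
```

For this sequence, four of the eight bases are G or C, so both values are 0.5.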

4. Calculating molecular weight (DNA only)

from Bio.SeqUtils import molecular_weight
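A minimal usage sketch, assuming the default settings of molecular_weight (single-stranded DNA with a 5' phosphate). No expected values are shown because they depend on the atomic masses built into the library.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

dna = Seq("ACGTTGCA")

# Default: seq_type="DNA", single-stranded.
single = molecular_weight(dna)

# double_stranded=True adds the weight of the complementary strand.
double = molecular_weight(dna, double_stranded=True)

print(f"single-stranded: {single:.2f}")
print(f"double-stranded: {double:.2f}")
```

Only unambiguous letters are accepted, so sequences containing N or other ambiguity codes raise an error.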

5. Loading sequences from a FASTA file

from Bio import SeqIO
for record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    print(record.seq, len(record.seq))
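SeqIO.parse also accepts an open file handle, which makes it possible to try the loop without a file on disk. The sketch below uses a hypothetical two-record FASTA text as a stand-in for ls_orchid.fasta; the record names and sequences are invented for illustration.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content standing in for ls_orchid.fasta.
fasta_text = """>seq1 first example record
ACGTTGCA
>seq2 second example record
AGTACACTGGT
"""

# parse() returns an iterator of SeqRecord objects; list() collects them.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    # id is the first word of the header line, description is the full line.
    print(record.id, record.description, len(record.seq))
```

Each SeqRecord carries the sequence in record.seq together with the header information in record.id and record.description.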

6. Plotting a histogram of seq lengths with pylab

pylab is part of matplotlib, which needs to be installed separately.

import pylab
from Bio import SeqIO
# Collect the length of every sequence in the file.
sizes = [len(r.seq) for r in SeqIO.parse("ls_orchid.fasta", "fasta")]
pylab.hist(sizes, bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
            % (len(sizes), min(sizes), max(sizes)))
pylab.xlabel("Sequence length (bp)")