SeqBench

Reverse Translate Protein to DNA (Back-Translation)

Back-translate a protein to DNA using most-frequent or degenerate IUPAC codons.

🔒 Local processing — pasted sequences are not uploaded

Reverse translate (back-translate) a protein sequence into a DNA coding sequence. Because the genetic code is degenerate, many DNA sequences encode the same protein — so pick one of two strategies: map each residue to the single most-frequent codon in E. coli, human or yeast, or collapse every synonymous codon into one degenerate IUPAC codon (e.g. Leu -> YTN) for designing degenerate oligos. You get the back-translated DNA to copy, a per-residue codon breakdown, and a hand-off to the Codon Optimizer for constraint-aware design.

Back-translated DNA (5'→3')

Back-translation is ambiguous — many DNA sequences encode the same protein. This tool picks the most-frequent codon (or a degenerate consensus) per residue. For constraint-aware design that also considers GC windows, repeats and restriction sites, use the Codon Optimizer.

How to use the Reverse Translate tool

  1. Paste a protein sequence in one-letter amino-acid codes (or load the example).
  2. Choose a mode: most-frequent codon (with an organism) or degenerate IUPAC consensus.
  3. Copy the back-translated DNA and review the per-residue codons, then refine in the Codon Optimizer if needed.
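
The most-frequent mode described in the steps above can be sketched in Python as a per-residue table lookup. This is a minimal sketch, not the tool's own code: the function name `back_translate_most_frequent` and the partial table `EXAMPLE_PREFERRED_CODONS` are hypothetical, and the codon choices in the table are illustrative placeholders rather than the tool's actual codon-usage data.

```python
# Illustrative, partial table: residue -> assumed most-frequent codon.
# Placeholder values for this example only, not the tool's real tables.
EXAMPLE_PREFERRED_CODONS = {
    "M": "ATG",  # Met has a single codon
    "W": "TGG",  # Trp has a single codon
    "L": "CTG",
    "K": "AAA",
    "E": "GAA",
    "A": "GCG",
    "G": "GGC",
}


def back_translate_most_frequent(protein, preferred_codons):
    """Return (dna, breakdown) for a one-letter protein sequence.

    breakdown is a list of (position, residue, codon) tuples,
    with positions counted from 1.
    """
    cleaned = "".join(protein.split()).upper()  # drop whitespace and line breaks
    breakdown = []
    for position, residue in enumerate(cleaned, start=1):
        if residue not in preferred_codons:
            raise ValueError(
                f"No codon for residue {residue!r} at position {position}"
            )
        breakdown.append((position, residue, preferred_codons[residue]))
    dna = "".join(codon for _, _, codon in breakdown)
    return dna, breakdown


dna, breakdown = back_translate_most_frequent("MKLAG", EXAMPLE_PREFERRED_CODONS)
print(dna)
for position, residue, codon in breakdown:
    print(position, residue, codon)
```

Passing the table as an argument keeps the organism choice separate from the lookup logic, so a different organism only needs a different dictionary.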

Frequently asked questions

What is reverse translation (back-translation)?
Reverse translation converts a protein sequence back into a DNA coding sequence. Because several codons can encode the same amino acid, the result is not unique — this tool resolves the ambiguity either by choosing the most-frequent codon in your organism or by emitting one degenerate IUPAC codon that covers all synonymous codons.
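
To get a feel for how non-unique the result is, the following sketch counts the distinct DNA sequences that encode a given protein under the standard genetic code, by multiplying the number of synonymous codons per residue. The helper name `count_encodings` is hypothetical, and the standard code is an assumption (the page does not say which translation table the tool uses).

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per residue, e.g. 6 for Leu, 1 for Met.
SYNONYM_COUNTS = Counter(CODON_TABLE.values())


def count_encodings(protein):
    """Number of distinct coding sequences for a one-letter protein string."""
    return prod(SYNONYM_COUNTS[residue] for residue in protein.upper())


print(count_encodings("MLS"))    # 1 * 6 * 6 = 36
print(count_encodings("MKLAG"))  # 1 * 2 * 6 * 4 * 4 = 192
```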
What is the difference between the most-frequent and degenerate modes?
Most-frequent picks the single codon used most often for each residue in the selected organism (E. coli, human or yeast), giving a concrete sequence to synthesise. Degenerate mode instead outputs one IUPAC ambiguity codon per residue (for example, Leu becomes YTN and Ser becomes WSN) that matches every synonymous codon, which is useful for designing degenerate primers or probes. For six-codon residues the degenerate codon also matches a few non-synonymous codons: YTN, for instance, covers the Phe codons TTT and TTC as well as the six Leu codons.
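
When designing degenerate oligos it can help to check exactly which concrete codons a degenerate codon covers. The sketch below expands an IUPAC codon and tallies the residues it encodes under the standard genetic code. The names `expand_codon` and `encoded_residues` are hypothetical helpers, not part of the tool.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide symbols and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_codon(degenerate_codon):
    """List every concrete codon matched by a three-symbol IUPAC codon."""
    choices = (IUPAC_BASES[symbol] for symbol in degenerate_codon.upper())
    return ["".join(bases) for bases in product(*choices)]


def encoded_residues(degenerate_codon):
    """Count how many matched codons encode each residue ('*' is stop)."""
    return Counter(CODON_TABLE[codon] for codon in expand_codon(degenerate_codon))


print(encoded_residues("YTN"))  # 8 codons: 6 Leu and 2 Phe
print(encoded_residues("WSN"))  # 16 codons, 6 of them Ser
```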
How are the degenerate IUPAC codons built?
For each of the three codon positions, the tool takes the set of bases that appear across all synonymous codons for that amino acid and encodes it as a single IUPAC symbol (e.g. A+G becomes R, A+C+G+T becomes N). The three symbols form the degenerate codon.
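
The per-position construction described above can be sketched as follows, assuming the standard genetic code. The function names `degenerate_codon` and `back_translate_degenerate` are hypothetical and only illustrate the idea; the tool's own implementation may differ.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Set of bases -> single IUPAC symbol.
IUPAC_SYMBOL = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_codon(residue):
    """Collapse all synonymous codons for one residue into one IUPAC codon."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == residue]
    if not codons:
        raise ValueError(f"Unknown residue: {residue!r}")
    # For each codon position, collect the bases seen and look up the symbol.
    return "".join(
        IUPAC_SYMBOL[frozenset(codon[i] for codon in codons)] for i in range(3)
    )


def back_translate_degenerate(protein):
    """Back-translate a one-letter protein string to degenerate DNA."""
    return "".join(degenerate_codon(residue) for residue in protein.upper())


print(degenerate_codon("L"))             # YTN
print(degenerate_codon("S"))             # WSN
print(back_translate_degenerate("MLS"))  # ATGYTNWSN
```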
Should I use this to order a synthetic gene?
Treat it as a starting point. Real gene design also considers GC-content windows, secondary structure, repeats and restriction sites — use the Codon Optimizer for constraint-aware optimisation before synthesis. Codon-usage tables here are reference approximations.
