How to Analyze an Unknown DNA Sequence: A Step-by-Step Workflow
6 min read · Updated June 10, 2026
You've been handed a stretch of DNA — a synthesised fragment, a clone, an amplicon, a sequence from a paper — and you need to know what it is and what you can do with it. Rather than guessing, there's a quick, repeatable workflow that tells you most of what matters in a few minutes. This guide walks through it step by step.
Step 1 — Composition and a GC sanity check
Start with the basics: how long is it, and what is its GC content? Length tells you whether you're looking at an oligo, a gene-sized fragment or a whole construct. GC content is the first property to affect downstream work: very high or very low GC shifts primer melting temperatures, changes the PCR conditions you need and makes the region sequence less evenly. A balanced GC (roughly 40–60%) is the easy case; values outside that range are worth noting before you design anything.
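As a minimal sketch of this first check, the Python below reports length and GC percentage and flags values outside the 40–60% band mentioned above. The function names and the report layout are illustrative assumptions, not part of any particular tool.

```python
def gc_content(seq: str) -> float:
    """Return GC percentage, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


def composition_report(seq: str, low: float = 40.0, high: float = 60.0) -> dict:
    """Summarise length and GC%, flagging GC outside the [low, high] band.

    The 40-60% defaults follow the rough 'balanced' range in the text;
    adjust them to suit your own assay.
    """
    gc = gc_content(seq)
    if gc < low:
        flag = "low"
    elif gc > high:
        flag = "high"
    else:
        flag = "balanced"
    return {"length": len(seq), "gc_percent": round(gc, 1), "gc_flag": flag}


if __name__ == "__main__":
    print(composition_report("ATGCGCGCAT"))
```

Ambiguity codes such as N are excluded from the GC denominator here, which is one reasonable choice among several; the reported length still counts every character.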
Step 2 — Reading frames and ORFs (does it code?)
Next, ask whether the sequence could encode a protein. Scanning all six reading frames (three on each strand) for open reading frames (ORFs) reveals the longest stretch that runs from a start codon to a stop codon without interruption. A long ORF in one frame strongly suggests a coding sequence. The absence of any meaningful ORF points to a non-coding region, a regulatory element, or a fragment that covers only part of a gene; because the scan already includes both strands, orientation alone cannot explain it. Translating the longest ORF and eyeballing the protein is a good confirmation step.
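The six-frame scan can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: it uses the standard genetic code, treats only ATG as a start codon, and applies a hypothetical `min_codons` cutoff to discard short ORFs. Coordinates are reported on the strand that was scanned, so minus-strand positions refer to the reverse complement.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list:
    """Return (strand, frame, start, end, dna) tuples, longest ORF first.

    An ORF here runs from the first ATG after the previous stop to the
    next in-frame stop codon; min_codons excludes the stop itself.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, s[start:i + 3]))
                    start = None
    return sorted(orfs, key=lambda o: len(o[4]), reverse=True)


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    for strand, frame, start, end, dna in find_orfs("CCATGAAATTTGGGTAACC", min_codons=3):
        print(strand, frame, start, end, translate(dna))
```

ORFs that lack an in-frame stop before the end of the sequence are not reported by this sketch, which matters if your fragment covers only part of a gene.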
Step 3 — Restriction sites for cloning
If you might clone or otherwise manipulate the fragment, scan it for restriction enzyme sites. The most valuable result is the list of enzymes that cut exactly once: a single cutter is ideal for linearising a plasmid, and a pair of different single cutters sets up directional cloning, whereas an enzyme that cuts your insert multiple times will fragment it. Knowing which common enzymes don't cut at all is equally useful when you need to leave the insert untouched.
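A rough sketch of that classification is shown below. The six-enzyme panel is a small illustrative subset chosen for this example, not a complete catalogue, and the code assumes a linear sequence. Because every site in the panel is palindromic, scanning one strand is sufficient; positions are 0-based starts of the recognition site rather than exact cut positions.

```python
# Illustrative panel only: recognition sequences of a few common enzymes.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
}


def site_positions(seq: str, site: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def classify_cutters(seq: str, enzymes: dict = ENZYMES) -> dict:
    """Group enzymes into single cutters, multiple cutters and non-cutters."""
    report = {"single": {}, "multiple": {}, "none": []}
    for name, site in enzymes.items():
        hits = site_positions(seq, site)
        if len(hits) == 1:
            report["single"][name] = hits[0]
        elif hits:
            report["multiple"][name] = hits
        else:
            report["none"].append(name)
    return report


if __name__ == "__main__":
    print(classify_cutters("AAGAATTCTTGGATCCAAGGATCCTT"))
```

For a circular plasmid you would also need to check sites that span the origin, for example by appending the first few bases to the end before scanning.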
Step 4 — Primers to amplify it
Finally, if you want to PCR the fragment, the sequence ends give you a first pair of primers. Taking ~20 nt from each end (the reverse primer being the reverse complement of the 3' end) and checking their melting temperatures tells you whether a simple amplification is feasible and whether the two Tm values are close enough to share an annealing step. Treat these as a starting point and refine for dimers and specificity before ordering.
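The end-primer idea can be sketched like this. The melting temperature uses the basic GC-based approximation Tm = 64.9 + 41 × (G + C − 16.4) / N, which is only a rough estimate for primers longer than about 13 nt; nearest-neighbour methods are more accurate. The 5 °C tolerance in `tm_compatible` is an assumed rule of thumb, and all function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def end_primers(seq: str, length: int = 20) -> tuple:
    """Return (forward, reverse) primers taken from the two sequence ends.

    The forward primer is the first `length` bases; the reverse primer is
    the reverse complement of the last `length` bases.
    """
    seq = seq.upper()
    if len(seq) < 2 * length:
        raise ValueError("sequence too short for two non-overlapping primers")
    return seq[:length], reverse_complement(seq[-length:])


def tm_basic(primer: str) -> float:
    """Rough Tm in degrees C from GC count and length (basic approximation)."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def tm_compatible(fwd: str, rev: str, max_delta: float = 5.0) -> bool:
    """True if the two Tm estimates differ by at most max_delta (assumed cutoff)."""
    return abs(tm_basic(fwd) - tm_basic(rev)) <= max_delta


if __name__ == "__main__":
    fwd, rev = end_primers("ATGC" * 15)
    print(fwd, round(tm_basic(fwd), 2))
    print(rev, round(tm_basic(rev), 2))
    print("compatible:", tm_compatible(fwd, rev))
```

This sketch does not check for primer dimers, hairpins or off-target binding, so the refinement step described above still applies.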
Doing it all in one pass
Each of these steps has its own dedicated tool, but running them one by one for every new sequence is tedious. A one-click sequence analyzer composes the whole workflow — composition, ORFs, single-cutter enzymes and end-primer Tm — into a single report you can copy or download, so characterising a new sequence becomes a single paste rather than four.
Frequently asked questions
- What's the first thing to check on an unknown sequence?
  Length and GC content. They immediately tell you the scale of the sequence and flag any GC extremes that will affect primer design and PCR before you commit to anything downstream.
- How do I tell if a sequence is coding?
  Scan all six reading frames for open reading frames. A long ORF running from a start codon to a stop in one frame strongly suggests a coding sequence; translating it and checking the protein confirms it.
- Why do single-cutter enzymes matter?
  An enzyme that cuts your sequence exactly once is ideal for linearising or for directional cloning, while an enzyme that cuts multiple times would fragment the insert. The single cutters are usually the ones you build a cloning strategy around.