Controlled Experiments: Design, Variables, and Best Practices

A controlled experiment is the backbone of empirical science — the method that separates a testable claim from a guess. This page covers how controlled experiments are structured, what makes them valid, where they succeed and where they strain, and how researchers decide whether a controlled design is the right tool for a given question. The principles here apply across biology, psychology, agriculture, medicine, and any discipline where cause-and-effect claims need defending.

Definition and scope

When a pharmaceutical company tests a new drug against a placebo, or when an agronomist compares two fertilizers on side-by-side plots of identical soil, both are running controlled experiments. The defining feature is deliberate manipulation of one factor while holding everything else as constant as possible — the experimental group receives the treatment, the control group does not, and the difference in outcomes is attributed to that single change.

The scientific method as a whole depends on this logic, but the controlled experiment is its sharpest expression. As the National Institutes of Health describes in its research methodology guidance, the control condition establishes a baseline — the world as it would be without the intervention — against which the experimental condition is compared (NIH Office of Research on Women's Health, Research Methodology).

Three categories of variables govern every controlled experiment:

  1. Independent variable — the factor the researcher deliberately changes (e.g., drug dosage, light exposure, temperature)
  2. Dependent variable — the outcome being measured (e.g., tumor size, plant height, reaction time)
  3. Controlled variables (also called constants) — everything else held fixed to prevent confounding (e.g., age of participants, time of day, ambient humidity)

Missing or inadequately managed controlled variables are the most common source of invalid results. A study measuring caffeine's effect on memory that fails to control for sleep quality the night before has a confound large enough to swamp the signal.
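The three categories can be made concrete with a small sketch. The study below is the hypothetical caffeine-and-memory example from the text; the `ExperimentPlan` class and all field names are invented here for illustration, not a standard research API:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    """Minimal record of the three variable categories for one study."""
    independent: str        # the factor deliberately manipulated
    dependent: str          # the outcome measured
    controlled: list = field(default_factory=list)  # everything held fixed

# Hypothetical caffeine-and-memory study, with sleep quality
# explicitly listed so it cannot be silently omitted as a confound.
caffeine_study = ExperimentPlan(
    independent="caffeine dose (0 mg vs 200 mg)",
    dependent="word-recall score",
    controlled=[
        "sleep quality the night before",
        "time of day of testing",
        "participant age range",
    ],
)

assert "sleep quality the night before" in caffeine_study.controlled
```

Writing the plan down this explicitly is the point: a confound like sleep quality is far easier to catch when the controlled-variable list is an artifact rather than an assumption.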

How it works

The experiment begins with a falsifiable hypothesis — a specific, testable prediction about what will happen when the independent variable changes. The population or material is then divided into groups, ideally through random assignment, which distributes unknown confounders roughly equally across conditions. Random assignment is not the same as random sampling: the former concerns how subjects are allocated to groups; the latter concerns how they were recruited in the first place. Both matter, and conflating them is a recognizable error in popular science reporting.
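The sampling/assignment distinction can be shown in a few lines. A minimal sketch using only the standard library (the function names `random_sample` and `random_assignment` are illustrative, not a standard API):

```python
import random

def random_sample(population, n, seed=0):
    """Random SAMPLING: who gets recruited from the population."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def random_assignment(subjects, seed=0):
    """Random ASSIGNMENT: how recruited subjects are split into groups.

    Shuffling before splitting spreads unknown confounders roughly
    equally across the two conditions.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

population = [f"person_{i}" for i in range(1000)]
recruited = random_sample(population, 40)           # recruitment step
treatment, control = random_assignment(recruited)   # allocation step

assert len(treatment) == len(control) == 20
assert set(treatment).isdisjoint(control)
```

A study can do either step well and the other badly: convenience recruitment followed by careful random assignment still yields valid group comparisons, but only for the kind of people recruited.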

Blinding adds another layer of control. In a single-blind design, participants do not know which group they are in. In a double-blind design — the gold standard in clinical trials — neither participants nor the experimenters measuring outcomes know the group assignments until after data collection ends. The U.S. Food and Drug Administration requires double-blind, placebo-controlled trials as the evidentiary standard for new drug approvals (FDA, Guidance for Industry: Adequate and Well-Controlled Studies).
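One common way double-blinding is operationalized is to give every participant a neutral code and hold the code-to-group key with a third party until data collection ends. A hypothetical sketch (the function name and code format are invented for illustration):

```python
import random

def blinded_allocation(subjects, seed=0):
    """Assign each subject a neutral code such as 'S003'.

    The returned key maps code -> (group, subject). In practice the
    key is held by a third party until data lock, so neither
    participants nor outcome raters see group identity.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    key = {}
    for i, subject in enumerate(shuffled):
        group = "treatment" if i < half else "control"
        key[f"S{i:03d}"] = (group, subject)
    return key

key = blinded_allocation([f"participant_{i}" for i in range(10)])
groups = [group for group, _ in key.values()]
assert groups.count("treatment") == groups.count("control") == 5
```

During the trial, outcome raters work only with the codes; unblinding happens once, after the last measurement is recorded.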

Sample size is calculated before data collection begins, not after. A study with too few participants lacks statistical power — the ability to detect a real effect when one exists. The conventional threshold is 80% power at a significance level of α = 0.05, meaning a 5% tolerance for false positives. These thresholds are conventions, not physical laws, and they are debated actively in the scientific community.
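Those conventions translate into a simple formula for the per-group sample size of a two-sided, two-sample comparison of means. A sketch using only the standard library and the normal approximation (real power software such as G*Power also applies a t-distribution correction, omitted here):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d).
    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for alpha = 0.05, two-sided
    z_beta = z(power)            # ~0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Conventional 80% power at alpha = 0.05, medium effect (d = 0.5):
n = sample_size_per_group(0.5)   # 63 per group under this approximation
```

Note how the required n scales with the inverse square of the effect size: halving the expected effect quadruples the participants needed, which is why underpowered studies of small effects are so common.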

Common scenarios

Controlled experiments appear across a striking range of settings:

- Clinical trials comparing a new drug against a placebo
- Agricultural field tests comparing fertilizers on matched plots of identical soil
- Psychology laboratory studies of memory, reaction time, or stress responses

The broader landscape of scientific inquiry includes observational studies, surveys, and modeling, but the controlled experiment holds a privileged position: when extraneous variables are genuinely held constant, it is the only design that can establish causation rather than mere correlation.

Decision boundaries

A controlled experiment is not always the right method — and using it when it is not appropriate produces results that are either useless or actively misleading.

When controlled experiments are appropriate:
- The independent variable can be ethically and practically manipulated
- The system being studied is small enough and stable enough to hold extraneous variables constant
- The outcome can be measured reliably within a reasonable timeframe

When they are inappropriate or insufficient:
- Ethical constraints prevent random assignment (researchers cannot randomly assign people to smoke for 20 years)
- The phenomenon of interest is historical, ecological, or social at a scale that resists manipulation
- Tight laboratory control creates an artificial environment that doesn't generalize to real-world conditions — a problem called low external validity

The contrast between internal validity (can the observed effect be attributed to the manipulation rather than to a confound?) and external validity (do the results hold outside the lab?) is the central tension in experimental design. A perfectly controlled laboratory study on stress responses in 20 undergraduate students may have high internal validity and limited external validity. An observational study of 50,000 people may show the reverse profile.

Researchers navigating these tradeoffs are guided by resources like the CONSORT Statement (Consolidated Standards of Reporting Trials), a 25-item checklist developed collaboratively by methodologists and journal editors to standardize how randomized trials are reported and evaluated.

References