AP Statistics Dictionary of Terms
GLOSSARY
Alternative hypothesis—the theory that the researcher hopes to confirm by rejecting the null hypothesis
Association—when some of the variability in one variable can be accounted for by the other
Bar graph—graph in which the frequencies of categories are displayed with bars; analogous to a histogram for numerical data
Bimodal—distribution with two (or more) most common values; see mode
Binomial distribution—probability distribution for a random variable X in a binomial setting; $P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$, where n is the number of independent trials, p is the probability of success on each trial, and x is the count of successes out of the n trials
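As a quick check of the binomial formula P(X = x) = C(n, x) p^x (1 − p)^(n − x), here is a minimal Python sketch using the standard-library `math.comb`. The function name `binomial_pmf` is illustrative, not taken from any particular package.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # impossible count of successes
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```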
Binomial setting (experiment)—when each of a fixed number, n, of observations either succeeds or fails, independently, with the same probability p of success
Bivariate data—having to do with two variables
Block—a grouping of experimental units thought to be related to the response to the treatment
Block design—procedure by which experimental units are put into homogeneous groups in an attempt to control for the effects of the group on the response
Blocking—see block design
Boxplot (box and whisker plot)—graphical representation of the five-number summary of a dataset. Each value in the five-number summary is located over its corresponding value on a number line. A box is drawn from Q1 to Q3, and “whiskers” extend from Q1 down to the minimum value and from Q3 up to the maximum value.
Categorical data—see qualitative data
Census—attempt to contact every member of a population
Center—the “middle” of a distribution; either the mean or the median
Central limit theorem—theorem that states that the sampling distribution of a sample mean becomes approximately normal when the sample size is large
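The central limit theorem can be seen in a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (strongly right-skewed), each sample has size n = 30, and 2000 samples are drawn; these choices and the helper name `sample_means` are illustrative. The mean of the sample means should land near 1 and their standard deviation near 1/√30, about 0.18.

```python
import random
import statistics

def sample_means(reps: int, n: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return each sample's mean."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = sample_means(reps=2000, n=30)
print(statistics.fmean(means))  # close to the population mean, 1
print(statistics.stdev(means))  # close to 1 / sqrt(30)
```

A histogram of `means` would look roughly mound-shaped even though the population itself is heavily skewed.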
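Chi-square (χ²) goodness-of-fit test—compares a set of observed categorical values to a set of expected values under a set of hypothesized proportions for the categories; $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O and E are the observed and expected counts in each category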
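The goodness-of-fit statistic sums (observed − expected)² / expected over the categories. A minimal sketch follows; the die-roll counts (60 rolls, 10 expected per face under equal proportions) are hypothetical.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Sum of (O - E)**2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 6, 14]  # hypothetical counts for faces 1-6
expected = [10] * 6               # 60 rolls * (1/6) per face
print(chi_square_statistic(observed, expected))  # about 4.2, with 6 - 1 = 5 degrees of freedom
```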
Cluster sample—the population is first divided into sections or “clusters.” Then we randomly select an entire cluster, or clusters, and include all of the members of the cluster(s) in the sample.
Coefficient of determination (r²)—measures the proportion of variation in the response variable explained by regression on the explanatory variable
Complement of an event—set of all outcomes in the sample space that are not in the event
Completely randomized design—when all subjects (or experimental units) are randomly assigned to treatments in an experiment
Conditional probability—the probability that one event occurs given that some other event has already occurred
Confidence interval—an interval that, with a given level of confidence, is likely to contain a population value; (estimate) ± (margin of error)
Confidence level—the probability that the procedure used to construct an interval will generate an interval that does contain the population value
Confounding variable—a variable that has an effect on the outcomes of the study but whose effects cannot be separated from those of the treatment variable
Contingency table—see two-way table
Continuous data—data that can be measured, or take on values in an interval; the set of possible values cannot be counted
Continuous random variable—a random variable whose values are continuous data; takes all values in an interval
Control —see statistical control
Convenience sample—sample chosen without any random mechanism; chooses individuals based on ease of selection
Correlation coefficient (r)—measures the strength of the linear relationship between two quantitative variables; $r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
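One common formula for r averages the products of the paired z-scores, dividing by n − 1. The sketch below follows that formula with sample standard deviations; the data are hypothetical, and squaring r gives the coefficient of determination.

```python
import statistics

def correlation(x: list[float], y: list[float]) -> float:
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
r = correlation(x, y)
print(r, r ** 2)  # about 0.990 and 0.98 (r and the coefficient of determination)
```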
Correlation is not causation—just because two variables correlate strongly does not mean that one caused the other
Critical value—values in a distribution that identify certain specified areas of the distribution
Degrees of freedom—number of independent data points in a distribution
Density function—a function that is everywhere non-negative and has a total area equal to 1 underneath it and above the horizontal axis
Descriptive statistics—process of examining data analytically and graphically
Dimension—size of a two-way table; r × c (r rows by c columns)
Discrete data—data that can be counted (possibly infinite) or placed in order
Discrete random variable—random variable whose values are discrete data
Dotplot—graph in which data values are identified as dots placed above their corresponding values on a number line
Double blind—experimental design in which neither the subjects nor the study administrators know which treatment a subject has received
Empirical Rule (68-95-99.7 Rule)—states that, in a normal distribution, about 68% of the values are within one standard deviation of the mean, about 95% are within two standard deviations, and about 99.7% are within three standard deviations
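The rule's percentages are rounded versions of exact normal probabilities. Assuming Python 3.8 or later, `statistics.NormalDist` can compute them directly; `proportion_within` is an illustrative helper name.

```python
from statistics import NormalDist

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))  # 0.6827, 0.9545, 0.9973
```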
Estimate—sample value used to approximate a value of a parameter
Event—in probability, a subset of a sample space; a set of one or more simple outcomes
Expected value—mean value of a discrete random variable
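For a discrete random variable, the expected value is the probability-weighted sum E(X) = Σ x·P(X = x). A minimal sketch using exact fractions, with a fair die as the example (the helper name is illustrative):

```python
from fractions import Fraction

def expected_value(dist: dict) -> Fraction:
    """E(X) = sum of x * P(X = x) over all outcomes x of a discrete random variable."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2
```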
Experiment—study in which a researcher measures the responses to a treatment variable, or variables, imposed and controlled by the researcher
Experimental units—individuals on which experiments are conducted
Explanatory variable—explains changes in response variable; treatment variable; independent variable
Extrapolation—predictions about the value of a variable based on the value of another variable outside the range of measured values
First quartile—25th percentile
Five-number summary—for a dataset, [minimum value, Q1, median, Q3, maximum value]
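Software and textbooks compute quartiles in slightly different ways. The sketch below assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd. The interquartile range, Q3 − Q1, follows directly.

```python
import statistics

def five_number_summary(data: list[float]) -> tuple:
    """(min, Q1, median, Q3, max), with Q1 and Q3 taken as medians of the
    lower and upper halves (overall median excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 values")
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

summary = five_number_summary([7, 1, 3, 9, 5, 2, 8, 4, 6])
print(summary)                  # (1, 2.5, 5, 7.5, 9)
print(summary[3] - summary[1])  # interquartile range: 5.0
```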
Geometric setting—independent observations, each of which succeeds or fails with the same probability p; the number of trials needed until the first success is the variable of interest
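In a geometric setting the probability that the first success occurs on trial k is (1 − p)^(k − 1) · p, since the first k − 1 trials must fail and trial k must succeed. A small sketch (the function name is illustrative):

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success on trial k) = (1 - p)**(k - 1) * p, for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0  # at least one trial is needed
    return (1 - p) ** (k - 1) * p

# Rolling a fair die until the first six appears:
print(geometric_pmf(1, 1 / 6))  # about 0.1667
print(geometric_pmf(3, 1 / 6))  # (5/6)**2 * (1/6), about 0.1157
```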
Histogram—graph in which the frequencies of numerical data are displayed with bars; analogous to a bar graph for categorical data
Homogeneity of proportions—chi-square hypothesis in which proportions of a categorical variable are tested for homogeneity across two or more populations
Independent events—knowing that one event occurs does not change the probability that the other occurs; P(A) = P(A|B)
Independent variable—see explanatory variable
Inferential statistics—use of sample data to make inferences about populations
Influential observation—observation, usually in the x direction, whose removal would have a marked impact on the slope of the regression line
Interpolation—predictions about the value of a variable based on the value of another variable within the range of measured values
Interquartile range—value of the third quartile minus the value of the first quartile; contains the middle 50% of the data
Least-squares regression line—of all possible lines, the line that minimizes the sum of squared errors (residuals) from the line
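The least-squares slope and intercept have closed forms: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. This sketch computes them for hypothetical data and also lists the residuals (actual minus predicted), which sum to zero for the least-squares line.

```python
import statistics

def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (a, b) for y-hat = a + b*x, minimizing the sum of squared residuals."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (x-bar, y-bar)
    return a, b

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
a, b = least_squares_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # actual minus predicted
print(a, b)       # intercept 0.5, slope 1.4
print(residuals)  # about [0.1, -0.3, 0.3, -0.1]
```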
Line of best fit—see least-squares regression line
Lurking variable—one that has an effect on the outcomes of the study but whose influence was not part of the investigation
Margin of error—measure of uncertainty in the estimate of a parameter; (critical value) · (standard error)
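As one concrete case of (critical value) · (standard error), the sketch below assumes the usual one-proportion z interval, where the standard error is √(p̂(1 − p̂)/n) and the critical value z* comes from the standard normal distribution. The counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """One-proportion z interval: p-hat ± z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error                     # margin of error
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(successes=50, n=100)
print(round(low, 3), round(high, 3))  # 0.402 0.598
```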
Marginal totals—row and column totals in a two-way table
Matched pairs—experimental units paired by a researcher based on some common characteristic or characteristics
Matched pairs design—experimental design that utilizes each pair as a block; one unit receives one treatment, and the other unit receives the other treatment
Mean—sum of all the values in a dataset divided by the number of values
Median—halfway through an ordered dataset, below and above which lies an equal number of data values; 50th percentile
Mode —most common value in a distribution
Mound-shaped (bell-shaped)—distribution in which data values tend to cluster about the center of the distribution; characteristic of a normal distribution
Mutually exclusive events—events that cannot occur simultaneously; if one occurs, the other doesn’t
Negatively associated—larger values of one variable are associated with smaller values of the other; see association
Nonresponse bias—occurs when subjects selected for a sample do not respond
Normal curve—familiar bell-shaped density curve; symmetric about its mean; defined in terms of its mean and standard deviation; $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$
Normal distribution—distribution of a random variable X so that P(a < X < b) is the area under the normal curve between a and b
Null hypothesis—hypothesis being tested, usually a statement that there is no effect or difference between treatments; what a researcher wants to disprove to support his/her alternative
Numerical data—see quantitative data
Observational study—when variables of interest are observed and measured but no treatment is imposed in an attempt to influence the response
Observed values—counts of outcomes in an experiment or study; compared with expected values in a chi-square analysis
One-sided alternative—alternative hypothesis that varies from the null in only one direction
One-sided test—used when an alternative hypothesis states that the true value is less than or greater than the hypothesized value
Outcome—a simple event in a probability experiment
Outlier—a data value that is far removed from the general pattern of the data
P(A and B)—probability that both A and B occur; P(A and B) = P(A) · P(B|A)
P(A or B)—probability that A or B (or both) occurs; P(A or B) = P(A) + P(B) – P(A and B)
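Both probability rules can be verified by counting equally likely outcomes. This sketch assumes a single roll of a fair die with A = "roll is even" and B = "roll is greater than 3" (an illustrative choice) and uses exact fractions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

def prob(event: set) -> Fraction:
    """Equally likely outcomes: P(E) = |E| / |S|."""
    return Fraction(len(event & sample_space), len(sample_space))

p_b_given_a = prob(A & B) / prob(A)                   # conditional probability P(B|A)
assert prob(A & B) == prob(A) * p_b_given_a           # multiplication rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B) # addition rule
print(prob(A & B), prob(A | B))  # 1/3 2/3
```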
P-value—probability of getting a sample value at least as extreme as the one obtained, by chance alone, assuming the null hypothesis is true
Parameter—measure that describes a population
Percentile rank—proportion of values in the distribution that are less than the value being considered
Placebo—an inactive procedure or treatment
Placebo effect—effect, often positive, attributable to the patient’s expectation that the treatment will have an effect
Point estimate—value based on sample data that represents a likely value for a population parameter
Positively associated—larger values of one variable are associated with larger values of the other; see association
Power of the test—probability of rejecting the null hypothesis when a specific alternative is true
Probability distribution—identification of the outcomes of a random variable together with the probabilities associated with those outcomes
Probability histogram—histogram for a probability distribution; horizontal axis shows the outcomes, vertical axis shows the probabilities of those outcomes
Probability of an event—for equally likely outcomes, the ratio of the number of ways an event can succeed to the total number of ways it can succeed or fail
Probability sample—sampling technique that uses a random mechanism to select the members of the sample
Proportion—ratio of the count of a particular outcome to the total number of outcomes
Qualitative data—data whose values range over categories rather than numerical values
Quantitative data—data whose values are numerical
Quartiles—25th, 50th, and 75th percentiles of a dataset
Random phenomenon—unclear how any one trial will turn out, but there is a regular distribution of outcomes in a large number of trials
Random sample—sample in which each member of the sample is chosen by chance and each member of the population has an equal chance to be in the sample
Random variable—numerical outcome of a random phenomenon (random experiment)
Randomization—random assignment of experimental units to treatments
Range—difference between the maximum and minimum values of a dataset
Replication—repetition of each treatment enough times to help control for chance variation
Representative sample—sample that possesses the essential characteristics of the population from which it was taken
Residual—in a regression, the actual value minus the predicted value
Resistant statistic—one whose numerical value is not influenced by extreme values in the dataset
Response bias—bias that stems from respondents’ inaccurate or untruthful response
Response variable—measures the outcome of a study
Robust—when a procedure may still be useful even if the conditions needed to justify it are not completely satisfied
Robust procedure—procedure that still works reasonably well even if the assumptions needed for it are violated; the t-procedures are robust against violations of the normality assumption as long as there are no outliers or severe skewness
Sample space—set of all possible mutually exclusive outcomes of a probability experiment
Sample survey—using a sample from a population to obtain responses to questions from individuals
Sampling distribution of a statistic—distribution of all possible values of a statistic for samples of a given size
Sampling frame—list of the individuals in the population from which the sample is selected
Scatterplot—graphical representation of a set of ordered pairs; horizontal axis is the first element in each pair, vertical axis is the second
Shape—geometric description of a dataset: mound-shaped; symmetric; uniform; skewed; etc.
Significance level (α)—probability value that, when compared to the P-value, determines whether a finding is statistically significant
Simple random sample (SRS)—sample in which all possible samples of the same size are equally likely to be the sample chosen
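Python's `random.Random.sample` draws without replacement so that every subset of the requested size is equally likely, which matches the definition of an SRS. The roster below is a hypothetical sampling frame, and `simple_random_sample` is an illustrative wrapper.

```python
import random

def simple_random_sample(population: list, n: int, seed=None) -> list:
    """Choose n members without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)  # pass a seed to make the draw reproducible
    return rng.sample(population, n)

roster = [f"student_{i}" for i in range(1, 31)]  # hypothetical list of 30 names
print(simple_random_sample(roster, 5, seed=42))
```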
Simulation—random imitation of a probabilistic situation
Skewed—distribution that is asymmetrical
Skewed left (right)—asymmetrical with more of a tail on the left (right) than on the right (left)
Spread—variability of a distribution
Standard deviation—square root of the variance; $s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
Standard error—estimate, based on sample data, of the standard deviation of the sampling distribution of a statistic
Standard normal distribution—normal distribution with a mean of 0 and a standard deviation of 1
Standard normal probability—normal probability calculated from the standard normal distribution
Statistic—measure that describes a sample (e.g., sample mean)
Statistical control—holding constant variables in an experiment that might affect the response but are not one of the treatment variables
Statistically significant—a finding that is unlikely to have occurred by chance
Statistics—science of data
Stemplot (stem-and-leaf plot)—graph in which numerical data are broken into “stems” and “leaves”; visually similar to a histogram except that all the data are retained
Stratified random sample—the population is divided into groups of interest (strata) and a simple random sample is taken within each stratum, often so that the strata appear in approximately the same proportions in the sample as in the population
Subjects—human experimental units
Survey—obtaining responses to questions from individuals
Symmetric—data values distributed equally above and below the center of the distribution
Systematic bias—the mean of the sampling distribution of a statistic does not equal the value of the parameter being estimated; see unbiased estimate
Systematic sample—probability sample in which one of the first n subjects is chosen at random for the sample and then each nth person after that is chosen for the sample
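A systematic sample can be sketched with list slicing: choose a random starting position among the first few members, then step through the list at a fixed interval. Here the glossary's interval n is called `step` to avoid clashing with the sample size; the roster of 100 subjects and the helper name are hypothetical.

```python
import random

def systematic_sample(population: list, step: int, seed=None) -> list:
    """Pick a random start among the first `step` members, then every step-th member after it."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # index 0 .. step - 1
    return population[start::step]

roster = list(range(1, 101))     # hypothetical list of 100 subjects
sample = systematic_sample(roster, step=10, seed=7)
print(sample)                    # 10 subjects, each 10 apart
```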
t-distribution—the distribution, with n – 1 degrees of freedom, of the t statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
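The one-sample t statistic, t = (x̄ − μ₀)/(s/√n) with n − 1 degrees of freedom, can be computed as below. The data and the hypothesized mean μ₀ = 5 are made up for illustration, and `one_sample_t` is an illustrative name.

```python
import math
import statistics

def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, df) with t = (x-bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

t, df = one_sample_t([5, 7, 9], mu0=5)
print(t, df)  # t is sqrt(3), about 1.732, with 2 degrees of freedom
```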
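Test statistic—value computed from sample data to test a hypothesis; (estimate – hypothesized value) / (standard error)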
Third quartile—75th percentile
Treatment variable—see explanatory variable
Tree diagram—graphical technique for showing all possible outcomes in a probability experiment
Two-sided alternative—alternative hypothesis that can vary from the null in either direction; values much greater than or much less than the null provide evidence against the null
Two-sided test—a hypothesis test with a two-sided alternative
Two-way table—table that lists the outcomes of two categorical variables; the values of one variable label the rows, and the values of the other variable label the columns; also called a contingency table
Type-I error—the error made when a true null hypothesis is rejected
Type-II error—the error made when a false null hypothesis is not rejected
Unbiased estimate—mean of the sampling distribution of the estimate equals the parameter being estimated
Undercoverage—some groups in a population are not included in a sample from that population
Uniform—distribution in which all data values have the same frequency of occurrence
Univariate data—having to do with a single variable
Variance—average of the squared deviations of a set of observations from their mean; $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$
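A sketch of the sample variance (dividing by n − 1) and of the standard deviation as its square root, checked against the standard library's `statistics.variance` and `statistics.stdev`; the dataset is hypothetical.

```python
import math
import statistics

def sample_variance(data: list[float]) -> float:
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = math.sqrt(var)  # standard deviation = square root of the variance
print(var, sd)       # about 4.571 and 2.138
print(statistics.variance(data), statistics.stdev(data))  # standard-library check
```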
Voluntary response bias—bias inherent when people choose to respond to a survey or poll; bias is typically toward opinions of those who feel most strongly
Voluntary response sample—sample in which participants are free to respond or not to a survey or a poll
Wording bias—creation of response bias attributable to the phrasing of a question
z-score—number of standard deviations a value is above or below the mean; $z = \frac{x - \bar{x}}{s}$
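Standardizing a dataset with z = (x − x̄)/s, using the sample standard deviation, is sketched below on a hypothetical dataset. By construction the resulting z-scores have mean 0 and standard deviation 1.

```python
import statistics

def z_scores(data: list[float]) -> list[float]:
    """z = (x - x-bar) / s for each value, using the sample standard deviation s."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

zs = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(zs)  # negative below the mean, positive above it
```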