Jump to content

Scatter plot

From Wikipedia, the free encyclopedia
(Redirected fromScattergram)
Scatter plot
One of theSeven Basic Tools of Quality
First described byJohn Herschel
PurposeTo identify the type of relationship (if any) between two quantitative variables
Waiting time between eruptions and the duration of the eruption for theOld Faithful GeyserinYellowstone National Park,Wyoming,USA. This chart suggests there are generally two types of eruptions: short-wait-short-duration, and long-wait-long-duration.
A 3D scatter plot allows the visualization of multivariate data. This scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space and they are displayed using glyphs and coloured using another scalar variable.[1]

Ascatter plot,also called ascatterplot,scatter graph,scatter chart,scattergram,orscatter diagram,[2]is a type ofplotormathematical diagramusingCartesian coordinatesto display values for typically twovariablesfor a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on thevertical axis.[3]

The first description of the scatter plot is generally attributed toJohn Herschel(1792–1871).[4][5]

Overview

[edit]

A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it or when both continuous variables are independent. If aparameterexists that is systematically incremented and/or decremented by the other, it is called thecontrol parameterorindependent variableand is customarily plotted along the horizontal axis. The measured ordependent variableis customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis and a scatter plot will illustrate only the degree ofcorrelation(notcausation) between two variables.[citation needed]

A scatter plot can suggest various kinds of correlations between variables with a certainconfidence interval.For example, weight and height would be on they-axis, and height would be on thex-axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the dots' pattern slopes from lower left to upper right, it indicates a positivecorrelationbetween the variables being studied. If the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line ofbest fit(alternatively called 'trendline') can be drawn to study the relationship between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known aslinear regressionand is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful when we wish to see how two comparable data sets agree to show nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such asLOESS.[6]Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.[citation needed]

The scatter diagram is one of theseven basic toolsofquality control.[7]

Scatter charts can be built in the form ofbubble,marker, or/andline charts.[8]

Example

[edit]

For example, to display a link between a person's lung capacity, and how long that person could hold their breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold their breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis, and "time holding breath" to the vertical axis.[citation needed]

A person with a lung capacity of400clwho held their breath for21.7 swould be represented by a single dot on the scatter plot at the point (400, 21.7) in theCartesian coordinates.The scatter plot of all the people in the study would enable the researcher to obtain a visual comparison of the two variables in the data set and will help to determine what kind of relationship there might be between the two variables.[citation needed]

Scatter plot matrices

[edit]

For a set of data variables (dimensions)X1,X2,...,Xk,the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view with multiple scatterplots in a matrix format. Forkvariables, the scatterplot matrix will containkrows andkcolumns. A plot located on the intersection of row andjth column is a plot of variablesXiversusXj.[9]This means that each row and column is one dimension, and each cell plots a scatter plot of two dimensions.[citation needed]

Ageneralized scatter plot matrix[10]offers a range of displays of paired combinations of categorical and quantitative variables. Amosaic plot,fluctuation diagram,or facetedbar chartmay be used to display two categorical variables. Other plots are used for one categorical and one quantitative variables.

Visualization of 3D data along with the correspondent scatterplot matrix

See also

[edit]

References

[edit]
  1. ^Visualizations that have been created with VisItat wci.llnl.gov. Last updated: November 8, 2007.
  2. ^Jarrell, Stephen B. (1994).Basic Statistics(Special pre-publication ed.). Dubuque, Iowa: Wm. C. Brown Pub. p. 492.ISBN978-0-697-21595-6.When we search for a relationship between two quantitative variables, a standard graph of the available data pairs (X,Y), called ascatter diagram,frequently helps...
  3. ^Utts, Jessica M.Seeing Through Statistics3rd Edition, Thomson Brooks/Cole, 2005, pp 166-167.ISBN0-534-39402-7
  4. ^Friendly, Michael; Denis, Dan (2005). "The early origins and development of the scatterplot".Journal of the History of the Behavioral Sciences.41(2): 103–130.doi:10.1002/jhbs.20078.PMID15812820.
  5. ^https:// datavis.ca/papers/friendly-scat.pdf
  6. ^Cleveland, William (1993).Visualizing data.Murray Hill, N.J. Summit, N.J: At & T Bell Laboratories Published by Hobart Press.ISBN978-0963488404.
  7. ^Nancy R. Tague (2004)."Seven Basic Quality Tools".The Quality Toolbox.Milwaukee, Wisconsin:American Society for Quality.p. 15.Retrieved2010-02-05.
  8. ^"Scatter Chart – AnyChart JavaScript Chart Documentation".AnyChart. Archived fromthe originalon 1 February 2016.Retrieved3 February2016.
  9. ^Scatter Plot Matrixat itl.nist.gov.
  10. ^Emerson, John W.; Green, Walton A.; Schoerke, Barret; Crowley, Jason (2013). "The Generalized Pairs Plot".Journal of Computational and Graphical Statistics.22(1): 79–91.doi:10.1080/10618600.2012.694762.S2CID28344569.

Further reading

[edit]
  • Cattaneo, Matias D.; Crump, Richard K.; Farrell, Max H.; Feng, Yingjie (2024). "On Binscatter".American Economic Review.114(5): 1488–1514.
[edit]