In a market where it is increasingly important to relate data to obtain business insights, the scatter plot can be an important ally.
By using this technique in data analysis, it is possible to identify correlations, patterns, and trends that can directly affect a company’s performance. Therefore, understanding what a scatter plot is and how to use it is essential for those who seek to maximize their results through data analysis.
In this article, we will explore in more detail what a scatter plot is, how it can be used and interpreted, and much more.
Definition of Scatter Plot
A scatter plot is a type of chart that represents the relationship between two variables.
It is composed of a series of points that are positioned on a Cartesian plane, where each axis represents a variable. Each point on the graph represents a pair of values, one for each variable, and its position on the Cartesian plane indicates the relationship between them.
The type of correlation can be interpreted through the patterns revealed in a scatter plot. The strength of the correlation can be judged by how closely the points follow a common line or curve: the tighter the band they form, the stronger the correlation. Points that lie far away from the general cluster of points are known as “outliers”.
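Lines or curves can be drawn on the graph to aid the analysis. Such a line is usually called a trend line (or line of best fit), and it can be used to make estimates through interpolation, that is, to estimate values that fall between the observed points. A trend line is placed as close as possible to all the points, most commonly by least-squares regression, which minimizes the sum of the squared vertical distances between the points and the line. It shows how the data would look if all the points were condensed into a single line.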
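To make the idea of a trend line concrete, here is a minimal Python sketch of an ordinary least-squares fit that uses only the standard library. The function names `fit_trend_line` and `estimate` and the sample values are hypothetical, chosen for illustration; spreadsheet and BI tools may offer other kinds of trend lines (for example, curves) in addition to this straight-line fit.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(slope, intercept, x):
    """Read a Y value off the trend line for a given X."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical observations, for illustration only
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.0]
    slope, intercept = fit_trend_line(xs, ys)
    print(f"trend line: y = {slope:.2f}x + {intercept:.2f}")
    # x = 2.5 lies inside the observed range, so this is interpolation
    print(f"estimate at x = 2.5: {estimate(slope, intercept, 2.5):.2f}")
```

Estimates for X values outside the observed range (extrapolation) are much less reliable than interpolation. On Python 3.10 and later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.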
We will detail the types of correlation and how they can be interpreted below.
What is Correlation in a Scatter Plot?
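Correlation is a statistical measure that indicates the degree to which two variables are related. The most common measure, Pearson's correlation coefficient (r), ranges from -1 to +1: values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.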
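The following minimal Python sketch computes Pearson's r directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The function name `pearson_r` and the price and sales figures are hypothetical illustration values, not data from a real company.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    # The 1/n factors of the covariance and the standard deviations cancel out
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: product price vs. units sold
    prices = [10, 12, 14, 16, 18]
    units_sold = [200, 180, 165, 140, 120]
    r = pearson_r(prices, units_sold)
    print(f"r = {r:.3f}")  # negative: higher prices go with fewer units sold
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient.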
For example, if you are analyzing a company’s sales performance, you can use a scatter plot to show how the number of units sold relates to the price of the product. This can help identify whether there is a direct (positive) or inverse (negative) relationship between these variables.
A scatter plot can also be used to detect outliers, which are points that lie far from the general pattern of the data. These points can indicate errors or anomalies in the data, or even valuable insights that deserve closer investigation.
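One simple way to flag outliers, assuming the data follow a straight-line pattern, is to fit a trend line and look for points whose residual (vertical distance to the line) is unusually large. The sketch below uses a median-absolute-deviation (MAD) cutoff of 3, a common robust rule of thumb; the function name `find_outliers`, the threshold, and the data are assumptions for illustration. Flagged points still need to be investigated rather than deleted automatically.

```python
import statistics


def find_outliers(xs, ys, cutoff=3.0):
    """Return indices of points that lie unusually far from the least-squares line.

    A residual is flagged when its distance from the median residual exceeds
    `cutoff` robust standard deviations (1.4826 * MAD).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("xs and ys must have the same length, with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    # Vertical distance of each point from the trend line
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = statistics.median(residuals)
    deviations = [abs(r - center) for r in residuals]
    mad = statistics.median(deviations)
    if mad == 0:
        # At least half the residuals equal the median residual: flag any that differ
        return [i for i, d in enumerate(deviations) if d > 1e-9]
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, d in enumerate(deviations) if d > cutoff * scale]


if __name__ == "__main__":
    # Hypothetical data: y is roughly 2x, except one suspicious point at x = 5
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [2, 4, 6, 8, 30, 12, 14, 16]
    for i in find_outliers(xs, ys):
        print(f"possible outlier: x = {xs[i]}, y = {ys[i]}")
```

A median-based cutoff is used instead of the plain standard deviation because a single large outlier inflates the standard deviation of the residuals and can hide itself.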
Types of Correlation
Positive correlation occurs when the values of two variables increase together. In a scatter plot, this is represented by points that approach an ascending diagonal line, indicating that as the value of one variable increases, the value of the other also increases.
For example, a scatter plot showing the relationship between the age and height of children usually exhibits a positive correlation, since during the growth years, older children generally tend to be taller.
Negative correlation occurs when the values of two variables move in opposite directions. In this case, the points on the scatter plot approach a descending diagonal line, indicating that as the value of one variable increases, the value of the other decreases.
For example, a scatter plot showing the relationship between the number of study hours and the failure rate in a course can exhibit a negative correlation, indicating that the more hours students dedicate to studying, the lower the failure rate.
Zero (or null) correlation occurs when there is no clear linear relationship between the two variables. In this case, the points on the scatter plot are scattered randomly and do not cluster around an ascending or descending line. This indicates that the variables do not have a significant linear relationship with each other, although the plot may still reveal a non-linear pattern, such as a curve.
For example, a scatter plot showing the relationship between the number of shoes a person owns and their profession may exhibit weak or no correlation, as these variables are likely unrelated.
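Assuming the data are summarized with Pearson's r, the three types of correlation can be told apart by the sign of the coefficient. In this hypothetical sketch, `correlation_type` treats |r| below 0.1 as zero correlation; that cutoff is an arbitrary illustration value, and all datasets are made up. The last dataset shows why looking at the plot itself still matters: the points follow a U-shaped curve, yet r is zero, because the coefficient only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_type(r, zero_band=0.1):
    """Classify r as 'positive', 'negative' or 'zero'.

    zero_band is an arbitrary illustration cutoff: |r| below it counts as zero.
    """
    if abs(r) < zero_band:
        return "zero"
    return "positive" if r > 0 else "negative"


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    datasets = {
        "rising": [1, 3, 2, 5, 4],
        "falling": [4, 5, 2, 3, 1],
        "scattered": [3, 1, 4, 1, 3],
    }
    for name, ys in datasets.items():
        r = pearson_r(xs, ys)
        print(f"{name}: r = {r:+.2f} -> {correlation_type(r)}")

    # A U-shaped curve: a clear relationship, but not a linear one
    curve_x = [-2, -1, 0, 1, 2]
    curve_y = [x * x for x in curve_x]
    r = pearson_r(curve_x, curve_y)
    print(f"U-shape: r = {r:+.2f} -> {correlation_type(r)}")
```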
Strength of Correlation
The strength of correlation in a scatter plot indicates how closely the two variables represented in the plot are related. Understanding it is important because it can provide valuable information for decision-making.
The strength of correlation can be evaluated visually by how close the points are to the trend line of the plot.
The closer the points are to the trend line, the stronger the correlation; the further away they are, the weaker the correlation. Numerically, this corresponds to the absolute value of the correlation coefficient: the closer |r| is to 1, the stronger the correlation.
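The sketch below compares two hypothetical datasets whose least-squares trend lines both have a slope of about 2, but whose points are spread very differently around that line. The labels in `correlation_strength` use commonly cited rule-of-thumb cutoffs (|r| of 0.7 and 0.4); these are assumptions, and what counts as "strong" varies by field.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Label |r| with rule-of-thumb cutoffs (assumed; conventions vary by field)."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    tight = [12.1, 13.9, 16.0, 18.2, 19.9, 22.1]  # points hug the trend line
    loose = [20, 2, 20, 22, 8, 30]                # same slope, much more spread
    for name, ys in (("tight", tight), ("loose", loose)):
        r = pearson_r(xs, ys)
        print(f"{name}: r = {r:.2f} ({correlation_strength(r)})")
```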
When is a Scatter Plot Used?
A scatter plot is used when you want to visualize the relationship between two quantitative variables. It is particularly useful when investigating a possible cause-and-effect relationship between two variables, or when looking for a pattern or trend between them. Keep in mind that a scatter plot shows association, not causation: a strong correlation alone does not prove that one variable causes the other.
A practical application of a scatter plot in financial analysis is studying the relationship between two financial variables, such as net profit and stock prices. In this case, we can represent net profit on the X-axis and stock prices on the Y-axis. This makes it possible to see whether the two variables are related, for example, whether higher net profits tend to be accompanied by higher stock prices, or whether there is no relationship between them.
The scatter plot is a fundamental tool for exploratory data analysis in different areas such as science, engineering, finance, and marketing, allowing the visualization and evaluation of the relationship between two variables of interest.
Step-by-Step Guide for Creating a Scatter Plot
1- Define the cause and effect
To create a scatter plot, you must first identify the cause and the effect you want to analyze. Then, using quantitative data for both variables, you will check whether the two are related.
In business management, for example, it is common to raise hypotheses about the growth or decline of sales.
Issues such as productivity and the effectiveness of marketing can also be analyzed with a scatter plot.
After defining the cause and effect you want to investigate, collect data for these two variables so that you can plot them on the scatter plot.
You can do this on paper or on a computer. The most important thing is that the numbers are real, observed values, so that the results are reliable and accurate.
2- Assign the variables to the axes
Place the dependent variable, i.e., the effect, on the vertical axis (Y).
The other variable, the independent variable or cause, goes on the horizontal axis (X).
3- Plot the data points
Next, plot the data on the scatter plot: draw one point for each observation, positioned at its value on the X axis and its value on the Y axis.
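Plotting is normally done in a spreadsheet, a BI tool such as Qlik Sense, or a charting library. To show what step 3 does without depending on any of them, this minimal sketch maps each (X, Y) pair to a character grid, with the cause on the horizontal axis and the effect on the vertical axis. The function `ascii_scatter` and the sample figures are hypothetical.

```python
def ascii_scatter(xs, ys, width=40, height=12, marker="*"):
    """Return the rows of a text scatter plot; row 0 is the top (highest Y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of the same length")
    if width < 2 or height < 2:
        raise ValueError("width and height must be at least 2")
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    span_x = (max_x - min_x) or 1  # avoid division by zero for constant data
    span_y = (max_y - min_y) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index; Y grows upward, rows grow downward
        col = round((x - min_x) / span_x * (width - 1))
        row = height - 1 - round((y - min_y) / span_y * (height - 1))
        grid[row][col] = marker  # points falling in the same cell overlap
    lines = ["|" + "".join(cells) for cells in grid]  # Y axis on the left
    lines.append("+" + "-" * width)                   # X axis at the bottom
    return lines


if __name__ == "__main__":
    # Hypothetical figures: investment (X, cause) and sales (Y, effect)
    investment = [5, 8, 10, 12, 15, 18]
    sales = [120, 135, 150, 148, 170, 185]
    for line in ascii_scatter(investment, sales, width=30, height=10):
        print(line)
```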
4- Interpret the correlation
After plotting the data, examine how the points are arranged and identify whether the correlation is positive, negative, or null.
For example, suppose that after investing in a marketing campaign, you want to use a scatter plot to see whether the increase in your company’s sales was due to that investment.
In this example, the X values correspond to the investment and the Y values correspond to the sales results, both expressed in Brazilian reais. To plot them, find the point where each X value meets its corresponding Y value and mark it, repeating this process for every pair of values.
In the end, you will be able to analyze whether the correlation is positive (increasing), negative (decreasing), or null, and whether it is strong (points close to the trend line) or weak (points widely dispersed).
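The marketing example can be sketched end to end as follows. The monthly investment and sales figures (in thousands of Brazilian reais) are hypothetical, as are the helper names and the strength cutoffs. The code only measures association; on its own it cannot show that the campaign caused the increase in sales.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, zero_band=0.1):
    """Return (direction, strength) for r using assumed rule-of-thumb cutoffs."""
    if abs(r) < zero_band:
        direction = "null"
    else:
        direction = "positive" if r > 0 else "negative"
    size = abs(r)
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.4 else "weak"
    return direction, strength


if __name__ == "__main__":
    # Hypothetical monthly figures, in thousands of Brazilian reais (BRL)
    investment = [5, 8, 10, 12, 15, 18]    # X: cause
    sales = [120, 135, 150, 148, 170, 185]  # Y: effect
    r = pearson_r(investment, sales)
    direction, strength = describe(r)
    print(f"r = {r:.2f}: {strength} {direction} correlation")
```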
Interested in scatter plots and in using Qlik Sense to create your charts? Cluster makes your life easier by offering various extensions that improve visualization and the use of the software in your company!
Check out Cluster Design’s solutions right now!