**What is Correlation?**

When two or more items are connected or have a mutual relationship, this is called correlation. In statistics, correlation most often describes the strength and direction of the relationship between two variables, X and Y.

A positive correlation between X and Y indicates that when X increases, Y tends to increase as well. A negative correlation indicates the opposite: when X increases, Y tends to decrease.

The strength and direction of this linear relationship are measured by a statistic known as Pearson's Correlation Coefficient, which ranges from -1 to +1.

## Here is the Formula:

Pearson Correlation Coefficient:

**C** = **Covariance**(X,Y) / (**Standard Dev**(X) * **Standard Dev**(Y))

Covariance

**Cov(X,Y)** = Σ((Xi - Xavg) * (Yi - Yavg)) / (n-1), where Xavg and Yavg are the means of X and Y, and n is the number of data points being measured in a statistical experiment.

And,

**Standard Dev(X)** = Sqrt(Σ(Xi - Xavg)^2 / (n-1))
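As a minimal sketch, the three formulas above can be written in Python as follows. The function names `covariance`, `std_dev`, and `pearson`, and the toy numbers at the bottom, are illustrative assumptions and not part of the original analysis.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def std_dev(xs):
    """Sample standard deviation: square root of the variance with (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    return sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))


def pearson(xs, ys):
    """Pearson correlation coefficient: covariance over product of std devs."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Toy data (made up): Y is exactly 2 * X, so the relationship is perfectly linear.
toy_x = [1, 2, 3, 4]
toy_y = [2, 4, 6, 8]
print(round(pearson(toy_x, toy_y), 3))  # prints 1.0
```

Because the same (n - 1) divisor appears in both the covariance and the standard deviations, it cancels out in the final ratio; what matters is that it is used consistently.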

**What are Reach and Impressions?**

The main difference between Reach and Impressions is that Reach refers to the number of unique people who have seen your campaign, while Impressions refer to the total number of times it has been viewed. One person may see your video twice. In that case, Impressions are two and Reach is one.

Brand Reach takes this concept one step further by counting the unique people who have interacted with your campaign within a given span of time.

Brand Reach has always been a tough nut to crack in marketing analytics, as it is nearly impossible to track exactly how many unique visitors interact with your campaign.
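The distinction can be sketched in a few lines of Python. The view log below is made up for illustration, and `reach_and_impressions` is a hypothetical helper, not a function from any analytics tool.

```python
def reach_and_impressions(view_log):
    """Return (reach, impressions) for a list with one viewer ID per view."""
    impressions = len(view_log)   # every view counts
    reach = len(set(view_log))    # each unique viewer counts once
    return reach, impressions


# Hypothetical log: "alice" saw the campaign three times, "bob" twice, "carol" once.
view_log = ["alice", "bob", "alice", "carol", "bob", "alice"]
reach, impressions = reach_and_impressions(view_log)
print(f"{impressions} impressions, {reach} reach")  # prints: 6 impressions, 3 reach
```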

However, Brand Reach can be estimated. In fact, estimating brand reach helps in two ways:

● It helps determine whether a specific campaign is worth pursuing.

● It helps determine the correlation between engaged reach and lead generation.

**Using the Concept of Correlation to Find Brand Reach Or Potential of an SEO Campaign**

Here’s what we did:

First, we plotted two graphs using four data points taken from one month. Each data point is the average of the daily data for one week.

The first graph was plotted between Google Analytics New Users and Google Search Console Impressions.
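The weekly averaging step could look like the sketch below. The daily series is invented purely to show the mechanics, and `weekly_averages` is a hypothetical helper; the real daily figures behind the data points are not given in this article.

```python
from statistics import mean


def weekly_averages(daily_values, days_per_week=7):
    """Average consecutive full weeks of a daily series; a trailing partial week is dropped."""
    full_weeks = len(daily_values) // days_per_week
    return [
        mean(daily_values[i * days_per_week:(i + 1) * days_per_week])
        for i in range(full_weeks)
    ]


# Made-up daily counts for 28 days (four full weeks).
daily_new_users = list(range(1, 29))
print(weekly_averages(daily_new_users))  # prints [4, 11, 18, 25]
```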

**The Data Points are as follows:**

(117.57, 50288)

(114.14, 48427)

(130.28, 47047)

(95.71, 45198)

To calculate the Pearson correlation coefficient, we first need the covariance and the standard deviations of the two variables.

Let X be the first variable (the first value in each pair, New Users) and Y be the second variable (the second value in each pair, Impressions).

First, we calculate the means of X and Y:

mean(X) = (117.57 + 114.14 + 130.28 + 95.71) / 4 = 114.425

mean(Y) = (50288 + 48427 + 47047 + 45198) / 4 = 47740

Next, we calculate the differences of each value from their respective means:

X1 - mean(X) = 117.57 - 114.425 = 3.145

X2 - mean(X) = 114.14 - 114.425 = -0.285

X3 - mean(X) = 130.28 - 114.425 = 15.855

X4 - mean(X) = 95.71 - 114.425 = -18.715

Y1 - mean(Y) = 50288 - 47740 = 2548

Y2 - mean(Y) = 48427 - 47740 = 687

Y3 - mean(Y) = 47047 - 47740 = -693

Y4 - mean(Y) = 45198 - 47740 = -2542

Then, we calculate the product of these differences:

(X1 - mean(X))(Y1 - mean(Y)) = 3.145 * 2548 = 8013.46

(X2 - mean(X))(Y2 - mean(Y)) = -0.285 * 687 = -195.795

(X3 - mean(X))(Y3 - mean(Y)) = 15.855 * -693 = -10987.515

(X4 - mean(X))(Y4 - mean(Y)) = -18.715 * -2542 = 47573.53

Now, we can calculate the covariance of X and Y by summing these products and dividing by (n-1), where n is the number of data points:

cov(X,Y) = (8013.46 - 195.795 - 10987.515 + 47573.53) / 3 = 44403.68 / 3 = 14801.227

Next, we calculate the standard deviations of X and Y:

std(X) = sqrt(((117.57 - 114.425)^2 + (114.14 - 114.425)^2 + (130.28 - 114.425)^2 + (95.71 - 114.425)^2) / 3) = sqrt(611.6045 / 3) = 14.278

std(Y) = sqrt(((50288 - 47740)^2 + (48427 - 47740)^2 + (47047 - 47740)^2 + (45198 - 47740)^2) / 3) = sqrt(13906286 / 3) = 2153.005

Finally, we can calculate the Pearson correlation coefficient as the covariance of X and Y divided by the product of their standard deviations:

R1 = corr(X,Y) = cov(X,Y) / (std(X) * std(Y)) = 14801.227 / (14.278 * 2153.005) = 0.481

Therefore, the Pearson correlation coefficient for the given data set is 0.481. This indicates a moderate positive linear relationship between X and Y.

The second graph was plotted between Google Analytics New Users and Google Search Console Clicks.

The Data Points are as below:

(117.57, 93.14)

(114.14, 92.28)

(130.28, 101.85)

(95.71, 87)

Now we calculate the Pearson correlation coefficient for this data set, following the same steps. X is again New Users, so mean(X) = 114.425, the X differences, and std(X) = 14.278 carry over from the first calculation. Y is now Clicks.

First, we calculate the mean of Y:

mean(Y) = (93.14 + 92.28 + 101.85 + 87) / 4 = 93.5675

Next, we calculate the differences of each Y value from the mean:

Y1 - mean(Y) = 93.14 - 93.5675 = -0.4275

Y2 - mean(Y) = 92.28 - 93.5675 = -1.2875

Y3 - mean(Y) = 101.85 - 93.5675 = 8.2825

Y4 - mean(Y) = 87 - 93.5675 = -6.5675

Then, we calculate the product of the X and Y differences:

(X1 - mean(X))(Y1 - mean(Y)) = 3.145 * -0.4275 = -1.3444875

(X2 - mean(X))(Y2 - mean(Y)) = -0.285 * -1.2875 = 0.3669375

(X3 - mean(X))(Y3 - mean(Y)) = 15.855 * 8.2825 = 131.3190375

(X4 - mean(X))(Y4 - mean(Y)) = -18.715 * -6.5675 = 122.9107625

Now, we calculate the covariance of X and Y by summing these products and dividing by (n-1):

cov(X,Y) = (-1.3444875 + 0.3669375 + 131.3190375 + 122.9107625) / 3 = 253.25225 / 3 = 84.417

Next, we calculate the standard deviation of Y:

std(Y) = sqrt(((93.14 - 93.5675)^2 + (92.28 - 93.5675)^2 + (101.85 - 93.5675)^2 + (87 - 93.5675)^2) / 3) = sqrt(113.572275 / 3) = 6.153

Finally, we calculate the Pearson correlation coefficient:

R2 = corr(X,Y) = cov(X,Y) / (std(X) * std(Y)) = 84.417 / (14.278 * 6.153) = 0.961

Therefore, the Pearson correlation coefficient for this data set is 0.961, indicating a very strong positive linear relationship between X and Y. This means that as X increases, Y increases almost in proportion, although the relationship is not perfectly linear.

We then take the mean of the absolute values of the two Pearson correlation coefficients:

R = (|R1| + |R2|) / 2 = (0.481 + 0.961) / 2

R = 0.72
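The hand calculation can be cross-checked with a short script. This is only a sketch: the `pearson` helper and the variable names are our own, and the only inputs are the weekly data points listed above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation from sample covariance and sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    std_x = sqrt(sum(a * a for a in dx) / (n - 1))
    std_y = sqrt(sum(b * b for b in dy) / (n - 1))
    return cov / (std_x * std_y)


# Weekly averages from the two data sets in this article.
new_users = [117.57, 114.14, 130.28, 95.71]
impressions = [50288, 48427, 47047, 45198]
clicks = [93.14, 92.28, 101.85, 87]

r1 = pearson(new_users, impressions)      # New Users vs. Impressions
r2 = pearson(new_users, clicks)           # New Users vs. Clicks
combined = (abs(r1) + abs(r2)) / 2        # mean of the absolute values

print(f"R1 = {r1:.3f}, R2 = {r2:.3f}, R = {combined:.2f}")
```

Running it prints the two coefficients and their combined value, which makes it easy to repeat the analysis for another month by swapping in new weekly averages.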

**Conclusion:**

The combined value indicates a strong linear relationship between Google Analytics New Users and the Google Search Console Impressions and Clicks (the reach data).

A value of more than 0.6 indicates a strong linear relationship, which suggests that growth in new users goes hand in hand with growth in impressions and clicks.

Hence, the current Search Engine Optimization campaign is successfully generating brand reach growth and is therefore valuable.