E-news Express

mailfreda
Aug 27, 2025

Updated: Aug 28, 2025

Project on Business Statistics

E-news Express (online news portal) suspects that the monthly subscriptions have decreased due to a poor-quality landing page.

A new landing page was designed. The question is whether this change will ensure an increase in engagement and conversions.

A statistical analysis needs to be performed to assess whether the change in the landing page will increase their subscriber base.

Link to code:

Data_Analysis_Projects/Data_analsis_python_projects/ENews_Express_Learner_Notebook_Full_Code (2).ipynb at Data-cleaning-and-insights · FredaMaree/Data_Analysis_Projects

Objectives

The main objective is to compare how effectively the new landing page gains subscribers relative to the old page.

Solution approach

Determine if the users spend more time on the new landing page than the old one.
Check if the conversion rate for the new page is higher than the old page.
Investigate if the conversion status is dependent on the preferred language.
Examine if the time spent on the new page varies for different language users.

A/B testing will be used to determine whether the new landing page attracts users better than the old landing page, based on a chosen metric.

Explore the dataset and extract insights using Exploratory Data Analysis

Data Overview

#View the first 5 rows of the dataset
df.head()
   user_id      group landing_page  time_spent_on_the_page converted language_preferred
0   546592    control          old                    3.48        no            Spanish
1   546468  treatment          new                    7.13       yes            English
2   546462  treatment          new                    4.40        no            Spanish
3   546567    control          old                    3.02        no             French
4   546459  treatment          new                    4.75       yes            Spanish
	
# Check the shape of dataset
df.shape

# Check datatypes
df.info()

# Statistical summary for numerical values
df.describe()

	user_id	time_spent_on_the_page
count	100.000000	100.000000
mean	546517.000000	5.377800
std	52.295779	2.378166
min	546443.000000	0.190000
25%	546467.750000	3.880000
50%	546492.500000	5.415000
75%	546567.250000	7.022500
max	546592.000000	10.710000

Observations from exploring the dataset

There are 100 rows and 6 columns.
Time spent on the page averages 5.38 minutes.
The control and treatment groups are equally represented, with 50 entries each.
The old and new landing pages are equally represented, with 50 entries each.
54 users converted and 46 did not.
There are no missing values or duplicates in the data.
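
The statement about missing values and duplicates can be checked with two pandas calls. The following is a minimal sketch that uses the first four rows shown above as a toy stand-in for the full `df`; the name `toy` is only for illustration.

```python
import pandas as pd

# Toy stand-in for the project DataFrame (first four rows shown above)
toy = pd.DataFrame({
    "user_id": [546592, 546468, 546462, 546567],
    "landing_page": ["old", "new", "new", "old"],
    "time_spent_on_the_page": [3.48, 7.13, 4.40, 3.02],
    "converted": ["no", "yes", "no", "no"],
})

# Missing values per column; all zeros means nothing is missing
missing_per_column = toy.isnull().sum()

# Number of fully duplicated rows; zero means every row is unique
duplicate_rows = toy.duplicated().sum()

# How many users converted and how many did not
conversion_counts = toy["converted"].value_counts()

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print(conversion_counts)
```

Running the same three calls on the full `df` would confirm the counts listed in the observations.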

Univariate Analysis

Visualise distribution of landing page by group

# Univariate analysis of landing page by group
sns.histplot(data=df, x= 'landing_page', hue= 'group');
plt.title('Distribution of Landing Page by Group');
plt.xlabel('Landing Page');
plt.ylabel('Count');

The control and the treatment are equally represented with 50 entries per group.
The old and new landing pages are equally represented with 50 entries per group.

Bivariate Analysis

Distribution of time spent on landing page by group

#Distribution of time spent on landing page by group
sns.boxplot(data=df, x= 'landing_page', hue= 'group', y= 'time_spent_on_the_page');
plt.title('Distribution of Time Spent on Landing Page by Group');
plt.xlabel('Landing Page');
plt.ylabel('Time Spent on Page');

Observation:

In the control group (old page), time spent ranges from roughly 0.5 to 11 minutes, with an average of about 4.5 minutes. The distribution is slightly right-skewed and has no outliers. Some customers may be spending longer on the page because it is not as user friendly.
In the treatment group (new page), time spent appears to follow an approximately normal distribution. The range is narrower, from about 3 to 9.5 minutes, and the average is about 6 minutes. Outliers are present.
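
The outliers mentioned here are the points a boxplot draws beyond its whiskers. With seaborn's default settings, a point is flagged when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. A rough sketch of that rule, using a hypothetical helper `iqr_outliers` and made-up times rather than the project data:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up times in minutes; 12.0 sits far above the rest
times = pd.Series([5.0, 5.5, 6.0, 6.2, 6.5, 7.0, 12.0])
flagged = iqr_outliers(times)
print(flagged)
```

Applying the helper separately to the control and treatment times would list the exact points the boxplot marks.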

Main questions

1. Do the users spend more time on the new landing page than the existing landing page?

Perform Visual Analysis

sns.barplot(data=df, x= 'landing_page', y= 'time_spent_on_the_page');
plt.title('Time Spent on Landing Page');
plt.xlabel('Landing page');
plt.ylabel('Time Spent on Page');

Observation

The average time spent on the new landing page is about 6.3 minutes.
The average time spent on the old landing page is about 4.5 minutes

Step 1: Define the null and alternate hypotheses

H0: The mean time spent on the new page is equal to the mean time spent on the old page.

H1: The mean time spent on the new page is greater than the mean time spent on the old page.

Step 2: Select Appropriate test

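Two-sample independent t-test, one-tailed, with unequal variances (Welch's t-test), since the population standard deviations are unknown.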

Step 3: Decide the significance level

The level of significance (alpha) is set to 0.05.

Step 4: Collect and prepare data

# Find the sample means and the sample standard deviations for the two samples
new_times = df[df['landing_page'] == 'new']['time_spent_on_the_page']
old_times = df[df['landing_page'] == 'old']['time_spent_on_the_page']
new_mean = round(new_times.mean(), 2)
old_mean = round(old_times.mean(), 2)
new_sample_standard_deviation = round(new_times.std(), 2)
old_sample_standard_deviation = round(old_times.std(), 2)
print(f'The time spent on the new landing page group has a mean of {new_mean}')
print(f'The time spent on the old landing page group has a mean of {old_mean}')
print(f'The time spent on the new landing page group has a standard deviation of {new_sample_standard_deviation}')
print(f'The time spent on the old landing page group has a standard deviation of {old_sample_standard_deviation}')

Step 5: Calculate the p-value

#import the required functions
from scipy.stats import ttest_ind

# find the p-value
new_page_times=df[df['landing_page']=='new']['time_spent_on_the_page']
old_page_times=df[df['landing_page']=='old']['time_spent_on_the_page']
t_statistic,p_value=ttest_ind(new_page_times,old_page_times,equal_var =False, alternative='greater')
print('The t-statistic is '+str(t_statistic))
print('The p-value is '+str(p_value))
The t-statistic is 3.7867702694199856
The p-value is 0.0001392381225166549

Step 6: Compare the p-value with α

# print the conclusion based on p-value
if p_value < 0.05:
    print(f'As the p-value {p_value} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_value} is greater than the level of significance, we fail to reject the null hypothesis.')
As the p-value 0.0001392381225166549 is less than the level of significance, we reject the null hypothesis.

Step 7: Draw an inference

Since the p-value is less than the 5% significance level, we reject the null hypothesis. Hence, we have enough statistical evidence to say that the time spent on the new landing page is more than the time spent on the old landing page.
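
With `equal_var=False`, `ttest_ind` runs Welch's t-test, which does not pool the two sample variances. As a sketch of what that involves, the statistic and one-sided p-value can be reproduced by hand. The samples below are randomly generated stand-ins, not the project data, and the names `new_sample` and `old_sample` are only illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two groups (not the project data)
new_sample = rng.normal(loc=6.2, scale=1.8, size=50)
old_sample = rng.normal(loc=4.5, scale=2.6, size=50)

# Squared standard error of each sample mean (sample variance / n)
se2_new = new_sample.var(ddof=1) / len(new_sample)
se2_old = old_sample.var(ddof=1) / len(old_sample)

# Welch's t statistic: difference in means over the unpooled standard error
t_manual = (new_sample.mean() - old_sample.mean()) / np.sqrt(se2_new + se2_old)

# Welch-Satterthwaite approximation to the degrees of freedom
dof = (se2_new + se2_old) ** 2 / (
    se2_new ** 2 / (len(new_sample) - 1) + se2_old ** 2 / (len(old_sample) - 1)
)

# One-sided p-value for H1: mean(new) > mean(old)
p_manual = stats.t.sf(t_manual, dof)

t_scipy, p_scipy = stats.ttest_ind(
    new_sample, old_sample, equal_var=False, alternative="greater"
)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The manual values and the `ttest_ind` values should agree up to floating-point error.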

2. Is the conversion rate (the proportion of users who visit the landing page and get converted) for the new page greater than the conversion rate for the old page?

Perform Visual Analysis

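# Visual Analysis
import numpy as np

# Build the per-page summary used below. Its definition is not shown in the post;
# this is an assumed reconstruction based on the column names the plots use.
converted_flag = df['converted'].eq('yes')
conversion_summary = (converted_flag.groupby(df['landing_page'])
                      .agg(conversions='sum', total_visitors='count')
                      .reset_index())
conversion_summary['conversion_rate_pct'] = (
    100 * conversion_summary['conversions'] / conversion_summary['total_visitors'])

fig, axes = plt.subplots(1, 2, figsize=(15, 6))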

# Bar plot of conversion rates
axes[0].bar(conversion_summary['landing_page'], conversion_summary['conversion_rate_pct'],
           color=['skyblue', 'lightcoral'], alpha=0.7)
axes[0].set_title('Conversion Rates by Landing Page', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Conversion Rate (%)')
axes[0].set_xlabel('Landing Page')
for i, v in enumerate(conversion_summary['conversion_rate_pct']):
    axes[0].text(i, v + 0.1, f'{v:.2f}%', ha='center', va='bottom', fontweight='bold')

# Side-by-side comparison with actual numbers
x = np.arange(len(conversion_summary))
width = 0.35

axes[1].bar(x - width/2, conversion_summary['conversions'], width,
           label='Conversions', color='lightgreen', alpha=0.7)
axes[1].bar(x + width/2, conversion_summary['total_visitors'] - conversion_summary['conversions'],
           width, label='Non-conversions', color='lightcoral', alpha=0.7)

axes[1].set_title('Conversions vs Non-conversions by Landing Page', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Number of Users')
axes[1].set_xlabel('Landing Page')
axes[1].set_xticks(x)
axes[1].set_xticklabels(conversion_summary['landing_page'])
axes[1].legend()

plt.tight_layout()
plt.show()

Step 1: Define the null and alternate hypotheses

H0: The conversion rate for users of the new page does not differ from the conversion rate for users of the old page.
H1: The conversion rate for users of the new page is greater than the conversion rate for users of the old page.

Step 2: Select Appropriate test

Two-Sample Proportions Z-Test

Step 3: Decide the significance level

The level of significance (alpha) is set to 0.05.

Step 4: Perform the test

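from statsmodels.stats.proportion import proportions_ztest

# Set significance level
alpha = 0.05
# Conversions and visitors per page (not shown in the post; assumed reconstruction)
new_converted = df[df['landing_page'] == 'new']['converted']
old_converted = df[df['landing_page'] == 'old']['converted']
conversions_new, n_new = (new_converted == 'yes').sum(), len(new_converted)
conversions_old, n_old = (old_converted == 'yes').sum(), len(old_converted)
# Perform the test
counts = [conversions_new, conversions_old]  # New page first for H1: p_new > p_old
nobs = [n_new, n_old]

z_stat, p_value = proportions_ztest(counts, nobs, alternative='larger')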

print(f"\nTEST RESULTS:")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.6f}")
print(f"Significance level (α): {alpha}")

Z-statistic: 2.4077
P-value: 0.008026
Significance level (α): 0.05

Step 5. Inference and conclusion

Since p-value (0.008026) < α (0.05), we REJECT the null hypothesis
The new landing page has a significantly higher conversion rate than the old page.
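
To make the test less of a black box, the z-statistic can be computed by hand from the pooled conversion rate. The counts below (33 of 50 conversions on the new page, 21 of 50 on the old page) are assumed for illustration: the post does not state them, but they are consistent with the 54 total conversions and give a z-statistic close to the reported value. Only the standard library is used.

```python
from math import sqrt, erfc

# Assumed counts for illustration (not stated in the post)
conversions_new, n_new = 33, 50
conversions_old, n_old = 21, 50

p_new = conversions_new / n_new
p_old = conversions_old / n_old

# Pooled proportion under H0 (both pages share one conversion rate)
p_pooled = (conversions_new + conversions_old) / (n_new + n_old)

# Standard error of the difference in proportions under H0
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_new + 1 / n_old))

z_stat = (p_new - p_old) / se

# Upper-tail probability of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2))
p_value = 0.5 * erfc(z_stat / sqrt(2))

print(f"z = {z_stat:.4f}, p = {p_value:.6f}")
```

The pooled standard error is used because, under H0, both groups are assumed to share a single conversion rate.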

3. Are conversion status and preferred language independent or related?

Perform Visual Analysis

#Visual analysis of conversion and language
sns.countplot(data=df, x= 'converted', hue= 'language_preferred');
plt.title('Conversion and Language');
plt.xlabel('Conversion');
plt.ylabel('Count');

Step 1: Define the null and alternate hypotheses

H0: Conversion status and preferred language are independent.
H1: Conversion status and preferred language are related.

Step 2: Select Appropriate test

The chi-square test of independence can be used, since both variables are categorical.

Step 3: Decide significance level

The level of significance (alpha) is set to 0.05.

Step 4: Collect and prepare data

#import the appropriate function
from scipy.stats import chi2_contingency

Step 5: Calculate the p-value

#Calculate the p-value
contingency_table = pd.crosstab(df['converted'], df['language_preferred'])
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print('The chi-square statistic is '+str(chi2))
print('The p-value is '+str(p_value))
The chi-square statistic is 3.0930306905370837
The p-value is 0.21298887487543447

Step 6: Compare the p-value with α

# print the conclusion based on p-value
if p_value < 0.05:
    print(f'As the p-value {p_value} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_value} is greater than the level of significance, we fail to reject the null hypothesis.')

Step 7: Draw an inference

Since the p-value is more than the 5% significance level, we fail to reject the null hypothesis. Hence, we do not have enough statistical evidence to say that conversion status and preferred language are related.
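
The chi-square statistic compares each observed cell count with the count expected if the two variables were independent (row total times column total, divided by the grand total). A sketch of that calculation follows. The cell counts are assumed for illustration, since the post does not print the contingency table; they sum to 46 non-conversions and 54 conversions and give a statistic of about 3.09.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Assumed contingency table (rows: converted no / yes;
# columns: English, French, Spanish); not printed in the post
observed = np.array([
    [11, 19, 16],
    [21, 15, 18],
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Expected count in each cell if the two variables were independent
expected_manual = row_totals * col_totals / grand_total

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2_manual, chi2, dof, p_value)
```

`chi2_contingency` applies a continuity correction only when there is one degree of freedom, so for this 2 x 3 table the manual and library statistics should match.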

4. Is the time spent on the new page the same for the different languages?

Perform visual analysis

sns.boxplot(data=df, x='language_preferred', y= 'time_spent_on_the_page', hue='landing_page')
plt.title('Time Spent on Landing Page by Language and Page Version');
plt.xlabel('Language');
plt.ylabel('Time Spent on Page');

Step 1: Define the null and alternative hypotheses

H0: mu Spanish = mu English = mu French, i.e. the mean time spent on the new landing page is the same for all three languages.
H1: At least one language has a different mean time spent on the new landing page.

Step 2: Select an Appropriate test

One-way ANOVA test, since the means of three independent groups are being compared.

Step 3: Decide significance level

The level of significance (alpha) is set to 0.05.

Step 4: Collect and prepare data

# Mean time spent on the new landing page for each preferred language
new_page = df[df['landing_page'] == 'new']
spanish_mean = round(new_page[new_page['language_preferred'] == 'Spanish']['time_spent_on_the_page'].mean(), 2)
english_mean = round(new_page[new_page['language_preferred'] == 'English']['time_spent_on_the_page'].mean(), 2)
french_mean = round(new_page[new_page['language_preferred'] == 'French']['time_spent_on_the_page'].mean(), 2)
print(f'The average time spent on the new landing page for Spanish users (mu Spanish) is {spanish_mean} minutes')
print(f'The average time spent on the new landing page for English users (mu English) is {english_mean} minutes')
print(f'The average time spent on the new landing page for French users (mu French) is {french_mean} minutes')
The average time spent on the new landing page for Spanish users (mu Spanish) is 5.84 minutes
The average time spent on the new landing page for English users (mu English) is 6.66 minutes
The average time spent on the new landing page for French users (mu French) is 6.2 minutes

Levene's test:
H0: All the population variances are equal
H1: At least one variance is different from the rest

# Assumption: homogeneity of variance
from scipy.stats import levene

spanish_time= (df[(df['language_preferred']=='Spanish') & (df['landing_page']=='new')]['time_spent_on_the_page'])
english_time= (df[(df['language_preferred']=='English') & (df['landing_page']=='new')]['time_spent_on_the_page'])
french_time= (df[(df['language_preferred']=='French') & (df['landing_page']=='new')]['time_spent_on_the_page'])
statistic,p_value=levene(spanish_time,english_time,french_time)
print('The levene statistic is '+str(statistic))
print('The p-value is '+str(p_value))
The levene statistic is 0.7736446756800186
The p-value is 0.46711357711340173
The p-value is large, so we fail to reject H0 in Levene's test.
We therefore have no evidence that the population variances differ, and the equal-variance assumption of ANOVA is treated as satisfied.
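
One-way ANOVA also assumes that the times within each group are roughly normally distributed, which is not tested here. One common check is the Shapiro-Wilk test (`scipy.stats.shapiro`), whose null hypothesis is that the sample comes from a normal distribution. A sketch on a randomly generated sample, not the project data:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
# Synthetic stand-in for time spent on the new page (not the project data)
sample_times = rng.normal(loc=6.2, scale=1.8, size=50)

statistic, p_value = shapiro(sample_times)
print(f"Shapiro-Wilk statistic: {statistic:.4f}, p-value: {p_value:.4f}")

# A large p-value means we fail to reject normality; it does not prove it
if p_value < 0.05:
    print("Evidence against normality at the 5% level.")
else:
    print("No evidence against normality at the 5% level.")
```

On the real data, the same call could be applied to the new-page times before running the ANOVA.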

Step 5: Calculate the p-value

# Import the required function
from scipy.stats import f_oneway

# Calculate the p-value
spanish_time= (df[(df['language_preferred']=='Spanish') & (df['landing_page']=='new')]['time_spent_on_the_page'])
english_time= (df[(df['language_preferred']=='English') & (df['landing_page']=='new')]['time_spent_on_the_page'])
french_time= (df[(df['language_preferred']=='French') & (df['landing_page']=='new')]['time_spent_on_the_page'])
f_statistic, p_value = f_oneway(spanish_time, english_time, french_time)
print('The f-statistic is '+str(f_statistic))
print('The p-value is '+str(p_value))

The f-statistic is 0.8543992770006822
The p-value is 0.43204138694325955

Step 6: Compare the p-value with α

# print the conclusion based on p-value
if p_value < 0.05:
    print(f'As the p-value {p_value} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_value} is greater than the level of significance, we fail to reject the null hypothesis.')
As the p-value 0.43204138694325955 is greater than the level of significance, we fail to reject the null hypothesis.

Step 7: Draw an inference

Since the p-value is more than the 5% significance level, we fail to reject the null hypothesis. Hence, we do not have enough statistical evidence to say that at least one of the languages has a different mean time spent on the new landing page.
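
The F-statistic reported by `f_oneway` is the ratio of the variation between group means to the variation within groups. A sketch of that calculation on synthetic data follows; the group sizes and values are made up and are not the project data.

```python
import numpy as np
from scipy.stats import f, f_oneway

rng = np.random.default_rng(1)
# Synthetic stand-ins for the three language groups (not the project data)
groups = [
    rng.normal(5.8, 1.9, size=17),
    rng.normal(6.6, 1.9, size=16),
    rng.normal(6.2, 1.9, size=17),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n = len(groups), len(all_values)

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual = f.sf(f_manual, k - 1, n - k)

f_scipy, p_scipy = f_oneway(*groups)
print(f_manual, f_scipy)
print(p_manual, p_scipy)
```

An F-statistic near 1, as in the result above, means the group means differ by about as much as random variation within the groups would produce.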

Conclusion and Business Recommendations

I conducted an A/B analysis of a new landing page and found that users spent significantly more time on it and showed higher conversion rates, supporting subscription growth. Based on these insights, I recommended further investment in website optimization, faster reporting, and innovative multimedia features to boost engagement. I also advised increasing the sample size for language preference analysis, implementing community-building tools, and continuing A/B testing and user behavior tracking to guide future improvements.