E-news Express
- mailfreda
- Aug 27, 2025
- 10 min read
Updated: Aug 28, 2025
Project on Business Statistics
E-news Express (online news portal) suspects that the monthly subscriptions have decreased due to a poor-quality landing page.
A new landing page was designed. The question is whether this change will ensure an increase in engagement and conversions.
A statistical analysis needs to be performed to assess whether the change in the landing page will increase their subscriber base.

Objectives
The main objective is to compare how effectively the new landing page gains subscribers relative to the old page.
Solution approach
Determine if the users spend more time on the new landing page than the old one.
Check if the conversion rate for the new page is higher than the old page.
Investigate if the conversion status is dependent on the preferred language.
Examine if the time spent on the new page varies for different language users.
A/B testing will be used to determine, based on the chosen metrics, whether the new landing page attracts and converts users better than the old landing page.
Data Overview
#View the first 5 rows of the dataset
df.head()
user_id group landing_page time_spent_on_the_page converted language_preferred
0 546592 control old 3.48 no Spanish
1 546468 treatment new 7.13 yes English
2 546462 treatment new 4.40 no Spanish
3 546567 control old 3.02 no French
4 546459 treatment new 4.75 yes Spanish
# Check the shape of dataset
df.shape
# Check datatypes
df.info()
# Statistical summary for numerical values
df.describe()
| | user_id | time_spent_on_the_page |
| count | 100.000000 | 100.000000 |
| mean | 546517.000000 | 5.377800 |
| std | 52.295779 | 2.378166 |
| min | 546443.000000 | 0.190000 |
| 25% | 546467.750000 | 3.880000 |
| 50% | 546492.500000 | 5.415000 |
| 75% | 546567.250000 | 7.022500 |
| max | 546592.000000 | 10.710000 |
Time spent on the page has an average of 5.38 minutes.
There are 100 rows and 6 columns.
The control and treatment groups are equally represented with 50 entries each.
The old and new landing pages are equally represented with 50 entries each.
54 users converted and 46 did not.
There are no missing values or duplicates in the data.
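The observations above come from a handful of routine pandas checks. The sketch below shows one way to run them; the small `sample` frame is a made-up stand-in for the real dataset, so only the method carries over, not the counts.

```python
import pandas as pd

# Hypothetical miniature stand-in for the real dataset (values are made up)
sample = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "group": ["control", "treatment", "treatment", "control"],
    "landing_page": ["old", "new", "new", "old"],
    "time_spent_on_the_page": [3.48, 7.13, 4.40, 3.02],
    "converted": ["no", "yes", "no", "no"],
    "language_preferred": ["Spanish", "English", "Spanish", "French"],
})

missing_per_column = sample.isnull().sum()             # missing values in each column
duplicate_rows = int(sample.duplicated().sum())        # number of fully duplicated rows
group_counts = sample["group"].value_counts()          # rows per experiment group
converted_counts = sample["converted"].value_counts()  # converted vs not converted

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print(group_counts)
print(converted_counts)
```

Running the same four calls on `df` should reproduce the group sizes and conversion counts listed above.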
Univariate Analysis
Visualise distribution of landing page by group
# Univariate analysis of landing page by group
sns.histplot(data=df, x= 'landing_page', hue= 'group');
plt.title('Distribution of Landing Page by Group');
plt.xlabel('Landing Page');
plt.ylabel('Count');

The control and the treatment are equally represented with 50 entries per group.
The old and new landing pages are equally represented with 50 entries per group.
Bivariate Analysis
Distribution of time spent on landing page by group
#Distribution of time spent on landing page by group
sns.boxplot(data=df, x= 'landing_page', hue= 'group', y= 'time_spent_on_the_page');
plt.title('Distribution of Time Spent on Landing Page by Group');
plt.xlabel('Landing Page');
plt.ylabel('Time Spent on Page');
Observation:
The time spent by the control group ranges from approximately 0.5 to 11 minutes, with an average of about 4.5 minutes. The distribution is slightly right-skewed and has no outliers. One possible explanation for the longer visits is that the old page is less user-friendly.
The time spent by the treatment group appears to be approximately normally distributed. The range is narrower, from about 3 to 9.5 minutes, and the average time spent is about 6 minutes. Outliers are present.
Main questions
Do the users spend more time on the new landing page than the existing landing page?
Perform Visual Analysis
sns.barplot(data=df, x= 'landing_page', y= 'time_spent_on_the_page');
plt.title('Time Spent on Landing Page');
plt.xlabel('Landing page');
plt.ylabel('Time Spent on Page');
Observation
The average time spent on the new landing page is about 6.3 minutes.
The average time spent on the old landing page is about 4.5 minutes
Step 1: Define the null and alternate hypotheses
H0: The mean time spent on the new page is equal to the mean time spent on the old page.
H1: The mean time spent on the new page is greater than the mean time spent on the old page.
Step 2: Select Appropriate test
Two-sample independent t-test with unequal variances (Welch's t-test), one-tailed
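To make the test choice concrete, here is a minimal sketch of what a one-tailed Welch's t-test computes, checked against `scipy.stats.ttest_ind`. The two samples are randomly generated stand-ins, not the project data, and the helper name `welch_t_greater` is my own.

```python
import numpy as np
from scipy import stats

# Hypothetical samples standing in for time spent on each page (not the real data)
rng = np.random.default_rng(0)
new_times = rng.normal(loc=6.0, scale=1.8, size=50)
old_times = rng.normal(loc=4.5, scale=2.5, size=50)

def welch_t_greater(a, b):
    """One-tailed Welch's t-test for H1: mean(a) > mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each sample mean
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))
    p_value = stats.t.sf(t_stat, dof)  # upper-tail probability
    return t_stat, dof, p_value

manual_t, manual_dof, manual_p = welch_t_greater(new_times, old_times)
scipy_t, scipy_p = stats.ttest_ind(new_times, old_times, equal_var=False, alternative='greater')

print(f"manual: t = {manual_t:.4f}, p = {manual_p:.6f}")
print(f"scipy:  t = {scipy_t:.4f}, p = {scipy_p:.6f}")
```

The two lines of output should agree, which shows that `equal_var=False` with `alternative='greater'` is the one-tailed Welch variant.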
Step 3: Decide the significance level
The level of significance (alpha) is set to 0.05.
Step 4: Collect and prepare data
# import the required function
from scipy.stats import t
#Find the sample means and the sample standard deviations for the two samples
new_mean= round(df[df['landing_page']=='new']['time_spent_on_the_page'].mean(), 2)
old_mean= round(df[df['landing_page']=='old']['time_spent_on_the_page'].mean(), 2)
new_sample_standard_deviation= round(df[df['landing_page']=='new']['time_spent_on_the_page'].std(), 2)
old_sample_standard_deviation= round(df[df['landing_page']=='old']['time_spent_on_the_page'].std(), 2)
print('The time spent on the new landing page group has a mean of '+str (new_mean))
print('The time spent on the old landing page group has a mean of '+str (old_mean))
print('The time spent on the new landing page group has a standard deviation of '+ str (new_sample_standard_deviation))
print('The time spent on the old landing page group has a standard deviation of '+ str (old_sample_standard_deviation))
Step 5: Calculate the p-value
#import the required functions
from scipy.stats import ttest_ind
# find the p-value
new_page_times=df[df['landing_page']=='new']['time_spent_on_the_page']
old_page_times=df[df['landing_page']=='old']['time_spent_on_the_page']
t_statistic,p_value=ttest_ind(new_page_times,old_page_times,equal_var =False, alternative='greater')
print('The t-statistic is '+str(t_statistic))
print('The p-value is '+str(p_value))
The t-statistic is 3.7867702694199856
The p-value is 0.0001392381225166549
Step 6: Compare the p-value with α
# print the conclusion based on p-value
if p_value < 0.05:
    print(f'As the p-value {p_value} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_value} is greater than the level of significance, we fail to reject the null hypothesis.')
As the p-value 0.0001392381225166549 is less than the level of significance, we reject the null hypothesis.
Step 7: Draw an inference
Since the p-value is less than the 5% significance level, we reject the null hypothesis. Hence, we have enough statistical evidence to say that the time spent on the new landing page is more than the time spent on the old landing page.
Is the conversion rate (the proportion of users who visit the landing page and get converted) for the new page greater than the conversion rate for the old page?
Perform Visual Analysis
# Visual Analysis
import numpy as np
# Assumption: the post does not show how conversion_summary was built;
# this reconstruction from df is inferred from the columns used below.
conversion_summary = (
    df.assign(is_converted=df['converted'].eq('yes'))
      .groupby('landing_page')['is_converted']
      .agg(conversions='sum', total_visitors='count')
      .reset_index()
)
conversion_summary['conversion_rate_pct'] = (
    100 * conversion_summary['conversions'] / conversion_summary['total_visitors']
)
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
# Bar plot of conversion rates
axes[0].bar(conversion_summary['landing_page'], conversion_summary['conversion_rate_pct'],
            color=['skyblue', 'lightcoral'], alpha=0.7)
axes[0].set_title('Conversion Rates by Landing Page', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Conversion Rate (%)')
axes[0].set_xlabel('Landing Page')
# Label each bar with its conversion rate
for i, v in enumerate(conversion_summary['conversion_rate_pct']):
    axes[0].text(i, v + 0.1, f'{v:.2f}%', ha='center', va='bottom', fontweight='bold')
# Side-by-side comparison with actual numbers
x = np.arange(len(conversion_summary))
width = 0.35
axes[1].bar(x - width/2, conversion_summary['conversions'], width,
            label='Conversions', color='lightgreen', alpha=0.7)
axes[1].bar(x + width/2, conversion_summary['total_visitors'] - conversion_summary['conversions'],
            width, label='Non-conversions', color='lightcoral', alpha=0.7)
axes[1].set_title('Conversions vs Non-conversions by Landing Page', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Number of Users')
axes[1].set_xlabel('Landing Page')
axes[1].set_xticks(x)
axes[1].set_xticklabels(conversion_summary['landing_page'])
axes[1].legend()
plt.tight_layout()
plt.show()
Step 1: Define the null and alternate hypotheses
H0: The conversion rate for users of the new page is equal to the conversion rate for users of the old page.
H1: The conversion rate for users of the new page is greater than the conversion rate for users of the old page.
Step 2: Select Appropriate test
Two-Sample Proportions Z-Test
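For readers who want to see what this test computes, below is a minimal standard-library sketch of the pooled two-proportion z-test with a one-sided alternative. The function name is my own. The per-page counts (33 of 50 for the new page, 21 of 50 for the old page) are an assumption: the post does not list them, and I inferred them because they sum to the 54 conversions noted in the data overview and appear to reproduce the z-statistic reported in this analysis.

```python
import math

def two_proportion_ztest_larger(x1, n1, x2, n2):
    """Pooled two-proportion z-test for H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Assumed counts (not stated in the post): new page first, then old page
z_stat, p_value = two_proportion_ztest_larger(33, 50, 21, 50)
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.6f}")
```

The pooled standard error is used because, under the null hypothesis, both pages share a single conversion rate.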
Step 3: Decide the significance level
The level of significance (alpha) is set to 0.05.
Step 4: Perform the test
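# Set significance level
alpha = 0.05
# Import the required function
from statsmodels.stats.proportion import proportions_ztest
# Assumption: the post does not show how these counts were defined;
# they are derived here from df.
new_page = df[df['landing_page'] == 'new']
old_page = df[df['landing_page'] == 'old']
conversions_new = (new_page['converted'] == 'yes').sum()
conversions_old = (old_page['converted'] == 'yes').sum()
n_new, n_old = len(new_page), len(old_page)
# Perform the test
counts = [conversions_new, conversions_old] # New page first for H1: p_new > p_old
nobs = [n_new, n_old]
z_stat, p_value = proportions_ztest(counts, nobs, alternative='larger')
print("\nTEST RESULTS:")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.6f}")
print(f"Significance level (α): {alpha}")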
Z-statistic: 2.4077
P-value: 0.008026
Significance level (α): 0.05
Step 5: Inference and conclusion
Since the p-value (0.008026) < α (0.05), we reject the null hypothesis.
The new landing page has a significantly higher conversion rate than the old page.
Are the conversion and preferred language independent or related?
Perform Visual Analysis
#Visual analysis of conversion and language
sns.countplot(data=df, x= 'converted', hue= 'language_preferred');
plt.title('Conversion and Language');
plt.xlabel('Conversion');
plt.ylabel('Count');
Step 1: Define the null and alternate hypotheses
H0: Conversion status and preferred language are independent.
H1: Conversion status and preferred language are related.
Step 2: Select Appropriate test
Chi-square test of independence, since both variables are categorical
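As a rough illustration of what this test does, the sketch below computes the expected counts and the chi-square statistic by hand, then compares them with `scipy.stats.chi2_contingency`. The contingency table is hypothetical; it is not the table from this dataset.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical counts: rows = converted (no, yes), columns = three languages
observed = np.array([[12, 18, 16],
                     [20, 14, 20]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
# Expected count per cell if the two variables were independent
expected = row_totals @ col_totals / observed.sum()

chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(chi2_stat, dof)  # upper-tail probability

scipy_stat, scipy_p, scipy_dof, scipy_expected = chi2_contingency(observed)
print(f"manual: chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
print(f"scipy:  chi2 = {scipy_stat:.4f}, dof = {scipy_dof}, p = {scipy_p:.4f}")
```

For a table larger than 2x2, SciPy applies no continuity correction, so the manual and library results should match.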
Step 3: Decide significance level
The level of significance (alpha) is set to 0.05.
Step 4: Collect and prepare data
#import the appropriate function
from scipy.stats import chi2_contingency
Step 5: Calculate the p-value
#Calculate the p-value
contingency_table = pd.crosstab(df['converted'], df['language_preferred'])
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print('The chi-square statistic is '+str(chi2))
print('The p-value is '+str(p_value))
The chi-square statistic is 3.0930306905370837
The p-value is 0.21298887487543447
Step 6: Compare the p-value with α
# print the conclusion based on p-value
if p_value < 0.05:
    print(f'As the p-value {p_value} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_value} is greater than the level of significance, we fail to reject the null hypothesis.')
Step 7: Draw an inference
Since the p-value is greater than the 5% significance level, we fail to reject the null hypothesis. Hence, we do not have enough statistical evidence to say that conversion status and preferred language are related.
Is the time spent on the new page the same for the different languages?
Perform visual analysis
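# Restrict the plot to the new landing page so that it matches the question and the title
sns.boxplot(data=df[df['landing_page'] == 'new'], x='language_preferred', y= 'time_spent_on_the_page');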
plt.title('Time Spent on New Landing Page by Language');
plt.xlabel('Language');
plt.ylabel('Time Spent on Page');
Step 1: Define the null and alternative hypotheses
H0: μ_Spanish = μ_English = μ_French (the mean time spent on the new landing page is the same for all three languages)
H1: At least one language has a different mean time spent on the new landing page.
Step 2: Select an Appropriate test
one-way ANOVA test
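The sketch below shows, under stated assumptions, how the one-way ANOVA F-statistic is built from between-group and within-group variation, and checks it against `scipy.stats.f_oneway`. The three groups are randomly generated stand-ins for the language groups, and `one_way_anova` is a name of my own.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical samples standing in for the three language groups (not the real data)
rng = np.random.default_rng(1)
groups = [rng.normal(5.8, 1.9, 17), rng.normal(6.7, 1.9, 16), rng.normal(6.2, 1.9, 17)]

def one_way_anova(samples):
    """Return the F-statistic and p-value of a one-way ANOVA."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    all_values = np.concatenate(samples)
    grand_mean = all_values.mean()
    k, n = len(samples), all_values.size
    # Variation of the group means around the grand mean
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    p_value = f.sf(f_stat, k - 1, n - k)  # upper-tail probability
    return f_stat, p_value

manual_f, manual_p = one_way_anova(groups)
scipy_f, scipy_p = f_oneway(*groups)
print(f"manual: F = {manual_f:.4f}, p = {manual_p:.4f}")
print(f"scipy:  F = {scipy_f:.4f}, p = {scipy_p:.4f}")
```

A large F means the group means differ by more than the within-group spread would suggest; a small F, as in this analysis, does not.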
Step 3: Decide significance level
The level of significance (alpha) is set to 0.05.
Step 4: Collect and prepare data
#Define the time spent on the new landing page for each of the preferred languages
spanish_time= round(df[(df['language_preferred']=='Spanish') & (df['landing_page']=='new')]['time_spent_on_the_page'].mean(),2)
english_time= round(df[(df['language_preferred']=='English') & (df['landing_page']=='new')]['time_spent_on_the_page'].mean(),2)
french_time= round(df[(df['language_preferred']=='French') & (df['landing_page']=='new')]['time_spent_on_the_page'].mean(),2)
print('The average time spent on the new landing page for Spanish users (mu Spanish) is '+str(spanish_time),'minutes')
print('The average time spent on the new landing page for English users (mu English) is '+str(english_time),'minutes')
print('The average time spent on the new landing page for French users (mu French) is '+str(french_time),'minutes')
The average time spent on the new landing page for Spanish users (mu Spanish) is 5.84 minutes
The average time spent on the new landing page for English users (mu English) is 6.66 minutes
The average time spent on the new landing page for French users (mu French) is 6.2 minutes
Levene's test:
H0: All the population variances are equal
H1: At least one variance is different from the rest
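One way to see what Levene's test measures: with SciPy's default `center='median'`, the statistic is the one-way ANOVA F-statistic computed on each observation's absolute deviation from its group median. The sketch below checks this on randomly generated samples, which are hypothetical and not the project data.

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Hypothetical samples with different spreads (not the real data)
rng = np.random.default_rng(2)
samples = [rng.normal(6.0, spread, 17) for spread in (1.5, 2.0, 2.5)]

# Absolute deviation of each observation from its own group median
abs_deviations = [np.abs(s - np.median(s)) for s in samples]

# ANOVA on the deviations asks whether the typical spread differs between groups
manual_stat, manual_p = f_oneway(*abs_deviations)
scipy_stat, scipy_p = levene(*samples)  # default center='median'

print(f"manual: W = {manual_stat:.4f}, p = {manual_p:.4f}")
print(f"scipy:  W = {scipy_stat:.4f}, p = {scipy_p:.4f}")
```

Because the test works on deviations from the median, it is less sensitive to non-normal data than tests based on the raw variances.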
# Assumption: homogeneity of variance
from scipy.stats import levene
spanish_time= (df[(df['language_preferred']=='Spanish') & (df['landing_page']=='new')]['time_spent_on_the_page'])
english_time= (df[(df['language_preferred']=='English') & (df['landing_page']=='new')]['time_spent_on_the_page'])
french_time= (df[(df['language_preferred']=='French') & (df['landing_page']=='new')]['time_spent_on_the_page'])
statistic,p_value=levene(spanish_time,english_time,french_time)
print('The levene statistic is '+str(statistic))
print('The p-value is '+str(p_value))
The levene statistic is 0.7736446756800186
The p-value is 0.46711357711340173
The p-value is large, so we fail to reject the H0 of Levene's test.
Thus, the assumption of equal population variances is reasonable.
Step 5: Calculate the p-value
# Import the required function
from scipy.stats import f_oneway
# Calculate the p-value
spanish_time= (df[(df['language_preferred']=='Spanish') & (df['landing_page']=='new')]['time_spent_on_the_page'])
english_time= (df[(df['language_preferred']=='English') & (df['landing_page']=='new')]['time_spent_on_the_page'])
french_time= (df[(df['language_preferred']=='French') & (df['landing_page']=='new')]['time_spent_on_the_page'])
f_statistic, p_value = f_oneway(spanish_time, english_time, french_time)
print('The f-statistic is '+str(f_statistic))
print('The p-value is '+str(p_value))
The f-statistic is 0.8543992770006822
The p-value is 0.43204138694325955
Step 6: Compare the p-value with α
# print the conclusion based on p-value
if p_value < 0.05:
    print(f'As the p-value {p_value} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_value} is greater than the level of significance, we fail to reject the null hypothesis.')
As the p-value 0.43204138694325955 is greater than the level of significance, we fail to reject the null hypothesis.
Step 7: Draw an inference
Since the p-value is greater than the 5% significance level, we fail to reject the null hypothesis. Hence, we do not have enough statistical evidence to say that the mean time spent on the new landing page differs for at least one of the languages.
I conducted an A/B analysis of a new landing page and found that users spent significantly more time on it and showed higher conversion rates, supporting subscription growth. Based on these insights, I recommended further investment in website optimization, faster reporting, and innovative multimedia features to boost engagement. I also advised increasing the sample size for language preference analysis, implementing community-building tools, and continuing A/B testing and user behavior tracking to guide future improvements.
