๐Ÿ Python Examples - Comprehensive Code Library
โ† Back to PranavKulkarni.org
Lesson 5 · Data Science

Statistical Analysis with Python

Perform hypothesis testing, regression, and descriptive statistics with Python.

Descriptive Statistics

Summarize the main features of your data using measures of central tendency and dispersion.

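import pandas as pd

# Assumes df is a DataFrame with a numeric "Price" column.
summary = df["Price"].describe()    # count, mean, std, min, quartiles, max
mean_val = df["Price"].mean()
median_val = df["Price"].median()
std_dev = df["Price"].std()         # sample standard deviation (ddof=1)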
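
The difference between mean and median is easiest to see on a small example. The sketch below uses a made-up "Price" column (the values are hypothetical, chosen so that one listing is an outlier) and is only meant to illustrate how the two measures react to skewed data.

```python
import pandas as pd

# Hypothetical listing prices, invented for illustration; the last one is an outlier.
df = pd.DataFrame({"Price": [200_000, 250_000, 275_000, 300_000, 975_000]})

mean_val = df["Price"].mean()      # pulled upward by the 975,000 listing
median_val = df["Price"].median()  # middle value, unaffected by the outlier
std_dev = df["Price"].std()        # sample standard deviation (ddof=1)

print(f"mean={mean_val:.0f}, median={median_val:.0f}, std={std_dev:.0f}")
```

Here the mean is 400,000 while the median is 275,000, which is why the median is often the better summary of a "typical" value when the data is skewed.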

Correlation

Measure the strength and direction of the linear relationship between two variables.

import seaborn as sns

correlation_matrix = df.corr(numeric_only=True)  # Pearson by default; skips non-numeric columns
sns.heatmap(correlation_matrix, annot=True)
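
For a single pair of columns, the coefficient can be read directly without building the whole matrix. The following is a minimal sketch on invented data; the column names "Age", "Salary", and "Department" and all values are assumptions for illustration, and `numeric_only=True` assumes pandas 1.5 or newer.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "Age": [25, 32, 41, 48, 55],
    "Salary": [40_000, 52_000, 61_000, 75_000, 82_000],
    "Department": ["A", "B", "A", "C", "B"],  # non-numeric column
})

# Pearson correlation between two columns: ranges from -1 to 1.
r = df["Age"].corr(df["Salary"])

# Pairwise matrix over the numeric columns only; "Department" is skipped.
corr_matrix = df.corr(numeric_only=True)

print(f"r = {r:.3f}")
print(corr_matrix)
```

A value of r close to 1 indicates a strong positive linear relationship; it says nothing about causation or about non-linear patterns.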

Hypothesis Testing

Use scipy.stats to determine if observed differences are statistically significant.

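from scipy import stats

# T-test for two independent samples; group1 and group2 are array-likes of observations.
# Equal variances are assumed by default; pass equal_var=False for Welch's t-test.
t_stat, p_val = stats.ttest_ind(group1, group2)

if p_val < 0.05:  # conventional 5% significance level
    print("Statistically significant difference")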
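
A self-contained version of the test might look like the sketch below. The two samples are made-up measurements, and the choice of `equal_var=False` (Welch's t-test) is an assumption on my part: it is a common default when the two groups may not share the same variance.

```python
from scipy import stats

# Hypothetical measurements for two independent groups, invented for illustration.
group1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5]
group2 = [6.2, 6.8, 6.1, 7.0, 6.5, 6.9, 6.4]

# equal_var=False requests Welch's t-test, which does not assume equal variances.
t_stat, p_val = stats.ttest_ind(group1, group2, equal_var=False)

alpha = 0.05  # significance level chosen before looking at the data
significant = p_val < alpha

print(f"t = {t_stat:.2f}, p = {p_val:.4f}, significant: {significant}")
```

The sign of the t statistic follows the order of the arguments: it is negative here because the mean of group1 is below the mean of group2.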

Linear Regression

Model the relationship between a dependent variable and one or more independent variables.

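import statsmodels.api as sm

# add_constant adds an intercept column; without it OLS fits a line through the origin.
X = sm.add_constant(df["SquareFootage"])
model = sm.OLS(df["Price"], X).fit()
print(model.summary())  # coefficients, standard errors, p-values, R-squared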
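
To make the R-squared figure in the summary less of a black box, it can be computed by hand for a one-predictor model. The sketch below deliberately uses `np.polyfit` instead of statsmodels, purely to expose the arithmetic; the data is invented, and the variable names are assumptions for illustration. For a simple regression with an intercept, this is the same quantity that OLS reports as R-squared.

```python
import numpy as np

# Hypothetical data, invented for illustration.
square_footage = np.array([850, 1200, 1500, 1800, 2400], dtype=float)
price = np.array([155_000, 210_000, 240_000, 300_000, 385_000], dtype=float)

# Least-squares line: price ~ intercept + slope * square_footage
slope, intercept = np.polyfit(square_footage, price, deg=1)

predicted = intercept + slope * square_footage
ss_res = np.sum((price - predicted) ** 2)      # variation the line leaves unexplained
ss_tot = np.sum((price - price.mean()) ** 2)   # total variation around the mean
r_squared = 1 - ss_res / ss_tot

print(f"slope = {slope:.2f} per square foot, R-squared = {r_squared:.3f}")
```

An R-squared near 1 means the fitted line accounts for most of the variation in price in this sample; it does not by itself show that the model will predict well on new data.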

✅ Practice (20 minutes)

  • Calculate the correlation between "Age" and "Salary" in your dataset.
  • Perform a Chi-square test to check independence between two categorical variables.
  • Run a simple linear regression and interpret the R-squared value.
  • Create a box plot to visualize outliers in your numerical data.
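
The Chi-square exercise uses a function not shown earlier in the lesson, so here is one possible starting point. It is a sketch on invented data: the "Region" and "Plan" columns and their counts are hypothetical, and `scipy.stats.chi2_contingency` is my choice of tool for the test of independence.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical survey data, invented for illustration.
df = pd.DataFrame({
    "Region": ["North"] * 40 + ["South"] * 40,
    "Plan": ["Basic"] * 30 + ["Premium"] * 10 + ["Basic"] * 15 + ["Premium"] * 25,
})

# Contingency table of observed counts: rows are regions, columns are plans.
table = pd.crosstab(df["Region"], df["Plan"])

# Returns the test statistic, p-value, degrees of freedom, and expected counts.
# For a 2x2 table SciPy applies Yates' continuity correction by default.
chi2, p_val, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2 = {chi2:.2f}, p = {p_val:.4f}, dof = {dof}")
```

A small p-value suggests the two categorical variables are not independent in this sample; the `expected` array shows the counts that independence would predict.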