Machine Learning with Scikit-learn

The Machine Learning Workflow

Data Prep: Load and clean data (using Pandas/NumPy).
Feature Selection: Pick inputs (X) and target (y).
Split: Divide data into training and testing sets.
Train: Fit a model to the training data.
Evaluate: Test performance on the testing set.
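
The five steps above can be sketched end to end as follows. This is a minimal illustration, not a prescribed recipe: the built-in wine dataset, the Random Forest model, the 80/20 split, and random_state=42 are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Data prep: load the data and drop any rows containing missing values
#    (the wine dataset has none, so this stands in for real cleaning work)
wine = load_wine()
features, labels = wine.data, wine.target
complete_rows = ~np.isnan(features).any(axis=1)
features, labels = features[complete_rows], labels[complete_rows]

# 2. Feature selection: keep every column as input (X), class label as target (y)
X, y = features, labels

# 3. Split: hold out 20% for testing; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Train: fit the model on the training portion only
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 5. Evaluate: score on data the model has not seen
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing random_state makes the split and the model repeatable between runs, which helps when comparing changes.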

Linear Regression Example

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

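# Split data (assumes X is a feature matrix and y a numeric target prepared earlier)
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)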

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and Evaluate
preds = model.predict(X_test)
print(f"MSE: {mean_squared_error(y_test, preds)}")

Classification with Random Forest

from sklearn.ensemble import RandomForestClassifier

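# Here y must hold discrete class labels; a continuous regression target raises an error
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
# score() returns mean accuracy on the given test data
print(f"Accuracy: {clf.score(X_test, y_test)}")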

Scaling and Preprocessing

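Many algorithms are sensitive to feature scale, especially distance-based and gradient-based ones such as k-nearest neighbors, SVMs, and regularized linear models. Tree-based models such as Random Forest are largely unaffected. Use StandardScaler to standardize each feature to zero mean and unit variance, or MinMaxScaler to rescale each feature to a fixed range (0 to 1 by default).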

from sklearn.preprocessing import StandardScaler

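scaler = StandardScaler()
# Fit on the training set only, then reuse the same statistics on the test set.
# Fitting on all of X would leak test-set information into training.
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)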
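
One way to keep scaling tied to the training data is to wrap the scaler and the model in a Pipeline, so that fit() learns the scaling statistics from the training set and score() or predict() reuses them. The sketch below assumes the built-in breast cancer dataset and a Logistic Regression model purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on X_train only, then trains the model on the scaled data
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

# At scoring time the already-fitted scaler is applied to X_test before predicting
pipe_accuracy = pipe.score(X_test, y_test)
print(f"Accuracy: {pipe_accuracy:.3f}")

# Check: the training features have (approximately) zero mean and unit variance after scaling
X_train_scaled = pipe.named_steps["standardscaler"].transform(X_train)
print(X_train_scaled.mean(axis=0).round(2))
print(X_train_scaled.std(axis=0).round(2))
```

make_pipeline names each step after its lowercased class name, which is why the scaler is available as "standardscaler".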

✅ Practice (45 minutes)

Install scikit-learn: pip install scikit-learn.
Load the built-in "iris" dataset (datasets.load_iris()).
Train a Logistic Regression model to classify iris species.
Use classification_report to see precision, recall, and F1-score.
Tune a hyperparameter (like C for Logistic Regression, or n_estimators if you switch to a Random Forest) and observe the effect.
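
If you get stuck, the exercise could be started along these lines. Treat it as one possible sketch rather than a reference solution: the 80/20 split, random_state=42, max_iter=5000, and the three values of C are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load iris and hold out 20% of the samples for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Train a baseline Logistic Regression classifier
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# Per-class precision, recall, and F1-score on the test set
report = classification_report(
    y_test, clf.predict(X_test), target_names=iris.target_names
)
print(report)

# Vary the inverse regularization strength C and record test accuracy for each value
scores = {}
for C in (0.01, 1.0, 100.0):
    tuned = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
    scores[C] = tuned.score(X_test, y_test)
    print(f"C={C}: accuracy={scores[C]:.3f}")
```

Smaller values of C mean stronger regularization, so comparing the scores shows how much the model depends on it for this dataset.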