Machine Learning with Scikit-learn
Build predictive models using Scikit-learn and explore the world of AI.
The Machine Learning Workflow
- Data Prep: Load and clean data (using Pandas/NumPy).
- Feature Selection: Pick inputs (X) and target (y).
- Split: Divide data into training and testing sets.
- Train: Fit a model to the training data.
- Evaluate: Test performance on the testing set.
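The five steps above can be sketched end to end as follows. This is a minimal illustration, not a prescribed recipe: the built-in diabetes dataset, the NaN check, and the `random_state` value are assumptions chosen only to make the example self-contained and repeatable.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 1. Data prep: load a built-in dataset and drop rows with missing values
features, target = load_diabetes(return_X_y=True)
keep = ~np.isnan(features).any(axis=1)  # all True for this dataset; shown for the cleaning step

# 2. Feature selection: use all 10 columns as inputs (X) and the disease score as target (y)
X, y = features[keep], target[keep]

# 3. Split: hold out 20% of the rows; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Train: fit the model on the training rows only
model = LinearRegression()
model.fit(X_train, y_train)

# 5. Evaluate: measure error on rows the model has not seen
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {test_mse:.1f}")
```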
Linear Regression Example
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Split data: hold out 20% for testing (X and y are assumed to be loaded already)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict and Evaluate
preds = model.predict(X_test)
print(f"MSE: {mean_squared_error(y_test, preds)}")
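MSE is expressed in squared units of the target, so it can be hard to read on its own. The sketch below fits the same kind of model on synthetic data where the true relationship is known, then reports RMSE and R² alongside MSE. The data-generating rule (y = 3x + 2 plus noise) and the seeds are assumptions made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)

mse = mean_squared_error(y_test, preds)
rmse = np.sqrt(mse)           # same units as y, easier to interpret than MSE
r2 = r2_score(y_test, preds)  # 1.0 is a perfect fit; 0.0 matches always predicting the mean

# The learned slope and intercept should land close to the true 3 and 2
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"MSE={mse:.3f}, RMSE={rmse:.3f}, R^2={r2:.3f}")
```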
Classification with Random Forest
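from sklearn.ensemble import RandomForestClassifier
# y_train and y_test must hold class labels here, not continuous values
clf = RandomForestClassifier(n_estimators=100)  # 100 trees in the forest
clf.fit(X_train, y_train)
# score() returns the mean accuracy on the given test data
print(f"Accuracy: {clf.score(X_test, y_test)}")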
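For a version that runs on its own, the sketch below trains the same classifier on the built-in iris dataset and also prints which features the forest relied on. The dataset choice, the stratified split, and the `random_state` values are assumptions added for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# stratify=y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")

# feature_importances_ sums to 1; larger values mean the feature
# contributed more to the splits across the trees
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Random forests are built from decision trees, which split on thresholds, so they generally do not need the feature scaling that distance-based models benefit from.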
Scaling and Preprocessing
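Many algorithms, such as k-nearest neighbors, support vector machines, and regularized linear models, perform better when features are on a comparable scale. Use StandardScaler (zero mean and unit variance per feature) or MinMaxScaler (rescales each feature to a fixed range, [0, 1] by default).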
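from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the training set only, then reuse those statistics on the test set
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)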
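To see what the two scalers actually do, here is a small sketch on made-up numbers. The arrays are hypothetical; the point is that each scaler learns its statistics from the training rows and then applies them unchanged to new rows.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (made-up values for illustration)
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

# StandardScaler learns each feature's mean and standard deviation
std_scaler = StandardScaler().fit(X_train)
X_train_std = std_scaler.transform(X_train)
X_test_std = std_scaler.transform(X_test)

# MinMaxScaler learns each feature's minimum and maximum
mm_scaler = MinMaxScaler().fit(X_train)
X_train_mm = mm_scaler.transform(X_train)
X_test_mm = mm_scaler.transform(X_test)

print(X_train_std.mean(axis=0))  # approximately 0 for each feature
print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))
print(X_test_mm)  # a test value above the training maximum maps above 1
```

Fitting on the training rows only keeps information about the test set from leaking into the model, which is why the test row here can fall outside the [0, 1] range.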
Practice (45 minutes)
- Install scikit-learn: pip install scikit-learn.
- Load the built-in "iris" dataset (datasets.load_iris()).
- Train a Logistic Regression model to classify iris species.
- Use classification_report to see precision, recall, and F1-score.
- Tune a hyperparameter (like n_estimators on a Random Forest) and observe the effect.
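One possible starting point for the exercise is sketched below. It is not the only way to approach it. Since Logistic Regression has no n_estimators setting, the sketch varies `C` (the inverse regularization strength) instead; that choice, along with the `max_iter` and `random_state` values, is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=0
)

# C is the inverse regularization strength: smaller C means stronger regularization
scores = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)
    scores[C] = model.score(X_test, y_test)
    print(f"C={C}: accuracy={scores[C]:.3f}")

# Per-class precision, recall, and F1 for the last model fitted above (C=100.0)
report = classification_report(
    y_test, model.predict(X_test), target_names=iris.target_names
)
print(report)
```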