Section 1 Foundations of ML and Data
You will design and build a web-based appointment booking system using Python (Flask) for the backend, HTML/CSS/JavaScript for the frontend, and integrate a basic AI recommendation feature to enhance the user experience. You will also learn UI design fundamentals to create a user-friendly interface.
This unit consists of two major components:

Individual exercises
- Listed below. 100/80/60 quizzes will be given.
Group project (20% of final mark)
- In groups, you will develop a modern web app using Flask and Python. This project will continue throughout the whole unit. Details will be given soon.
The Theory
Learn the basics of supervised machine learning by:
- Writing a simple linear regression algorithm from scratch (no ML libraries)
- Collecting your own training data
- Evaluating how well your model predicts new values
What Is Linear Regression?
Linear regression finds the "best-fit" straight line through a set of data points. The equation is:
$$ y\ =\ mx\ +\ b $$
Your job is to calculate the best values of m (slope) and b (intercept) based on your training data. But what does "best" mean?

The Problem
You have data points:
$$ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) $$
and you want a line that gets as close as possible to all the points.
The Solution: Minimize the Total Error
The "error" for each point is how far off the prediction is from the actual value:
$$ \text{Error}_i = y_i - (mx_i + b) $$
But some errors are positive and some are negative — they cancel each other out. So instead, we square the errors to make them all positive:
$$ \text{SquaredError}_i = (y_i - (mx_i + b))^2 $$
The total error is the sum of squared errors (SSE):
$$ SSE = \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2 $$
The Least Squares Method
The least squares method means:
Find the values of m and b that minimize this total squared error.
This is a classic problem in calculus. When you solve it, you end up with formulas:
$$ m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$
$$ b = \bar{y} - m\bar{x} $$
These formulas guarantee the best-fit line by minimizing how far off the predictions are — in the squared sense.
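The closed-form formulas above translate directly into a few lines of plain Python. This is a minimal sketch (the function name and sample data are illustrative, not a required part of your submission):

```python
def fit_line(x, y):
    """Compute slope m and intercept b via the least squares formulas."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Numerator: sum of (x_i - x̄)(y_i - ȳ)
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Denominator: sum of (x_i - x̄)^2
    den = sum((xi - x_mean) ** 2 for xi in x)
    m = num / den
    b = y_mean - m * x_mean
    return m, b

# Points that lie exactly on y = 2x should recover m = 2, b = 0
m, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(m, b)  # → 2.0 0.0
```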
What to do:
1. Collect Your Own Data
- Pick a real-world scenario that can be modeled with a straight line. Some ideas:
- Hours studied vs. test score
- Number of push-ups vs. time taken
- Day of the week vs. number of text messages sent
- Temperature vs. number of ice creams sold
- YouTube views vs. likes on your favorite channel
- Collect at least 25 data points (you can use a spreadsheet, paper, or online forms).
- Use 20 of your points to train the model. Keep 5 points as test data.
- Store your data as two lists:
x = [...], y = [...]
2. Implement Linear Regression in Python
Use only basic Python: no libraries like scikit-learn. You're allowed to use:
- the math module (for square roots, if needed)
- statistics.mean() (if you think you need it)
- Calculate the slope (m) and intercept (b) using the least squares formula.
- Create a predict(x) function to estimate y for any x.
3. Test and Evaluate
- Print the calculated values of m and b
- Use your predict() function to predict 5 values and compare them to actual test data. Print a table showing these results.
- Print the Mean Squared Error (MSE) between predicted and actual values
- Generate a plot of your line and your data.
What to Submit:
- Your Python code
- A print out of your program output
- A printed spreadsheet of your collected data
- A print out of your plotted line and data
- A short write-up:
- What data you collected and why
- The m and b values your model found
- How well you think your model worked
Assessment (/45)
- Data Collection (clear, relevant, 25+ points): /10
- Code Readability: /10
- Correct Implementation of m and b: /5
- Prediction Function Works: /5
- MSE Calculated and Output: /5
- Plot of Line and Data: /5
- Write-up (clarity, reflection, correctness): /5
What is Data Preprocessing?
Before feeding data into a machine learning model, we often need to clean, transform, or standardize it so the model can learn effectively. This process is called data preprocessing.
One common task in preprocessing is scaling or normalizing features.
What Does It Mean to "Normalize" or "Scale" a Feature?
Imagine you're building a model to predict a student’s test score based on two features:
- study_hours (ranges from 0 to 10)
- pages_read (ranges from 0 to 500)

These two numbers are on very different scales.
A machine learning model might treat the feature with larger numbers (pages_read) as more important—just because of its scale—not because it actually has more influence on the outcome. That’s a problem.
To fix this, we scale the features so they’re on similar ranges.
Two Common Manual Methods
1. Min-Max Normalization
Rescales values to a range of 0 to 1:
$$ x_{\text{normalized}} = \frac{x - \text{min}(x)}{\text{max}(x) - \text{min}(x)} $$
Example:
If study_hours = 6
, with min = 0 and max = 10:
$$ \frac{6 - 0}{10 - 0} = 0.6 $$
2. Z-Score Standardization
Centers the data around 0 with a standard deviation of 1:
$$ x_{\text{standardized}} = \frac{x - \mu}{\sigma} $$
Where:
- μ (mu) is the mean of the feature
- σ (sigma) is the standard deviation
This method is good when data has outliers or isn’t bounded.
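Z-score standardization can be done by hand just like min-max scaling. A small sketch, using the population standard deviation and made-up pages_read values:

```python
def z_score_standardize(values):
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation: square root of the mean squared deviation
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

pages_read = [100, 200, 300, 400, 500]
print(z_score_standardize(pages_read))
# → [-1.414..., -0.707..., 0.0, 0.707..., 1.414...]
```

Notice the result is centered on 0: values below the mean become negative, values above become positive.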
Why Should We Do This?
- Prevent one feature from dominating just because of its scale.
- Improve model convergence (makes training faster and more stable).
- Essential for models like linear regression, KNN, SVM, and neural networks.
Simple Manual Example (Min-Max)
def min_max_normalize(values):
min_val = min(values)
max_val = max(values)
return [(x - min_val) / (max_val - min_val) for x in values]
study_hours = [1, 2, 4, 6, 10]
normalized_hours = min_max_normalize(study_hours)
print(normalized_hours) # Output: [0.1, 0.2, 0.4, 0.6, 1.0]
Overview
In this assignment, you’ll explore a real-world dataset, perform basic data analysis and transformation, and apply your own linear regression model (from Assignment 1) to make predictions. You’ll also compare your model to a library-based one like sklearn.
Objectives:
- Practice reading and processing CSV or JSON data
- Perform basic exploratory data analysis (EDA)
- Manually scale or normalize data (feature preprocessing).
- Refine and extend your linear regression model (from Assignment 1)
- Visualize relationships between variables.
- Compare manual vs. library-based predictions
Assignment Tasks:
Pick and Import a Dataset
- Choose a small to medium-sized dataset that interests you from Kaggle Datasets. Tip: Look for datasets with a clear numerical target variable you can try to predict (e.g., test scores, prices, completion time).
- Use pandas to load and explore your data:
  - View the first few rows: df.head()
  - Summarize data: df.describe()
  - Check for missing values: df.isnull().sum()
  - Find correlations between columns: df.corr()
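Those four calls together, in a minimal script. The DataFrame here is a stand-in for your real dataset, which you would instead load with pd.read_csv; the column names are placeholders:

```python
import pandas as pd

# In your assignment: df = pd.read_csv("your_dataset.csv")
# A tiny example frame so the EDA calls can be demonstrated:
df = pd.DataFrame({
    "study_time": [1, 2, 3, 4, 5],
    "score": [52, 60, 64, 71, 80],
})

print(df.head())          # first few rows
print(df.describe())      # count, mean, std, min, quartiles, max
print(df.isnull().sum())  # missing values per column
print(df.corr())          # pairwise correlations
```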
Visualize Relationships
- Use matplotlib to generate scatter plots (e.g., study_time vs score), line graphs, and/or histograms.
Feature Engineering
- Watch Feature Engineering
- Create new features by combining existing ones (e.g., combine "study time" and "sleep hours" into a "readiness score").
- Normalize or scale features manually using min-max scaling or z-score standardization.
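One way the "readiness score" idea could be sketched. The weighting and sample values here are made up purely for illustration; pick a combination that makes sense for your dataset:

```python
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

study_time = [1, 2, 4, 6, 8]   # hours studied
sleep_hours = [5, 6, 7, 8, 7]  # hours slept

# Hypothetical "readiness score": weighted sum of the two raw features
readiness = [s + 0.5 * z for s, z in zip(study_time, sleep_hours)]
readiness_scaled = min_max_scale(readiness)
print(readiness_scaled)  # → [0.0, 0.1875, 0.5, 0.8125, 1.0]
```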
Use Your Own Regression Model
- Reuse or revise your custom linear regression code from Assignment 1.
- Select one independent variable (or your engineered feature) to predict a target.
- Split the data into training and test sets (e.g., 80/20 split).
- Make predictions on the test set.
- Calculate Sum of Squared Errors (SSE) for your predictions.
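The split-and-evaluate steps above can be sketched like this, with a placeholder predict() standing in for your Assignment 1 model and randomly generated toy data:

```python
import random

random.seed(0)  # reproducible shuffle for this example

def predict(x, m=2.0, b=1.0):
    # Placeholder: substitute your own fitted model here
    return m * x + b

# Toy dataset of (x, y) pairs roughly following y = 2x + 1
data = [(i, 2 * i + 1 + random.uniform(-0.5, 0.5)) for i in range(25)]

# 80/20 train/test split
random.shuffle(data)
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# Sum of squared errors on the held-out test points
sse = sum((y - predict(x)) ** 2 for x, y in test)
print(f"SSE on {len(test)} test points: {sse:.3f}")
```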
Compare with sklearn.linear_model.LinearRegression
- Use sklearn.linear_model.LinearRegression to fit and predict.
- Compare your predictions to sklearn's.
- Calculate SSE for both models and discuss:
- Which model performed better?
- Why might sklearn perform differently?
- Was your feature scaling effective?
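The comparison can be sketched like this (the toy data roughly follows y = 2x + 1; swap in your own arrays and your manual model's SSE):

```python
from sklearn.linear_model import LinearRegression

# Toy data roughly on the line y = 2x + 1
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 9.1, 10.8, 13.2, 15.0, 16.9]

# sklearn expects a 2D feature array: one row per sample
X = [[xi] for xi in x]
model = LinearRegression().fit(X, y)
preds = model.predict(X)

# Sum of squared errors for the sklearn model
sse_sklearn = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
print("sklearn slope:", model.coef_[0], "intercept:", model.intercept_)
print("sklearn SSE:", sse_sklearn)
```

Compute the same SSE for your own model's predictions on the same points, then compare the two numbers in your write-up.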
What to Hand in:
- A description of your dataset (and a link to it)
- All code for data import, EDA, visualization, and regression
- Explanations of what you observed and learned
- Manual feature scaling and justification
- SSE values and interpretation
- Screenshots/plots of your visualizations
Section 2 App Development and Integration
You will need to be comfortable using HTML and CSS in this unit.
If you don't have any experience with HTML and CSS, code along with this video first: HTML and CSS Crash Course. Note: For this unit, all CSS must be included in an external CSS file. The video does not do this; instead, it shows you how to add CSS at the top of the HTML document.
To learn more about specific aspects of each language use w3Schools HTML Tutorials and w3Schools CSS Tutorial.
Once you are comfortable with using HTML and CSS, learn about responsive design using tutorials like these: Simple Responsive Design, A practical guide to responsive web design.
Useful HTML Resources

Flask is a lightweight, flexible, and free Python web framework that's commonly used to build web applications, APIs, and microservices. It's considered a microframework because it provides a minimal core set of functionalities and relies on extensions for more advanced features. Flask is suitable for both beginners and experienced developers, making it a popular choice for web development projects.
Jinja2 is a powerful and popular template engine for the Python programming language. It allows you to create dynamic HTML, XML, or other markup formats by embedding placeholders (called variables or tags) in a template file, which are then replaced with actual data at runtime.
Watch and code along with this Flask tutorial playlist.

To run a Flask App
- Using the Windows command line, run the Python Flask app. This starts the local web server from the folder containing the app.
If everything goes well, it will look something like this:
PS Microsoft.PowerShell.Core\FileSystem::H:\Documents\CompSci 12\Flask Project> py app_AI_like.py
 * Serving Flask app 'app_AI_like'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 141-699-995
- Go to a browser and type http://127.0.0.1:5000/. This should show your current app running.
- Upload Assignment 1 to Github
- Add the ability to store and retrieve appointments in/from a json file
- Also add: upon making an appointment, show a success/failure message above the list of booked appointments.
- Commit your changes to Github
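The store/retrieve step can be sketched with two small helpers. The file name and appointment fields here are assumptions; adapt them to your app's data:

```python
import json
import os

# Hypothetical data file; adjust the name/path for your project
DATA_FILE = "appointments.json"

def load_appointments():
    """Return the list of stored appointments, or an empty list."""
    if not os.path.exists(DATA_FILE):
        return []
    with open(DATA_FILE) as f:
        return json.load(f)

def save_appointments(appointments):
    """Write the full appointment list back to the JSON file."""
    with open(DATA_FILE, "w") as f:
        json.dump(appointments, f, indent=4)

# Usage inside a Flask route (sketch):
appointments = load_appointments()
appointments.append({"name": "Alex", "time": "10:00"})
save_appointments(appointments)
```

In your route, whether save_appointments succeeds or raises an exception is what decides the success/failure message you display.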
Assessment
Working Flask app that:
- Accepts appointment bookings
- Saves and loads data from a JSON file
- Displays a success or failure message
- At least three commits (initial upload + two progress commits)
- Descriptive commit messages
- Public link submitted via Classroom
Rule-Based Logic
Use the method below, which finds the most popular appointment times; you can invert its sort order to suggest less busy times to your users.
from collections import Counter

# Rule-based logic: return the three most popular appointment times
def get_most_popular_times(appointments):
    # Extract all the appointment times from the list of appointment dictionaries
    times = [a["time"] for a in appointments]
    # If there are no appointments, return a default list of popular times
    if not times:
        return ["10:00", "11:00", "13:00"]
    # Count how many times each time appears in the list
    counts = Counter(times)
    # Sort the times by how frequently they occur, in descending order
    sorted_times = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    # Return the top 3 most frequent times
    return [time for time, count in sorted_times[:3]]
Add this to your app
suggested_times = get_most_popular_times(appointments)

Display in your HTML
<p>Suggested Times:</p>
<ul>
  {% for time in suggested_times %}
    <li>{{ time }}</li>
  {% endfor %}
</ul>
This is "AI-like" behavior without actual machine learning: it uses data-driven suggestions.
Watch Machine Learning and Decision Trees
In Python, scikit-learn can be used to train a simple classifier if you want a real AI model.
Benefits of the Scikit Model
- No need for cloud APIs or authentication
- All runs locally — safe and fast
- Small data sets are enough
Train a Classifier on Appointment Data
This uses real machine learning with sklearn.tree.DecisionTreeClassifier.
Scenario: Predict whether a time slot will be "good" or "bad" based on:
- Day of the week
- Hour of the day
If you haven't already, install scikit-learn:
py -m pip install scikit-learn
Run This Code
from sklearn.tree import DecisionTreeClassifier
# Training data: [day_of_week (0=Mon), hour_of_day (24h format)]
X = [
    [0, 9], [0, 10], [0, 11],
    [1, 11], [1, 12],
    [2, 13], [2, 14],
    [4, 9], [4, 15]
]
# Labels: whether that time slot is a "good" time (popular, available)
y = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"]
# Train model
model = DecisionTreeClassifier()
model.fit(X, y)
# Predict a new time slot
day = 1 # Tuesday
hour = 11 # 11 AM
prediction = model.predict([[day, hour]])
print("Should we suggest this time?", prediction[0])
Visualize the Model
Add this code to the previous example to visualize the tree the model creates:
# add this to the top
from sklearn.tree import export_text
# add this after calling model.predict()
print(export_text(model, feature_names=["day", "hour"]))
Now Add This To Your Project
- Train on past appointment data (e.g., from appointments.json)
- Use .predict([weekday, hour]) to suggest whether a time is "good"
- Integrate this logic into the Flask route alongside request.form[...]
- Let the AI suggest times in the form or auto-fill them:

<label>Suggested Time: {{ suggested_times[0] }}</label>
<input type="time" name="time" value="{{ suggested_times[0] }}">
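Putting the training step together: a sketch that derives (weekday, hour) features from appointment records and labels popular slots. The field names, dates, and the "2+ bookings = good" rule are assumptions; in your app you would load the list from appointments.json with json.load instead of hard-coding it:

```python
from collections import Counter
from datetime import datetime
from sklearn.tree import DecisionTreeClassifier

# In your app: appointments = json.load(open("appointments.json"))
appointments = [
    {"date": "2024-03-04", "time": "10:00"},
    {"date": "2024-03-04", "time": "10:00"},
    {"date": "2024-03-05", "time": "14:00"},
]

# Convert each appointment into a (weekday, hour) slot
slots = []
for a in appointments:
    day = datetime.strptime(a["date"], "%Y-%m-%d").weekday()  # 0 = Monday
    hour = int(a["time"].split(":")[0])
    slots.append((day, hour))

# Label a slot "yes" (good) if it was booked at least twice
counts = Counter(slots)
X = [list(slot) for slot in counts]
y = ["yes" if n >= 2 else "no" for n in counts.values()]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[0, 10]])[0])  # → yes (Monday 10:00 was popular)
```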
Assessment
Demonstrate your use of this model in your code from Assignment 3.
Build a Flask web app that recommends study times, topics, or review strategies based on user input and past learning behavior. Use scikit-learn to model user success patterns.
Features
- Base App Functionality: students log their study sessions:
  - Subject
  - Time spent
  - Time of day (Morning, Afternoon, Evening, Night)
  - Mood or energy level
  - Whether they felt it was effective or not
- App stores data in a JSON file called study_sessions.json
- Jinja2 templates show past study history
It covers:
- SciKit Algorithms
- Assessing accuracy of predictions
- Saving your model
In this exercise, you’ll build a new feature in your Flask app to test how accurate your AI model is, using the data you've collected in study_sessions.json.
- Use 20% of your study session data as test data
- Train your AI model on the remaining 80%
- Show the accuracy of the predictions
- Display the result on a web page
Update app.py with a New Route
- Open your app.py file.
- Add these imports at the top if they aren’t already there (the route below also needs DecisionTreeClassifier and the label encoders):

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Encoders for the text columns (one per feature)
mood_encoder = LabelEncoder()
time_encoder = LabelEncoder()
subject_encoder = LabelEncoder()

- Then scroll down and add this new route:
@app.route('/test-ai')
def test_ai():
    sessions = load_sessions()
    if len(sessions) < 10:
        flash("Not enough data to test the model. Please log at least 10 sessions.")
        return redirect('/log')
    # Convert to DataFrame
    df = pd.DataFrame(sessions)
    # Encode text columns into numbers
    # Note: you may have named your features differently
    df['mood'] = mood_encoder.fit_transform(df['mood'])
    df['time_of_day'] = time_encoder.fit_transform(df['time_of_day'])
    df['subject'] = subject_encoder.fit_transform(df['subject'])
    df['effective'] = df['effective'].map({"Yes": 1, "No": 0})
    # Split into 80% train and 20% test
    X = df[['duration', 'time_of_day', 'mood', 'subject']]
    y = df['effective']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Train and test the model
    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = round(accuracy_score(y_test, predictions) * 100, 2)
    return render_template('test_ai_accuracy.html', accuracy=accuracy, total=len(sessions))
Create a New Template
Create a file called test_ai_accuracy.html in your templates/ folder:
<!DOCTYPE html>
<html>
<head>
<title>AI Accuracy Test</title>
</head>
<body>
<h1>AI Model Accuracy Report</h1>
<p>Tested on 20% of {{ total }} logged sessions.</p>
<h2>Accuracy: {{ accuracy }}%</h2>
<p><a href="{{ url_for('home') }}">Back to Home</a></p>
</body>
</html>
Try It Out
- Make sure you have at least 10 study sessions logged.
- Run your Flask app.
- Visit /test-ai in your browser (e.g., http://127.0.0.1:5000/test-ai).
- You should see a page like this:
AI Model Accuracy Report
Tested on 20% of 25 logged sessions.
Accuracy: 84.00%
Add a Link to Your Homepage
If you want a link from your home page:
<p><a href="{{ url_for('test_ai') }}">Evaluate AI Accuracy</a></p>
What’s Going On?
- 80% of your data is used to train the model.
- 20% is used to test how well the model performs on new data.
- You see how often the model predicted correctly — that’s the accuracy.
You now have a simple way to measure how effective your AI is using real user data.
The Assignment: Pick the Best Model
Pick another ML model to predict the best appointment time. Research Random Forest, Logistic Regression, and KNN models. Pick one, assess its accuracy against the Decision Tree, and display the results.

Section 3 Intelligent Interfaces and Final Project
1. Prepare Your Flask App
Let’s assume you have a basic app structure like this:
myapp/
├── app.py
├── data.json
├── templates/
│ └── index.html
└── static/
Example app.py:

from flask import Flask, request, jsonify
import json
import os

app = Flask(__name__)
DATA_FILE = os.path.join(os.path.dirname(__file__), 'data.json')

@app.route('/')
def index():
    return 'Hello from Flask!'

@app.route('/add', methods=['POST'])
def add_data():
    new_entry = request.get_json()
    with open(DATA_FILE, 'r') as f:
        data = json.load(f)
    data.append(new_entry)
    with open(DATA_FILE, 'w') as f:
        json.dump(data, f, indent=4)
    return jsonify({'status': 'success', 'data': new_entry})
Example data.json (start with an empty JSON list):

[]
2. Sign Up and Create a Web App on PythonAnywhere
- Go to https://www.pythonanywhere.com and sign up or log in.
- On the Dashboard, click Web > Add a new web app.
- Choose Manual configuration > Flask > your Python version (e.g., 3.10).
3. Upload Your Files
- In the Files tab, create a folder (e.g., myapp/) and upload your app.py, data.json, templates/, and static/ folders/files.
- Make sure data.json has write permissions (you can leave it as-is; you’re the only user).
4. Configure the WSGI File
Go to Web > [your app] > WSGI configuration file.
Edit the file to look like this (adjust the path to your folder):
import sys
import os
path = '/home/yourusername/myapp'
if path not in sys.path:
sys.path.append(path)
from app import app as application
5. Reload and Test
- Go back to the Web tab and click Reload.
- Visit your app’s URL (e.g., yourusername.pythonanywhere.com) to confirm it’s running.
- Send a POST request to /add (using Postman or JavaScript) to test the JSON writing.
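A client sketch for the POST test using only the Python standard library; the URL and payload fields are placeholders you should replace with your own:

```python
import json
import urllib.request

# Replace with your PythonAnywhere URL once deployed
url = "http://127.0.0.1:5000/add"
payload = json.dumps({"name": "Alex", "time": "10:00"}).encode("utf-8")

req = urllib.request.Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request while the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```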
Important Notes
- Free PythonAnywhere accounts cannot receive external HTTP requests to /add unless you're making the request from a client hosted on PythonAnywhere (e.g., your own JS frontend on the same domain).
- PythonAnywhere allows write access to files in your home directory, so writing to data.json is okay.
- Avoid hard-coding absolute paths; use os.path.join(os.path.dirname(__file__), 'data.json') to make sure the code works on their file system.