mdinfotech.net  



Section 1 Foundations of ML and Data

You will design and build a web-based appointment booking system using Python (Flask) for the backend and HTML/CSS/JavaScript for the frontend, and integrate a basic AI recommendation feature to enhance the user experience. You will also learn UI design fundamentals to create a user-friendly interface.

This unit consists of two major components:
  1. Individual Exercises
     Listed below. 100/80/60 Quizzes will be given.

  2. Group project (20% of final mark)
     In groups, you will develop a modern web app using Flask and Python. This project will continue throughout the whole unit. Details will be given soon.

The Theory

Learn the basics of supervised machine learning by:
  • Writing a simple linear regression algorithm from scratch (no ML libraries)
  • Collecting your own training data
  • Evaluating how well your model predicts new values


What Is Linear Regression?

Linear regression finds the "best-fit" straight line through a set of data points. The equation is:

$$ y\ =\ mx\ +\ b $$

Your job is to calculate the best values of m (slope) and b (intercept) based on your training data. But what does "best" mean?


The Problem

You have data points:

$$ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) $$

and you want a line that gets as close as possible to all the points.


The Solution: Minimize the Total Error
The "error" for each point is how far off the prediction is from the actual value:

$$ Error_i\ =\ y_i\ -\ (mx_i\ +\ b) $$


But some errors are positive and some are negative — they cancel each other out. So instead, we square the errors to make them all positive:

$$ \text{Squared Error}_i\ =\ \left( y_i\ -\ (mx_i\ +\ b) \right)^2 $$

The total error is the sum of squared errors (SSE):

$$ SSE = \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2 $$
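
To make the SSE formula concrete, here is a minimal sketch that scores two candidate lines against the same points. The function name `sum_squared_error` and the data are made up for illustration; the line with the smaller SSE is the better fit.

```python
def sum_squared_error(xs, ys, m, b):
    """Return the SSE of the line y = m*x + b over the points (xs, ys)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up illustration data (not from the assignment)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

sse_a = sum_squared_error(xs, ys, m=2, b=0)  # candidate line y = 2x
sse_b = sum_squared_error(xs, ys, m=1, b=1)  # candidate line y = x + 1
print(sse_a, sse_b)  # 1 11
```

Here y = 2x misses only one point by 1, while y = x + 1 misses three points, so its SSE is much larger.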

The Least Squares Method

The least squares method means:
Find the values of m and b that minimize this total squared error.

This is a classic problem in calculus. When you solve it, you end up with formulas:

$$ m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

$$ b = \bar{y} - m\bar{x} $$

These formulas guarantee the best-fit line by minimizing how far off the predictions are — in the squared sense.
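
The two formulas above translate almost line for line into Python. This is one possible sketch, not the only valid layout; the function name `fit_line` and the sample points are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = numerator / denominator
    b = y_bar - m * x_bar
    return m, b

# Made-up points that lie exactly on y = 2x + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```

Because the sample points sit exactly on a line, the fit recovers that line. Real data will be noisier, and the result will be the line with the smallest SSE.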

What to do:


1. Collect Your Own Data

  • Pick a real-world scenario that can be modeled with a straight line. Some ideas:
    • Hours studied vs. test score
    • Number of push-ups vs. time taken
    • Day of the week vs. number of text messages sent
    • Temperature vs. number of ice creams sold
    • YouTube views vs. likes on your favorite channel
  • Collect at least 25 data points (you can use a spreadsheet, paper, or online forms).
  • Use 20 of your points to train the model. Keep 5 points as test data.
  • Store your data as two lists: x = [...], y = [...]

2. Implement Linear Regression in Python

Use only basic Python: no libraries like scikit-learn. You're allowed to use:
  • math module (for square roots if needed)
  • statistics.mean() (if you think you need it)
Your code must:
  1. Calculate the slope (m) and intercept (b) using the least squares formula.
  2. Create a predict(x) function to estimate y for any x.

3. Test and Evaluate

  • Print the calculated values of m and b
  • Use your predict() function to predict 5 values and compare them to actual test data. Print a table showing these results.
  • Print the Mean Squared Error (MSE) between predicted and actual values
  • Use this code to generate a plot of your line and your data.
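
The evaluation step can be sketched as follows. The values of m and b, the test points, and the name `mean_squared_error` are placeholders for illustration; your own values come from your training data.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical slope and intercept; yours come from your training data
m, b = 2.0, 1.0

def predict(x):
    return m * x + b

# Made-up test points for illustration only
test_x = [4, 5, 6]
test_y = [9.5, 10.0, 13.5]
predicted = [predict(x) for x in test_x]

# Print a simple results table
print(f"{'x':>6} {'actual':>8} {'predicted':>10}")
for x, actual, pred in zip(test_x, test_y, predicted):
    print(f"{x:>6} {actual:>8.2f} {pred:>10.2f}")

mse = mean_squared_error(test_y, predicted)
print("MSE:", mse)  # MSE: 0.5
```

MSE is the SSE divided by the number of test points, so it stays comparable when test sets differ in size.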

What to Submit:

  1. Your Python code
  2. A print out of your program output
  3. A printed spreadsheet of your collected data (/10)
  4. A print out of your plotted line and data
  5. A short write-up :
    • What data you collected and why
    • The m and b values your model found
    • How well you think your model worked

Assessment (/45)
Data Collection (clear, relevant, 25+ points) /10
Code Readability /10
Correct Implementation of m and b /5
Prediction Function Works /5
MSE Calculated and output /5
Plot of line and data /5
Write-up (clarity, reflection, correctness) /5

What is Data Preprocessing?

Before feeding data into a machine learning model, we often need to clean, transform, or standardize it so the model can learn effectively. This process is called data preprocessing.

One common task in preprocessing is scaling or normalizing features.


What Does It Mean to "Normalize" or "Scale" a Feature?

Imagine you're building a model to predict a student’s test score based on two features:

  • study_hours (ranges from 0 to 10)
  • pages_read (ranges from 0 to 500)

These two numbers are on very different scales.

A machine learning model might treat the feature with larger numbers (pages_read) as more important—just because of its scale—not because it actually has more influence on the outcome. That’s a problem.

To fix this, we scale the features so they’re on similar ranges.


Two Common Manual Methods

1. Min-Max Normalization

Rescales values to a range of 0 to 1:

$$ x_{\text{normalized}} = \frac{x - \text{min}(x)}{\text{max}(x) - \text{min}(x)} $$

Example:

If study_hours = 6, with min = 0 and max = 10:

$$ \frac{6 - 0}{10 - 0} = 0.6 $$

2. Z-Score Standardization

Centers the data around 0 with a standard deviation of 1:

$$ x_{\text{standardized}} = \frac{x - \mu}{\sigma} $$

Where:

  • µ is the mean of the feature
  • σ is the standard deviation of the feature

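This method is often a better choice than min-max when the data has outliers or no natural bounds, because a single extreme value does not squeeze all the other values into a narrow band near 0. Outliers still shift µ and σ, so check for them first.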
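
A minimal sketch of z-score standardization using only the statistics module. The function name `z_score_standardize` and the data are made up, and the choice of the population standard deviation (`pstdev`) is an assumption; the sample version (`stdev`) is equally common.

```python
import statistics

def z_score_standardize(values):
    """Return values shifted to mean 0 and scaled to standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        # Every value is the same, so there is no spread to scale by
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

pages_read = [100, 200, 300, 400, 500]  # made-up values
standardized = z_score_standardize(pages_read)
print([round(z, 3) for z in standardized])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After standardizing, the value 300 (the mean) maps to 0, and each unit is one standard deviation.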


Why Should We Do This?

  • Prevent one feature from dominating just because of its scale.
  • Improve model convergence (makes training faster and more stable).
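  • Especially important for distance-based or gradient-trained models like KNN, SVM, and neural networks. For linear regression solved with the least squares formulas, scaling does not change the predictions, but it makes the slopes of different features comparable.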

Simple Manual Example (Min-Max)


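def min_max_normalize(values):
    """Rescale a list of numbers to the range 0 to 1."""
    min_val = min(values)
    max_val = max(values)
    if max_val == min_val:
        # All values are identical, so there is no range to scale over
        return [0.0 for _ in values]
    return [(x - min_val) / (max_val - min_val) for x in values]

study_hours = [1, 2, 4, 6, 10]
normalized_hours = min_max_normalize(study_hours)
print([round(v, 3) for v in normalized_hours])  # Output: [0.0, 0.111, 0.333, 0.556, 1.0]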

Overview

In this assignment, you’ll explore a real-world dataset, perform basic data analysis and transformation, and apply your own linear regression model (from Assignment 1) to make predictions. You’ll also compare your model to a library-based one like sklearn.

Objectives:

  • Practice reading and processing CSV or JSON data
  • Perform basic exploratory data analysis (EDA)
  • Manually scale or normalize data (feature preprocessing).
  • Refine and extend your linear regression model (from Assignment 1)
  • Visualize relationships between variables.
  • Compare manual vs. library-based predictions

Assignment Tasks:

  1. Pick and Import a Dataset

    • Choose a small to medium-sized dataset that interests you from Kaggle Datasets. Tip: Look for datasets with a clear numerical target variable you can try to predict (e.g., test scores, prices, completion time).
    • Use pandas to load and explore your data:
      • View the first few rows: df.head()
      • Summarize data: df.describe()
      • Check for missing values: df.isnull().sum()
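      • Find correlations between columns: df.corr(numeric_only=True) (the argument skips text columns, which would otherwise raise an error in pandas 2.0 and later)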
  2. Visualize Relationships

    • Use matplotlib to generate scatter plots (e.g., study_time vs score), line graphs and/or histograms.
  3. Feature Engineering

    • Watch Feature Engineering
    • Create new features by combining existing ones (e.g., combine "study time" and "sleep hours" into a "readiness score").
    • Manually normalize or scale features using min-max scaling or z-score standardization.
  4. Use Your Own Regression Model

    • Reuse or revise your custom linear regression code from Assignment 1.
    • Select one independent variable (or your engineered feature) to predict a target.
    • Split the data into training and test sets (e.g., 80/20 split).
    • Make predictions on the test set.
    • Calculate Sum of Squared Errors (SSE) for your predictions.
  5. Compare with sklearn.linear_model.LinearRegression

    • Use sklearn.linear_model.LinearRegression to fit and predict.
    • Compare your predictions to sklearn's.
    • Calculate SSE for both models and discuss:
      • Which model performed better?
      • Why might sklearn perform differently?
      • Was your feature scaling effective?
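
For task 4, an 80/20 split can be done by hand before you reach for sklearn. The sketch below is one way to do it; the function name `train_test_split_manual`, the seed, and the data are assumptions for illustration.

```python
import random

def train_test_split_manual(xs, ys, test_fraction=0.2, seed=42):
    """Shuffle paired data and split it into training and test portions."""
    pairs = list(zip(xs, ys))
    random.Random(seed).shuffle(pairs)  # fixed seed makes the split repeatable
    n_test = max(1, round(len(pairs) * test_fraction))
    test, train = pairs[:n_test], pairs[n_test:]
    train_x, train_y = zip(*train)
    test_x, test_y = zip(*test)
    return list(train_x), list(train_y), list(test_x), list(test_y)

# Made-up data: 10 points on the line y = 2x + 1
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train_x, train_y, test_x, test_y = train_test_split_manual(xs, ys)
print(len(train_x), len(test_x))  # 8 2
```

Shuffling the pairs together keeps each x matched with its own y. Fit your model on the training lists only, then compute SSE on the test lists.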

What to Hand in:

  1. A description of your dataset (and a link to it)
  2. All code for data import, EDA, visualization, and regression
  3. Explanations of what you observed and learned
  4. Manual feature scaling and justification
  5. SSE values and interpretation
  6. Screenshots/plots of your visualizations

Section 2 App Development and Integration

You will need to be comfortable using HTML and CSS in this unit.

If you don't have any experience with HTML and CSS, code along with this video first: HTML and CSS Crash Course. Note: For this unit, all CSS must be included in an external CSS file. The video does not do this, rather it shows you how to add CSS to the top of the HTML document.

To learn more about specific aspects of each language use w3Schools HTML Tutorials and w3Schools CSS Tutorial.

Once you are comfortable with using HTML and CSS, learn about responsive design using tutorials like these: Simple Responsive Design, A practical guide to responsive web design.

Useful HTML Resources
  • Valid HTML5 document (Explanation)
  • Intro to HTML 5 Tutorials at w3Schools.
  • HTML Validator
  • HTML Reference
Useful CSS Resources
  • w3Schools Intro to CSS
  • CSS Validator
  • CSS Reference

Flask is a lightweight, flexible, and free Python web framework that's commonly used to build web applications, APIs, and microservices. It's considered a microframework because it provides a minimal core set of functionalities and relies on extensions for more advanced features. Flask is suitable for both beginners and experienced developers, making it a popular choice for web development projects.

    Jinja2 is a powerful and popular template engine for the Python programming language. It allows you to create dynamic HTML, XML, or other markup formats by embedding placeholders (called variables or tags) in a template file, which are then replaced with actual data at runtime.

    Watch and code along with this Flask tutorial playlist.

    To run a Flask App

    1. Using the Windows command line, run the Python Flask app from the folder containing the app. This starts the local web server. If everything goes well, it will look something like this:
      PS Microsoft.PowerShell.Core\FileSystem::H:\Documents\CompSci 12\Flask Project> py app_AI_like.py
       * Serving Flask app 'app_AI_like'
       * Debug mode: on
      WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
       * Running on http://127.0.0.1:5000
      Press CTRL+C to quit
       * Restarting with stat
       * Debugger is active!
       * Debugger PIN: 141-699-995
    2. Go to a browser and type http://127.0.0.1:5000/. This should show your current app running.
    Using the technology taught in the previous lesson, create a basic web form with fields for Name, Date, and Time and a submit button that allows a user to book an appointment. When the appointment is made, load a page that shows a list of all appointments made. (Ms Wear's example is saved on her drive at Documents\CompSci 12\Flask Project)
    1. Upload Assignment 1 to GitHub
    2. Add the ability to store and retrieve appointments in/from a JSON file
    3. Also add: upon making an appointment, show a success/failure message above the list of booked appointments.
    4. Commit your changes to GitHub
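
Step 2 can be sketched with two small helper functions. The names `load_appointments` and `save_appointment`, and the name/date/time keys, are assumptions for illustration; adapt them to your own form fields. The demo writes to a temporary folder so no real file is touched.

```python
import json
import os
import tempfile

def load_appointments(path):
    """Return the saved list of appointments, or an empty list if none exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_appointment(path, appointment):
    """Append one appointment dictionary to the JSON file at path."""
    appointments = load_appointments(path)
    appointments.append(appointment)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(appointments, f, indent=4)
    return appointments

# Demo in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "appointments.json")
    missing = load_appointments(demo_path)  # file does not exist yet
    save_appointment(demo_path, {"name": "Sam", "date": "2025-01-15", "time": "10:00"})
    saved = load_appointments(demo_path)
print(missing, saved)
```

Returning an empty list when the file is missing means the first booking works without creating the file by hand.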

    Assessment

    Working Flask app that:
    1. Accepts appointment bookings
    2. Saves and loads data from a JSON file
    3. Displays a success or failure message
    GitHub repo with:
    1. At least three commits (initial upload + two progress commits)
    2. Descriptive commit messages
    3. Public link submitted via Classroom
    Scenario: Get your appointment booking app to suggest the best available time for a new user, based on past booking data. The goal is to make it smarter over time. If lots of people tend to book between 11:00 AM and 2:00 PM, which times should we recommend to a new user? Likely, you want to suggest less busy times.

    Rule-Based Logic

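    The function below counts past bookings and returns the three most popular times. To suggest less busy times instead, as the scenario asks, rank the times in ascending order (for example, by removing reverse=True from the sort).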

    from collections import Counter
    
    # Rule Based Logic: Return the three most popular appointment times
    def get_most_popular_times(appointments):
        # Extract all the appointment times from the list of appointment dictionaries
        times = [a["time"] for a in appointments]

        # If there are no appointments, return a default list of popular times
        if not times:
            return ["10:00", "11:00", "13:00"]

        # Count how many times each time appears in the list
        counts = Counter(times)

        # Sort the times by how frequently they occur, in descending order
        sorted_times = sorted(counts.items(), key=lambda x: x[1], reverse=True)

        # Return the top 3 most frequent times
        return [time for time, count in sorted_times[:3]]
    Add this to your app
    suggested_times = get_most_popular_times(appointments)
    
    Display in your HTML
      <p>Suggested Times: </p>
     <ul>
      {% for time in suggested_times %}
         <li>{{ time }} </li>
      {% endfor %}
     </ul>
    
    

    This is "AI-like" behavior without actual machine learning: it uses data-driven suggestions.
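
If you want to recommend the less busy times described in the scenario, one option is to rank a fixed list of bookable slots by how rarely each was booked. This is a sketch under assumptions: the function name `get_least_busy_times`, the `candidate_times` list, and the sample bookings are all made up for illustration.

```python
from collections import Counter

def get_least_busy_times(appointments, candidate_times, how_many=3):
    """Return the candidate times with the fewest past bookings."""
    counts = Counter(a["time"] for a in appointments)
    # Counter returns 0 for times that were never booked
    ranked = sorted(candidate_times, key=lambda t: counts[t])
    return ranked[:how_many]

# Made-up bookings and slots for illustration
appointments = [
    {"time": "11:00"}, {"time": "11:00"}, {"time": "12:00"},
    {"time": "13:00"}, {"time": "13:00"}, {"time": "13:00"},
]
candidate_times = ["10:00", "11:00", "12:00", "13:00", "14:00"]
suggested_times = get_least_busy_times(appointments, candidate_times)
print(suggested_times)  # ['10:00', '14:00', '12:00']
```

Ranking a list of candidate slots, rather than only the times found in past bookings, lets slots that were never booked appear first.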

    Watch Machine Learning and Decision Trees

    In Python, scikit-learn can be used to train a simple classifier if you want a real AI model.

    Benefits of the Scikit Model

    • No need for cloud APIs or authentication
    • All runs locally — safe and fast
    • Small data sets are enough
    Resources
  • Machine Learning in Python
  • SciKitLearn Docs
    Train a Classifier on Appointment Data

    This uses real machine learning with sklearn.tree.DecisionTreeClassifier.

    Scenario: Predict whether a time slot will be "good" or "bad" based on:

    • Day of the week
    • Hour of the day

    If you haven't already, install scikit-learn:

    py -m pip install scikit-learn

    Run This Code

    from sklearn.tree import DecisionTreeClassifier
    
    # Training data: [day_of_week (0=Mon), hour_of_day (24h format)]
    X = [
        [0, 9], [0, 10], [0, 11],
        [1, 11], [1, 12],
        [2, 13], [2, 14],
        [4, 9], [4, 15]
    ]
    
    # Labels: whether that time slot is a "good" time (popular, available)
    y = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"]
    
    # Train model
    model = DecisionTreeClassifier()
    model.fit(X, y)
    
    # Predict a new time slot
    day = 1    # Tuesday
    hour = 11  # 11 AM
    prediction = model.predict([[day, hour]])
    
    print("Should we suggest this time?", prediction[0])
    

    Visualize the Model

    Add this code to the previous example to visualize the tree the model creates:

    
    # add this to the top
    from sklearn.tree import export_text
    
    # add this after calling model.predict()
    print(export_text(model, feature_names=["day", "hour"]))
    

    Now Add This To Your Project

    • Train on past appointment data (e.g., from appointments.json)
    • Use .predict([[weekday, hour]]) to suggest whether a time is "good" (note the double brackets: predict expects a list of rows)
    • Integrate this logic into the Flask route alongside request.form[...]
    • Let the AI suggest times in the form or auto-fill them.
       <label>Suggested Time: {{ suggested_times[0] }}</label>
      <input type="time" name="time" value="{{ suggested_times[0] }}">
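
Before you can train on saved appointments, each one has to become a [day_of_week, hour_of_day] row like the training data above. The sketch below assumes each appointment stores its date as YYYY-MM-DD and its time as HH:MM; the function name `to_features` and the sample records are hypothetical.

```python
from datetime import datetime

def to_features(appointment):
    """Turn one appointment into [day_of_week, hour_of_day] for the classifier."""
    moment = datetime.strptime(
        f"{appointment['date']} {appointment['time']}", "%Y-%m-%d %H:%M"
    )
    # weekday() returns 0 for Monday, matching the training data convention
    return [moment.weekday(), moment.hour]

# Made-up appointments for illustration
appointments = [
    {"name": "Sam", "date": "2025-01-13", "time": "09:00"},   # a Monday
    {"name": "Alex", "date": "2025-01-14", "time": "11:30"},  # a Tuesday
]
X = [to_features(a) for a in appointments]
print(X)  # [[0, 9], [1, 11]]
```

You still need a label for each row (for example "yes" or "no"), and how you decide that label is up to your project.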
          

    Assessment

    Demonstrate your use of this model in your code from Assignment 3.

    Get it marked. Assessment criteria: tba

    Build a Flask web app that recommends study times, topics, or review strategies based on user input and past learning behavior. Use scikit-learn to model user success patterns.

    Features
    • Base App Functionality:
      Students log their study sessions:
      • Subject
      • Time spent
      • Time of day (Morning, Afternoon, Evening, Night)
      • Mood or energy level
      • Whether they felt it was effective or not
    • App stores data in a JSON file called study_sessions.json
    • Jinja2 templates show past study history
    AI Component
  • Predicts which conditions (time of day, subject, mood) lead to effective sessions
  • Recommends the best time to study based on a student's history
  • Watch and program along with Simple Machine Learning Code Tutorial for Beginners with Sklearn.
    It covers:
    1. SciKit Algorithms
    2. Assessing accuracy of predictions
    3. Saving your model
    AI Evaluation Tutorial: Measuring Your Model’s Accuracy

    In this exercise, you’ll build a new feature in your Flask app to test how accurate your AI model is using the data you've collected in study_sessions.json.

    What we'll do:
    • Use 20% of your study session data as test data
    • Train your AI model on the remaining 80%
    • Show the accuracy of the predictions
    • Display the result on a web page
    Update app.py with a New Route
    1. Open your app.py file.
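    2. Add these imports at the top if they aren’t already there:
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import accuracy_score
      from sklearn.preprocessing import LabelEncoder
      from sklearn.tree import DecisionTreeClassifier
      import pandas as pd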
      
    3. Then scroll down and add this new route:
    @app.route('/test-ai')
    def test_ai():
        sessions = load_sessions()
    
        if len(sessions) < 10:
            flash("Not enough data to test the model. Please log at least 10 sessions.")
            return redirect('/log')
    
        # Convert to DataFrame
        df = pd.DataFrame(sessions)
    
        # Encode text columns into numbers
        # Note: you may have named your features differently.
        # mood_encoder, time_encoder, and subject_encoder are assumed to be
        # LabelEncoder() instances created earlier in app.py
        df['mood'] = mood_encoder.fit_transform(df['mood'])
        df['time_of_day'] = time_encoder.fit_transform(df['time_of_day'])
        df['subject'] = subject_encoder.fit_transform(df['subject'])
        df['effective'] = df['effective'].map({"Yes": 1, "No": 0})

        # Split into 80% train and 20% test
        X = df[['duration', 'time_of_day', 'mood', 'subject']]
        y = df['effective']  # the label column encoded above
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
        # Train and test the model
        model = DecisionTreeClassifier()
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        accuracy = round(accuracy_score(y_test, predictions) * 100, 2)
    
        return render_template('test_ai_accuracy.html', accuracy=accuracy, total=len(sessions))
    
    Create a New Template

    Create a file called test_ai_accuracy.html in your templates/ folder:

    <!DOCTYPE html>
    <html>
    <head>
        <title>AI Accuracy Test</title>
    </head>
    <body>
        <h1>AI Model Accuracy Report</h1>
        <p>Tested on 20% of {{ total }} logged sessions.</p>
        <h2>Accuracy: {{ accuracy }}%</h2>
    
        <p><a href="{{ url_for('home') }}">Back to Home</a></p>
    </body>
    </html>
    
    Try It Out
    1. Make sure you have at least 10 study sessions logged.
    2. Run your Flask app.
    3. Visit /test-ai in your browser (e.g. http://127.0.0.1:5000/test-ai)
    4. You should see a page like this:
    AI Model Accuracy Report
    Tested on 20% of 25 logged sessions.
    Accuracy: 80.0%
    
    Add a Link to Your Homepage

    If you want a link from your home page:

    <p><a href="{{ url_for('test_ai') }}">Evaluate AI Accuracy</a></p>
    
    What’s Going On?
    • 80% of your data is used to train the model.
    • 20% is used to test how well the model performs on new data.
    • You see how often the model predicted correctly — that’s the accuracy.
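
The accuracy figure is simply the share of test predictions that match the real labels. This sketch computes the same kind of percentage by hand; the function name `accuracy_percent` and the labels are made up for illustration.

```python
def accuracy_percent(actual, predicted):
    """Percentage of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return round(correct / len(actual) * 100, 2)

# Made-up labels for a test set of 5 sessions (1 = effective, 0 = not)
y_test = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]
accuracy = accuracy_percent(y_test, predictions)
print(accuracy)  # 80.0
```

With only 5 test sessions, accuracy can only move in steps of 20%, so a small dataset gives a coarse estimate.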
    Exercise Complete!

    You now have a simple way to measure how effective your AI is using real user data.

    The Assignment: Pick the Best Model

    Pick another ML model to predict the best appointment time. Research Random Forest, Logistic Regression, and KNN models. Pick one and assess the model accuracy against the Decision Tree and display the results.

    Section 3 Intelligent Interfaces and Final Project

    Watch UI Design Principles. For more on design, see https://www.browserstack.com/guide/elements-of-modern-web-design. For Jinja2 form validation, see https://www.youtube.com/watch?v=7df-ZY9q-2A, and for exporting Flask for web servers, see https://www.youtube.com/watch?v=4_RYQJfiuVU.
    Watch Easy Flask App Deployment with PythonAnywhere | Beginner's Step-by-Step Guide. OR follow these instructions:

    1. Prepare Your Flask App

    Let’s assume you have a basic app structure like this:

    myapp/
    ├── app.py
    ├── data.json
    ├── templates/
    │   └── index.html
    └── static/
    

    Example app.py:

    from flask import Flask, request, jsonify
    import json
    import os
    
    app = Flask(__name__)
    DATA_FILE = os.path.join(os.path.dirname(__file__), 'data.json')
    
    @app.route('/')
    def index():
        return 'Hello from Flask!'
    
    @app.route('/add', methods=['POST'])
    def add_data():
        new_entry = request.get_json()
        with open(DATA_FILE, 'r') as f:
            data = json.load(f)
        data.append(new_entry)
        with open(DATA_FILE, 'w') as f:
            json.dump(data, f, indent=4)
        return jsonify({'status': 'success', 'data': new_entry})
    

    Example data.json:

    []
    

    2. Sign Up and Create a Web App on PythonAnywhere

    1. Go to https://www.pythonanywhere.com and sign up or log in.
    2. On the Dashboard, click Web > Add a new web app.
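    3. Choose Manual configuration, then your Python version (e.g., 3.10). The wizard also lists Flask as a separate option; these instructions assume Manual configuration because the WSGI file is edited by hand in step 4.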

    3. Upload Your Files

    1. In the Files tab, create a folder (e.g., myapp/) and upload your app.py, data.json, templates/, and static/ folders/files.
    2. Make sure data.json has write permissions (you can leave it as-is; you’re the only user).

    4. Configure the WSGI File

    Go to Web > [your app] > WSGI configuration file.

    Edit the file to look like this (adjust the path to your folder):

    import sys
    import os
    
    path = '/home/yourusername/myapp'
    if path not in sys.path:
        sys.path.append(path)
    
    from app import app as application
    

    5. Reload and Test

    • Go back to the Web tab and click Reload.
    • Visit your app’s URL (e.g., yourusername.pythonanywhere.com) to confirm it’s running.
    • Send a POST request to /add (using Postman or JavaScript) to test the JSON writing.
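
If you would rather test /add from Python than from Postman, the standard library can build the request. This is a sketch: the function name `build_add_request`, the base URL, and the entry are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

def build_add_request(base_url, entry):
    """Build (but do not send) a JSON POST request for the /add route."""
    body = json.dumps(entry).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/add",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical URL and entry; replace with your own
request = build_add_request(
    "http://127.0.0.1:5000", {"name": "Sam", "time": "10:00"}
)
print(request.get_method(), request.full_url)

# To actually send it while the app is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The Content-Type header matters because request.get_json() in the example app only parses the body when it is marked as JSON.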

    Important Notes

    • Free PythonAnywhere accounts restrict outgoing requests made by your code to an allowlist of sites. Incoming requests to your own app's URL, including a POST to /add, are not subject to that limit; a browser may still block JavaScript hosted on a different domain unless CORS is configured.
    • PythonAnywhere allows write access to files in your home directory, so writing to data.json is okay.
    • Avoid hard-coded paths and paths that depend on the current working directory; build the path with os.path.join(os.path.dirname(__file__), 'data.json') so it resolves correctly on their file system.
    Improve the look of your site using CSS and ensure all form data is validated. Publish to the web.
    Create an online app using Flask that utilizes AI to provide a service to your audience.