CWEM System Documentation

Core Functionality

Identity Verification

Detects the face (MTCNN or Haar cascade), computes a FaceNet embedding, and classifies it against registered profiles with an SVM to confirm which user is present.

Pose Landmark Tracking

Detects 33 3D body pose landmarks (X, Y, Z, visibility) in real time using MediaPipe for activity analysis.

Activity Classification

Classifies detected poses into predefined activities (e.g., working, on phone, looking away) using a trained model (XGBoost).

Automated Record Management

Calculates activity durations and automatically updates daily and cumulative records in the database (Cassandra).

Browser-Based Operation

Core processing (landmark/embedding extraction) and interaction occur within the user's web browser via JavaScript.

Privacy-Focused Transmission

Raw images and video never leave the browser. Only numeric features (for prediction) and periodic activity summaries in JSON format are sent to the backend, and only the summaries are stored.

System Architecture: Processing Flow

1. Frame Acquisition

Video frames are captured from the user's webcam using the browser's `getUserMedia` API.

Browsers expose `getUserMedia` only in secure contexts, so the page must be served over HTTPS and the user must grant camera permission.

2. Feature Extraction (Client-Side)

Key visual features are extracted from each frame directly within the browser.

Pose Landmarks: MediaPipe JS library calculates 33 3D pose landmarks.
Face Detection: MTCNN or Haar Cascade (via JS library/WASM) detects the face bounding box.
Face Cropping: The face region is isolated from the frame.
Face Embedding: FaceNet model (via TensorFlow.js or similar) generates a 128-dimensional vector representing facial features.
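
The pose output described above can be turned into a fixed-length feature row for a classifier. The following is a minimal sketch in Python, assuming each landmark arrives as a dict with `x`, `y`, `z`, and `visibility` keys; the `flatten_landmarks` helper and the 132-column ordering are illustrative assumptions, not the system's confirmed format.

```python
from typing import Dict, List

NUM_LANDMARKS = 33  # MediaPipe pose landmark count, as stated in the text
FIELDS = ("x", "y", "z", "visibility")


def flatten_landmarks(landmarks: List[Dict[str, float]]) -> List[float]:
    """Flatten 33 landmarks into one 132-value row: x0, y0, z0, v0, x1, ..."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [float(lm[field]) for lm in landmarks for field in FIELDS]


# Hypothetical frame: every landmark at the same position, fully visible.
frame = [{"x": 0.5, "y": 0.25, "z": -0.1, "visibility": 1.0}] * NUM_LANDMARKS
row = flatten_landmarks(frame)
```

Keeping the column order fixed matters because the same layout has to be used when the training data is collected and when live frames are classified.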

3. Prediction & Analysis

Extracted features are sent to the backend, where trained models handle recognition and classification.

Face Recognition: Embedding sent to backend API; compared against database using trained SVM model to identify user.
Pose Classification: Pose landmarks sent to backend API; classified into predefined activities using trained XGBoost model.
The backend is implemented with Flask (Python). The client communicates with it over HTTPS (AJAX) or a secure WebSocket.
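
Before either model runs, the backend can check that incoming feature vectors have the expected shape. This is a minimal sketch, assuming a hypothetical JSON payload with `embedding` (128 floats) and `landmarks` (33 x 4 = 132 floats) keys; the key names and the `validate_features` helper are illustrative and not taken from the actual API.

```python
import json
import math
from typing import Any, Dict, List

# Hypothetical payload keys mapped to the vector lengths implied by the text.
EXPECTED_LENGTHS = {"embedding": 128, "landmarks": 33 * 4}


def validate_features(raw_body: str) -> Dict[str, List[float]]:
    """Parse a JSON request body and check each feature vector's length and values."""
    payload: Dict[str, Any] = json.loads(raw_body)
    features: Dict[str, List[float]] = {}
    for key, expected in EXPECTED_LENGTHS.items():
        values = payload.get(key)
        if not isinstance(values, list) or len(values) != expected:
            raise ValueError(f"{key!r} must be a list of {expected} numbers")
        vector = [float(v) for v in values]
        if not all(math.isfinite(v) for v in vector):
            raise ValueError(f"{key!r} contains NaN or infinite values")
        features[key] = vector
    return features


body = json.dumps({"embedding": [0.0] * 128, "landmarks": [0.5] * 132})
features = validate_features(body)
```

Rejecting malformed vectors at the API boundary keeps shape errors out of the SVM and XGBoost prediction calls.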

4. Duration Calculation

System tracks the time spent by the identified user in the classified pose/activity.

Duration is calculated when the user or the activity changes, or periodically.
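
The rule above (close a duration when the user or activity changes, or after a fixed period) can be sketched as a small state machine. The class name `DurationTracker`, the `Segment` record, and the 60-second flush interval are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    user_id: str
    activity: str
    start: float     # seconds since an arbitrary origin
    duration: float  # seconds


class DurationTracker:
    """Accumulates time per (user, activity) and closes a segment on change or flush."""

    def __init__(self, flush_interval: float = 60.0) -> None:
        self.flush_interval = flush_interval  # hypothetical periodic-flush period
        self._current: Optional[Tuple[str, str]] = None
        self._start = 0.0

    def observe(self, user_id: str, activity: str, now: float) -> List[Segment]:
        """Record one prediction at time `now`; return any segment that just closed."""
        key = (user_id, activity)
        if self._current is None:
            self._current, self._start = key, now
            return []
        changed = key != self._current
        due = now - self._start >= self.flush_interval
        if not (changed or due):
            return []
        closed = Segment(*self._current, start=self._start, duration=now - self._start)
        self._current, self._start = key, now
        return [closed]


tracker = DurationTracker(flush_interval=60.0)
closed: List[Segment] = []
for t, activity in [(0, "working"), (10, "working"), (25, "on_phone"), (90, "on_phone")]:
    closed += tracker.observe("123", activity, float(t))
```

In this run the tracker closes a 25-second `working` segment when the activity changes, then a 65-second `on_phone` segment once the flush interval has elapsed.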

5. Data Persistence

Activity summary (User ID, Activity Type, Calculated Duration, Timestamp) is sent to the backend.

Backend logic connects to Cassandra database.
Checks if user/date record exists; creates if new.
Updates `Daily Activity` table with the duration for the specific activity/user/date.
Updates `Total Activity` table, incrementing the cumulative duration for the activity/user.
Only JSON summary data is stored, adhering to privacy protocols.
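
The update logic in the steps above can be sketched with in-memory dictionaries standing in for the two Cassandra tables. This is an illustration only: the `record_summary` function, the key layout, and the assumption that daily durations accumulate (rather than overwrite) are not confirmed by the source, and a real backend would issue database queries instead.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

# In-memory stand-ins for the Daily Activity and Total Activity tables.
daily_activity: Dict[Tuple[str, str], Dict[str, float]] = {}
total_activity: DefaultDict[Tuple[str, str], float] = defaultdict(float)


def record_summary(user_id: str, date: str, activity: str, duration_s: float) -> None:
    """Apply one activity summary to the daily and cumulative records."""
    day_key = (user_id, date)
    if day_key not in daily_activity:  # create the user/date record if it is new
        daily_activity[day_key] = {}
    day = daily_activity[day_key]
    day[activity] = day.get(activity, 0.0) + duration_s
    total_activity[(user_id, activity)] += duration_s


record_summary("123", "2024-01-15", "working", 300.0)
record_summary("123", "2024-01-15", "working", 120.0)
record_summary("123", "2024-01-16", "working", 60.0)
```

After these three calls the daily record for 2024-01-15 holds 420 seconds of `working`, and the cumulative record holds 480 seconds.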

Training Workflow

1. Data Collection (Offline)

Collect and label video data representing different users and activities.

Extract pose landmarks (MediaPipe) and save with activity labels (e.g., 'working', 'on_phone').
Detect/crop faces (MTCNN/Haar), generate embeddings (FaceNet), and save with user labels.
Data typically stored in CSV format.
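
One possible CSV layout for the labeled data is a `label` column followed by one column per feature. The sketch below uses only the standard library; the `write_labeled_rows` helper, the `f0..fN` column names, and the sample values are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def write_labeled_rows(out, rows: Iterable[Tuple[str, List[float]]], n_features: int) -> int:
    """Write (label, feature vector) pairs as CSV with a header; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["label"] + [f"f{i}" for i in range(n_features)])
    count = 0
    for label, features in rows:
        if len(features) != n_features:
            raise ValueError(f"expected {n_features} features, got {len(features)}")
        writer.writerow([label] + list(features))
        count += 1
    return count


# Hypothetical pose samples: 33 landmarks x 4 values = 132 features each.
samples = [("working", [0.1] * 132), ("on_phone", [0.2] * 132)]
buffer = io.StringIO()  # stands in for a file opened with newline=""
written = write_labeled_rows(buffer, samples, n_features=132)
```

The same helper would work for face data by passing user labels and 128-value embeddings with `n_features=128`.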

2. Model Training

Train separate ML models using the collected, labeled data.

**Face Recognition Model:** Train an SVM on face embeddings and user labels. KNN and ANN classifiers were also tested; a linear SVM was chosen for its balance of efficiency and performance.
**Pose Classification Model:** Train XGBoost on pose landmarks and activity labels. Random Forest and ANN models were also tested; XGBoost was selected for its accuracy and speed.
Data split into training/validation/test sets for tuning and evaluation.
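
A reproducible three-way split can be sketched with the standard library alone. The 70/15/15 proportions, the fixed seed, and the `split_dataset` name are assumptions for illustration; the source does not state the actual ratios or tooling.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T], val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and split them into train/validation/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    n_train = len(shuffled) - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(list(range(100)))
```

The validation set is used for tuning and the test set is held back for the final evaluation, so the three lists must not share rows.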

3. Model Deployment

Save the trained models (e.g., using `joblib` or framework-specific methods).

Models are loaded by the backend (Flask API) to perform predictions on incoming data from the client-side application.

Technology Stack

Python

Flask

MediaPipe

MTCNN/Haar

FaceNet

Scikit-learn

SVM

XGBoost

OpenCV

TensorFlow/Keras

JavaScript

HTML5

CSS3

Cassandra

AWS

Nginx

Let's Encrypt

GitHub

Data Privacy Approach

Prioritizing Confidentiality

The system is designed with data minimization and privacy as core principles. Raw image or video data is never stored or transmitted off the user's local machine. Processing occurs as follows:

Pose landmarks and face embeddings are generated locally in the browser.
These mathematical representations are sent to the backend for prediction (if needed).
Only the final, aggregated results (e.g., `UserID: 123, Activity: Working, Duration: 300s, Timestamp: ...`) are stored in the database, as a JSON summary. All transmission to the backend uses HTTPS.
This approach limits the personal data that leaves the browser while still enabling productivity analysis.
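
A summary record of the kind described above could be serialized as follows. The field names (`user_id`, `activity`, `duration_s`, `timestamp`) and the `build_summary` helper are hypothetical; the source shows only the general shape of the record.

```python
import json
from datetime import datetime, timezone

SUMMARY_FIELDS = ("user_id", "activity", "duration_s", "timestamp")  # hypothetical names


def build_summary(user_id: str, activity: str, duration_s: float, when: datetime) -> str:
    """Serialize one activity summary as JSON containing only the four summary fields."""
    record = {
        "user_id": user_id,
        "activity": activity,
        "duration_s": round(duration_s, 1),
        "timestamp": when.astimezone(timezone.utc).isoformat(),
    }
    # Guard against extra fields (frames, landmarks, embeddings) slipping in.
    assert tuple(record) == SUMMARY_FIELDS
    return json.dumps(record)


payload = build_summary(
    "123", "working", 300.0, datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
)
```

Building the record from an explicit field list makes it easy to review exactly what is persisted.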

Data Management Strategy

Database Structure (Cassandra)

Connection established securely from the backend.
Checks for existing user records based on `employee_name` and `date`.
Adds new user/date combinations automatically.
Primary Key strategy likely involves `employee_name` and `date` for partitioning and clustering to optimize queries for daily records.

Key Data Tables

Daily Activity Table: Stores the duration of each classified activity for a specific user on a specific date. Allows for daily reporting.

Total Activity Table: Aggregates the total duration spent by each user on each activity across all time. Useful for long-term performance analysis.

Deployment Models

Cloud Deployment

Utilizes platforms like AWS EC2. Requires setup of a web server (Nginx as reverse proxy), application server (Gunicorn/Flask), database instance (Cassandra), and SSL certificate (Let's Encrypt) for HTTPS (essential for webcam access).

Alternative Platforms

Adaptable to other cloud providers (Azure, GCP) or PaaS solutions (Heroku, Replit) with appropriate configuration for dependencies and secure connections.

On-Premises Deployment

Can be hosted within an organization's internal network. Requires dedicated hardware, installation of all software dependencies (Python, DB, Web Server), and network configuration.

Potential Enhancements

Emotion Analysis

Integrating emotion detection models could provide insights into employee sentiment and well-being. By analyzing facial expressions for indicators of stress, happiness, or fatigue, the system could potentially offer anonymized, aggregated feedback to management regarding the overall work environment mood, helping to identify systemic issues or periods of high pressure.

Company Work Environment Management System Lite