Core Functionality
Identity Verification
Detects faces with MTCNN or Haar cascades, computes FaceNet embeddings, and matches them against registered profiles with an SVM to confirm user presence.
Pose Landmark Tracking
Detects 33 3D body pose landmarks (X, Y, Z, visibility) in real-time using MediaPipe for activity analysis.
Activity Classification
Classifies detected poses into predefined activities (e.g., working, on phone, looking away) using a trained model (XGBoost).
Automated Record Management
Calculates activity durations and automatically updates daily and cumulative records in the database (Cassandra).
Browser-Based Operation
Core processing (landmark/embedding extraction) and interaction occur within the user's web browser via JavaScript.
Privacy-Focused Transmission
Raw images and video never leave the browser. Only derived features (for prediction) and periodic JSON activity summaries are sent to the backend server.
System Architecture: Processing Flow
1. Frame Acquisition
Video frames are captured from the user's webcam using the browser's `getUserMedia` API.
2. Feature Extraction (Client-Side)
Key visual features are extracted from each frame directly within the browser.
- Pose Landmarks: MediaPipe JS library calculates 33 3D pose landmarks.
- Face Detection: MTCNN or Haar Cascade (via JS library/WASM) detects face bounding box.
- Face Cropping: Face region is isolated from the frame.
- Face Embedding: FaceNet model (via TensorFlow.js or similar) generates a 128-dimensional vector representing facial features.
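As a rough illustration of the pose features, 33 landmarks with four values each flatten into a 132-element vector. The sketch below is written in Python for brevity even though the extraction itself runs in the browser; the `Landmark` type and the `flatten_landmarks` helper are hypothetical names, not part of MediaPipe.

```python
from typing import List, NamedTuple

NUM_LANDMARKS = 33  # number of pose landmarks stated above


class Landmark(NamedTuple):
    """Hypothetical container for one pose landmark."""
    x: float
    y: float
    z: float
    visibility: float


def flatten_landmarks(landmarks: List[Landmark]) -> List[float]:
    """Flatten 33 landmarks into [x0, y0, z0, v0, x1, ...] (132 values)."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features: List[float] = []
    for lm in landmarks:
        features.extend((lm.x, lm.y, lm.z, lm.visibility))
    return features
```

A fixed-length vector like this is the kind of input a tabular classifier such as XGBoost expects.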
3. Prediction & Analysis
Extracted features are used for recognition and classification, potentially involving backend communication.
- Face Recognition: Embedding sent to backend API; compared against database using trained SVM model to identify user.
- Pose Classification: Pose landmarks sent to backend API; classified into predefined activities using trained XGBoost model.
- Backend likely implemented using Flask (Python). Communication via secure AJAX/WebSocket.
4. Duration Calculation
System tracks the time spent by the identified user in the classified pose/activity.
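One way to derive durations is to treat each classified frame as a timestamped sample and credit the gap between consecutive samples to the earlier sample's activity. This is a minimal sketch under that assumption; the function name, the `max_gap` cutoff, and the sample format are illustrative and not taken from the system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accumulate_durations(
    samples: Iterable[Tuple[float, str]],
    max_gap: float = 5.0,
) -> Dict[str, float]:
    """Sum seconds per activity from (timestamp_seconds, activity) samples.

    Each interval between consecutive samples is credited to the earlier
    sample's activity. Gaps longer than max_gap (an assumed cutoff) are
    ignored so that pauses in the video stream are not counted.
    """
    totals: Dict[str, float] = defaultdict(float)
    previous = None
    for timestamp, activity in sorted(samples):
        if previous is not None:
            prev_time, prev_activity = previous
            gap = timestamp - prev_time
            if 0 < gap <= max_gap:
                totals[prev_activity] += gap
        previous = (timestamp, activity)
    return dict(totals)
```

The cutoff matters in practice: without it, a user who closes the tab for an hour would have that hour credited to whatever activity was seen last.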
5. Data Persistence
Activity summary (User ID, Activity Type, Calculated Duration, Timestamp) is sent to the backend.
- Backend logic connects to Cassandra database.
- Checks if user/date record exists; creates if new.
- Updates `Daily Activity` table with the duration for the specific activity/user/date.
- Updates `Total Activity` table, incrementing the cumulative duration for the activity/user.
- Only JSON summary data is stored, adhering to privacy protocols.
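The update logic above can be modelled without a database. In this sketch, plain dictionaries stand in for the `Daily Activity` and `Total Activity` tables; the `record_activity` function and the key layout are assumptions for illustration, not the actual Cassandra schema.

```python
from typing import Dict, Tuple

DailyKey = Tuple[str, str, str]   # (user_id, date, activity)
TotalKey = Tuple[str, str]        # (user_id, activity)


def record_activity(
    daily: Dict[DailyKey, float],
    total: Dict[TotalKey, float],
    user_id: str,
    date: str,
    activity: str,
    duration_s: float,
) -> None:
    """Add a duration to the daily and cumulative records, creating them if absent."""
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    daily_key = (user_id, date, activity)
    total_key = (user_id, activity)
    # dict.get with a default covers the "create if new" case.
    daily[daily_key] = daily.get(daily_key, 0.0) + duration_s
    total[total_key] = total.get(total_key, 0.0) + duration_s
```

Calling it twice for the same user and activity on different dates yields two daily entries and one cumulative entry holding the sum.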
Training Workflow
1. Data Collection (Offline)
Collect and label video data representing different users and activities.
- Extract pose landmarks (MediaPipe) and save with activity labels (e.g., 'working', 'on_phone').
- Detect/crop faces (MTCNN/Haar), generate embeddings (FaceNet), and save with user labels.
- Data typically stored in CSV format.
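A labeled pose dataset in CSV form might look like the output of the sketch below, with one row per frame: 132 landmark values (33 landmarks with x, y, z, and visibility) followed by the activity label. The column names and the `write_pose_rows` helper are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Sequence, TextIO, Tuple

NUM_FEATURES = 33 * 4  # 33 landmarks x (x, y, z, visibility)


def pose_header() -> List[str]:
    """Column names x0, y0, z0, v0, ..., v32, label (assumed naming)."""
    cols = [f"{axis}{i}" for i in range(33) for axis in ("x", "y", "z", "v")]
    return cols + ["label"]


def write_pose_rows(out: TextIO, rows: Iterable[Tuple[Sequence[float], str]]) -> int:
    """Write (features, label) pairs as CSV and return the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(pose_header())
    count = 0
    for features, label in rows:
        if len(features) != NUM_FEATURES:
            raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
        writer.writerow(list(features) + [label])
        count += 1
    return count


# Usage: write one dummy frame labeled 'working' to an in-memory buffer.
buffer = io.StringIO()
write_pose_rows(buffer, [([0.0] * NUM_FEATURES, "working")])
```

The face-embedding dataset could follow the same pattern, with 128 embedding columns and a user label in place of the activity label.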
2. Model Training
Train separate ML models using the collected, labeled data.
- **Face Recognition Model:** Train an SVM on face embeddings and user labels. KNN and ANN classifiers were also tested; a linear SVM was chosen for its balance of efficiency and performance.
- **Pose Classification Model:** Train XGBoost on pose landmarks and activity labels. Random Forest and ANN classifiers were also tested; XGBoost was selected for its accuracy and speed.
- Data split into training/validation/test sets for tuning and evaluation.
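The three-way split can be sketched with the standard library alone. This is a simple shuffled split under assumed 70/15/15 proportions, which the source does not specify; it is not stratified by label, and in a scikit-learn workflow `train_test_split` would be the more usual tool.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T],
    train_frac: float = 0.7,
    val_frac: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and split into train/validation/test lists."""
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty test share")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after train and val
    return train, val, test
```

Fixing the seed keeps the test set identical across runs, so scores from different candidate models stay comparable.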
3. Model Deployment
Save the trained models (e.g., using `joblib` or framework-specific methods).
Technology Stack
Python
Flask
MediaPipe
MTCNN/Haar
FaceNet
Scikit-learn
SVM
XGBoost
OpenCV
TensorFlow/Keras
JavaScript
HTML5
CSS3
Cassandra
AWS
Nginx
Let's Encrypt
GitHub
Data Privacy Approach
Prioritizing Confidentiality
The system is designed with data minimization and privacy as core principles. Raw image or video data is never stored or transmitted off the user's local machine. Processing occurs as follows:
- Pose landmarks and face embeddings are generated locally in the browser.
- These mathematical representations are sent to the backend for prediction (if needed).
- Only the final, aggregated results (e.g., `UserID: 123, Activity: Working, Duration: 300s, Timestamp: ...`) are securely transmitted (via HTTPS) and stored in the database as a JSON summary.
- Because no raw imagery is stored or transmitted, this approach supports compliance with privacy standards while still enabling productivity analysis.
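The summary record described above could be built and checked as in the following sketch. The field names, the `build_summary` and `validate_summary` helpers, and the allow-list check are assumptions for illustration; the source only specifies the four kinds of values carried.

```python
import json
from datetime import datetime, timezone

# Assumed field names for the four values listed in the summary example.
ALLOWED_FIELDS = ("user_id", "activity", "duration_s", "timestamp")


def build_summary(user_id: str, activity: str, duration_s: float, when: datetime) -> str:
    """Serialize one activity summary as JSON containing only the allowed fields."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    payload = {
        "user_id": user_id,
        "activity": activity,
        "duration_s": round(duration_s, 1),
        "timestamp": when.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(payload, sort_keys=True)


def validate_summary(raw: str) -> dict:
    """Parse a summary and reject any field outside the allowed set."""
    data = json.loads(raw)
    extra = set(data) - set(ALLOWED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return data
```

Rejecting unknown fields on receipt is one way to enforce data minimization: a client that accidentally attaches frame data fails validation instead of being stored.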
Data Management Strategy
Database Structure (Cassandra)
- Connection established securely from the backend.
- Checks for existing user records based on `employee_name` and `date`.
- Adds new user/date combinations automatically.
- Primary Key strategy likely involves `employee_name` and `date` for partitioning and clustering to optimize queries for daily records.
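One possible shape for such a key is sketched below as CQL strings held in Python constants. Everything here is an assumption rather than the project's schema: the table name, the `activity` and `duration_seconds` columns, the use of a counter column, and the choice of `employee_name` as partition key with the date as a clustering column. The date column is named `activity_date` to keep it distinct from the CQL `date` type.

```python
from typing import Tuple

# Hypothetical daily table: one partition per employee, rows ordered by date.
CREATE_DAILY_ACTIVITY = """
CREATE TABLE IF NOT EXISTS daily_activity (
    employee_name text,
    activity_date date,
    activity text,
    duration_seconds counter,
    PRIMARY KEY ((employee_name), activity_date, activity)
)
"""

# Counter updates create the row if it does not exist yet.
# The ? placeholders are intended for a prepared statement.
UPDATE_DAILY_ACTIVITY = (
    "UPDATE daily_activity SET duration_seconds = duration_seconds + ? "
    "WHERE employee_name = ? AND activity_date = ? AND activity = ?"
)


def daily_update_params(
    employee_name: str, activity_date: str, activity: str, seconds: int
) -> Tuple[int, str, str, str]:
    """Return bind values in the placeholder order of UPDATE_DAILY_ACTIVITY."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return (seconds, employee_name, activity_date, activity)
```

With this layout, a query for one employee's day reads a single partition. A counter column also makes the explicit existence check unnecessary, which differs from the check-then-create flow described above.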
Key Data Tables
Deployment Models
Potential Enhancements
Emotion Analysis
Integrating emotion detection models could provide insights into employee sentiment and well-being. By analyzing facial expressions for indicators of stress, happiness, or fatigue, the system could potentially offer anonymized, aggregated feedback to management regarding the overall work environment mood, helping to identify systemic issues or periods of high pressure.