1. High-Level System Architecture

This diagram outlines the overarching infrastructure, showing how the client layer interacts with the Django backend and the underlying data storage systems.

graph TD
    subgraph Client Layer
        A[React SPA Vite]
        B[Browser Extension MV3]
    end

    subgraph Backend Layer
        C[Django Ninja API]
        D[Celery Workers]
        E[Django Admin]
    end

    subgraph Data Layer
        F[(PostgreSQL 16\npg_trgm, pgvector, full-text search)]
        G[(Redis\nBroker, Cache, Session)]
    end

    A -- HTTPS / REST --> C
    B -- HTTPS / REST --> C
    
    C --> F
    C --> G
    
    D --> F
    D --> G
    
    C -. Enqueues Async Tasks .-> D
    
    classDef default fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef storage fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px;
    class F,G storage;

2. Core Entity-Relationship Diagram (ERD)

This mid-level diagram maps out the core Django database models and how they relate to one another, particularly focusing on how users, words, and the spaced repetition system (SRS) intersect.

erDiagram
    USER ||--o{ USER_WORD_KNOWLEDGE : "tracks status of"
    USER ||--o{ FLASHCARD : "owns"
    USER ||--o{ DECK : "creates"
    USER ||--o{ USER_DECK_SUBSCRIPTION : "subscribes to"
    
    WORD ||--o{ WORD_SENSE : "has meaning"
    WORD ||--o{ WORD_RELATIONSHIP : "relates to"
    
    WORD_SENSE ||--o{ EXAMPLE_SENTENCE : "provides context"
    WORD_SENSE ||--o{ CONTEXT_EXAMPLE : "found in"
    
    DECK ||--o{ DECK_WORD : "contains"
    DECK_WORD }o--|| WORD : "references"
    
    FLASHCARD ||--|| SPACED_REPETITION_DATA : "has FSRS state"
    FLASHCARD }o--|| WORD_SENSE : "tests"
    FLASHCARD }o--o| CONTEXT_EXAMPLE : "uses context"

    DOCUMENT_READING_PROGRESS }o--|| USER : "saves state for"

3. Flashcard Creation & SRS Initialization Pipeline

This low-level sequence diagram details the exact flow of data when a user adds a word to their library, showing the database transaction that links the word, the user’s knowledge state, and the Spaced Repetition System (SRS).

sequenceDiagram
    actor User
    participant React UI
    participant Django API
    participant PostgreSQL
    
    User->>React UI: Clicks "Add to Library"
    React UI->>Django API: POST /api/learning/flashcards/
    
    activate Django API
    Django API->>PostgreSQL: BEGIN TRANSACTION
    
    Django API->>PostgreSQL: 1. Create Flashcard
    Django API->>PostgreSQL: 2. Upsert UserWordKnowledge (status='learning')
    Django API->>PostgreSQL: 3. Create SpacedRepetitionData (state=0 / New)
    
    opt Custom context provided
        Django API->>PostgreSQL: 4. Update ContextExample (increment use_count)
    end
    
    Django API->>PostgreSQL: COMMIT
    Django API-->>React UI: Return {flashcard_id, srs_state, due_date}
    deactivate Django API
    
    React UI->>React UI: Invalidate RTK Query known-word cache
    React UI-->>User: Show success toast
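
As a rough illustration of steps 1 to 3, the sketch below runs the same three writes inside one transaction. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL and the Django ORM, and the table and column names are hypothetical rather than MindLexa's actual schema. The optional ContextExample update (step 4) is left out, and making a new card due immediately is an assumption.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified tables; the real Django models will differ.
SCHEMA = """
CREATE TABLE flashcard (id INTEGER PRIMARY KEY, user_id INTEGER, word_id INTEGER);
CREATE TABLE user_word_knowledge (
    user_id INTEGER, word_id INTEGER, status TEXT,
    PRIMARY KEY (user_id, word_id)
);
CREATE TABLE spaced_repetition_data (
    flashcard_id INTEGER PRIMARY KEY, state INTEGER, due_date TEXT
);
"""


def create_flashcard(conn, user_id, word_id):
    """Run the three writes atomically; an exception rolls all of them back."""
    due = datetime.now(timezone.utc).isoformat()  # assumption: due right away
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO flashcard (user_id, word_id) VALUES (?, ?)",
            (user_id, word_id),
        )
        flashcard_id = cur.lastrowid
        # Upsert: insert the knowledge row, or flip an existing one to 'learning'.
        conn.execute(
            "INSERT INTO user_word_knowledge (user_id, word_id, status) "
            "VALUES (?, ?, 'learning') "
            "ON CONFLICT (user_id, word_id) DO UPDATE SET status = 'learning'",
            (user_id, word_id),
        )
        # State 0 is "New", as in the diagram.
        conn.execute(
            "INSERT INTO spaced_repetition_data (flashcard_id, state, due_date) "
            "VALUES (?, 0, ?)",
            (flashcard_id, due),
        )
    return {"flashcard_id": flashcard_id, "srs_state": 0, "due_date": due}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
result = create_flashcard(conn, user_id=1, word_id=42)
```

Because the knowledge row is upserted, adding a second card for the same word leaves a single UserWordKnowledge row per user and word.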

4. Async PDF Processing & OCR Flow

This flowchart illustrates the asynchronous pipeline handled by Celery when a user uploads a PDF. It highlights the decision tree for triggering OCR and how words are mapped to physical page locations.

flowchart TD
    Start([User Uploads PDF]) --> Dedupe{File Hash\nExists?}
    
    Dedupe -- Yes --> ReturnExisting[Return existing Document ID]
    Dedupe -- No --> S3[Store in Object Storage S3/R2]
    
    S3 --> CreateDoc[Create UserDocument\nstatus='pending']
    CreateDoc --> Enqueue[Enqueue Celery Task:\nprocess_pdf]
    Enqueue --> Download
    
    subgraph Celery Worker
        Download[Download PDF to Temp Dir]
        Download --> Extract[PyMuPDF: Extract Page Text]
        
        Extract --> CheckLength{Is text length\n< 50 chars?}
        CheckLength -- Yes --> OCR[Run Tesseract OCR]
        CheckLength -- No --> Tokenize[Tokenize & Lemmatize]
        OCR --> Tokenize
        
        Tokenize --> Lookup[(Bulk Lookup in Word Table)]
        Lookup --> JSONB[Build word_positions JSONB dict]
        JSONB --> Update[Update Document\nstatus='ready']
    end
    
    Update --> Notify([Frontend Polls & Renders PDF])
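
The two decision points in this flow (deduplication by file hash and the OCR fallback) could look roughly like the sketch below. SHA-256 is an assumption, since the diagram only says "File Hash"; the in-memory `documents` dict stands in for the UserDocument table; and stripping whitespace before counting characters is also an assumption. Only the 50-character threshold comes from the diagram.

```python
import hashlib

OCR_MIN_CHARS = 50  # threshold taken from the flowchart


def file_hash(data: bytes) -> str:
    """Content hash used as the dedupe key (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def register_upload(data: bytes, documents: dict) -> tuple:
    """Return (document_id, created). `documents` maps hash -> id (hypothetical store)."""
    digest = file_hash(data)
    if digest in documents:
        return documents[digest], False  # same bytes seen before: reuse the document
    doc_id = len(documents) + 1
    documents[digest] = doc_id
    return doc_id, True


def needs_ocr(page_text: str, min_chars: int = OCR_MIN_CHARS) -> bool:
    """Fall back to OCR when the extracted text layer is (nearly) empty."""
    return len(page_text.strip()) < min_chars
```

A scanned page usually yields little or no text from direct extraction, which is why a short result is treated as the signal to run OCR.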

5. Text-to-Flashcard Analysis Pipeline

This diagram shows the text-processing logic used when a user pastes raw English text, filtering it down to the most relevant, unknown vocabulary words using Natural Language Processing (NLP).

flowchart LR
    A([Raw Pasted Text]) --> B[NLTK: Tokenize Words]
    B --> C[Remove Stop Words]
    C --> D[WordNet: Lemmatize]
    D --> E[Deduplicate Lemmas]
    
    E --> F[(Database: Bulk Word Lookup)]
    F --> G[(Database: Fetch User's Known Word IDs)]
    
    G --> H{Is Word in\nKnown IDs?}
    H -- Yes --> I[Mark 'Known']
    H -- No --> J[Mark 'Unknown']
    
    I --> K[Attach Best WordSense]
    J --> K
    
    K --> L[Sort by frequency_rank ascending]
    L --> M([Return Ranked Interactive Checklist])
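
To make the order of operations concrete, here is a minimal, dependency-free sketch of the same pipeline. The regex tokenizer, the tiny stop-word set, and the `LEMMAS` dictionary are toy stand-ins for NLTK and WordNet; `WordRow` and the sample data are hypothetical; and the "Attach Best WordSense" step is omitted.

```python
import re
from dataclasses import dataclass

# Toy stand-ins for NLTK's stop-word list and the WordNet lemmatizer (assumptions).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "were"}
LEMMAS = {"spenders": "spender", "habits": "habit"}


@dataclass(frozen=True)
class WordRow:  # hypothetical shape of a row in the Word table
    id: int
    text: str
    frequency_rank: int


def analyze(text, word_table, known_ids):
    """Tokenize, drop stop words, lemmatize, dedupe, look up, label, and rank."""
    tokens = re.findall(r"[a-z]+", text.lower())
    content = [t for t in tokens if t not in STOP_WORDS]
    # dict.fromkeys removes duplicates while keeping first-seen order
    lemmas = dict.fromkeys(LEMMAS.get(t, t) for t in content)
    rows = [word_table[lemma] for lemma in lemmas if lemma in word_table]
    rows.sort(key=lambda row: row.frequency_rank)  # ascending, as in the diagram
    return [(row.text, "known" if row.id in known_ids else "unknown") for row in rows]


word_table = {
    "profligate": WordRow(1, "profligate", 18000),
    "spender": WordRow(2, "spender", 9000),
    "habit": WordRow(3, "habit", 1500),
}
checklist = analyze(
    "The profligate spenders were profligate in habits.", word_table, known_ids={3}
)
```

With this sample data the checklist lists "habit" first (lowest frequency_rank) and labels it known, followed by the two unknown words.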

6. Flashcard FSRS State Machine

This state diagram shows the lifecycle of a flashcard under the Free Spaced Repetition Scheduler (FSRS), based on the defined STATE_CHOICES.

stateDiagram-v2
    [*] --> New : Created (State 0)
    
    New --> Learning : Rated Again/Hard
    New --> Review : Rated Good/Easy (Mastery Threshold)
    
    Learning --> Learning : Rated Again
    Learning --> Review : Graduated
    
    Review --> Relearning : Lapsed (Rated Again)
    Review --> Review : Rated Hard/Good/Easy
    
    Relearning --> Review : Re-graduated
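
The transitions in the diagram can be written down as a small function, shown here as a sketch rather than the project's actual scheduler. State 0 for New comes from the diagram and the rating numbers 1 to 4 from the study-session flowchart; the numeric values for the other three states are assumptions. The diagram does not say what triggers graduation, so it is passed in as a hypothetical `graduated` flag.

```python
from enum import IntEnum


class State(IntEnum):
    NEW = 0
    LEARNING = 1    # assumed value
    REVIEW = 2      # assumed value
    RELEARNING = 3  # assumed value


class Rating(IntEnum):
    AGAIN = 1
    HARD = 2
    GOOD = 3
    EASY = 4


def next_state(state, rating, graduated=False):
    """Return the next state for a card, following the diagram's edges."""
    if state == State.NEW:
        if rating in (Rating.AGAIN, Rating.HARD):
            return State.LEARNING
        return State.REVIEW
    if state == State.LEARNING:
        # "Again" always keeps the card in Learning; otherwise it may graduate.
        if rating != Rating.AGAIN and graduated:
            return State.REVIEW
        return State.LEARNING
    if state == State.REVIEW:
        return State.RELEARNING if rating == Rating.AGAIN else State.REVIEW
    # RELEARNING: stays put until re-graduated
    return State.REVIEW if graduated else State.RELEARNING
```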

7. Frontend Application Architecture

This diagram breaks down the React Single Page Application (SPA) structure, showing how features, shared components, and the Redux store are organized.

graph TD
    App[src/app] --> Router[router.tsx]
    App --> Store[store.ts - Redux]
    
    subgraph Features [src/features]
        Dict[dictionary]
        Flash[flashcards]
        Decks[decks]
        PDF[pdfReader]
        Text2Flash[textToFlashcard]
        Auth[auth]
    end
    
    subgraph Shared [src/shared]
        UI[components - WordCard, etc.]
        Hooks[hooks - useWordLookup]
        API[api - RTK Query]
    end
    
    subgraph Workers [src/workers]
        CW[colorize.worker.ts]
    end

    Router --> Features
    Features --> Shared
    PDF --> CW

8. Browser Extension Lookup Sequence

This sequence diagram illustrates the privacy-first approach of the browser extension when looking up a word, checking local caches before hitting the backend.

sequenceDiagram
    participant Webpage
    participant Extension
    participant LocalCache
    participant MindLexa API
    
    Webpage->>Extension: User hovers over word "profligate"
    
    Extension->>LocalCache: Check Stop-word List
    Extension->>LocalCache: Check 'ignored_words'
    Extension->>LocalCache: Check 'known_word_ids' Set
    Extension->>LocalCache: Check 'definition_cache'
    
    opt Not found in any cache
        Extension->>MindLexa API: GET /api/extension/lookup/?word=profligate
        MindLexa API-->>Extension: Return definition & senses
    end
    
    Extension->>Webpage: Render Popup (Definition, Phonetic, Actions)

9. Study Session Interaction Loop

This flowchart maps the user experience during an active flashcard review session, demonstrating how the queue is managed and FSRS calculations are triggered.

flowchart TD
    Start([User Clicks 'Study Now']) --> Fetch[GET 20 Due Cards\nordered by due_date]
    Fetch --> Initialize[Initialize Session Queue in Redux]
    
    Initialize --> ShowFront[Show Card Front:\nWord, Context]
    ShowFront --> Flip{User Flips Card}
    Flip --> ShowBack[Show Card Back:\nMeaning, POS, Notes]
    
    ShowBack --> Rate{User Selects Rating}
    Rate --> |Again 1| FSRS1[FSRS calculates new state]
    Rate --> |Hard 2| FSRS2[FSRS calculates new state]
    Rate --> |Good 3| FSRS3[FSRS calculates new state]
    Rate --> |Easy 4| FSRS4[FSRS calculates new state]
    
    FSRS1 --> QueueBack[Send to Back of Queue]
    FSRS2 --> Advance
    FSRS3 --> Advance
    FSRS4 --> Advance
    
    Advance[Advance to Next Card] --> CheckEmpty{Is Queue Empty?}
    QueueBack --> CheckEmpty
    
    CheckEmpty -- No --> ShowFront
    CheckEmpty -- Yes --> End([End Session: Show Summary])
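
The queue handling in this loop is simple enough to sketch in a few lines. The version below assumes the queue holds card IDs and that a hypothetical `rate` callback returns the user's rating; the FSRS recalculation itself is out of scope and not shown.

```python
from collections import deque

AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4  # rating values from the flowchart


def run_session(card_ids, rate):
    """Review cards until the queue is empty; 'Again' re-queues the card at the back."""
    queue = deque(card_ids)
    history = []
    while queue:
        card = queue.popleft()
        rating = rate(card)  # hypothetical callback: show the card, return the rating
        history.append((card, rating))
        if rating == AGAIN:
            queue.append(card)  # send to back of queue
    return history


# Scripted ratings for a three-card session: card 1 is failed once, then passed.
scripted = {1: iter([AGAIN, GOOD]), 2: iter([EASY]), 3: iter([HARD])}
history = run_session([1, 2, 3], lambda card: next(scripted[card]))
```

Note that a card rated "Again" every time would keep the session open indefinitely, so a real implementation may want a per-card retry cap.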

10. Curriculum and Deck Class Diagram

This class diagram highlights the relationships in the learning module, specifically how official hierarchies (NCTB, IELTS) map to decks and user subscriptions.

classDiagram
    class Deck {
        +String title
        +Boolean is_official
        +Boolean is_public
        +String board
        +String class_level
        +String subject
    }
    class DeckWord {
        +Integer order
        +Datetime added_at
    }
    class UserDeckSubscription {
        +Datetime subscribed_at
        +Datetime last_accessed
    }
    class Word {
        +String text
        +Integer frequency_rank
    }
    class User {
        +String username
    }

    User "1" -- "0..*" Deck : creates
    User "1" -- "0..*" UserDeckSubscription : has
    Deck "1" -- "0..*" UserDeckSubscription : is followed by
    Deck "1" -- "0..*" DeckWord : contains
    DeckWord "0..*" -- "1" Word : links to

11. PDF Colorization Web Worker Logic

This diagram details how the main thread and web worker collaborate to highlight known/unknown words in a PDF without freezing the UI.

flowchart LR
    A[Main Thread: PDF.js Page Render] --> B[Extract text tokens]
    B --> C(Post message to\ncolorize.worker.ts)
    
    subgraph Web Worker
        C --> D{Does token exist in\nknownWordIds Set?}
        D -- Yes --> E[Assign 'green' highlight]
        D -- No --> F{Is token in Dictionary?}
        F -- Yes --> G[Assign 'yellow' highlight]
        F -- No --> H[Assign no highlight]
    end
    
    E --> I[Return ColorMap]
    G --> I
    H --> I
    
    I --> J(Post message to Main Thread)
    J --> K[Apply CSS Overlays to DOM]
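
The worker's decision logic is shown below as a Python sketch for readability; the real worker would be TypeScript in colorize.worker.ts. It assumes the worker receives a token-to-ID dictionary (`word_ids`, hypothetical) alongside the known-ID set, because the diagram does not say how tokens are mapped to word IDs.

```python
def colorize(tokens, word_ids, known_word_ids):
    """Map each token to 'green' (known), 'yellow' (in dictionary), or None."""
    color_map = {}
    for token in tokens:
        word_id = word_ids.get(token.lower())  # None when not in the dictionary
        if word_id is not None and word_id in known_word_ids:
            color_map[token] = "green"
        elif word_id is not None:
            color_map[token] = "yellow"
        else:
            color_map[token] = None  # no highlight
    return color_map


color_map = colorize(
    ["Habit", "profligate", "xyzzy"],
    word_ids={"habit": 3, "profligate": 1},
    known_word_ids={3},
)
```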

12. Dictionary Search Logic

This flowchart illustrates the backend decision process when a user searches for a term, detailing the fallback from exact match to trigram similarity.

flowchart TD
    Query([User Search Query]) --> Exact{Exact match?\ntext__iexact=query}
    
    Exact -- Found --> Prefetch[Prefetch Senses & Examples]
    Prefetch --> Return[Return 200: Word Detail]
    
    Exact -- Not Found --> Trigram{PostgreSQL Trigram Search\nsimilarity > 0.3}
    
    Trigram -- Found --> Suggestions[Return 404: List of Suggestions]
    Trigram -- Not Found --> Banglish{Banglish Search\nbanglish_keywords}
    
    Banglish -- Found --> Suggestions
    Banglish -- Not Found --> Empty[Return 404: No Results]
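
The fallback chain can be sketched in plain Python as below. The `similarity` function approximates pg_trgm for single words (lowercase, two leading spaces and one trailing space of padding, shared trigrams divided by all distinct trigrams); in the real backend PostgreSQL would compute this. The in-memory `words` and `banglish_index` dictionaries, and the sample entries in them, are made up for illustration.

```python
def trigrams(word):
    """Trigram set of a single word, padded the way pg_trgm pads words."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def search(query, words, banglish_index, threshold=0.3):
    """Exact match, then trigram suggestions, then Banglish keywords."""
    q = query.strip().lower()
    if q in words:
        return 200, {"word": q, "detail": words[q]}
    scored = sorted(((similarity(q, w), w) for w in words), reverse=True)
    suggestions = [w for score, w in scored if score > threshold]
    if not suggestions:
        suggestions = banglish_index.get(q, [])
    return 404, {"suggestions": suggestions}  # empty list means "No Results"


# Made-up sample data
words = {"profligate": "recklessly extravagant", "prolific": "highly productive"}
banglish_index = {"opochoyi": ["profligate"]}

typo_result = search("profilgate", words, banglish_index)
```

Here the misspelling "profilgate" shares 7 of 15 distinct trigrams with "profligate" (about 0.47), which clears the 0.3 threshold, while "prolific" does not.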

13. System Scalability & Deployment Infrastructure

This high-level physical architecture graph shows how the system is designed to scale using load balancers, caching, and background workers.

graph TB
    Internet((Internet)) --> CDN[Cloudflare / CloudFront CDN]
    CDN --> LB[Load Balancer]
    
    subgraph App Servers
        G1[Gunicorn Worker 1]
        G2[Gunicorn Worker 2]
        G3[Gunicorn Worker 3]
    end

    PgBouncer[PgBouncer Connection Pool]

    subgraph Data Tier
        DB[(PostgreSQL Primary)]
        DBReplica[(PostgreSQL Replica)]
        Redis1[(Redis: Cache)]
        Redis2[(Redis: Celery Broker)]
    end

    subgraph Background Processing
        Celery1[Celery: General Tasks]
        Celery2[Celery: OCR Queue]
    end

    LB --> G1 & G2 & G3
    G1 & G2 & G3 --> PgBouncer
    G1 & G2 & G3 --> Redis1
    G1 & G2 & G3 --> Redis2
    PgBouncer --> DB
    DB -.-> DBReplica
    Redis2 --> Celery1 & Celery2
    Celery1 & Celery2 --> DB

14. Data Import Phase Plan

This Gantt chart represents the proposed schedule for seeding the database with Wiktionary data, generating AI meanings, and manually curating NCTB lists.

gantt
    title MindLexa Data Import Strategy
    dateFormat  YYYY-MM-DD
    axisFormat  %W
    
    section Automated Imports
    Seed Frequency Ranks :done, 2024-01-01, 7d
    Import Wiktionary English Data :done, 2024-01-01, 7d
    Import IELTS/SAT Wordlists :active, 2024-01-08, 5d
    
    section AI Processing
    Batch generate Top 10k Bangla Meanings :active, 2024-01-08, 7d
    Generate pgvector Embeddings :2024-01-15, 14d
    
    section Manual Curation
    Human review of AI top 1k words :2024-01-15, 7d
    Manual NCTB Vocabulary Entry :2024-01-15, 21d

15. User Word Knowledge (Source of Truth) ERD

This specialized ERD focuses solely on UserWordKnowledge, the central table that determines whether a user knows a word and keeps each user's learning state separate from the global vocabulary.

erDiagram
    USER ||--o{ USER_WORD_KNOWLEDGE : "has specific status for"
    WORD ||--o{ USER_WORD_KNOWLEDGE : "is tracked by"
    
    USER_WORD_KNOWLEDGE {
        int user_id FK
        int word_id FK
        enum status "unknown, seen, learning, known, ignored"
        datetime first_encountered
        datetime last_encountered
        int encounter_count
        int confidence "0-100"
    }

    USER {
        string native_language
        string proficiency_level
    }

    WORD {
        string text
        int frequency_rank
    }
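
For readers who prefer code to ERD notation, the same table can be sketched as a Python dataclass. The field names, status values, and the 0 to 100 confidence range come from the diagram; the defaults, the `record_encounter` helper, the promotion from "unknown" to "seen" on first encounter, and `known_word_ids` are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(str, Enum):
    UNKNOWN = "unknown"
    SEEN = "seen"
    LEARNING = "learning"
    KNOWN = "known"
    IGNORED = "ignored"


@dataclass
class UserWordKnowledge:
    user_id: int
    word_id: int
    status: Status = Status.UNKNOWN
    first_encountered: Optional[datetime] = None
    last_encountered: Optional[datetime] = None
    encounter_count: int = 0
    confidence: int = 0  # 0-100, as in the ERD

    def __post_init__(self):
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")

    def record_encounter(self, when=None):
        """Hypothetical helper: bump counters when the user meets the word."""
        when = when or datetime.now(timezone.utc)
        if self.first_encountered is None:
            self.first_encountered = when
        self.last_encountered = when
        self.encounter_count += 1
        if self.status is Status.UNKNOWN:  # assumed promotion rule
            self.status = Status.SEEN


def known_word_ids(rows, user_id):
    """IDs of the words a given user has marked as known."""
    return {r.word_id for r in rows if r.user_id == user_id and r.status is Status.KNOWN}
```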

16. User Onboarding & Authentication Flow

This state diagram illustrates the user journey from registration through profile configuration, highlighting the JWT lifecycle and the setup steps required before the user can access the main dashboard.

stateDiagram-v2
    [*] --> Guest : Unauthenticated
    
    Guest --> Registration : Submits Email/Pass
    Registration --> EmailVerification : Account Created
    
    EmailVerification --> Login : Clicks Verify Link
    Guest --> Login : Existing User
    
    Login --> SetupProfile : Returns JWT (15m) & Refresh (30d)
    
    state SetupProfile {
        [*] --> SetProficiency : "Absolute Beginner" to "Advanced"
        SetProficiency --> SetNativeLang : Default "bn"
        SetNativeLang --> SetDailyGoal : e.g., 20 words/day
    }
    
    SetupProfile --> Authenticated : Profile Complete
    
    Authenticated --> Dashboard : Fetches Initial State
    Authenticated --> Login : Refresh Token Expires (30d)
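
The token lifetimes in the diagram imply a simple client-side decision, sketched below. It assumes both tokens are issued at the same moment at login and that the refresh token is not rotated; the function and return values are hypothetical names, not part of any JWT library.

```python
from datetime import datetime, timedelta, timezone

ACCESS_LIFETIME = timedelta(minutes=15)  # from the diagram
REFRESH_LIFETIME = timedelta(days=30)    # from the diagram


def session_action(issued_at, now):
    """Decide what the client should do, given when the tokens were issued."""
    age = now - issued_at
    if age < ACCESS_LIFETIME:
        return "use_access_token"
    if age < REFRESH_LIFETIME:
        return "refresh_access_token"
    return "login_required"  # refresh token expired: back to Login


login_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
action = session_action(login_time, login_time + timedelta(hours=1))
```

One hour after login the access token has expired but the refresh token is still valid, so the sketch returns "refresh_access_token".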

17. Semantic Dictionary Graph Structure

This specialized class diagram covers only the dictionary and linguistics side of the database, the structure that powers the AI Word Graph and the synonym/antonym relationships.

classDiagram
    class Word {
        +String text
        +Vector embedding
        +Integer frequency_rank
    }
    note for Word "embedding has 1536 dimensions"
    class WordSense {
        +String part_of_speech
        +String meaning_en
        +String meaning_bn
        +Array banglish_keywords
    }
    class ExampleSentence {
        +String sentence_en
        +String sentence_bn
    }
    class WordRelationship {
        +String relation_type
        +Float confidence
    }
    note for WordRelationship "relation_type is one of: synonym, antonym, root"
    
    Word "1" *-- "1..*" WordSense : contains
    WordSense "1" *-- "0..*" ExampleSentence : illustrates
    Word "1" -- "0..*" WordRelationship : source_word
    Word "1" -- "0..*" WordRelationship : target_word

18. Known-Word Cache Invalidation Flow

This data-flow diagram shows how the frontend caches known_word_ids so that it can colorize PDFs in real time without overwhelming the backend, and which triggers invalidate that cache.

flowchart TD
    subgraph Triggers
        AddCard[User creates Flashcard]
        MarkKnown[User clicks 'Mark as Known']
        IgnoreWord[User ignores word]
    end

    subgraph Frontend State
        RTK[RTK Query:\nuseGetKnownWordIdsQuery]
        Redux[(Redux Store:\nSet reference)]
        Local[(LocalStorage:\nCache + Timestamp)]
    end
    
    subgraph PDF Reader Module
        Worker[colorize.worker.ts]
        Canvas[PDF.js Viewport]
    end

    AddCard & MarkKnown & IgnoreWord --> |Invalidates Cache| RTK
    
    RTK -- Fetches /api/users/me/known-word-ids/ --> Backend[(Django API)]
    Backend -- Returns Array --> RTK
    
    RTK -- Hydrates --> Redux
    RTK -- Persists --> Local
    
    Redux -- Passes Set as ArrayBuffer --> Worker
    Worker -- Returns Color Map --> Canvas

With these added to the previous 15, you now have a complete, end-to-end visual documentation suite for the MindLexa application ranging from the macro infrastructure down to specific web worker interactions! Let me know if you want to iterate on any of them or if you’re ready to move to the next phase of your project.