1. High-Level System Architecture
This diagram outlines the overarching infrastructure, showing how the client layer interacts with the Django backend and the underlying data storage systems.
graph TD
    subgraph Client Layer
        A[React SPA Vite]
        B[Browser Extension MV3]
    end
    subgraph Backend Layer
        C[Django Ninja API]
        D[Celery Workers]
        E[Django Admin]
    end
    subgraph Data Layer
        F[(PostgreSQL 16\npg_trgm, pgvector, built-in FTS)]
        G[(Redis\nBroker, Cache, Session)]
    end
    A -- HTTPS / REST --> C
    B -- HTTPS / REST --> C
    C --> F
    C --> G
    D --> F
    D --> G
    C -. Enqueues Async Tasks .-> D
    classDef default fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef storage fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px;
    class F,G storage;
2. Core Entity-Relationship Diagram (ERD)
This mid-level diagram maps out the core Django database models and how they relate to one another, particularly focusing on how users, words, and the spaced repetition system (SRS) intersect.
erDiagram
    USER ||--o{ USER_WORD_KNOWLEDGE : "tracks status of"
    USER ||--o{ FLASHCARD : "owns"
    USER ||--o{ DECK : "creates"
    USER ||--o{ USER_DECK_SUBSCRIPTION : "subscribes to"
    WORD ||--o{ WORD_SENSE : "has meaning"
    WORD ||--o{ WORD_RELATIONSHIP : "relates to"
    WORD_SENSE ||--o{ EXAMPLE_SENTENCE : "provides context"
    WORD_SENSE ||--o{ CONTEXT_EXAMPLE : "found in"
    DECK ||--o{ DECK_WORD : "contains"
    DECK_WORD }o--|| WORD : "references"
    FLASHCARD ||--|| SPACED_REPETITION_DATA : "has FSRS state"
    FLASHCARD }o--|| WORD_SENSE : "tests"
    FLASHCARD }o--o| CONTEXT_EXAMPLE : "uses context"
    DOCUMENT_READING_PROGRESS }o--|| USER : "saves state for"
3. Flashcard Creation & SRS Initialization Pipeline
This low-level sequence diagram details the exact flow of data when a user adds a word to their library, showing the database transaction that links the word, the user’s knowledge state, and the Spaced Repetition System (SRS).
sequenceDiagram
    actor User
    participant UI as React UI
    participant API as Django API
    participant DB as PostgreSQL
    User->>UI: Clicks "Add to Library"
    UI->>API: POST /api/learning/flashcards/
    activate API
    API->>DB: BEGIN TRANSACTION
    API->>DB: 1. Create Flashcard
    API->>DB: 2. Upsert UserWordKnowledge (status='learning')
    API->>DB: 3. Create SpacedRepetitionData (state=0 / New)
    alt If custom context provided
        API->>DB: 4. Update ContextExample (increment use_count)
    end
    API->>DB: COMMIT
    API-->>UI: Return {flashcard_id, srs_state, due_date}
    deactivate API
    UI->>UI: Invalidate RTK Query known-word cache
    UI-->>User: Show success toast
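The four writes above only make sense together, so they run in one transaction. The sketch below is a minimal illustration using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table layouts and the `add_to_library` helper are hypothetical, and the real backend presumably uses Django's `transaction.atomic()` with ORM models instead.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical minimal schema; the real Django models will have more columns.
SCHEMA = """
CREATE TABLE flashcard (id INTEGER PRIMARY KEY, user_id INT, word_sense_id INT, context_id INT);
CREATE TABLE user_word_knowledge (
    user_id INT, word_id INT, status TEXT,
    UNIQUE (user_id, word_id)
);
CREATE TABLE spaced_repetition_data (flashcard_id INT UNIQUE, state INT, due TEXT);
CREATE TABLE context_example (id INTEGER PRIMARY KEY, use_count INT DEFAULT 0);
"""

STATE_NEW = 0  # FSRS "New" state, as in the diagram


def add_to_library(conn, user_id, word_id, word_sense_id, context_id=None):
    """Run all four writes atomically: either every row is written or none is."""
    now = datetime.now(timezone.utc).isoformat()
    # Used as a context manager, the connection commits on success and rolls back on error.
    with conn:
        cur = conn.execute(
            "INSERT INTO flashcard (user_id, word_sense_id, context_id) VALUES (?, ?, ?)",
            (user_id, word_sense_id, context_id),
        )
        flashcard_id = cur.lastrowid
        # Upsert: one knowledge row per (user, word); re-adding a word resets it to 'learning'.
        conn.execute(
            "INSERT INTO user_word_knowledge (user_id, word_id, status) VALUES (?, ?, 'learning') "
            "ON CONFLICT (user_id, word_id) DO UPDATE SET status = 'learning'",
            (user_id, word_id),
        )
        conn.execute(
            "INSERT INTO spaced_repetition_data (flashcard_id, state, due) VALUES (?, ?, ?)",
            (flashcard_id, STATE_NEW, now),
        )
        if context_id is not None:
            conn.execute(
                "UPDATE context_example SET use_count = use_count + 1 WHERE id = ?",
                (context_id,),
            )
    return {"flashcard_id": flashcard_id, "srs_state": STATE_NEW, "due_date": now}
```

Adding the same word twice creates a second flashcard but still leaves a single `user_word_knowledge` row, which is what keeps that table usable as the per-user source of truth.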
4. Async PDF Processing & OCR Flow
This flowchart illustrates what happens when a user uploads a PDF: the API deduplicates and stores the file, then a Celery task processes it asynchronously. It highlights the decision tree for triggering OCR and how words are mapped to physical page locations.
flowchart TD
    Start([User Uploads PDF]) --> Dedupe{File Hash\nExists?}
    Dedupe -- Yes --> ReturnExisting[Return existing Document ID]
    Dedupe -- No --> S3[Store in Object Storage S3/R2]
    S3 --> CreateDoc[Create UserDocument\nstatus='pending']
    CreateDoc --> Enqueue[Enqueue Celery Task:\nprocess_pdf]
    subgraph Celery Worker
        Enqueue --> Download[Download PDF to Temp Dir]
        Download --> Extract[PyMuPDF: Extract Page Text]
        Extract --> CheckLength{Is text length\n< 50 chars?}
        CheckLength -- Yes --> OCR[Run Tesseract OCR]
        CheckLength -- No --> Tokenize[Tokenize & Lemmatize]
        OCR --> Tokenize
        Tokenize --> Lookup[(Bulk Lookup in Word Table)]
        Lookup --> JSONB[Build word_positions JSONB dict]
        JSONB --> Update[Update Document\nstatus='ready']
    end
    Update --> Notify([Frontend Polls & Renders PDF])
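A minimal, dependency-free sketch of three pieces of this flow: hash-based deduplication, the per-page OCR fallback, and building a `word_positions` mapping. SHA-256, the `{word_id: [page numbers]}` shape of `word_positions`, and all function names are assumptions (the real JSONB may also store coordinates). PyMuPDF and Tesseract are replaced by injected callables, and lemmatization is skipped, so tokens are matched against the word table as-is.

```python
import hashlib
import re
from typing import Callable, Dict, List, Optional

OCR_MIN_CHARS = 50  # threshold from the flowchart: pages with < 50 chars of text go to OCR


def file_hash(data: bytes) -> str:
    """Content hash used for deduplication (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_existing(data: bytes, index: Dict[str, int]) -> Optional[int]:
    """Return an existing document ID if identical bytes were uploaded before."""
    return index.get(file_hash(data))


def page_text(page_no: int, extract: Callable[[int], str], ocr: Callable[[int], str]) -> str:
    """Use the embedded text layer unless it is too short, then fall back to OCR.

    Ignoring surrounding whitespace when measuring length is an assumption.
    """
    text = extract(page_no)
    if len(text.strip()) < OCR_MIN_CHARS:
        text = ocr(page_no)
    return text


def build_word_positions(pages: List[str], word_ids: Dict[str, int]) -> Dict[str, List[int]]:
    """Map word ID (a string, since JSON object keys are strings) to the pages it appears on."""
    positions: Dict[str, List[int]] = {}
    for page_no, text in enumerate(pages, start=1):
        for token in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            wid = word_ids.get(token)  # stand-in for the bulk lookup in the Word table
            if wid is None:
                continue
            pages_for_word = positions.setdefault(str(wid), [])
            if not pages_for_word or pages_for_word[-1] != page_no:
                pages_for_word.append(page_no)  # record each page once per word
    return positions
```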
5. Text-to-Flashcard Analysis Pipeline
This diagram shows the text-processing logic used when a user pastes raw English text: Natural Language Processing (NLP) reduces it to a deduplicated list of dictionary words, flags the ones the user does not yet know, and ranks the result by frequency.
flowchart LR
    A([Raw Pasted Text]) --> B[NLTK: Tokenize Words]
    B --> C[Remove Stop Words]
    C --> D[WordNet: Lemmatize]
    D --> E[Deduplicate Lemmas]
    E --> F[(Database: Bulk Word Lookup)]
    F --> G[(Database: Fetch User's Known Word IDs)]
    G --> H{Is Word in\nKnown IDs?}
    H -- Yes --> I[Mark 'Known']
    H -- No --> J[Mark 'Unknown']
    I --> K[Attach Best WordSense]
    J --> K
    K --> L[Sort by frequency_rank ascending]
    L --> M([Return Ranked Interactive Checklist])
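The pipeline above can be sketched in plain Python, with NLTK and WordNet replaced by a regex tokenizer, a tiny stop-word set, and an injected `lemmatize` callable, and with the database replaced by dictionaries. Treating the first listed sense as the "best" one is an assumption, as are the function and field names.

```python
import re
from typing import Callable, Dict, List, Set

# Tiny illustrative stop-word list; the real pipeline uses NLTK's list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "was"}


def analyze_text(
    text: str,
    lemmatize: Callable[[str], str],
    word_table: Dict[str, dict],
    known_ids: Set[int],
) -> List[dict]:
    tokens = re.findall(r"[a-z]+", text.lower())            # tokenize
    content = [t for t in tokens if t not in STOP_WORDS]    # remove stop words
    lemmas = dict.fromkeys(lemmatize(t) for t in content)   # lemmatize, dedupe (keeps order)
    results = []
    for lemma in lemmas:
        word = word_table.get(lemma)                        # bulk lookup (dict stand-in)
        if word is None:
            continue  # not in the dictionary: nothing to offer as a flashcard
        results.append({
            "lemma": lemma,
            "word_id": word["id"],
            "status": "known" if word["id"] in known_ids else "unknown",
            # "Best" sense = first listed sense (assumption).
            "sense": word["senses"][0] if word["senses"] else None,
            "frequency_rank": word["frequency_rank"],
        })
    results.sort(key=lambda r: r["frequency_rank"])         # most frequent words first
    return results
```

Deduplicating lemmas before the lookup keeps the database query proportional to distinct words rather than to the length of the pasted text.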
6. Flashcard FSRS State Machine
This state diagram shows the lifecycle of a flashcard under FSRS (Free Spaced Repetition Scheduler), using the states defined in STATE_CHOICES.
stateDiagram-v2
    [*] --> New : Created (State 0)
    New --> Learning : Rated Again/Hard
    New --> Review : Rated Good/Easy (Mastery Threshold)
    Learning --> Learning : Rated Again
    Learning --> Review : Graduated
    Review --> Relearning : Lapsed (Rated Again)
    Review --> Review : Rated Hard/Good/Easy
    Relearning --> Review : Re-graduated
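The transitions above can be encoded as a small pure function. This is a sketch of the diagram only, not of the FSRS scheduler: stability, difficulty, and due dates are out of scope, and the `graduated` flag stands in for whatever learning-step logic decides graduation. Only State 0 = New comes from the diagram; numbering the other states 1 to 3 follows common FSRS usage and is an assumption. The diagram draws no self-loop on Relearning, so staying there until re-graduation is also an assumption.

```python
from enum import IntEnum


class State(IntEnum):
    NEW = 0         # from the diagram
    LEARNING = 1    # numbering of 1-3 is assumed
    REVIEW = 2
    RELEARNING = 3


class Rating(IntEnum):
    AGAIN = 1
    HARD = 2
    GOOD = 3
    EASY = 4


def next_state(state: State, rating: Rating, graduated: bool = False) -> State:
    """Return the next card state for a rating, following the state diagram."""
    if state == State.NEW:
        return State.LEARNING if rating <= Rating.HARD else State.REVIEW
    if state == State.LEARNING:
        return State.REVIEW if graduated and rating != Rating.AGAIN else State.LEARNING
    if state == State.REVIEW:
        return State.RELEARNING if rating == Rating.AGAIN else State.REVIEW
    # RELEARNING: only the re-graduation edge is drawn; otherwise stay put (assumption).
    return State.REVIEW if graduated and rating != Rating.AGAIN else State.RELEARNING
```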
7. Frontend Application Architecture
This diagram breaks down the React Single Page Application (SPA) structure, showing how features, shared components, and the Redux store are organized.
graph TD
    App[src/app] --> Router[router.tsx]
    App --> Store[store.ts - Redux]
    subgraph Features [src/features]
        Dict[dictionary]
        Flash[flashcards]
        Decks[decks]
        PDF[pdfReader]
        Text2Flash[textToFlashcard]
        Auth[auth]
    end
    subgraph Shared [src/shared]
        UI[components - WordCard, etc.]
        Hooks[hooks - useWordLookup]
        API[api - RTK Query]
    end
    subgraph Workers [src/workers]
        CW[colorize.worker.ts]
    end
    Router --> Features
    Features --> Shared
    PDF --> CW
8. Browser Extension Lookup Sequence
This sequence diagram illustrates the privacy-first approach of the browser extension when looking up a word, checking local caches before hitting the backend.
sequenceDiagram
    participant Webpage
    participant Extension
    participant LocalCache
    participant API as MindLexa API
    Webpage->>Extension: User hovers over word "profligate"
    Extension->>LocalCache: Check Stop-word List
    Extension->>LocalCache: Check 'ignored_words'
    Extension->>LocalCache: Check 'known_word_ids' Set
    Extension->>LocalCache: Check 'definition_cache'
    alt If not found in any cache
        Extension->>API: GET /api/extension/lookup/?word=profligate
        API-->>Extension: Return definition & senses
    end
    Extension->>Webpage: Render Popup (Definition, Phonetic, Actions)
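The cache-first lookup can be sketched as a single function. Several details are assumptions not stated in the diagram: stop words and ignored words suppress the popup entirely, the definition payload carries an `id` field, and the known-word check is therefore applied after the definition is resolved. `fetch` is a hypothetical stand-in for the `GET /api/extension/lookup/` call.

```python
from typing import Callable, Dict, Optional, Set


def lookup(
    word: str,
    *,
    stop_words: Set[str],
    ignored_words: Set[str],
    known_word_ids: Set[int],
    definition_cache: Dict[str, dict],
    fetch: Callable[[str], dict],
) -> Optional[dict]:
    w = word.strip().lower()
    # Local-only checks: nothing is sent anywhere for stop words or ignored words.
    if w in stop_words or w in ignored_words:
        return None  # assumption: these words get no popup at all
    # Serve from the local definition cache when possible.
    definition = definition_cache.get(w)
    if definition is None:
        definition = fetch(w)  # the only point where the word leaves the browser
        definition_cache[w] = definition
    # Known status comes from the local ID set; assumes the payload has an "id" field.
    return {**definition, "word": w, "known": definition.get("id") in known_word_ids}
```

The privacy property is visible in the control flow: a word reaches the backend only after every local check has missed, and only once, because the response is cached.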
9. Study Session Interaction Loop
This flowchart maps the user experience during an active flashcard review session, demonstrating how the queue is managed and FSRS calculations are triggered.
flowchart TD
    Start([User Clicks 'Study Now']) --> Fetch[GET 20 Due Cards\nordered by due_date]
    Fetch --> Initialize[Initialize Session Queue in Redux]
    Initialize --> ShowFront[Show Card Front:\nWord, Context]
    ShowFront --> Flip{User Flips Card}
    Flip --> ShowBack[Show Card Back:\nMeaning, POS, Notes]
    ShowBack --> Rate{User Selects Rating}
    Rate --> |Again 1| FSRS1[FSRS calculates new state]
    Rate --> |Hard 2| FSRS2[FSRS calculates new state]
    Rate --> |Good 3| FSRS3[FSRS calculates new state]
    Rate --> |Easy 4| FSRS4[FSRS calculates new state]
    FSRS1 --> QueueBack[Send to Back of Queue]
    FSRS2 --> Advance
    FSRS3 --> Advance
    FSRS4 --> Advance
    Advance --> CheckEmpty{Is Queue Empty?}
    QueueBack --> CheckEmpty
    CheckEmpty -- No --> ShowFront
    CheckEmpty -- Yes --> End([End Session: Show Summary])
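The queue discipline in this loop fits a double-ended queue: every rating triggers an FSRS update, but only "Again" puts the card back at the tail. In this sketch `rate` and `schedule` are hypothetical hooks standing in for the UI and the FSRS call, and cards are assumed to arrive already sorted by due date.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def run_session(
    cards: Iterable[str],
    rate: Callable[[str], int],
    schedule: Callable[[str, int], None],
    limit: int = 20,
) -> List[Tuple[str, int]]:
    """Review cards until the queue is empty; returns (card, rating) pairs for the summary."""
    queue = deque(list(cards)[:limit])  # the 20 most overdue cards
    log: List[Tuple[str, int]] = []
    while queue:
        card = queue.popleft()
        rating = rate(card)        # user flips the card and picks 1-4
        schedule(card, rating)     # FSRS update happens for every rating, including Again
        log.append((card, rating))
        if rating == AGAIN:
            queue.append(card)     # lapsed cards come back later in the same session
    return log
```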
10. Curriculum and Deck Class Diagram
This class diagram highlights the relationships in the learning module, specifically how official hierarchies (NCTB, IELTS) map to decks and user subscriptions.
classDiagram
    class Deck {
        +String title
        +Boolean is_official
        +Boolean is_public
        +String board
        +String class_level
        +String subject
    }
    class DeckWord {
        +Integer order
        +Datetime added_at
    }
    class UserDeckSubscription {
        +Datetime subscribed_at
        +Datetime last_accessed
    }
    class Word {
        +String text
        +Integer frequency_rank
    }
    class User {
        +String username
    }
    User "1" -- "0..*" Deck : creates
    User "1" -- "0..*" UserDeckSubscription : has
    Deck "1" -- "0..*" UserDeckSubscription : is followed by
    Deck "1" -- "0..*" DeckWord : contains
    DeckWord "0..*" -- "1" Word : links to
11. PDF Colorization Web Worker Logic
This diagram details how the main thread and web worker collaborate to highlight known/unknown words in a PDF without freezing the UI.
flowchart LR
    A[Main Thread: PDF.js Page Render] --> B[Extract text tokens]
    B --> C(Post message to\ncolorize.worker.ts)
    subgraph Web Worker
        C --> D{Does token exist in\nknownWordIds Set?}
        D -- Yes --> E[Assign 'green' highlight]
        D -- No --> F{Is token in Dictionary?}
        F -- Yes --> G[Assign 'yellow' highlight]
        F -- No --> H[Assign no highlight]
    end
    E --> I[Return ColorMap]
    G --> I
    H --> I
    I --> J(Post message to Main Thread)
    J --> K[Apply CSS Overlays to DOM]
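The worker's classification step is a pure function of the tokens, the known-ID set, and the dictionary, which is what makes it safe to move off the main thread. The Python sketch below shows only that classification (in the app it would live in `colorize.worker.ts`). Because `knownWordIds` holds IDs rather than strings, it assumes a token-to-ID dictionary mapping is resolved first; that mapping and the function name are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def color_map(
    tokens: Iterable[str],
    known_word_ids: Set[int],
    dictionary: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Map each distinct lowercase token to 'green' (known), 'yellow' (in dictionary), or None."""
    colors: Dict[str, Optional[str]] = {}
    for token in tokens:
        key = token.lower()
        if key in colors:
            continue  # each distinct token is classified once per page
        wid = dictionary.get(key)  # hypothetical token -> word ID lookup
        if wid is not None and wid in known_word_ids:
            colors[key] = "green"
        elif wid is not None:
            colors[key] = "yellow"
        else:
            colors[key] = None  # not a dictionary word: no highlight
    return colors
```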
12. Dictionary Search Logic
This flowchart illustrates the backend decision process when a user searches for a term, detailing the fallback from exact match to trigram similarity and then to Banglish keyword search.
flowchart TD
    Query([User Search Query]) --> Exact{Word.objects.filter\ntext__iexact=query}
    Exact -- Found --> Prefetch[Prefetch Senses & Examples]
    Prefetch --> Return[Return 200: Word Detail]
    Exact -- Not Found --> Trigram{PostgreSQL Trigram Search\nsimilarity > 0.3}
    Trigram -- Found --> Suggestions[Return 404: List of Suggestions]
    Trigram -- Not Found --> Banglish{Banglish Search\nbanglish_keywords}
    Banglish -- Found --> Suggestions
    Banglish -- Not Found --> Empty[Return 404: No Results]
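To make the `similarity > 0.3` branch concrete, the sketch below reimplements an approximation of pg_trgm's trigram similarity in Python (ASCII only) and wires it into the three-step fallback. In production this runs inside PostgreSQL; the in-memory `words` and `banglish` dictionaries and the `search` function are hypothetical stand-ins for the Word and WordSense tables.

```python
import re
from typing import Dict, List, Set, Tuple

SIMILARITY_THRESHOLD = 0.3  # pg_trgm's default pg_trgm.similarity_threshold


def trigrams(text: str) -> Set[str]:
    """Approximate pg_trgm: lowercase, keep alphanumeric words (ASCII only here),
    pad each word with two leading spaces and one trailing space, take every 3-char window."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(query: str, words: Dict[str, dict], banglish: Dict[str, List[str]]) -> Tuple[int, object]:
    q = query.strip().lower()
    if q in words:  # exact, case-insensitive match (like text__iexact)
        return 200, words[q]
    scored = [(similarity(q, w), w) for w in words]
    # Best matches first; ties broken alphabetically.
    suggestions = [w for s, w in sorted(scored, key=lambda sw: (-sw[0], sw[1])) if s > SIMILARITY_THRESHOLD]
    if not suggestions:  # fall back to romanized Bangla keywords
        suggestions = sorted(w for w, keywords in banglish.items() if q in keywords)
    return 404, suggestions
```

For example, `cat` and `cot` share only the leading trigram `"  c"` out of seven distinct trigrams, so their similarity is 1/7 (about 0.14) and `cot` would not be suggested for `cat`.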
13. System Scalability & Deployment Infrastructure
This high-level physical architecture graph shows how the system is designed to scale using a load balancer, connection pooling, a database replica, caching, and background workers.
graph TB
    Internet((Internet)) --> CDN[Cloudflare / CloudFront CDN]
    CDN --> LB[Load Balancer]
    subgraph App Servers
        LB --> G1[Gunicorn Worker 1]
        LB --> G2[Gunicorn Worker 2]
        LB --> G3[Gunicorn Worker 3]
    end
    G1 & G2 & G3 --> PgBouncer[PgBouncer Connection Pool]
    subgraph Data Tier
        PgBouncer --> DB[(PostgreSQL Primary)]
        DB -.-> DBReplica[(PostgreSQL Replica)]
        G1 & G2 & G3 --> Redis1[(Redis: Cache)]
        G1 & G2 & G3 --> Redis2[(Redis: Celery Broker)]
    end
    subgraph Background Processing
        Redis2 --> Celery1[Celery: General Tasks]
        Redis2 --> Celery2[Celery: OCR Queue]
        Celery1 & Celery2 --> DB
    end
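One plausible way to express this topology in Django settings is sketched below. Host names, the database name, and the task path `documents.tasks.process_pdf` are placeholders; the `CELERY_` prefix assumes Celery is loaded with `app.config_from_object("django.conf:settings", namespace="CELERY")`. The setting keys themselves are standard Django and Celery options.

```python
# Hypothetical settings.py fragment; hosts, names, and task paths are placeholders.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mindlexa",
        "HOST": "pgbouncer",   # app servers connect to PgBouncer, not to Postgres directly
        "PORT": "6432",        # PgBouncer's default listen port
        "CONN_MAX_AGE": 0,     # Django's default; PgBouncer owns connection reuse
        # Needed when PgBouncer runs in transaction pooling mode.
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}

# Separate Redis instances for the cache and the Celery broker, as in the diagram.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",  # Django >= 4.0
        "LOCATION": "redis://redis-cache:6379/0",
    }
}
CELERY_BROKER_URL = "redis://redis-broker:6379/0"

# Route PDF/OCR work to its own queue so slow Tesseract jobs do not starve general tasks.
CELERY_TASK_DEFAULT_QUEUE = "default"
CELERY_TASK_ROUTES = {
    "documents.tasks.process_pdf": {"queue": "ocr"},  # hypothetical task path
}
```

The two worker pools in the diagram would then be started with something like `celery -A mindlexa worker -Q default` and `celery -A mindlexa worker -Q ocr --concurrency=2`, where `mindlexa` is an assumed Celery app name.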
14. Data Import Phase Plan
This Gantt chart represents the proposed schedule for seeding the database with Wiktionary data, generating AI meanings, and manually curating NCTB lists.
gantt
    title MindLexa Data Import Strategy
    dateFormat YYYY-MM-DD
    axisFormat %W
    section Automated Imports
        Seed Frequency Ranks :done, 2024-01-01, 7d
        Import Wiktionary English Data :done, 2024-01-01, 7d
        Import IELTS/SAT Wordlists :active, 2024-01-08, 5d
    section AI Processing
        Batch generate Top 10k Bangla Meanings :active, 2024-01-08, 7d
        Generate pgvector Embeddings :2024-01-15, 14d
    section Manual Curation
        Human review of AI top 1k words :2024-01-15, 7d
        Manual NCTB Vocabulary Entry :2024-01-15, 21d
15. User Word Knowledge (Source of Truth) ERD
This specialized ERD focuses solely on UserWordKnowledge, the central table that dictates if a user knows a word, separating global vocabulary from individual learning states.
erDiagram
    USER ||--o{ USER_WORD_KNOWLEDGE : "has specific status for"
    WORD ||--o{ USER_WORD_KNOWLEDGE : "is tracked by"
    USER_WORD_KNOWLEDGE {
        int user_id FK
        int word_id FK
        enum status "unknown, seen, learning, known, ignored"
        datetime first_encountered
        datetime last_encountered
        int encounter_count
        int confidence "0-100"
    }
    USER {
        string native_language
        string proficiency_level
    }
    WORD {
        string text
        int frequency_rank
    }
16. User Onboarding & Authentication Flow
This state diagram illustrates the user journey from registration through profile configuration, highlighting the JWT token lifecycle and required setup steps before accessing the main dashboard.
stateDiagram-v2
    [*] --> Guest : Unauthenticated
    Guest --> Registration : Submits Email/Pass
    Registration --> EmailVerification : Account Created
    EmailVerification --> Login : Clicks Verify Link
    Guest --> Login : Existing User
    Login --> SetupProfile : Returns JWT (15m) & Refresh (30d)
    state SetupProfile {
        [*] --> SetProficiency : "Absolute Beginner" to "Advanced"
        SetProficiency --> SetNativeLang : Default "bn"
        SetNativeLang --> SetDailyGoal : e.g., 20 words/day
    }
    SetupProfile --> Authenticated : Profile Complete
    Authenticated --> Dashboard : Fetches Initial State
    Authenticated --> Login : Refresh Token Expires (30d)
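The token lifetimes in this diagram imply a simple client-side decision on every request: use the access JWT, refresh it, or return to Login. The sketch below encodes that decision plus the "profile complete" gate. It is a hypothetical illustration, not the app's auth library: it ignores refresh-token rotation, and the `daily_goal` field name is an assumption (`proficiency_level` and `native_language` appear in the User model).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

ACCESS_LIFETIME = timedelta(minutes=15)   # "JWT (15m)" in the diagram
REFRESH_LIFETIME = timedelta(days=30)     # "Refresh (30d)"
PROFILE_FIELDS = ("proficiency_level", "native_language", "daily_goal")  # daily_goal is hypothetical


def issue_pair(now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (access_expires_at, refresh_expires_at) for a freshly issued token pair."""
    now = now or datetime.now(timezone.utc)
    return now + ACCESS_LIFETIME, now + REFRESH_LIFETIME


def next_auth_step(now: datetime, access_expires: datetime, refresh_expires: datetime) -> str:
    """Decide what the client should do before making an authenticated request."""
    if now < access_expires:
        return "use_access"   # send the access JWT as-is
    if now < refresh_expires:
        return "refresh"      # exchange the refresh token for a new access JWT
    return "login"            # refresh token expired: back to the Login state


def landing_state(profile: dict) -> str:
    """Route to SetupProfile until every onboarding field is filled in."""
    return "dashboard" if all(profile.get(f) for f in PROFILE_FIELDS) else "setup_profile"
```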
17. Semantic Dictionary Graph Structure
This class diagram focuses solely on the dictionary and linguistics side of the database. This structure powers the AI Word Graph and the synonym/antonym relationships.
classDiagram
    class Word {
        +String text
        +Vector embedding (1536 dims)
        +Integer frequency_rank
    }
    class WordSense {
        +String part_of_speech
        +String meaning_en
        +String meaning_bn
        +Array banglish_keywords
    }
    class ExampleSentence {
        +String sentence_en
        +String sentence_bn
    }
    class WordRelationship {
        +String relation_type (synonym, antonym, root)
        +Float confidence
    }
    Word "1" *-- "1..*" WordSense : contains
    WordSense "1" *-- "0..*" ExampleSentence : illustrates
    Word "1" -- "0..*" WordRelationship : source_word
    Word "1" -- "0..*" WordRelationship : target_word
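Because each WordRelationship row has a `source_word`, a `target_word`, a `relation_type`, and a `confidence`, the word graph is an adjacency list over those rows. The sketch below builds one in memory and filters neighbors by relation and confidence. Treating synonym and antonym edges as symmetric (while `root` stays directional) is an assumption; if the database already stores both directions, the reverse insertion here would produce duplicates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (source_word, target_word, relation_type, confidence), mirroring WordRelationship
Edge = Tuple[str, str, str, float]
SYMMETRIC = {"synonym", "antonym"}  # assumption: "root" is directional, these are not


def build_graph(edges: Iterable[Edge]) -> Dict[str, List[Tuple[str, str, float]]]:
    graph: Dict[str, List[Tuple[str, str, float]]] = defaultdict(list)
    for source, target, relation, confidence in edges:
        graph[source].append((target, relation, confidence))
        if relation in SYMMETRIC:
            graph[target].append((source, relation, confidence))
    return graph


def neighbors(
    graph: Dict[str, List[Tuple[str, str, float]]],
    word: str,
    relation: Optional[str] = None,
    min_confidence: float = 0.0,
) -> List[Tuple[str, str]]:
    """Return (word, relation) pairs, highest confidence first."""
    hits = [
        (conf, target, rel)
        for target, rel, conf in graph.get(word, [])
        if (relation is None or rel == relation) and conf >= min_confidence
    ]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(target, rel) for _, target, rel in hits]
```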
18. Known-Word Cache Invalidation Flow
This data flow diagram illustrates how the frontend manages the known_word_ids cache so that PDFs can be colorized in real time without repeated backend requests, and which user actions invalidate that cache.
flowchart TD
    subgraph Triggers
        AddCard[User creates Flashcard]
        MarkKnown[User clicks 'Mark as Known']
        IgnoreWord[User ignores word]
    end
    subgraph Frontend State
        RTK[RTK Query:\nuseGetKnownWordIdsQuery]
        Redux[(Redux Store:\nSet reference)]
        Local[(LocalStorage:\nCache + Timestamp)]
    end
    subgraph PDF Reader Module
        Worker[colorize.worker.ts]
        Canvas[PDF.js Viewport]
    end
    AddCard & MarkKnown & IgnoreWord --> |Invalidates Cache| RTK
    RTK -- Fetches /api/users/me/known-word-ids/ --> Backend[(Django API)]
    Backend -- Returns Array --> RTK
    RTK -- Hydrates --> Redux
    RTK -- Persists --> Local
    Redux -- Passes IDs as transferable ArrayBuffer --> Worker
    Worker -- Returns Color Map --> Canvas
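The caching behavior in this flow (fetch once, reuse until stale or explicitly invalidated, then hand the IDs to the worker as a flat buffer) is sketched below in Python. The one-hour TTL, the class name, and the injectable clock are hypothetical; `to_buffer` packs IDs into an unsigned-int array as an analogue of building a typed-array `ArrayBuffer` for transfer to the worker.

```python
import time
from array import array
from typing import Callable, Iterable, Optional, Set


class KnownWordIdCache:
    def __init__(
        self,
        fetch: Callable[[], Iterable[int]],
        ttl_seconds: float = 3600,  # hypothetical staleness window (the LocalStorage timestamp)
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._ids: Optional[Set[int]] = None
        self._fetched_at = 0.0

    def invalidate(self) -> None:
        """Call after creating a flashcard, marking a word known, or ignoring a word."""
        self._ids = None

    def get(self) -> Set[int]:
        """Return cached IDs, refetching only when invalidated or older than the TTL."""
        if self._ids is None or self._clock() - self._fetched_at > self._ttl:
            self._ids = set(self._fetch())  # one GET /api/users/me/known-word-ids/
            self._fetched_at = self._clock()
        return self._ids

    def to_buffer(self) -> bytes:
        """Pack sorted IDs into a flat unsigned-int buffer, like a typed array's ArrayBuffer."""
        return array("I", sorted(self.get())).tobytes()
```

Invalidation only drops the local copy; the next `get()` pays for exactly one request, however many triggers fired in between.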