# Tree-sitter Highlighting Implementation Plan

This document is the working plan for replacing Chroma-based highlighting with a Tree-sitter-first syntax system in Gim.

The current renderer in `internal/editor/view.go` is tightly coupled to Chroma and computes syntax styles during rendering. That is the opposite of the architecture Tree-sitter wants. Tree-sitter works best when parsing and highlighting are maintained as buffer state and rendering only consumes cached results.

This plan assumes:

- Chroma will be removed entirely.
- The renderer can be rebuilt to better fit the new syntax model.
- We are willing to do a full-buffer parse and full rehighlight first, then optimize incrementally.
- Correct architecture matters more than preserving the current render pipeline.

---

## Project Goal

Build a syntax system where:

- each buffer owns syntax state
- Tree-sitter parsing is maintained across edits
- highlights are cached outside the renderer
- the renderer consumes precomputed style data
- byte-oriented parser results are converted into rune-oriented render data

---

## Success Criteria

- [ ] `internal/editor/view.go` does not directly call Chroma or Tree-sitter
- [ ] Chroma is fully removed from the codebase
- [ ] syntax state exists independently from rendering
- [ ] each buffer can be parsed and highlighted through Tree-sitter
- [ ] the renderer reads cached highlight data for visible lines
- [ ] edits invalidate and recompute syntax state
- [ ] the system handles UTF-8 text correctly
- [ ] multi-line captures work correctly
- [ ] incremental parsing exists for normal text edits
- [ ] syntax-related behavior has focused tests

---

## Architectural Direction

The target data flow is:

`Buffer -> Syntax Engine -> Highlight Cache -> Renderer`

Not:

`Renderer -> Parse -> Highlight -> Draw`

Core separation of concerns:

- `internal/core`
  Holds text and buffer mutation behavior.
- `internal/syntax`
  Owns parser state, queries, highlight cache, invalidation, and update logic.
- `internal/style`
  Owns theme mapping from capture names to `lipgloss.Style`.
- `internal/editor`
  Owns rendering, cursor, selection overlay, gutters, statusline, and viewport logic.

---

## Key Constraints And Risks

### Byte vs Rune Indexing

Tree-sitter reports positions in bytes. The editor currently renders by runes.

This means the syntax engine must own conversion from byte-based capture ranges to rune-based render ranges. This conversion should never be spread across the renderer.

- [ ] define one internal representation for parser/query positions
- [ ] define one internal representation for render positions
- [ ] keep conversion logic isolated inside `internal/syntax`

### Multi-line Captures

Strings, comments, and some language constructs can span lines.

- [ ] highlight cache supports ranges spanning multiple lines
- [ ] renderer can consume per-line results from multi-line captures

### Query Precedence

Tree-sitter queries can produce overlapping captures.

- [ ] define deterministic precedence rules
- [ ] document how broad captures and specific captures are resolved

### Full Parse First, Incremental Later

The initial version does not need to be optimal.


- [ ] initial version can parse and rehighlight the full buffer
- [ ] follow-up version uses `tree.Edit`, old trees, and changed ranges

---

## Target Package Layout

Planned package layout:

- [ ] `internal/syntax/types.go`
- [ ] `internal/syntax/engine.go`
- [ ] `internal/syntax/state.go`
- [ ] `internal/syntax/registry.go`
- [ ] `internal/syntax/treesitter.go`
- [ ] `internal/syntax/query.go`
- [ ] `internal/syntax/cache.go`
- [ ] `internal/style/theme.go` or equivalent capture-to-style mapping helpers

Likely existing files to update:

- [ ] `internal/editor/model.go`
- [ ] `internal/editor/model_builder.go`
- [ ] `internal/editor/view.go`
- [ ] `internal/core/buffer.go`
- [ ] `internal/command/handlers.go`
- [ ] `go.mod`

Likely files to remove or heavily reduce:

- [ ] Chroma-specific logic in `internal/style/style.go`
- [ ] direct Chroma setup in editor model builders and command handlers

---

## Data Model Plan

### 1. Syntax Engine

The syntax engine should be editor-facing and buffer-aware.

Responsibilities:

- attach syntax state to buffers
- initialize parser and query data from filetype
- reparse after edits
- maintain dirty regions or dirty lines
- build cached line highlight results
- expose line results to the renderer

Checklist:

- [ ] define `Engine` interface in `internal/syntax/engine.go`
- [ ] decide whether syntax state is owned directly by the engine or attached to buffers
- [ ] add a field on `editor.Model` for the syntax engine

### 2. Per-buffer Syntax State

Each buffer needs syntax state. The important point is that syntax is buffer-level, not window-level.

Suggested fields:

- [ ] parser
- [ ] language
- [ ] query or compiled query set
- [ ] current parse tree
- [ ] source snapshot or source builder access
- [ ] dirty line or dirty range tracking
- [ ] cached line highlight results
- [ ] version counter for cache invalidation

### 3. Highlight Cache Representation

Start with the representation that makes integration easiest.


Recommended first version:

- cached per-line `[]lipgloss.Style`

Recommended longer-term representation:

- cached per-line spans like `[]Span{StartRune, EndRune, StyleID}`

Implementation choice:

- [ ] phase 1 uses per-rune style maps for easiest renderer integration
- [ ] phase 2 evaluates switching internal cache to spans

### 4. Theme Mapping

Theme logic should map Tree-sitter captures such as `keyword`, `function`, `string`, `comment`, and `type.builtin` to `lipgloss.Style`.

Checklist:

- [ ] create capture-name to style mapping layer
- [ ] support fallback from specific captures to broader categories
- [ ] keep theme logic independent from parser/query logic

---

## Phased Implementation Plan

## Phase 0: Cleanly Commit To Tree-sitter

Purpose: Remove architectural assumptions that only make sense for Chroma.

Tasks:

- [ ] decide the initial supported filetypes for Tree-sitter
- [ ] decide where query files live and how they are loaded
- [ ] decide whether `main.go` demo code should be removed or moved to a more explicit demo location
- [ ] audit Chroma references in the repo
- [ ] list all codepaths that currently construct or depend on `style.ChromaStyle`

Done when:

- [ ] there is a clear inventory of Chroma-coupled code
- [ ] there is a clear inventory of Tree-sitter assets to load per language

## Phase 1: Introduce Syntax As A Real Subsystem

Purpose: Create the new architecture boundary before changing rendering behavior.


Tasks:

- [ ] create `internal/syntax`
- [ ] define the engine interface
- [ ] add a syntax engine field to `editor.Model`
- [ ] initialize the syntax engine in model construction
- [ ] remove direct highlighting calls from `view.go`
- [ ] route visible line highlighting through the syntax engine

Done when:

- [ ] `view.go` asks the syntax subsystem for line highlight data
- [ ] syntax work no longer begins inside the render loop itself

## Phase 2: Define Buffer Text Access And Edit Notifications

Purpose: Make buffer mutations visible to the syntax system in a structured way.

Tasks:

- [ ] decide whether edits are emitted from `core.Buffer` or from editor actions
- [ ] define an internal edit event type
- [ ] include enough data for Tree-sitter incremental edits later
- [ ] wire `SetLine`, `InsertLine`, and `DeleteLine` changes into syntax invalidation
- [ ] decide whether first version uses whole-buffer invalidation

Suggested edit event fields:

- [ ] start byte
- [ ] old end byte
- [ ] new end byte
- [ ] start point
- [ ] old end point
- [ ] new end point
- [ ] affected line range

Done when:

- [ ] syntax invalidation happens when text changes
- [ ] invalidation does not depend on the render loop noticing text changed

## Phase 3: Build Minimal Tree-sitter Registry And Loader

Purpose: Provide one place that maps filetypes to languages and queries.

Tasks:

- [ ] create a registry for language metadata
- [ ] map filetype strings to Tree-sitter language bindings
- [ ] map filetypes to highlight query file paths
- [ ] load and compile queries once per language where practical
- [ ] define behavior for unsupported filetypes

Done when:

- [ ] opening a supported buffer can resolve a language and query set
- [ ] unsupported buffers degrade cleanly without crashing the renderer

## Phase 4: Implement Full-buffer Parsing And Full-buffer Highlighting

Purpose: Get correct Tree-sitter highlighting working before optimizing.


Tasks:

- [ ] create per-buffer syntax state
- [ ] build full source text from buffer contents
- [ ] parse full source text into a tree
- [ ] run highlight query across the full tree
- [ ] collect captures in deterministic order
- [ ] resolve overlapping captures consistently
- [ ] convert capture byte ranges into per-line rune-based style maps
- [ ] cache line results for renderer consumption

Done when:

- [ ] a supported filetype can be fully highlighted without Chroma
- [ ] renderer uses cached line results from Tree-sitter

## Phase 5: Rebuild Renderer Integration Around Cached Syntax Data

Purpose: Simplify the renderer so it consumes syntax cache rather than doing syntax work.

Tasks:

- [ ] redesign line render input around line text plus syntax cache
- [ ] ensure gutter rendering stays independent from syntax rendering
- [ ] ensure cursor overlay works on top of syntax styling
- [ ] ensure visual selection overlay works on top of syntax styling
- [ ] verify blank lines and end-of-line cursor rendering still behave correctly
- [ ] verify window width padding still uses background style consistently

Done when:

- [ ] line drawing is purely a render operation
- [ ] no parser or query logic exists in `view.go`

## Phase 6: Remove Chroma Completely

Purpose: Delete the old highlighting path and simplify styling around capture-based theming.

Tasks:

- [ ] remove Chroma dependencies from `go.mod`
- [ ] remove `GetLexer`
- [ ] remove `MakeStyleMap`
- [ ] remove `Styles.ChromaStyle` if no longer needed
- [ ] replace Chroma-derived theme extraction with explicit Gim theme definitions
- [ ] update commands that currently switch Chroma styles

Done when:

- [ ] the build no longer depends on Chroma packages
- [ ] no codepath references Chroma tokens, lexers, or styles

## Phase 7: Add Incremental Parsing

Purpose: Move from correct-but-simple to correct-and-efficient.


Tasks:

- [ ] preserve old trees per buffer
- [ ] call `tree.Edit` before reparsing
- [ ] parse new content using the old tree
- [ ] compute changed ranges
- [ ] decide whether rehighlighting happens by changed byte range, changed point range, or affected line range
- [ ] update only changed cache regions
- [ ] verify cache invalidation around inserted and deleted lines

Done when:

- [ ] small edits do not require full-buffer reparsing and rehighlighting
- [ ] highlighting updates correctly after insertions, deletions, joins, and splits

## Phase 8: Improve Cache Representation If Needed

Purpose: Reduce memory churn and simplify overlay logic if per-rune style maps become too heavy.

Tasks:

- [ ] measure cost of per-line `[]lipgloss.Style`
- [ ] consider switching internal storage to spans
- [ ] keep renderer-facing API stable if possible
- [ ] optimize only after correctness and incremental behavior exist

Done when:

- [ ] cache format is deliberate rather than inherited from the old renderer

## Phase 9: Expand Language Support

Purpose: Generalize the system after the first language works well.

Tasks:

- [ ] ship one language first, likely Go
- [ ] add additional language bindings and queries one by one
- [ ] verify filetype detection and registry behavior for each language
- [ ] define how language-specific capture tweaks are handled

Done when:

- [ ] the system can scale beyond a single demo language without architectural changes

## Phase 10: Testing And Verification

Purpose: Make syntax behavior trustworthy as the engine evolves.


Tasks:

- [ ] add unit tests for registry lookup
- [ ] add unit tests for byte-to-rune range conversion
- [ ] add unit tests for overlapping capture resolution
- [ ] add unit tests for multi-line highlight extraction
- [ ] add integration tests for visible rendering of highlighted lines
- [ ] add edit tests for incremental updates after insert, delete, split, and join operations
- [ ] add tests covering UTF-8 characters and mixed-width content

Done when:

- [ ] syntax bugs can be reproduced and locked down with tests

---

## Suggested Order Of Attack

If working on this piece by piece, this is the recommended order:

- [ ] Phase 1 first
- [ ] Phase 2 second
- [ ] Phase 3 third
- [ ] Phase 4 fourth
- [ ] Phase 5 fifth
- [ ] Phase 6 sixth
- [ ] Phase 7 seventh
- [ ] Phase 10 continuously during all phases
- [ ] Phase 8 only if profiling says it matters
- [ ] Phase 9 after one language is solid

---

## Concrete First Milestone

The first milestone should be intentionally small but architectural.

Milestone goal:

- [ ] create `internal/syntax`
- [ ] add syntax engine field to `editor.Model`
- [ ] make `view.go` consume syntax results instead of computing syntax itself
- [ ] use placeholder or basic full-buffer syntax data, even if the first output is minimal

This milestone matters because it breaks the most important bad dependency: rendering owning syntax.

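
One way the first milestone's boundary could look is sketched below. This is not the planned API: `Engine`, `BufferID`, `StyleID`, `Invalidate`, `LineStyles`, and `plainEngine` are hypothetical names, and `StyleID` stands in for `lipgloss.Style` only so the sketch stays within the standard library. The placeholder engine returns a default style for every rune, which is enough for `view.go` to stop computing syntax itself while the Tree-sitter backend does not exist yet.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// StyleID is a hypothetical stand-in for a resolved theme style.
// The plan's first version would use lipgloss.Style here instead.
type StyleID int

// BufferID is a hypothetical key identifying a buffer.
type BufferID int

// Engine is an assumed renderer-facing boundary: the renderer only reads
// cached line results and never triggers parsing itself.
type Engine interface {
	// Invalidate is called from buffer mutation paths with the new text.
	Invalidate(id BufferID, lines []string)
	// LineStyles returns one StyleID per rune of the line, or nil if the
	// buffer or line is unknown.
	LineStyles(id BufferID, line int) []StyleID
}

// plainEngine is a placeholder implementation: every rune gets the zero
// (default) style. It exists only to establish the boundary.
type plainEngine struct {
	cache map[BufferID][][]StyleID
}

func newPlainEngine() *plainEngine {
	return &plainEngine{cache: make(map[BufferID][][]StyleID)}
}

// Invalidate rebuilds the whole cache for the buffer (full rehighlight).
func (e *plainEngine) Invalidate(id BufferID, lines []string) {
	styled := make([][]StyleID, len(lines))
	for i, l := range lines {
		// Size by rune count, not byte length, to match rune-based rendering.
		styled[i] = make([]StyleID, utf8.RuneCountInString(l))
	}
	e.cache[id] = styled
}

// LineStyles serves the renderer from the cache only.
func (e *plainEngine) LineStyles(id BufferID, line int) []StyleID {
	lines, ok := e.cache[id]
	if !ok || line < 0 || line >= len(lines) {
		return nil
	}
	return lines[line]
}

func main() {
	var eng Engine = newPlainEngine()
	eng.Invalidate(1, []string{"package main", "// héllo"})
	// The renderer would call LineStyles for each visible line.
	fmt.Println(len(eng.LineStyles(1, 0)), len(eng.LineStyles(1, 1))) // prints "12 8"
}
```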

---

## Concrete Second Milestone

Milestone goal:

- [ ] support one language with Tree-sitter full-buffer parse and full-buffer highlighting
- [ ] cache per-line style results
- [ ] render highlighted output without Chroma

---

## Concrete Third Milestone

Milestone goal:

- [ ] wire edit invalidation into buffer mutation paths
- [ ] update Tree-sitter state after edits
- [ ] keep highlights correct after normal editing commands

---

## Concrete Fourth Milestone

Milestone goal:

- [ ] add true incremental parse updates
- [ ] rehighlight only changed regions
- [ ] validate performance on larger files

---

## Open Design Questions

- [ ] Should syntax state live inside `core.Buffer` or stay in the syntax engine keyed by buffer ID?
- [ ] Should the renderer consume per-rune styles or span-based styles?
- [ ] Should the syntax engine rebuild full source text on demand, or should buffers expose a stable full-text API?
- [ ] How should unsupported filetypes render: plain text or fallback queryless token classes?
- [ ] Should theme capture fallback be static or configurable?
- [ ] Should parser/query assets be embedded or read from disk at runtime?

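
To make the per-rune versus span question more concrete, here is a minimal sketch of how byte-based capture ranges on a single line might be converted to rune spans and then expanded to per-rune styles. All names (`ByteSpan`, `RuneSpan`, `toRuneSpans`, `expand`) are hypothetical, styles are plain integers rather than `lipgloss.Style`, the line is assumed to be valid UTF-8, and "later spans win" is only an assumed precedence rule. It suggests that a span cache could still feed a per-rune renderer API, so the two options are not mutually exclusive.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ByteSpan is a hypothetical capture range within one line, measured in
// bytes as Tree-sitter reports them, half-open: [Start, End).
type ByteSpan struct {
	Start, End int
	StyleID    int
}

// RuneSpan is the same range measured in runes, as the renderer needs it.
type RuneSpan struct {
	Start, End int
	StyleID    int
}

// toRuneSpans converts byte spans to rune spans for a single line.
// Offsets are clamped to the line; empty results are dropped.
func toRuneSpans(line string, spans []ByteSpan) []RuneSpan {
	// runeAt[b] is the number of runes that start before byte offset b.
	runeAt := make([]int, len(line)+1)
	count := 0
	for i := 0; i < len(line); i++ {
		runeAt[i] = count
		if utf8.RuneStart(line[i]) {
			count++
		}
	}
	runeAt[len(line)] = count

	clamp := func(b int) int {
		if b < 0 {
			return 0
		}
		if b > len(line) {
			return len(line)
		}
		return b
	}

	out := make([]RuneSpan, 0, len(spans))
	for _, s := range spans {
		start, end := runeAt[clamp(s.Start)], runeAt[clamp(s.End)]
		if start < end {
			out = append(out, RuneSpan{Start: start, End: end, StyleID: s.StyleID})
		}
	}
	return out
}

// expand turns rune spans into one style per rune. Later spans overwrite
// earlier ones (an assumed precedence rule, not a decided one).
func expand(runeCount int, spans []RuneSpan) []int {
	styles := make([]int, runeCount)
	for _, s := range spans {
		for i := s.Start; i < s.End && i < runeCount; i++ {
			styles[i] = s.StyleID
		}
	}
	return styles
}

func main() {
	line := `s := "héé"` // 12 bytes, 10 runes; the string literal is bytes [5, 12)
	spans := toRuneSpans(line, []ByteSpan{{Start: 5, End: 12, StyleID: 1}})
	fmt.Println(spans)                                          // prints "[{5 10 1}]"
	fmt.Println(expand(utf8.RuneCountInString(line), spans))    // prints "[0 0 0 0 0 1 1 1 1 1]"
}
```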

---

## Notes For Implementation

Guidelines while building this:

- [ ] keep parsing and rendering separate from the first commit
- [ ] optimize only after correctness is established
- [ ] prefer one supported language done correctly over several partial languages
- [ ] keep UTF-8 correctness in mind from the first Tree-sitter integration
- [ ] avoid letting temporary renderer hacks become permanent API boundaries
- [ ] test line split, line join, backspace-at-start, delete-at-end, and multi-line comments early

---

## Definition Of Done

This project is done when all of the following are true:

- [ ] Chroma is gone
- [ ] Tree-sitter is the only syntax engine
- [ ] syntax state is maintained outside rendering
- [ ] edits update syntax state correctly
- [ ] renderer consumes cached syntax data cleanly
- [ ] highlight output is correct for supported languages
- [ ] UTF-8 behavior is correct
- [ ] incremental parsing is working
- [ ] tests cover the risky pieces