Tree-sitter Highlighting Implementation Plan
This document is the working plan for replacing Chroma-based highlighting with a Tree-sitter-first syntax system in Gim.
The current renderer in internal/editor/view.go is tightly coupled to Chroma and computes syntax styles during rendering.
That is the opposite of the architecture Tree-sitter is built for: Tree-sitter works best when parsing and highlighting are maintained as buffer state and rendering only consumes cached results.
This plan assumes:
- Chroma will be removed entirely.
- The renderer can be rebuilt to better fit the new syntax model.
- We are willing to do a full-buffer parse and full rehighlight first, then optimize incrementally.
- Correct architecture matters more than preserving the current render pipeline.
Project Goal
Build a syntax system where:
- each buffer owns syntax state
- Tree-sitter parsing is maintained across edits
- highlights are cached outside the renderer
- the renderer consumes precomputed style data
- byte-oriented parser results are converted into rune-oriented render data
Success Criteria
- internal/editor/view.go does not directly call Chroma or Tree-sitter
- Chroma is fully removed from the codebase
- syntax state exists independently from rendering
- each buffer can be parsed and highlighted through Tree-sitter
- the renderer reads cached highlight data for visible lines
- edits invalidate and recompute syntax state
- the system handles UTF-8 text correctly
- multi-line captures work correctly
- incremental parsing exists for normal text edits
- syntax-related behavior has focused tests
Architectural Direction
The target data flow is:
Buffer -> Syntax Engine -> Highlight Cache -> Renderer
Not:
Renderer -> Parse -> Highlight -> Draw
Core separation of concerns:
- internal/core: holds text and buffer mutation behavior.
- internal/syntax: owns parser state, queries, highlight cache, invalidation, and update logic.
- internal/style: owns theme mapping from capture names to lipgloss.Style.
- internal/editor: owns rendering, cursor, selection overlay, gutters, statusline, and viewport logic.
Key Constraints And Risks
Byte vs Rune Indexing
Tree-sitter reports positions in bytes: captures carry byte offsets, and the column of each (row, column) point is also a byte count.
The editor currently renders by runes.
This means the syntax engine must own conversion from byte-based capture ranges to rune-based render ranges. This conversion should never be spread across the renderer.
- define one internal representation for parser/query positions
- define one internal representation for render positions
- keep conversion logic isolated inside internal/syntax
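Below is a minimal sketch of the per-line conversion step. It assumes lines are stored as Go strings without their newline terminators. The function name byteColToRuneCol is hypothetical. A byte column that lands inside a multi-byte rune is clamped back to the start of that rune.

```go
package main

import "unicode/utf8"

// byteColToRuneCol converts a byte column within one line (the unit Tree-sitter
// uses for point columns) into the rune column the renderer draws with.
// A column inside a multi-byte rune is moved back to that rune's first byte;
// a column past the end of the line maps to the line's rune count.
func byteColToRuneCol(line string, byteCol int) int {
	if byteCol <= 0 {
		return 0
	}
	if byteCol >= len(line) {
		return utf8.RuneCountInString(line)
	}
	for byteCol > 0 && !utf8.RuneStart(line[byteCol]) {
		byteCol--
	}
	return utf8.RuneCountInString(line[:byteCol])
}
```

For example, in "héllo" byte column 3 maps to rune column 2, because "é" occupies two bytes.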
Multi-line Captures
Strings, comments, and some language constructs can span lines.
- highlight cache supports ranges spanning multiple lines
- renderer can consume per-line results from multi-line captures
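One way to make multi-line captures consumable per line is to split each capture into per-line byte ranges before rune conversion. This sketch assumes a Point type shaped like Tree-sitter's point (zero-based row, byte column). Point, LineSegment, and splitCapture are hypothetical names. A capture that ends at column 0 of the next row contributes nothing to that row.

```go
package main

// Point is a zero-based row and a byte column, as in Tree-sitter.
type Point struct {
	Row, Column int
}

// LineSegment is the part of a capture on one line: bytes [StartCol, EndCol).
type LineSegment struct {
	Row, StartCol, EndCol int
}

// splitCapture breaks the capture [start, end) into per-line segments.
// lines holds the buffer without newline terminators; rows are assumed
// non-negative, and rows past the end of the buffer are ignored.
func splitCapture(lines []string, start, end Point) []LineSegment {
	var segs []LineSegment
	for row := start.Row; row <= end.Row && row < len(lines); row++ {
		from, to := 0, len(lines[row])
		if row == start.Row {
			from = start.Column
		}
		if row == end.Row && end.Column < to {
			to = end.Column
		}
		if from < to {
			segs = append(segs, LineSegment{Row: row, StartCol: from, EndCol: to})
		}
	}
	return segs
}
```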
Query Precedence
Tree-sitter queries can produce overlapping captures.
- define deterministic precedence rules
- document how broad captures and specific captures are resolved
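Here is one possible deterministic rule, offered only as an illustration and not as the rule of any particular editor. Wider captures are painted first so narrower captures overwrite them. When two captures have the same width, the later query pattern wins. Capture, the pattern index field, and resolveCaptures are assumptions for this sketch, which works in per-line byte columns before rune conversion.

```go
package main

import "sort"

type StyleID int // 0 means unstyled

// Capture is one query match on a line, already mapped to a style.
type Capture struct {
	Start, End int // half-open byte range within the line
	Pattern    int // index of the query pattern that produced it
	Style      StyleID
}

// resolveCaptures paints captures onto a per-byte style slice using one
// possible rule: wider captures are painted first so narrower ones overwrite
// them, and for equal widths the higher pattern index is painted last and wins.
func resolveCaptures(lineLen int, caps []Capture) []StyleID {
	sorted := append([]Capture(nil), caps...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		wi := sorted[i].End - sorted[i].Start
		wj := sorted[j].End - sorted[j].Start
		if wi != wj {
			return wi > wj
		}
		return sorted[i].Pattern < sorted[j].Pattern
	})
	out := make([]StyleID, lineLen)
	for _, c := range sorted {
		start, end := c.Start, c.End
		if start < 0 {
			start = 0
		}
		if end > lineLen {
			end = lineLen
		}
		for b := start; b < end; b++ {
			out[b] = c.Style
		}
	}
	return out
}
```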
Full Parse First, Incremental Later
The initial version does not need to be optimal.
- initial version can parse and rehighlight the full buffer
- a follow-up version uses tree.Edit, old trees, and changed ranges
Target Package Layout
Planned package layout:
- internal/syntax/types.go
- internal/syntax/engine.go
- internal/syntax/state.go
- internal/syntax/registry.go
- internal/syntax/treesitter.go
- internal/syntax/query.go
- internal/syntax/cache.go
- internal/style/theme.go or equivalent capture-to-style mapping helpers
Likely existing files to update:
- internal/editor/model.go
- internal/editor/model_builder.go
- internal/editor/view.go
- internal/core/buffer.go
- internal/command/handlers.go
- go.mod
Likely files to remove or heavily reduce:
- Chroma-specific logic in internal/style/style.go
- direct Chroma setup in editor model builders and command handlers
Data Model Plan
1. Syntax Engine
The syntax engine should be editor-facing and buffer-aware.
Responsibilities:
- attach syntax state to buffers
- initialize parser and query data from filetype
- reparse after edits
- maintain dirty regions or dirty lines
- build cached line highlight results
- expose line results to the renderer
Checklist:
- define the Engine interface in internal/syntax/engine.go
- decide whether syntax state is owned directly by the engine or attached to buffers
- add a field on editor.Model for the syntax engine
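Below is a possible shape for the Engine interface. It assumes syntax state is owned by the engine and keyed by a buffer ID, which is one of the two options in the checklist above. BufferID, StyleID, EditEvent, and all method names are hypothetical. The plainEngine fallback shows how view.go could switch to the interface before any Tree-sitter output exists.

```go
package main

// Hypothetical identifiers for this sketch.
type BufferID int
type StyleID int

// EditEvent is a placeholder for the edit notification defined in Phase 2.
type EditEvent struct {
	StartRow, OldEndRow, NewEndRow int
}

// Engine is the only syntax API view.go would see.
type Engine interface {
	// Attach creates syntax state for a buffer from its filetype and text.
	Attach(buf BufferID, filetype string, lines []string)
	// Edit updates syntax state after a buffer mutation.
	Edit(buf BufferID, ev EditEvent, lines []string)
	// LineStyles returns one StyleID per rune of the row, or nil if the row
	// has no syntax styling (unsupported filetype or not yet highlighted).
	LineStyles(buf BufferID, row int) []StyleID
	// Detach drops the buffer's syntax state.
	Detach(buf BufferID)
}

// plainEngine never highlights. It lets view.go depend on Engine before any
// Tree-sitter output exists, and doubles as the unsupported-filetype fallback.
type plainEngine struct{}

func (plainEngine) Attach(BufferID, string, []string)  {}
func (plainEngine) Edit(BufferID, EditEvent, []string) {}
func (plainEngine) LineStyles(BufferID, int) []StyleID { return nil }
func (plainEngine) Detach(BufferID)                    {}

// Compile-time check that plainEngine satisfies Engine.
var _ Engine = plainEngine{}
```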
2. Per-buffer Syntax State
Each buffer needs syntax state. The important point is that syntax is buffer-level, not window-level.
Suggested fields:
- parser
- language
- query or compiled query set
- current parse tree
- source snapshot or source builder access
- dirty line or dirty range tracking
- cached line highlight results
- version counter for cache invalidation
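The sketch below covers only the cache portion of per-buffer state and focuses on the version counter. All names are hypothetical, and StyleID stands in for a style handle. The parser, language, query, and tree fields are left out so the example has no dependencies. Bumping the version invalidates every line in O(1). Per-line dirty tracking could instead reset a single builtAt entry to 0.

```go
package main

type StyleID int

// lineCache sketches the cache part of per-buffer syntax state. A fuller
// version would sit beside the parser, language, compiled query, and current
// tree; those are omitted to keep the sketch dependency-free.
type lineCache struct {
	version uint64      // bumped on every invalidation; starts at 1
	lines   [][]StyleID // cached styles per line (nil is a valid "unstyled" result)
	builtAt []uint64    // version each line was computed at; 0 means never
}

func newLineCache(lineCount int) *lineCache {
	return &lineCache{
		version: 1,
		lines:   make([][]StyleID, lineCount),
		builtAt: make([]uint64, lineCount),
	}
}

// invalidate marks every cached line stale without touching the slices.
func (c *lineCache) invalidate() { c.version++ }

// store records freshly computed styles for row at the current version.
func (c *lineCache) store(row int, styles []StyleID) {
	c.lines[row] = styles
	c.builtAt[row] = c.version
}

// get returns the cached styles for row and whether they are current.
func (c *lineCache) get(row int) ([]StyleID, bool) {
	if row < 0 || row >= len(c.lines) {
		return nil, false
	}
	return c.lines[row], c.builtAt[row] == c.version
}
```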
3. Highlight Cache Representation
Start with the representation that makes integration easiest.
Recommended first version:
- cached per-line []lipgloss.Style
Recommended longer-term representation:
- cached per-line spans like []Span{StartRune, EndRune, StyleID}
Implementation choice:
- the first implementation uses per-rune style maps for the easiest renderer integration
- a later pass (Phase 8) evaluates switching the internal cache to spans
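If the cache later moves to spans, the renderer-facing API does not have to change, because spans can be expanded into the per-rune form on demand. This sketch uses a hypothetical StyleID index into a theme table in place of []lipgloss.Style, with 0 meaning unstyled.

```go
package main

type StyleID int // 0 means unstyled

// Span is the longer-term cache format: a half-open rune range with one style.
type Span struct {
	StartRune, EndRune int
	Style              StyleID
}

// spansToRuneStyles expands spans into the per-rune format the renderer reads.
// Spans are clamped to the line, and later spans overwrite earlier ones.
func spansToRuneStyles(spans []Span, lineRunes int) []StyleID {
	out := make([]StyleID, lineRunes)
	for _, s := range spans {
		start, end := s.StartRune, s.EndRune
		if start < 0 {
			start = 0
		}
		if end > lineRunes {
			end = lineRunes
		}
		for i := start; i < end; i++ {
			out[i] = s.Style
		}
	}
	return out
}
```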
4. Theme Mapping
Theme logic should map Tree-sitter captures such as keyword, function, string, comment, and type.builtin to lipgloss.Style.
Checklist:
- create capture-name to style mapping layer
- support fallback from specific captures to broader categories
- keep theme logic independent from parser/query logic
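The fallback from specific to broader captures can be as simple as trimming dot-separated suffixes until the theme has an entry. In this sketch, Style is a string stand-in for lipgloss.Style and resolveStyle is a hypothetical name. Whether the fallback should be static or configurable is still listed as an open design question.

```go
package main

import "strings"

// Style stands in for lipgloss.Style to keep the sketch dependency-free.
type Style string

// resolveStyle finds the most specific theme entry for a capture name,
// trying "type.builtin", then "type", then falling back to def.
func resolveStyle(theme map[string]Style, capture string, def Style) Style {
	name := capture
	for {
		if s, ok := theme[name]; ok {
			return s
		}
		i := strings.LastIndexByte(name, '.')
		if i < 0 {
			return def
		}
		name = name[:i]
	}
}
```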
Phased Implementation Plan
Phase 0: Cleanly Commit To Tree-sitter
Purpose:
Remove architectural assumptions that only make sense for Chroma.
Tasks:
- decide the initial supported filetypes for Tree-sitter
- decide where query files live and how they are loaded
- decide whether main.go demo code should be removed or moved to a more explicit demo location
- audit Chroma references in the repo
- list all codepaths that currently construct or depend on style.ChromaStyle
Done when:
- there is a clear inventory of Chroma-coupled code
- there is a clear inventory of Tree-sitter assets to load per language
Phase 1: Introduce Syntax As A Real Subsystem
Purpose:
Create the new architecture boundary before changing rendering behavior.
Tasks:
- create internal/syntax
- define the engine interface
- add a syntax engine field to editor.Model
- initialize the syntax engine in model construction
- remove direct highlighting calls from view.go
- route visible line highlighting through the syntax engine
Done when:
- view.go asks the syntax subsystem for line highlight data
- syntax work no longer begins inside the render loop itself
Phase 2: Define Buffer Text Access And Edit Notifications
Purpose:
Make buffer mutations visible to the syntax system in a structured way.
Tasks:
- decide whether edits are emitted from core.Buffer or from editor actions
- define an internal edit event type
- include enough data for Tree-sitter incremental edits later
- wire SetLine, InsertLine, and DeleteLine changes into syntax invalidation
- decide whether the first version uses whole-buffer invalidation
Suggested edit event fields:
- start byte
- old end byte
- new end byte
- start point
- old end point
- new end point
- affected line range
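The sketch below builds the edit event for a plain text insertion. It assumes buffer lines are joined with "\n" and that the cursor's rune column has already been converted to a byte column. The byte and point fields mirror those of Tree-sitter's TSInputEdit. EditEvent, Point, and insertEvent are hypothetical names. A deletion would also move OldEndByte and OldEndPoint past the start.

```go
package main

import "strings"

// Point is a zero-based row and a byte column, as in Tree-sitter.
type Point struct {
	Row, Column int
}

// EditEvent holds the suggested fields; the byte and point triples mirror
// Tree-sitter's TSInputEdit.
type EditEvent struct {
	StartByte, OldEndByte, NewEndByte    int
	StartPoint, OldEndPoint, NewEndPoint Point
	FirstLine, LastLine                  int // affected line range after the edit
}

// insertEvent describes inserting text at (row, byteCol). lines holds the
// buffer before the edit, without terminators; the source joins them with "\n".
func insertEvent(lines []string, row, byteCol int, text string) EditEvent {
	start := byteCol
	for _, l := range lines[:row] {
		start += len(l) + 1 // +1 for the "\n" separator
	}
	at := Point{Row: row, Column: byteCol}
	newEnd := Point{Row: row, Column: byteCol + len(text)}
	if n := strings.Count(text, "\n"); n > 0 {
		// After a newline the column counts only the bytes after the last "\n".
		newEnd = Point{Row: row + n, Column: len(text) - strings.LastIndex(text, "\n") - 1}
	}
	return EditEvent{
		StartByte: start, OldEndByte: start, NewEndByte: start + len(text),
		StartPoint: at, OldEndPoint: at, NewEndPoint: newEnd,
		FirstLine: row, LastLine: newEnd.Row,
	}
}
```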
Done when:
- syntax invalidation happens when text changes
- invalidation does not depend on the render loop noticing text changed
Phase 3: Build Minimal Tree-sitter Registry And Loader
Purpose:
Provide one place that maps filetypes to languages and queries.
Tasks:
- create a registry for language metadata
- map filetype strings to Tree-sitter language bindings
- map filetypes to highlight query file paths
- load and compile queries once per language where practical
- define behavior for unsupported filetypes
Done when:
- opening a supported buffer can resolve a language and query set
- unsupported buffers degrade cleanly without crashing the renderer
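Below is a minimal registry sketch. It assumes query source is loaded through an injected function and cached with sync.Once, so each language pays the load cost at most once. Injecting the loader leaves the embedded-versus-disk question open. LanguageSpec, Registry, Lookup, and the query path in the test are hypothetical. A real entry would also hold the Tree-sitter language binding and a compiled query rather than raw source text.

```go
package main

import "sync"

// LanguageSpec is a hypothetical registry entry. A real entry would also hold
// the Tree-sitter language binding and a compiled query instead of raw text.
type LanguageSpec struct {
	Name      string
	QueryPath string
	Query     string // loaded query source (stand-in for a compiled query)

	once sync.Once
	err  error
}

// Registry maps filetypes to languages. load is injected so the sketch does not
// decide between embedded assets and reading from disk.
type Registry struct {
	byFiletype map[string]*LanguageSpec
	load       func(path string) (string, error)
}

// Lookup returns the spec for filetype, loading its query at most once.
// It returns (nil, nil) for unsupported filetypes so callers can fall back to
// plain text instead of treating the buffer as an error.
func (r *Registry) Lookup(filetype string) (*LanguageSpec, error) {
	spec, ok := r.byFiletype[filetype]
	if !ok {
		return nil, nil
	}
	spec.once.Do(func() {
		spec.Query, spec.err = r.load(spec.QueryPath)
	})
	return spec, spec.err
}
```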
Phase 4: Implement Full-buffer Parsing And Full-buffer Highlighting
Purpose:
Get correct Tree-sitter highlighting working before optimizing.
Tasks:
- create per-buffer syntax state
- build full source text from buffer contents
- parse full source text into a tree
- run highlight query across the full tree
- collect captures in deterministic order
- resolve overlapping captures consistently
- convert capture byte ranges into per-line rune-based style maps
- cache line results for renderer consumption
Done when:
- a supported filetype can be fully highlighted without Chroma
- renderer uses cached line results from Tree-sitter
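For the first full-buffer pass, the source text can be rebuilt by joining lines and recording the byte offset where each line starts. That table also allows mapping an absolute byte offset back to a (row, byte column) pair by binary search. This is a hypothetical helper. Tree-sitter captures already carry start and end points, so the offset lookup mainly matters for code that only has byte offsets.

```go
package main

import (
	"sort"
	"strings"
)

// source is the full text handed to the parser plus the byte offset at which
// each buffer line starts.
type source struct {
	text       string
	lineStarts []int
}

// buildSource joins buffer lines (stored without terminators) with "\n".
func buildSource(lines []string) source {
	starts := make([]int, len(lines))
	off := 0
	for i, l := range lines {
		starts[i] = off
		off += len(l) + 1 // account for the "\n" separator
	}
	return source{text: strings.Join(lines, "\n"), lineStarts: starts}
}

// position maps an absolute byte offset to a row and a byte column.
func (s source) position(off int) (row, col int) {
	if len(s.lineStarts) == 0 {
		return 0, off
	}
	// The row is the last line whose start is <= off.
	row = sort.Search(len(s.lineStarts), func(i int) bool { return s.lineStarts[i] > off }) - 1
	if row < 0 {
		row = 0
	}
	return row, off - s.lineStarts[row]
}
```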
Phase 5: Rebuild Renderer Integration Around Cached Syntax Data
Purpose:
Simplify the renderer so it consumes syntax cache rather than doing syntax work.
Tasks:
- redesign line render input around line text plus syntax cache
- ensure gutter rendering stays independent from syntax rendering
- ensure cursor overlay works on top of syntax styling
- ensure visual selection overlay works on top of syntax styling
- verify blank lines and end-of-line cursor rendering still behave correctly
- verify window width padding still uses background style consistently
Done when:
- line drawing is purely a render operation
- no parser or query logic exists in view.go
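One way to keep overlays out of the syntax cache is to copy the cached line and layer the selection, then the cursor, on top of the copy. The reserved selectionStyle and cursorStyle IDs are assumptions for this sketch. A real renderer might merge a selection background into the syntax foreground rather than replace it. The extra cell covers the end-of-line cursor case from the task list.

```go
package main

type StyleID int // 0 means unstyled

// Hypothetical reserved IDs for overlays, outside the theme's range.
const (
	selectionStyle StyleID = -1
	cursorStyle    StyleID = -2
)

// composeLine copies cached syntax styles and layers the visual selection
// [selStart, selEnd) and then the cursor on top, leaving the cache untouched.
// When cursor == lineRunes (end-of-line cursor) the result gets one extra
// cell; a negative cursor means the cursor is not on this line.
func composeLine(syntax []StyleID, lineRunes, selStart, selEnd, cursor int) []StyleID {
	width := lineRunes
	if cursor == lineRunes {
		width++
	}
	out := make([]StyleID, width)
	copy(out, syntax) // nil or short syntax leaves remaining cells unstyled
	for i := selStart; i < selEnd && i < width; i++ {
		if i >= 0 {
			out[i] = selectionStyle
		}
	}
	if cursor >= 0 && cursor < width {
		out[cursor] = cursorStyle
	}
	return out
}
```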
Phase 6: Remove Chroma Completely
Purpose:
Delete the old highlighting path and simplify styling around capture-based theming.
Tasks:
- remove Chroma dependencies from go.mod
- remove GetLexer
- remove MakeStyleMap
- remove Styles.ChromaStyle if no longer needed
- replace Chroma-derived theme extraction with explicit Gim theme definitions
- update commands that currently switch Chroma styles
Done when:
- the build no longer depends on Chroma packages
- no codepath references Chroma tokens, lexers, or styles
Phase 7: Add Incremental Parsing
Purpose:
Move from correct-but-simple to correct-and-efficient.
Tasks:
- preserve old trees per buffer
- call tree.Edit before reparsing
- parse new content using the old tree
- compute changed ranges
- decide whether rehighlighting happens by changed byte range, changed point range, or affected line range
- update only changed cache regions
- verify cache invalidation around inserted and deleted lines
Done when:
- small edits do not require full-buffer reparsing and rehighlighting
- highlighting updates correctly after insertions, deletions, joins, and splits
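The sketch below handles cache maintenance for line splits and joins. It assumes the edit event reports the replaced row range. Cached entries below the edit shift with their lines, and edited rows become nil so they are recomputed. spliceCache is a hypothetical name. Rows outside the edit may still need rehighlighting when Tree-sitter's changed ranges include them, for example after opening a block comment.

```go
package main

type StyleID int

// spliceCache updates per-line cache entries after an edit that replaced rows
// [startRow, oldEndRow] with rows [startRow, newEndRow]. Rows after the edit
// keep their cached styles at shifted indices; edited rows become nil (dirty).
// It only handles line shifting, not changed ranges reported by the parser.
func spliceCache(cache [][]StyleID, startRow, oldEndRow, newEndRow int) [][]StyleID {
	tail := cache[oldEndRow+1:]
	edited := newEndRow - startRow + 1
	out := make([][]StyleID, 0, startRow+edited+len(tail))
	out = append(out, cache[:startRow]...)
	out = append(out, make([][]StyleID, edited)...)
	return append(out, tail...)
}
```

For example, splitting row 1 of a four-line buffer is spliceCache(cache, 1, 1, 2), and joining rows 1 and 2 is spliceCache(cache, 1, 2, 1).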
Phase 8: Improve Cache Representation If Needed
Purpose:
Reduce memory churn and simplify overlay logic if per-rune style maps become too heavy.
Tasks:
- measure cost of per-line []lipgloss.Style
- consider switching internal storage to spans
- keep renderer-facing API stable if possible
- optimize only after correctness and incremental behavior exist
Done when:
- cache format is deliberate rather than inherited from the old renderer
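If profiling shows per-rune slices are too heavy, converting to spans is a run-length merge. A line of n runes always costs n entries as per-rune styles, but only one entry per styled run as spans. The Span layout follows the one proposed in the data model section, and StyleID 0 is assumed to mean unstyled.

```go
package main

type StyleID int // 0 means unstyled

// Span is a half-open rune range with one style.
type Span struct {
	StartRune, EndRune int
	Style              StyleID
}

// runeStylesToSpans merges runs of equal style into spans.
// Unstyled runs produce no span.
func runeStylesToSpans(styles []StyleID) []Span {
	var spans []Span
	for i := 0; i < len(styles); {
		j := i + 1
		for j < len(styles) && styles[j] == styles[i] {
			j++
		}
		if styles[i] != 0 {
			spans = append(spans, Span{i, j, styles[i]})
		}
		i = j
	}
	return spans
}
```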
Phase 9: Expand Language Support
Purpose:
Generalize the system after the first language works well.
Tasks:
- ship one language first, likely Go
- add additional language bindings and queries one by one
- verify filetype detection and registry behavior for each language
- define how language-specific capture tweaks are handled
Done when:
- the system can scale beyond a single demo language without architectural changes
Phase 10: Testing And Verification
Purpose:
Make syntax behavior trustworthy as the engine evolves.
Tasks:
- add unit tests for registry lookup
- add unit tests for byte-to-rune range conversion
- add unit tests for overlapping capture resolution
- add unit tests for multi-line highlight extraction
- add integration tests for visible rendering of highlighted lines
- add edit tests for incremental updates after insert, delete, split, and join operations
- add tests covering UTF-8 characters and mixed-width content
Done when:
- syntax bugs can be reproduced and locked down with tests
Suggested Order Of Attack
If working on this piece by piece, this is the recommended order:
- Phase 1 first
- Phase 2 second
- Phase 3 third
- Phase 4 fourth
- Phase 5 fifth
- Phase 6 sixth
- Phase 7 seventh
- Phase 10 continuously during all phases
- Phase 8 only if profiling says it matters
- Phase 9 after one language is solid
Concrete First Milestone
The first milestone should be intentionally small but architectural.
Milestone goal:
- create internal/syntax
- add syntax engine field to editor.Model
- make view.go consume syntax results instead of computing syntax itself
- use placeholder or basic full-buffer syntax data, even if the first output is minimal
This milestone matters because it breaks the most important bad dependency: rendering owning syntax.
Concrete Second Milestone
Milestone goal:
- support one language with Tree-sitter full-buffer parse and full-buffer highlighting
- cache per-line style results
- render highlighted output without Chroma
Concrete Third Milestone
Milestone goal:
- wire edit invalidation into buffer mutation paths
- update Tree-sitter state after edits
- keep highlights correct after normal editing commands
Concrete Fourth Milestone
Milestone goal:
- add true incremental parse updates
- rehighlight only changed regions
- validate performance on larger files
Open Design Questions
- Should syntax state live inside core.Buffer or stay in the syntax engine keyed by buffer ID?
- Should the renderer consume per-rune styles or span-based styles?
- Should the syntax engine rebuild full source text on demand, or should buffers expose a stable full-text API?
- How should unsupported filetypes render: plain text or fallback queryless token classes?
- Should theme capture fallback be static or configurable?
- Should parser/query assets be embedded or read from disk at runtime?
Notes For Implementation
Guidelines while building this:
- keep parsing and rendering separate from the first commit
- optimize only after correctness is established
- prefer one supported language done correctly over several partial languages
- keep UTF-8 correctness in mind from the first Tree-sitter integration
- avoid letting temporary renderer hacks become permanent API boundaries
- test line split, line join, backspace-at-start, delete-at-end, and multi-line comments early
Definition Of Done
This project is done when all of the following are true:
- Chroma is gone
- Tree-sitter is the only syntax engine
- syntax state is maintained outside rendering
- edits update syntax state correctly
- renderer consumes cached syntax data cleanly
- highlight output is correct for supported languages
- UTF-8 behavior is correct
- incremental parsing is working
- tests cover the risky pieces