diff --git a/TREE_SITTER_HIGHLIGHTING_PLAN.md b/TREE_SITTER_HIGHLIGHTING_PLAN.md new file mode 100644 index 0000000..103d93b --- /dev/null +++ b/TREE_SITTER_HIGHLIGHTING_PLAN.md @@ -0,0 +1,525 @@ +# Tree-sitter Highlighting Implementation Plan + +This document is the working plan for replacing Chroma-based highlighting with a Tree-sitter-first syntax system in Gim. + +The current renderer in `internal/editor/view.go` is tightly coupled to Chroma and computes syntax styles during rendering. +That is the opposite of the architecture Tree-sitter wants. Tree-sitter works best when parsing and highlighting are +maintained as buffer state and rendering only consumes cached results. + +This plan assumes: + +- Chroma will be removed entirely. +- The renderer can be rebuilt to better fit the new syntax model. +- We are willing to do a full-buffer parse and full rehighlight first, then optimize incrementally. +- Correct architecture matters more than preserving the current render pipeline. + +--- + +## Project Goal + +Build a syntax system where: + +- each buffer owns syntax state +- Tree-sitter parsing is maintained across edits +- highlights are cached outside the renderer +- the renderer consumes precomputed style data +- byte-oriented parser results are converted into rune-oriented render data + +--- + +## Success Criteria + +- [ ] `internal/editor/view.go` does not directly call Chroma or Tree-sitter +- [ ] Chroma is fully removed from the codebase +- [ ] syntax state exists independently from rendering +- [ ] each buffer can be parsed and highlighted through Tree-sitter +- [ ] the renderer reads cached highlight data for visible lines +- [ ] edits invalidate and recompute syntax state +- [ ] the system handles UTF-8 text correctly +- [ ] multi-line captures work correctly +- [ ] incremental parsing exists for normal text edits +- [ ] syntax-related behavior has focused tests + +--- + +## Architectural Direction + +The target data flow is: + +`Buffer -> Syntax Engine -> 
Highlight Cache -> Renderer` + +Not: + +`Renderer -> Parse -> Highlight -> Draw` + +Core separation of concerns: + +- `internal/core` + Holds text and buffer mutation behavior. +- `internal/syntax` + Owns parser state, queries, highlight cache, invalidation, and update logic. +- `internal/style` + Owns theme mapping from capture names to `lipgloss.Style`. +- `internal/editor` + Owns rendering, cursor, selection overlay, gutters, statusline, and viewport logic. + +--- + +## Key Constraints And Risks + +### Byte vs Rune Indexing + +Tree-sitter reports positions in bytes. + +The editor currently renders by runes. + +This means the syntax engine must own conversion from byte-based capture ranges to rune-based render ranges. This conversion should never be spread across the renderer. + +- [ ] define one internal representation for parser/query positions +- [ ] define one internal representation for render positions +- [ ] keep conversion logic isolated inside `internal/syntax` + +### Multi-line Captures + +Strings, comments, and some language constructs can span lines. + +- [ ] highlight cache supports ranges spanning multiple lines +- [ ] renderer can consume per-line results from multi-line captures + +### Query Precedence + +Tree-sitter queries can produce overlapping captures. + +- [ ] define deterministic precedence rules +- [ ] document how broad captures and specific captures are resolved + +### Full Parse First, Incremental Later + +The initial version does not need to be optimal. 
+ +- [ ] initial version can parse and rehighlight the full buffer +- [ ] follow-up version uses `tree.Edit`, old trees, and changed ranges + +--- + +## Target Package Layout + +Planned package layout: + +- [ ] `internal/syntax/types.go` +- [ ] `internal/syntax/engine.go` +- [ ] `internal/syntax/state.go` +- [ ] `internal/syntax/registry.go` +- [ ] `internal/syntax/treesitter.go` +- [ ] `internal/syntax/query.go` +- [ ] `internal/syntax/cache.go` +- [ ] `internal/style/theme.go` or equivalent capture-to-style mapping helpers + +Likely existing files to update: + +- [ ] `internal/editor/model.go` +- [ ] `internal/editor/model_builder.go` +- [ ] `internal/editor/view.go` +- [ ] `internal/core/buffer.go` +- [ ] `internal/command/handlers.go` +- [ ] `go.mod` + +Likely files to remove or heavily reduce: + +- [ ] Chroma-specific logic in `internal/style/style.go` +- [ ] direct Chroma setup in editor model builders and command handlers + +--- + +## Data Model Plan + +### 1. Syntax Engine + +The syntax engine should be editor-facing and buffer-aware. + +Responsibilities: + +- attach syntax state to buffers +- initialize parser and query data from filetype +- reparse after edits +- maintain dirty regions or dirty lines +- build cached line highlight results +- expose line results to the renderer + +Checklist: + +- [ ] define `Engine` interface in `internal/syntax/engine.go` +- [ ] decide whether syntax state is owned directly by the engine or attached to buffers +- [ ] add a field on `editor.Model` for the syntax engine + +### 2. Per-buffer Syntax State + +Each buffer needs syntax state. The important point is that syntax is buffer-level, not window-level. + +Suggested fields: + +- [ ] parser +- [ ] language +- [ ] query or compiled query set +- [ ] current parse tree +- [ ] source snapshot or source builder access +- [ ] dirty line or dirty range tracking +- [ ] cached line highlight results +- [ ] version counter for cache invalidation + +### 3. 
Highlight Cache Representation + +Start with the representation that makes integration easiest. + +Recommended first version: + +- cached per-line `[]lipgloss.Style`, one style per rune + +Recommended longer-term representation: + +- cached per-line spans like `[]Span{StartRune, EndRune, StyleID}` + +Implementation choice: + +- [ ] first version uses per-rune style slices (`[]lipgloss.Style`) for easiest renderer integration +- [ ] a later version (see Phase 8) evaluates switching the internal cache to spans + +### 4. Theme Mapping + +Theme logic should map Tree-sitter captures such as `keyword`, `function`, `string`, `comment`, and `type.builtin` to `lipgloss.Style`. + +Checklist: + +- [ ] create capture-name to style mapping layer +- [ ] support fallback from specific captures to broader categories +- [ ] keep theme logic independent from parser/query logic + +--- + +## Phased Implementation Plan + +## Phase 0: Cleanly Commit To Tree-sitter + +Purpose: + +Remove architectural assumptions that only make sense for Chroma. + +Tasks: + +- [ ] decide the initial supported filetypes for Tree-sitter +- [ ] decide where query files live and how they are loaded +- [ ] decide whether `main.go` demo code should be removed or moved to a more explicit demo location +- [ ] audit Chroma references in the repo +- [ ] list all codepaths that currently construct or depend on `style.ChromaStyle` + +Done when: + +- [ ] there is a clear inventory of Chroma-coupled code +- [ ] there is a clear inventory of Tree-sitter assets to load per language + +## Phase 1: Introduce Syntax As A Real Subsystem + +Purpose: + +Create the new architecture boundary before changing rendering behavior.
+ +Tasks: + +- [ ] create `internal/syntax` +- [ ] define the engine interface +- [ ] add a syntax engine field to `editor.Model` +- [ ] initialize the syntax engine in model construction +- [ ] remove direct highlighting calls from `view.go` +- [ ] route visible line highlighting through the syntax engine + +Done when: + +- [ ] `view.go` asks the syntax subsystem for line highlight data +- [ ] syntax work no longer begins inside the render loop itself + +## Phase 2: Define Buffer Text Access And Edit Notifications + +Purpose: + +Make buffer mutations visible to the syntax system in a structured way. + +Tasks: + +- [ ] decide whether edits are emitted from `core.Buffer` or from editor actions +- [ ] define an internal edit event type +- [ ] include enough data for Tree-sitter incremental edits later +- [ ] wire `SetLine`, `InsertLine`, and `DeleteLine` changes into syntax invalidation +- [ ] decide whether first version uses whole-buffer invalidation + +Suggested edit event fields: + +- [ ] start byte +- [ ] old end byte +- [ ] new end byte +- [ ] start point +- [ ] old end point +- [ ] new end point +- [ ] affected line range + +Done when: + +- [ ] syntax invalidation happens when text changes +- [ ] invalidation does not depend on the render loop noticing text changed + +## Phase 3: Build Minimal Tree-sitter Registry And Loader + +Purpose: + +Provide one place that maps filetypes to languages and queries. 
+ +Tasks: + +- [ ] create a registry for language metadata +- [ ] map filetype strings to Tree-sitter language bindings +- [ ] map filetypes to highlight query file paths +- [ ] load and compile queries once per language where practical +- [ ] define behavior for unsupported filetypes + +Done when: + +- [ ] opening a supported buffer can resolve a language and query set +- [ ] unsupported buffers degrade cleanly without crashing the renderer + +## Phase 4: Implement Full-buffer Parsing And Full-buffer Highlighting + +Purpose: + +Get correct Tree-sitter highlighting working before optimizing. + +Tasks: + +- [ ] create per-buffer syntax state +- [ ] build full source text from buffer contents +- [ ] parse full source text into a tree +- [ ] run highlight query across the full tree +- [ ] collect captures in deterministic order +- [ ] resolve overlapping captures consistently +- [ ] convert capture byte ranges into per-line rune-based style maps +- [ ] cache line results for renderer consumption + +Done when: + +- [ ] a supported filetype can be fully highlighted without Chroma +- [ ] renderer uses cached line results from Tree-sitter + +## Phase 5: Rebuild Renderer Integration Around Cached Syntax Data + +Purpose: + +Simplify the renderer so it consumes syntax cache rather than doing syntax work. 
+ +Tasks: + +- [ ] redesign line render input around line text plus syntax cache +- [ ] ensure gutter rendering stays independent from syntax rendering +- [ ] ensure cursor overlay works on top of syntax styling +- [ ] ensure visual selection overlay works on top of syntax styling +- [ ] verify blank lines and end-of-line cursor rendering still behave correctly +- [ ] verify window width padding still uses background style consistently + +Done when: + +- [ ] line drawing is purely a render operation +- [ ] no parser or query logic exists in `view.go` + +## Phase 6: Remove Chroma Completely + +Purpose: + +Delete the old highlighting path and simplify styling around capture-based theming. + +Tasks: + +- [ ] remove Chroma dependencies from `go.mod` +- [ ] remove `GetLexer` +- [ ] remove `MakeStyleMap` +- [ ] remove `Styles.ChromaStyle` if no longer needed +- [ ] replace Chroma-derived theme extraction with explicit Gim theme definitions +- [ ] update commands that currently switch Chroma styles + +Done when: + +- [ ] the build no longer depends on Chroma packages +- [ ] no codepath references Chroma tokens, lexers, or styles + +## Phase 7: Add Incremental Parsing + +Purpose: + +Move from correct-but-simple to correct-and-efficient. + +Tasks: + +- [ ] preserve old trees per buffer +- [ ] call `tree.Edit` before reparsing +- [ ] parse new content using the old tree +- [ ] compute changed ranges +- [ ] decide whether rehighlighting happens by changed byte range, changed point range, or affected line range +- [ ] update only changed cache regions +- [ ] verify cache invalidation around inserted and deleted lines + +Done when: + +- [ ] small edits do not require full-buffer reparsing and rehighlighting +- [ ] highlighting updates correctly after insertions, deletions, joins, and splits + +## Phase 8: Improve Cache Representation If Needed + +Purpose: + +Reduce memory churn and simplify overlay logic if per-rune style maps become too heavy. 
+ +Tasks: + +- [ ] measure cost of per-line `[]lipgloss.Style` +- [ ] consider switching internal storage to spans +- [ ] keep renderer-facing API stable if possible +- [ ] optimize only after correctness and incremental behavior exist + +Done when: + +- [ ] cache format is deliberate rather than inherited from the old renderer + +## Phase 9: Expand Language Support + +Purpose: + +Generalize the system after the first language works well. + +Tasks: + +- [ ] ship one language first, likely Go +- [ ] add additional language bindings and queries one by one +- [ ] verify filetype detection and registry behavior for each language +- [ ] define how language-specific capture tweaks are handled + +Done when: + +- [ ] the system can scale beyond a single demo language without architectural changes + +## Phase 10: Testing And Verification + +Purpose: + +Make syntax behavior trustworthy as the engine evolves. + +Tasks: + +- [ ] add unit tests for registry lookup +- [ ] add unit tests for byte-to-rune range conversion +- [ ] add unit tests for overlapping capture resolution +- [ ] add unit tests for multi-line highlight extraction +- [ ] add integration tests for visible rendering of highlighted lines +- [ ] add edit tests for incremental updates after insert, delete, split, and join operations +- [ ] add tests covering UTF-8 characters and mixed-width content + +Done when: + +- [ ] syntax bugs can be reproduced and locked down with tests + +--- + +## Suggested Order Of Attack + +If working on this piece by piece, this is the recommended order: + +- [ ] Phase 1 first +- [ ] Phase 2 second +- [ ] Phase 3 third +- [ ] Phase 4 fourth +- [ ] Phase 5 fifth +- [ ] Phase 6 sixth +- [ ] Phase 7 seventh +- [ ] Phase 10 continuously during all phases +- [ ] Phase 8 only if profiling says it matters +- [ ] Phase 9 after one language is solid + +--- + +## Concrete First Milestone + +The first milestone should be intentionally small but architectural. 
+ +Milestone goal: + +- [ ] create `internal/syntax` +- [ ] add syntax engine field to `editor.Model` +- [ ] make `view.go` consume syntax results instead of computing syntax itself +- [ ] use placeholder or basic full-buffer syntax data, even if the first output is minimal + +This milestone matters because it breaks the most important bad dependency: rendering owning syntax. + +--- + +## Concrete Second Milestone + +Milestone goal: + +- [ ] support one language with Tree-sitter full-buffer parse and full-buffer highlighting +- [ ] cache per-line style results +- [ ] render highlighted output without Chroma + +--- + +## Concrete Third Milestone + +Milestone goal: + +- [ ] wire edit invalidation into buffer mutation paths +- [ ] update Tree-sitter state after edits +- [ ] keep highlights correct after normal editing commands + +--- + +## Concrete Fourth Milestone + +Milestone goal: + +- [ ] add true incremental parse updates +- [ ] rehighlight only changed regions +- [ ] validate performance on larger files + +--- + +## Open Design Questions + +- [ ] Should syntax state live inside `core.Buffer` or stay in the syntax engine keyed by buffer ID? +- [ ] Should the renderer consume per-rune styles or span-based styles? +- [ ] Should the syntax engine rebuild full source text on demand, or should buffers expose a stable full-text API? +- [ ] How should unsupported filetypes render: plain text or fallback queryless token classes? +- [ ] Should theme capture fallback be static or configurable? +- [ ] Should parser/query assets be embedded or read from disk at runtime? 
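On the per-rune versus span question above: either shape needs the same byte-to-rune conversion, because Tree-sitter reports columns in bytes while the renderer works in runes. A minimal sketch of that conversion for a single-line capture, assuming a hypothetical `RuneSpan` type and hypothetical helper names (`byteColToRuneCol`, `toRuneSpan`) that are not part of this plan or of any library:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneSpan is a hypothetical render-side span: rune columns, end exclusive.
type RuneSpan struct {
	StartRune int
	EndRune   int
	Capture   string
}

// byteColToRuneCol converts a byte column within line into a rune column.
// Out-of-range columns are clamped to the bounds of the line.
func byteColToRuneCol(line string, byteCol int) int {
	if byteCol < 0 {
		byteCol = 0
	}
	if byteCol > len(line) {
		byteCol = len(line)
	}
	return utf8.RuneCountInString(line[:byteCol])
}

// toRuneSpan converts one single-line capture from byte columns to rune columns.
func toRuneSpan(line string, startByte, endByte int, capture string) RuneSpan {
	return RuneSpan{
		StartRune: byteColToRuneCol(line, startByte),
		EndRune:   byteColToRuneCol(line, endByte),
		Capture:   capture,
	}
}

func main() {
	// The string literal occupies bytes 5 through 12; "é" takes two bytes.
	line := `s := "héllo"`
	span := toRuneSpan(line, 5, 13, "string")
	fmt.Printf("%+v\n", span)
}
```

Keeping this in one helper inside `internal/syntax` fits the constraint that conversion should not leak into the renderer. A multi-line capture would call it once per covered line, with the start column set to 0 and the end column set to the line length on interior lines.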
+ +--- + +## Notes For Implementation + +Guidelines while building this: + +- [ ] keep parsing and rendering separate from the first commit +- [ ] optimize only after correctness is established +- [ ] prefer one supported language done correctly over several partial languages +- [ ] keep UTF-8 correctness in mind from the first Tree-sitter integration +- [ ] avoid letting temporary renderer hacks become permanent API boundaries +- [ ] test line split, line join, backspace-at-start, delete-at-end, and multi-line comments early + +--- + +## Definition Of Done + +This project is done when all of the following are true: + +- [ ] Chroma is gone +- [ ] Tree-sitter is the only syntax engine +- [ ] syntax state is maintained outside rendering +- [ ] edits update syntax state correctly +- [ ] renderer consumes cached syntax data cleanly +- [ ] highlight output is correct for supported languages +- [ ] UTF-8 behavior is correct +- [ ] incremental parsing is working +- [ ] tests cover the risky pieces diff --git a/flake.nix b/flake.nix index 12556b6..ee5f6d4 100644 --- a/flake.nix +++ b/flake.nix @@ -41,7 +41,7 @@ export GOOS=linux export GOARCH=amd64 export CGO_CFLAGS=-Wno-error=cpp; - export CGO_ENABLED=0 + export CGO_ENABLED=1 # Exec zsh to replace the current shell process with zsh. # This ensures your prompt and zsh configurations load correctly. 
diff --git a/go.mod b/go.mod index 1b8f938..cc27f68 100644 --- a/go.mod +++ b/go.mod @@ -7,6 +7,8 @@ require ( github.com/charmbracelet/bubbletea v1.3.10 github.com/charmbracelet/lipgloss v1.1.0 github.com/charmbracelet/x/exp/teatest v0.0.0-20260209132835-6b065b8ba62c + github.com/tree-sitter/go-tree-sitter v0.25.0 + github.com/tree-sitter/tree-sitter-javascript v0.25.0 ) require ( @@ -25,11 +27,13 @@ require ( github.com/lucasb-eyer/go-colorful v1.3.0 // indirect github.com/mattn/go-isatty v0.0.20 // indirect github.com/mattn/go-localereader v0.0.1 // indirect + github.com/mattn/go-pointer v0.0.1 // indirect github.com/mattn/go-runewidth v0.0.19 // indirect github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 // indirect github.com/muesli/cancelreader v0.2.2 // indirect github.com/muesli/termenv v0.16.0 // indirect github.com/rivo/uniseg v0.4.7 // indirect + github.com/tree-sitter/tree-sitter-go v0.25.0 // indirect github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect golang.org/x/sys v0.38.0 // indirect golang.org/x/text v0.28.0 // indirect diff --git a/go.sum b/go.sum index 1c426f4..6834b54 100644 --- a/go.sum +++ b/go.sum @@ -30,6 +30,8 @@ github.com/clipperhouse/stringish v0.1.1 h1:+NSqMOr3GR6k1FdRhhnXrLfztGzuG+VuFDfa github.com/clipperhouse/stringish v0.1.1/go.mod h1:v/WhFtE1q0ovMta2+m+UbpZ+2/HEXNWYXQgCt4hdOzA= github.com/clipperhouse/uax29/v2 v2.5.0 h1:x7T0T4eTHDONxFJsL94uKNKPHrclyFI0lm7+w94cO8U= github.com/clipperhouse/uax29/v2 v2.5.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g= +github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= +github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/dlclark/regexp2 v1.11.5 h1:Q/sSnsKerHeCkc/jSTNq1oCm7KiVgUMZRDUoRu0JQZQ= github.com/dlclark/regexp2 v1.11.5/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8= github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f 
h1:Y/CXytFA4m6baUTXGLOoWe4PQhGxaX0KpnayAqC48p4= @@ -42,6 +44,8 @@ github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWE github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y= github.com/mattn/go-localereader v0.0.1 h1:ygSAOl7ZXTx4RdPYinUpg6W99U8jWvWi9Ye2JC/oIi4= github.com/mattn/go-localereader v0.0.1/go.mod h1:8fBrzywKY7BI3czFoHkuzRoWE9C+EiG4R1k4Cjx5p88= +github.com/mattn/go-pointer v0.0.1 h1:n+XhsuGeVO6MEAp7xyEukFINEa+Quek5psIR/ylA6o0= +github.com/mattn/go-pointer v0.0.1/go.mod h1:2zXcozF6qYGgmsG+SeTZz3oAbFLdD3OWqnUbNvJZAlc= github.com/mattn/go-runewidth v0.0.19 h1:v++JhqYnZuu5jSKrk9RbgF5v4CGUjqRfBm05byFGLdw= github.com/mattn/go-runewidth v0.0.19/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs= github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 h1:ZK8zHtRHOkbHy6Mmr5D264iyp3TiX5OmNcI5cIARiQI= @@ -50,8 +54,38 @@ github.com/muesli/cancelreader v0.2.2 h1:3I4Kt4BQjOR54NavqnDogx/MIoWBFa0StPA8ELU github.com/muesli/cancelreader v0.2.2/go.mod h1:3XuTXfFS2VjM+HTLZY9Ak0l6eUKfijIfMUZ4EgX0QYo= github.com/muesli/termenv v0.16.0 h1:S5AlUN9dENB57rsbnkPyfdGuWIlkmzJjbFf0Tf5FWUc= github.com/muesli/termenv v0.16.0/go.mod h1:ZRfOIKPFDYQoDFF4Olj7/QJbW60Ol/kL1pU3VfY/Cnk= +github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= +github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ= github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88= +github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA= +github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= +github.com/tree-sitter/go-tree-sitter v0.25.0 h1:sx6kcg8raRFCvc9BnXglke6axya12krCJF5xJ2sftRU= +github.com/tree-sitter/go-tree-sitter v0.25.0/go.mod h1:r77ig7BikoZhHrrsjAnv8RqGti5rtSyvDHPzgTPsUuU= +github.com/tree-sitter/tree-sitter-c 
v0.23.4 h1:nBPH3FV07DzAD7p0GfNvXM+Y7pNIoPenQWBpvM++t4c= +github.com/tree-sitter/tree-sitter-c v0.23.4/go.mod h1:MkI5dOiIpeN94LNjeCp8ljXN/953JCwAby4bClMr6bw= +github.com/tree-sitter/tree-sitter-cpp v0.23.4 h1:LaWZsiqQKvR65yHgKmnaqA+uz6tlDJTJFCyFIeZU/8w= +github.com/tree-sitter/tree-sitter-cpp v0.23.4/go.mod h1:doqNW64BriC7WBCQ1klf0KmJpdEvfxyXtoEybnBo6v8= +github.com/tree-sitter/tree-sitter-embedded-template v0.23.2 h1:nFkkH6Sbe56EXLmZBqHHcamTpmz3TId97I16EnGy4rg= +github.com/tree-sitter/tree-sitter-embedded-template v0.23.2/go.mod h1:HNPOhN0qF3hWluYLdxWs5WbzP/iE4aaRVPMsdxuzIaQ= +github.com/tree-sitter/tree-sitter-go v0.25.0 h1:cEB0Q3LHgZtS+ECHx9wcP7AwzoOddJFQCVmytX42cVU= +github.com/tree-sitter/tree-sitter-go v0.25.0/go.mod h1:Jrx8QqYN0v7npv1fJRH1AznddllYiCMUChtVjxPK040= +github.com/tree-sitter/tree-sitter-html v0.23.2 h1:1UYDV+Yd05GGRhVnTcbP58GkKLSHHZwVaN+lBZV11Lc= +github.com/tree-sitter/tree-sitter-html v0.23.2/go.mod h1:gpUv/dG3Xl/eebqgeYeFMt+JLOY9cgFinb/Nw08a9og= +github.com/tree-sitter/tree-sitter-java v0.23.5 h1:J9YeMGMwXYlKSP3K4Us8CitC6hjtMjqpeOf2GGo6tig= +github.com/tree-sitter/tree-sitter-java v0.23.5/go.mod h1:NRKlI8+EznxA7t1Yt3xtraPk1Wzqh3GAIC46wxvc320= +github.com/tree-sitter/tree-sitter-javascript v0.25.0 h1:ZkWETb66/w8cc13yhfnNuHOLDQWl3BnKlH6f9AdR88c= +github.com/tree-sitter/tree-sitter-javascript v0.25.0/go.mod h1:lmGD1EJdCA+v0S1u2fFgepMg/opzSg/4pgFym2FPGAs= +github.com/tree-sitter/tree-sitter-json v0.24.8 h1:tV5rMkihgtiOe14a9LHfDY5kzTl5GNUYe6carZBn0fQ= +github.com/tree-sitter/tree-sitter-json v0.24.8/go.mod h1:F351KK0KGvCaYbZ5zxwx/gWWvZhIDl0eMtn+1r+gQbo= +github.com/tree-sitter/tree-sitter-php v0.23.11 h1:iHewsLNDmznh8kgGyfWfujsZxIz1YGbSd2ZTEM0ZiP8= +github.com/tree-sitter/tree-sitter-php v0.23.11/go.mod h1:T/kbfi+UcCywQfUNAJnGTN/fMSUjnwPXA8k4yoIks74= +github.com/tree-sitter/tree-sitter-python v0.23.6 h1:qHnWFR5WhtMQpxBZRwiaU5Hk/29vGju6CVtmvu5Haas= +github.com/tree-sitter/tree-sitter-python v0.23.6/go.mod 
h1:cpdthSy/Yoa28aJFBscFHlGiU+cnSiSh1kuDVtI8YeM= +github.com/tree-sitter/tree-sitter-ruby v0.23.1 h1:T/NKHUA+iVbHM440hFx+lzVOzS4dV6z8Qw8ai+72bYo= +github.com/tree-sitter/tree-sitter-ruby v0.23.1/go.mod h1:kUS4kCCQloFcdX6sdpr8p6r2rogbM6ZjTox5ZOQy8cA= +github.com/tree-sitter/tree-sitter-rust v0.23.2 h1:6AtoooCW5GqNrRpfnvl0iUhxTAZEovEmLKDbyHlfw90= +github.com/tree-sitter/tree-sitter-rust v0.23.2/go.mod h1:hfeGWic9BAfgTrc7Xf6FaOAguCFJRo3RBbs7QJ6D7MI= github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e h1:JVG44RsyaB9T2KIHavMF/ppJZNG9ZpyihvCd0w101no= github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e/go.mod h1:RbqR21r5mrJuqunuUZ/Dhy/avygyECGrLceyNeo4LiM= golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI= @@ -62,3 +96,5 @@ golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc= golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks= golang.org/x/text v0.28.0 h1:rhazDwis8INMIwQ4tpjLDzUhx6RlXqZNPEM0huQojng= golang.org/x/text v0.28.0/go.mod h1:U8nCwOR8jO/marOQ0QbDiOngZVEBB7MAiitBuMjXiNU= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/main.go b/main.go new file mode 100644 index 0000000..616d724 --- /dev/null +++ b/main.go @@ -0,0 +1,373 @@ +package main + +import ( + "fmt" + "os" + "sort" + "strings" + + sitter "github.com/tree-sitter/go-tree-sitter" + ts_go "github.com/tree-sitter/tree-sitter-go/bindings/go" +) + +// Sample Go source to highlight +const source = ` + package main + + func main () { + println("Hello" + 5) + } +` + +type Highlight struct { + StartRow uint // 0-indexed line number + StartCol uint // 0-indexed column (bytes) + EndRow uint + EndCol uint + Capture string +} + +// Theme maps capture names to ANSI escape codes. +// In your editor you'd use lipgloss styles instead. 
+var theme = map[string]string{ + "keyword": "\033[1;35m", // bold magenta + "keyword.type": "\033[1;35m", + "keyword.function": "\033[1;35m", + "keyword.return": "\033[1;35m", + "keyword.coroutine": "\033[1;35m", + "keyword.repeat": "\033[1;35m", + "keyword.import": "\033[1;35m", + "keyword.conditional": "\033[1;35m", + "type": "\033[33m", // yellow + "type.builtin": "\033[33m", + "type.definition": "\033[1;33m", // bold yellow + "function": "\033[1;34m", // bold blue + "function.call": "\033[34m", // blue + "function.method": "\033[34m", + "function.method.call": "\033[34m", + "function.builtin": "\033[1;31m", + "variable": "\033[37m", // white + "variable.parameter": "\033[3;37m", // italic white + "variable.member": "\033[37m", + "constant": "\033[1;36m", // bold cyan + "constant.builtin": "\033[1;36m", + "string": "\033[32m", // green + "string.escape": "\033[1;32m", + "number": "\033[36m", // cyan + "number.float": "\033[36m", + "boolean": "\033[36m", + "operator": "\033[93m", // bright yellow + "comment": "\033[2;37m", // dim + "comment.documentation": "\033[2;37m", + "module": "\033[35m", // magenta + "label": "\033[33m", + "property": "\033[37m", + "constructor": "\033[1;33m", + "punctuation.delimiter": "\033[37m", + "punctuation.bracket": "\033[37m", +} + +const reset = "\033[0m" + +func main() { + code := []byte(source) + lines := strings.Split(source, "\n") + + // --- Step 1: Parse --- + lang := sitter.NewLanguage(ts_go.Language()) + parser := sitter.NewParser() + defer parser.Close() + parser.SetLanguage(lang) + + tree := parser.Parse(code, nil) + defer tree.Close() + root := tree.RootNode() + + // --- Step 2: Load query from highlights.scm --- + queryBytes, err := os.ReadFile("queries/go/highlights.scm") + if err != nil { + fmt.Fprintf(os.Stderr, "Failed to read highlights.scm: %v\n", err) + return + } + + query, queryErr := sitter.NewQuery(lang, string(queryBytes)) + if queryErr != nil { + fmt.Fprintf(os.Stderr, "Query error: %v\n", queryErr) + 
return + } + defer query.Close() + + // --- Step 3: Run query --- + cursor := sitter.NewQueryCursor() + defer cursor.Close() + captures := cursor.Captures(query, root, code) + + var highlights []Highlight + for match, captureIdx := captures.Next(); match != nil; match, captureIdx = captures.Next() { + capture := match.Captures[captureIdx] + captureName := query.CaptureNames()[capture.Index] + // Skip @spell — it's a nvim spellcheck hint, not a highlight + if captureName == "spell" { + continue + } + node := capture.Node + start := node.StartPosition() + end := node.EndPosition() + highlights = append(highlights, Highlight{ + StartRow: start.Row, + StartCol: start.Column, + EndRow: end.Row, + EndCol: end.Column, + Capture: captureName, + }) + } + + // --- Step 4: Show captures with positions --- + fmt.Println("=== Captures (row:col → row:col) ===") + for _, h := range highlights { + // Extract text for display using the source lines + text := extractText(lines, h) + fmt.Printf(" %d:%-2d → %d:%-2d @%-22s %q\n", + h.StartRow, h.StartCol, h.EndRow, h.EndCol, h.Capture, text) + } + fmt.Println() + + // --- Step 5: Render with colors using row:col positions --- + // Build a per-line map of column ranges to capture names. + // Sort so wider (less specific) ranges come first — last writer wins. 
+ sort.Slice(highlights, func(i, j int) bool { + if highlights[i].StartRow == highlights[j].StartRow { + if highlights[i].StartCol == highlights[j].StartCol { + // Wider range first so more specific overwrites it + if highlights[i].EndRow == highlights[j].EndRow { + return highlights[i].EndCol > highlights[j].EndCol + } + return highlights[i].EndRow > highlights[j].EndRow + } + return highlights[i].StartCol < highlights[j].StartCol + } + return highlights[i].StartRow < highlights[j].StartRow + }) + + // captureAt[row][col] = capture name (last writer wins) + captureAt := make(map[uint]map[uint]string) + for _, h := range highlights { + for row := h.StartRow; row <= h.EndRow; row++ { + if captureAt[row] == nil { + captureAt[row] = make(map[uint]string) + } + startCol := uint(0) + if row == h.StartRow { + startCol = h.StartCol + } + endCol := uint(len(lines[row])) + if row == h.EndRow { + endCol = h.EndCol + } + for col := startCol; col < endCol; col++ { + captureAt[row][col] = h.Capture + } + } + } + + fmt.Println("=== Colored output ===") + printColored(lines, captureAt) + + // ===================================================================== + // INCREMENTAL PARSING DEMO + // ===================================================================== + // When a user types in your editor, you don't re-parse the whole file. + // Instead you: + // 1. Tell the OLD tree what changed (tree.Edit) + // 2. Parse the new source, passing the old tree + // 3. Tree-sitter reuses unchanged nodes and only re-parses the edit + // 4. Use ChangedRanges to know which lines need re-highlighting + // + // This is O(edit size + log(file size)) instead of O(file size). 
+ + fmt.Println("\n========================================") + fmt.Println("=== INCREMENTAL PARSE DEMO ===") + fmt.Println("========================================") + fmt.Println() + + // Simulate: user changes "Hello" → "Goodbye" on row 4 + // Before: println("Hello" + 5) + // After: println("Goodbye" + 5) + oldSource := source + newSource := strings.Replace(oldSource, `"Hello"`, `"Goodbye"`, 1) + + // Find where the edit happened (in a real editor you already know this + // from the keystroke — you don't need to search for it) + editStart := strings.Index(oldSource, `"Hello"`) + oldEnd := editStart + len(`"Hello"`) + newEnd := editStart + len(`"Goodbye"`) + + // Convert byte offset to row:col for the InputEdit + editStartPoint := byteToPoint(oldSource, uint(editStart)) + oldEndPoint := byteToPoint(oldSource, uint(oldEnd)) + newEndPoint := byteToPoint(newSource, uint(newEnd)) + + fmt.Printf("Edit: replaced %q → %q\n", "Hello", "Goodbye") + fmt.Printf(" at byte %d, row %d col %d\n", editStart, editStartPoint.Row, editStartPoint.Column) + fmt.Println() + + // Step 1: Tell the old tree what changed + tree.Edit(&sitter.InputEdit{ + StartByte: uint(editStart), + OldEndByte: uint(oldEnd), + NewEndByte: uint(newEnd), + StartPosition: editStartPoint, + OldEndPosition: oldEndPoint, + NewEndPosition: newEndPoint, + }) + + // Step 2: Parse the new source, passing the old (edited) tree. + // Tree-sitter will REUSE all nodes that weren't affected by the edit + // and only re-parse the region around the change. 
+ newCode := []byte(newSource) + newTree := parser.Parse(newCode, tree) + defer newTree.Close() + + // Step 3: Ask the old (edited) tree which ranges differ from the new tree + changedRanges := tree.ChangedRanges(newTree) + fmt.Printf("Changed ranges: %d\n", len(changedRanges)) + for i, r := range changedRanges { + fmt.Printf(" range %d: row %d:%d → row %d:%d\n", + i, r.StartPoint.Row, r.StartPoint.Column, r.EndPoint.Row, r.EndPoint.Column) + } + fmt.Println() + + // In your editor, you'd ONLY re-run the highlight query on the changed + // ranges (using cursor.SetByteRange or cursor.SetPointRange), then + // update just those lines in your display. Everything else stays cached. + + // For this demo, let's re-highlight the full new tree to show the result + newRoot := newTree.RootNode() + newLines := strings.Split(newSource, "\n") + + cursor2 := sitter.NewQueryCursor() + defer cursor2.Close() + newCaptures := cursor2.Captures(query, newRoot, newCode) + + var newHighlights []Highlight + for match, captureIdx := newCaptures.Next(); match != nil; match, captureIdx = newCaptures.Next() { + capture := match.Captures[captureIdx] + captureName := query.CaptureNames()[capture.Index] + if captureName == "spell" { + continue + } + node := capture.Node + start := node.StartPosition() + end := node.EndPosition() + newHighlights = append(newHighlights, Highlight{ + StartRow: start.Row, StartCol: start.Column, + EndRow: end.Row, EndCol: end.Column, + Capture: captureName, + }) + } + + sort.Slice(newHighlights, func(i, j int) bool { + if newHighlights[i].StartRow == newHighlights[j].StartRow { + if newHighlights[i].StartCol == newHighlights[j].StartCol { + if newHighlights[i].EndRow == newHighlights[j].EndRow { + return newHighlights[i].EndCol > newHighlights[j].EndCol + } + return newHighlights[i].EndRow > newHighlights[j].EndRow + } + return newHighlights[i].StartCol < newHighlights[j].StartCol + } + return newHighlights[i].StartRow < newHighlights[j].StartRow + }) + + newCaptureAt := 
make(map[uint]map[uint]string) + for _, h := range newHighlights { + for row := h.StartRow; row <= h.EndRow; row++ { + if newCaptureAt[row] == nil { + newCaptureAt[row] = make(map[uint]string) + } + startCol := uint(0) + if row == h.StartRow { + startCol = h.StartCol + } + endCol := uint(len(newLines[row])) + if row == h.EndRow { + endCol = h.EndCol + } + for col := startCol; col < endCol; col++ { + newCaptureAt[row][col] = h.Capture + } + } + } + + fmt.Println("=== After edit (colored output) ===") + printColored(newLines, newCaptureAt) +} + +// printColored renders source lines with ANSI colors based on the capture map, which is keyed by byte column. +func printColored(lines []string, captureAt map[uint]map[uint]string) { + for row, line := range lines { + currentCapture := "" + for col := uint(0); col < uint(len(line)); col++ { + name := "" // capture name for this byte; avoids shadowing the builtin cap + if rowMap, ok := captureAt[uint(row)]; ok { + name = rowMap[col] + } + if name != currentCapture { + if currentCapture != "" { + fmt.Print(reset) + } + if color, ok := theme[name]; ok { + fmt.Print(color) + } + currentCapture = name + } + fmt.Print(line[col : col+1]) // emit the raw byte; string(line[col]) would re-encode bytes >= 0x80 and garble UTF-8 + } + if currentCapture != "" { + fmt.Print(reset) + } + fmt.Println() + } +} + +// byteToPoint converts a byte offset into a row:col Point (col counts bytes, as Tree-sitter expects). +func byteToPoint(src string, offset uint) sitter.Point { + row := uint(0) + col := uint(0) + for i := range offset { + if src[i] == '\n' { + row++ + col = 0 + } else { + col++ + } + } + return sitter.NewPoint(row, col) +} + +// extractText pulls the highlighted text from source lines using row:col positions. +func extractText(lines []string, h Highlight) string { + if h.StartRow == h.EndRow { + line := lines[h.StartRow] + end := min(h.EndCol, uint(len(line))) + return line[h.StartCol:end] + } + // Multi-line highlight (rare, but possible for block comments etc.) 
+ var result string + for row := h.StartRow; row <= h.EndRow; row++ { + line := lines[row] + start := uint(0) + if row == h.StartRow { + start = h.StartCol + } + end := uint(len(line)) + if row == h.EndRow { + end = h.EndCol + } + if result != "" { + result += "\n" + } + result += line[start:end] + } + return result +} diff --git a/queries/go/highlights.scm b/queries/go/highlights.scm new file mode 100644 index 0000000..7675cb7 --- /dev/null +++ b/queries/go/highlights.scm @@ -0,0 +1,254 @@ +; Forked from tree-sitter-go +; Copyright (c) 2014 Max Brunsfeld (The MIT License) +; +; Identifiers +(type_identifier) @type + +(type_spec + name: (type_identifier) @type.definition) + +(field_identifier) @property + +(identifier) @variable + +(package_identifier) @module + +(parameter_declaration + (identifier) @variable.parameter) + +(variadic_parameter_declaration + (identifier) @variable.parameter) + +(label_name) @label + +(const_spec + name: (identifier) @constant) + +; Function calls +(call_expression + function: (identifier) @function.call) + +(call_expression + function: (selector_expression + field: (field_identifier) @function.method.call)) + +; Function definitions +(function_declaration + name: (identifier) @function) + +(method_declaration + name: (field_identifier) @function.method) + +(method_elem + name: (field_identifier) @function.method) + +; Constructors +((call_expression + (identifier) @constructor) + (#lua-match? @constructor "^[nN]ew.+$")) + +((call_expression + (identifier) @constructor) + (#lua-match? @constructor "^[mM]ake.+$")) + +; Operators +[ + "--" + "-" + "-=" + ":=" + "!" + "!=" + "..." 
+ "*" + "*" + "*=" + "/" + "/=" + "&" + "&&" + "&=" + "&^" + "&^=" + "%" + "%=" + "^" + "^=" + "+" + "++" + "+=" + "<-" + "<" + "<<" + "<<=" + "<=" + "=" + "==" + ">" + ">=" + ">>" + ">>=" + "|" + "|=" + "||" + "~" +] @operator + +; Keywords +[ + "break" + "const" + "continue" + "default" + "defer" + "goto" + "range" + "select" + "var" + "fallthrough" +] @keyword + +[ + "type" + "struct" + "interface" +] @keyword.type + +"func" @keyword.function + +"return" @keyword.return + +"go" @keyword.coroutine + +"for" @keyword.repeat + +[ + "import" + "package" +] @keyword.import + +[ + "else" + "case" + "switch" + "if" +] @keyword.conditional + +; Builtin types +[ + "chan" + "map" +] @type.builtin + +((type_identifier) @type.builtin + (#any-of? @type.builtin + "any" "bool" "byte" "comparable" "complex128" "complex64" "error" "float32" "float64" "int" + "int16" "int32" "int64" "int8" "rune" "string" "uint" "uint16" "uint32" "uint64" "uint8" + "uintptr")) + +; Builtin functions +((identifier) @function.builtin + (#any-of? @function.builtin + "append" "cap" "clear" "close" "complex" "copy" "delete" "imag" "len" "make" "max" "min" "new" + "panic" "print" "println" "real" "recover")) + +; Delimiters +"." @punctuation.delimiter + +"," @punctuation.delimiter + +":" @punctuation.delimiter + +";" @punctuation.delimiter + +"(" @punctuation.bracket + +")" @punctuation.bracket + +"{" @punctuation.bracket + +"}" @punctuation.bracket + +"[" @punctuation.bracket + +"]" @punctuation.bracket + +; Literals +(interpreted_string_literal) @string + +(raw_string_literal) @string + +(rune_literal) @string + +(escape_sequence) @string.escape + +(int_literal) @number + +(float_literal) @number.float + +(imaginary_literal) @number + +[ + (true) + (false) +] @boolean + +[ + (nil) + (iota) +] @constant.builtin + +(keyed_element + . 
+ (literal_element + (identifier) @variable.member)) + +(field_declaration + name: (field_identifier) @variable.member) + +; Comments +(comment) @comment @spell + +; Doc Comments +(source_file + . + (comment)+ @comment.documentation) + +(source_file + (comment)+ @comment.documentation + . + (const_declaration)) + +(source_file + (comment)+ @comment.documentation + . + (function_declaration)) + +(source_file + (comment)+ @comment.documentation + . + (type_declaration)) + +(source_file + (comment)+ @comment.documentation + . + (var_declaration)) + +; Spell +((interpreted_string_literal) @spell + (#not-has-parent? @spell import_spec)) + +; Regex +(call_expression + (selector_expression) @_function + (#any-of? @_function + "regexp.Match" "regexp.MatchReader" "regexp.MatchString" "regexp.Compile" "regexp.CompilePOSIX" + "regexp.MustCompile" "regexp.MustCompilePOSIX") + (argument_list + . + [ + (raw_string_literal + (raw_string_literal_content) @string.regexp) + (interpreted_string_literal + (interpreted_string_literal_content) @string.regexp) + ]))