init: starting up the treesitter parsing

Hayden Hargreaves 2026-04-06 22:31:40 -07:00
parent bc08213f07
commit f96c1c1302
6 changed files with 1193 additions and 1 deletions


@ -0,0 +1,525 @@
# Tree-sitter Highlighting Implementation Plan
This document is the working plan for replacing Chroma-based highlighting with a Tree-sitter-first syntax system in Gim.
The current renderer in `internal/editor/view.go` is tightly coupled to Chroma and computes syntax styles during rendering.
That is the opposite of the architecture Tree-sitter wants. Tree-sitter works best when parsing and highlighting are
maintained as buffer state and rendering only consumes cached results.
This plan assumes:
- Chroma will be removed entirely.
- The renderer can be rebuilt to better fit the new syntax model.
- We are willing to do a full-buffer parse and full rehighlight first, then optimize incrementally.
- Correct architecture matters more than preserving the current render pipeline.
---
## Project Goal
Build a syntax system where:
- each buffer owns syntax state
- Tree-sitter parsing is maintained across edits
- highlights are cached outside the renderer
- the renderer consumes precomputed style data
- byte-oriented parser results are converted into rune-oriented render data
---
## Success Criteria
- [ ] `internal/editor/view.go` does not directly call Chroma or Tree-sitter
- [ ] Chroma is fully removed from the codebase
- [ ] syntax state exists independently from rendering
- [ ] each buffer can be parsed and highlighted through Tree-sitter
- [ ] the renderer reads cached highlight data for visible lines
- [ ] edits invalidate and recompute syntax state
- [ ] the system handles UTF-8 text correctly
- [ ] multi-line captures work correctly
- [ ] incremental parsing exists for normal text edits
- [ ] syntax-related behavior has focused tests
---
## Architectural Direction
The target data flow is:
`Buffer -> Syntax Engine -> Highlight Cache -> Renderer`
Not:
`Renderer -> Parse -> Highlight -> Draw`
Core separation of concerns:
- `internal/core`: holds text and buffer mutation behavior.
- `internal/syntax`: owns parser state, queries, highlight cache, invalidation, and update logic.
- `internal/style`: owns theme mapping from capture names to `lipgloss.Style`.
- `internal/editor`: owns rendering, cursor, selection overlay, gutters, statusline, and viewport logic.
---
## Key Constraints And Risks
### Byte vs Rune Indexing
Tree-sitter reports positions in bytes.
The editor currently renders by runes.
This means the syntax engine must own conversion from byte-based capture ranges to rune-based render ranges. This conversion should never be spread across the renderer.
- [ ] define one internal representation for parser/query positions
- [ ] define one internal representation for render positions
- [ ] keep conversion logic isolated inside `internal/syntax`
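A minimal sketch of the byte-to-rune conversion described above, assuming lines are stored as UTF-8 Go strings. The helper name `byteColToRuneCol` is hypothetical and not part of the plan; it only illustrates the kind of function that would live inside `internal/syntax`.

```go
package main

import "fmt"

// byteColToRuneCol converts a byte offset within one line into a rune index.
// Offsets past the end clamp to the rune count; an offset that falls inside a
// multi-byte character rounds up to the next rune boundary.
// Hypothetical helper, shown only to illustrate the conversion.
func byteColToRuneCol(line string, byteCol int) int {
	runes := 0
	for start := range line { // start is the byte offset where each rune begins
		if start >= byteCol {
			break
		}
		runes++
	}
	return runes
}

func main() {
	line := `s := "héllo"` // é occupies two bytes in UTF-8
	// A capture covering the string literal, given as byte columns [5, 13).
	fmt.Println(byteColToRuneCol(line, 5), byteColToRuneCol(line, 13))
}
```

Scanning from the start of the line on every call is linear in the line length. If that ever shows up in profiles, a per-line table of rune start offsets could be cached next to the highlight results.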
### Multi-line Captures
Strings, comments, and some language constructs can span lines.
- [ ] highlight cache supports ranges spanning multiple lines
- [ ] renderer can consume per-line results from multi-line captures
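One way to let the renderer consume per-line results is to split each multi-line capture into one segment per line before caching. The sketch below assumes half-open byte columns and a slice of line byte lengths; the `Range`, `LineSegment`, and `splitByLine` names are hypothetical.

```go
package main

import "fmt"

// Range is a capture range in row/column form (columns in bytes, end exclusive).
type Range struct {
	StartRow, StartCol int
	EndRow, EndCol     int
}

// LineSegment is the part of a Range that falls on a single line.
type LineSegment struct {
	Row, StartCol, EndCol int
}

// splitByLine breaks r into one segment per line it touches.
// lineLens[i] is the byte length of line i. Empty segments are dropped.
func splitByLine(r Range, lineLens []int) []LineSegment {
	var segs []LineSegment
	for row := r.StartRow; row <= r.EndRow && row < len(lineLens); row++ {
		if row < 0 {
			continue
		}
		start, end := 0, lineLens[row]
		if row == r.StartRow {
			start = r.StartCol
		}
		if row == r.EndRow && r.EndCol < end {
			end = r.EndCol
		}
		if start < end {
			segs = append(segs, LineSegment{Row: row, StartCol: start, EndCol: end})
		}
	}
	return segs
}

func main() {
	lines := []string{"x := 1 /* start", "", "end */ y := 2"}
	lens := make([]int, len(lines))
	for i, l := range lines {
		lens[i] = len(l)
	}
	comment := Range{StartRow: 0, StartCol: 7, EndRow: 2, EndCol: 6}
	for _, s := range splitByLine(comment, lens) {
		fmt.Printf("row %d: %q\n", s.Row, lines[s.Row][s.StartCol:s.EndCol])
	}
}
```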
### Query Precedence
Tree-sitter queries can produce overlapping captures.
- [ ] define deterministic precedence rules
- [ ] document how broad captures and specific captures are resolved
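As an illustration of what a deterministic rule could look like, the sketch below sorts captures by start offset, then widest first, then by pattern index, and paints them in that order so the last writer wins. This ordering is an assumption chosen to match the "last writer wins" approach in the `main.go` demo, not a rule the plan has settled on; `Capture` and `resolve` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Capture is one highlight capture restricted to a single line.
type Capture struct {
	Start, End int    // half-open byte range on the line
	Pattern    int    // index of the query pattern that produced the capture
	Name       string // capture name, e.g. "function.call"
}

// resolve returns the winning capture name for every byte of the line.
// Assumed rule: narrower ranges beat wider ones, and on identical ranges
// the capture from the later query pattern wins.
func resolve(caps []Capture, lineLen int) []string {
	sorted := append([]Capture(nil), caps...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Start != b.Start {
			return a.Start < b.Start
		}
		if a.End != b.End {
			return a.End > b.End // wider range is painted first
		}
		return a.Pattern < b.Pattern // later pattern is painted last
	})
	out := make([]string, lineLen)
	for _, c := range sorted {
		for i := c.Start; i < c.End && i < lineLen; i++ {
			if i >= 0 {
				out[i] = c.Name
			}
		}
	}
	return out
}

func main() {
	// For `println(x)` the callee identifier is captured twice over the same bytes.
	caps := []Capture{
		{Start: 0, End: 7, Pattern: 1, Name: "function.call"},
		{Start: 0, End: 7, Pattern: 0, Name: "variable"},
		{Start: 8, End: 9, Pattern: 0, Name: "variable"},
	}
	got := resolve(caps, 10)
	fmt.Println(got[0], got[8])
}
```

Because the sort is total over start, end, and pattern index, the result does not depend on the order in which the query cursor happened to yield the captures.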
### Full Parse First, Incremental Later
The initial version does not need to be optimal.
- [ ] initial version can parse and rehighlight the full buffer
- [ ] follow-up version uses `tree.Edit`, old trees, and changed ranges
---
## Target Package Layout
Planned package layout:
- [ ] `internal/syntax/types.go`
- [ ] `internal/syntax/engine.go`
- [ ] `internal/syntax/state.go`
- [ ] `internal/syntax/registry.go`
- [ ] `internal/syntax/treesitter.go`
- [ ] `internal/syntax/query.go`
- [ ] `internal/syntax/cache.go`
- [ ] `internal/style/theme.go` or equivalent capture-to-style mapping helpers
Likely existing files to update:
- [ ] `internal/editor/model.go`
- [ ] `internal/editor/model_builder.go`
- [ ] `internal/editor/view.go`
- [ ] `internal/core/buffer.go`
- [ ] `internal/command/handlers.go`
- [ ] `go.mod`
Likely files to remove or heavily reduce:
- [ ] Chroma-specific logic in `internal/style/style.go`
- [ ] direct Chroma setup in editor model builders and command handlers
---
## Data Model Plan
### 1. Syntax Engine
The syntax engine should be editor-facing and buffer-aware.
Responsibilities:
- attach syntax state to buffers
- initialize parser and query data from filetype
- reparse after edits
- maintain dirty regions or dirty lines
- build cached line highlight results
- expose line results to the renderer
Checklist:
- [ ] define `Engine` interface in `internal/syntax/engine.go`
- [ ] decide whether syntax state is owned directly by the engine or attached to buffers
- [ ] add a field on `editor.Model` for the syntax engine
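A rough sketch of what the `Engine` boundary might look like, assuming syntax state is keyed by a buffer ID. Every name here (`BufferID`, `StyleID`, `LineStyles`, `plainEngine`) is hypothetical, and `StyleID` stands in for `lipgloss.Style` only to keep the example dependency-free. The placeholder implementation returns unstyled lines, which is roughly the kind of minimal output the first milestone allows.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type BufferID int // hypothetical buffer identifier
type StyleID int  // stand-in for lipgloss.Style; 0 means "no style"

// LineStyles holds one StyleID per rune of a line.
type LineStyles []StyleID

// Engine is the only syntax API the renderer would see.
type Engine interface {
	Attach(id BufferID, filetype string, lines []string)
	Invalidate(id BufferID, firstLine, lastLine int)
	Line(id BufferID, line int) LineStyles
}

// plainEngine is a placeholder that styles nothing.
type plainEngine struct {
	buffers map[BufferID][]string
}

func newPlainEngine() *plainEngine {
	return &plainEngine{buffers: map[BufferID][]string{}}
}

func (e *plainEngine) Attach(id BufferID, filetype string, lines []string) {
	e.buffers[id] = lines // filetype is ignored by the placeholder
}

// Invalidate is a no-op because the placeholder caches nothing.
func (e *plainEngine) Invalidate(id BufferID, firstLine, lastLine int) {}

// Line returns zero-valued styles sized to the line's rune count,
// or nil when the line does not exist.
func (e *plainEngine) Line(id BufferID, line int) LineStyles {
	lines := e.buffers[id]
	if line < 0 || line >= len(lines) {
		return nil
	}
	return make(LineStyles, utf8.RuneCountInString(lines[line]))
}

func main() {
	var eng Engine = newPlainEngine()
	eng.Attach(1, "go", []string{"héllo"})
	fmt.Println(len(eng.Line(1, 0)))
}
```

Keeping the interface this small means a real Tree-sitter engine could later be swapped in behind it without further changes to `view.go`.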
### 2. Per-buffer Syntax State
Each buffer needs syntax state. The important point is that syntax is buffer-level, not window-level.
Suggested fields:
- [ ] parser
- [ ] language
- [ ] query or compiled query set
- [ ] current parse tree
- [ ] source snapshot or source builder access
- [ ] dirty line or dirty range tracking
- [ ] cached line highlight results
- [ ] version counter for cache invalidation
### 3. Highlight Cache Representation
Start with the representation that makes integration easiest.
Recommended first version:
- cached per-line `[]lipgloss.Style`, one entry per rune
Recommended longer-term representation:
- cached per-line spans like `[]Span{StartRune, EndRune, StyleID}`
Implementation choice:
- [ ] phase 1 uses per-rune style maps for easiest renderer integration
- [ ] phase 2 evaluates switching internal cache to spans
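To make the trade-off between the two representations concrete, the sketch below converts between a per-rune style slice and spans. It assumes styles can be identified by a small integer where 0 means "unstyled"; the `Span` fields follow the shape suggested above, while `compress` and `expand` are hypothetical helper names.

```go
package main

import "fmt"

// Span covers runes [StartRune, EndRune) with a single style.
type Span struct {
	StartRune, EndRune int
	StyleID            int
}

// compress merges runs of equal style IDs into spans and drops unstyled runs.
func compress(styles []int) []Span {
	var spans []Span
	for i := 0; i < len(styles); {
		j := i
		for j < len(styles) && styles[j] == styles[i] {
			j++
		}
		if styles[i] != 0 {
			spans = append(spans, Span{StartRune: i, EndRune: j, StyleID: styles[i]})
		}
		i = j
	}
	return spans
}

// expand turns spans back into one style ID per rune.
func expand(spans []Span, runeCount int) []int {
	out := make([]int, runeCount)
	for _, s := range spans {
		for i := s.StartRune; i < s.EndRune && i < runeCount; i++ {
			if i >= 0 {
				out[i] = s.StyleID
			}
		}
	}
	return out
}

func main() {
	perRune := []int{1, 1, 0, 2, 2, 2}
	spans := compress(perRune)
	fmt.Println(spans, expand(spans, len(perRune)))
}
```

Spans store one entry per styled run instead of one per rune, so a renderer-facing API could keep returning per-rune data while the cache stores spans internally.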
### 4. Theme Mapping
Theme logic should map Tree-sitter captures such as `keyword`, `function`, `string`, `comment`, and `type.builtin` to `lipgloss.Style`.
Checklist:
- [ ] create capture-name to style mapping layer
- [ ] support fallback from specific captures to broader categories
- [ ] keep theme logic independent from parser/query logic
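Fallback from specific captures to broader categories can be done by trimming dotted suffixes until a themed name is found. The sketch below uses plain strings in place of `lipgloss.Style` to stay self-contained; `lookupStyle` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupStyle resolves a capture name against the theme, falling back from
// "keyword.function" to "keyword" and so on. The bool reports whether any
// level of the name was themed.
func lookupStyle(theme map[string]string, capture string) (string, bool) {
	for name := capture; name != ""; {
		if style, ok := theme[name]; ok {
			return style, true
		}
		dot := strings.LastIndex(name, ".")
		if dot < 0 {
			break
		}
		name = name[:dot] // drop the most specific component and retry
	}
	return "", false
}

func main() {
	theme := map[string]string{
		"keyword": "bold magenta",
		"type":    "yellow",
	}
	for _, c := range []string{"keyword.function", "type.builtin", "comment.documentation"} {
		style, ok := lookupStyle(theme, c)
		fmt.Printf("%s -> %q (found=%v)\n", c, style, ok)
	}
}
```

With a lookup like this, a theme only needs entries for the broad categories plus whichever specific captures it wants to style differently.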
---
## Phased Implementation Plan
## Phase 0: Cleanly Commit To Tree-sitter
Purpose:
Remove architectural assumptions that only make sense for Chroma.
Tasks:
- [ ] decide the initial supported filetypes for Tree-sitter
- [ ] decide where query files live and how they are loaded
- [ ] decide whether `main.go` demo code should be removed or moved to a more explicit demo location
- [ ] audit Chroma references in the repo
- [ ] list all codepaths that currently construct or depend on `style.ChromaStyle`
Done when:
- [ ] there is a clear inventory of Chroma-coupled code
- [ ] there is a clear inventory of Tree-sitter assets to load per language
## Phase 1: Introduce Syntax As A Real Subsystem
Purpose:
Create the new architecture boundary before changing rendering behavior.
Tasks:
- [ ] create `internal/syntax`
- [ ] define the engine interface
- [ ] add a syntax engine field to `editor.Model`
- [ ] initialize the syntax engine in model construction
- [ ] remove direct highlighting calls from `view.go`
- [ ] route visible line highlighting through the syntax engine
Done when:
- [ ] `view.go` asks the syntax subsystem for line highlight data
- [ ] syntax work no longer begins inside the render loop itself
## Phase 2: Define Buffer Text Access And Edit Notifications
Purpose:
Make buffer mutations visible to the syntax system in a structured way.
Tasks:
- [ ] decide whether edits are emitted from `core.Buffer` or from editor actions
- [ ] define an internal edit event type
- [ ] include enough data for Tree-sitter incremental edits later
- [ ] wire `SetLine`, `InsertLine`, and `DeleteLine` changes into syntax invalidation
- [ ] decide whether first version uses whole-buffer invalidation
Suggested edit event fields:
- [ ] start byte
- [ ] old end byte
- [ ] new end byte
- [ ] start point
- [ ] old end point
- [ ] new end point
- [ ] affected line range
Done when:
- [ ] syntax invalidation happens when text changes
- [ ] invalidation does not depend on the render loop noticing text changed
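The suggested edit event fields can be sketched as a plain struct whose byte and point fields have the same shape as the `sitter.InputEdit` used in the `main.go` demo. The example assumes the buffer is a slice of lines joined by `\n` and that the replacement text contains no newline; `Edit`, `Point`, and `editForSetLine` are hypothetical names.

```go
package main

import "fmt"

// Point is a row/column position with the column measured in bytes.
type Point struct{ Row, Col uint }

// Edit carries everything needed for invalidation now and for an
// incremental Tree-sitter edit later.
type Edit struct {
	StartByte, OldEndByte, NewEndByte    uint
	StartPoint, OldEndPoint, NewEndPoint Point
	FirstLine, LastLine                  uint // affected lines, inclusive
}

// editForSetLine describes replacing the full content of one line.
// lines is the buffer before the change; newText must not contain '\n'.
func editForSetLine(lines []string, row int, newText string) Edit {
	start := uint(0)
	for i := 0; i < row; i++ {
		start += uint(len(lines[i])) + 1 // +1 for the newline separator
	}
	oldLen := uint(len(lines[row]))
	newLen := uint(len(newText))
	r := uint(row)
	return Edit{
		StartByte:   start,
		OldEndByte:  start + oldLen,
		NewEndByte:  start + newLen,
		StartPoint:  Point{Row: r, Col: 0},
		OldEndPoint: Point{Row: r, Col: oldLen},
		NewEndPoint: Point{Row: r, Col: newLen},
		FirstLine:   r,
		LastLine:    r,
	}
}

func main() {
	lines := []string{"package main", "", "x := 1"}
	fmt.Printf("%+v\n", editForSetLine(lines, 2, "x := 42"))
}
```

Reporting the whole line as changed is coarser than a per-keystroke edit, but it is still a valid description of the change and keeps the first version simple.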
## Phase 3: Build Minimal Tree-sitter Registry And Loader
Purpose:
Provide one place that maps filetypes to languages and queries.
Tasks:
- [ ] create a registry for language metadata
- [ ] map filetype strings to Tree-sitter language bindings
- [ ] map filetypes to highlight query file paths
- [ ] load and compile queries once per language where practical
- [ ] define behavior for unsupported filetypes
Done when:
- [ ] opening a supported buffer can resolve a language and query set
- [ ] unsupported buffers degrade cleanly without crashing the renderer
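A minimal registry sketch, assuming filetypes are plain strings and query files sit under `queries/<language>/highlights.scm` as in this commit. The `Registry` and `LanguageInfo` names are hypothetical, and a real entry would also carry the Tree-sitter language binding, which is left out here so the example has no dependencies.

```go
package main

import "fmt"

// LanguageInfo is the metadata needed to set up syntax for one language.
type LanguageInfo struct {
	Name      string
	QueryPath string // path to the highlight query file
}

// Registry maps filetype strings to language metadata.
type Registry struct {
	byFiletype map[string]LanguageInfo
}

func NewRegistry() *Registry {
	return &Registry{byFiletype: map[string]LanguageInfo{}}
}

func (r *Registry) Register(filetype string, info LanguageInfo) {
	r.byFiletype[filetype] = info
}

// Lookup reports false for unsupported filetypes so callers can fall back
// to plain text instead of failing.
func (r *Registry) Lookup(filetype string) (LanguageInfo, bool) {
	info, ok := r.byFiletype[filetype]
	return info, ok
}

func main() {
	reg := NewRegistry()
	reg.Register("go", LanguageInfo{Name: "Go", QueryPath: "queries/go/highlights.scm"})

	for _, ft := range []string{"go", "txt"} {
		if info, ok := reg.Lookup(ft); ok {
			fmt.Printf("%s: highlight with %s\n", ft, info.QueryPath)
		} else {
			fmt.Printf("%s: unsupported, render as plain text\n", ft)
		}
	}
}
```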
## Phase 4: Implement Full-buffer Parsing And Full-buffer Highlighting
Purpose:
Get correct Tree-sitter highlighting working before optimizing.
Tasks:
- [ ] create per-buffer syntax state
- [ ] build full source text from buffer contents
- [ ] parse full source text into a tree
- [ ] run highlight query across the full tree
- [ ] collect captures in deterministic order
- [ ] resolve overlapping captures consistently
- [ ] convert capture byte ranges into per-line rune-based style maps
- [ ] cache line results for renderer consumption
Done when:
- [ ] a supported filetype can be fully highlighted without Chroma
- [ ] renderer uses cached line results from Tree-sitter
## Phase 5: Rebuild Renderer Integration Around Cached Syntax Data
Purpose:
Simplify the renderer so it consumes syntax cache rather than doing syntax work.
Tasks:
- [ ] redesign line render input around line text plus syntax cache
- [ ] ensure gutter rendering stays independent from syntax rendering
- [ ] ensure cursor overlay works on top of syntax styling
- [ ] ensure visual selection overlay works on top of syntax styling
- [ ] verify blank lines and end-of-line cursor rendering still behave correctly
- [ ] verify window width padding still uses background style consistently
Done when:
- [ ] line drawing is purely a render operation
- [ ] no parser or query logic exists in `view.go`
## Phase 6: Remove Chroma Completely
Purpose:
Delete the old highlighting path and simplify styling around capture-based theming.
Tasks:
- [ ] remove Chroma dependencies from `go.mod`
- [ ] remove `GetLexer`
- [ ] remove `MakeStyleMap`
- [ ] remove `Styles.ChromaStyle` if no longer needed
- [ ] replace Chroma-derived theme extraction with explicit Gim theme definitions
- [ ] update commands that currently switch Chroma styles
Done when:
- [ ] the build no longer depends on Chroma packages
- [ ] no codepath references Chroma tokens, lexers, or styles
## Phase 7: Add Incremental Parsing
Purpose:
Move from correct-but-simple to correct-and-efficient.
Tasks:
- [ ] preserve old trees per buffer
- [ ] call `tree.Edit` before reparsing
- [ ] parse new content using the old tree
- [ ] compute changed ranges
- [ ] decide whether rehighlighting happens by changed byte range, changed point range, or affected line range
- [ ] update only changed cache regions
- [ ] verify cache invalidation around inserted and deleted lines
Done when:
- [ ] small edits do not require full-buffer reparsing and rehighlighting
- [ ] highlighting updates correctly after insertions, deletions, joins, and splits
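Cache invalidation around inserted and deleted lines is mostly index bookkeeping: cached results below the edit must shift, and new lines must start out dirty. The sketch below covers only that cache side, not the Tree-sitter calls, and assumes a `nil` entry means "needs rehighlighting"; `LineCache` and its methods are hypothetical names.

```go
package main

import "fmt"

// LineCache stores per-line style IDs; a nil entry marks a dirty line.
type LineCache struct {
	lines [][]int
}

// InsertLines opens count dirty slots at index at and shifts later lines down.
func (c *LineCache) InsertLines(at, count int) {
	if at < 0 {
		at = 0
	}
	if at > len(c.lines) {
		at = len(c.lines)
	}
	fresh := make([][]int, count) // nil entries, so every new line is dirty
	tail := append(fresh, c.lines[at:]...)
	c.lines = append(c.lines[:at], tail...)
}

// DeleteLines removes count lines starting at index at and marks the line
// that moves into that position as dirty, since its context changed.
func (c *LineCache) DeleteLines(at, count int) {
	if at < 0 || at >= len(c.lines) || count <= 0 {
		return
	}
	if at+count > len(c.lines) {
		count = len(c.lines) - at
	}
	c.lines = append(c.lines[:at], c.lines[at+count:]...)
	if at < len(c.lines) {
		c.lines[at] = nil
	}
}

// Dirty lists the indices of lines that need rehighlighting.
func (c *LineCache) Dirty() []int {
	var dirty []int
	for i, l := range c.lines {
		if l == nil {
			dirty = append(dirty, i)
		}
	}
	return dirty
}

func main() {
	c := &LineCache{lines: [][]int{{1}, {2}, {3}}}
	c.InsertLines(1, 2)
	fmt.Println(len(c.lines), c.Dirty())
	c.DeleteLines(0, 1)
	fmt.Println(len(c.lines), c.Dirty())
}
```

In a fuller version the dirty set would be combined with the changed ranges from the incremental parse, since an edit can also change highlighting on lines that were not themselves touched.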
## Phase 8: Improve Cache Representation If Needed
Purpose:
Reduce memory churn and simplify overlay logic if per-rune style maps become too heavy.
Tasks:
- [ ] measure cost of per-line `[]lipgloss.Style`
- [ ] consider switching internal storage to spans
- [ ] keep renderer-facing API stable if possible
- [ ] optimize only after correctness and incremental behavior exist
Done when:
- [ ] cache format is deliberate rather than inherited from the old renderer
## Phase 9: Expand Language Support
Purpose:
Generalize the system after the first language works well.
Tasks:
- [ ] ship one language first, likely Go
- [ ] add additional language bindings and queries one by one
- [ ] verify filetype detection and registry behavior for each language
- [ ] define how language-specific capture tweaks are handled
Done when:
- [ ] the system can scale beyond a single demo language without architectural changes
## Phase 10: Testing And Verification
Purpose:
Make syntax behavior trustworthy as the engine evolves.
Tasks:
- [ ] add unit tests for registry lookup
- [ ] add unit tests for byte-to-rune range conversion
- [ ] add unit tests for overlapping capture resolution
- [ ] add unit tests for multi-line highlight extraction
- [ ] add integration tests for visible rendering of highlighted lines
- [ ] add edit tests for incremental updates after insert, delete, split, and join operations
- [ ] add tests covering UTF-8 characters and mixed-width content
Done when:
- [ ] syntax bugs can be reproduced and locked down with tests
---
## Suggested Order Of Attack
If working on this piece by piece, this is the recommended order once the Phase 0 inventory is done:
- [ ] Phase 1 first
- [ ] Phase 2 second
- [ ] Phase 3 third
- [ ] Phase 4 fourth
- [ ] Phase 5 fifth
- [ ] Phase 6 sixth
- [ ] Phase 7 seventh
- [ ] Phase 10 continuously during all phases
- [ ] Phase 8 only if profiling says it matters
- [ ] Phase 9 after one language is solid
---
## Concrete First Milestone
The first milestone should be intentionally small but architectural.
Milestone goal:
- [ ] create `internal/syntax`
- [ ] add syntax engine field to `editor.Model`
- [ ] make `view.go` consume syntax results instead of computing syntax itself
- [ ] use placeholder or basic full-buffer syntax data, even if the first output is minimal
This milestone matters because it breaks the most important bad dependency: rendering owning syntax.
---
## Concrete Second Milestone
Milestone goal:
- [ ] support one language with Tree-sitter full-buffer parse and full-buffer highlighting
- [ ] cache per-line style results
- [ ] render highlighted output without Chroma
---
## Concrete Third Milestone
Milestone goal:
- [ ] wire edit invalidation into buffer mutation paths
- [ ] update Tree-sitter state after edits
- [ ] keep highlights correct after normal editing commands
---
## Concrete Fourth Milestone
Milestone goal:
- [ ] add true incremental parse updates
- [ ] rehighlight only changed regions
- [ ] validate performance on larger files
---
## Open Design Questions
- [ ] Should syntax state live inside `core.Buffer` or stay in the syntax engine keyed by buffer ID?
- [ ] Should the renderer consume per-rune styles or span-based styles?
- [ ] Should the syntax engine rebuild full source text on demand, or should buffers expose a stable full-text API?
- [ ] How should unsupported filetypes render: plain text or fallback queryless token classes?
- [ ] Should theme capture fallback be static or configurable?
- [ ] Should parser/query assets be embedded or read from disk at runtime?
---
## Notes For Implementation
Guidelines while building this:
- [ ] keep parsing and rendering separate from the first commit
- [ ] optimize only after correctness is established
- [ ] prefer one supported language done correctly over several partial languages
- [ ] keep UTF-8 correctness in mind from the first Tree-sitter integration
- [ ] avoid letting temporary renderer hacks become permanent API boundaries
- [ ] test line split, line join, backspace-at-start, delete-at-end, and multi-line comments early
---
## Definition Of Done
This project is done when all of the following are true:
- [ ] Chroma is gone
- [ ] Tree-sitter is the only syntax engine
- [ ] syntax state is maintained outside rendering
- [ ] edits update syntax state correctly
- [ ] renderer consumes cached syntax data cleanly
- [ ] highlight output is correct for supported languages
- [ ] UTF-8 behavior is correct
- [ ] incremental parsing is working
- [ ] tests cover the risky pieces


@ -41,7 +41,7 @@
export GOOS=linux
export GOARCH=amd64
export CGO_CFLAGS=-Wno-error=cpp;
-export CGO_ENABLED=0
+export CGO_ENABLED=1
# Exec zsh to replace the current shell process with zsh.
# This ensures your prompt and zsh configurations load correctly.

4
go.mod

@ -7,6 +7,8 @@ require (
github.com/charmbracelet/bubbletea v1.3.10
github.com/charmbracelet/lipgloss v1.1.0
github.com/charmbracelet/x/exp/teatest v0.0.0-20260209132835-6b065b8ba62c
github.com/tree-sitter/go-tree-sitter v0.25.0
github.com/tree-sitter/tree-sitter-javascript v0.25.0
)
require (
@ -25,11 +27,13 @@ require (
github.com/lucasb-eyer/go-colorful v1.3.0 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-localereader v0.0.1 // indirect
github.com/mattn/go-pointer v0.0.1 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 // indirect
github.com/muesli/cancelreader v0.2.2 // indirect
github.com/muesli/termenv v0.16.0 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/tree-sitter/tree-sitter-go v0.25.0 // indirect
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/text v0.28.0 // indirect

36
go.sum

@ -30,6 +30,8 @@ github.com/clipperhouse/stringish v0.1.1 h1:+NSqMOr3GR6k1FdRhhnXrLfztGzuG+VuFDfa
github.com/clipperhouse/stringish v0.1.1/go.mod h1:v/WhFtE1q0ovMta2+m+UbpZ+2/HEXNWYXQgCt4hdOzA=
github.com/clipperhouse/uax29/v2 v2.5.0 h1:x7T0T4eTHDONxFJsL94uKNKPHrclyFI0lm7+w94cO8U=
github.com/clipperhouse/uax29/v2 v2.5.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dlclark/regexp2 v1.11.5 h1:Q/sSnsKerHeCkc/jSTNq1oCm7KiVgUMZRDUoRu0JQZQ=
github.com/dlclark/regexp2 v1.11.5/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f h1:Y/CXytFA4m6baUTXGLOoWe4PQhGxaX0KpnayAqC48p4=
@ -42,6 +44,8 @@ github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWE
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mattn/go-localereader v0.0.1 h1:ygSAOl7ZXTx4RdPYinUpg6W99U8jWvWi9Ye2JC/oIi4=
github.com/mattn/go-localereader v0.0.1/go.mod h1:8fBrzywKY7BI3czFoHkuzRoWE9C+EiG4R1k4Cjx5p88=
github.com/mattn/go-pointer v0.0.1 h1:n+XhsuGeVO6MEAp7xyEukFINEa+Quek5psIR/ylA6o0=
github.com/mattn/go-pointer v0.0.1/go.mod h1:2zXcozF6qYGgmsG+SeTZz3oAbFLdD3OWqnUbNvJZAlc=
github.com/mattn/go-runewidth v0.0.19 h1:v++JhqYnZuu5jSKrk9RbgF5v4CGUjqRfBm05byFGLdw=
github.com/mattn/go-runewidth v0.0.19/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 h1:ZK8zHtRHOkbHy6Mmr5D264iyp3TiX5OmNcI5cIARiQI=
@ -50,8 +54,38 @@ github.com/muesli/cancelreader v0.2.2 h1:3I4Kt4BQjOR54NavqnDogx/MIoWBFa0StPA8ELU
github.com/muesli/cancelreader v0.2.2/go.mod h1:3XuTXfFS2VjM+HTLZY9Ak0l6eUKfijIfMUZ4EgX0QYo=
github.com/muesli/termenv v0.16.0 h1:S5AlUN9dENB57rsbnkPyfdGuWIlkmzJjbFf0Tf5FWUc=
github.com/muesli/termenv v0.16.0/go.mod h1:ZRfOIKPFDYQoDFF4Olj7/QJbW60Ol/kL1pU3VfY/Cnk=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/tree-sitter/go-tree-sitter v0.25.0 h1:sx6kcg8raRFCvc9BnXglke6axya12krCJF5xJ2sftRU=
github.com/tree-sitter/go-tree-sitter v0.25.0/go.mod h1:r77ig7BikoZhHrrsjAnv8RqGti5rtSyvDHPzgTPsUuU=
github.com/tree-sitter/tree-sitter-c v0.23.4 h1:nBPH3FV07DzAD7p0GfNvXM+Y7pNIoPenQWBpvM++t4c=
github.com/tree-sitter/tree-sitter-c v0.23.4/go.mod h1:MkI5dOiIpeN94LNjeCp8ljXN/953JCwAby4bClMr6bw=
github.com/tree-sitter/tree-sitter-cpp v0.23.4 h1:LaWZsiqQKvR65yHgKmnaqA+uz6tlDJTJFCyFIeZU/8w=
github.com/tree-sitter/tree-sitter-cpp v0.23.4/go.mod h1:doqNW64BriC7WBCQ1klf0KmJpdEvfxyXtoEybnBo6v8=
github.com/tree-sitter/tree-sitter-embedded-template v0.23.2 h1:nFkkH6Sbe56EXLmZBqHHcamTpmz3TId97I16EnGy4rg=
github.com/tree-sitter/tree-sitter-embedded-template v0.23.2/go.mod h1:HNPOhN0qF3hWluYLdxWs5WbzP/iE4aaRVPMsdxuzIaQ=
github.com/tree-sitter/tree-sitter-go v0.25.0 h1:cEB0Q3LHgZtS+ECHx9wcP7AwzoOddJFQCVmytX42cVU=
github.com/tree-sitter/tree-sitter-go v0.25.0/go.mod h1:Jrx8QqYN0v7npv1fJRH1AznddllYiCMUChtVjxPK040=
github.com/tree-sitter/tree-sitter-html v0.23.2 h1:1UYDV+Yd05GGRhVnTcbP58GkKLSHHZwVaN+lBZV11Lc=
github.com/tree-sitter/tree-sitter-html v0.23.2/go.mod h1:gpUv/dG3Xl/eebqgeYeFMt+JLOY9cgFinb/Nw08a9og=
github.com/tree-sitter/tree-sitter-java v0.23.5 h1:J9YeMGMwXYlKSP3K4Us8CitC6hjtMjqpeOf2GGo6tig=
github.com/tree-sitter/tree-sitter-java v0.23.5/go.mod h1:NRKlI8+EznxA7t1Yt3xtraPk1Wzqh3GAIC46wxvc320=
github.com/tree-sitter/tree-sitter-javascript v0.25.0 h1:ZkWETb66/w8cc13yhfnNuHOLDQWl3BnKlH6f9AdR88c=
github.com/tree-sitter/tree-sitter-javascript v0.25.0/go.mod h1:lmGD1EJdCA+v0S1u2fFgepMg/opzSg/4pgFym2FPGAs=
github.com/tree-sitter/tree-sitter-json v0.24.8 h1:tV5rMkihgtiOe14a9LHfDY5kzTl5GNUYe6carZBn0fQ=
github.com/tree-sitter/tree-sitter-json v0.24.8/go.mod h1:F351KK0KGvCaYbZ5zxwx/gWWvZhIDl0eMtn+1r+gQbo=
github.com/tree-sitter/tree-sitter-php v0.23.11 h1:iHewsLNDmznh8kgGyfWfujsZxIz1YGbSd2ZTEM0ZiP8=
github.com/tree-sitter/tree-sitter-php v0.23.11/go.mod h1:T/kbfi+UcCywQfUNAJnGTN/fMSUjnwPXA8k4yoIks74=
github.com/tree-sitter/tree-sitter-python v0.23.6 h1:qHnWFR5WhtMQpxBZRwiaU5Hk/29vGju6CVtmvu5Haas=
github.com/tree-sitter/tree-sitter-python v0.23.6/go.mod h1:cpdthSy/Yoa28aJFBscFHlGiU+cnSiSh1kuDVtI8YeM=
github.com/tree-sitter/tree-sitter-ruby v0.23.1 h1:T/NKHUA+iVbHM440hFx+lzVOzS4dV6z8Qw8ai+72bYo=
github.com/tree-sitter/tree-sitter-ruby v0.23.1/go.mod h1:kUS4kCCQloFcdX6sdpr8p6r2rogbM6ZjTox5ZOQy8cA=
github.com/tree-sitter/tree-sitter-rust v0.23.2 h1:6AtoooCW5GqNrRpfnvl0iUhxTAZEovEmLKDbyHlfw90=
github.com/tree-sitter/tree-sitter-rust v0.23.2/go.mod h1:hfeGWic9BAfgTrc7Xf6FaOAguCFJRo3RBbs7QJ6D7MI=
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e h1:JVG44RsyaB9T2KIHavMF/ppJZNG9ZpyihvCd0w101no=
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e/go.mod h1:RbqR21r5mrJuqunuUZ/Dhy/avygyECGrLceyNeo4LiM=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
@ -62,3 +96,5 @@ golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/text v0.28.0 h1:rhazDwis8INMIwQ4tpjLDzUhx6RlXqZNPEM0huQojng=
golang.org/x/text v0.28.0/go.mod h1:U8nCwOR8jO/marOQ0QbDiOngZVEBB7MAiitBuMjXiNU=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

373
main.go Normal file

@ -0,0 +1,373 @@
package main
import (
"fmt"
"os"
"sort"
"strings"
sitter "github.com/tree-sitter/go-tree-sitter"
ts_go "github.com/tree-sitter/tree-sitter-go/bindings/go"
)
// Sample Go source to highlight
const source = `
package main
func main () {
println("Hello" + 5)
}
`
type Highlight struct {
StartRow uint // 0-indexed line number
StartCol uint // 0-indexed column (bytes)
EndRow uint
EndCol uint
Capture string
}
// Theme maps capture names to ANSI escape codes.
// In your editor you'd use lipgloss styles instead.
var theme = map[string]string{
"keyword": "\033[1;35m", // bold magenta
"keyword.type": "\033[1;35m",
"keyword.function": "\033[1;35m",
"keyword.return": "\033[1;35m",
"keyword.coroutine": "\033[1;35m",
"keyword.repeat": "\033[1;35m",
"keyword.import": "\033[1;35m",
"keyword.conditional": "\033[1;35m",
"type": "\033[33m", // yellow
"type.builtin": "\033[33m",
"type.definition": "\033[1;33m", // bold yellow
"function": "\033[1;34m", // bold blue
"function.call": "\033[34m", // blue
"function.method": "\033[34m",
"function.method.call": "\033[34m",
"function.builtin": "\033[1;31m",
"variable": "\033[37m", // white
"variable.parameter": "\033[3;37m", // italic white
"variable.member": "\033[37m",
"constant": "\033[1;36m", // bold cyan
"constant.builtin": "\033[1;36m",
"string": "\033[32m", // green
"string.escape": "\033[1;32m",
"number": "\033[36m", // cyan
"number.float": "\033[36m",
"boolean": "\033[36m",
"operator": "\033[93m", // bright yellow
"comment": "\033[2;37m", // dim
"comment.documentation": "\033[2;37m",
"module": "\033[35m", // magenta
"label": "\033[33m",
"property": "\033[37m",
"constructor": "\033[1;33m",
"punctuation.delimiter": "\033[37m",
"punctuation.bracket": "\033[37m",
}
const reset = "\033[0m"
func main() {
code := []byte(source)
lines := strings.Split(source, "\n")
// --- Step 1: Parse ---
lang := sitter.NewLanguage(ts_go.Language())
parser := sitter.NewParser()
defer parser.Close()
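// SetLanguage can fail (for example on a grammar version mismatch), so check it.
if err := parser.SetLanguage(lang); err != nil {
fmt.Fprintf(os.Stderr, "Failed to set language: %v\n", err)
return
}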
tree := parser.Parse(code, nil)
defer tree.Close()
root := tree.RootNode()
// --- Step 2: Load query from highlights.scm ---
queryBytes, err := os.ReadFile("queries/go/highlights.scm")
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to read highlights.scm: %v\n", err)
return
}
query, queryErr := sitter.NewQuery(lang, string(queryBytes))
if queryErr != nil {
fmt.Fprintf(os.Stderr, "Query error: %v\n", queryErr)
return
}
defer query.Close()
// --- Step 3: Run query ---
cursor := sitter.NewQueryCursor()
defer cursor.Close()
captures := cursor.Captures(query, root, code)
var highlights []Highlight
for match, captureIdx := captures.Next(); match != nil; match, captureIdx = captures.Next() {
capture := match.Captures[captureIdx]
captureName := query.CaptureNames()[capture.Index]
// Skip @spell — it's a nvim spellcheck hint, not a highlight
if captureName == "spell" {
continue
}
node := capture.Node
start := node.StartPosition()
end := node.EndPosition()
highlights = append(highlights, Highlight{
StartRow: start.Row,
StartCol: start.Column,
EndRow: end.Row,
EndCol: end.Column,
Capture: captureName,
})
}
// --- Step 4: Show captures with positions ---
fmt.Println("=== Captures (row:col → row:col) ===")
for _, h := range highlights {
// Extract text for display using the source lines
text := extractText(lines, h)
fmt.Printf(" %d:%-2d → %d:%-2d @%-22s %q\n",
h.StartRow, h.StartCol, h.EndRow, h.EndCol, h.Capture, text)
}
fmt.Println()
// --- Step 5: Render with colors using row:col positions ---
// Build a per-line map of column ranges to capture names.
// Sort so wider (less specific) ranges come first — last writer wins.
sort.Slice(highlights, func(i, j int) bool {
if highlights[i].StartRow == highlights[j].StartRow {
if highlights[i].StartCol == highlights[j].StartCol {
// Wider range first so more specific overwrites it
if highlights[i].EndRow == highlights[j].EndRow {
return highlights[i].EndCol > highlights[j].EndCol
}
return highlights[i].EndRow > highlights[j].EndRow
}
return highlights[i].StartCol < highlights[j].StartCol
}
return highlights[i].StartRow < highlights[j].StartRow
})
// captureAt[row][col] = capture name (last writer wins)
captureAt := make(map[uint]map[uint]string)
for _, h := range highlights {
for row := h.StartRow; row <= h.EndRow; row++ {
if captureAt[row] == nil {
captureAt[row] = make(map[uint]string)
}
startCol := uint(0)
if row == h.StartRow {
startCol = h.StartCol
}
endCol := uint(len(lines[row]))
if row == h.EndRow {
endCol = h.EndCol
}
for col := startCol; col < endCol; col++ {
captureAt[row][col] = h.Capture
}
}
}
fmt.Println("=== Colored output ===")
printColored(lines, captureAt)
// =====================================================================
// INCREMENTAL PARSING DEMO
// =====================================================================
// When a user types in your editor, you don't re-parse the whole file.
// Instead you:
// 1. Tell the OLD tree what changed (tree.Edit)
// 2. Parse the new source, passing the old tree
// 3. Tree-sitter reuses unchanged nodes and only re-parses the edit
// 4. Use ChangedRanges to know which lines need re-highlighting
//
// This is O(edit size + log(file size)) instead of O(file size).
fmt.Println("\n========================================")
fmt.Println("=== INCREMENTAL PARSE DEMO ===")
fmt.Println("========================================")
fmt.Println()
// Simulate: user changes "Hello" → "Goodbye" on row 4
// Before: println("Hello" + 5)
// After: println("Goodbye" + 5)
oldSource := source
newSource := strings.Replace(oldSource, `"Hello"`, `"Goodbye"`, 1)
// Find where the edit happened (in a real editor you already know this
// from the keystroke — you don't need to search for it)
editStart := strings.Index(oldSource, `"Hello"`)
oldEnd := editStart + len(`"Hello"`)
newEnd := editStart + len(`"Goodbye"`)
// Convert byte offset to row:col for the InputEdit
editStartPoint := byteToPoint(oldSource, uint(editStart))
oldEndPoint := byteToPoint(oldSource, uint(oldEnd))
newEndPoint := byteToPoint(newSource, uint(newEnd))
fmt.Printf("Edit: replaced %q → %q\n", "Hello", "Goodbye")
fmt.Printf(" at byte %d, row %d col %d\n", editStart, editStartPoint.Row, editStartPoint.Column)
fmt.Println()
// Step 1: Tell the old tree what changed
tree.Edit(&sitter.InputEdit{
StartByte: uint(editStart),
OldEndByte: uint(oldEnd),
NewEndByte: uint(newEnd),
StartPosition: editStartPoint,
OldEndPosition: oldEndPoint,
NewEndPosition: newEndPoint,
})
// Step 2: Parse the new source, passing the old (edited) tree.
// Tree-sitter will REUSE all nodes that weren't affected by the edit
// and only re-parse the region around the change.
newCode := []byte(newSource)
newTree := parser.Parse(newCode, tree)
defer newTree.Close()
// Step 3: See exactly which ranges changed
changedRanges := newTree.ChangedRanges(tree)
fmt.Printf("Changed ranges: %d\n", len(changedRanges))
for i, r := range changedRanges {
fmt.Printf(" range %d: row %d:%d → row %d:%d\n",
i, r.StartPoint.Row, r.StartPoint.Column, r.EndPoint.Row, r.EndPoint.Column)
}
fmt.Println()
// In your editor, you'd ONLY re-run the highlight query on the changed
// ranges (using cursor.SetByteRange or cursor.SetPointRange), then
// update just those lines in your display. Everything else stays cached.
// For this demo, let's re-highlight the full new tree to show the result
newRoot := newTree.RootNode()
newLines := strings.Split(newSource, "\n")
cursor2 := sitter.NewQueryCursor()
defer cursor2.Close()
newCaptures := cursor2.Captures(query, newRoot, newCode)
var newHighlights []Highlight
for match, captureIdx := newCaptures.Next(); match != nil; match, captureIdx = newCaptures.Next() {
capture := match.Captures[captureIdx]
captureName := query.CaptureNames()[capture.Index]
if captureName == "spell" {
continue
}
node := capture.Node
start := node.StartPosition()
end := node.EndPosition()
newHighlights = append(newHighlights, Highlight{
StartRow: start.Row, StartCol: start.Column,
EndRow: end.Row, EndCol: end.Column,
Capture: captureName,
})
}
sort.Slice(newHighlights, func(i, j int) bool {
if newHighlights[i].StartRow == newHighlights[j].StartRow {
if newHighlights[i].StartCol == newHighlights[j].StartCol {
if newHighlights[i].EndRow == newHighlights[j].EndRow {
return newHighlights[i].EndCol > newHighlights[j].EndCol
}
return newHighlights[i].EndRow > newHighlights[j].EndRow
}
return newHighlights[i].StartCol < newHighlights[j].StartCol
}
return newHighlights[i].StartRow < newHighlights[j].StartRow
})
newCaptureAt := make(map[uint]map[uint]string)
for _, h := range newHighlights {
for row := h.StartRow; row <= h.EndRow; row++ {
if newCaptureAt[row] == nil {
newCaptureAt[row] = make(map[uint]string)
}
startCol := uint(0)
if row == h.StartRow {
startCol = h.StartCol
}
endCol := uint(len(newLines[row]))
if row == h.EndRow {
endCol = h.EndCol
}
for col := startCol; col < endCol; col++ {
newCaptureAt[row][col] = h.Capture
}
}
}
fmt.Println("=== After edit (colored output) ===")
printColored(newLines, newCaptureAt)
}

// printColored renders source lines with ANSI colors based on the capture map.
// Columns are byte offsets, matching the byte columns Tree-sitter reports.
func printColored(lines []string, captureAt map[uint]map[uint]string) {
	for row, line := range lines {
		rowMap := captureAt[uint(row)] // nil when the row has no captures
		currentCapture := ""
		for col := uint(0); col < uint(len(line)); col++ {
			name := rowMap[col] // reading from a nil map yields ""
			if name != currentCapture {
				if currentCapture != "" {
					fmt.Print(reset)
				}
				if color, ok := theme[name]; ok {
					fmt.Print(color)
				}
				currentCapture = name
			}
			// Emit the raw byte. string(line[col]) would reinterpret the byte
			// as a code point and corrupt multi-byte UTF-8 characters.
			fmt.Print(line[col : col+1])
		}
		if currentCapture != "" {
			fmt.Print(reset)
		}
		fmt.Println()
	}
}
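
// byteToPoint converts a byte offset into a row:col Point.
// The column is counted in bytes, not runes, which is the unit Tree-sitter
// uses. Offsets past the end of src are clamped to len(src).
func byteToPoint(src string, offset uint) sitter.Point {
	offset = min(offset, uint(len(src)))
	row := uint(0)
	col := uint(0)
	for i := range offset {
		if src[i] == '\n' {
			row++
			col = 0
		} else {
			col++
		}
	}
	return sitter.NewPoint(row, col)
}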

// extractText pulls the highlighted text from source lines using row:col
// positions. Columns are byte offsets and are clamped to the line length.
func extractText(lines []string, h Highlight) string {
	// One loop covers both cases: for a single-line highlight the first row
	// is also the last row. Multi-line highlights are rare, but possible for
	// block comments, raw string literals, etc.
	var parts []string
	for row := h.StartRow; row <= h.EndRow; row++ {
		line := lines[row]
		start := uint(0)
		if row == h.StartRow {
			start = h.StartCol
		}
		end := uint(len(line))
		if row == h.EndRow {
			end = min(h.EndCol, end)
		}
		start = min(start, end) // never slice with start > end
		parts = append(parts, line[start:end])
	}
	return strings.Join(parts, "\n")
}
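
The helpers above index lines by byte column, which is the unit Tree-sitter reports. A renderer that works in runes needs a conversion step. The following is a minimal sketch of that step, not part of the commit; `byteColToRuneCol` is a hypothetical name, and it assumes the byte column falls on a UTF-8 character boundary, which should hold for node boundaries on valid UTF-8 input.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteColToRuneCol is a hypothetical helper (not in the commit). It converts
// a byte column within a single line into a rune column. Out-of-range
// columns are clamped to the line. If byteCol lands inside a multi-byte
// character, each stray byte counts as one rune.
func byteColToRuneCol(line string, byteCol int) int {
	if byteCol < 0 {
		byteCol = 0
	}
	if byteCol > len(line) {
		byteCol = len(line)
	}
	return utf8.RuneCountInString(line[:byteCol])
}

func main() {
	line := `x := "héllo"` // 'é' occupies two bytes
	for _, byteCol := range []int{0, 6, 9, len(line)} {
		fmt.Printf("byte col %2d -> rune col %2d\n",
			byteCol, byteColToRuneCol(line, byteCol))
	}
}
```

Converting once, when highlights are cached, keeps the renderer free of byte arithmetic; whether Gim stores rune columns or converts on demand is a design choice this sketch does not settle.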

queries/go/highlights.scm Normal file

@ -0,0 +1,254 @@
; Forked from tree-sitter-go
; Copyright (c) 2014 Max Brunsfeld (The MIT License)
;
; Identifiers
(type_identifier) @type
(type_spec
name: (type_identifier) @type.definition)
(field_identifier) @property
(identifier) @variable
(package_identifier) @module
(parameter_declaration
(identifier) @variable.parameter)
(variadic_parameter_declaration
(identifier) @variable.parameter)
(label_name) @label
(const_spec
name: (identifier) @constant)
; Function calls
(call_expression
function: (identifier) @function.call)
(call_expression
function: (selector_expression
field: (field_identifier) @function.method.call))
; Function definitions
(function_declaration
name: (identifier) @function)
(method_declaration
name: (field_identifier) @function.method)
(method_elem
name: (field_identifier) @function.method)
; Constructors
; #match? (regex) is used here because #lua-match? is a Neovim-specific predicate.
((call_expression
(identifier) @constructor)
(#match? @constructor "^[nN]ew.+$"))
((call_expression
(identifier) @constructor)
(#match? @constructor "^[mM]ake.+$"))
; Operators
[
"--"
"-"
"-="
":="
"!"
"!="
"..."
"*"
"*="
"/"
"/="
"&"
"&&"
"&="
"&^"
"&^="
"%"
"%="
"^"
"^="
"+"
"++"
"+="
"<-"
"<"
"<<"
"<<="
"<="
"="
"=="
">"
">="
">>"
">>="
"|"
"|="
"||"
"~"
] @operator
; Keywords
[
"break"
"const"
"continue"
"default"
"defer"
"goto"
"range"
"select"
"var"
"fallthrough"
] @keyword
[
"type"
"struct"
"interface"
] @keyword.type
"func" @keyword.function
"return" @keyword.return
"go" @keyword.coroutine
"for" @keyword.repeat
[
"import"
"package"
] @keyword.import
[
"else"
"case"
"switch"
"if"
] @keyword.conditional
; Builtin types
[
"chan"
"map"
] @type.builtin
((type_identifier) @type.builtin
(#any-of? @type.builtin
"any" "bool" "byte" "comparable" "complex128" "complex64" "error" "float32" "float64" "int"
"int16" "int32" "int64" "int8" "rune" "string" "uint" "uint16" "uint32" "uint64" "uint8"
"uintptr"))
; Builtin functions
((identifier) @function.builtin
(#any-of? @function.builtin
"append" "cap" "clear" "close" "complex" "copy" "delete" "imag" "len" "make" "max" "min" "new"
"panic" "print" "println" "real" "recover"))
; Delimiters
"." @punctuation.delimiter
"," @punctuation.delimiter
":" @punctuation.delimiter
";" @punctuation.delimiter
"(" @punctuation.bracket
")" @punctuation.bracket
"{" @punctuation.bracket
"}" @punctuation.bracket
"[" @punctuation.bracket
"]" @punctuation.bracket
; Literals
(interpreted_string_literal) @string
(raw_string_literal) @string
(rune_literal) @string
(escape_sequence) @string.escape
(int_literal) @number
(float_literal) @number.float
(imaginary_literal) @number
[
(true)
(false)
] @boolean
[
(nil)
(iota)
] @constant.builtin
(keyed_element
.
(literal_element
(identifier) @variable.member))
(field_declaration
name: (field_identifier) @variable.member)
; Comments
(comment) @comment @spell
; Doc Comments
(source_file
.
(comment)+ @comment.documentation)
(source_file
(comment)+ @comment.documentation
.
(const_declaration))
(source_file
(comment)+ @comment.documentation
.
(function_declaration))
(source_file
(comment)+ @comment.documentation
.
(type_declaration))
(source_file
(comment)+ @comment.documentation
.
(var_declaration))
; Spell
((interpreted_string_literal) @spell
(#not-has-parent? @spell import_spec))
; Regex
(call_expression
(selector_expression) @_function
(#any-of? @_function
"regexp.Match" "regexp.MatchReader" "regexp.MatchString" "regexp.Compile" "regexp.CompilePOSIX"
"regexp.MustCompile" "regexp.MustCompilePOSIX")
(argument_list
.
[
(raw_string_literal
(raw_string_literal_content) @string.regexp)
(interpreted_string_literal
(interpreted_string_literal_content) @string.regexp)
]))
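
The query above uses dotted capture names such as `keyword.function` and `function.method.call`, while `printColored` looks captures up in `theme` by exact name. A theme would then need an entry for every dotted variant. The following is a minimal sketch of a fallback lookup, assuming the common convention that a dotted name falls back to its parent (`function.method.call`, then `function.method`, then `function`); `resolveStyle` and the sample theme are hypothetical and not part of the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveStyle is a hypothetical helper (not in the commit). It looks up a
// capture name in the theme and, when there is no exact entry, retries with
// the last dotted segment removed until a match is found or no segments
// remain.
func resolveStyle(theme map[string]string, capture string) (string, bool) {
	for name := capture; name != ""; {
		if style, ok := theme[name]; ok {
			return style, true
		}
		i := strings.LastIndexByte(name, '.')
		if i < 0 {
			break
		}
		name = name[:i]
	}
	return "", false
}

func main() {
	// Illustrative theme only; these ANSI codes are placeholders.
	theme := map[string]string{
		"keyword":  "\033[35m",
		"function": "\033[34m",
	}
	for _, c := range []string{"keyword.function", "function.method.call", "label"} {
		_, ok := resolveStyle(theme, c)
		fmt.Printf("%-22s styled=%v\n", c, ok)
	}
}
```

With a lookup like this, the theme can define only base groups and add more specific entries later without touching the query file.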