## MoonViT - PyTorch

This is an ultra-simple, single-file PyTorch implementation of MoonViT, the native-resolution vision encoder from Kimi-VL. I implemented this model because I think it's a great ViT variant: it can ingest images of arbitrary resolution and aspect ratio at scale, with no resizing or padding.
## Install
```bash
$ pip install open-moonvit
```
Or from source:
```bash
$ git clone https://github.com/kyegomez/open-moonvit
$ cd open-moonvit
$ pip install -e .
```
FlashAttention is optional. If `flash_attn` is importable and you're on CUDA, the var-length kernel is used automatically. Otherwise a block-diagonal SDPA fallback runs on CPU / MPS / CUDA with no extra dependencies.
```bash
$ pip install flash-attn --no-build-isolation # optional
```
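The fallback can be sketched as follows. This is a hypothetical `block_diagonal_sdpa` (illustrative, not the library's actual function): it builds a block-diagonal mask from `cu_seqlens` so that tokens in the packed sequence only attend within their own image, then dispatches to plain SDPA.

```python
import torch
import torch.nn.functional as F

def block_diagonal_sdpa(q, k, v, cu_seqlens):
    """Hypothetical sketch of the SDPA fallback.

    q, k, v: (1, heads, L_total, head_dim) -- one packed sequence.
    cu_seqlens: image boundaries, e.g. [0, 320, 460, 1036].
    """
    L = q.shape[-2]
    # Label every token with the index of the image it came from.
    seq_ids = torch.zeros(L, dtype=torch.long, device=q.device)
    for i in range(len(cu_seqlens) - 1):
        seq_ids[cu_seqlens[i]:cu_seqlens[i + 1]] = i
    # True where query and key belong to the same image -> block-diagonal mask.
    mask = seq_ids[:, None] == seq_ids[None, :]
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

This trades the var-length kernel's memory savings for portability: the `(L, L)` boolean mask is materialized, but no CUDA-only dependency is needed.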
## Usage
```python
import torch
from open_moonvit import MoonViT, MoonViTConfig, MLPProjector
encoder = MoonViT(MoonViTConfig()) # ~413M params, SigLIP-SO-400M defaults
# a batch of images at different resolutions, no padding, no resizing
images = [
    torch.randn(3, 224, 280),
    torch.randn(3, 140, 196),
    torch.randn(3, 336, 336),
]
out = encoder(images)
out.last_hidden_state # (L_total, 1152) packed patch tokens
out.cu_seqlens # (4,) int32 image boundaries in the packed seq
out.grid_shapes # [(16,20), (10,14), (24,24)]
```
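The metadata above follows directly from the patch size, which the `(16, 20)` grid for a 224×280 image implies is 14: each image contributes `(H // 14) × (W // 14)` tokens, and `cu_seqlens` holds their running offsets in the packed sequence. A minimal sketch of that bookkeeping (`packing_metadata` is an illustrative helper, not part of the package):

```python
import torch

def packing_metadata(images, patch_size=14):
    # Per-image token grids and cumulative boundaries of the packed sequence.
    grids, cu = [], [0]
    for img in images:
        _, h, w = img.shape
        gh, gw = h // patch_size, w // patch_size
        grids.append((gh, gw))
        cu.append(cu[-1] + gh * gw)
    return grids, torch.tensor(cu, dtype=torch.int32)

grids, cu = packing_metadata([
    torch.randn(3, 224, 280),
    torch.randn(3, 140, 196),
    torch.randn(3, 336, 336),
])
grids  # [(16, 20), (10, 14), (24, 24)]
cu     # tensor([0, 320, 460, 1036], dtype=torch.int32)
```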
To feed an LLM, compose with the MLP projector (2×2 pixel-shuffle then a two-layer MLP):
```python
projector = MLPProjector(
    vision_hidden_size = 1152,
    llm_hidden_size = 2048,
)
tokens, grids, cu = projector(out.last_hidden_state, out.grid_shapes, out.cu_seqlens)
tokens.shape # (L_total // 4, 2048)
```
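The 4× token reduction comes from the 2×2 pixel shuffle: every 2×2 neighborhood of patch tokens is folded into a single token with 4× the channels before the two-layer MLP projects it to the LLM width. A hedged sketch of the rearrangement for one image (names are illustrative, not the package's API):

```python
import torch

def pixel_shuffle_2x2(tokens, grid):
    # tokens: (gh * gw, C) patch tokens for one image, row-major over the grid.
    gh, gw = grid
    c = tokens.shape[-1]
    # Split rows and columns into 2x2 blocks...
    x = tokens.view(gh // 2, 2, gw // 2, 2, c)
    # ...then emit one token per block, concatenating the 4 neighbors' channels.
    x = x.permute(0, 2, 1, 3, 4).reshape(-1, 4 * c)
    return x

x = torch.randn(16 * 20, 1152)        # the (16, 20) grid from the usage example
y = pixel_shuffle_2x2(x, (16, 20))
y.shape  # torch.Size([80, 4608])
```

So a grid of `gh × gw` tokens at width 1152 becomes `(gh // 2) × (gw // 2)` tokens at width 4608, which the MLP then maps to `llm_hidden_size`; summed over the batch this is the `L_total // 4` above.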
## How it works
```mermaid
flowchart TD
A([Images]) --> B[MoonViTPatchEmbed]
B --> C[AbsolutePosEmbedInterpolator]
subgraph enc[MoonViT]
C --> D[MoonViTEncoderLayer × 27]
R[RotaryEmbedding2D] -.-> D
D --> LN[post LayerNorm]
end
LN --> PS[PixelShuffle2x]
subgraph proj[MLPProjector]
PS --> MLP[Linear · Act · Linear]
end
MLP --> OUT([LLM Tokens])
```
Four things to internalize:
1. **Packing, not padding.** Images of different shapes become one long sequence. No wasted compute on pad tokens.
2. **Two positional embeddings, added together.** The paper i