Artificial Intelligence
Authors: Abu Saad
I present MemorySpine, a replacement for the KV cache: a constant-memory context-extension system for large language models that decouples semantic storage from model architecture. Unlike KV-cache approaches, whose memory grows as O(n·L·d) in the number of tokens n, layers L, and hidden dimension d, MemorySpine operates at O(1) memory complexity by storing embedding-level semantic fingerprints rather than per-layer attention states. I employ an orthogonal rotation matrix Ω, initialized via Modified Gram-Schmidt, for content-addressable hashing, ensuring uniform slot distribution with near-zero collision rates. In theory, MemorySpine can support a billion-token context in about 5 GB of RAM, whereas a KV cache requires 30+ GB of RAM for a million-token context in a typical LLM.
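The memory claim can be checked with back-of-the-envelope arithmetic. Below is a minimal sketch, assuming an illustrative 32-layer model with a grouped-query key/value width of 1024 stored in fp16, and a hypothetical fixed fingerprint pool for MemorySpine; the layer count, widths, and slot budget are my assumptions, since the abstract does not specify them:

    def kv_cache_bytes(n_tokens: int, n_layers: int, kv_dim: int, dtype_bytes: int = 2) -> int:
        # O(n * L * d): one key and one value vector per token, per layer
        return n_tokens * n_layers * kv_dim * 2 * dtype_bytes

    def fingerprint_pool_bytes(n_slots: int, embed_dim: int, dtype_bytes: int = 4) -> int:
        # O(1) in context length: a fixed pool of embedding-level fingerprints
        return n_slots * embed_dim * dtype_bytes

    # 1M-token KV cache, 32 layers, GQA key/value width 1024, fp16:
    print(kv_cache_bytes(1_000_000, 32, 1024) / 2**30)      # ~122 GiB, i.e. "30+ GB"

    # Fixed pool of 1.2M fp32 fingerprints of dimension 1024, for any token count:
    print(fingerprint_pool_bytes(1_200_000, 1024) / 2**30)  # ~4.6 GiB, near the 5 GB claim

Under these assumed sizes the KV cache indeed exceeds 30 GB at a million tokens, while the fingerprint pool stays constant regardless of context length.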
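The abstract also does not detail the hashing construction, so the following is only a plausible sketch: Ω is built by applying Modified Gram-Schmidt to a random Gaussian matrix, and a rotated embedding is bucketed by the sign pattern of its leading coordinates. The sign-bit bucketing, the slot count, and the function names are my assumptions, not the paper's method:

    import numpy as np

    def modified_gram_schmidt(a: np.ndarray) -> np.ndarray:
        # Orthonormalize the columns of a square matrix via Modified Gram-Schmidt:
        # normalize column j, then immediately remove its component from all
        # later columns (more numerically stable than classical Gram-Schmidt).
        q = a.astype(np.float64).copy()
        for j in range(q.shape[1]):
            q[:, j] /= np.linalg.norm(q[:, j])
            for k in range(j + 1, q.shape[1]):
                q[:, k] -= (q[:, j] @ q[:, k]) * q[:, j]
        return q

    rng = np.random.default_rng(0)
    d, n_slots = 64, 1024                    # hypothetical sizes, not from the paper
    omega = modified_gram_schmidt(rng.standard_normal((d, d)))
    assert np.allclose(omega.T @ omega, np.eye(d), atol=1e-10)  # omega is orthogonal

    def slot_for(embedding: np.ndarray) -> int:
        # Rotate the normalized embedding by omega, then bucket it by the sign
        # pattern of its first 10 coordinates (2**10 = 1024 slots). This is one
        # plausible content-addressable scheme, not the paper's confirmed one.
        z = omega @ (embedding / np.linalg.norm(embedding))
        bits = (z[:10] > 0).astype(int)
        return int("".join(map(str, bits)), 2)

Because Ω is orthogonal, the rotation preserves distances, and the sign buckets partition the unit sphere into roughly equal cells, which is the kind of property that would underlie the uniform-slot-distribution claim.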
Comments: 23 Pages.
Download: PDF
[v1] 2026-04-05 15:39:32