Changing AI math could reduce the hardware burden, researchers show
Sophisticated AI models tend to require a lot of memory and take up a lot of storage space. One of the ways to reduce that footprint involves a process called quantization, which changes how model weights are represented and stored. But quantization has its drawbacks.
Andrés Mac Allister, CEO and founder of The SEMQ Group, believes there's another way to make machine learning more efficient and less resource intensive. Instead of compressing model weights (specifically embeddings), he contends you can separate the semantics (the meaning) from how that meaning is represented.
Model weights, including embeddings (which map tokens to vectors), are the numbers in a machine learning model that determine how strongly one piece of information relates to another. Taken all together, they reflect learned behavior.
These parameters are commonly stored at full precision (FP32), which requires 4 bytes per parameter. A 7-billion-parameter model at FP32 would need about 28 GB of disk space and memory.
To save space, the model might be quantized at FP16/BF16, which requires 2 bytes per parameter. The resulting model would need about 14 GB of disk space and memory. And there are smaller quantization options like FP8, INT8/Q8, Q6, Q5, Q4, Q3, and Q2, each of which reduces the storage and memory footprint while also reducing precision – the answers get worse.
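The arithmetic behind those figures can be sketched in a few lines of Python: multiply the parameter count by the bytes each parameter occupies. This is an illustration only, not anything taken from SEMQ. The function name `footprint_gb` and the table of formats are assumptions made for the example, and the 4-bit entry ignores the scale metadata that real low-bit formats usually carry.

```python
# Approximate bytes per parameter for a few storage formats.
# Assumption: the 4-bit entry ignores per-block scale/zero-point metadata.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16/BF16": 2.0, "FP8/INT8": 1.0, "4-bit": 0.5}


def footprint_gb(num_params: float, fmt: str) -> float:
    """Return approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


# Print the footprint of a 7-billion-parameter model in each format.
for fmt in BYTES_PER_PARAM:
    print(f"7B parameters at {fmt}: {footprint_gb(7e9, fmt):.1f} GB")
```

The estimate covers weights only. Activations, caches, and runtime overhead add to the memory a deployed model actually needs.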
SEMQ stands for Symbolic Embedding Multi-Quantization. As described in a paper published earlier this year, SEMQ "replaces raw vectors with fixed-dimensional symbolic structures that preserve relational properties, such as relative similarity ordering and neighborhood structure, while decoupling representation from metrics, indexing, and execution semantics."
Essentially, Mac Allister has devised a way to construct a semantic abstraction layer that decouples the meaning captured in embeddings – vectors representing data – from the way that data is represented.
The operative idea is that semantic relationships depend primarily on the relative orientation of embedding vectors, so the absolute magnitude of those vectors becomes less important to preserve. That's less data to store.
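A small Python sketch shows why orientation can matter more than magnitude. Cosine similarity, a common way to compare embeddings, does not change when a vector is rescaled. The vectors and helper names here are made up for illustration, and the snippet demonstrates only this general property, not how SEMQ encodes anything.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
query = [0.2, -0.5, 0.8, 0.1]
doc = [0.1, -0.4, 0.9, 0.3]

# Same direction as `doc`, ten times the magnitude.
scaled_doc = [10.0 * x for x in doc]

original = cosine_similarity(query, doc)
rescaled = cosine_similarity(query, scaled_doc)

print(f"original: {original:.6f}")
print(f"rescaled: {rescaled:.6f}")
```

The two printed values agree to within floating-point rounding, because scaling a vector changes its length but not the angle it makes with other vectors.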
The potential impact on businesses running AI workloads depends on the portion of infrastructure costs attributable to semantic state.
"An embedding is usually represented as a long vector of floating-point numbers," Mac Allister explained in an email to The Register. "In conventional embedding systems, semantic state is typically stored as a sequence of high-precision numerical coordinates. Those coordinates jointly encode both magnitude and direction in the embedding space.
"Our original question was whether a substantial part of the useful semantic information could instead be represented through the structural relationship among components, how they move relative to one another, which regions they occupy and what directional configuration they form in the overall space."
To this end, SEMQ aims to represent relative geometry rather than an enumeration of independent floating-point magnitudes.
"That matters because semantic systems generally care about relationships, similarity, neighborhood, continuity, retrieval behavior, change over time, rather than only about preserving each raw numeric value in isolation," said Mac Allister. "The result is a portable representation of semantic state that can be reproduced, audited, compared and transferred across processes."
According to Mac Allister, initial validation tests have shown good results. The tests focused on converting the embedding-based semantic state into a deterministic .semq representation, restoring it, and evaluating the stability of retrieval and classification operations.
"For example, in one benchmark using the Banking77 dataset from MTEB and the all-MiniLM-L6-v2 embedding model, the FP32 baseline achieved 92.26 percent accuracy. SEMQ achieved 92.27 percent, effectively matching the FP32 baseline within 0.03 percentage points."
SEMQ thus did substantially better than 4-bit quantization, which registered 56.05 percent accuracy, roughly 36 percentage points below the FP32 baseline.
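To see why dropping to a few bits can cost so much precision, consider a naive uniform quantizer, sketched below in Python. This is an assumption-laden toy: it is neither the 4-bit scheme used in the benchmark nor SEMQ's encoding, and the 384-value vector is synthetic random data, sized to match the output dimension of all-MiniLM-L6-v2. The helper names are hypothetical.

```python
import random


def quantize_uniform(vec, bits):
    """Snap each value to one of 2**bits evenly spaced levels, then map back to floats."""
    levels = 2 ** bits - 1
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels
    return [lo + round((x - lo) / step) * step for x in vec]


def max_abs_error(a, b):
    """Largest per-component difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Synthetic stand-in for an embedding (fixed seed so runs are repeatable).
rng = random.Random(0)
embedding = [rng.gauss(0.0, 1.0) for _ in range(384)]

for bits in (8, 4, 2):
    restored = quantize_uniform(embedding, bits)
    print(f"{bits}-bit: max error per component = {max_abs_error(embedding, restored):.4f}")
```

Each component can be off by up to half the spacing between levels, and that spacing grows quickly as the bit count falls: 255 intervals at 8 bits, 15 at 4 bits, 3 at 2 bits. Production quantizers use per-block scaling and other techniques to limit the damage, so this sketch overstates what a well-tuned scheme would lose.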
"These are not claims that conventional quantization is universally ineffective but they show that, in this particular semantic classification setting, preserving the relevant semantic structure is materially different from simply reducing numerical precision," said Mac Allister.
SEMQ can be applied at the point of data ingestion, where organizations use the SDK to encode the vectors their embedding model generates from their documents as a .semq artifact, or at query time, to load, query, compare, restore, and verify that encoding.
"That means a team can adopt SEMQ without replacing its LLM, embedding model, vector database or agent framework," said Mac Allister. "It can initially run alongside the existing stack as a sidecar layer, then become the representation used for selected retrieval or memory workloads."
Potential use cases, he said, include making embeddings or memory state portable across systems; reproducing semantic state across different runs or machines; auditing model changes; reducing dependence on opaque or hard-to-reproduce stateful pipelines; and diffing semantic state.
He added that SEMQ can be extended to runtime cognitive state.
"In our research, .semq files have been used to snapshot and restore transformer KV-cache state across process boundaries," he said. "That is not a pre-training workflow either, but a runtime-state workflow for pausing, transferring and resuming an active model session."
Mac Allister isn't yet ready to talk about specific customers. He said his company is working through a Founding Design Partnership Program with organizations exploring applications in enterprise AI, retrieval, agent memory, and auditable AI workflows. This includes some AI infrastructure hyperscalers and some companies operating at the AI application layer.
"We signed NDAs with all of them, so I cannot name all of the organizations publicly yet," he said. "What I can say is that the interest has come from teams dealing with AI systems where reproducibility, state, lower infrastructure overhead, and the ability to inspect semantic behavior are operationally important. So this is a big problem for big companies." ®