Memories.ai, a spinout from Meta’s Reality Labs, is racing to give machines something they’ve long lacked: a durable, human‑like visual memory that can follow them through the physical world. Backed by an $8 million seed round and a high‑profile collaboration with Nvidia, the startup is positioning its “visual memory layer” as core infrastructure for the next generation of AI wearables and robots.
From Meta’s labs to a new AI layer
Memories.ai was founded by former Meta Reality Labs researcher Dr. Shawn Shen and co‑founder Enmin (Ben) Zhou, both veterans of AR/VR and computer vision work inside big tech. Their experience convinced them that while AI had become excellent at working with text and short clips, it still behaved like it had “goldfish memory” when confronted with long, continuous streams of real‑world video.
Shen has summed up the problem in a line that has since become the company’s thesis: “AI is already doing really well in the digital world. What about the physical world? AI wearables, robotics need memories as well… Ultimately, you need AI to have visual memories. We believe in that future.” That sense that memory is a missing layer, not a nice-to-have feature, pushed him out of a research role and into building a focused company.
Zhou has been equally blunt about the technical gap they are trying to close. “Today’s AI has goldfish memory. Even the best video LLMs max out at about an hour,” he said. “Memories.ai has an unlimited context window for video understanding.” In practical terms, they want AI systems to look back across hours, days or weeks of footage as naturally as a person recalling a familiar place or routine.
Inside the large visual memory model
At the core of the company’s approach is what it calls a Large Visual Memory Model, or LVMM. Instead of treating video as a series of isolated clips, the LVMM ingests continuous visual streams from cameras on glasses, robots or fixed installations and turns them into an evolving, searchable record of experience.
Zhou has described this system as “the long‑term visual memory of an AI, a deeply compressed, indexed, and retrievable store of visual experience, much like how we recall moments from the past.” The model is designed so that applications and agents can ask their own memories questions, such as: When did this object last appear, and where? How did this room or aisle look yesterday or last week? What happened in the minutes leading up to a specific event?
That requires more than simply embedding frames into a vector database. The LVMM is built around continuity: how scenes change, how context builds, and how the timeline itself can be reconstructed on demand. The company argues that this structure allows not just fuzzy similarity search, but “pixel‑level” recall of specific moments, along with the surrounding context.
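Memories.ai has not published the internals of the LVMM, but the distinction it draws can be sketched in a few lines. The toy `TimelineMemory` below (all names are hypothetical, and the embeddings are assumed to be unit-normalized so a dot product works as similarity) stores frame embeddings with timestamps and a pointer back to the raw footage, so a similarity hit can be expanded into the surrounding window of events rather than returned as an isolated frame:

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Moment:
    timestamp: float       # seconds since the stream started
    embedding: np.ndarray  # unit-normalized frame feature vector
    frame_index: int       # pointer back into the raw footage

@dataclass
class TimelineMemory:
    """Toy timeline-aware store: similarity search plus temporal context."""
    moments: list = field(default_factory=list)

    def ingest(self, timestamp: float, embedding: np.ndarray, frame_index: int) -> None:
        self.moments.append(Moment(timestamp, embedding, frame_index))

    def recall(self, query: np.ndarray, context_seconds: float = 60.0) -> list:
        """Return the best match *and* the moments around it, so the caller
        sees the chain of events rather than one disconnected snapshot."""
        if not self.moments:
            return []
        sims = [float(query @ m.embedding) for m in self.moments]
        best = self.moments[int(np.argmax(sims))]
        lo, hi = best.timestamp - context_seconds, best.timestamp + context_seconds
        return [m for m in self.moments if lo <= m.timestamp <= hi]
```

A production system would compress and index these moments rather than keep them in a flat list, but the `frame_index` pointer is the part that matters here: every recalled moment stays traceable to raw video.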
Partnering with Nvidia for “physical AI”
Memories.ai’s latest move is a collaboration with Nvidia, tapping its GPUs and AI software stacks to scale the visual memory layer across wearables and robotics. The startup is integrating Nvidia’s vision and video technologies, including advanced vision‑language and video analytics tools, directly into its memory pipeline.
The aim is straightforward: let AI glasses, home robots, industrial machines and other “physical AI” devices record and recall visual experiences at scale without each team having to build its own memory system. Nvidia’s infrastructure handles the heavy video processing workload; Memories.ai’s LVMM sits on top as the long‑term memory and retrieval layer.
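Neither side’s real interfaces are public, but the division of labor might be wired up roughly like this: a perception stage (the part that would run on Nvidia’s vision stack; the crude mean-pooling placeholder below stands in for a real model) turns raw frames into embeddings, and the memory layer, reusing the `TimelineMemory` sketch above, only ever sees its output:

```python
import numpy as np

def perceive(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a GPU-accelerated vision model (the Nvidia layer):
    turns a raw frame into a unit-normalized embedding."""
    v = frame.astype(np.float32).mean(axis=(0, 1))  # crude placeholder features
    return v / (np.linalg.norm(v) + 1e-9)

# The memory layer (TimelineMemory from the earlier sketch) sits on top and
# never touches raw pixels during ingest, only the perception output.
memory = TimelineMemory()
for i, frame in enumerate(np.random.randint(0, 255, size=(100, 48, 64, 3))):
    memory.ingest(timestamp=i / 30.0, embedding=perceive(frame), frame_index=i)
```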
The timing aligns with a broader industry push into physical AI, from smart glasses to humanoid robots. Many of these devices can see the world in real time, but forget almost everything they’ve seen a few minutes later. Memories.ai is betting that any serious device operating in the real world will eventually need a durable, neutral memory layer, and it wants to become the default choice inside that stack.
LUCI and the data challenge
Long‑term visual memory cannot exist without long‑term visual data. Early on, the team discovered that off‑the‑shelf cameras and recorders were not giving them the kind of footage they needed: battery life, reliability and control over capture conditions all fell short. In response, Memories.ai built its own in‑house capture device, called LUCI.
LUCI is not designed to turn the company into a hardware brand. Instead, it is used by internal data collectors to gather realistic training video in warehouses, offices, homes and other everyday spaces. By owning the capture process, the team can expose its LVMM to a wide range of environments and scenarios, which is crucial if the memory layer is going to underpin safety‑critical applications in logistics, manufacturing and consumer wearables.
The existence of LUCI also highlights a strategic balance: Memories.ai wants to remain an infrastructure provider, not a gadget company, yet it still needs high‑quality, long‑form video to build and stress‑test its models. Custom capture hardware is the compromise that lets it do both.
From lost keys to safer warehouses
The use cases that Memories.ai and its backers highlight range from mundane to mission‑critical. On the consumer side, AI‑enabled glasses with access to a visual memory layer could help users answer everyday questions: Where did I leave my keys? Which door did I use last time? Where exactly did I park? Instead of a chatbot that happens to have a camera, the assistant would feel like it actually remembers what it has seen during the day.
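None of these consumer features exist as a public API, but the shape of the query is easy to imagine. In the hypothetical sightings log below (the labels, locations and times are invented), “where did I leave my keys?” reduces to a most-recent-match lookup:

```python
from datetime import datetime

# Hypothetical log a memory layer might maintain: (when, what, where).
sightings = [
    (datetime(2025, 5, 1, 8, 2),  "keys", "kitchen counter"),
    (datetime(2025, 5, 1, 8, 45), "keys", "hallway table"),
    (datetime(2025, 5, 1, 9, 10), "car",  "garage level 2, bay 14"),
]

def last_seen(label: str):
    """Answer 'where did I leave X?' with the most recent sighting of X."""
    matches = [s for s in sightings if s[1] == label]
    return max(matches, key=lambda s: s[0]) if matches else None

when, _, where = last_seen("keys")
print(f"Keys last seen on the {where} at {when:%H:%M}.")  # hallway table, 08:45
```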
In industrial and enterprise environments, the stakes are much higher. A warehouse robot with visual memory can learn traffic patterns across multiple shifts, study near‑miss incidents, and detect subtle layout changes that might create new safety risks. Vision‑based safety systems could reconstruct the sequence of events leading up to an accident, not just the final moment, and use that understanding to redesign processes.
Co‑founder and academic Markandey Sharma has argued that giving operators this kind of deeper context “could genuinely transform warehouse safety and productivity.” For investors, those grounded use cases are a response to critics who say the AI sector sometimes chases abstract benchmarks instead of real‑world impact.
Beyond buzzwords: more than “vector DB memory”
“Memory” has become a fashionable term in AI, often referring to storing chat histories or embedding text in a vector database. Many companies claim to have “solved memory” with these approaches, and they work reasonably well for documents or short interactions. But continuous video is different.
Memories.ai is deliberately distancing its LVMM from that pattern. Rather than handling video as disconnected snapshots, the model is built around timelines: sequences of scenes, evolving environments and the accumulation of context over long periods of time. The company says this structure allows it to retrieve not only a matching frame but the full chain of events around it and to do so in a way that can be verified against the raw footage.
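The company has not described its retrieval format, but the property it claims, every answer traceable to raw footage, implies something like the audit record below. The structure and the warehouse events are hypothetical; the point is that a recalled chain comes back in order, with frame indices an operator can check against the original video:

```python
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float   # seconds into the stream
    frame_index: int   # exact location in the raw footage
    description: str   # what the model believed it saw

def audit_trail(chain: list[Event]) -> str:
    """Render a recalled chain of events so an operator can verify each
    step against the raw video, not just trust a text summary."""
    lines = [f"t={e.timestamp:8.1f}s  frame={e.frame_index:6d}  {e.description}"
             for e in sorted(chain, key=lambda e: e.timestamp)]
    return "\n".join(lines)

print(audit_trail([
    Event(3612.4, 108372, "forklift enters aisle 7"),
    Event(3619.9, 108597, "pallet shifts on rack"),
    Event(3624.1, 108723, "worker steps into aisle 7"),
]))
```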
In safety‑critical contexts, that verifiability is a key differentiator. When robots misbehave or incidents occur, operators need to see exactly what the system saw and how it interpreted events over time. A visual memory layer grounded in actual video, rather than loosely associated text embeddings, provides a stronger basis for auditing and trust.
Funding, competition and what comes next
Memories.ai’s $8 million seed round brought in Susa Ventures, Samsung Next, Crane Venture Partners, Fusion Fund, Seedcamp and Creator Ventures, with law firm Wilson Sonsini advising on the deal. The capital is being used to deepen research on the LVMM, scale infrastructure and expand integrations so that developers can access long‑term visual memory via APIs and tools instead of building their own memory stacks.
The company is entering an increasingly crowded space. Big tech firms, research labs and startups are all exploring forms of long‑term context and retrieval for AI systems. What distinguishes Memories.ai, at least in its current positioning, is its narrow focus on long‑duration visual memory and its independence from any specific hardware platform. The Nvidia partnership further anchors it inside a powerful ecosystem of robotics, autonomous systems and AI hardware.
At the same time, success will bring hard questions. Continuous visual memory touches on privacy, consent and surveillance concerns, especially in workplaces and public spaces. The same capability that can make warehouses safer or assistants more useful can also enable intrusive monitoring if deployed without safeguards. As regulators move from text‑based AI to physical, sensor‑rich systems, persistent visual memory is likely to be scrutinized closely.
For now, Memories.ai remains an infrastructure company with an outsized ambition. Its founders have been clear about their bet: “We believe visual memory is as fundamental as language models.” If that proves accurate, the visual memory layer they are building for wearables and robotics may eventually sit, quietly and invisibly, beneath a wide range of AI products, shaping how machines see, remember and act in the real world, even if most users never know its name.