Mansi Dhanania – Building 21

Beyond Echo: Is Generative AI Capable of Novelty?

Does an idea need to be "original" to be "new", or is novelty simply a byproduct of a broken expectation? Beyond Echo explores what it would mean for an AI system to generate ideas that are experienced as genuinely new, rather than predictable recombinations of existing patterns. The project questions whether novelty is an objective property of ideas or a perceptual one, shaped by what an observer (human or machine) already knows. Drawing on the Information Gap theory of curiosity, I investigate how expectation, surprise and unanswered questions drive human ideation, and whether similar dynamics can be meaningfully modeled in artificial systems. My project combines conceptual inquiry with experimental probes of generative and representational models, examining how ideas occupy and move through semantic embedding spaces, asking whether novelty can be induced by altering how models explore their own representational structures. Rather than treating creativity as a fixed capability, Beyond Echo treats it as an emergent property shaped by architecture, representation and interaction. The project aims to surface both the possibilities and the limits of current AI systems, and to reflect on what it would mean if machines could produce ideas that are indistinguishable from human insight. Would human creativity be compromised if ideation could be outsourced?

Where It Started

The question first took shape during my internship at A*STAR Singapore, where I studied how large language models represent compound words. My team and I tested whether models truly understood that 'black box' is not only a colored box but an abstract concept: something unknown. We built transformation vectors to capture the relational shift. Smaller models like BERT struggled; larger ones showed consistency in how they merged and transformed representations.

Publication: Nagar, Aishik, et al. "How do Transformer Embeddings Represent Compositions? A Functional Analysis." arXiv:2506.00914 (2025)

That result left me with a question I couldn't let go of: if a model can internalize subtle semantic transformations, could it also generate ideas that are genuinely new, not just statistically unlikely recombinations of its training data?

This question deepened through my master's thesis at McGill, where I build assistive AI systems for blind users navigating complex environments like grocery stores. My system can detect objects, guide users, describe items, and respond to spoken queries. And yet: it reacts, but does not reason. I kept imagining what it would look like if it could anticipate and warn a blind user about a protruding shelf edge, suggest a safer route around an obstacle, not because those examples existed in training data, but because it had developed an understanding of what 'risk' means.
‍

Defining Novelty

Before I could measure anything, I had to figure out what 'novelty' even means. I looked into existing literature on computational creativity, cognitive science, and Margaret Boden’s framework distinguishing three types of creativity:

Combinational: new mixtures of existing concepts
Exploratory: new variations within a known structure or rule set
Transformational: concepts that redefine the space of possibilities itself

But harder questions kept surfacing. Is novelty different from creativity? Does it require consciousness, lived experience, or intent? Can something be novel if it is logically flawed? If an AI generates something that seems novel once but never again, did it achieve anything? These became the spine of the project.

"True novelty, when it arrives, always comes unannounced and unlooked-for. Curiosity plays no part in it."

"Undirected curiosity is meaningless. To know that you don't know something, you must know enough about it to recognize what your gaps look like, and what would fill them."

‍

The Technical Experiments

The project had two main tracks: a mechanistic interpretability study of how a reasoning model internally represents creative versus logical tasks, and an LLM-guided reinforcement learning loop designed to push a model toward genuinely novel outputs.
‍

Track 1: What's Actually Happening Inside a Model?

I ran a mechanistic interpretability analysis on DeepSeek-R1-Distill-1.5B, a reasoning model specifically trained to think before answering. The question: does a model built for reasoning develop a distinct internal representation for creative tasks?

I gave the model three categories of prompts: creative (open-ended, divergent), logical (step-by-step convergent reasoning), and factual (retrieval-like, low reasoning demand). I then extracted the hidden states from every layer and used PCA, Centered Kernel Alignment similarity, probing classifiers, and attention entropy to see how the model internally categorises these task types. The conclusion, as expected: Creativity is not a cognitive mode.
‍

Track 2: The LLM + RL Feedback Loop

The second track asked a different question: instead of probing what's inside a model, can we push one toward genuinely novel generation by forcing it to interact with an environment it doesn't understand?

I built four agents and challenged them to balance a pole in a broken physics world (environments where the physics had been deliberately altered to violate intuition). World 2 had negative gravity (the pole falls upward). World 3 had zero gravity and high damping (no momentum, everything sluggish). World 4 had a 3-frame action delay, breaking causality.

Agent 1 (Silent Instinct): Pure DQN. No language model. The baseline.
Agent 2 (Rule Follower): One-shot LLM. Generates a single hypothesis and reward function, then trains without revision. Clever once, but cannot learn from failure.
Agent 3 (Scientist): Full LLM-RL loop. Observes episodes, forms a hypothesis, writes a reward function, trains for 10,000 timesteps, sees the failure diagnostics, and revises. This loop repeats.
Agent 4 (Novelty Seeker): Agent 3 plus a novelty pressure mechanism: hypotheses too similar to previous ones are rejected outright. The model is forced to search genuinely different regions of idea space.

The loop in practice: the LLM observes broken episodes → forms a hypothesis (e.g., 'the pole rises, not falls') → writes a reward function → DQN trains → gets a performance score → sees the failure diagnostics → revises the hypothesis and tries again.

The finding: adding the iterative feedback loop (Agent 3) and especially the novelty pressure (Agent 4) produced hypotheses that occupied more distant regions of embedding space compared to earlier agents. The LLM-RL loop showed signs of moving toward genuinely novel generation in a way that simply probing static embeddings never captured. Giving the model an environment to fail in, and a chance to revise, seemed to do something that prompt engineering alone couldn't.

This raised a question I'm still sitting with:

Does giving models the capacity to act within an environment drive creative generation? Is this what makes humans creative? Not the architecture, but the embodiment? Can a three-dimensional world model induce creativity in an agent free to act within it?

On a deeper level, when we encounter something that seems creative, novel, or alive with meaning, and we can't distinguish whether it came from a human or a machine, what does that tell us about what we value, and why we value it?

That question doesn't have an answer yet. But I think it's one of the most important questions we'll face as AI becomes more capable, more fluent, and more present in the spaces where human creativity used to be unchallenged territory.

Full code and results: github.com/MansiDhanania/Novelty-in-LLM-Guided-RL
‍