
# AIUGC Layer

AI-generated video has crossed the threshold from novelty to mainstream production tooling. Industry data suggests that AI-generated short-form content now accounts for approximately 40% of new online video, **with output quality increasingly indistinguishable from human-edited media.**

Evidence of this shift is already visible at scale: remix-style content built around culturally resonant brands has repeatedly spawned hundreds of derivative videos that reach millions of views on platforms such as TikTok, with strong hook rates and high completion ratios.

Viral City abstracts this capability into a structured, asset-bound generation pipeline.

<figure><img src="/files/kTkO4wJH1mlfAv6VBfJG" alt=""><figcaption></figcaption></figure>

***

### **Template-Driven Generation Architecture**

For every on-chain asset, Viral City maintains a curated library of generative templates: parameterized content blueprints that encode proven short-form structures (e.g., hook-first origin clip, meme remix, character POV monologue, narrative recap, reveal trailer, duet response).

Each template encapsulates platform-native pacing, aspect ratio, caption cadence, and tonal direction as generation constraints, ensuring outputs conform to the structural patterns empirically correlated with high retention and shareability.
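
To make "generation constraints" concrete, the sketch below models a template as a small set of checkable rules (aspect ratio, length, hook timing, caption cadence, tone) and reports which ones a generated clip violates. The schema, field names, and the `MEME_REMIX` values are hypothetical illustrations, not Viral City's actual template format.

```python
from dataclasses import dataclass

# Hypothetical template schema: field names and values are illustrative only.
@dataclass(frozen=True)
class GenerationTemplate:
    name: str
    aspect_ratio: tuple[int, int]  # width:height, e.g. (9, 16) for vertical video
    max_duration_s: float          # platform-native length cap
    hook_window_s: float           # the hook must land within this many seconds
    captions_per_10s: int          # caption cadence
    tone: str                      # tonal direction handed to the generator

@dataclass
class ClipSpec:
    width: int
    height: int
    duration_s: float
    hook_at_s: float
    caption_count: int

def violations(t: GenerationTemplate, clip: ClipSpec) -> list[str]:
    """Return the template constraints a generated clip breaks (empty list = conforms)."""
    problems = []
    w, h = t.aspect_ratio
    if clip.width * h != clip.height * w:  # cross-multiply to compare ratios exactly
        problems.append("aspect_ratio")
    if clip.duration_s > t.max_duration_s:
        problems.append("duration")
    if clip.hook_at_s > t.hook_window_s:
        problems.append("hook")
    expected = t.captions_per_10s * clip.duration_s / 10
    if abs(clip.caption_count - expected) > 1:  # allow one caption of slack
        problems.append("caption_cadence")
    return problems

MEME_REMIX = GenerationTemplate("meme_remix", (9, 16), 30.0, 1.5, 4, "punchy")
```

For example, a 1080x1920, 20-second clip with its hook at 1.0 s and 8 captions passes every check against `MEME_REMIX`. Treating constraints as data rather than prose is one way to let the same checker gate output from any template.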

#### **Creation follows an intent-driven flow:**

{% stepper %}
{% step %}
A user selects an on-chain asset.
{% endstep %}

{% step %}
The user selects a template or describes intent in natural language, specifying the joke, the scene, the angle, or the call-to-action.
{% endstep %}

{% step %}
The pipeline produces a platform-ready video that is structurally optimized, tonally on-brand, and semantically bound to the selected asset.
{% endstep %}
{% endstepper %}
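
The three steps above can be sketched as a function that turns a user's choices into a request bound to one asset. This is a minimal illustration under stated assumptions: the template catalog, the keyword matcher (a naive stand-in for whatever intent parser the pipeline actually uses), and the `GenerationRequest` shape are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalog: template id -> keywords that suggest it.
TEMPLATE_KEYWORDS = {
    "meme_remix": {"meme", "joke", "remix", "funny"},
    "reveal_trailer": {"reveal", "trailer", "launch", "teaser"},
    "narrative_recap": {"recap", "story", "summary", "lore"},
}

@dataclass(frozen=True)
class GenerationRequest:
    asset_id: str     # the on-chain asset the output is bound to
    template_id: str
    intent: str       # free-text direction forwarded to the generator

def pick_template(intent: str) -> str:
    """Naive keyword-overlap matcher standing in for a real intent parser."""
    words = set(re.findall(r"[a-z]+", intent.lower()))
    scores = {tid: len(words & kws) for tid, kws in TEMPLATE_KEYWORDS.items()}
    best = max(scores, key=lambda tid: scores[tid])
    if scores[best] == 0:
        raise ValueError("intent matches no template; ask the user to pick one")
    return best

def build_request(asset_id: str, template_id: Optional[str] = None, intent: str = "") -> GenerationRequest:
    # Step 1: an asset must be selected.
    if not asset_id:
        raise ValueError("select an on-chain asset first")
    # Step 2: an explicit template wins; otherwise infer one from the intent.
    if template_id is None:
        template_id = pick_template(intent)
    elif template_id not in TEMPLATE_KEYWORDS:
        raise ValueError(f"unknown template: {template_id}")
    # Step 3: hand off a request that stays bound to the selected asset.
    return GenerationRequest(asset_id, template_id, intent)
```

Calling `build_request("0xabc:42", intent="A funny joke about gas fees")` resolves to the `meme_remix` template, while passing `template_id="reveal_trailer"` skips inference entirely.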

***

### **Identity-Consistent Generation via Latent Anchoring**

A core technical challenge in AI-generated brand content is *character drift*.

{% hint style="info" %}
**Character Drift:** The tendency of generative models to produce visually or tonally inconsistent representations of the same subject across outputs.
{% endhint %}
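
One common way to quantify drift, offered here as an assumption rather than Viral City's documented metric, is to embed each output with an image encoder and measure how far the embeddings spread apart. The sketch below scores drift as one minus the mean pairwise cosine similarity: 0 means every output maps to the same direction, and higher values mean less consistent outputs. It assumes non-zero embedding vectors.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def drift_score(embeddings):
    """1 - mean pairwise cosine similarity across outputs (0 = perfectly consistent)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0  # a single output cannot drift from itself
    mean_sim = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim
```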

Viral City addresses this through a multi-layered identity conditioning stack. Each on-chain asset is associated with a canonical identity embedding: a composite latent representation derived from reference imagery, style descriptors, and brand-defined attribute constraints.

During generation, this embedding is injected via cross-attention conditioning, anchoring the diffusion process to the asset's **visual and narrative identity.**
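
The core of cross-attention conditioning is that queries come from the latent being denoised while keys and values come from the conditioning signal, so every latent token is pulled toward a weighted mix of identity features. The single-head, plain-Python sketch below shows only that mechanism; the matrix shapes are toy values, and production diffusion models add multiple heads, output projections, and residual connections that are omitted here.

```python
import math

def matmul(A, B):
    """Plain-list matrix product: (n x k) @ (k x m) -> n x m."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(latents, identity_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: latent tokens attend to identity tokens.

    latents:         n_latent x d_model (queries: the image being denoised)
    identity_tokens: n_id x d_ctx       (keys/values: the identity embedding)
    Wq: d_model x d; Wk, Wv: d_ctx x d
    """
    Q = matmul(latents, Wq)
    K = matmul(identity_tokens, Wk)
    V = matmul(identity_tokens, Wv)
    scale = math.sqrt(len(Q[0]))
    # Scaled dot-product scores: one row per latent token, one column per identity token.
    scores = [[sum(q * k for q, k in zip(qrow, krow)) / scale for krow in K] for qrow in Q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)
```

With a single identity token the softmax weight is exactly 1, so every latent token receives that token's value vector; with several tokens each latent token blends them according to its similarity scores.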

<figure><img src="/files/9TD9GE1SGSdXLbW4NhJy" alt=""><figcaption></figcaption></figure>

Supplementary LoRA modules, fine-tuned per asset or asset class, enforce stylistic coherence across output variants, ensuring that whether a character appears in a meme remix or a cinematic trailer, it remains recognizably and verifiably ***the same entity***.
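
LoRA (low-rank adaptation) keeps a base weight matrix frozen and learns only a small rank-`r` update, which is what makes a separate adapter per asset or asset class cheap to store and swap. The toy layer below shows the standard formulation in row-vector convention, `x @ W + (alpha / r) * x @ A @ B`; the class name and plain-list math are illustrative, and real adapters wrap attention and projection layers inside the diffusion model.

```python
def matmul(A, B):
    """Plain-list matrix product: (n x k) @ (k x m) -> n x m."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, s):
    return [[a * s for a in row] for row in A]

class LoRALinear:
    """Frozen base weight W (d_in x d_out) plus a trainable low-rank update A @ B.

    A is d_in x r and B is r x d_out, so the update has rank at most r.
    """
    def __init__(self, W, A, B, alpha):
        self.W, self.A, self.B = W, A, B
        self.r = len(B)
        self.alpha = alpha

    def forward(self, x):  # x: n x d_in
        base = matmul(x, self.W)                      # frozen path, shared by all assets
        delta = matmul(matmul(x, self.A), self.B)     # per-asset low-rank path
        return add(base, scale(delta, self.alpha / self.r))
```

Because `B` is conventionally initialized to zeros, a fresh adapter reproduces the base model exactly; training then moves only `A` and `B`, so switching assets means swapping two small matrices rather than the whole model.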

***

### **Voice Layer**

Visual consistency alone is insufficient for full character fidelity; voice is the other half of identity.

Viral City integrates **ElevenLabs** as the voice synthesis backbone, giving users access to an industry-leading **TTS engine** with **extensive customization capabilities**.

This integration ensures that voice output is production-grade out of the box while remaining highly flexible: the same asset can speak in a punchy, high-energy register for a meme remix and shift to a calm, narrative tone for a recap, **all while retaining a consistent and recognizable vocal identity**.
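
One plausible way to get "different register, same voice" is to keep the asset's `voice_id` fixed and vary only the voice settings per template. The sketch below builds, but does not send, a request to ElevenLabs' text-to-speech REST endpoint; the endpoint path, `xi-api-key` header, and body fields follow ElevenLabs' public API as we understand it and should be checked against the current API reference. The per-register preset values are hypothetical.

```python
import json
import urllib.request

# Hypothetical presets: only the field names follow ElevenLabs' voice_settings object.
REGISTER_PRESETS = {
    "meme_remix": {"stability": 0.3, "similarity_boost": 0.85, "style": 0.7, "use_speaker_boost": True},
    "narrative_recap": {"stability": 0.75, "similarity_boost": 0.85, "style": 0.2, "use_speaker_boost": True},
}

def build_tts_request(api_key, voice_id, text, register, model_id="eleven_multilingual_v2"):
    """Build a TTS request; the asset's voice_id stays fixed while the register changes."""
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": REGISTER_PRESETS[register],
    }
    return urllib.request.Request(
        url=f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json", "Accept": "audio/mpeg"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req).read()` would return the audio bytes. Lower stability with higher style is a reasonable starting point for an energetic read, while the higher-stability preset suits a steady recap voice.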

***

### **Virality Optimization Layer**

Beyond visual and auditory fidelity, the pipeline integrates a retention-aware generation strategy.

Templates are not static: they are continuously refined using engagement signals (view-through rate, share ratio, remix frequency) aggregated across the platform.
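
As an illustration of how engagement feedback could steer templates, the sketch below folds each clip's normalized signals into a weighted score and keeps an exponential moving average per template, so recent performance counts more than old results. The weights, decay, and function names are assumptions chosen for clarity; a production system might instead use a bandit algorithm to balance exploring new templates against exploiting proven ones.

```python
from dataclasses import dataclass

# Hypothetical weights for normalized (0..1) engagement signals.
WEIGHTS = {"view_through": 0.5, "share_ratio": 0.3, "remix_rate": 0.2}

@dataclass
class TemplateStats:
    score: float = 0.0
    n: int = 0  # number of clips observed for this template

def engagement(signals):
    """Weighted sum of one clip's engagement signals."""
    return sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)

def update(stats, signals, decay=0.9):
    """Exponential moving average: recent clips count more than older ones."""
    s = engagement(signals)
    stats.score = s if stats.n == 0 else decay * stats.score + (1 - decay) * s
    stats.n += 1
    return stats.score

def ranked(table):
    """Template ids ordered from best to worst current score."""
    return sorted(table, key=lambda tid: table[tid].score, reverse=True)
```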

As a result, any user, regardless of editing skill or creative background, can generate studio-grade short-form video that is visually and vocally identity-consistent, algorithmically competitive, and directly tied to an on-chain asset, all in a single interaction.