How to Create Consistent AI Video (Part 2)
Creating consistent AI video can be tricky, but with the right methods, you can achieve stunning results.
Key Takeaways:
- How to install and use Tiled VAE to keep GPU memory usage manageable during AI video generation.
- Using multiple ControlNet models for enhanced video consistency.
- Creating larger keyframe grids for longer 9:16 (vertical) videos.
- Tips for fixing common issues like inconsistent facial features and flickering.
In this blog post, I’ll dive into Tokyojab’s advanced techniques for ensuring temporal consistency in AI videos. We’ll explore his latest hacks to help you transform your video with stable, flicker-free results. Whether you’re new to AI filmmaking or an experienced creator, these tips will help you take your video production to the next level.
Introduction to Consistent AI Video Creation
Tokyojab, a highly innovative AI video creator, has developed several hacks that ensure consistency across frames in AI-generated videos. Temporal consistency ensures that elements, like facial features, stay stable across frames, preventing flickering and jittering.
In Part 1, I introduced two clever hacks from Tokyojab that significantly improve AI video quality. In this follow-up, we’ll dive deeper into these methods, share additional insights, and explore new tools like Tiled VAE and ControlNet for more detailed and longer video projects.
Step-by-Step Guide
1. Installing Tiled VAE and Depth Map Extension
The first step in creating a consistent AI video is installing Tiled VAE, which splits the VAE encoding and decoding into tiles so that large grid images fit within your GPU's memory. It ships as part of the Multi-Diffusion extension, and here’s how to install it:
- Copy the GitHub link for the Tiled VAE extension.
- In Stable Diffusion, go to the Extensions tab and install from the URL.
- Follow the same steps to install the Depth Map extension, which will help create depth maps for use in ControlNet.
These tools are essential for anyone working with smaller GPUs, as they keep large grid generations from running out of VRAM.
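If you prefer the command line over the Extensions tab, the same result can be achieved by cloning each repository into the WebUI’s `extensions` folder. Here is a minimal sketch assuming a default install location; the repository URLs below are placeholders for the links mentioned in the video:

```python
# Minimal sketch: installing A1111 extensions from the command line instead of the
# Extensions tab. The repository URLs are placeholders; substitute the GitHub links
# mentioned in the video.
import subprocess
from pathlib import Path

webui_dir = Path("~/stable-diffusion-webui").expanduser()   # adjust to your install
extensions = {
    "multidiffusion-tiled-vae": "https://github.com/<author>/<tiled-vae-repo>",  # placeholder
    "depth-map": "https://github.com/<author>/<depth-map-repo>",                 # placeholder
}

for name, repo in extensions.items():
    target = webui_dir / "extensions" / name
    if target.exists():
        print(f"Already installed: {name}")
        continue
    # Cloning into the extensions folder is what "Install from URL" does under the hood
    subprocess.run(["git", "clone", repo, str(target)], check=True)
```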
2. Creating a Larger Grid
To make longer, 9:16 (vertical) videos, you’ll need to create a grid of 9 keyframes (or more, depending on your project). Here’s how:
- Export your video frames as an image sequence (e.g., 111 frames).
- Carefully pick the keyframes that showcase significant movement or facial changes.
- Use the same grid method as in Part 1 but scale it up to 9 images for larger grids.
- Download the grid once it’s processed and use it in your video editing software.
This technique allows for more flexibility when editing longer videos while maintaining temporal consistency.
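If you’d rather script the grid step than use an online sprite-sheet tool, here is a minimal sketch with Pillow. It assumes you’ve already exported your frames and copied your nine chosen keyframes into a `keyframes` folder; the folder name and the 3×3 layout are only illustrative:

```python
# Minimal sketch: pack nine hand-picked keyframes into a single 3x3 grid image
# so Stable Diffusion can process them in one pass. Paths and layout are examples.
from pathlib import Path
from PIL import Image

keyframes = sorted(Path("keyframes").glob("*.png"))[:9]  # your 9 chosen frames
cols, rows = 3, 3

first = Image.open(keyframes[0])
w, h = first.size
grid = Image.new("RGB", (w * cols, h * rows))

for i, path in enumerate(keyframes):
    frame = Image.open(path).convert("RGB")
    # Fill the grid left to right, top to bottom
    grid.paste(frame, ((i % cols) * w, (i // cols) * h))

grid.save("keyframe_grid.png")
```

After processing, the same arithmetic in reverse crops the grid back into individual frames for your editor or EBsynth.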
3. Using the Depth Map Extension
After creating the grid, use the Depth Map extension to enhance your results:
- In Stable Diffusion, go to the Depth tab.
- Select “Batch From Directory” and load your keyframe folder.
- Set the necessary parameters (GPU enabled) and generate the depth maps.
- These depth maps will help you create consistent, flicker-free backgrounds.
This process ensures that not only the foreground is consistent, but the background stays stable as well, avoiding any unwanted flicker.
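The extension handles all of this inside the WebUI, but if you’re curious what it is doing, or want to batch depth maps outside the UI, a rough stand-in uses the MiDaS depth model via `torch.hub`. This is not the extension’s own code, and the folder names are assumptions:

```python
# Rough stand-in for the Depth Map extension: batch depth estimation with MiDaS.
# Folder names are examples; the extension's own output will differ slightly.
from pathlib import Path
import numpy as np
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

out_dir = Path("depth_maps")
out_dir.mkdir(exist_ok=True)

for path in sorted(Path("keyframes").glob("*.png")):
    img = np.array(Image.open(path).convert("RGB"))
    with torch.no_grad():
        pred = midas(transform(img).to(device))
        # Resize the prediction back to the original frame size
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
        ).squeeze()
    depth = pred.cpu().numpy()
    # Normalize to 0-255 so the map can be saved as a grayscale image
    depth = (255 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)).astype(np.uint8)
    Image.fromarray(depth).save(out_dir / path.name)
```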
4. Installing and Using 3 ControlNet Units
For even more control over your AI video, use multiple ControlNet units. Here’s how to set it up:
- Enable three ControlNet units in Stable Diffusion: Lineart, Depth Map, and OpenPose.
- Adjust the settings for each, focusing on details like resolution and control weight.
- For close-up shots, you might need to disable some units if they don’t improve the results.
- Experiment with different settings to find the best balance for your video.
This method gives you the flexibility to manage complex movements and facial expressions with greater precision.
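If you drive the WebUI through its API (launch it with `--api`) instead of clicking through the browser, the three units become entries in the ControlNet `args` list. The sketch below is only a hedged example: the module and model names must match what’s installed on your system, whether each unit reuses the img2img input or needs its own control image depends on your ControlNet version, and the weights are just starting points:

```python
# Hedged sketch: sending the keyframe grid through img2img with three ControlNet
# units via the A1111 API. Model/module names and weights depend on your install.
import base64
import requests

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "init_images": [b64("keyframe_grid.png")],
    "prompt": "your prompt here",
    "denoising_strength": 0.4,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                # Unit 1: Lineart keeps edges locked to the source frames
                {"enabled": True, "module": "lineart_realistic",
                 "model": "control_v11p_sd15_lineart", "weight": 1.0},
                # Unit 2: Depth stabilises the background
                {"enabled": True, "module": "depth_midas",
                 "model": "control_v11f1p_sd15_depth", "weight": 0.8},
                # Unit 3: OpenPose anchors body position across the grid cells
                {"enabled": True, "module": "openpose_full",
                 "model": "control_v11p_sd15_openpose", "weight": 0.8},
            ]
        }
    },
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
resp.raise_for_status()
result = resp.json()["images"][0]  # base64-encoded output grid
```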
5. Creating and Refining Prompts
When generating video content with Stable Diffusion, creating precise prompts is key. For this tutorial, I used the new Deliberate model combined with a LoRA to enhance details. Here are some helpful prompt tips:
- For subject matter, use “sexy young woman” and add elements like “light glare on film” and “light reflected on film” to control lighting.
- Experiment with CFG scale settings and sampling steps to get the desired results.
- Use Hires. fix to upscale your images and add more detail to the final output.
By following these prompt techniques, you can ensure more consistent facial features and better overall detail.
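To make those knobs concrete, here is what they look like as txt2img API fields (Hires. fix is a txt2img feature; with img2img you would upscale separately). Every value is an illustrative starting point rather than a setting taken from the video, and the LoRA name is a placeholder:

```python
# Illustrative settings only: tune CFG, steps, and denoising for your own footage.
settings = {
    # Subject plus the lighting keywords, with a placeholder LoRA for extra detail
    "prompt": ("sexy young woman, light glare on film, light reflected on film, "
               "<lora:your_detail_lora:0.7>"),
    "negative_prompt": "blurry, deformed, watermark",  # example negatives
    "cfg_scale": 7,     # higher = stricter prompt adherence, lower = more variety
    "steps": 30,        # more sampling steps adds detail but slows generation
    "sampler_name": "DPM++ 2M Karras",
    # Hires. fix: generate at base resolution, then upscale with a second pass
    "enable_hr": True,
    "hr_scale": 2,
    "denoising_strength": 0.45,
}
```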
6. Solving Common Issues with Facial Features
One common problem with AI-generated videos is the inconsistency of facial features. I encountered this issue with teeth in my project, and after trying different techniques (like adjusting CFG scales and sampling steps), I found a workaround:
- Use a Polygon Tool in DaVinci Resolve to mask and blend the original and generated images.
This allows you to keep critical details, like the mouth or teeth, consistent throughout the video.
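The polygon mask in Resolve is a manual, per-shot fix, but the same idea can be scripted once your frames are exported: composite the original mouth region back over each generated frame using a static mask. The mask file and folder names below are assumptions:

```python
# Sketch of the same masking idea done in code: paste the original mouth/teeth
# region back over each generated frame. mouth_mask.png is white where the
# original should show through; all paths are examples, and frame sizes must match.
from pathlib import Path
from PIL import Image

mask = Image.open("mouth_mask.png").convert("L")  # white = keep original pixels

orig_dir = Path("original_frames")
gen_dir = Path("generated_frames")
out_dir = Path("blended_frames")
out_dir.mkdir(exist_ok=True)

for orig_path in sorted(orig_dir.glob("*.png")):
    original = Image.open(orig_path).convert("RGB")
    generated = Image.open(gen_dir / orig_path.name).convert("RGB")
    # Composite: original where the mask is white, generated everywhere else
    blended = Image.composite(original, generated, mask)
    blended.save(out_dir / orig_path.name)
```

In practice you would feather the mask edges (a slight blur on the mask image) so the blend is invisible, which is effectively what the soft edge on a Resolve mask gives you.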
Conclusion
By following these advanced techniques, you’ll be able to create consistent AI videos with stable backgrounds and flicker-free results. Whether you’re working with smaller GPUs or high-end hardware, these methods will help you produce professional-looking videos.
Call-to-Action
If you found this tutorial helpful, consider subscribing to my YouTube channel for more AI filmmaking tutorials. Let me know in the comments if you have any questions or thoughts!
Links Mentioned in the Video:
- Tokyojab Instagram: Instagram Profile
- Tokyojab Reddit Post: Reddit Link
- How to Install Stable Diffusion (Sebastian Kamph): YouTube Tutorial
- Creating the Grid and Cutting it: Sprite Sheet Packer
- Rundiffusion: Rundiffusion
- EBsynth Free Software: EBsynth