5th February 2024
Artificial Intelligence (AI) is evolving at an unprecedented speed and has already become an essential tool for many people - from the widely popular ChatGPT, with its countless applications across many industries, to text-to-image models like Midjourney and Stable Diffusion that can generate impressive images from a simple text prompt.
In this article, I explore how these AI technologies can assist artists and improve visualisation workflows. I then speculate about what the future might hold for AI in architectural visualisation and whether artists should be excited or concerned.
Before we jump into our examples, let's take a quick look at the main tools that we use.
Stable Diffusion is an AI model that generates images based on a text prompt. It can be accessed in different ways but we recommend downloading it locally and using the WebUI (Automatic1111). It is entirely free to use but requires a reasonably powerful GPU.
Generated in Stable Diffusion by Curved Axis - "pink monkey with a donkey's head"
The images that Stable Diffusion generates depend on the model it uses. By default, Stable Diffusion uses its own base model, but you will get better results by downloading another popular realistic model - we use one called epiCRealism for the examples. The type of images a model produces depends on the images it was trained on: if a model's training data only includes watercolour art, it will be great at recreating that style but unable to produce realistic images. I would also recommend avoiding any models with "XL" in their name whilst you are getting started - these differ from standard models and will complicate things if you try to follow the steps in this article.
We also use a Stable Diffusion extension called ControlNet. ControlNet allows us to constrain the generated images based on elements of an input image, using different control methods. The two methods we use in this article are Canny (edge detection) and OpenPose (human pose).
We won’t cover the process of installing Stable Diffusion and ControlNet here as they are well-documented on the official pages and elsewhere on the web.
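If you prefer scripting to clicking, the WebUI also exposes an HTTP API when launched with the --api flag. Below is a minimal Python sketch of a text-to-image request - the endpoint and field names reflect the Automatic1111 API at the time of writing and may differ in your version, and the prompt and settings are only placeholders.

import base64
import requests

# Assumes the Automatic1111 WebUI is running locally with the --api flag
WEBUI_URL = "http://127.0.0.1:7860"

payload = {
    "prompt": "photo of a modern living room, soft daylight, photorealistic",
    "negative_prompt": "cartoon, painting, blurry",
    "steps": 25,
    "width": 768,
    "height": 512,
    "cfg_scale": 7,
}

# txt2img returns a JSON object containing a list of base64-encoded images
response = requests.post(f"{WEBUI_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
response.raise_for_status()

for i, image_b64 in enumerate(response.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))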
Midjourney is an online text-to-image model that is accessed through Discord. It generates impressive images, is simple to use and doesn’t require any specific hardware, as all the processing is done on their servers. However, it is a paid service (starting at $10/month) and lacks many of the features available in Stable Diffusion.
Generated in Midjourney by Curved Axis.
AI can struggle with understanding context and using common sense. We see it when looking at the details of AI images, particularly staircases - they can range from unusual or impractical to impossible, as the images above demonstrate!
Photoshop’s Generative Fill is a text-to-image AI tool that is free and easily accessible to anyone already using the software. It is great for cleaning up and extending photos but tends to fall short when creating images from scratch or generating people.
Runway is an online tool that can create videos from images or text prompts. It has a free version which limits the output resolution, adds a watermark and allows only a limited number of generations, but it also offers a paid subscription (from $15/month) that includes upscaling, watermark removal and more generations that refresh each month.
Topaz Photo AI and Video AI are two separate apps developed by the same team, Topaz Labs, and cost $199 and $299 respectively. With Photo AI you can upscale, enhance and denoise your photos; Video AI offers these features as well as frame interpolation and stabilisation.
Although we won't make much use of it here, ChatGPT deserves a quick mention. It is an AI chatbot that can do many things, including writing emails, researching a topic or generating MAXScript for you, as we explore later. It is accessible online and free to use, with an option to upgrade for $20/month for some additional features.
Now we have covered the main tools, here are 9 practical uses for AI to improve your architectural visualisation work!
Adding a person, or multiple people, to a render often improves it. It creates interest, demonstrates scale and brings life to a visualisation. People are typically added in one of two ways: by using 3D models of people in the scene or by compositing 2D cutouts into the render in Photoshop.
Both methods are suitable depending on the scenario; however, they have their downsides. It can be very difficult to find 2D cutout people that match the scene, lighting and perspective, while on the other hand the realism of 3D people is often lacking, especially if they are near the camera.
We find the best solution to avoid these problems is to use 3D people and improve them in post-production with AI. This way we get the correct lighting, shadows, perspective and scale from the 3D model, with the added realism that AI can give us.
Side-by-side comparison of raw render (left) and Stable Diffusion outputs with two different prompts (middle and right)
Let’s look at how we can achieve the results from the image above with Stable Diffusion and ControlNet. The completed render is also included below.
Render with improved people from Stable Diffusion
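The workflow above runs through the WebUI's inpaint tab, but the same idea can be scripted. Here is a hedged Python sketch of an img2img inpaint request against the Automatic1111 API (assuming it is running with --api). The render, the mask painted over the 3D people and the prompt are placeholders, and field names may vary slightly between WebUI versions; a ControlNet OpenPose unit can be attached through the extension's alwayson_scripts field, as sketched later for Canny.

import base64
import requests

WEBUI_URL = "http://127.0.0.1:7860"  # Automatic1111 WebUI launched with --api

def encode(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    # The raw render and a white-on-black mask covering the 3D people
    "init_images": [encode("render.png")],
    "mask": encode("people_mask.png"),
    "prompt": "a woman in a grey coat walking, photorealistic, natural light",
    "negative_prompt": "cartoon, deformed, blurry",
    "denoising_strength": 0.45,  # low enough to keep the pose, lighting and scale
    "steps": 30,
    "width": 768,
    "height": 512,
}

response = requests.post(f"{WEBUI_URL}/sdapi/v1/img2img", json=payload, timeout=300)
response.raise_for_status()

with open("render_people_improved.png", "wb") as f:
    f.write(base64.b64decode(response.json()["images"][0]))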
Generated in Midjourney and upscaled in Stable Diffusion by Curved Axis
I imagine you are sick of hearing people talk about how AI can produce photorealistic images - I know that I am. Fortunately for artists, AI image generation currently has some fundamental issues that I don't see being resolved anytime soon. The main problem is the lack of control over the output: it is not possible to input drawings, materials or other information, so we cannot generate images that match a specification. It also cannot produce consistent results with controlled changes - for example, adjusting the camera position or changing the lighting whilst retaining the existing elements.
AI apps for Revit and SketchUp, such as VERAS and ArkoAI, allow you to regain some control, as they generate images based on a 3D model and the viewport preview. Although they are both impressive apps, they require paid subscriptions, and we can achieve similar or better results with a clay render, Stable Diffusion and ControlNet.
We do this by following the same process as the previous point, but using standard img2img instead of inpaint. Just upload a clay render, write a prompt, turn on ControlNet, then click generate.
Comparison of original clay render (left) and output from Stable Diffusion (right)
As you can see in the images above, this technique gives us much more control over the output; however, it can be difficult to adjust specific elements like the planting or materials. So we feel the best use for this is concept design, helping to explore ideas early in a project - you can test out many different looks and lighting scenarios in minutes just by adjusting the prompts, as you can see in the four images below.
Images generated by Stable Diffusion with different prompts to adjust the lighting and mood
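If you want to batch this exploration rather than clicking generate repeatedly, below is a hedged Python sketch that loops over a few lighting prompts and sends the same clay render to the Automatic1111 img2img API with a ControlNet Canny unit. It assumes the WebUI is running locally with the --api flag and the ControlNet extension installed; the ControlNet argument names and the model name are assumptions based on a recent version of the extension and may differ in yours, and the prompts and file names are only placeholders.

import base64
import requests

WEBUI_URL = "http://127.0.0.1:7860"

with open("clay_render.png", "rb") as f:
    clay_b64 = base64.b64encode(f.read()).decode()

prompts = [
    "modern house exterior, bright midday sun, clear sky, photorealistic",
    "modern house exterior, golden hour, warm light, photorealistic",
    "modern house exterior, overcast sky, soft diffuse light, photorealistic",
    "modern house exterior, dusk, interior lights glowing, photorealistic",
]

for i, prompt in enumerate(prompts):
    payload = {
        "init_images": [clay_b64],
        "prompt": prompt,
        "denoising_strength": 0.75,  # high enough to add materials and lighting
        "steps": 30,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "image": clay_b64,       # may be "input_image" in older versions
                    "module": "canny",       # edge detection keeps the geometry fixed
                    "model": "control_v11p_sd15_canny",  # assumed model name
                    "weight": 1.0,
                }]
            }
        },
    }
    r = requests.post(f"{WEBUI_URL}/sdapi/v1/img2img", json=payload, timeout=600)
    r.raise_for_status()
    with open(f"concept_{i}.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))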
Much like image generation, AI can also create impressive videos with just a text prompt, using an AI app called Runway. However, if we only use a prompt, we run into the same issue of lack of control. So we are going to use a render as our starting point by simply uploading it to Runway and clicking generate.
Videos generated in Runway based on renders by Curved Axis
As you can see in the video above, although the results aren't perfect, they are quite impressive!
Runway also recently introduced a motion brush tool and the ability to add a text description to image-to-video generations. While these tools should give us much more control, in practice we found they didn't improve our videos. We are certain that the output quality and level of control will continue to improve rapidly, and we expect to be able to generate high-quality cinemagraphs from renders soon enough.
3D-rendered animations require at least 24 rendered frames per second (FPS) of animation, so producing them can take a lot of rendering time and processing power. However, we can reduce this time with AI frame interpolation, making animations faster and less expensive to create. This can be especially useful for sharing early previews of an animation or for projects with time or budget constraints.
Motion interpolation has existed for many years, but the addition of AI takes it to a completely new level. AI algorithms analyse the footage, create a depth map and then use it to accurately predict what happens between the existing frames. This can increase the FPS of an animation by 2, 4, 8 or even 16 times, which means you only have to render 15 frames for a 10-second animation at 24 FPS (240 frames ÷ 16). You can then let AI fill in the missing pieces, taking only a fraction of the time to render!
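As a quick sanity check of that arithmetic, here is a tiny Python sketch - the numbers are just the article's example, not output from any interpolation tool:

def frames_to_render(duration_s, target_fps, interpolation_factor):
    # How many frames you actually need to render before AI interpolation
    total_frames = duration_s * target_fps
    return total_frames / interpolation_factor

# 10-second animation at 24 FPS with 16x interpolation
print(frames_to_render(10, 24, 16))  # 15.0 rendered frames instead of 240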
The most comprehensive software for this is Topaz Video AI, however, free apps Flowframes and DAIN-App can also achieve great results once you get past the basic UI.
Comparison of the ~4 fps original animation and the interpolated 30 fps animation
The quality of the output is dependent on the type of motion in the scene. It handles smooth camera movements well but can have difficulties with complex motion, refractions, repetitive patterns, people and fluid simulations, where it produces some unnatural results. So, we wouldn’t recommend relying on this to produce final client animations but it can be a useful tool to have in your arsenal. You can also use it to push animations beyond 30 FPS to a smooth 60+ FPS.
We can also use AI to increase the resolution of renders and animations through AI upscaling. AI upscaling allows us to enlarge low-resolution images, add detail and remove noise. Many different tools can achieve this, including Topaz Photo AI, Stable Diffusion img2img or one of the many image upscaling websites like Bigjpg.
Comparison of a low-resolution render (left) and an upscaled version (right) using Topaz Photo AI
As you can see from the images above, the upscaled version is sharper and has more detail.
We are not suggesting that you produce your renders at a low resolution and then upscale them, as high-resolution renders always have more detail and better quality. However, this can work well for quick preview renders or when creating textures - more on that in the next point.
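If you already have the Automatic1111 WebUI running with --api, its extras endpoint can also handle simple upscaling from a script. The sketch below is hedged - the endpoint and field names reflect the API at the time of writing and may differ in your version, the upscaler name is just one commonly bundled option (check what your installation lists), and the file names are placeholders.

import base64
import requests

WEBUI_URL = "http://127.0.0.1:7860"

with open("preview_render.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "image": image_b64,
    "upscaling_resize": 2,         # double the original resolution
    "upscaler_1": "R-ESRGAN 4x+",  # assumed upscaler name - check your own list
}

r = requests.post(f"{WEBUI_URL}/sdapi/v1/extra-single-image", json=payload, timeout=300)
r.raise_for_status()

with open("preview_render_2x.png", "wb") as f:
    f.write(base64.b64decode(r.json()["image"]))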
For animations, the selection of apps is smaller. Topaz Video AI is the go-to option, however, some web-based options like TensorPix and AI Video Enhancer seem to produce similar results.
It is a common problem for 3D artists to receive material references that are low-resolution or not seamless. We can solve this with a couple of AI tools and turn low-quality references into high-resolution seamless textures.
A 512x512 pixel non-seamless marble texture
Firstly, we extend the texture; this allows us to use it on a larger surface without any noticeable repetition. To do this, we open the texture in Photoshop, enlarge the canvas, select the empty pixels, and use Photoshop's generative AI to fill in the empty space. In most cases, you can keep the prompt empty, but you might need to run it a few times or combine multiple options to get the best results.
Next, we make it seamless: use the Offset filter to move the seams to the centre of the image, select the seams and use generative AI to blend them, then use Offset again with the opposite values to return the texture to its initial position.
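The offset step can also be done outside Photoshop. Here is a small Python sketch using Pillow's ImageChops.offset to wrap the texture by half its size so the seams land in the centre, ready for whichever inpainting tool you prefer; the AI blending itself still happens elsewhere, and the file names are placeholders.

from PIL import Image, ImageChops

texture = Image.open("marble_extended.png")

# Wrap the image by half its width and height so the old edges
# (and therefore the seams) meet in the centre of the canvas
offset = ImageChops.offset(texture, texture.width // 2, texture.height // 2)
offset.save("marble_offset.png")

# Inpaint/blend the visible seams in your tool of choice, then
# offset back by the same amount to restore the original layout
fixed = Image.open("marble_offset_fixed.png")
restored = ImageChops.offset(fixed, -(texture.width // 2), -(texture.height // 2))
restored.save("marble_seamless.png")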
Finally, you can upscale your texture using one of the tools from the previous point and there you have it, the texture is ready to use.
A seamless, extended and upscaled marble texture based on our original reference
As well as improving existing textures, we can use AI to create textures from scratch. Any text-to-image generator is capable of this but we will use Midjourney. We start by submitting a prompt:
"large slab of calacatta marble texture --tile --ar 1:1"
If we analyse the prompt: the first part describes the texture we want, "--tile" tells Midjourney to generate a seamless, repeating image, and "--ar 1:1" sets a square aspect ratio.
Now that we have a grid of images, we can upscale the option that we like most or create more variations. Generating many variations can be useful for tiles or planks, as we can apply different ones to each mesh element to create a realistic effect.
The standard output from Midjourney is 1024 x 1024 pixels. But by using the upscaling and extending techniques in the last two points we can increase the scale and resolution of the texture with relative ease.
A seamless texture generated in Midjourney with the prompt "large slab of calacatta marble texture --tile --ar 1:1"
As well as textures, we can also use Midjourney to create artwork for our scenes. This is great, as finding good royalty-free art that matches the colour scheme and style of a scene can be tough.
So once again we start with a prompt:
"abstract artwork, white, blue and gold --ar 3:2"
Artwork variations generated in Midjourney with the prompt "abstract artwork, white, blue and gold --ar 3:2"
As you can see in the images above, this prompt gives us some nice options; we can upscale our favourite and add it to our render. As intended, it fits the bedroom render below perfectly.
A bedroom render with AI-generated artwork on the wall
One issue with Midjourney and other image generators is that they only generate diffuse textures. If we need normal and roughness maps then we have to create them ourselves. A solution to this problem is an online tool called Poly, which is an AI texture generator made for the 3D industry.
Poly has a large selection of generated textures available to edit and download, most of which are realistic, particularly considering they are AI-generated. If you are not satisfied with these options you can create your own textures based on a prompt and use their editor to fine-tune the output. However, we weren't able to generate anything as high-quality as their existing assets.
The generated textures are free to use for personal and educational purposes but if you want to use them for commercial projects you have to subscribe to Poly Infinity for $20 a month. This also unlocks 4k and 8k outputs along with faster processing and some other features.
Whilst neither of these processes is perfect, they will continue to improve. Even in their current state, they can generate reasonably high-quality textures and could be useful for obscure textures that are not available elsewhere.
Creating 3D models with AI hasn't reached the widespread popularity that image generation has quite yet, but it is growing fast. Apps like Kaedim and 3DFY allow you to create 3D models based on images or descriptions, but they are geared more towards stylised game assets than realism. They also don't offer any way of testing them out and have quite costly subscription options. Genie, however, doesn't have these issues.
Genie is a tool created by Luma Labs AI that is currently in its early stages of development but shows a lot of promise. Much like Midjourney, it generates four options based on a prompt and then allows you to refine any of the options. You can then view these in 3D and download them in multiple formats along with their textures. All this is completely free, at least whilst they are in their initial research phase.
Let's enter a prompt into Genie, look at the results and refine the best option.
"realistic chesterfield armchair, brown leather, 4K, HD"
Genie results from the prompt "realistic chesterfield armchair, brown leather, 4K, HD"
The refined result from option 1 in the previous image
Although the quality isn't good enough for the majority of situations, the process and the ability to create complex models so quickly are incredible. You can view the model from the image above in 3D and download it here.
Genie is exciting but still only in its early stages of development and needs some more time to improve. However, the developers also have an app called Luma AI that creates photo-scanned 3D models on an iPhone with the help of AI. The app guides you through capturing footage of your subject and produces an interactive 3D scene using NeRF (Neural Radiance Field) technology. It then lets you download the object as a 3D model with its textures - it's quick, easy to use and produces some great results!
As you may have seen from our previous articles, we love to use MAXScript. Unfortunately, one of its biggest hurdles is that many artists are just not comfortable writing it. So, what is the solution? Get ChatGPT to do it for you.
Explain to ChatGPT what you want to do and ask it to write you a script for it.
Curved Axis: Can you write me a maxscript to create many random primitive objects with different wire colors in a 3ds max scene?
ChatGPT: Certainly! Here's an example of a MaxScript that will create multiple random primitive objects with different wire colors in a 3ds Max scene:
-- Number of objects to create
numObjects = 10

-- Loop to create objects
for i = 1 to numObjects do
(
    -- Random object type
    objType = random 1 3

    -- Random position
    posX = random -100 100
    posY = random -100 100
    posZ = random -100 100

    -- Random RGB values
    red = random 255
    green = random 255
    blue = random 255

    -- Create object based on type
    case objType of
    (
        1: -- Sphere
        (obj = sphere radius:10 pos:[posX, posY, posZ])
        2: -- Box
        (obj = box width:20 length:20 height:20 pos:[posX, posY, posZ])
        3: -- Cylinder
        (obj = cylinder height:30 radius:10 pos:[posX, posY, posZ])
    )

    -- Assign random wire color to the created object
    obj.wirecolor = color red green blue
)
It also goes on to explain how to execute the script in 3ds Max.
Unfortunately, if we run this code, 3ds Max throws up an error. Like us, ChatGPT can make mistakes, but we can report the error back and fix the code.
Curved Axis: I am receiving this error on lines 16-18: "Argument count error: random wanted 2, got 1" can you fix the code?
ChatGPT: Certainly! Here's the modified code that includes random dimensions for the primitive objects:
-- Number of objects to create
numObjects = 10

-- Loop to create objects
for i = 1 to numObjects do
(
    -- Random object type
    objType = random 1 3

    -- Random position
    posX = random -100 100
    posY = random -100 100
    posZ = random -100 100

    -- Random RGB values
    red = random 0 255
    green = random 0 255
    blue = random 0 255

    -- Create object based on type
    case objType of
    (
        1: -- Sphere
        (obj = sphere radius:10 pos:[posX, posY, posZ])
        2: -- Box
        (obj = box width:20 length:20 height:20 pos:[posX, posY, posZ])
        3: -- Cylinder
        (obj = cylinder height:30 radius:10 pos:[posX, posY, posZ])
    )

    -- Assign random wire color to the created object
    obj.wirecolor = color red green blue
)
It was a simple fix: when randomising the colours, the script only passed one argument instead of two, so changing all instances of "random 255" to "random 0 255" resolved the error.
We can continue to develop the code by asking ChatGPT to add new functionality. For example - “Can you also make the dimensions of the primitive objects random?” or "Can you edit this code to also create torus and tube primitives?".
It might not be the most useful script, but it shows how quickly we created a working one without any knowledge of MAXScript or directly editing the code. ChatGPT also adds helpful comments throughout, making the code easy to understand and edit if needed.
You can even cut out the middleman and write the prompts in 3ds Max with the max-gpt tool by JokerMartini.
The way that we create visualisations is constantly evolving, pushing the quality and speed of our work. AI tools are the next big step for the architectural visualisation industry and embracing the change is important to keep up. All of these new tools and possibilities are exciting but they should come with a healthy dose of concern.
The concern is whether artists will be needed at all in the future. It is a question that some of the community have very strong opinions about. A few believe that artists could become obsolete in the near future, whilst others think that current issues will continue to stump AI for many years to come. I tend to agree with the latter: AI's influence will continue to grow in the coming years, automating more tasks and producing images that compete with low-quality renders. However, I don't see this as the death of the arch viz artist at all - I see it as another powerful set of tools to help improve our work and push it in new directions. Whatever you believe, the speed at which AI is currently advancing will make for some very interesting years to come.
Given the subject matter, it was only apt for an AI, ChatGPT, to provide its own conclusion on the article. So here you go:
"And that's a wrap! We've taken a fascinating journey through the world of AI in architectural visualization. From jazzing up our visualizations with realistic people to conjuring seamless textures and even dabbling in scriptwriting, AI has certainly become our trusty sidekick in the creative process.
As we marvel at these nifty tools, it's natural to wonder about the future. Will AI one day outshine us in this creative dance? Well, that's a debate for another day. For now, let's celebrate the magic that AI brings to the table while keeping our paintbrushes, or in this case, our styluses, firmly in hand.
So here's to the artists, and the AI enthusiasts – may our collaborations continue to evolve, creating a symphony where human creativity and artificial intelligence harmonize in the most unexpected and delightful ways. Until next time, happy creating!"
Back to the human writer - it's no doubt impressive, but we're safe for now.
Let us know what you think! Would you like to see more content like this? Or have you got a question for us? Either way, we look forward to hearing your thoughts and continuing the discussion.
Alternatively, if you want to talk about career opportunities, discuss an upcoming project, or just say hi, we would love to hear from you too!