I’m playing around with having machine learning models generate the cover of my next book, MAN AMONGST GODS, in a slightly pretentious attempt to have the cover imitate the subject matter of the book.
You may have seen the author photo on my main website page, which was generated by machine learning models that create unique images from text prompts.
An example city street
Here are a few examples of a tech-noir city street, just so you get an idea of what these machine learning models can do. The image below was generated with the text prompt: "A tech-noir city street".
I don't think the "tech" part comes through, but apart from that we have a noirish city street you can imagine Sam Spade sneaking around in.
Let's look at how we can change this with just one additional word: the same scene, but in a certain style. This time I used the text prompt: "A steampunk tech-noir city street".
This, I think, is a better result. I like the colouring better, and we now have some old-fashioned tech finding its way into the image whilst still retaining the noirish city street. I'd go to what I imagine is some kind of bar on the left for a steam-infused absinthe any day!
But let's make things harder by requesting it be painted by a particular artist. So this time the text prompt is: "A steampunk tech-noir city street by Giger".
Giger really came to the party and made it his own! I'm not sure it's a city street any longer, really, but when did Giger ever paint city streets? This looks more like the innards of the Alien spaceship, cathedral style.
CLIP + VQGAN
So what is actually going on? The above images were generated by two neural network models (CLIP + VQGAN) working together to create unique images from text prompts. The two complement each other:
VQGAN is good at generating images that look similar to the images it has seen before.
CLIP is able to determine how well a caption (or prompt) matches an image.
So basically VQGAN iterates on an image over and over again, whilst CLIP stands over its shoulder, giving each iteration a score for how well it matches the initial text prompt. After around 200 iterations you should get something CLIP scores as OK or better.
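To make that loop a bit more concrete, here is a minimal sketch of the idea in Python. It uses OpenAI's CLIP library for the scoring, but the decode_image() helper is a hypothetical stand-in for the VQGAN decoder (the real API is more involved), so treat this as an illustration of the feedback loop rather than the exact code behind the images above.

```python
import torch
import clip  # OpenAI's CLIP (pip install git+https://github.com/openai/CLIP.git)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)

# Encode the text prompt once; this is the target CLIP will score against.
prompt = "A steampunk tech-noir city street"
text_features = clip_model.encode_text(clip.tokenize([prompt]).to(device))

# Hypothetical: latent codes that the VQGAN decoder turns into an image.
latents = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
optimizer = torch.optim.Adam([latents], lr=0.05)

for step in range(200):  # roughly the 200 iterations mentioned above
    # decode_image() is a stand-in for the VQGAN decoder: latents -> RGB image,
    # resized and normalised to the 224x224 input CLIP expects.
    image = decode_image(latents)
    image_features = clip_model.encode_image(image)

    # CLIP's "score": cosine similarity between the image and prompt embeddings.
    # Minimising the negative similarity pushes the image towards the prompt.
    loss = -torch.cosine_similarity(image_features, text_features).mean()

    optimizer.zero_grad()
    loss.backward()   # gradients flow back through the decoder into the latents
    optimizer.step()  # nudge the latents so the next iteration matches better
```

In other words, VQGAN does the drawing, CLIP does the judging, and the optimiser keeps adjusting the drawing until the judge is reasonably happy.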
God and the Devil visualised
The model I used (https://accomplice.ai/) has been trained on 14 million images that have been annotated with descriptions of what they contain. So in my mind this model represents the conceptual idea of almost anything in the world! An artist may have an idea of what something looks like or how to visualise a certain concept, but what happens when machine learning takes over and learns from millions of these images and concepts?
Well, we can tell it to provide a visualisation of anything, so why not God? Or the Devil? Why not indeed?
I will let you all take this in and make your own judgements, but you have to agree I've provided a glimpse of what lurks behind the curtain of reality, good or bad.