>>39774473
I hope this doesn't come across as too dumb, but I've been looking into "ControlNet" and the whole external-network thing it has going on, referenced and timestamped from this video:
>https://youtu.be/fhIGt7QGg4w?t=634
>10:34 - 13:40
This whole ControlNet/external-network setup works by training only a copy of the encoding layers on the provided pose, and yet it produces much higher-quality results than other methods.
>18:06 - 24:53
That seems to imply there's a more efficient process for image generation than the traditional way: start with the form, then have the model add detail on top of that form, rather than whatever traditionally trained models do (which I don't know fuck all about).
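If I'm understanding the paper right, the trick is roughly this (rough PyTorch sketch, NOT the real implementation; the .encoder/.encode/.decode attribute names are made up for illustration, actual UNets are messier):

import copy
import torch.nn as nn

def zero_conv(channels):
    # 1x1 conv initialized to all zeros, so at the start of training the
    # control branch contributes literally nothing and the frozen model
    # behaves exactly as before
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlNetSketch(nn.Module):
    # base_unet is ASSUMED to expose .encoder, .encode() (returning a list
    # of skip features) and .decode(skips); this only shows the wiring
    def __init__(self, base_unet, skip_channels):
        super().__init__()
        self.base = base_unet.eval()
        for p in self.base.parameters():
            p.requires_grad = False  # the big pretrained model stays frozen
        # trainable clone of ONLY the encoder half
        self.ctrl_encoder = copy.deepcopy(base_unet.encoder)
        self.zero_convs = nn.ModuleList(zero_conv(c) for c in skip_channels)

    def forward(self, noisy_latent, pose_hint):
        skips = self.base.encode(noisy_latent)              # frozen features
        ctrl = self.ctrl_encoder(noisy_latent + pose_hint)  # pose-aware copy
        # inject the control features into the decoder's skip connections
        merged = [s + zc(c) for s, zc, c in zip(skips, self.zero_convs, ctrl)]
        return self.base.decode(merged)

So only the cloned encoder ever gets trained; the pose information rides in through those zero convs while the frozen decoder does all the actual detailing.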
Now, what does this mean? I don't fucking know, but my gut instinct says this: if traditionally trained models produce far lower-quality output than something like ControlNet/an external network, which only trains the encoding layers (the beginning layers) and lets the frozen model fill in the rest through the decoding layers until it gives you the output, then maybe building the image in a specific order, form and shape in the encoding layers (the first half of the generation process) and details in the decoding layers (the latter half), would let us be far more efficient with our datasets.
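For what it's worth, this "hand over the form, let the model fill in the rest" split is already how you run it in practice with the diffusers library (real API and public weights; pose.png is just a stand-in for whatever pose image you have):

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# the pose-conditioned ControlNet weights (the "external network")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)

# bolt it onto a frozen Stable Diffusion base model
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# the pose image supplies the form; the prompt and frozen model do the detail
pose = load_image("pose.png")
image = pipe("1girl, detailed", image=pose, num_inference_steps=30).images[0]
image.save("out.png")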
Think about it (unless I'm absolutely and completely wrong): by breaking the image generation process into stages, say shape, form, details, then color (shitty example, but you get the point), couldn't we min-max our datasets and their ability to generate images with a process like that, instead of meshing images together and getting terrible anatomy and fucked-up faces?
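You can fake a crude two-stage version of this today with stock tools and no new training, using img2img as the "details" stage; not the staged training I'm describing, just the closest existing analogue (rough.png is hypothetical, i.e. the saved output of the previous snippet):

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

rough = load_image("rough.png")  # the "form" image from the previous stage
final = pipe(
    "detailed face, clean anatomy",
    image=rough,
    strength=0.45,  # low strength = re-noise only partway, so the shapes
                    # survive and the steps get spent on surface detail
    num_inference_steps=30,
).images[0]
final.save("final.png")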
P 1/2