ramoz 6 hours ago

FYI: For Flux, there is a lot more power in the text encoder, and you can prompt with more meaningful and comprehensive sentences. That means less of the traditional concise, comma-separated phrasing we saw with Stable Diffusion.

You should do the same with your training images. Caption everything you do not want the model to remember as "you" (what you're doing, what you're wearing, who you're with, accessories, etc.).
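
A rough sketch of the contrast (all prompt text, filenames, and captions below are made up for illustration; "TOK" is just a commonly used placeholder trigger token, not a requirement):

```python
# Old SD-style prompting: terse, comma-separated tags.
sd_style = "portrait, 35mm, bokeh, sharp focus, masterpiece, best quality"

# Flux-style prompting: full descriptive sentences for the stronger text encoder.
flux_style = (
    "A candid 35mm portrait of a man laughing at an outdoor cafe table, "
    "shallow depth of field blurring the street behind him, warm evening light."
)

# Training captions per the advice above: describe everything that is NOT
# intrinsically the subject, so the model doesn't bake it into "you".
captions = {
    "img_001.jpg": "a photo of TOK wearing a red flannel shirt, holding a coffee mug, standing in a kitchen",
    "img_002.jpg": "a photo of TOK in sunglasses and a grey hoodie, riding a bicycle on a city street",
}
```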

isoprophlex 6 hours ago

I did this for our beloved, dead cat... On Replicate, too. I loved the results, until at one point I suddenly got really creeped out about the thing I was doing.

  • ryandvm 6 hours ago

    This is going to be big business, I think. I have probably sent hundreds of thousands of emails, texts, chats, etc. It would be well within the realm of possibility to train an LLM on a loved one's communications corpus and let you chat with "them" after they're gone.

    Possible? Yes. Convincing results? Probably. Good idea? I doubt it.

    • mipmap04 6 hours ago

      Oh man, I did this with my dad's voice after he died and set up a thing where I could talk with an LLM-backed assistant and have it respond in his voice and mannerisms. It was a very weird coping and grief period and I ultimately hit a point where I got really weirded out about what I was doing.

    • portaouflop 6 hours ago

      I think that was 1:1 a Black Mirror episode

    • oskarkk an hour ago

      This reminds me of paintings in Harry Potter.

    • waspleg 4 hours ago

      Literally a Black Mirror episode.

    • slig 6 hours ago

      I remember seeing it here on HN that someone did that with a group chat and it would reply as each friend.

    • knicholes 6 hours ago

      This is exactly what I'd want to do for my "smart urn."

      • TeMPOraL 2 hours ago

        Code golf task: implement the whole pipeline above in the minimum number of (currently existing) ComfyUI nodes.

        Extra challenge: extend that to produce videos (e.g. via "live portrait" nodes/models), to implement the digital version of the magic paintings (and newspaper photos) from Harry Potter.

        EDIT:

        I'm not joking. This feels like a weekend challenge today; "live portraits" in particular run fast on a half-decent consumer GPU, like my RTX 4070 Ti (the old one, not Super), and I believe (but haven't tested yet) that even training a LoRA from a couple dozen images is reasonably doable locally too.

        In general, my experience with Stable Diffusion and ComfyUI is that, for a fully local scenario on a normal person's hardware (i.e. not someone's totally normal PC that happens to have eight 30xx GPUs in a cluster), the capabilities and speed are light years ahead of the LLM space.

        Just for comparison, yesterday I - like half the techies on the planet - got to run some local DeepSeek-R1. The 1.58-bit dynamic quant topped out at 0.16 tokens per second. That's about the same time it takes an SD1.5 derivative to generate a decent-looking HD image. I could probably run them in parallel in lock-step (SD on GPU, compute-bound; DeepSeek on CPU, RAM-bandwidth bound) and get one image per LLM token.
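
        A quick sanity check on that back-of-envelope comparison (the SD step count and iteration speed are rough assumptions, not measurements):

```python
# 1.58-bit DeepSeek-R1 dynamic quant, as reported above.
tokens_per_second = 0.16
seconds_per_token = 1 / tokens_per_second
print(seconds_per_token)   # -> 6.25 seconds per token

# An SD1.5 derivative at ~20 steps and ~4 it/s on a mid-range consumer GPU
# (assumed numbers) lands in the same ballpark per image.
sd_steps, its_per_second = 20, 4
seconds_per_image = sd_steps / its_per_second
print(seconds_per_image)   # -> 5.0 seconds per image
```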

      • mystified5016 4 hours ago

        Forget an urn, I want my digital ghost to haunt a furby.

petercooper 7 hours ago

Replicate does make this particularly easy while still being somewhat developer focused. I've used it for a few people in our group chat so we can make silly in-joke memes and stuff and the results are quite stunning. Replicate then offers the model up over a simple API (shown in the post) if you wanted to let people generate right from the chat, etc. Replicate is worth poking around a bit more broadly, too, they have some interesting models on there (though the pricing tends not to be very competitive if you were going to do it at scale.)
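
For a sense of what "a simple API" means here, the request for a prediction is roughly shaped like this (the version hash, prompt, and input fields below are placeholders for illustration; check Replicate's API docs for the exact fields your model expects):

```python
import json

# Rough shape of the body for Replicate's HTTP API
# (POST https://api.replicate.com/v1/predictions).
payload = {
    "version": "<your-model-version-hash>",   # placeholder, not a real hash
    "input": {
        "prompt": "a photo of TOK as a renaissance oil painting",
        "num_outputs": 1,
    },
}

# Serialized as JSON and sent with your API token in the Authorization header.
body = json.dumps(payload)
```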

m463 an hour ago

I had set up automatic1111 a while back, and I believe the webui lets your image generation start from an existing image. It's kind of fun to have a cartoon of yourself based on an image.

ge96 7 hours ago

What I want is to be able to feed in a bunch of videos and generate an animatable (talking) 3D face from that data. I suppose in theory you only need three images (front and both sides). But mapping pixels to motion is interesting (facial expressions).

There wouldn't be depth data, so it would have to be inferred from shadows.

  • timdiggerm 5 hours ago

    Why do you want to do that?

    • ge96 5 hours ago

      My case is not directly nefarious: for example, taking an old popular YouTuber who streamed in the early 2000s and making a model of them for personal use, like a 3D chat bot with that person's quirks.

      Edit: when I say "nefarious" I mean you can use that tech to impersonate someone (e.g. for political reasons), but in my case it's more the creeper type cloning someone for personal use, e.g. Replika.

      Tangent: the holo vtuber industry is interesting, since they build up these characters with a unique persona/theme and people then follow that specific model. They could make themselves into an AI easily, since it's a rigged 3D asset, but of course it would be boring compared to the real thing.

      • GaggiX 4 hours ago

        >they could make themselves into an AI easily since it's a rigged 3D asset but of course it would be boring compared to the real thing

        The most popular vtuber on Twitch is an AI tho

        • ge96 3 hours ago

          You talking NeuroSama? I haven't kept up with it in a bit

          I'm not sure if that's truly AI since the Turtle drives her

          Edit: if the source was open I'd believe it

          • GaggiX 3 hours ago

            >I'm not sure if that's truly AI

            It has always been an LLM. There is no human typing at insane speed into the TTS.

            • ge96 3 hours ago

              I'm referring to live interception of messages, which I guess has to be done to comply with Twitch's terms -- there is a human in the loop.

              edit: but yeah the fact that so many people interact with her shows generated content can keep people occupied

thefourthchime 4 hours ago

I did this a while back, though it was pictures of my wife in lingerie.

- I asked Grok to generate a list of racy prompts.

- Had Replicate generate them via script. About 10-20% were very poor; I filtered those out manually.

- Replicate also has NSFW guardrails, but a simple retry or some word juggling gives you a chance to get around them.

I think I spent $10
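
The retry loop described above is simple to sketch. Here the `generate` function is a stand-in (it randomly "refuses" to mimic a guardrail rejection); in practice you'd replace it with a real API call:

```python
import random

def generate(prompt):
    """Stand-in for an image-generation API call. Randomly raises to
    mimic a guardrail rejection; swap in a real client call here."""
    if random.random() < 0.3:
        raise RuntimeError("NSFW content detected")
    return f"image for: {prompt}"

def generate_with_retries(prompt, max_tries=5):
    # The strategy from the comment above: just retry, optionally
    # juggling the wording slightly on each attempt.
    for attempt in range(max_tries):
        try:
            return generate(prompt)
        except RuntimeError:
            prompt = prompt + " ."   # trivial "word juggle" placeholder
    return None  # gave up after max_tries rejections
```

Seeding `random` makes the stub deterministic for testing; a real run would instead see genuine, occasional rejections.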

  • Der_Einzige 4 hours ago

    There is a parallel "underground" AI research world of stuff like this, with its hub on "civit.ai" instead of huggingface.

    Often the innovations from that world are years ahead of mainstream AI research. You should see what coomers did for LLM sampling to get over issues with "slop" responses, just for their own pervy interests - several years before the mainstream crowd ever cared.

    • ok_dad 4 hours ago

      Porn has always pushed the boundaries of media on the internet. I don't know why people are surprised! Since sex is something nearly everyone does, it makes sense that a lot of human progress was the result of trying to integrate sex with whatever new tech was out there at the time. I'm sure a hundred years ago some inventors were pushing the boundaries of motors in sex toys, and in another hundred years some other inventor will be pushing the boundaries of putting sex in holograms.

    • DrSiemer an hour ago

      It's kind of annoying that some of the best models out there have a tendency to produce very not safe for work results.

      Look mom, I can make some cool astrology images for you! Whoops, that's boobs. That too. And this one. Ehh, hold up, I need to add a pile of negative prompts first...

      • wongarsu an hour ago

        Sketching nude humans is a huge part of how human painters learn, because, surprisingly, clothed humans are just nude humans with some fabric over them, and the fabric can make it harder to tell what's going on.

        Even if we assumed equal amounts of effort, it wouldn't be surprising if a large corpus of nude images in the training data improved model results.

        But maybe we should have better negative-prompt presets for different levels of decency.
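
        Such presets could be as simple as a lookup table like this (the level names and negative-prompt terms below are made up for illustration, not from any model's documentation):

```python
# Hypothetical "decency level" negative-prompt presets.
NEGATIVE_PRESETS = {
    "strict": "nude, nsfw, cleavage, lingerie, suggestive pose",
    "moderate": "nude, nsfw",
    "off": "",
}

def build_negative_prompt(level, extra=""):
    """Combine a preset with any user-supplied negative terms."""
    base = NEGATIVE_PRESETS[level]
    return ", ".join(p for p in (base, extra) if p)
```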

njx 3 hours ago

Thank you for sharing. Is there any model that can help convert pictures into cartoons or flat vector illustrations?

manishsharan 7 hours ago

This is fantastic, but now you need to train a model to detect AI-generated images from actual photos. Then, of course, a model to beat the detector model, and then a model to catch the model that beats the detector model, and so on.

Thank you from people holding NVDA.

  • beng-nl 6 hours ago

    You may have re-invented GANs :-)

DoodahMan 5 hours ago

is something like this possible to do with video yet?

deadbabe 6 hours ago

I’m imagining something where an influencer trains AI to make and post images of themselves on social media, then the influencer dies but the AI keeps going forever.

  • ge96 4 hours ago

    The impact is kind of interesting: how do you know someone's legit, the person doing base jumping or whatever?

    Thanos/NFTs: where did that take you? right back to me

    Thinking hardware with a built-in chain interface for proof

    Oh man dating apps too

    That's true love though, two people meet up IRL they're both like wtf who are you