How I used speech synthesis AI and an AI digital human creator

I’ve been dabbling with a number of generative AI platforms that create videos from photographs and clone my voice to read AI-generated dialogue. My Ph.D. research aims to recreate the essence of myself using AI in combination with portrait photography and moving image.

Unfortunately, I’m still searching for the right combination of AI technology to accurately represent who I am: something closer to an interactive autobiography that shares my legacy and the essence of me. I am also aware that perhaps I am not sharing the right information about myself because, thankfully, I’m not close to death yet. But enough of this morbid talk. Here are the platforms I used, with images and links to the videos and audio I created.

Text

I used ChatGPT to write the script for the AI audio application by asking it the following questions:

What would artist Ginger Liu say about the possibility of AI replacing the family portrait photograph?
Can the essence of a person be recreated by AI and will it help or hinder bereavement?
Will AI replace the portrait photograph as an identity referent?
Will AI replicate the true essence of a person better than a photograph?

ChatGPT came up with some great points, which I have edited to cut the waffle. Here is its response:

When it comes to generative AI portraits, the topic of essence and identity is particularly intriguing. As an artist, I believe that the essence of a person goes beyond their physical appearance. It’s the intangible qualities that make up a person’s character, personality, and soul. And while generative AI can produce images that look remarkably like the person it’s modeled after, it’s impossible for it to capture their essence in the same way that a human photographer might.

That being said, I don’t believe that generative AI portraits are inherently bad or unethical. They can serve as an interesting artistic experiment and raise important questions about identity and authenticity in the digital age. But it’s crucial that we acknowledge their limitations and not mistake them for the real thing.

As an artist, I’m always interested in exploring new forms of art and technology. I’m currently working on several projects that incorporate AI, and I’m excited to see how the field evolves in the coming years. Ultimately, I believe that the best art comes from a deep understanding of the human experience, and that’s something that AI can’t replicate.

I believe that photography has a unique ability to capture and preserve memories of our loved ones. But with the rise of AI, there’s been an increasing interest in using generative AI portraits to simulate the presence of a lost loved one. This raises questions about the role of essence and identity in AI portraits and how they compare to traditional photography.

While AI portraits may be able to create a realistic likeness of a person, I don’t believe they can truly replicate the essence of a person or capture the memories and emotions associated with traditional photography. In this sense, AI may never replace photography in terms of its ability to help us deal with grief.

However, I do think that AI has the potential to prolong grief if not used in a thoughtful and responsible way. It’s important to consider the ethical implications of using AI to create simulated conversations with a lost loved one or other forms of AI-generated content. It’s essential that we remain mindful of the potential impact on individuals dealing with grief and ensure that AI is used in a way that respects the dignity and memory of the person we have lost.

While AI has the potential to create new ways of experiencing and remembering our loved ones, I believe that traditional photography will always have a special place in our hearts and memories. We must approach the use of AI with caution and consideration to ensure that it enhances rather than detracts from our ability to cope with loss and grief.

Audio

I used ElevenLabs to replicate my speech. First, I recorded around one minute of my voice and uploaded it to the platform. I then uploaded the ChatGPT text, and within minutes the platform produced an accurate copy of my voice reading it.

Listen Here

Video

With the digital human creator D-ID, I uploaded a video still from an interview I recorded a few weeks ago, which is of higher image quality, and for balance I also uploaded a photo from my cell phone. I then uploaded the ElevenLabs audio file of my cloned voice reciting the ChatGPT script.

The results were not what I expected. As you can see in the videos below, I’m ALL moving lips.

Watch Video

Final Thoughts

ElevenLabs is still in beta, yet it produced an exceptional clone of my voice. That’s incredible, but it also means anyone can now copy my voice and do something criminal with it. Good luck with that!

D-ID was a little off in replicating my mouth movements when I speak, but it did a great job with eye blinking and head movement.

Both platforms were incredibly easy to use and would suit businesses needing cheap videos for websites and social media.

Thank you to Ethan Mollick for sharing his ideas.

Ginger Liu is the founder of Ginger Media & Entertainment, a Ph.D. Researcher in artificial intelligence and visual arts media, and an author, journalist, artist, and filmmaker. Listen to the Podcast.

https://www.buymeacoffee.com/gliuphoto