Episode Transcript
0:05
Jerrel Arkes: Hi Joop. In this episode we are talking about
0:07
using AI to create good videos. For our podcast we are using a
0:11
tool called Descript, and you have tested it a lot. So can you share
0:15
your experience with Descript for our podcast?
0:18
Joop Snijder: Yes, I've used the tool Descript to convert our
0:21
voices into text. It uses the AI technique called speech-to-text
0:26
to do this conversion. And I think it's an asset for content
0:29
marketing. The first thing it does is it detects voices in the
0:34
audio file, and it can detect them automatically. But as a user, I
0:38
can help the tool by providing the number of speakers, so in our
0:41
case, two speakers, and then it can detect our voices. And after
0:47
detecting the different voices, it does two things. It performs
0:51
the speech-to-text conversion and it assigns each part of the text to the
0:55
right person. In parts where our voices overlapped, it has some
1:00
difficulty labeling the sentences with the right person. And now
1:03
that I have the transcript of the podcast, I can edit our podcast
1:07
not by editing the audio file, but by editing the text. So after
1:12
removing filler words like "um", it instantly edits the audio and
1:17
video together to keep them in sync. And I think that's a pretty
1:20
cool use of AI, isn't it?
Jerrel Arkes: Yeah, it really is. And I think also for silences,
1:26
right?
Joop Snijder: It is also for silences and also, if you have
1:33
some sentences that are not fluent, you can remove them and
1:38
it edits the audio and video together. Because English is not
1:43
our native language, I sometimes forget a preposition or
1:47
mispronounce a word. And if that happens, I can overdub words with
1:51
my own voice. And Descript is using AI for this overdubbing. I
1:57
have trained it on my voice with the script they provide, by recording
2:01
10 minutes of my voice. It can reproduce my voice by training an
2:04
AI model on my recording.
Jerrel Arkes: Are you happy with the results of that?
2:10
Joop Snijder: Yeah, it sounds great. But I found out that using
2:13
the synthetic version of my voice isn't that easy. If it hasn't
2:19
been trained on certain words I'm trying to overdub, it substitutes the
2:23
word with gibberish. It inserts a voice that's literally saying
2:28
the word "gibberish". That's kind of annoying. There's a more
2:33
expensive option to teach the AI more words, but you have to pay a
2:37
premium for that. And, from my perspective, it also adds a lot
2:42
of unnecessary punctuation, like commas, in the transcript. And if
2:47
you remove them, it edits the audio, which can lead to cutting
2:51
off some words. What I found out is you can overcome this problem,
2:56
not by removing the comma, but by correcting the previous word,
2:59
including the comma, and then removing the comma. And this way
3:04
you only edit the transcript and not the audio. So I think
3:08
Descript can improve this by using some syntax corrections on
3:11
the transcript without editing the audio. That would be nice.
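
The difference between a deletion that cuts the audio and a correction that only touches the transcript can be sketched as follows. This is a minimal illustration, assuming each transcribed word carries a start and end time. The `Word` class and the function names are hypothetical and are not Descript's API or its actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the audio (hypothetical model)."""
    text: str
    start: float  # seconds
    end: float    # seconds
    deleted: bool = False


def delete_fillers(words, fillers):
    """Mark filler words as deleted; this is the kind of edit that changes the audio."""
    for w in words:
        if w.text.lower().strip(",.?!") in fillers:
            w.deleted = True


def correct_text(words, index, new_text):
    """Change only the displayed text; the timing, and so the audio, is untouched."""
    words[index].text = new_text


def keep_ranges(words, tolerance=1e-9):
    """Return the (start, end) audio ranges that survive, merging adjacent words."""
    ranges = []
    for w in words:
        if w.deleted:
            continue
        if ranges and abs(ranges[-1][1] - w.start) < tolerance:
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            ranges.append((w.start, w.end))
    return ranges


# Made-up example timings, for illustration only.
words = [
    Word("So", 0.0, 0.3),
    Word("um,", 0.3, 0.7),
    Word("Descript,", 0.7, 1.4),
    Word("works", 1.4, 1.9),
]

delete_fillers(words, {"um"})
after_delete = keep_ranges(words)

# Dropping the unwanted comma as a text correction leaves the audio ranges as they were.
correct_text(words, 2, "Descript")
after_correction = keep_ranges(words)
```

In this sketch, deleting the filler word removes its time range from the audio, while the punctuation fix changes only the text, which is the behavior asked for in the answer above.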
3:14
Jerrel Arkes: And do you think this is a better solution than
3:17
editing the video in a normal timeline in Adobe Premiere, for
3:20
example?
Joop Snijder: Oh yeah, sure. Because now I can do it within
3:25
the text, which is much easier. And I have the transcript, so the tool
3:29
saves me a lot of time. And I think the best is yet to come
3:33
because I can use the transcript and use ChatGPT to transform all
3:38
the text into a blog post or article, or get a summary of the
3:41
transcript, or even turn it into a social media post. I
3:45
automated this process myself, and within a few minutes I have
3:50
all of this text ready for our content marketing. So it's
3:53
definitely worth the time and money, I think, to use this tool
3:56
if you want to have subtitles for videos or get a transcription of
3:59
some audio.
Jerrel Arkes: Sounds great. And I think in the future, for example,
4:04
when we have created a video avatar of ourselves, we can even
4:07
use it to type the text that the video avatar of ourselves should
4:11
say in our voice, right?
Joop Snijder: Yeah, that would be nice.