04 - AI-Driven Video Production: Exploring Descript as a Tool

Released Monday, 13th February 2023

Episode Transcript

0:05

Jerrel Arkes: Hi Joop. In this episode we are talking about

0:07

using AI to create good videos. For our podcast we are using a

0:11

tool called Descript, and you have tested it a lot. So can you share

0:15

your experience with Descript for our podcast?

0:18

Joop Snijder: Yes, I've used the tool Descript to convert our

0:21

voice into text. It uses an AI technique called speech-to-text

0:26

to do this conversion. And I think it's an asset for content

0:29

marketing. The first thing it does is it detects voices in the

0:34

audio file, and it can detect them automatically. But as a user, I

0:38

can help the tool by providing the number of speakers, so in our

0:41

case, two speakers, and then it can detect our voices. And after

0:47

detecting the different voices, it does two things. It performs

0:51

the speech-to-text conversion and it assigns each part of the text to the

0:55

right person. In parts where our voices overlapped, it has some

1:00

difficulty assigning the sentences to the right person. And now

1:03

that I have the transcript of the podcast, I can edit our podcast

1:07

not by editing the audio file, but by editing the text. So after

1:12

removing filler words like "um", it instantly edits the audio and

1:17

video together to keep them in sync. And I think that's a pretty

1:20

cool usage of AI, isn't it?

Jerrel Arkes: Yeah, it really is. And I think also for silences,

1:26

right?

Joop Snijder: It is also for silences and also, if you have

1:33

some sentences that are not fluent, you can remove them and

1:38

it edits the audio and video together. Because English is not

1:43

our native language, I sometimes forget a preposition or

1:47

mispronounce a word. And if that happens, I can overdub words with

1:51

my own voice. And Descript is using AI for this overdubbing. I

1:57

have trained my voice with the script they provide by recording

2:01

10 minutes of my voice. It can reproduce my voice by training an

2:04

AI model from my recording.

Jerrel Arkes: Are you happy with the results of that?

2:10

Joop Snijder: Yeah, it sounds great. But I found out that using

2:13

the synthetic version of my voice isn't that easy. If it hasn't

2:19

been trained on certain words I'm trying to overdub, it substitutes the

2:23

word with gibberish. It inserts a voice that's literally saying

2:28

the word "gibberish". That's kind of annoying. There's a more

2:33

expensive option to teach the AI more words, but you have to pay a

2:37

premium for that. And, from my perspective, it also adds a lot

2:42

of unnecessary punctuation, like commas, in the transcript. And if

2:47

you remove them, it edits the audio, which can lead to cutting

2:51

off some words. What I found out is you can overcome this problem,

2:56

not by removing the comma, but by correcting the previous word,

2:59

including the comma, and then removing the comma. And this way

3:04

you only edit the transcript and not the audio. So I think

3:08

Descript could improve this by applying some syntax corrections to

3:11

the transcript without editing the audio. That would be nice.
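
Editor's note: the idea discussed above, that deleting words in the transcript cuts the matching audio while a punctuation fix should leave the audio alone, can be sketched roughly as follows. This is only an illustration, assuming each transcribed word carries start and end times in seconds. The names `Word`, `is_text_only_edit` and `keep_intervals` are hypothetical and do not reflect how Descript is actually implemented.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def _spoken_words(text):
    # Drop punctuation at token edges and ignore case, so "And," == "and".
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


def is_text_only_edit(old_text, new_text):
    """True when the edit changes only punctuation or capitalization."""
    return _spoken_words(old_text) == _spoken_words(new_text)


def keep_intervals(words, deleted_indices, total_duration):
    """Return the (start, end) ranges of media to keep after deleting words."""
    deleted = set(deleted_indices)
    intervals = []
    cursor = 0.0  # start of the next range we intend to keep
    for i, word in enumerate(words):
        if i in deleted:
            if word.start > cursor:
                intervals.append((cursor, word.start))
            cursor = max(cursor, word.end)
    if cursor < total_duration:
        intervals.append((cursor, total_duration))
    return intervals


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("we", 0.8, 1.0),
    Word("um", 1.0, 1.4),
    Word("edit", 1.4, 1.9),
]
cuts = keep_intervals(words, deleted_indices=[1, 3], total_duration=2.0)
```

In this sketch, applying the same list of kept ranges to both the audio and the video track is what would keep them in sync, and an edit for which `is_text_only_edit` returns true would change only the transcript.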

3:14

Jerrel Arkes: And do you think this is a better solution than

3:17

editing the video in a normal timeline in Adobe Premiere, for

3:20

example?

Joop Snijder: Oh yeah, sure. Because now I can do it within

3:25

text, and that's much easier. And I have the transcript, so the tool

3:29

saves me a lot of time. And I think the best is yet to come

3:33

because I can use the transcript and use ChatGPT to transform all

3:38

the text into a blog post or article, or get a summary of the

3:41

transcript, or even turn it into a social media post. I

3:45

automated this process myself, and within a few minutes I have

3:50

all of this text ready for our content marketing. So it's

3:53

definitely worth the time and money, I think, to use this tool

3:56

if you want to have subtitles for videos or get a transcription of

3:59

some audio.

Jerrel Arkes: Sounds great. And I think in the future, for example,

4:04

when we have created a video avatar of ourselves, we can even

4:07

use it to type the text that the video avatar of ourselves should

4:11

say in our voice. Right?

Joop Snijder: Yeah, that would be nice.
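
Editor's note: Joop does not describe how he automated the step from transcript to blog post, summary and social media post. The sketch below shows one possible shape for the preparation step only, assuming a transcript laid out like this one, with timestamps on their own lines. The names `clean_transcript`, `build_prompts` and `TASKS`, and the instruction wording, are hypothetical. Sending the prompts to ChatGPT is left out on purpose.

```python
import re

# Lines such as "0:05" or "1:02:33" that contain only a timestamp.
TIMESTAMP = re.compile(r"^\d+:\d{2}(:\d{2})?$")

# Hypothetical instructions, one per kind of content mentioned in the episode.
TASKS = {
    "blog_post": "Rewrite the following podcast transcript as a blog post.",
    "summary": "Summarize the following podcast transcript in a few sentences.",
    "social_post": "Turn the following podcast transcript into a short social media post.",
}


def clean_transcript(raw):
    """Drop timestamp-only and empty lines, then join the rest into one text."""
    lines = [line.strip() for line in raw.splitlines()]
    kept = [line for line in lines if line and not TIMESTAMP.match(line)]
    return " ".join(kept)


def build_prompts(raw):
    """Return one prompt per task, each ending with the cleaned transcript."""
    text = clean_transcript(raw)
    return {name: f"{instruction}\n\n{text}" for name, instruction in TASKS.items()}


raw = (
    "0:05\n\n"
    "Jerrel Arkes: Hi Joop. In this episode we are talking about\n\n"
    "0:07\n\n"
    "using AI to create good videos."
)
prompts = build_prompts(raw)
```

Each value in `prompts` could then be passed to whichever language model is used, and the replies collected as drafts for review.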
