To remove the time stamps from the YouTube video transcript, you can use a Node.js program that processes the input file line by line and writes the cleaned data to an output file. Here is an example using JavaScript (Node.js) with the built-in readline and fs modules.
- First, make sure you have Node.js installed on your system. If not, download and install it from the official Node.js website.
- Create a new JavaScript file, for example clean_transcript.js, and add the following code:
const readline = require('readline');
const fs = require('fs');

const inputFile = 'transcript.txt';
const outputFile = 'cleaned.txt';

// Open the output file once; createWriteStream replaces any existing file.
const output = fs.createWriteStream(outputFile);

// Read the input line by line; crlfDelay treats \r\n as a single line break.
const readInterface = readline.createInterface({
  input: fs.createReadStream(inputFile),
  crlfDelay: Infinity
});

// Matches a line that consists only of a timestamp such as 0:05 or 12:34.
const timestampPattern = /^\d+:\d{2}$/;

readInterface.on('line', function(line) {
  // Keep every line that is not a bare timestamp.
  if (!timestampPattern.test(line)) {
    output.write(line + '\n');
  }
});

readInterface.on('close', function() {
  // Flush pending writes before reporting success.
  output.end(function() {
    console.log('File cleaned and saved as ' + outputFile);
  });
});
- Save the file and open a terminal or command prompt in the same directory as the clean_transcript.js file.
- Run the program using the command:
node clean_transcript.js
- After running the program, you should see a message indicating that the file has been cleaned and saved as cleaned.txt. Check the cleaned.txt file in the same directory, and you should see the cleaned transcript without the time stamps.
The program reads the transcript.txt file line by line, checks each line against a regular expression to determine whether the line consists solely of a time stamp, and if not, writes the line to the cleaned.txt file. When the process is complete, the program displays a message indicating that the cleaned file has been saved.
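To see which lines the regular expression treats as time stamps, you can try the filtering step on its own with a few sample lines. The sketch below is only an illustration: the helper names isTimestampLine and stripTimestamps and the sample transcript text are made up for this example and are not part of the program. It uses the same pattern, /^\d+:\d{2}$/.

```javascript
// Same pattern as in the program: a line that holds only m:ss or mm:ss.
const timestampPattern = /^\d+:\d{2}$/;

// Hypothetical helper: true when the whole line is a time stamp.
function isTimestampLine(line) {
  return timestampPattern.test(line);
}

// Hypothetical helper: drop time-stamp lines from a block of transcript text.
function stripTimestamps(text) {
  return text
    .split(/\r?\n/)
    .filter(function(line) { return !isTimestampLine(line); })
    .join('\n');
}

// Made-up sample input in the layout of a copied YouTube transcript.
const sample = [
  '0:00',
  'welcome to the channel',
  '0:05',
  'the meeting starts at 10:30 sharp',
  '12:47',
  'thanks for watching'
].join('\n');

console.log(stripTimestamps(sample));
```

The line that mentions 10:30 is kept because the pattern is anchored with ^ and $, so it only matches lines made up entirely of a time. Time stamps that include hours, such as 1:02:03, do not match this pattern. If your transcript is long enough to contain them, a variant such as /^\d+:\d{2}(:\d{2})?$/ should cover both forms.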