Imagine I have a file like this:
INSERT INTO table VALUES('1','<p><em>The lazy fox jumps again</em></p>bunch of other html<p><em>Is the lazy fox crazy?</em></p>')
INSERT INTO table VALUES('2','<p><em>The lazy fox jumps again</em></p>bunch of other html<p><em>Is the lazy fox crazy?</em></p>')
INSERT INTO table VALUES('3','<p><em>The lazy fox jumps again</em></p>bunch of other html<p><em>Is the lazy fox crazy?</em></p>')
And I wish to delete only the first occurrence of <p><em> and </em></p> so I end up with something like this:
INSERT INTO table VALUES('1','The lazy fox jumps againbunch of other html<p><em>Is the lazy fox crazy?</em></p>')
INSERT INTO table VALUES('2','The lazy fox jumps againbunch of other html<p><em>Is the lazy fox crazy?</em></p>')
INSERT INTO table VALUES('3','The lazy fox jumps againbunch of other html<p><em>Is the lazy fox crazy?</em></p>')
How can I do that with sed (or perl)? This statement:
sed "1,/INSERT INTO/s/<p><em>//g"
only replaces the first occurrence in the file, not on every line.
Help is much appreciated.
2 Answers
If you want to process every line containing INSERT INTO, do not provide an address range. If you want to replace only the first occurrence of a string on each line, do not use the g flag:
sed -e '/INSERT INTO/s/<p><em>//' -e '/INSERT INTO/s/<\/em><\/p>//'
Here's one way you could do it with perl:
perl -pe 's:<p><em>(.*?)</em></p>:$1:' infile
The .*? quantifier is non-greedy, so the match stops at the nearest </em></p>, and because there is no g flag only the first pair of tags on each line is replaced.
Output:
INSERT INTO table VALUES('1','The lazy fox jumps againbunch of other html<p><em>Is the lazy fox crazy?</em></p>')
INSERT INTO table VALUES('2','The lazy fox jumps againbunch of other html<p><em>Is the lazy fox crazy?</em></p>')
INSERT INTO table VALUES('3','The lazy fox jumps againbunch of other html<p><em>Is the lazy fox crazy?</em></p>')