Glam Prestige Journal

I have tried grep, awk, and sed, and am now trying xmlstarlet, but I'm not finding much support for that.

I'm guessing xmlstarlet is telling me that the XML is ill-formed, but all I want to do is find tags that contain a specific hex color and print the text between the tags.

My file.xml looks like this:

<p>Do not print this.</p>
<p><span style=" color: #595959;">Print this.</span></p>

I have tried

$ cat file.xml | grep -oP '(?<=\"<span>\").*?(?=\"</span>")'

grep produces no output.

sed -n '/span[><]" '/span’/{print $3} file.xml

awk command runs but does not terminate or print anything.

xmlstarlet produces:

Unescaped '<' not allowed in attributes values

which is referring to another line in the file, but I am guessing this violation is why xmlstarlet halts.

1 Answer

Since file.xml is not well-formed XML, you can do the following:

grep -o '<span style=" color: #595959;">.*</span>' file.xml | xmllint --xpath 'string(//span)' -

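Because of the -o option, the grep part of the command prints only the matched text, that is, your desired span element with its tags included, rather than the whole line. That fragment is then piped to xmllint, which uses an XPath query to extract the text. Keep in mind that this only works when grep emits a single span. The .* is greedy, so if several span tags sit on the same line the match runs to the last </span> on that line, and if several lines match, xmllint receives more than one top-level element and refuses to parse the input.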
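One possible way around the multiple-span limitation is to let grep match each span separately and strip the tags with sed instead of xmllint. This is only a sketch: the sample lines below, including the second colored span, are made up for illustration, and it assumes the matched spans contain no nested tags.

```shell
# Hypothetical sample input; it has no single root element, so it is not well-formed XML.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<p><span style=" color: #000000;">Do not print this.</span></p>
<p><span style=" color: #595959;">Print this.</span> <span style=" color: #595959;">And this.</span></p>
EOF

# [^<]* stops at the next tag, so each span is matched on its own
# even when several share a line; -o prints one match per output line.
# sed then deletes every tag and leaves only the text.
result=$(grep -o '<span style=" color: #595959;">[^<]*</span>' "$tmp" | sed 's/<[^>]*>//g')
printf '%s\n' "$result"
rm -f "$tmp"
```

Since nothing here parses XML, the approach tolerates ill-formed files, but it also breaks as soon as a span contains another tag or spans more than one line.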

However, if you have well-formed XML, you can use xmllint alone (I just wrapped your file in an opening and a closing tag), which is the preferred way to work with XML files. The command would be:

xmllint --xpath 'string((//span[@style=" color: #595959;"])[1])' file.xml

Note the [1] in the command. It selects the first result of the query. If several span tags share the same style attribute, you can get their texts by using [2], [3], and so on.
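If you do not know in advance how many spans match, the indexing can be driven by a loop. The following is a rough sketch, not a tested recipe for your real file: the `<root>` wrapper and the sample lines are assumptions, and it relies on the XPath count() function to find out how many positions to ask for.

```shell
# Hypothetical well-formed input; the <root> wrapper is an assumption.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<root>
<p><span style=" color: #000000;">Do not print this.</span></p>
<p><span style=" color: #595959;">Print this.</span></p>
<p><span style=" color: #595959;">And this.</span></p>
</root>
EOF

query='//span[@style=" color: #595959;"]'

# count() returns the number of nodes the query matches.
count=$(xmllint --xpath "count($query)" "$tmp")

# Request each match by position: [1], [2], ...
texts=$(
  i=1
  while [ "$i" -le "$count" ]; do
    xmllint --xpath "string(($query)[$i])" "$tmp"
    i=$((i + 1))
  done
)
printf '%s\n' "$texts"
rm -f "$tmp"
```

This runs xmllint once per match, which is fine for small files; for large ones a single pass with a real XML library would likely be a better fit.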
