I have tried grep, awk, and sed, and am starting to try xmlstarlet, but I'm not finding much support for it.
I'm guessing xmlstarlet is telling me that the XML is ill-formed, but all I want to do is find tags that contain a specific hex color and print the text between the tags.
My file.xml looks like this:
<p>Do not print this.</p>
<p><span>Print this.</span></p>
I have tried
$ cat file.xml | grep -oP '(?<=\"<span>\").*?(?=\"</span>")'
grep produces no output.
sed -n '/span[><]" '/span’/{print $3} file.xml
The awk command runs but does not terminate or print anything.
xmlstarlet produces:
Unescaped '<' not allowed in attributes values
which is referring to another line in the file, but I am guessing this violation is why xmlstarlet halts.
1 Answer
Considering that file.xml is not well-formed XML, you can do the following:
grep -o '<span style=" color: #595959;">.*</span>' file.xml | xmllint --xpath 'string(//span)' -
The grep part of the command extracts the desired span element (tags included) from the line that contains it. The match is then piped to xmllint, which uses an XPath query to pull out the text. Keep in mind that this works only if you do not have multiple span tags matching the grep criteria on the same line.
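If several matching spans can share a line, one possible workaround is to make the grep pattern stop at the next '<' and strip the tags with sed instead of xmllint. This is only a sketch: the sample file below is hypothetical, the style attribute is taken from the command above, and it assumes the span text contains no nested tags.

```shell
# Hypothetical sample: two matching spans on one line, plus one that should not match.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<p>Do not print this.</p>
<p><span style=" color: #595959;">Print this.</span> and <span style=" color: #595959;">This too.</span></p>
<p><span style=" color: #000000;">Not this.</span></p>
EOF

# [^<]* cannot cross a '<', so each span is matched separately and
# grep -o prints every match on its own line.
# sed then deletes anything that looks like a tag, leaving only the text.
result=$(grep -o '<span style=" color: #595959;">[^<]*</span>' "$tmp" | sed -e 's/<[^>]*>//g')
printf '%s\n' "$result"

rm -f "$tmp"
```

Because sed only removes the tags here, the output does not need to be well-formed XML, but entities such as &amp;amp; would be left undecoded.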
However, if you have well-formed XML, you can use xmllint alone (I just wrapped your file in a pair of enclosing tags), which is the preferred way to work with XML files. The command would be:
xmllint --xpath 'string((//span[@style=" color: #595959;"])[1])' file.xml
Note the [1] in the command. It selects the first result of the query. If you have multiple span tags with the same style attribute, you can get their text by using [2], [3], etc.
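To avoid typing [1], [2], [3] by hand, the index can be driven by a loop. The following is a rough sketch, assuming an xmllint build that supports --xpath; the sample file, the root wrapper name, and the variable names are made up for illustration.

```shell
# Hypothetical well-formed sample; the <root> wrapper name is an assumption.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<root>
<p>Do not print this.</p>
<p><span style=" color: #595959;">Print this.</span></p>
<p><span style=" color: #595959;">And this.</span></p>
</root>
EOF

query='//span[@style=" color: #595959;"]'
texts=""
if command -v xmllint >/dev/null 2>&1; then
    # count() tells us how many spans match the query.
    n=$(xmllint --xpath "count($query)" "$tmp")
    i=1
    while [ "$i" -le "$n" ]; do
        # Same expression as above, with the index taken from the loop counter.
        texts="$texts$(xmllint --xpath "string(($query)[$i])" "$tmp")
"
        i=$((i + 1))
    done
fi
printf '%s' "$texts"

rm -f "$tmp"
```

This runs xmllint once per match, which is fine for small files; for large documents a single XSLT or scripting-language pass would likely be more efficient.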