I need help with this command that is not working on my computer:
egrep "^\S+\tAA\tAA\tBB\tBB\tAA\tAA" data.frame_file.txt >> filtered_data.frame_file

It creates filtered_data.frame_file.txt, but the file is empty. It also does not print any error or message.
This is a sample of the data set of Ballen et al 2019 I am working with:
| probeset_id | Runner886 | Runner886 | A_batizocoi_K9484 | A_batizocoi_K9484 | A_stenosperma_V10309 | A_stenosperma_V10309 |
|-------------- |----------- |----------- |------------------- |------------------- |---------------------- |---------------------- |
| AX-123373785 | BB | BB | BB | BB | BB | BB |
| AX-147207617 | AA | AA | AA | AA | AA | AA |
| AX-147207618 | AA | AA | AA | AA | AA | AA |
| AX-147207619 | AB | AB | AA | AA | AA | AA |
| AX-147207620 | BB | BB | BB | BB | BB | BB |
| AX-147207621 | BB | BB | AB | AB | NoCall | AB |
| AX-147207622 | BB | BB | AB | AB | AA | AA |
| AX-147207623 | NoCall | NoCall | NoCall | AB | AA | AA |
| AX-147207624 | BB | BB | BB | BB | BB | BB |
| AX-147207625 | AB | AB | AA | NoCall | NoCall | AA |
| AX-147207626 | AA | AA | AA | AA | AA | AA |
| AX-147207627 | AB | AB | AA | AA | AB | AB |
| AX-147207628 | AB | AB | AA | AA | AB | AA |
| AX-147207629 | AA | AA | AA | AA | AA | AA |
| AX-147207630 | BB | BB | BB | BB | BB | BB |
| AX-147207631 | AB | AB | BB | BB | AB | AB |
| AX-147207632 | BB | BB | BB | BB | BB | BB |
| AX-147207633 | BB | BB | BB | BB | BB | BB |
| AX-147207634 | BB | BB | BB | BB | BB | BB |
| AX-147207635 | BB | BB | BB | BB | BB | BB |
| AX-147207636 | AA | AA | AA | AA | BB | BB |
| AX-147207637 | AB | AB | AA | AA | BB | BB |
| AX-147207638 | BB | BB | BB | BB | BB | BB |
| AX-147207639 | BB | BB | BB | BB | BB | BB |
| AX-147207640 | AB | AB | BB | BB | AA | AA |
| AX-147207641 | AB | AB | BB | BB | BB | BB |
| AX-147207642 | AA | NoCall | AA | NoCall | BB | BB |
| AX-147207643 | AA | AA | BB | BB | AA | AA |
| AX-147207644 | AA | AA | AA | AA | AA | AA |
1 Answer
Assuming you actually have a tab-separated file (without the borders shown above), try this:
egrep $'^\S+\tAA\tAA\tBB\tBB\tAA\tAA' data.frame_file.txt >> filtered_data.frame_file
# ....^^............................^

That uses ANSI-C quoting, so grep sees actual tab characters in the pattern.
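To see the difference for yourself, here is a minimal sketch that runs both quoting styles against three rows copied from the sample in the question. It assumes bash (for `$'...'`) and GNU grep (for `\S`); the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Three rows copied from the sample table; printf joins the fields with real tabs.
rows=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  AX-147207640 AB AB BB BB AA AA \
  AX-147207643 AA AA BB BB AA AA \
  AX-147207644 AA AA AA AA AA AA)

# Double quotes: bash passes backslash-t through unchanged, so grep never gets a tab.
# stderr is silenced because newer GNU grep versions may warn about the stray backslash.
dq_count=$(printf '%s\n' "$rows" | grep -cE "^\S+\tAA\tAA\tBB\tBB\tAA\tAA" 2>/dev/null)

# ANSI-C quoting: bash replaces \t with a tab before grep starts; \S is left alone.
ansi_count=$(printf '%s\n' "$rows" | grep -cE $'^\S+\tAA\tAA\tBB\tBB\tAA\tAA')

echo "double-quoted matches: $dq_count"
echo "ANSI-C quoted matches: $ansi_count"
```

With GNU grep I would expect the double-quoted count to be 0 and the ANSI-C count to be 1, since only AX-147207643 has the AA AA BB BB AA AA pattern. Other grep implementations may treat `\t` differently.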
You should use grep -E instead of egrep -- the grep(1) man page says:
In addition, the variant programs egrep, fgrep and rgrep are the same as grep -E, grep -F, and grep -r, respectively. These variants are deprecated, but are provided for backward compatibility.
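For completeness, here is a sketch of the same filter written with `grep -E`. It also avoids `$'...'`, in case your shell lacks it, by capturing the tab with `printf`. The scratch directory, the `tab` and `matched_ids` variables, and the three rows (copied from the sample in the question) are just scaffolding for the example.

```shell
#!/bin/sh
# Scratch directory so the example does not touch real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Three rows copied from the sample, written tab-separated.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  AX-147207640 AB AB BB BB AA AA \
  AX-147207643 AA AA BB BB AA AA \
  AX-147207644 AA AA AA AA AA AA > data.frame_file.txt

# Capture one literal tab; command substitution only strips trailing newlines.
tab=$(printf '\t')

# grep -E instead of egrep; [^[:space:]] is the POSIX spelling of \S.
grep -E "^[^[:space:]]+${tab}AA${tab}AA${tab}BB${tab}BB${tab}AA${tab}AA" \
  data.frame_file.txt > filtered_data.frame_file.txt

# Show the probeset IDs that survived the filter.
matched_ids=$(cut -f1 filtered_data.frame_file.txt)
echo "$matched_ids"
```

I used `>` here so that rerunning the example does not append duplicate rows; keep `>>` if appending is what you want.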
An alternative way to filter that text:
awk -F '\t' '$2=="AA" && $3=="AA" && $4=="BB" && $5=="BB" && $6=="AA" && $7=="AA"' file.tsv
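If you expect to try several genotype combinations, the awk version generalizes fairly easily. This is only a sketch: the `want` variable and the loop are my own additions, and the three input rows are copied from the sample in the question.

```shell
#!/bin/sh
# want holds the expected calls for columns 2..7, space-separated (hypothetical helper variable).
want='AA AA BB BB AA AA'

awk_hits=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  AX-147207640 AB AB BB BB AA AA \
  AX-147207643 AA AA BB BB AA AA \
  AX-147207644 AA AA AA AA AA AA |
  awk -F '\t' -v want="$want" '
    BEGIN { n = split(want, w, " ") }   # w[1..n] = expected genotype calls
    {
      for (i = 1; i <= n; i++)
        if ($(i + 1) != w[i]) next      # first mismatch: skip this row
      print                             # every column matched
    }')

echo "$awk_hits"
```

Because awk compares each field for exact equality, a call such as NoCall can never match part of a pattern by accident, and changing the filter only means editing `want`.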