I need help with this command that is not working on my computer:

egrep "^\S+\tAA\tAA\tBB\tBB\tAA\tAA" data.frame_file.txt >> filtered_data.frame_file 

It creates filtered_data.frame_file.txt, but the file is empty, and it does not print any error or message.

This is a sample of the data set of Ballen et al 2019 I am working with:

| probeset_id | Runner886 | Runner886 | A_batizocoi_K9484 | A_batizocoi_K9484 | A_stenosperma_V10309 | A_stenosperma_V10309 |
|-------------- |----------- |----------- |------------------- |------------------- |---------------------- |---------------------- |
| AX-123373785 | BB | BB | BB | BB | BB | BB |
| AX-147207617 | AA | AA | AA | AA | AA | AA |
| AX-147207618 | AA | AA | AA | AA | AA | AA |
| AX-147207619 | AB | AB | AA | AA | AA | AA |
| AX-147207620 | BB | BB | BB | BB | BB | BB |
| AX-147207621 | BB | BB | AB | AB | NoCall | AB |
| AX-147207622 | BB | BB | AB | AB | AA | AA |
| AX-147207623 | NoCall | NoCall | NoCall | AB | AA | AA |
| AX-147207624 | BB | BB | BB | BB | BB | BB |
| AX-147207625 | AB | AB | AA | NoCall | NoCall | AA |
| AX-147207626 | AA | AA | AA | AA | AA | AA |
| AX-147207627 | AB | AB | AA | AA | AB | AB |
| AX-147207628 | AB | AB | AA | AA | AB | AA |
| AX-147207629 | AA | AA | AA | AA | AA | AA |
| AX-147207630 | BB | BB | BB | BB | BB | BB |
| AX-147207631 | AB | AB | BB | BB | AB | AB |
| AX-147207632 | BB | BB | BB | BB | BB | BB |
| AX-147207633 | BB | BB | BB | BB | BB | BB |
| AX-147207634 | BB | BB | BB | BB | BB | BB |
| AX-147207635 | BB | BB | BB | BB | BB | BB |
| AX-147207636 | AA | AA | AA | AA | BB | BB |
| AX-147207637 | AB | AB | AA | AA | BB | BB |
| AX-147207638 | BB | BB | BB | BB | BB | BB |
| AX-147207639 | BB | BB | BB | BB | BB | BB |
| AX-147207640 | AB | AB | BB | BB | AA | AA |
| AX-147207641 | AB | AB | BB | BB | BB | BB |
| AX-147207642 | AA | NoCall | AA | NoCall | BB | BB |
| AX-147207643 | AA | AA | BB | BB | AA | AA |
| AX-147207644 | AA | AA | AA | AA | AA | AA |

1 Answer

Assuming you actually have a tab-separated file (not with the borders as shown), then try this:

egrep $'^\S+\tAA\tAA\tBB\tBB\tAA\tAA' data.frame_file.txt >> filtered_data.frame_file
# ....^^............................^

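That uses ANSI-C quoting ($'...'), a Bash feature, so the shell turns each \t into an actual tab character before grep sees the pattern. Inside ordinary double quotes, grep receives a literal backslash followed by t, which GNU grep -E does not treat as a tab.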
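To see the difference for yourself, you can count the bytes each quoting style hands to a command. This is only a small sketch that assumes Bash; the variable names dq_bytes and ansi_bytes are made up for the illustration.

```bash
#!/usr/bin/env bash
# Count the bytes that each quoting style passes on to a command.
# dq_bytes and ansi_bytes are hypothetical names used only for this demo.
dq_bytes=$(printf '%s' "\t" | wc -c)      # double quotes: backslash + t
ansi_bytes=$(printf '%s' $'\t' | wc -c)   # ANSI-C quoting: one tab character

# $(( ... )) strips any padding that some wc implementations add.
echo "double quotes: $((dq_bytes)) bytes, ANSI-C quoting: $((ansi_bytes)) byte"
# prints: double quotes: 2 bytes, ANSI-C quoting: 1 byte
```

If your shell does not support $'...', storing a tab in a variable with tab=$(printf '\t') and splicing "$tab" into a double-quoted pattern should give grep the same bytes.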


You should use grep -E instead of egrep -- the grep(1) man page says:

In addition, the variant programs egrep, fgrep and rgrep are the same as grep -E, grep -F, and grep -r, respectively. These variants are deprecated, but are provided for backward compatibility.
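As a rough sketch of the same filter written with grep -E, here is a run against a tiny made-up file. The temporary file and the probe IDs AX-0001 and AX-0002 are placeholders, not rows from the real data set, and \S is a GNU grep extension, so this assumes GNU grep and Bash.

```bash
#!/usr/bin/env bash
# Hypothetical two-row, tab-separated stand-in for data.frame_file.txt.
sample=$(mktemp)
printf 'AX-0001\tAA\tAA\tBB\tBB\tAA\tAA\n' >  "$sample"
printf 'AX-0002\tBB\tBB\tBB\tBB\tBB\tBB\n' >> "$sample"

# grep -E replaces the deprecated egrep; $'...' turns each \t into a real tab.
# \S is left untouched by Bash and interpreted by GNU grep as "non-whitespace".
matches=$(grep -E $'^\S+\tAA\tAA\tBB\tBB\tAA\tAA' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the AX-0001 row should come back, since it is the only one with the AA AA BB BB AA AA pattern.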


An alternative way to filter that text:

awk -F '\t' '$2=="AA" && $3=="AA" && $4=="BB" && $5=="BB" && $6=="AA" && $7=="AA"' file.tsv
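If you also want to keep the header line in the filtered output, one possible variation is to let awk pass the first record through unconditionally. This is a sketch on a made-up three-line file; the column names S1 to S3 and the probe IDs are placeholders, and keeping the header is an assumption about what you want, not something the original command does.

```bash
#!/usr/bin/env bash
# Hypothetical tab-separated sample: one header row and two data rows.
sample=$(mktemp)
{
  printf 'probeset_id\tS1\tS1\tS2\tS2\tS3\tS3\n'
  printf 'AX-0001\tAA\tAA\tBB\tBB\tAA\tAA\n'
  printf 'AX-0002\tAA\tAA\tAA\tAA\tAA\tAA\n'
} > "$sample"

# NR == 1 keeps the header; the rest is the same column-by-column test.
filtered=$(awk -F '\t' 'NR == 1 || ($2=="AA" && $3=="AA" && $4=="BB" && $5=="BB" && $6=="AA" && $7=="AA")' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

Because awk compares whole fields, a value such as NoCall can never match by accident, which makes this form a little easier to adapt than the regular expression.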
