TL;DR: After a week-long investigation, I think either the ath9k_htc firmware is broken, or a large proportion of the 0cf3:9271 devices is buggy. My next plan is to purchase or pick up a different wifi adapter that avoids the firmware. My inspection of the ath9k_htc driver suggests that all Atheros 9271 chips use this firmware.
After I updated my system from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS, my USB wifi adapter has dropping my connection after some time of usage. I can fix the problem temporarily by unplugging the adapter and plugging it back in, but I would like to fix the underlying problem so that the device stops doing this completely. I do not know where to start looking, however.
lsb_release -a:
LSB Version: core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focalI enabled debug logs for the ath9k_htc driver. Apart from that config change, I built the kernel from the same sources as the Linux kernel debian package from Ubuntu's repos, which I acquired with apt-get source linux-modules-extra-5.4.0-70-generic
uname -a:
Linux herpcomp 5.4.94afw #1 SMP Sat Apr 10 21:27:47 PDT 2021 x86_64 x86_64 x86_64 GNU/LinuxHere is a link to an excerpt from running journalctl - . I think I caught the moment where the device acting weirdly since I see there is a log line saying:
Apr 11 16:59:14 herpcomp kernel: wlx00c0ca82d3fc: deauthenticated from 04:d9:f5:b1:f8:30 (Reason: 15=4WAY_HANDSHAKE_TIMEOUT)
Here is also a set of logs specifically showing NetworkManager and wpa_supplicant around the time where my adapter started acting weirdly: . This was retrieved with journalctl -S "2021-04-11 16:55:00" -U "2021-04-11 17:10:00" -u NetworkManager -u wpa_supplicant.
This is the contents of /etc/NetworkManager.d/conf.d/default-wifi-powersave-on.conf:
$ cat default-wifi-powersave-on.conf
[connection]
wifi.powersave = 2sudo iw reg get shows:
global
country US: DFS-FCC (2400 - 2472 @ 40), (N/A, 30), (N/A) (5150 - 5250 @ 80), (N/A, 23), (N/A), AUTO-BW (5250 - 5350 @ 80), (N/A, 23), (0 ms), DFS, AUTO-BW (5470 - 5730 @ 160), (N/A, 23), (0 ms), DFS (5730 - 5850 @ 80), (N/A, 30), (N/A) (57240 - 71000 @ 2160), (N/A, 40), (N/A)
phy#0
country GB: DFS-ETSI (2400 - 2483 @ 40), (N/A, 20), (N/A) (5150 - 5250 @ 80), (N/A, 23), (N/A), NO-OUTDOOR, AUTO-BW (5250 - 5350 @ 80), (N/A, 20), (0 ms), NO-OUTDOOR, DFS, AUTO-BW (5470 - 5730 @ 160), (N/A, 26), (0 ms), DFS (5725 - 5875 @ 80), (N/A, 20), (0 ms), DFS (57000 - 71000 @ 2160), (N/A, 40), (N/A)Since I live in the US, I assume the global option is set correctly. I'm not sure why phy#0 is set to GB, though, and I don't know how to change it.
Any ideas about where else I should look to find the problem? My next guess is to search for logs and configurations related to wpa_supplicant, and do a bit of research on what iw reg get/set is actually doing. I'll leave updates as I work on this problem.
EDIT 1: I added in a link to some logs I got from running journalctl -S "2021-04-11 16:55:00" -U "2021-04-11 17:10:00" -u NetworkManager -u wpa_supplicant. The time interval contains the moments where my wireless adapter acted weirdly. The logs are pasted at .
Interestingly, I found the following sequence of logs from reading the kernel, NetworkManager, and wpa_supplicant logs together:
Apr 11 16:59:05 herpcomp kernel: ath: phy1: AWAKE -> FULL-SLEEP
Apr 11 16:59:05 herpcomp kernel: ath: phy1: Set HW Key
Apr 11 16:59:05 herpcomp kernel: ath: phy1: FULL-SLEEP -> AWAKE
Apr 11 16:59:05 herpcomp kernel: ath: phy1: AWAKE -> FULL-SLEEP
Apr 11 16:59:05 herpcomp wpa_supplicant[1212]: wlx00c0ca82d3fc: CTRL-EVENT-DISCONNECTED bssid=04:d9:f5:b1:f8:30 reason=0
Apr 11 16:59:05 herpcomp NetworkManager[1174]: <info> [1618185545.3145] device (wlx00c0ca82d3fc): supplicant interface state: completed -> disconnected
Apr 11 16:59:05 herpcomp kernel: ath: phy1: FULL-SLEEP -> AWAKE
Apr 11 16:59:05 herpcomp kernel: ath: phy1: Set channel: 2452 MHzI am not too entirely sure what it means when the ath9k_htc driver says that the device is going from "AWAKE" to "FULL-SLEEP", but that looks very suspicious.
EDIT 2: I collected some 2 sets of logs from wpa_supplicant. This first file are logs when I was unable to connect to my router:
This second file of logs are when I successfully connected to my router:
In the first file that describes my wifi card's inability to connect with the router, I found that message 1 of the 4 way handshake was being repeated constantly. The router (which I will call the access point, or AP, here) would send me message 1 containing the AP's ANonce. wpa_supplicant would then respond back with the SNonce that was just generated. Then, the AP would respond back with the same message 1 again, rather than message 3 of the 4 way handshake.
This did not occur at all under the successful case. In that case, the 4 way handshake proceeds smoothly. AP sent me message 1. wpa_supplicant responded with message 2. AP sent back message 3, and wpa_supplicant completes the handshake by sending back message 4.
I further confirmed this weird behavior with wireshark. I found that when I was unable to connect to the AP, the AP was just constantly sending the same message 1. Then, I compared the 4 way handshakes when I could not connect with the case where I could connect. I found no differences in the AP's message 1 across the two cases.
I also compared wpa_supplicant's response in a case of failure and a case of success; apart from the differences in SNonce and the message integrity check, the response from wpa_supplicant are the same.
At this point, I'm thinking that the AP is not behaving correctly. I haven't solved the problem completely yet, but I have at least narrowed down the likely problem areas.
EDIT 3: I made a discovery tonight! I used diff on the two sets of logs I retrieved from wpa_supplicant. One set of logs followed my system when the connection was successful, and the other set of logs were when my connections were failures. I found that the ANonces I was receiving from the router were different in two locations:
First logs show:
4a 47 9e 38 c5 1b fc e4 c1 b2 73 7d 9a 2d 8f 34 3c 68 80 b2 bb c0 b0 ed ce ec 5f 55 1a 4e 36 d6
Second logs show:
4a 47 9e 38 c5 1b fc e4 c1 b2 73 7d 9a 2d 8f 34 3c 68 80 b2 bb c0 b0 ed ce ec 5f 55 1a 4e 36 da
The two nonces are almost exactly the same. The only difference is in the last 4 bits (the last hex digit in the above pair of strings).
This would explain my router's behavior. wpa_supplicant is calculating MICs using a different nonce than what the router is using.
Now the problem becomes wondering why the last 4th and 3rd to last bits are different.
I had already tried connecting my system to my phone's hotspot when wifi connections were failing, and I found that my system showed the same symptoms as when it failed to connect to the router. Since I encountered the same problem when I used my phone rather than the router, I doubt the router is sending me the wrong nonce. I assume that something on my system is collecting data from the router incorrectly.
My search will probably start somewhere in the ath9k_htc driver again. I doubt wpa_supplicant's nl80211 driver is misbehaving; otherwise I would expect to see this problem in many users' systems, which would have probably collected the attention of wpa_supplicant's maintainers.
EDIT 4: I took a closer look at the nonces. I found that each time wpa_supplicant failed the 4way handshake, the next attempt would include a nonce that was 1 integer larger than the previous nonce. For example, if I look at the last 2 bytes of the nonce, I see the following from each handshake attempt:
db 07
db 08
db 09
db 0a
db 0b
db 0c
db 0d
db 0eSomething is incrementing the nonces. I'm wondering if the replay counter could be related, since that counter is incremented on each handshake attempt. Perhaps the replay counter's memory location is overlapping with the key?
EDIT 5: After a lengthy week-long investigation, I've found that the driver isn't misbehaving. The networking stack in the kernel isn't misbehaving. wpa_supplicant is not misbehaving. It seems that there is just something wrong very low-level. I also recall I had a similar (probably the same) problem with another wifi adapter that had the same USB vendor and device ID as my currently wonky adapter. I conclude that there is either something wrong with the firmware, or something wrong with the a large proportion of these 0cf3:9271 chips.
I think I will just have to pick up a different wifi card or a different wifi adapter altogether.
10 Reset to default