Strange User Agent strings don't get logged #386
Comments
It is the user agent of one of the most used browsers in the world :)
We're not the only ones: https://twitter.com/yipcw/status/697369206531166213
Not sure whether this would help:
Ah, good old GB2312. Transcoding those bytes from GB2312 to UTF-8, we get […]. The real problem is that the user agent is sending an invalid 'User-Agent' header. From the RFCs:
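To illustrate the transcoding step mentioned here, a minimal Python sketch follows. It assumes the bytes really are GB2312, as suggested above; the function names `is_valid_utf8` and `transcode_gb2312_to_utf8` are hypothetical helpers for illustration and are not part of EPrints.

```python
# The raw octets from the User-Agent header reported in the error log.
RAW = b"\xc6\xc6\xbd\xe2\xba\xf3\xb5\xc4"


def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode as strict UTF-8."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_gb2312_to_utf8(octets: bytes) -> bytes:
    """Decode the octets as GB2312 and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the input is not valid GB2312.
    """
    return octets.decode("gb2312").encode("utf-8")


if __name__ == "__main__":
    print(is_valid_utf8(RAW))  # the raw header value is not UTF-8
    print(transcode_gb2312_to_utf8(RAW).decode("utf-8"))
```

This only shows why the database rejects the value: the octets are not valid UTF-8 as sent, although they become valid once transcoded. It does not make guessing the client's encoding a safe general strategy.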
From what I've read and what I understand, there shouldn't be a problem sanitising rubbish by C-style backslash-escaping the raw octets. I don't mind the database literally holding the string
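As a rough sketch of what C-style backslash-escaping of raw octets could look like, here is a small Python function. It is an assumption about the approach described in this comment, not the code from the actual fix; `escape_octets` is a hypothetical name.

```python
def escape_octets(raw: bytes) -> str:
    """Return an ASCII-only string with C-style escapes for unsafe octets.

    Printable ASCII is kept as-is, a literal backslash is doubled so the
    result stays unambiguous, and every other octet becomes \\xNN.
    """
    parts = []
    for octet in raw:
        if octet == 0x5C:  # backslash
            parts.append("\\\\")
        elif 0x20 <= octet <= 0x7E:  # printable ASCII
            parts.append(chr(octet))
        else:  # control characters and all non-ASCII octets
            parts.append(f"\\x{octet:02x}")
    return "".join(parts)


if __name__ == "__main__":
    print(escape_octets(b"\xc6\xc6\xbd\xe2\xba\xf3\xb5\xc4"))
    print(escape_octets(b"Mozilla/5.0 (X11; Linux x86_64)"))
```

Because the output is pure ASCII, it can be stored in a column of any character set without triggering an "Incorrect string value" error, and well-formed ASCII user agents pass through unchanged.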
It appears that to get octets, we needed the encode function:
You can either switch your terminal's encoding, or you can use some shell voodoo to make curl send the exact bytes in the User-Agent header field; for example:
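A comparable way to send the exact bytes, sketched in Python with a plain socket rather than curl, is shown below. The host, port, and path are placeholders you would replace with your own test server, and `build_request` and `send_request` are hypothetical helper names.

```python
import socket


def build_request(host: str, raw_user_agent: bytes, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request with an unmodified User-Agent."""
    if b"\r" in raw_user_agent or b"\n" in raw_user_agent:
        raise ValueError("User-Agent must not contain CR or LF")
    lines = [
        b"GET " + path.encode("ascii") + b" HTTP/1.1",
        b"Host: " + host.encode("ascii"),
        b"User-Agent: " + raw_user_agent,  # octets are sent exactly as given
        b"Connection: close",
    ]
    return b"\r\n".join(lines) + b"\r\n\r\n"


def send_request(host: str, port: int, raw_user_agent: bytes) -> bytes:
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, raw_user_agent))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Placeholder target: point this at a test instance you control.
    response = send_request("localhost", 80, b"\xc6\xc6\xbd\xe2\xba\xf3\xb5\xc4")
    print(response.split(b"\r\n", 1)[0])
```

Writing the header at the socket level sidesteps any re-encoding by the terminal or an HTTP library, which is the point of the exercise when reproducing this bug.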
On our repository I've hacked some heuristic-based detection and conversion into some parts of our code, but that was a quick-fix workaround (for existing bad data) and a657554 presents a much more correct type of fix. (Note: we've also changed our database to use utf8mb4 encodings and collations.)
Fixes #386 by removing an SQL injection vector via the user agent string.
Just spotted this in the Apache error log:
DBD::mysql::st execute failed: Incorrect string value: '\xC6\xC6\xBD\xE2\xBA\xF3...' for column 'requester_user_agent' at row 1 at /usr/share/eprints/perl_lib/EPrints/Database.pm line 1184.
User-Agent:
"\xc6\xc6\xbd\xe2\xba\xf3\xb5\xc4"
Not quite sure what this is meant to be - but it doesn't go into the database cleanly!
Might need to do some form of sanity check on the User-Agent.