Ensures safe e-mail subject #346

Merged
laurensmartina merged 2 commits from hypha-202 into master 2020-12-30 13:25:40 +00:00
laurensmartina commented 2020-10-04 20:57:20 +00:00 (Migrated from github.com)

Some e-mail providers reject e-mail messages
with non-ASCII characters in the subjects.

Fixes #202

Some e-mail providers reject e-mail messages with non-ASCII characters in the subjects. Fixes #202
matthijskooijman (Migrated from github.com) reviewed 2020-10-04 20:57:20 +00:00
matthijskooijman commented 2020-10-05 07:53:58 +00:00 (Migrated from github.com)

Looks good at first glance. I think it would be good to also link to the relevant spec or RFC in a coment, making it easier to verify the implementation later. Also, did you consider using the more readable quoted-printable encoding, as suggested here? https://stackoverflow.com/a/54688095/740048

Looks good at first glance. I think it would be good to also link to the relevant spec or RFC in a coment, making it easier to verify the implementation later. Also, did you consider using the more readable quoted-printable encoding, as suggested here? https://stackoverflow.com/a/54688095/740048
laurensmartina commented 2020-10-05 10:21:56 +00:00 (Migrated from github.com)

Also, did you consider using the more readable quoted-printable encoding, as suggested here? https://stackoverflow.com/a/54688095/740048

Implemented the quoted-printable fix

> Also, did you consider using the more readable quoted-printable encoding, as suggested here? https://stackoverflow.com/a/54688095/740048 Implemented the quoted-printable fix
matthijskooijman commented 2020-10-05 11:04:59 +00:00 (Migrated from github.com)

This is defined in https://tools.ietf.org/rfc/rfc2047.html

It seems like the current implementation looks conformant, except that the quoted_printable_encode PHP function actually implements the RFC2045 version of quoted-printable, which is a bit more complicated (it handles line breaks differently, but there might be other differences that we should technically correct for). Also, RFC2047 poses a requirement on maximum (line) lengths:

While there is no limit to the length of a multiple-line header
field, each line of a header field that contains one or more
'encoded-word's is limited to 76 characters.

Which corresponds to the max line length requirement of RFC2822, which requires breaking on whitespace in the original text.

I wonder if we should just not use quoted_printable_encode and do our own encoding (which should be simple enough, though I'm not yet sure how to guarantee the max line length).

This is defined in https://tools.ietf.org/rfc/rfc2047.html It seems like the current implementation looks conformant, except that the `quoted_printable_encode` PHP function actually implements the RFC2045 version of quoted-printable, which is a bit more complicated (it handles line breaks differently, but there might be other differences that we should technically correct for). Also, RFC2047 poses a requirement on maximum (line) lengths: > While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters. Which corresponds to the max line length requirement of RFC2822, which requires breaking on whitespace in the original text. I wonder if we should just not use `quoted_printable_encode` and do our own encoding (which should be simple enough, though I'm not yet sure how to guarantee the max line length).
matthijskooijman commented 2020-12-30 11:01:45 +00:00 (Migrated from github.com)

I would like to get a solution for this merged. We currently have the older base64-based solution on the Stadsbron server, to at least fix this issue, but having an unmerged commit on the live server is a hassle, so let's move this forward.

It's been a while since I looked at this, but I remember that doing the quoted-printable properly involves implementing it ourselves which is not terribly complicated, but needs a thorough reading of the relevant specs. Also, there was some unclarity about maximum lengths for lines and encoded entities, but I can't recall what exactly (I think there is a badly specified rule that any whitespace between quoted-printable elements is ignored, which allows cutting long encoded elements up into multiple ones with a dummy newline in between, to satisfy the pre-quoted-printable-standard line length limits). In any case, this is more work than I'm prepared to invest right now.

So, I would suggest a compromise:

  • Send ASCII-only subjects as-is
  • Send other subjects base64-encoded

This hurts the plaintext-reasability of such subjects, but e-mail clients should just handle this transparently, and having subjects that are harder to read outside of a proper mailclient is probaly better than not having mail delivered at all.

I've implemented this proposal in an added commit. @laurensmartina could you see if this looks ok to you? I added a TODO for maybe later implementing the quoted printable, with some references.

If this is ok, I'll squash-merge this PR.

I would like to get a solution for this merged. We currently have the older base64-based solution on the Stadsbron server, to at least fix this issue, but having an unmerged commit on the live server is a hassle, so let's move this forward. It's been a while since I looked at this, but I remember that doing the quoted-printable properly involves implementing it ourselves which is not terribly complicated, but needs a thorough reading of the relevant specs. Also, there was some unclarity about maximum lengths for lines and encoded entities, but I can't recall what exactly (I think there is a badly specified rule that any whitespace between quoted-printable elements is ignored, which allows cutting long encoded elements up into multiple ones with a dummy newline in between, to satisfy the pre-quoted-printable-standard line length limits). In any case, this is more work than I'm prepared to invest right now. So, I would suggest a compromise: - Send ASCII-only subjects as-is - Send other subjects base64-encoded This hurts the plaintext-reasability of such subjects, but e-mail clients should just handle this transparently, and having subjects that are harder to read outside of a proper mailclient is probaly better than not having mail delivered at all. I've implemented this proposal in an added commit. @laurensmartina could you see if this looks ok to you? I added a TODO for maybe later implementing the quoted printable, with some references. If this is ok, I'll squash-merge this PR.
laurensmartina (Migrated from github.com) reviewed 2020-12-30 12:11:29 +00:00
laurensmartina (Migrated from github.com) left a comment

looks good

looks good
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
harmen/hypha!346
No description provided.