Wrong way around: only allow http:// and https://. Filtering out anything that's not a letter, number, slash or dot is probably a good idea too, as is removing any sequence of more than one slash or dot.

Exactly.

Whitelist only the schemes you trust; do not wait to blacklist the untrusted ones.

I wrote the Go HTML sanitizer https://github.com/microcosm-cc/bluemonday and it has a rule for user-generated (untrusted) content that whitelists just the things one can trust: https://github.com/microcosm-cc/bluemonday/blob/master/helpe...

That states that URIs must be:

1. Parseable

2. Relative

3. Or one of: mailto http https

4. And I will add rel="nofollow" to external links, and additionally rel="noopener" if the link has a target="_blank" attribute

Oh, and I do not trust Data URIs either.
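
For anyone who wants to see the shape of those rules, here is a minimal sketch using only Go's standard net/url package. It is not bluemonday's code: the helper names allowedURL and relFor are made up for illustration, and treating "has a host" as "external" is my own simplifying assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// allowedSchemes is the allowlist from the rules above: mailto, http, https.
// Anything not listed here (javascript:, data:, ...) is rejected by default.
var allowedSchemes = map[string]bool{
	"mailto": true,
	"http":   true,
	"https":  true,
}

// allowedURL is a hypothetical helper. It reports whether raw is parseable
// and is either relative (no scheme) or uses an allowlisted scheme.
func allowedURL(raw string) bool {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return false // rule 1: must be parseable
	}
	if u.Scheme == "" {
		return true // rule 2: relative reference
	}
	// rule 3: scheme must be on the allowlist
	return allowedSchemes[strings.ToLower(u.Scheme)]
}

// relFor is a hypothetical helper. It builds the rel attribute value:
// "nofollow" when the URL names a host (assumed here to mean "external"),
// plus "noopener" when the link opens with target="_blank".
func relFor(raw string, targetBlank bool) string {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return ""
	}
	var rel []string
	if u.Host != "" {
		rel = append(rel, "nofollow")
	}
	if targetBlank {
		rel = append(rel, "noopener")
	}
	return strings.Join(rel, " ")
}

func main() {
	for _, raw := range []string{
		"https://example.com/a",
		"/about",
		"mailto:someone@example.com",
		"javascript:alert(1)",
		"data:text/html,hi",
	} {
		fmt.Printf("%-30s allowed=%v\n", raw, allowedURL(raw))
	}
	fmt.Println(relFor("https://example.com/a", true))
}
```

The point of the map is that a new or obscure scheme is refused without anyone having to know it exists, which is exactly what a blacklist cannot give you.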