I'm really glad you put a lot of focus into automatically tracking the things the user is likely interested in. I tried using Obsidian, but it felt like I was spending more effort remembering to save information and create proper back-links than I was actually retaining any information.
I've recently started working on something similar as an excuse to learn machine learning, but it's still mostly vaporware outside the Firefox extension I wrote. I think that by saving some basic metadata (when a page was viewed, what browser was used to view it), and using ML to judge how similar the contents of one page are to another, it should be able to automatically create links between related information. Ideally, it'd be able to handle information outside the browser. For example, if a log file is saved, and then a web page with similar contents is viewed, it would be able to detect that the web page is probably a reference for the log file.
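As a minimal sketch of that linking idea, assuming plain TF-IDF cosine similarity stands in for whatever ML model ends up being used: the function names, the 0.2 threshold, and the sample documents below are all made up for illustration, not taken from the extension.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: weight} TF-IDF vector."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_links(docs, threshold=0.2):
    """Return (name_a, name_b, score) for pairs scoring at or above threshold."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                links.append((a, b, score))
    return sorted(links, key=lambda link: -link[2])


# Hypothetical saved items: one log file and two viewed pages.
docs = {
    "app.log": "ERROR connection refused: postgres timeout on port 5432",
    "page:postgres-docs": "Troubleshooting postgres connection refused errors and timeout on port 5432",
    "page:recipes": "How to bake sourdough bread with a crisp crust",
}
links = suggest_links(docs)
for a, b, score in links:
    print(f"{a} <-> {b}: {score:.2f}")
```

Here only the log file and the Postgres page get linked, since the recipe page shares no terms with either. A real version would likely want better tokenization and a learned embedding, but the pairwise-score-then-threshold shape would stay the same.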
Like I said, it's mostly vaporware, but I think that products like these are going to be the future of collaboration tools.
Congrats on getting started.
I agree about Obsidian - I think that most people forget the maintenance time it takes to build a lifelong Knowledge Management System.
I like your idea - document similarity is a well-known area in ML.
Feel free to take my Chrome Extension and use the parts where it tracks key paragraphs in an article (using a user's click/hover/attention behaviour) and use that as the corpus for your ML similarity models.
Intuitively, it makes more sense to run document similarity on key points/paragraphs than on the whole web page.
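A toy illustration of why, as a sketch only: Jaccard word overlap is an assumption standing in for a real similarity model, and the paragraph lists are invented stand-ins for what the extension's tracking would produce.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_paragraph_match(paras_a, paras_b):
    """Return (score, para_a, para_b) for the most similar paragraph pair."""
    best = (0.0, None, None)
    for pa in paras_a:
        for pb in paras_b:
            score = jaccard(pa, pb)
            if score > best[0]:
                best = (score, pa, pb)
    return best


# Hypothetical pages: one boilerplate paragraph and one key paragraph each.
page_a = [
    "Subscribe to our newsletter for weekly updates",
    "Cosine similarity compares two tfidf vectors by angle",
]
page_b = [
    "Cookie settings and privacy policy",
    "Cosine similarity compares tfidf vectors using the angle between them",
]

whole_page_score = jaccard(" ".join(page_a), " ".join(page_b))
paragraph_score, _, _ = best_paragraph_match(page_a, page_b)
print(whole_page_score, paragraph_score)
```

On this example the whole-page score is diluted by the unrelated boilerplate, while the best paragraph pair scores twice as high.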
If you want the whole web page though, there's code in the Chrome Extension that uses Mozilla's readability lib (https://github.com/mozilla/readability) to strip the web content down to the main article text.