[Feature Request] Graph of documents (like Logseq / Obsidian) #204

Open
pablo03v opened this issue Mar 23, 2024 · 7 comments
Labels
feature request New feature or request

Comments

@pablo03v

Is your feature request related to a problem? Please describe.
Can't see which docs are connected to each other and where tags are used.

Describe the solution you'd like
A mind-map-style view of the tags I have used. I understand the complexity of the issue, but with this feature I would readily switch back to crypt.ee to write my diary there again.

Additional context
Here's an example from a logseq user:
https://franalbani.github.io/glog/#/graph

@pablo03v pablo03v changed the title Graph of tags w/ docs [Feature Request] Graph of tags w/ docs Mar 23, 2024
@josh3way

Hi, I'm currently an Obsidian user, but I loved Cryptee and especially the idea behind it. I would love to see this feature implemented. It's a gamechanger for me and I would easily switch to a Cryptee subscription.

Additional context
Here's an example from an Obsidian user:
https://help.obsidian.md/Plugins/Graph+view

@johnozbay
Member

johnozbay commented May 16, 2024

Hey both 👋🏻

First off, to get something out of the way: I think you two are referring to document -> document links, not tags as the title suggests, since the graphs you linked to show documents being linked to one another, and Obsidian links docs, not tags (as far as I know).

Now as you know we already have linked documents, but not a graph. Graphing these connections is very tricky, primarily due to zero-access encryption.


In short, at the moment all document -> document links are stored encrypted inside documents as a part of the document itself. If we built the graph today, it would need to literally download and decrypt all documents, extract these links, and visualize that. It would be immensely data-heavy and near-impossible (since some users have hundreds of gigabytes of documents)

The only way this would work is if we built something like a special 'migration tool' (can't think of a better name) that goes through your documents one by one, extracts these doc-to-doc links, and stores them somewhere else. Similar to how we store folder-names, file-names, and tags separately and encrypted at the moment. So for example, when you tag #cat in your document, it is automatically encrypted and stored in a separate list outside of your document to make it easier to search and find these documents without having to download and decrypt all your documents to check for tags in them.

Similarly, a graph like this would need doc-to-doc links to be stored separately outside of documents. But that comes with problems of its own too.
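As a rough illustration of what such a one-time extraction pass could look like, here is a minimal sketch. It assumes the documents have already been downloaded and decrypted on the client, and it uses a made-up `[[doc:ID]]` link marker, since Cryptee's actual internal link format isn't described in this thread. The names `LINK_PATTERN`, `extract_links` and `build_link_index` are hypothetical, and the encryption of the resulting index is left out entirely.

```python
import re

# Hypothetical link marker, for illustration only. The real format of
# doc-to-doc links inside Cryptee documents is not given in this thread.
LINK_PATTERN = re.compile(r"\[\[doc:([A-Za-z0-9_-]+)\]\]")


def extract_links(body: str) -> list[str]:
    """Return linked doc ids in order of first appearance, without duplicates."""
    seen: list[str] = []
    for target in LINK_PATTERN.findall(body):
        if target not in seen:
            seen.append(target)
    return seen


def build_link_index(decrypted_docs: dict[str, str]) -> dict[str, list[str]]:
    """One pass over already-decrypted docs -> {doc_id: [linked doc ids]}.

    In a zero-access setup this would run on the client, and each entry would
    be encrypted before being stored outside the document (not shown here).
    """
    return {doc_id: extract_links(body) for doc_id, body in decrypted_docs.items()}


docs = {
    "d1": "See [[doc:d2]] and [[doc:d3]], then [[doc:d2]] again.",
    "d2": "Back to [[doc:d1]].",
    "d3": "No links here.",
}
link_index = build_link_index(docs)
print(link_index)
```

The expensive part is still the single full pass over every document. After that, only the small per-document link lists would need to be fetched to draw a graph.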


One major difficulty would be: how would these graphs behave when a folder is ghosted and many documents are removed from it? We would then need our server to keep a list / reference of what's removed, and what needs to be added back when a folder is summoned back. But again, how would we do this if they're encrypted?

This one's relatively easy right now, since the links are stored in the documents themselves. If a doc is ghosted, it's gone, and there's no way to visualize the ghosted docs.

And since doc names / tags aren't graphed etc, it's also easy with ghosting. When a doc is ghosted, it's gone, and there's nothing to search. Graphs on the other hand require all information to be present always. (and it would need to account for ghosting somehow) — not impossible, just a looot of engineering hours.


At the moment, if I were to take the title of this issue literally, and take the easier road (because tags are inherently easier to work with to be honest), we can go about doing this in three different ways:

  1. We can make a hierarchical graph of tags, and docs under them in a list. So you can see all your tags, and see which docs are in it. It would take time to build, and it would take time to visualize each time, (since cryptee would need to decrypt all tag names — think how long search takes sometimes, since it needs to decrypt all filenames and tags and index them) but not as much as doc-to-doc links.

  2. We can make a relational graph of tags. Say for example #cat and #dog would appear connected, because they're both in the same Animals doc. And #orange and #banana would appear together because they're both in the Fruits doc. But again, this wouldn't show you how two docs are connected to one another (like obsidian does)

  3. We can make a relational graph of docs that share tags. Say for example the docs Apple and Microsoft both have the following tag : #antitrust. And let's say Facebook and Instagram both have the tag : #zuck. This graph would show Apple -> Microsoft using an arrow labeled #antitrust, and also show Facebook -> Instagram using an arrow labeled #zuck. Importantly, this would be very different from how things are linked on Obsidian, since on Obsidian you use direct doc-to-doc links like [[Microsoft]] or [[Apple]].
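To make the difference between the three options concrete, here is a small sketch that derives all three structures from the same input: a plain mapping of doc names to their (already decrypted) tags. The function names and the sample data layout are illustrative assumptions, not Cryptee code, and the edges are undirected here even though the description above draws them as arrows.

```python
from collections import defaultdict
from itertools import combinations

# Sample data taken from the examples above; tags are assumed to be
# already decrypted on the client.
doc_tags = {
    "Animals": {"cat", "dog"},
    "Fruits": {"orange", "banana"},
    "Apple": {"antitrust"},
    "Microsoft": {"antitrust"},
    "Facebook": {"zuck"},
    "Instagram": {"zuck"},
}


def docs_by_tag(doc_tags):
    """Option 1: hierarchy of tag -> docs that carry it."""
    index = defaultdict(set)
    for doc, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc)
    return dict(index)


def tag_edges(doc_tags):
    """Option 2: two tags are connected when they appear in the same doc."""
    edges = defaultdict(set)
    for doc, tags in doc_tags.items():
        for a, b in combinations(sorted(tags), 2):
            edges[(a, b)].add(doc)  # edge is labeled with the shared doc(s)
    return dict(edges)


def doc_edges(doc_tags):
    """Option 3: two docs are connected when they share a tag."""
    edges = defaultdict(set)
    for tag, docs in docs_by_tag(doc_tags).items():
        for a, b in combinations(sorted(docs), 2):
            edges[(a, b)].add(tag)  # edge is labeled with the shared tag(s)
    return dict(edges)


print(docs_by_tag(doc_tags)["antitrust"])
print(tag_edges(doc_tags))
print(doc_edges(doc_tags))
```

All three only need the per-document tag lists, never the document bodies, which is why they are cheaper than a graph of doc-to-doc links.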


Which brings me to the final point. Markdown.
Obsidian uses their own non-standard flavor of Markdown.
The official Markdown spec doesn't have document-to-document links, because Markdown is not a good document storage format. It's a quick-styling format (i.e. intended for chat boxes, comment boxes etc. to save time styling quick bits of text), not a rich document styling format. Due to this, the Markdown spec doesn't even have things like font sizes or tables.

Which makes it an especially bad option for document storage in the long run, both for interoperability reasons and because it lacks styling options you may need in the future. For example GitHub, which has tables, uses its own flavor of Markdown, which it came up with to add support for things like tables.

Since Cryptee has tons of features which Markdown doesn't support, we don't use Markdown under the hood, because it makes no sense for us to try to come up with yet another self-invented standard like Obsidian did.

Finally, especially because we have a zero-access storage system, and our servers can't see your documents = can't upgrade your documents in the future to a newer spec / format etc, it's even more relevant and important for us that your documents are stored in a spec-compliant / interoperable way, so that some day in the future, you don't have to worry about not being able to open your documents in another app etc.

So even if we were to implement this, it wouldn't be similar to Obsidian, because of all the reasons listed above, and you likely wouldn't be able to move in from Obsidian to Cryptee simply by clicking on 'import' — as much as I would love for us to build something like this, I don't want to over-promise.


I hope these make sense! I'll keep this thread running for a while to collect more feedback from everyone while we're brainstorming and trying to find the best way. I'm confident that we'll settle on a fun and easy to use solution in the near future.

Until then, if that's okay with you too @pabloscloud I'll rename the thread as "doc-to-doc links" instead ✌🏻
— if not, we can revert back easily 👍🏻

All the best,

J

@johnozbay johnozbay changed the title [Feature Request] Graph of tags w/ docs [Feature Request] Graph of documents (like Logseq / Obsidian) May 16, 2024
@johnozbay johnozbay added the feature request New feature or request label May 16, 2024
@johnozbay
Member

Also related to / duplicate of #36.
Although we've come a long way since 2021 and we are still not saying no to this, I think what we will settle for will be a bit different than this graph as far as functionality, usability and features go.

@pablo03v
Author

pablo03v commented May 16, 2024

Since the graphs you linked to shows documents being linked to one another

yes, in logseq files (aka documents) are linked by the tags which are used inside of them. When multiple files have the same tag, they are closer and -more importantly- connected inside of the map. Therefore I am not talking about linking documents by hand.

We can make a relational graph of docs that share tags. Say for example the docs Apple and Microsoft both have the following tag : #antitrust.

I think that's what I've been asking for.

Btw, thanks for all the details on why this is complicated. I already knew this and was hoping you wouldn't have to put so much time into writing this comment, but now I know that you would really like to implement this and have thought through all the ways to make it possible. I'm also confident that, as Cryptee grows and development continues, this will eventually find its way into Cryptee in some form, even if it's just a list of tags with their corresponding docs.

Thanks John <3

@johnozbay
Member

Thanks for the kind words 🙏🏻 And no worries at all!
I love what I do, so it's a pleasure to chat up amazing people like yourself about Cryptee 👍🏻

yes, in logseq files (aka documents) are linked by the tags which are used inside of them. When multiple files have the same tag, they are closer and -more importantly- connected inside of the map. Therefore I am not talking about linking documents by hand.

I see! The reason I asked is that there seem to be tons of different ways Logseq makes use of things like [[page link]] and #tags and even #[[yadayada]].

So I wasn't sure which one would be the best way. Not to mention that even if we make one work, some day our users would, rightfully so, expect the other ways too. So in an ideal world, if we're doing this work, I would love to bring all features with 100% compatibility.
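To show why these need to be told apart, here is a simplified sketch that classifies the three reference styles mentioned above in a piece of text. The regular expression is an approximation written for this example and is not Logseq's actual parser; the group names `tag_page`, `page` and `tag` are made up.

```python
import re

# Order matters: "#[[...]]" must be tried before "[[...]]" and "#tag",
# otherwise it would be split into two separate matches.
REFERENCE = re.compile(
    r"#\[\[(?P<tag_page>[^\[\]]+)\]\]"  # #[[multi word tag]]
    r"|\[\[(?P<page>[^\[\]]+)\]\]"      # [[page link]]
    r"|#(?P<tag>[\w-]+)"                # #tag
)


def find_references(text: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs in the order they appear in the text."""
    refs = []
    for match in REFERENCE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        refs.append((kind, match.group(kind)))
    return refs


sample = "Met [[Jane Doe]] about #project and #[[yadayada]]."
references = find_references(sample)
print(references)
```

Supporting only one of these kinds would silently drop the others on import, which is the compatibility concern here.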

Thanks for helping with these!
Best, J

@josh3way

Hi thanks for the clarifications! 🙏

In short, at the moment all document -> document links are stored encrypted inside documents as a part of the document itself. If we built the graph today, it would need to literally download and decrypt all documents, extract these links, and visualize that. It would be immensely data-heavy and near-impossible (since some users have hundreds of gigabytes of documents)

The only way this would work is if we built something like a special 'migration tool' (can't think of a better name) that goes through your documents one by one, extracts these doc-to-doc links, and stores them somewhere else. Similar to how we store folder-names, file-names, and tags separately and encrypted at the moment. So for example, when you tag #cat in your document, it is automatically encrypted and stored in a separate list outside of your document to make it easier to search and find these documents without having to download and decrypt all your documents to check for tags in them.

Yes I agree, using tags is a lot better. Using tags would be less work since part of the code is already done. But here's a question, when you delete a tag from your document, how is it actually removed from the encrypted list?

One major difficulty would be, how would these graphs behave when a folder is ghosted, and many documents are removed from it. We would then need our server to keep a list / reference of what's removed, and what needs to be added back when a folder is summoned back. But again, how would we do this if they're encrypted?

And since doc names / tags aren't graphed etc, it's also easy with ghosting. When a doc is ghosted, it's gone, and there's nothing to search. Graphs on the other hand require all information to be present always. (and it would need to account for ghosting somehow) — not impossible, just a looot of engineering hours.

As the documents are encrypted, the graph would have to be mounted on the client side, correct? And as mentioned above the tags are saved in an encrypted list separate from the document, when a folder is ghosted are these tags removed from the list? If they were removed, I believe there wouldn't be any major problems when putting together the graph, but the graph itself would still be a lot of work. 😅

  1. We can make a relational graph of docs that share tags. Say for example the docs Apple and Microsoft both have the following tag : #antitrust. And let's say Facebook and Instagram both have the tag : #zuck. This graph would show Apple -> Microsoft using an arrow labeled #antitrust, and also show Facebook -> Instagram using an arrow labeled #zuck. Importantly, this would be very different from how things are linked on Obsidian, since on Obsidian you use direct doc-to-doc links like [[Microsoft]] or [[Apple]].

I think this is the best way to do this. Great idea! 🥰

Finally, especially because we have a zero-access storage system, and our servers can't see your documents = can't upgrade your documents in the future to a newer spec / format etc, it's even more relevant and important for us that your documents are stored in a spec-compliant / interoperable way, so that some day in the future, you don't have to worry about not being able to open your documents in another app etc.

Using tags, I think compatibility would be much easier. On import, tags from other applications would need to be converted to the Cryptee format (that is, put in the encrypted list). Since the graph would be assembled on the client side from those tags, there shouldn't be many compatibility problems: as mentioned above, you would not be able to import the Obsidian graph itself, but you could still build a graph from Obsidian tags. On export, it would be a normal export with tags. Exporting the graph to Obsidian would be impossible because it uses a different format (doc-to-doc links), but exporting to programs that build their graph from tags would be easier.

Sorry if I missed something, I'm not a very experienced programmer 😆. I hope we can arrive at the best solution, and as this is not a high priority feature, it is interesting to raise these questions so that they can serve as references in the future.

@johnozbay
Member

Thanks for the thoughtful replies @josh3way 🙏🏻

But here's a question, when you delete a tag from your document, how is it actually removed from the encrypted list?

Each document has its own list of encrypted tags in the database (locally on your device and on the server), so if a document is deleted, that row is deleted = tags are gone.

While one could say why not do the same for cross-links to docs, it's a bit more tricky, because :
a) you can rename docs and delete docs etc.
b) On each doc deletion / rename, the pointer would need to be renamed / deleted in all other docs that point to it, i.e. if Doc 1 -> Doc ABC, Doc 2 -> Doc ABC, and you delete or rename Doc ABC, both Doc 1 and Doc 2 need to be updated, and that's where the difficulty is.
c) But tags are by their nature self-pointing, in that, if you update a document's tag / rename a tag / delete a tag, that's okay and only that document itself is affected.

So tags are much more logical and easier to work with from the perspective of computation (especially when it comes to on-device encryption) since you don't need to download/decrypt/edit/re-encrypt/re-upload all other docs etc.
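Here is a toy comparison of the two cases, as a sketch only. Plain dictionaries stand in for the per-document rows (which in reality are encrypted), and the names `tag_rows`, `link_rows`, `delete_doc_tags` and `delete_doc_links` are invented for this example. Each function returns the set of rows it had to touch.

```python
# One row per document. In reality each row would be encrypted, so every
# touched row means a decrypt / edit / re-encrypt / re-upload round trip.
tag_rows = {"doc1": {"cat"}, "doc2": {"cat", "dog"}, "abc": {"dog"}}
link_rows = {"doc1": {"abc"}, "doc2": {"abc"}, "abc": set()}


def delete_doc_tags(rows: dict, doc_id: str) -> set:
    """Tags are self-pointing: deleting a doc drops only its own row."""
    rows.pop(doc_id, None)
    return {doc_id}


def delete_doc_links(rows: dict, doc_id: str) -> set:
    """Links point at other docs: every doc linking to doc_id must be updated."""
    rows.pop(doc_id, None)
    touched = {doc_id}
    for source, targets in rows.items():
        if doc_id in targets:
            targets.discard(doc_id)
            touched.add(source)
    return touched


touched_by_tags = delete_doc_tags(tag_rows, "abc")
touched_by_links = delete_doc_links(link_rows, "abc")
print(sorted(touched_by_tags))
print(sorted(touched_by_links))
```

With tags, the amount of work stays constant per deletion. With links, it grows with the number of documents pointing at the deleted one.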


As the documents are encrypted, the graph would have to be mounted on the client side, correct?

Correct! It will be generated client-side, which is why it needs to be suuuuper light and ideally require very little decryption, since you might have to download & decrypt thousands of tags etc.

And as mentioned above the tags are saved in an encrypted list separate from the document, when a folder is ghosted are these tags removed from the list? If they were removed, I believe there wouldn't be any major problems when putting together the graph, but the graph itself would still be a lot of work. 😅

Correct, they are removed, as each doc has its own list of tags, and when ghosted / deleted that list is deleted or ghosted too.


I think this is the best way to do this. Great idea! 🥰

Happy to hear! This is the direction that feels closest to heart for me as well.


Sorry if I missed something, I'm not a very experienced programmer 😆

Noooo worries at all. It's not about the experience, it's all about your kind offer, initiative and willingness to help, and your thoughtful responses! This has been extremely helpful! We have a pretty good direction now thanks to all the thoughtful and good feedback on this thread!

Will keep this thread posted with more news as it comes.

Best,

J


3 participants