
Processing images including downloading, downsizing and serving them #19

Open
jurra opened this issue Oct 14, 2020 · 14 comments


jurra commented Oct 14, 2020

Feature description and scenario

Feature

  • When the user drags and drops an image from the computer, downsize the image and save it asynchronously to the local images folder.
  • When the user copies an image from a link, so that it is stored as a URL, downsize it, place it in the Tiptap HTML, download the image, and store it in the images folder.

How to implement it

  • A Node.js service that downloads and processes the image
  • All images are stored automatically in a single images folder, already downsized.

Possible solutions for displaying the image

  • Option 1. A Node.js service that serves the image as a blob; a Tiptap image listener/hook should then consume the service.
  • Option 2. Store all images encoded in the HTML, generate an id and a name for each image, and create a clone of each image in the images folder.

I prefer option 1 but we should talk about it.

Image specifications defined by @narration-sd

Special considerations for Images

  • Freestanding images will live in the /imgs folder, and need to follow the same rules, enforced by the Electron app on import, as those we use for embedded images -- for almost all of the same reasons. We might possibly relax the maximum dimension a little; let's see if that's needed at all, remembering always, first and last, that we are presenting a summary, abstracted 'documentation' within Hardocs, and that any finer level of detail belongs only on the project provider's own website.

  • Document images will be embedded entirely in our HTML lingua franca docs. A future cloud service addition may change this, but it's what we can do, and do best, for the present delivery. Without exception, all images need to be filtered by us: adjusted to limited dimensions, proper type, and, most importantly, limited encoded size. Remember that the base64 encoding increases binary size by a factor of about 1.3 (rather than the 2x first indicated; the timing estimates below have been corrected for this). We especially need to keep this size down, for speed in web browsers, even though there may be a future plan.

  • All images must be JPEG, or converted to it for the embed, as this vastly reduces size compared to bitmaps like PNG. A decent but not excessive JPEG quality also has great influence; at the size these will be viewed, 80%, a normal JPEG setting, is likely right. All images should be restricted to a 'reasonable viewing size'. Fits-within-500px-x-500px is pretty much what we arrived at on Combat Covid; again, we'll see and adjust in this range. Remember that size goes up with area, i.e. with the square of the linear dimension.
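The fits-within-500x500 rule is just a bounding-box scale calculation. A sketch (the actual pixel resampling would be done by an image library, which is not shown here):

```javascript
// Compute target dimensions so an image fits within a bounding box
// while preserving aspect ratio. Images already inside the box are
// left at their original size (we only ever downscale).
function fitWithin(width, height, maxW = 500, maxH = 500) {
  const scale = Math.min(maxW / width, maxH / height, 1);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// e.g. a 1000x500 photo scales to 500x250; a 400x300 one is untouched
```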

  • Here's a 500x500px JPEG at very likely this 80% quality, as a reference. To see it at full size and appreciate viewing at that scale, it is shown on the next page of this document.

  • It's a photo of an already photographed, 1980s-printed record cover, so not the highest-quality original, but note that you can still make out the faint freckles of a season on her cheek, besides other clarity.

  • This is the sort of detail higher compression silently removes, as JPEG is a perceptual encoding method. It's what lets design diagrams show clearly, so we need this degree of quality. (As for the music, this album is 'rich' -- I prefer her debut Sibelius Violin Concerto, which is haunting, and still considered a marvel, for this highly expressive violinist.)

  • The size of this example is 70 KB, which becomes 91 KB after the embedded base64 conversion. Let's say there are five docs (a typical large repo number in CombatCovid) and an average of 3 images per doc, a little high.

  • Then we would have 15 x 91 KB, or roughly 1.4 MB. Add something maximal for the documents themselves; estimate 2 MB for that aspect of the project. The /imgs images, which are separate, probably number 10 or so at maximum, adding 0.9 MB to the total and making a Hardocs project about 3 MB in size. Since our high-average per-document size is 3 images plus text, approximately 400 KB, we'd be well inside the 8 MB per-document maximum to be hard-imposed, due to its storage mechanism, on the next version of CouchDB (4.0).
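The arithmetic above can be checked directly (the 1.3 encoding factor and the document/image counts are the estimates from the text, not measurements):

```javascript
const ENCODE_FACTOR = 1.3; // base64 overhead over binary (~4/3)

const embeddedKB = 70 * ENCODE_FACTOR;       // ~91 KB per embedded image
const docsTotalKB = 5 * 3 * embeddedKB;      // 5 docs x 3 images -> ~1365 KB (~1.4 MB)
const freestandingKB = 10 * embeddedKB;      // 10 /imgs images -> ~910 KB (~0.9 MB)
const projectMB = 2 + freestandingKB / 1000; // ~2 MB docs aspect + ~0.9 MB -> ~3 MB

console.log({ embeddedKB, docsTotalKB, freestandingKB, projectMB });
```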

  • More important is the download (and upload) time for future web-app searched viewing like CombatCovid, or import/export to the cloud from our Electron app.

  • At typical ISP performance of 1 Mbit/s upload and 10 Mbit/s download, our project at 3 MB, about 30 Mbit including packet overhead, would download in about 3 seconds for an equivalent of Viewer -- just adequate for a first load, before caching.

  • The Finder view would be much faster, as it's just metadata and one picture per project shown.

  • For Electron export of a project to the cloud, 1 Mbit/s means a wait of about 30 seconds, half a minute, for our admittedly fat high-average project example.
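Those transfer-time figures follow directly from the link rates. A sketch; the 25% packet-overhead allowance is an assumption chosen to match the 3 MB / 30 Mbit figure above:

```javascript
// Seconds to move a payload of `megabytes` over a link of `mbitPerSec`,
// padding for packet/protocol overhead.
function transferSeconds(megabytes, mbitPerSec, overhead = 1.25) {
  const megabits = megabytes * 8 * overhead; // 3 MB -> ~30 Mbit on the wire
  return megabits / mbitPerSec;
}

// download of the 3 MB project at 10 Mbit/s: 3 seconds
// upload of the same project at 1 Mbit/s: 30 seconds
```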

  • As this should be a 'finish of work and publish' situation, it's adequate, but it shows well why we need to give close and careful attention to project sizes, and to the consequences of our design thinking in arranging and presenting the use of the Hardocs application.

  • Thus we'll want to manage thoughtfully in UX design how often publishing needs to happen (i.e. when an edit gets 'published'), while for import, given that the project persists in the browser's PouchDB, there shouldn't be a burden.

  • A future cloud image service would keep our absolute security for an owner's projects when they don't want them public, and would improve on these times to a degree, as the images would be about 70% as large over the wire and, more importantly, would download only per document in view. But that's a substantial job, proposed only for a later Hardocs contract if needed, though it should be thought out early in design.

  • What we need to consider here is how the necessary image handling, to control type, dimensions, and quality, gets done: mandatorily, by us, in the Electron app.

  • This sizing discussion and its picture lead strongly back to our primary intention for the design-documents aspect of Hardocs: that what Hardocs carries is summary documentation, at an abstracted quality level.

  • You should be able to read and see the project's nature and what's chosen for it, but for any finer detail, whether of documentation or other content files, the Hardocs summary documents will necessarily not provide that.

  • Thus each project's Hardocs pages need to contain links to the project's own website, where it can offer anything (e.g. CAD files, no doubt separately) at any size and organization it considers appropriate.

  • When a project has a lot of external website links, they can certainly make a References or Downloads page as one of their summary documents, which will allow a lot of freedom.

  • We might consider whether a project's primary documentation website link might be allowed as a field in the metadata.

@jurra jurra added the feature label Oct 14, 2020

jurra commented Oct 14, 2020

Clive's (@narration-sd) idea about handling images: reduce them before saving to file, in the context of the HTML lingua franca.
Issues:

  • Dealing with links
  • Dealing with big images
  • Embedding images in the html

@Hardocs Hardocs deleted a comment from create-issue-branch bot Oct 21, 2020

jurra commented Oct 22, 2020

Here is the link to the last session @DNature and I had about downsizing and downloading images.
(image attachment)


narration-sd commented Oct 22, 2020

Good to see this. As in the writings with Divine, I've felt a little hazy up to now about the full result of normalized (let's use that word, normalized, for our needs) images.

Looking at your diagram, it may be that it's just fine and complete to consider them only inside the fully processed HTML lingua franca doc instance -- and only ever save that to the Hardocs-Summary project file folder.

That way we will not interfere with whatever they have.

And maybe because of this, as well as other considerations, there's a strong argument that we never offer to convert back to any of their other formats. Thus conversion from Word, MD, etc. is only ever one-way: we import.

This would satisfy in many ways, and much avoid pitfalls. Can we think to set that in proper stone, from now?

It seems this would radically decrease my problematic intuitions about the area -- properly as well our actual work...

P.S. ...showing how this noted conversation method can actually help us, too.


jurra commented Oct 28, 2020


DNature commented Oct 30, 2020

Great reference, Jose. However, how do we continuously display an image if we have to go through this process?
The ways I can think of for this to work when a user drops an image, pastes an image, or uses the regular markdown image syntax are:

  • Have a preview button, which means we have to opt in to a pure unformatted markdown editor similar to GitHub's markdown editor, so that when the person clicks the preview button, it triggers an action that does whatever we want to do with the image.

  • Alternatively, we can listen for an on-drop, on-paste, or image-link event of the tiptap editor before triggering an action. If such an event doesn't exist in tiptap, then we'll have to do it ourselves, by either contributing to tiptap or forking their repo.


jurra commented Oct 30, 2020

I would go for the second approach @DNature; as you mentioned before, there might be a jumping behaviour if we return a promise handling the image upload. We could also do what GitHub does, as you say, where the editor just shows an 'uploading image' placeholder in the HTML until the new minimized image is returned.

@narration-sd

I may be missing something here, but I had some further thought after @DNature and I worked over this area yesterday. Let's see if it helps...

  • By the time the universal Hardocs Object is created or updated, ready for any store to Habitat (database) or filesystem, all images must be fully internal: resized, converted to JPEG, and base64-encoded. That's the lingua franca, for HTML pages or for plain /imgs images themselves.

(Hardocs Object is the one that mirrors the shape of the CombatCovid/Hardocs filesystem structure, in the diagram I made recently)

  1. I see Markdown still being mentioned. But we are making an HTML editor. Markdown would be brought in by pandoc conversion, which, with the flags I've shown, will already have converted any images to internal base64. We might still have to decode, convert to JPEG, and re-encode if the tag indicates they are PNG -- very important!! -- or any other oversized format. That's easy to tell by the visible tag.

  2. Now, consider our HTML editor (CKEditor). There are a number of ways someone might edit in an image: they could write HTML tags, they could (perhaps) insert a file from their laptop filesystem, or they could drag-drop an image from another source like a web browser. I consider that whatever they do, a) it should not matter, and b) it's a very over-complex and thus bad idea to consider interfering in CKEditor's operation with any kind of hooks. Just leave the image alone, in the form they placed it -- much smarter, no?

  3. How we get to do that (leave it alone) is that we only (and definitely always) do all our discovery of images and conversions at the point of pulling the new or modified, edited HTML document into our Hardocs Data Object. Thus it is a regular processing pass we always run over the HTML document before it is written to the Data Object.

  4. Only after that processing of the HTML document is done, every time (on a 'save' from the editor, for example), do we actually add or replace the fully processed, lingua franca HTML document in its place in the data object tree.

  5. With all steps completed, the Hardocs Data Object is now clean, and can be placed/replaced in the Vuex store for display, pushed to the cloud or browser database via a Habitat call, or emitted to the filesystem for our representation there.

I think the above makes everything quite straightforward, and allows the Image Processing to be... a process, with nicely defined steps of its own: a) retrieve from the net via fetch, b) convert to JPEG if needed, even if it's already base64 (probably next), c) resize so it's within the boundaries I put the calculation for in the paper under architecture, d) do the final conversion to base64, and finally e) replace the element in the HTML doc so that it's using the base64.

Seems this will be nice to work up, and will nicely get us there, no? Thanks, guys.
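The steps above can be sketched over the HTML document itself, using only Node built-ins. Here the fetch/JPEG-conversion/resize stages (a-c) are abstracted into one caller-supplied `processImage` function -- a hypothetical name; in practice an image library would implement it:

```javascript
// Discover every <img> tag in an html string and replace its src with a
// base64 data URI of the processed image bytes. `processImage` is an
// async (src) => Buffer covering fetch, JPEG conversion, and resize.
async function embedImages(html, processImage) {
  const tags = [...html.matchAll(/<img\b[^>]*\bsrc="([^"]+)"[^>]*>/g)];
  for (const [tag, src] of tags) {
    if (src.startsWith('data:image/jpeg')) continue; // already lingua franca
    const jpeg = await processImage(src);
    const dataUri = `data:image/jpeg;base64,${jpeg.toString('base64')}`;
    html = html.replace(tag, tag.replace(src, dataUri));
  }
  return html; // now safe to place into the Hardocs Data Object
}
```

Running this pass on every save, before the document enters the Data Object, is what keeps the editor itself hook-free, as point 2 argues.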


jurra commented Oct 30, 2020

I agree with most of it @narration-sd ,
Except with point 2:

  1. I see Markdown still being mentioned. But we are making an HTML editor. Markdown would be brought in by pandoc conversion, which with the flags I;'ve showed will already have converted any images to base64 internal. We might still have to decode, convert to jpeg, recode, if the tag indicates they are png -- very important!! Or any other oversized format. Easy to tell by that visible tag.

Having a straightforward conversion into markdown provides portability and mobility to the folder of markdown docs.
It could be done with pandoc too, but that requires setting up pandoc, etc. CKEditor also outputs markdown, by the way.

The main issue related to markdown is not so much pandoc or the conversion itself (CKEditor also outputs markdown). The issue is more about replacing the paths in the converted markdown with the relative image paths in the hardocs folder. For me as a user it would also be great to have the downsized files in the folder, because then I can upload that little folder to Thingiverse, GitHub, or other hosting services, and people can still read the documents without having to depend on Hardocs.
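The path-replacement step described here could look like the following sketch. The `imgs` folder name follows the /imgs convention above, and the forced `.jpg` extension reflects the JPEG-only rule; both are assumptions:

```javascript
// Rewrite every markdown image link to point at the relative images
// folder, so an exported folder of .md files plus /imgs stays readable
// on GitHub, Thingiverse, etc. without depending on Hardocs.
function relocateMarkdownImages(md, folder = 'imgs') {
  return md.replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, (match, alt, url) => {
    const base = url.split('/').pop().replace(/\.[^.]+$/, '');
    return `![${alt}](${folder}/${base}.jpg)`;
  });
}
```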

@jurra jurra self-assigned this Oct 30, 2020

narration-sd commented Oct 30, 2020

Well, that's kind of the whole point -- by centering on the Hardocs Object, and guaranteeing any image processing is done by the time it's released to cloud, browser, or filesystem, you have a central point of truth.

It seems your holding onto CKEditor writing Markdown itself may be what's occluding this view??

If you somehow have to have that (why?, but) you could pull the Hardocs Object result back into CKEditor... then save out MD from there.

It feels very complicated even for the user to have to think about Markdown at that point, though. And remember, it is going to be a definite minority who want to have anything to do with Markdown at all, in the real user community we expect, no?


jurra commented Oct 30, 2020

I agree with the central point of truth. As long as the markdowns are exported builds (an output that you can generate from the main source of truth), I think it's all good ;)

Maker communities and hardware developers, especially those that do electronics and coding, are quite familiar with markdown and GitHub; so are scientists that use Jupyter notebooks and GitHub. So it varies with the audience and also the generation.

@narration-sd

And, we got a great agreement -- and a diagram -- on how this gets arranged and done.

I'll share what I think as a link, and you can discuss it with Jose, @DNature -- will be good.

Not entirely visible in it is a quite important thing we 'hashed out': that at the time the Image Processing occurs, each image of an HTML lingua franca document, after processing (and even if we didn't have to modify it), is also dumped out to the filesystem under /imgs -- and likewise added into the imgs component of the Hardocs Data Object to match, so that those images are all available for other uses.

If limiting which images actually appear in the UX becomes a problem, we would solve it by later adding some kind of flag to the component for each img in the object -- JSON makes this straightforward. Not to be worked on now, in our release preparation.

@narration-sd

Here's the diagram, at least a version that's useful -- Jose will put it into the Google folder in some form, I'm sure: https://app.diagrams.net/#G1l9TruQ5JMGVGhOJFXXASFIACaV1OoTYa

@jurra jurra removed their assignment Nov 1, 2020

jurra commented Nov 5, 2020

Here is the document in the Google Drive addressing image challenges and ideas.
(Read the section 'Special considerations for images'.)


jurra commented Nov 20, 2020

@DNature has managed to process images in HTML and make them smaller: maximum dimension 600 pixels. Currently the img src attribute is also replaced by an image path (which is not working yet).

Minimum requirement for the images

  • Save an HTML file with lighter (downsized, base64) images. The goal here is to make the files lighter and to control the size of documents.

Another ancillary requirement

  • It would be nice to have, as an output, a folder with the images. But this implies creating some kind of id in the image file name itself, so that it can be mapped to the image tags in the HTML document.
