Use Sha instead of md5 #6
Comments
I'm definitely okay with preferring sha256, especially if it's already being used by eprints. Am I understanding correctly that a stock installation of eprints will generate both md5 and sha256 hashes for all uploaded files?
I would add that we should aim to support multiple hash algorithms from the start. In my experience, this should not greatly complicate development. @photomedia - thoughts?
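To illustrate why supporting several algorithms need not complicate development, here is a minimal Python sketch, not anything taken from EPrints or from the spec. The names `SUPPORTED_ALGORITHMS` and `compute_digests` are hypothetical, and the choice of md5, sha1 and sha256 is an assumption for illustration. It reads a file-like object once and feeds every requested hasher from the same chunks, so adding an algorithm is just another entry in the list.

```python
import hashlib
import io

# Hypothetical set of algorithms a spec might allow (assumption for illustration).
SUPPORTED_ALGORITHMS = ("md5", "sha1", "sha256")


def compute_digests(stream, algorithms=SUPPORTED_ALGORITHMS, chunk_size=65536):
    """Return {algorithm: hex digest} for a binary stream, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # Read fixed-size chunks until the stream returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


if __name__ == "__main__":
    # Usage: an in-memory stream stands in for an uploaded file.
    for name, digest in compute_digests(io.BytesIO(b"example content")).items():
        print(f"{name}: {digest}")
```

A repository that only stores MD5 today could call `compute_digests(stream, algorithms=("md5",))` and widen the tuple later without changing the reading loop.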
I took a look at a MySQL dump from a development instance of EPrints here at Concordia, and it looks like all uploaded files and EPrints-generated derivatives have an MD5 digest stored in the documents table in the MySQL database. This particular instance of EPrints was not generating any SHA-* hashes, only MD5s. I tried to find where the generate_md5 and/or generate_sha functions are called in the EPrints source code, but only found the function definitions pointed out by @goetzk above. I'm not terribly familiar with EPrints and don't know Perl, so it's very possible I'm missing something here. Is it possible to determine which function is being called by default when new files are uploaded?
Our EPrints instance generates MD5 only too. This seems to be the default for generating technical metadata about files in EPrints because, although sha256 is supported by EPrints, MD5 is specified in a number of configuration files, including perl_lib/EPrints/System.pm. There are other .pm and .pl files within EPrints that seem to reference the MD5 digest, but I can't work out why. Certainly the aforementioned files would need to be modified and tested. It would be interesting to know (perhaps via the EPrints Tech listserv?) whether anyone has modified their configuration to support sha256.
Since EPrints uses MD5 by default, the majority of repository instances will have MD5 hashes in their databases. Therefore, I suggest that we keep our spec to MD5 at this point. Supporting sha256 is a nice feature that would be useful for new repository instances once EPrints starts adding these to the database, but for now, the majority of content in repositories likely only has MD5 hashes, so let's support these first.
@geo-mac I edited your comment to include the clarification from the comment/correction.
For anyone watching this discussion in the future: the only reply on the list was mine, observing the same hashes (md5) and behaviour (inconsistency in their usage).
Based on the above discussion(s) I think we should stay with md5 as the specified format. It may be appropriate to add a note such as "Future revisions of this specification will add optional SHA support".
Hi,
As a related point to #5: I would like to propose that the spec support sha256 instead of md5 as its default checksum.
As a future feature it should probably support md5, sha1 and sha256 (Archivematica's three options), but I don't see that as pressing from the start, since sha256 and md5 are the ones EPrints supports out of the box.
From eprints commit be42eb7f4dd9ebc184ff8f7fb0282d2f8e778f21