If you would like to experiment with your own version of this digital library, download the following OVA file and import it into your favourite virtual machine platform or cloud service.
A full rebuild of the provided digital library collection, demo-localhost, requires the credentials of a Google service account, which are used for OCRing the images stored in the import folder. If you are experimenting with collection building, the recommended approach is to work up to a full rebuild in stages.
Start by taking a built collection and activating it:
    cd greenstone3-svn
    source ./SETUP.bash
    cd web/sites/intermuse/collect/demo-localhost
    wget https://intermuse.sowemustthink.space/greenstone3/library/sites/intermuse/export/building.tar.gz
    tar xvzf building.tar.gz
    ./ACTIVATE.sh

And then view the collection: http://localhost:8383/greenstone3/library/collection/demo-localhost/
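One way to confirm that activation worked, without opening a browser, is to request the collection page from the command line. The following is a minimal sketch, assuming curl is installed, the server is listening on the host and port shown above, and the page answers with an HTTP 200 status. The names collection_url and check_collection are hypothetical helpers, not part of Greenstone.

```shell
# Hypothetical helpers (not part of Greenstone) for checking a collection page.

# Print the collection URL.
# $1 = collection name, $2 = optional base URL (defaults to the one used above)
collection_url() {
    printf '%s/greenstone3/library/collection/%s/\n' "${2:-http://localhost:8383}" "$1"
}

# Succeed only if the collection page answers with HTTP 200.
# This is an assumption about how the server responds; adjust if it redirects.
check_collection() {
    [ "$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$(collection_url "$@")")" = "200" ]
}

# Usage (requires the server to be running):
#   check_collection demo-localhost && echo "collection is being served"
```

If the check fails, opening the URL in a browser will usually show more detail than the status code alone.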
Next, experiment with running buildcol.pl, which converts the canonical GreenstoneXML representation into the necessary indexing and database structures, storing the result in a freshly generated building directory:
    # Assuming you have run SETUP.bash and are still in the same directory as above, then ...
    wget https://intermuse.sowemustthink.space/greenstone3/library/sites/intermuse/export/archives.tar.gz
    tar xvzf archives.tar.gz
    ./BUILDCOL.sh
    ./ACTIVATE.sh

The buildcol process typically takes a few minutes to run. Once activated, view the collection: http://localhost:8383/greenstone3/library/collection/demo-localhost/
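Because each step consumes the output of the previous one (buildcol.pl reads the GreenstoneXML representation and writes the building directory), a small guard can catch a missing or empty directory before a script is run. The sketch below assumes that archives.tar.gz unpacks to a directory named archives, which the steps above imply but do not state. The name require_dir is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the named directory exists and is non-empty.
require_dir() {
    if [ ! -d "$1" ]; then
        echo "missing directory: $1" >&2
        return 1
    fi
    if [ -z "$(ls -A "$1")" ]; then
        echo "empty directory: $1" >&2
        return 1
    fi
}

# Usage, from the demo-localhost collection directory
# (the directory name "archives" is an assumption):
#   require_dir archives && ./BUILDCOL.sh && require_dir building && ./ACTIVATE.sh
```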
To fully rebuild the collection, download the JSON key file for a Google service account, and store it as:
    greenstone3-svn/web/sites/intermuse/collect/demo-localhost/etc/google-sa-credentials-key.json

And then:

    # Assuming you have run SETUP.bash and are still in the same directory as above, then ...
    ./IMPORT.sh && ./BUILDCOL.sh && ./ACTIVATE.sh

The import process is more computationally intensive than buildcol. For the two documents shipped with the demonstration collection, expect IMPORT.sh to take closer to 10 minutes to run. The importing process does have a cache (for example, it stores the results returned from OCRing with the Google Vision API), so subsequent import runs will be faster.

At the time of writing (29 Sept 2023), the Google Vision API includes a free tier of 5,000 images per month before any costs are charged. The demo-localhost example collection contains 86 pages that will be OCR'd when importing is run.
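Two quick checks can be made before a full rebuild. The sketch below uses hypothetical helper names. It first verifies that the credentials file is in place and looks like a service account key (Google's JSON key files contain a "type": "service_account" entry), and then works out how many uncached imports of the demo collection would fit in the free tier, assuming each OCR'd page counts as one image.

```shell
# Hypothetical helper: succeed if the file exists and looks like a service account key.
looks_like_sa_key() {
    [ -f "$1" ] && grep -q '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# Hypothetical helper: number of whole uncached imports that fit in the free tier.
# $1 = pages OCR'd per import, $2 = free images per month
# Assumes one page is billed as one image.
imports_within_free_tier() {
    echo $(( $2 / $1 ))
}

# Usage, from the demo-localhost collection directory:
#   looks_like_sa_key etc/google-sa-credentials-key.json || echo "key missing or malformed" >&2
#   imports_within_free_tier 86 5000
```

With the figures quoted above, the second call prints 58, since 58 × 86 = 4,988 images. Because OCR results are cached, repeated imports of unchanged pages should use less of the quota than this estimate suggests.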