Virtual Machine Export


If you would like to experiment with your own version of this digital library, download the following OVA file and import it into your favourite virtual machine platform or cloud service.
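If VirtualBox is your chosen platform, you can import the OVA from the command line with VBoxManage import. The sketch below assumes VirtualBox. The import_ova function and the filename intermuse-demo.ova are hypothetical, so substitute the name of the file you actually downloaded. Other platforms provide their own import tools.

```shell
#!/usr/bin/env bash
# Sketch only: assumes VirtualBox is the chosen VM platform.
# import_ova is a hypothetical helper; the OVA filename is whatever you downloaded.
import_ova() {
    local ova="$1"
    if [ ! -f "$ova" ]; then
        echo "OVA file not found: $ova" >&2
        return 1
    fi
    if ! command -v VBoxManage >/dev/null 2>&1; then
        echo "VBoxManage not found; import $ova with your VM platform's own tools" >&2
        return 2
    fi
    # 'VBoxManage import' creates a new VM from the OVA appliance.
    VBoxManage import "$ova"
}

# Example (hypothetical filename):
#   import_ova intermuse-demo.ova
```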

A full rebuild of the provided digital library collection, demo-localhost, requires the credentials of a Google service account. These credentials are used to OCR the images stored in the collection's import folder. If you are experimenting with collection building, the recommended approach is to work up to testing a full rebuild.
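The staged approach can be sketched as a small wrapper. This is a hypothetical helper: rebuild_stage is not part of Greenstone. It assumes you have sourced SETUP.bash and are in the collection directory that contains the ACTIVATE.sh, BUILDCOL.sh and IMPORT.sh scripts used in the steps that follow. The steps are chained with &&, so a later step only runs if the earlier one succeeded.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for working up to a full rebuild in stages.
# Run from the collection directory, after sourcing SETUP.bash.
rebuild_stage() {
    case "$1" in
        activate)   # needs an already-built 'building' directory
            ./ACTIVATE.sh ;;
        buildcol)   # needs an 'archives' directory; regenerates 'building'
            ./BUILDCOL.sh && ./ACTIVATE.sh ;;
        full)       # needs the Google service account key for OCR
            ./IMPORT.sh && ./BUILDCOL.sh && ./ACTIVATE.sh ;;
        *)
            echo "usage: rebuild_stage activate|buildcol|full" >&2
            return 2 ;;
    esac
}

# Example:
#   rebuild_stage buildcol
```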

Start by taking an already-built collection and activating it:

    cd greenstone3-svn
    source ./SETUP.bash
    cd web/sites/intermuse/collect/demo-localhost
    wget https://intermuse.sowemustthink.space/greenstone3/library/sites/intermuse/export/building.tar.gz
    tar xvzf building.tar.gz

    ./ACTIVATE.sh
And then view the collection: http://localhost:8383/greenstone3/library/collection/demo-localhost/
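You can check from the command line that the collection page is being served. The following is a minimal sketch that assumes curl is installed. The wait_for_url function is a hypothetical helper, and its defaults of 10 attempts spaced 3 seconds apart are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it can be fetched successfully.
# Assumes curl is installed; --fail makes HTTP errors (e.g. 404) count as failures.
wait_for_url() {
    local url="$1" tries="${2:-10}" delay="${3:-3}"
    local i
    for ((i = 1; i <= tries; i++)); do
        if curl --silent --fail --output /dev/null "$url"; then
            echo "OK: $url"
            return 0
        fi
        sleep "$delay"
    done
    echo "Not reachable after $tries attempts: $url" >&2
    return 1
}

# Example:
#   wait_for_url http://localhost:8383/greenstone3/library/collection/demo-localhost/
```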

Next, experiment with running buildcol.pl, here invoked through BUILDCOL.sh. It converts the canonical GreenstoneXML representation stored in the archives directory into the necessary indexing and database structures, and stores the result in a freshly generated building directory:

    # Assuming you have run SETUP.bash and are still in the same directory as above, then ...
    wget https://intermuse.sowemustthink.space/greenstone3/library/sites/intermuse/export/archives.tar.gz
    tar xvzf archives.tar.gz

    ./BUILDCOL.sh
    ./ACTIVATE.sh
The buildcol process typically takes a few minutes to run. Once the collection is activated, view it at: http://localhost:8383/greenstone3/library/collection/demo-localhost/
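Buildcol and import can each run for several minutes, so it can be useful to keep a log of each step and note how long it took. Below is a sketch of a hypothetical run_step helper, which is not part of Greenstone and assumes bash. It copies the step's output to a log file with tee and returns the step's own exit status rather than tee's.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one build step, log its output, report elapsed time.
# Requires bash (uses PIPESTATUS and SECONDS).
run_step() {
    local step="$1"
    local log="${2:-$(basename "$step" .sh).log}"   # e.g. BUILDCOL.log
    local start=$SECONDS
    "$step" 2>&1 | tee "$log"
    local status=${PIPESTATUS[0]}   # exit status of the step itself, not of tee
    echo "$step exited with status $status after $((SECONDS - start))s (log: $log)" >&2
    return "$status"
}

# Example:
#   run_step ./BUILDCOL.sh && ./ACTIVATE.sh
```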

To fully rebuild the collection, download the JSON key file for a Google service account and store it as:

    greenstone3-svn/web/sites/intermuse/collect/demo-localhost/etc/google-sa-credentials-key.json
And then run:

    # Assuming you have run SETUP.bash and are still in the same directory as above, then ...
    ./IMPORT.sh && ./BUILDCOL.sh && ./ACTIVATE.sh
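A full rebuild cannot OCR the images without the key, so a quick pre-flight check can catch a missing or misplaced file before the import starts. The following is a minimal sketch. It assumes you run it from the demo-localhost directory and that the key is a standard service account key file: a JSON object containing "type": "service_account" and a private_key field. The check_sa_key function is a hypothetical helper, and its grep-based test is a rough check rather than full JSON validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the Google service account key.
# Rough text check, not full JSON validation. Run from the collection
# directory (demo-localhost) so the default relative path resolves.
check_sa_key() {
    local key="${1:-etc/google-sa-credentials-key.json}"
    if [ ! -s "$key" ]; then
        echo "Missing or empty key file: $key" >&2
        return 1
    fi
    # Service account keys contain "type": "service_account" and a private_key field.
    if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key" ||
       ! grep -q '"private_key"' "$key"; then
        echo "Does not look like a service account key: $key" >&2
        return 1
    fi
    echo "Key looks OK: $key"
}

# Example:
#   check_sa_key && ./IMPORT.sh && ./BUILDCOL.sh && ./ACTIVATE.sh
```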
The import process is more computationally intensive than buildcol. For the two documents shipped with the demonstration collection, expect IMPORT.sh to take around 10 minutes to run. The import process does have a cache, for example storing the results returned from OCRing with the Google Vision API, so subsequent runs of importing will be faster. At the time of writing (29 Sept 2023), the Google Vision API includes a free tier of 5,000 images per month before any costs are charged. The demo-localhost example collection contains 86 pages that will be OCR'd when importing is run.
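Here is a rough worked example using the figures quoted above. If every import OCRs all 86 pages (for instance, with the cache cleared), the free tier covers 58 full imports per month, with 12 images to spare.

```shell
#!/usr/bin/env bash
# Worked arithmetic using the figures quoted above (free tier as of 29 Sept 2023).
free_tier_images=5000   # images per month before charges
pages_per_import=86     # pages OCR'd in demo-localhost when nothing is cached

full_imports=$(( free_tier_images / pages_per_import ))
leftover=$(( free_tier_images - full_imports * pages_per_import ))
echo "Uncached full imports within the free tier: $full_imports ($leftover images spare)"
```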