[INDOLOGY] Projects, websites, universities
Claudius Teodorescu
claudius.teodorescu at gmail.com
Fri Jul 18 09:25:36 UTC 2025
Erratum:
"GitLab offers slightly more features than GitLab" is to be read as "GitLab
offers slightly more features than GitHub".
On Fri, 18 Jul 2025 at 12:23, Claudius Teodorescu <
claudius.teodorescu at gmail.com> wrote:
> Dear all,
>
> To add just a small technological note to what Dominik rightfully
> mentioned above, I would like to point that, even if it is not so famous,
> GitLab offers slightly more features than GitLab. Disclaimer: I am not
> associated in any way with GitLab, but I am just sharing the conclusions I
> got after publishing 16 digital resources (dictionaries, dictionary corpus,
> GRETIL search interface, and a search interface for the Indology mailing
> list).
>
> Besides the more convenient technical features GitLab has, it features a
> community edition which can be installed anywhere, by anyone. And this is
> pure gold for the research projects, especially after the funding is over.
>
> At the Academy of Mainz, Germany, where I currently work, ALL the digital
> resources we produce are stored in Git-type repository, in a GitLab server
> that is made available at the land's level, for academic purposes. When I
> worked with the Heidelberg University we had a GitLab instance, too, for
> the digital resources we developed. This is also currently the case with
> the universities or university libraries, respectively, of Bielefeld,
> Dresden, Paderborn, and I am sure for others in Germany. I think this is a
> model that can be successfully used.
>
> So, my technical advice would be:
> 1. Use Git-type repositories for research data of any kind: raw research
> data (scans, audio files, etc.), processed research data (transcriptions in
> TEI format, etc.), and published research data (transcriptions in HTML
> format, static indexes for searching in the browser, audio files in
> web-friendly formats, images in web-friendly formats, etc.).
> 2. Separate the raw and processed data to the published data, by storing
> the published data only in Pages storage and regenerating it from the
> processed data at any change of the processed data. Do not store data in
> the publishing format, which is usually HTML.
> 3. Keep the research data in a repository that is separated to the
> repository with the static website that presents the digital resource In
> this way, one can have more views of the data, namely: more publishing
> formats (HTML, PDF, ePub), linking of the respective digital resource with
> other resources, in a corpora, etc. Also, one will not have to process
> again the data for every tiny change of the presentation website.
>
> Key concepts to take away: *Git Storage of data*, *static publishing of
> data*.
>
> Examples of static digital resources that I published (the search is done
> with an in-browser search engine I made and static indexes):
> Search interface for the GRETIL corpus
> <https://claudius-teodorescu.gitlab.io/gretil-corpus-site/>
> Search interface for the Indology mailing list
> <https://wujastyk.github.io/INDOLOGY-forum-website/>
>
> Best regards,
> Claudius Teodorescu
>
> On Fri, 18 Jul 2025 at 02:32, Dominik Wujastyk via INDOLOGY <
> indology at list.indology.info> wrote:
>
>> The recent scare about the Nepal-German descriptive cataloguing data has
>> yet again shown that universities are unreliable hosts for the preservation
>> of digital resources. Long-term members of this forum might remember when
>> U. Washington shockingly unplugged Blackbox
>> <https://blackbox.hacc.washington.edu/>, a major repository of
>> Indological texts, fonts, and software in the 1990s, maintained by Tom
>> Ridgeway. The Indology website and archive of etexts used to live on a
>> machine at UCL, but it was closed after I left that university, without
>> consulting or warning me. The University of Cambridge refused to continue
>> hosting John Smith's Bombay website when he retired. The Indology website
>> has been able to take over <https://bombay.indology.info/> the hosting
>> of that incredibly valuable asset (think: Pabuji, fonts, software,,
>> Mahābhārata, Rāmāyana) . There are many more examples of websites and
>> archives being just shut down when faculty members move, or a couple of
>> years after project funding ends.
>>
>> I don't have a great answer to any of this. But I do think that the
>> persistence of digital assets is a vital question for us all and something
>> we should all think about carefully if we put valued resources on the
>> internet.
>>
>> For example, the results of my last funded project are accessible through
>> http://sushrutaproject.org. I have tried to push as much important
>> stuff as I can tot Github and Zenodo (and http://archive-it.org). But
>> as soon as I stop paying personally for the registration and hosting at
>> sushrutaproject.org that gateway website will close down within a few
>> weeks. Do I really want to keep paying, out of my pocket, for the rest of
>> my life?
>>
>> For the new project that I announced yesterday, I've built the website at
>> Github. This costs nothing, so the worry about annual payments is gone.
>> And as far as I can tell from reading the Github documentation, they do not
>> delete project repositories, even if they become inactive. If repositories
>> are "archived" they simply become read-only (docs
>> <https://docs.github.com/en/repositories/archiving-a-github-repository/archiving-repositories>).
>> Github seems very enlightened about long-term preservation. They save
>> offline archives at the bottom of a mineshaft in Svalbard - I kid you not.
>> There are many good features for project work at Github, and it's all
>> free. Building a simple one-page website is also extremely quick and
>> easy. Building something more complicated, with menus etc., is more
>> troublesome. It's significantly harder than using Wordpress (the Microsoft
>> Word of website creation). But the up-side is that you get a website that
>> will persist for years and doesn't require payment.
>>
>> Within the university world, departments and computing centres are not
>> going to look after project data and websites in the long term. The one
>> institution that actually does think long-term is the university library.
>> But as far as I can tell, most university libraries are still working out
>> what their place is in the digital landscape.
>>
>>
>> Best,
>> Dominik
>>
>>
>> --
>> Dominik Wujastyk, Professor Emeritus, Classical Indian History
>> University of Alberta
>>
>> "The University of Alberta is committed to the pursuit of truth,
>> the advancement of learning, and the dissemination of knowledge
>> through teaching, research and other scholarly and creative activities
>> and service."
>> -- Collective Agreement
>> <https://www.ualberta.ca/human-resources-health-safety-environment/media-library/my-employment/agreements/2020-2024-collective-agreement---working-version.pdf>
>> 3.01
>>
>>
>> _______________________________________________
>> INDOLOGY mailing list
>> INDOLOGY at list.indology.info
>> https://list.indology.info/mailman/listinfo/indology
>>
>
>
> --
> Cu stimă,
> Claudius Teodorescu
>
--
Cu stimă,
Claudius Teodorescu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://list.indology.info/pipermail/indology/attachments/20250718/7ec2a7f1/attachment.htm>
More information about the INDOLOGY
mailing list