Your help is needed! Help me collect data on light novels

„In the database, database / I’m struggling in the database Wow Wow / It doesn’t even matter if there is no hope / As the madness of the system grows“ is what Man with a Mission sang in their song „Database“. I like the song, but I am also struggling to build a database of English and German light novels.

Me, literally struggling in the database, or rather not finding the time to finish a project I have wanted to do for a long time. Also, this is one of the rare English posts on this blog. Enjoy!

What is this? What am I doing?

My basic idea came from trying out bookwyrm, a fediverse software that is kind of like Goodreads. My only real problem was that light novels and manga are rarely found in the databases bookwyrm supports as data sources. Books can be added to an instance by grabbing the data from other bookwyrm instances, from Open Library (a project of the Internet Archive) or from inventaire.io. I could of course add all the books to bookwyrm itself, like I did for the List of German Light Novels, but I thought it would be better to have all that info in a publicly accessible and usable form, so it could be used for other projects as well.

My first thought was contributing to the Internet Archive, but their archaic mailing list infrastructure quickly deterred me from getting in further contact with them. The people at inventaire were quite approachable and are even okay with me submitting a bunch of CSV files for the initial addition of many, many light novels. The cool thing about inventaire is also that all the data makes its way to Wikidata as well. So I decided to donate the data to inventaire.

I built a web scraper to grab as much data as I can from the official websites of publishers such as Yen Press, J-Novel Club and Seven Seas. I am not proficient with scraping, but I got results quite quickly.

How can you help me?

The data quality delivered by my bad scraper software is rather „meh“ at best, so I will need to do lots of manual work checking the data and correcting mistakes. I don’t really want to build a better scraper right now, since the initial data drop will be big but afterwards it’s „just“ adding new releases.

Since I keep getting stuck on this project (it has been in the making for about 2 years, and I have had to renew the data from scratch about twice because it got outdated), I am now looking for your help. I want this to actually get done and contributed to inventaire in a reasonable timeframe.

Your contribution to this can be super simple.

You can check out the data I have already collected over on my Codeberg repo for this project. You may look through the data yourself, correct any mistake you find and add any book that is missing from the list. Just clone the repo, make any changes you like and contribute them back via a pull request.

Of course you can also make changes to the files directly without Codeberg and send them to me some other way. But Codeberg would be my preferred way of going about this project.
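If you would rather not check every row by eye, a small script can flag the obvious problems first. The following is only a minimal sketch and not part of the repo: the column names `title`, `isbn` and `publisher` are assumptions and may not match the actual CSV files, and the helper names are made up for illustration. It checks that the required fields are filled in and that each ISBN-13 has a correct check digit.

```python
import csv
import io


def is_valid_isbn13(raw: str) -> bool:
    """Return True if raw is a 13-digit ISBN with a correct check digit."""
    digits = raw.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # ISBN-13 rule: weights alternate 1, 3 across all 13 digits,
    # and the weighted sum must be divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def find_problems(csv_text, required=("title", "isbn", "publisher")):
    """Return (line_number, message) pairs for rows that need a manual look.

    The column names in `required` are hypothetical; adjust them to the
    header of the file you are checking.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in required:
            if not (row.get(col) or "").strip():
                problems.append((line_no, f"missing {col}"))
        isbn = (row.get("isbn") or "").strip()
        if isbn and not is_valid_isbn13(isbn):
            problems.append((line_no, f"invalid ISBN-13: {isbn}"))
    return problems
```

To use it, read one of the CSV files into a string and pass it to `find_problems`. A passing check digit only means the number is well-formed, not that it belongs to the right book, so it narrows down the manual checking rather than replacing it.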

Right now I am only looking to add German and English editions of the light novels to inventaire, since I don’t speak any Japanese. I would love to rebuild something like lndb, but that is really a pipe dream.

What else is there to know?

Make sure not to use any data from commercial databases like isbnDB; their license does not allow the use of their data outside of their service, and I am not really looking for legal trouble here. The data needs to be obtained from publicly available sources, ideally from the publishers themselves. Only that way can I donate and contribute the data to inventaire.

You will need to be okay with publishing the data you collect under CC0. Otherwise I cannot accept your pull request or data.

Want to talk to me about this or have any ideas?

Feel free to get in touch with me if you have any inquiries about this project.

You can reach me via different means. Choose the one you like most:
Email: stefan@lightnovel-dungeon.de
Discord: @stefan230
Matrix: @stefan:chat.lightnovel-dungeon.de