Developing a Federated Metadata System: Part 1
January 31, 2026
Let me preface this blog series by saying a few things:
- I am not very good at Rust
- I am certainly not an expert in federation/protobufs/distributed computing/etc
- Most importantly though, I am tired of having to grab metadata for ebooks, audiobooks, comics, etc. from 18 different sources, all of which structure their data uniquely, have hidden or nonexistent APIs, or impose nearly unusable rate limits, especially for large libraries. Which obviously means it's time to build a 19th source.
A few things should set this project apart from existing systems, the first being that it will be entirely open source. Some services provide very good metadata for certain media (e.g. Comicvine for comics), and some provide very accessible databases (e.g. Hardcover for ebooks), but each lacks in other areas. Hopefully, clarity of purpose and implementation (i.e. not being designed simply as a backing system for another product) will let us build something more robust: more usable, more open, and more complete.
The starting point here will be picking a universal standard for the metadata structure. Personally, I think the best way to accomplish this is to use Schema.org models from the outset. An open standard for how the data is represented (with no deviation, or deviating only when the project absolutely requires it, and documenting that deviation well) is pretty much the goal of this project anyway, and even though the Schema.org schemas are probably overkill, they should make the system more adaptable in the future.
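To make that concrete, here's a minimal sketch of how a Schema.org `Book` might map onto a Rust struct. The field names follow Schema.org properties (`name`, `isbn`, `author`, `datePublished`); the struct itself, the flattening of `author` to plain strings, and the `is_matchable` helper are all illustrative assumptions, not anything from the actual project.

```rust
// Hypothetical internal model mirroring Schema.org's Book type.
// Comments note which Schema.org property each field corresponds to.
#[derive(Debug, Clone, PartialEq)]
struct Book {
    name: String,                   // Schema.org: name
    isbn: Option<String>,           // Schema.org: isbn
    authors: Vec<String>,           // Schema.org: author (flattened to names)
    date_published: Option<String>, // Schema.org: datePublished (ISO 8601)
}

impl Book {
    // A record is only useful for matching against other sources if it
    // has a title plus a strong identifier or an author to disambiguate on.
    fn is_matchable(&self) -> bool {
        !self.name.is_empty() && (self.isbn.is_some() || !self.authors.is_empty())
    }
}

fn main() {
    let book = Book {
        name: "The Hobbit".to_string(),
        isbn: Some("978-0345339683".to_string()),
        authors: vec!["J. R. R. Tolkien".to_string()],
        date_published: Some("1937-09-21".to_string()),
    };
    assert!(book.is_matchable());
}
```

Sticking close to the Schema.org property names like this should make deviations easy to spot and document, which is the whole point of adopting the standard.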
When I first started this post, I didn't have any code written. Over the past couple of days, I've spent some time figuring out Rust best practices for project structure and experimenting with different libraries to get familiar with the options.
Currently, the crates I'm using are Diesel for the ORM (I tried rusqlite, but I'm just not a huge fan of writing raw SQL, especially as a migration log), axum for the webserver, and pgp-lib. More dependencies will probably be added later, and some of these might change, but I'm pretty confident I'm going to stick with SQLite for the db (unless we run into performance issues) in the interest of keeping deployment as simple and monolithic as possible.
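For anyone following along, a dependency section matching that crate list might look something like the sketch below. The version numbers and feature flags are my guesses at reasonable current values, not what the project actually pins:

```toml
# Illustrative Cargo.toml [dependencies] section -- versions are assumptions.
[dependencies]
axum = "0.7"
diesel = { version = "2", features = ["sqlite"] }  # SQLite backend, per the post
pgp-lib = "*"  # version deliberately unpinned here
```

Diesel's `sqlite` feature keeps everything in-process, which fits the single-binary, monolithic deployment goal.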
My planned design is a gRPC/protobuf-compatible API for inter-node communication and replication, and a REST/JSON API for external access. I might end up scrapping the gRPC side if it doesn't seem worth the maintenance, but based on what I've seen in the axum docs and elsewhere, I don't think it will add too much overhead. I've never worked with protobufs before, but I've got some models set up and I like how declarative everything is.
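That declarative style is easiest to show with an example. Here's a hedged sketch of what an inter-node replication service could look like in proto3; the package name, message fields, and RPC are all hypothetical, not the project's actual `.proto` files:

```protobuf
// Hypothetical proto3 schema for pushing a metadata record to a peer node.
syntax = "proto3";

package kleya.v1;

message BookRecord {
  string name = 1;            // Schema.org: name
  string isbn = 2;            // Schema.org: isbn
  repeated string authors = 3;
  string date_published = 4;  // ISO 8601
}

message PushAck {
  bool accepted = 1;
}

service Replication {
  // Replicate a single record to a peer; the peer acks acceptance.
  rpc PushRecord(BookRecord) returns (PushAck);
}
```

One nice property of this approach: if the gRPC layer does get scrapped, the message definitions can still serve as the canonical shape for the REST/JSON API.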
In conclusion, this might be a largely useless project, and I might be over-engineering, or under-engineering, or just wildly off base on how to do stuff, but I'm having fun!
You can check out the project at https://github.com/cmathews393/kleya