I am trying to create something somewhat similar to Syncthing, primarily I am struggling to understand how can I store file directories in a database efficiently.
If it is stored in such a manor how is a change in the root directory reflected onto the database? For example if I rename the photos directory with a lot of sub directories, then all of the sub directories path would need to change. I’d reckon this is not how things work, but then how do things work?
With regards to the general question of storing directory path names efficiently in a database, a lot depends on the type of database being used and how compact it needs to be.
For example, a traditional RDBMS with rectangular tables consisting of rows and columns where the goal is for the database to be fully normalized, you’d store each parent directory only once with a unique immutable ID (with the directory name just a changeable value) and each child directory referencing its parent by its ID rather than by name. Each child directory’s name is also simply a changeable value.
Some modern file systems use a B-tree for their directory structure:
Thanks for the quick reply, does this scales well for nested sub directories? Wouldn’t this would very expensive if the root have millions of nested sub directories(e.g. if I synced my node_modules).
That’s not a concern. Easy answer: Because no matter how inefficient, doing such large changes will be much more costly still at the protocol/application layer. And secondly because this isn’t that inefficient of a db operation for a leveldb (and I think most kv DBs work similarly at the lowest level): Delete (well mark deleted) a contiguous section of keys, and add a new one.
Expanding more on that, we don’t know it’s a directory rename until we’ve scanned and hashed all the contents of the new directory and compared it with the old. At that point the cost of changing records in the database is moot.