I really think the solution here is to change the way the filesystem is modeled and exchanged. I understand this is a pain in the neck and lots of work, but I think something along what I’m describing is the best long-term solution that is ‘future proof’.
Realistically it would take a few months to overhaul, so I’m not sure whether this is a realistic expectation . I’m going to describe it anyway , since I’m not clear on whether this is feasible given resources and constraints. But I do think it would solve a bunch of ‘structural’ issues in the long term.
Forgive me if some of this is already implemented – I’m Go-illiterate at the moment, so I can’t tell what’s currently in production.
So a filesystem is a tree, right? – it can be expressed as tree Abstract Data Type where each node is either a file or folder, and each node can have a variety of attributes /metadata if necessary.
I’m really actually just proposing a Merkle Tree or Tiger Tree , something along those lines , that’s been customized for our own needs.
Hash Trees are used in the ZFS file system,[5] BitTorrent protocol, Apache Wave protocol,[6] Git distributed revision control system, the Tahoe-LAFS backup system, the Bitcoin peer-to-peer network, and a number of NoSQL systems like Apache Cassandra and Riak.[7]
Tiger tree hashes are used in Gnutella, Gnutella2, and Direct Connect P2P file sharing protocols and in file sharing applications such as Phex, BearShare, LimeWire, Shareaza, DC++[8] and Valknut.[citation needed]
Anyway – a folder might contain it’s name as metadata, on the most basic level.
Now here’s what I propose:
(1) Model any sync’d filesystem as a custom tree with files and folders.
Example of Filesystem as Tree (Files and Folders)
(2) Ensure that every node in the tree has a hash value which is dependent upon it’s own attributes and the attributes of all it’s children to an arbitrary depth.. In other words, a Merkle-Tree like data structure.
(2.0) Permit efficient serialization and de-serialization of the tree for network transport.
**Serialization of Filesystem Tree to Serialized Object Suitable for Network Transport **
(2a). Elaboration (file as a leaf node):
-Let Fi be a File. -Let Fi.data be the file’s data. -Let Fi.len be the file’s length in bytes. -Let Fi.name be the filename. -Let Fi.modified be a file modification timestamp.
-Let HASH(Fi) be a hash of a File for security and versioning purposes. -Let HASH be comprised of (minimally) a CRC64 checksum or other fast hashing algorithm, but preferably a crypographically secure hash of the form SHA1, SHA2, SHA3, RIPEMD, or otherwise.
-Let HASH preferably be a fast calculation with collision resistance. -Let HASH be SHA1 or SHA512 as the best compromise between speed and security.
-Let HASH(Fi) be a message digest of the file and (optionally) metadata properties.
-For example, HASH(Fi) = SHA1(Fi.data + Fi.len + Fi.name + Fi.modified) where ‘+’ is a concatenation operation of any form, or optionally and less preferably an XOR operation.
We can re-run the hash ONLY when the modified timestamp Fi.modified changes AND/OR when Fi.name changes , so we don’t chew up the CPU monitoring the filesystem.
Okay , so the above ensures that each file gets a hash value that reflects it’s state at a given time. This let’s us detect changes.
Next , we do the exact same operation as above, but where an Internal node of type Folder contains children of type Files or Folders. in this case, we can define…
(2b). Elaboration (folder as internal node):
-Let Fo be a Folder with 1 or more child nodes. -Let Fo.data be undefined. -Let Fo.len be undefined. -Let Fo.name be the folder name. -Let Fo.num_children be the number of child nodes of the folder (whether file or folder type). -Let Fo.modified be a folder modification timestamp.
-Let HASH(Fo) be a hash of a Folder for security and versioning purposes.
-Let HASH be comprised of (minimally) a CRC64 checksum or other fast hashing algorithm, but preferably a crypographically secure hash of the form SHA1, SHA2, SHA3, RIPEMD, or otherwise.
-Let HASH preferably be a fast calculation with collision resistance.
-Let HASH be SHA1 or SHA512 as the best compromise between speed and security. -Let HASH(Fo) be a message digest of the folder, all it’s children’s hashes, and (optionally) metadata properties.
-For example, HASH(Fo) = SHA1(Fo.name + Fo.num_children + Fo.modified + HASH(Fo.child[0]) + … + HASH(Fo.child[n]) where ‘+’ is a concatenation operation of any form, or optionally and less preferably an XOR operation. Let n be the number of direct child nodes of the folder, where the folder hash is dependent on the hashes of all it’s children to any tree depth.
Anyway so that solves the issue on a local level… we represent the filesystem as a tree, and everybody gets a hash that is dependent upon file data and/or all of the nodes beneath them (whether they are files or folders).
This way, a change to a file at arbitrary depth propagates up the tree to the root node, which in turn triggers an explicit protocol exchange message of type ‘Filesystem Sync’ or something like that, with the packet data containing either (1) the hash value of the Filesystem root node for comparison purposes, and/or (2) the serialized filesystem tree in it’s entirety for more detailed analysis by a peer of ‘what’s changed in my peer’s FS relative to what I have locally in my FS’.
Here is where the controversial / non-obvious part comes in… And I don’t claim this is the best solution either , but it seems the most straightforward one to me…
INSTEAD of implicitly defining the tree in the BEP protocol, where the tree is shared as a sort of ‘nested array’ of arbitrary depth (the way it works now is the tree is sort of compressed to a FAT or vFAT style array)… But there is alot of redundancy here, and we aren’t explicitly defining hashes for EVERYTHING …
We EXPLICITLY share the serialized tree representation of the filesystem when it’s necessary to do so. This exchange now becomes it’s own message or series of messages which are specific ONLY to sharing the filesystem model among the p2p network for purposes of synchronization.
You’ll see in a moment is that I think we should have some form of hashes that are shared all the way down to the block / chunk level. But at the very least, we need it down to the node level , for all files and folders we are sync’ing.
So instead, we make an entirely new BEP message or messages, which EXPLICITLY shares the serialized tree under the following conditions:
(1) At the start of a session (2) Whenever a change is detected in the tree (meaning a hash value changes and propagates up to the root node) via a filesystem watcher, FUSE event trap, (or other mechanism, implementation not important here). (3) Optionally, at periodic intervals…
So that means we have an EXPLICIT message / packet in BEP which contains the serialized tree (along with all of the hash values). To save space / traffic, we can start by sharing the hash value of the root node, which implicitly tells any and all peers whether anything has changed in the nodes below.
If changes have occurred, the nodes then exchange serialized versions of the file system tree, and each independently evaluate and process what has changed… Each peer can determine if they have the most current version of the FS once they receive ‘tree models’ from all other peers (this is done by evaluating in-memory model of a peer’s respective filesystem vs the serialized ‘update’ packet which has arrived over the wire from one or more other peers).
The serialized filesystem packet is de-serialized and the in-memory structures are compared – but not compared by name, but ONLY by hash. Here there can be some optimizations made via traversal methods , where we can safely ignore certain child sub-trees if the hashes match.
This property makes traversal fast, even for an extremely large filesystem with many small files. The only slow part (arguably) is the initial evaluation and exchange of FS tree models.
In any case – Update message occurs – The tree is traversed, and any changes on a leaf-node (file type) level are queued to be requested as updated versions of the file by a peer. The file is divided into blocks, and requested as normal in bittorrent style.
The only issue I see here is defining who is the ‘authoritative’ source if the file changes simultaneously on two nodes, resulting in confusion as to which is the ‘official’ or ‘current’ version , since we just are comparing hashes.
I think this defines things well enough to get the idea of what I propose … There is one additional update , but this part is optional.
(3) Share block-level checksus or hashes of each file via explicit protocol messages, either as part of tree serialization or as specific messages for files.
This lets clients only request areas of the file which have actually changed… If chunks of the file remain the same, then there will be no exchange of data packets (unless another node explicitly requests for a chunk).
I guess the overall idea means moving towards modelling files based on their hash values, which forms their ‘identifier’ for purposes of data exchange and synchronization.
This process opens the door to convergent encryption , as a side effect.
****IMPLEMENTATION IN JAVA OF A FILESYSTEM TREE ****
public class MXMTree {
MXMNode root; MXMNode commonRoot;
public MXMTree( MXMNode root ) { this.root = root; commonRoot = null; }
public void addElement( String elementValue ) { String[] list = elementValue.split("/");
// latest element of the list is the filename.extrension root.addElement(root.incrementalPath, list);
}
public void printTree() { //I move the tree common root to the current common root because I don't mind about initial folder //that has only 1 child (and no leaf) getCommonRoot(); commonRoot.printNode(0); }
public MXMNode getCommonRoot() { if ( commonRoot != null) return commonRoot; else { MXMNode current = root; while ( current.leafs.size() <= 0 ) { current = current.childs.get(0); } commonRoot = current; return commonRoot; }
}
}
import java.util.ArrayList; import java.util.Arrays; import java.util.List;
public class MXMNode {
List<MXMNode> childs; List<MXMNode> leafs; String data; String incrementalPath;
public MXMNode( String nodeValue, String incrementalPath ) { childs = new ArrayList<MXMNode>(); leafs = new ArrayList<MXMNode>(); data = nodeValue; this. incrementalPath = incrementalPath; }
public boolean isLeaf() { return childs.isEmpty() && leafs.isEmpty(); }
public void addElement(String currentPath, String[] list) {
//Avoid first element that can be an empty string if you split a string that has a starting slash as /sd/card/ while( list[0] == null || list[0].equals("") ) list = Arrays.copyOfRange(list, 1, list.length);
MXMNode currentChild = new MXMNode(list[0], currentPath+"/"+list[0]); if ( list.length == 1 ) { leafs.add( currentChild ); return; } else { int index = childs.indexOf( currentChild ); if ( index == -1 ) { childs.add( currentChild ); currentChild.addElement(currentChild.incrementalPath, Arrays.copyOfRange(list, 1, list.length)); } else { MXMNode nextChild = childs.get(index); nextChild.addElement(currentChild.incrementalPath, Arrays.copyOfRange(list, 1, list.length)); } } }
@Override public boolean equals(Object obj) { MXMNode cmpObj = (MXMNode)obj; return incrementalPath.equals( cmpObj.incrementalPath ) && data.equals( cmpObj.data ); }
public void printNode( int increment ) { for (int i = 0; i < increment; i++) { System.out.print(" "); } System.out.println(incrementalPath + (isLeaf() ? " -> " + data : "") ); for( MXMNode n: childs) n.printNode(increment+2); for( MXMNode n: leafs) n.printNode(increment+2); }
@Override public String toString() { return data; }
}
public class Test {
/** * @param args */ public static void main(String[] args) {
String slist[] = new String[] { "/mnt/sdcard/folder1/a/b/file1.file", "/mnt/sdcard/folder1/a/b/file2.file", "/mnt/sdcard/folder1/a/b/file3.file", "/mnt/sdcard/folder1/a/b/file4.file", "/mnt/sdcard/folder1/a/b/file5.file", "/mnt/sdcard/folder1/e/c/file6.file", "/mnt/sdcard/folder2/d/file7.file", "/mnt/sdcard/folder2/d/file8.file", "/mnt/sdcard/file9.file" };
MXMTree tree = new MXMTree(new MXMNode("root", "root")); for (String data : slist) { tree.addElement(data); }
tree.printTree(); }
}
REFERENCES
[A] Plots of Hash Algorithm Speed vs Type for files
[B] Discussion and Benchmarks of Speed vs Security of Hashing algorithms
[C] Java Tree to represent File / Folder Filesystem Source Code