This is currently a work in progress. Some source has been released, but it still needs some work. See the end of the page for details.
As always on the Arduino platform, the library would need to use as little memory and program space as possible. The SD card memory is a solid state device, so accessing a file in random order does not add a significant amount of time to the process, although the link between the SD card and the Arduino is relatively slow, and there is no caching on the Arduino side to speed things up.
What we need is a quick algorithm for finding one item in a large list.
Choose a number between 1 and 1000. What if I bet you $1000 that I can discover your number by asking you no more than ten questions? Obviously my first question would be “what is your number?” . But now we restrict my questions to simply guessing a number, and your reply will be “higher”, “lower”, or “yes”. Would I still be guaranteed to win in ten guesses or less? Surprisingly, I would win every time (assuming you always told the truth).
It's difficult to find a needle in a haystack, but if the haystack is half the size, the task is twice as easy. So my first guess will be “is it 500?” and depending on your answer, my next guess will be 250 or 750. Each time I guess, I eliminate half as many possibilities as the previous guess, as I “home in” on your number. So in ten guesses I can check 2x2x2x2x2x2x2x2x2x2 possibilities, which is two to the power of ten, or 1024.
So, the goal is to create an index structure that will allow us to use a Binary Search so we can eliminate large chunks of data at each step of the search, and quickly arrive at the key we're looking for.
The interesting bit of coding this is to build the index in an efficient way, so that it will remain quick to search, even though the records to be read or written are presented to the library in any order. We can assume each record will usually be read more times than it is written, so it's more efficient to spend a little time storing things in the “right place” and tidying up as we go, so they can be quickly retrieved later.
Wikipedia has a nice description of AVL Trees, and enough pseudo code to get me started along the right path.
The first version of the library is now available to download. You can find the documentation (such as it is) in the header file. There's no proper documentation yet, and no examples of how to use it. This is a pre-alpha release, and is not intended to be used for anything other than experimentation. Do not store important data using it, it will very probably break and you'll lose it.