Unified Repository
The AIMstor Repository is a multi-functional object secondary storage system that can simultaneously perform multiple types of storage operations. Legacy storage systems are disparate and inefficiently create duplicate copies of data. Some systems try to de-duplicate data but at best this is only effective within one subsystem. The diagram below demonstrates the problem with legacy point solutions.
AIMstor Repository's unique ability to be a Backup, Archiving and CDP store enables it to employ several data reduction techniques delivering massive storage cost savings.
Traditional storage systems require separate storage for Backup, Archiving and CDP, which means the same data resides in several locations. With AIMstor, if a policy applies Backup, Archiving and CDP to the same repository, the data is stored only once. Just as importantly, it is sent only once, reducing the amount of data that gets moved around the system.
Additionally, the repository detects duplicated data that may appear across different datasets and will de-duplicate this, further reducing the storage required.
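As a rough illustration of how identical data can be stored only once, the sketch below uses a content-addressed block store keyed by SHA-256 digest. The class name BlockStore, the tiny 4-byte block size and the method names are assumptions made for this example; the text above does not describe how the AIMstor Repository detects duplicates internally.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: identical blocks are kept once."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # hypothetical; tiny for demonstration
        self.blocks = {}              # digest -> block bytes
        self.datasets = {}            # dataset name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault stores the block only if it has not been seen before
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.datasets[name] = digests

    def get(self, name):
        # Rebuild a dataset from its block references
        return b"".join(self.blocks[d] for d in self.datasets[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
store.put("backup", b"AAAABBBBCCCC")
store.put("archive", b"AAAABBBBDDDD")
logical_bytes = len(store.get("backup")) + len(store.get("archive"))
```

The two datasets share the blocks AAAA and BBBB, so the store holds four unique blocks even though six were written. A real system would use far larger blocks and persistent storage.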
The AIMstor Repository receives real-time updates: changes arrive as the data on the source is modified. These are byte-level changes. For example, if a row in a database is modified, only the row is sent, not the whole database. This capability is not limited to CDP; it is also available to Backup and Archiving.
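The idea of sending only the changed bytes can be sketched as follows. This is a minimal illustration, assuming the source captures each write as an (offset, data) pair and ships that pair to the repository; the function names and the fixed-width record layout are hypothetical and not taken from the product.

```python
def write(source: bytearray, offset: int, data: bytes):
    """Apply a write on the source and return the change to transmit."""
    source[offset:offset + len(data)] = data
    return (offset, data)


def apply_change(replica: bytearray, change) -> None:
    """Replay a captured change on the repository's copy."""
    offset, data = change
    end = offset + len(data)
    if end > len(replica):
        # Grow the replica if the write extends past its current end
        replica.extend(b"\x00" * (end - len(replica)))
    replica[offset:end] = data


# Two fixed-width 16-byte "rows"; the replica starts as an identical copy
source = bytearray(b"id=1;name=alice;id=2;name=bob__;")
replica = bytearray(source)

# Modify the name in the second row: 5 bytes at offset 26
change = write(source, 26, b"carol")
apply_change(replica, change)
bytes_sent = len(change[1])
```

Only the five modified bytes (plus the offset) cross the wire, rather than the full 32-byte dataset.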
The Repository enables the following:
Live Backup - The repository is updated with live updates, negating the need for system-crippling filesystem scans and bulk data transfers. On a periodic basis, snapshots are created on the repository, each creating a point-in-time backup. Unlike traditional full backups, snapshot backups do not duplicate data or cause any overhead on the host system. This offers the opportunity to increase the number of periodic backups. The Repository feeds the Metadata store updates on changes, ensuring that backed-up content is indexed for easy and quick retrieval.
Archiving / Versioning - The repository has the ability to store versions of files. With the appropriate retention rules, this provides granular archiving. Unlike backups, which are taken at set intervals, versions are taken as and when a file changes. For instance, when applied to office documents, every time a document is saved, the version is retained and indexed. This allows a history of a file to be generated, showing how the file changed across its generations. To fully facilitate audits and e-discovery, extra information is retained showing which user and which program modified the file, not just the file owner.
CDP - Provides a method of capturing the state of a system on a very granular basis. CDP shares much of the functionality of Live Backup except data is retained for a shorter period of time. Typically CDP and Live Backup are used in conjunction.
Each Repository has its own Metadata store. A site may have more than one Repository. These will work in a federated fashion allowing searches or restores to pull data from several repositories. On large sites where many repositories are required, it is best to assign repositories according to the type of data they are storing. Keeping like data going to the same repository will yield the greatest data reduction savings.
Post Retention Block Scramble - After retention of a data set or file expires, blocks are released and instantly become unreferenced. This is an uncorrectable file fragmentation at the block level, providing a shred-like effect without the massive performance implication of applying shred writing algorithms broadly.
Unified Metadata Helps . . . a Lot
The Metadata store (often referred to as the MDS) is a metadata index for the objects held in the repository. Every Repository has its own MDS, which is an integral component of that Repository. Unlike traditional storage products, this is not a database but an indexing system similar to that used in search engines. Unlike a database, it does not have the limitation of requiring fixed field definitions. This means that fully indexed fields can be added at any time, enriching the metadata without the restructuring a database would require. It also means that records don't need to reserve space for fields they don't use.
The MDS is closely coupled to the Repository, but it can also be used to index data that is not in its local Repository. It may, for instance, keep the index records for data that may be stored in a cloud.
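To illustrate indexing without fixed field definitions, here is a minimal sketch of a search-engine-style inverted index in which every (field, value) pair maps to the set of objects carrying it. The class MetadataIndex and the field names are assumptions for the example only and do not reflect the actual MDS design.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index: any object may carry any set of fields."""

    def __init__(self):
        # (field, value) -> set of object ids that have that pair
        self.postings = defaultdict(set)

    def add(self, object_id, **fields):
        # New field names need no schema change; they simply become new keys
        for field, value in fields.items():
            self.postings[(field, value)].add(object_id)

    def search(self, **criteria):
        # An object matches only if it appears under every requested pair
        sets = [self.postings.get((f, v), set()) for f, v in criteria.items()]
        return set.intersection(*sets) if sets else set()


index = MetadataIndex()
index.add("obj1", name="report.docx", owner="alice")
# A later object introduces fields the first one never had
index.add("obj2", name="budget.xlsx", owner="bob",
          modified_by="carol", program="excel")

by_program = index.search(program="excel")
by_owner = index.search(owner="alice")
combined = index.search(owner="bob", program="excel")
```

The first object stores nothing for modified_by or program, so no space is reserved for fields it does not use.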

The MDS is fed events from the repository, which informs it of new versions, new snapshots and so on. The MDS retrieves the metadata that the repository holds on a given object and appends its own metadata corresponding to the policy set. For objects that have retention, a retention record is attached to the object. This in effect keeps the object alive. You may have more than one retention record for a given object. Typically this happens if you have two or more policies that intersect. For instance, you may have a policy for all users to keep files for a year and a policy for accounts that requires documents to be held for 7 years. The accounts documents will be stored only once, but each will have two retention records. The documents will not be removed from the system until both retention records have expired.
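The intersecting-policy behaviour described above can be sketched as a small retention tracker in which an object becomes removable only once every retention record attached to it has expired. The class name, policy names and dates are hypothetical and chosen to mirror the one-year and seven-year example; treating an object with no records as removable is also an assumption.

```python
from datetime import date


class RetentionTracker:
    """Toy model: one stored object, any number of retention records."""

    def __init__(self):
        self.records = {}  # object id -> {policy name: expiry date}

    def attach(self, object_id, policy, expires):
        self.records.setdefault(object_id, {})[policy] = expires

    def is_removable(self, object_id, today):
        expiries = self.records.get(object_id, {}).values()
        # Removable only when every attached record has expired
        return all(expiry <= today for expiry in expiries)


tracker = RetentionTracker()
# The same document is covered by two intersecting policies
tracker.attach("invoice.pdf", "all-users-1-year", date(2011, 1, 1))
tracker.attach("invoice.pdf", "accounts-7-years", date(2017, 1, 1))

after_first_expiry = tracker.is_removable("invoice.pdf", date(2012, 6, 1))
after_both_expired = tracker.is_removable("invoice.pdf", date(2017, 1, 2))
```

Here after_first_expiry is False because the seven-year record is still live, and after_both_expired is True.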
Versioning has more extensive functionality in how a file is indexed. Versioning has the notion that a file has generations: the MDS keeps track of how a file changes and relates each version to its previous incarnations. A Version search groups the results so that related files appear together. Hence if you create a file "New Microsoft Word Document.docx", make some changes, rename it to "Fred.docx", make some more changes and then rename it to "Fred Smith annual review 2009.docx", you will be able to see how all these files are related.
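One way to picture this grouping is to give each file a lineage identifier that survives renames, so that a search on any name the file has ever had returns the whole history. The sketch below is only an illustration under that assumption; the class VersionIndex, its methods and the event tuples are hypothetical and not the MDS implementation.

```python
class VersionIndex:
    """Toy lineage tracker: renames keep a file in the same history."""

    def __init__(self):
        self.lineage_of = {}  # current file name -> lineage id
        self.history = {}     # lineage id -> ordered list of (event, name)
        self._next_id = 1

    def save(self, name):
        lineage = self.lineage_of.get(name)
        if lineage is None:
            # First time this name is seen: start a new lineage
            lineage = self._next_id
            self._next_id += 1
            self.lineage_of[name] = lineage
            self.history[lineage] = []
        self.history[lineage].append(("save", name))

    def rename(self, old, new):
        # The lineage id moves to the new name; the history is untouched
        lineage = self.lineage_of.pop(old)
        self.lineage_of[new] = lineage
        self.history[lineage].append(("rename", new))

    def related(self, name):
        # Match on any name that appears anywhere in a lineage
        for events in self.history.values():
            if any(n == name for _, n in events):
                return events
        return []


idx = VersionIndex()
idx.save("New Microsoft Word Document.docx")  # file created
idx.save("New Microsoft Word Document.docx")  # some changes
idx.rename("New Microsoft Word Document.docx", "Fred.docx")
idx.save("Fred.docx")                         # more changes
idx.rename("Fred.docx", "Fred Smith annual review 2009.docx")

events = idx.related("Fred.docx")
names = {name for _, name in events}
```

Searching for the intermediate name "Fred.docx" returns all five events, covering all three names the document has carried.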