opinions on archiving loads of data

2121TrumbullAve
gimme a little kick & snare
Posts: 77
Joined: Sat Aug 12, 2006 8:44 pm
Location: Denver

Post by 2121TrumbullAve » Tue Jul 31, 2007 4:42 pm

The company I work for currently archives each project in its own cardboard box, which contains raw session tracks, EDLs, mixdowns, images, etc., so a typical project will be a 5-disc set with anywhere from 5 GB to 50 GB of material to be archived.

We currently burn it all to DVD and want to refine our process to make archiving easier.

My first thought is to set up new machines on the network dedicated to archiving, fill them with as many of the largest-capacity hard drives as we can, and simply copy the files from our workstations to these archive machines.

Then there's the problem of recalling what's on these drives once they are full and stored away in a closet: how will we label them so we know what's on them?
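One low-tech answer to the labeling problem is to write a manifest for each archive drive when it is filled, keeping one copy on the drive and one on a workstation. The Python below is a minimal sketch, not an established tool: it walks a folder, records each file's relative path, size and SHA-256 checksum, and can save the result as JSON. The function names, the `ARCHIVE-0001`-style label and the mount point are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large audio files never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(drive_label: str, root: Path) -> dict:
    """List every file under root with its relative path, size and checksum."""
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return {"drive": drive_label, "files": files}


def save_manifest(manifest: dict, out_file: Path) -> None:
    """Write the manifest as human-readable JSON."""
    out_file.write_text(json.dumps(manifest, indent=2))


# Example (hypothetical label and mount point):
# save_manifest(build_manifest("ARCHIVE-0001", Path("/Volumes/ARCHIVE-0001")),
#               Path("ARCHIVE-0001.json"))
```

With the manifests kept on a workstation you can search for a session without plugging in any drives, and the checksums let you confirm later that a file read back from a shelved drive is intact.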

Would external hard drives be a smoother way to copy the files than the idea above?

What do other pros at publishing houses do?

Any thoughts are appreciated.

-jop
*insert pricey DAW specs here

The Scum
moves faders with mind
Posts: 2745
Joined: Thu Jul 03, 2003 11:26 pm
Location: Denver, CO

Post by The Scum » Tue Jul 31, 2007 11:14 pm

I've got a background in heavy-duty storage. Fortune-100 companies have faced the same problems, and deployed big-gun technology to deal with it.

The usual setup is called a "Storage Area Network", or SAN. A typical deployment for what you're talking about would use drive arrays (RAID or JBOD) for "mid-tier" storage, which is then backed up to high-end tape drives (e.g. StorageTek/Sun devices, which hold 40 GB+).
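The "data lifecycle" idea boils down to deciding where the primary copy of a project lives based on how recently it was used. A minimal Python sketch, assuming a simple age-based policy; the 30-day and 365-day thresholds and the tier names are hypothetical, and in the setup described above the mid-tier disk array would itself still be backed up to tape:

```python
from datetime import date

# Hypothetical thresholds; tune them to how often old sessions really get reopened.
ACTIVE_DAYS = 30      # still in progress: keep on fast workstation or server disk
MID_TIER_DAYS = 365   # finished but may be revisited: keep on the disk array


def assign_tier(last_touched: date, today: date) -> str:
    """Choose where a project's primary copy should live, by days since last change."""
    idle_days = (today - last_touched).days
    if idle_days <= ACTIVE_DAYS:
        return "active"
    if idle_days <= MID_TIER_DAYS:
        return "mid-tier disk"
    return "tape/offline"
```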

Searching the web for SAN or "data lifecycle management" should give you some background reading. Cisco and Brocade are the biggest players in the SAN game, and also have some good info on their websites.

There's also a company that specializes in SANs for media production:
http://www.studionetworksolutions.com/

2121TrumbullAve
gimme a little kick & snare
Posts: 77
Joined: Sat Aug 12, 2006 8:44 pm
Location: Denver

Post by 2121TrumbullAve » Wed Aug 01, 2007 6:10 am

Very cool - thanks a lot.

If these solutions turn out to be beyond the budget of the smallish company I work for, I'd welcome other suggestions as well, including ones that build on my initial ideas.

Regards.

joel hamilton
zen recordist
Posts: 8876
Joined: Mon May 19, 2003 12:10 pm
Location: NYC/Brooklyn

Post by joel hamilton » Wed Aug 01, 2007 7:58 am

Studio-wide server drives, with good old DVD burning.

Automatic overnight AIT tape backups work well. Redundancy seems to be the key, but that means a LOT of storage space.
DVD also seems to be the most cost-effective, efficient method for off-site archiving, at least from my perspective: I'm not dealing with 1000 gigs a month, more like 200 gigs a month on average. Each session gets a copy that goes with the client as backup 1, a set of DVDs, and a copy of the entire session on a house server that gets rotated off as needed.

2121TrumbullAve
gimme a little kick & snare
Posts: 77
Joined: Sat Aug 12, 2006 8:44 pm
Location: Denver

Post by 2121TrumbullAve » Wed Aug 01, 2007 10:52 am

We're trying to get away from DVD burning because it's such a time-consuming drag, and we have so much material to archive (2-3 engineers touch each project, and it's difficult to get all the files together).

Are optical discs considered a safe, long-term backup medium?

This leads me to another question:

I'd like to know why tape is so often the medium of choice for the final archive destination. Not knowing much about this, the mere word "tape" brings to mind words like unreliable, prone to degradation, etc.

Thanks again for any thoughts.

A-Barr
tinnitus
Posts: 1010
Joined: Tue Jan 24, 2006 12:27 pm

Post by A-Barr » Wed Aug 01, 2007 11:09 am

2121TrumbullAve wrote: are optical disks considered a safe, long term backup medium?
That is a very good question. I have seen claims by disc manufacturers that data stored on their discs will last for 300 years with no degradation, but that's obviously impossible to prove. I have also read that direct light can break down burned media pretty quickly, and I have had discs ruined from being stored in vinyl. Then there is scratching. Trying to access session material stored on DVD is also a real nuisance: you pretty much have to re-import it to your hard drive, which takes a long time, as does the burning process. In my opinion, the discs are neither rewritable nor reliable.

I would opt for hard drives. If they are sitting in a box, there's no need for an enclosure; you can just get a 60 GB internal drive for a pretty good price, probably cheaper than DVD once you figure in the man-hours involved.
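The man-hours argument is easy to put into numbers. The Python sketch below compares media plus labor for archiving one project; every price, capacity and time in the example calls is a hypothetical placeholder to replace with your own figures, not a quoted price.

```python
import math


def archive_cost(project_gb: float, unit_capacity_gb: float, unit_price: float,
                 minutes_per_unit: float, hourly_rate: float) -> float:
    """Media plus labor cost to archive one project onto discs or drives."""
    units = math.ceil(project_gb / unit_capacity_gb)  # a partly filled disc still counts
    media = units * unit_price
    labor_hours = units * minutes_per_unit / 60       # burning/copying, labeling, shelving
    return media + labor_hours * hourly_rate


# All inputs below are hypothetical placeholders, not real prices or timings.
dvd = archive_cost(50, 4.7, 0.50, 15, 25.0)  # 11 single-layer DVDs
hd = archive_cost(50, 60, 50.0, 30, 25.0)    # one 60 GB internal drive
```

With those made-up inputs a 50 GB project costs $74.25 on DVD against $62.50 on one drive, almost all of the DVD cost being labor; with different rates or times the result can flip.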

If you really want a reliable backup medium that will not ever degrade with age and is "future-proof," cut it all to vinyl. :)

The Scum
moves faders with mind
Posts: 2745
Joined: Thu Jul 03, 2003 11:26 pm
Location: Denver, CO

Post by The Scum » Thu Aug 02, 2007 1:15 pm

I had dinner with a former coworker last night, another storage-savvy person, and we discussed some strategies that might be good for you.

There are a couple of guiding principles in tiered storage like this:
- Once a piece of media is no longer active (a disc burned, a tape or hard drive shelved), there's a chance that it won't be readable when you need it.
- Therefore, redundancy is key.
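The redundancy principle can be quantified with a back-of-the-envelope model. This Python sketch assumes each copy fails independently with the same probability `p_fail`, which is optimistic: discs from the same batch or drives on the same shelf tend to fail together, which is one more reason to mix media and locations. The probabilities used are illustrative, not measured failure rates.

```python
def chance_all_copies_lost(p_fail: float, copies: int) -> float:
    """Probability that every copy is unreadable, if copies fail independently."""
    return p_fail ** copies


def copies_needed(p_fail: float, target: float) -> int:
    """Smallest number of independent copies whose joint-loss chance is <= target."""
    if not 0 < p_fail < 1 or not 0 < target < 1:
        raise ValueError("p_fail and target must both be strictly between 0 and 1")
    copies = 1
    while chance_all_copies_lost(p_fail, copies) > target:
        copies += 1
    return copies
```

For example, if any one copy has a 5% chance of being unreadable, three copies give a 0.0125% chance of losing everything, and five copies are needed to get below one in a million.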

Good storage people are pretty paranoid.

The core of this strategy would be to set up a beefy file server. Equip it with a RAID controller running RAID level 5, and fill it with as many fast hard drives as you can afford. RAID 5 gives you some protection against failed drives: the array keeps working, and can be rebuilt, if any one drive dies.
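RAID 5's protection comes from parity: for each stripe, one block holds the XOR of the data blocks, so any single missing block can be recomputed from the rest. The Python sketch below shows only that arithmetic on in-memory blocks; a real controller also rotates the parity block across all drives and works on disk sectors, which this ignores. The function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks: list[bytes]) -> bytes:
    """The parity block stored alongside one stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild_block(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block: XOR of parity and all survivors."""
    return xor_blocks(surviving_blocks + [parity])


def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """One drive's worth of space goes to parity; RAID 5 needs at least 3 drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb
```

The last function shows the price of that protection: four 500 GB drives give about 1500 GB usable. A second drive failure before a rebuild finishes loses the array, which is why the backup tier is still needed.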

The server should also have a backup drive of some sort; which kind largely depends on how comfortable you are with the specific technology. Again, by the first principle above, there's a chance that the backup media will be unusable...and there's no guarantee that any one medium is better than another.

Tape drives are common in business applications. Modern ones are reasonably fast and have large capacity (check dell.com for some examples...there was a $1000 one that could put 200 GB on a tape). But tapes are relatively expensive, and often can only be read by the machine that created them.

A reasonable compromise might be a Blu-ray burner. They'll put 25 or 50 GB on a $10 disc, and the format appears to have better commodity status in the marketplace. There's a good chance that in the future you'll still be able to find a Blu-ray drive and read the discs (assuming the media is still good), whereas you might have to hunt to find a tape drive.

The next part is the plumbing for all of this: add a gigabit Ethernet card to each machine you need to back up, as well as the server. Get a small gigabit switch to tie them all together (I just peeked at newegg, and was really surprised by how cheap GigE is now). Set up the server as an iSCSI target, and each workstation as an iSCSI initiator. This network allows the data traffic to go back and forth, and iSCSI is a fairly modern and fast protocol for this. To guarantee storage traffic performance, use this GigE network only for storage traffic...it'll be in addition to any regular office LAN you might have.
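Before buying switches it helps to estimate how long a project takes to move. This Python sketch does the arithmetic in decimal units (1 GB = 8,000 megabits) at full line rate; the `efficiency` factor is a hypothetical fudge for protocol overhead, disk speed and switch quality, since real throughput is usually well below line rate.

```python
def transfer_seconds(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Time to move size_gb (decimal gigabytes) over a link of link_mbps megabit/s.

    efficiency is the hypothetical fraction of line rate actually achieved.
    """
    megabits = size_gb * 8_000  # 1 GB = 1,000 MB = 8,000 megabits
    return megabits / (link_mbps * efficiency)
```

A 50 GB project needs at least 400 seconds (under 7 minutes) at gigabit speed, versus 4000 seconds at 100 Mbit/s, before any overhead.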

Backups with this setup would be a 2-tiered process. When a workstation needs to be backed up, the files are copied to the RAID on the server. Then the entire contents of the server are backed up to the backup drive periodically. Preferably, the backups would be scripted and automated, so nobody forgets.
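The first tier can be scripted with nothing more than a copy and a checksum comparison. The Python sketch below is one way to do it, assuming the server's RAID volume is reachable as an ordinary mounted folder on the workstation; `copy_and_verify` is a hypothetical name, and files already on the server with the same checksum are skipped.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src_root: Path, dest_root: Path) -> list[str]:
    """Copy new or changed files from a workstation folder to the server folder,
    re-reading each copy to confirm it matches. Returns the relative paths copied."""
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dest = dest_root / rel
        src_hash = file_digest(src)
        if dest.is_file() and file_digest(dest) == src_hash:
            continue  # already on the server and identical
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copies file data plus timestamps
        if file_digest(dest) != src_hash:
            raise OSError(f"verification failed for {rel}")
        copied.append(rel.as_posix())
    return copied
```

Run it from a scheduler (cron, launchd or Windows Task Scheduler) so nobody has to remember; the second tier, server to backup drive, would be a similar job on the server. Note that the re-read may be served from the operating system's cache, so it confirms the copy logic rather than the disk surface.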

One of the keys is that files remain on the server for as long as is practical, so they live on the RAID drives as well as on multiple backup copies, achieving the necessary redundancy. As long as the files are still on the RAID, you won't have to go to the backups to restore them, either.
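Keeping files on the server "as long as is practical" implies a rule for what to clear out when it fills up. A minimal Python sketch of one such rule, assuming you track how many verified backup copies each project has: evict the least recently used projects first, but never one with fewer than `min_copies` backups. The `Project` record and the two-copy default are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    name: str
    size_gb: float
    last_used: date
    backup_copies: int  # verified copies on tape, disc or offline drives


def choose_evictions(projects: list[Project], capacity_gb: float,
                     min_copies: int = 2) -> list[str]:
    """Pick least recently used projects to remove from the server RAID until it
    fits in capacity_gb, skipping any project with too few backup copies."""
    used = sum(p.size_gb for p in projects)
    evict = []
    for p in sorted(projects, key=lambda p: p.last_used):
        if used <= capacity_gb:
            break
        if p.backup_copies >= min_copies:
            evict.append(p.name)
            used -= p.size_gb
    return evict
```

Under-backed-up projects are skipped rather than deleted, so the function can return too few evictions to get under capacity; that is a signal to make more backups or buy more disk, not to delete anyway.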

2121TrumbullAve
gimme a little kick & snare
Posts: 77
Joined: Sat Aug 12, 2006 8:44 pm
Location: Denver

Post by 2121TrumbullAve » Thu Aug 02, 2007 9:15 pm

Thanks very much for the in-depth suggestions.

It turns out we might have the budget for a serious solution after all (RAID arrays), though we are uncertain about tape as a long-term final destination medium.

I have forwarded your ideas to the rest of the team.

Thanks again.

GooberNumber9
tinnitus
Posts: 1094
Joined: Fri Oct 20, 2006 7:52 am
Location: Washington, DC

Post by GooberNumber9 » Fri Aug 03, 2007 9:47 am

When you ask about tape, I'm not sure if you mean digital or analog. I've heard and read several times that analog tape for audio is a great archival medium in many ways. Obviously print-through is a concern, as is the overall storage environment (humidity and temperature), and of course it's not a bit-for-bit digital copy.

Personally I don't trust digital tape as an archival format, only as a day-to-day backup format. In my experience, optical media last much longer than magnetic tape, so much so that I've had problems with optical media in only a few cases. I've thrown out (or sent back for warranty replacement) TONS of digital backup tapes, including 4mm DAT, DLT, and especially Exabyte formats (there's a reason why they are cheaper). ALL media must be stored correctly. There are guidelines for optical and tape media storage, but I think the biggest issue with optical is that it is very inefficient and impractical for any large set of data. I have 0.5 TB of DVDs that I burned, and labeling and keeping track of them has been a nightmare.

Right now my archival model is to copy (either directly or with Retrospect) a large chunk of data to a quad-interface hard drive (USB 2.0, FireWire 400 & 800, eSATA) and then take the drive offline. I expect an offline drive to last a long time, probably longer than the longest-lived of the four interfaces will be around. The overall long-term plan is to then re-archive the data to an updated medium once one is available and affordable and the old medium is close to being obsolete.

That last part of an archival plan is probably the most important. The US government spent years and tons of money researching data archiving, and concluded that no one format will work for an indefinite amount of time. The National Archives' current policy is to re-evaluate media every few years and regularly copy information to newer media.
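A re-evaluation policy like that needs a way to notice which media are due. This Python sketch assumes you keep a list of archive media with the date each was written (for example, taken from per-drive manifests) and flags anything older than a review interval; the three-year default and the labels are hypothetical, since "every few years" is all the policy above specifies.

```python
from datetime import date


def due_for_migration(media: list[tuple[str, date]], today: date,
                      review_years: float = 3) -> list[str]:
    """Return labels of archive media written at least review_years ago."""
    due = []
    for label, written in media:
        age_years = (today - written).days / 365.25  # average year length
        if age_years >= review_years:
            due.append(label)
    return due
```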

No easy answers here, as far as I can tell.

Todd Wilcox

b3groover
deaf.
Posts: 1977
Joined: Mon Dec 22, 2003 4:07 pm
Location: michigan

Post by b3groover » Fri Aug 03, 2007 10:05 am

With the failure rate of hard drives, I would trust my data to DLT or DVD backups way before trusting it to sit on a hard drive.

But regardless, this is a serious issue that has really not been addressed yet by the audio community. If there are standards for audio, and more specifically digital audio data, then why are there not standards for backing up that audio data? Seems like something the AES should be concerned about.
www.organissimo.org
organissimo - Dedicated (new CD)
"This shitty room is making your next hit record, bitch!"

GooberNumber9
tinnitus
Posts: 1094
Joined: Fri Oct 20, 2006 7:52 am
Location: Washington, DC

Post by GooberNumber9 » Fri Aug 03, 2007 10:13 am

I have some comments on what is largely good advice and standard corporate practice:
The Scum wrote:Tape drives are common in business applications. Modern ones are reasonably fast and have large capacity (check dell.com for some examples...there was a $1000 one that could put 200 GB on a tape). But tapes are relatively expensive, and often can only be read by the machine that created them.
That last isn't entirely true. Usually you have to have the same model tape drive, the same version of the backup software, and sometimes the same operating system. If your tape drive is several years old, getting the same model tape drive to restore from (if the original drive isn't available) is usually the hardest part. If your drive isn't old, then getting a new drive can be very expensive. I've really never run into this issue. Loss of data due to fire will make this your top concern, though.
The Scum wrote: The next part is the plumbing for all of this: add a gigabit Ethernet card to each machine you need to back up, as well as the server. Get a small gigabit switch to tie them all together (I just peeked at newegg, and was really surprised by how cheap GigE is now). Set up the server as an iSCSI target, and each workstation as an iSCSI initiator. This network allows the data traffic to go back and forth, and iSCSI is a fairly modern and fast protocol for this. To guarantee storage traffic performance, use this GigE network only for storage traffic...it'll be in addition to any regular office LAN you might have.
I think this part of the plan is more complicated than it sounds. Yes, 1000BASE-T has come down a lot in price, but the cheapest "switches" are not going to give you the performance you might need if you have a large number of systems to back up. A Linksys device that has gigabit ports for $100 is not at all the same thing as a $1200 HP ProCurve switch. The switching fabric, logic, and backplane all have huge impacts on network performance when more than 2 or 3 computers are involved, especially when moving large files like audio around.

Second, copying data to or from audio computers over Ethernet is not a task to enter into lightly. Not too long ago I consulted for a talking-book company that moves large audio files around as part of its workflow. We found, much to our dismay, that we couldn't copy to or from computers while they were recording, because the recorded audio would glitch (using Sound Forge). This may vary depending on the software used, but we found it to be consistent across several different types of computers, interfaces, and hard drives. If you are working in a 24-hour shop, this is a big problem. This client ended up tasking one of their late-night staffers with running around copying data overnight when the studios were dark.
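The workaround that client used, copying only when the studios are dark, is easy to enforce in a backup script. This Python sketch only checks the clock against a hypothetical 1:00 to 6:00 window (windows that wrap past midnight are handled too); it cannot tell whether someone is actually recording, and a true 24-hour shop would have no safe window at all.

```python
from datetime import time


def in_backup_window(now: time, start: time = time(1, 0), end: time = time(6, 0)) -> bool:
    """True if `now` falls inside the copy window [start, end)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight, e.g. 22:00 to 05:00
```

A nightly job would call it with `datetime.now().time()` and simply exit without copying if it returns False.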

Finally, I don't think iSCSI would be necessary for this type of application. It's really geared towards abstracting SAN protocols to run over normal data networks and WAN connections. Any backup software will be able to reach out and grab data from workstations to write to disk or to tape without anything more than normal Ethernet, the TCP/IP stack, and SMB. Also, I question the description of iSCSI (or anything, for that matter) as a "fast protocol". You can run iSCSI over a T1 line (1.544 Mbit/s), which will be roughly 650 times slower than running iSCSI over GigE. Only synchronous protocols have any speed specifications, and then I feel like you're inhabiting the grey area between a protocol and a signaling standard.
The Scum wrote: Backups with this setup would be a 2-tiered process. When a workstation needs to be backed up, the files are copied to the RAID on the server. Then, the entire contents of the server would be backed up to the backup drive periodically. Preferably, the backups would be scripted and automated, so nobody forgets.
If you invest in a tape drive, server, RAID array, and GigE switch for the rest of this, spend the extra couple thousand and get some backup software and agents. You won't be able to use the tape drive at all without some kind of software, and the whole thing will just be terrible to manage without the right solution to automate it.

As an IT consultant, I found that there are many solutions for data integrity and archiving for normal businesses that have tons of Word documents and databases. For media companies with large audio and video files, I have yet to find a great solution that doesn't involve very scary amounts of money. I think a lot of smaller TV stations and news outfits are archiving to BETACAM and DigiCAM tapes. I know a few radio stations that were archiving to MiniDisc for a while. The big boys just spend huge amounts of money.

No easy answers.

Todd Wilcox

GooberNumber9
tinnitus
Posts: 1094
Joined: Fri Oct 20, 2006 7:52 am
Location: Washington, DC

Post by GooberNumber9 » Fri Aug 03, 2007 10:20 am

b3groover wrote:With the failure rate of harddrives, I would trust my data to DLT or DVD backups way before trusting it to sitting on a harddrive.
I'm not going to disagree; I think everyone has their own comfort zone when it comes to data. In my comfort zone, based on my experience with failures of media that have not been accessed in a long time, I trust media sitting on a shelf in this order:

1) Printed optical (infinite lifetime if stored correctly?)
2) Burned optical (unknown lifetime if stored correctly)
3) Fixed disks - hard metal platters (also unknown lifetime if stored correctly)
4) Magnetic tape (I wouldn't give this more than 10 - 12 years even if stored correctly)
5) Floppy disks (more than a year old? Better copy it to something else quick!)

For pretty much all of these, "stored correctly" means a fairly dark, air-conditioned space, preferably inside a case of some kind.

I have ten year old hard drives that I still use. I have eight year old CD-Rs that still work. I don't know if I've ever seen a ten year old backup tape.

Todd Wilcox
