Massive Media Server Project For Non-Profit Digital Archive

bigshot

Senior HTF Member
Joined
Jan 30, 2008
Messages
2,924
Real Name
Stephen
I've been asked to talk about the media server project that I'm overseeing for a non-profit digital archive where I serve on the Board of Directors. Our users are primarily film/animation students and artists, and our digital assets fill nearly 100 TB of disk space. Since we are a non-profit educational organization, we can operate under certain fair use exclusions in the Digital Millennium Copyright Act that might not apply to individuals or for-profit businesses, and obviously the scale we are working on is far beyond the capabilities of most home theater owners. But info on how we built our servers might help people design a smaller system for their own home.

We currently have two servers. The primary media library is a custom-built database designed to contain 1) biographical information on film makers and artists, 2) high resolution scans of images (photos, artwork, digitized books, etc.), and 3) digitized films. Every asset is optimized for the smallest possible file size, and has a file name following a unique identifying naming convention that allows us to build cross references between different kinds of data. For instance, you can look at the biography of a film maker and see a list of the films he worked on. Click on "film cross links" and view a digitized copy of the films by that film maker in the collection. Then click on "media cross links" and see photos and artwork related to the film maker or specific film. This is a way of organizing media for researchers and students that is right on the cutting edge of library science. It requires a LOT of volunteer man hours in digitizing, cataloging and tagging. It's far beyond the ability of any individual. We've been working on this project for over a decade and have a crew of volunteers who build it digital "brick" by digital "brick". The primary database contains hundreds of thousands of files that all have to be instantly searchable, so we create "work copies" of every asset at a reduced resolution or as compressed files. We back up higher resolution lossless copies on our secondary server, and these can be easily rolled in to upgrade the resolution or compression settings as technology advances and computers and hard drives get faster.
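
To make the cross-linking idea concrete, here is a minimal sketch in Python. The naming scheme it uses (an asset kind, a person ID and an item number, as in "film_p0042_0007.mp4") is purely hypothetical and is not the archive's actual convention. The only point is that when every file name encodes a shared identifier, cross references can be built just by parsing names.

```python
import re
from collections import defaultdict

# Hypothetical pattern: <kind>_<person id>_<item number>.<extension>
# e.g. "film_p0042_0007.mp4". This is NOT the archive's real convention.
NAME_RE = re.compile(
    r"^(?P<kind>bio|film|media)_(?P<person>p\d{4})_(?P<item>\d{4})\.\w+$"
)


def build_cross_links(filenames):
    """Group file names by person ID, then by asset kind."""
    links = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the convention
        links[match["person"]][match["kind"]].append(name)
    return links


files = [
    "bio_p0042_0001.txt",
    "film_p0042_0007.mp4",
    "media_p0042_0113.jpg",
    "film_p0099_0002.mp4",
    "notes.txt",
]
links = build_cross_links(files)
print(links["p0042"]["film"])   # "film cross links" for person p0042
print(links["p0042"]["media"])  # "media cross links" for the same person
```

A real database would store these links in tables rather than rebuild them from file names each time, but a parseable name keeps the files self-describing even outside the database.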

Our secondary video server would probably be of greatest interest to home theater folks. We originally started with a library of DVDs that numbered well over 10,000 titles. Storage, organizing and access to all of this physical media was a huge challenge... shelves and shelves of stuff. How to organize it? Alphabetical? By subject? It spent most of its time sitting on the shelf because it was too unwieldy to be accessible. Rather than have it all collect dust, I worked with a couple of my volunteers on a plan to get it into circulation. First I pared off all the cases and sleeved the discs, but even then it was very difficult to find specific titles that the students and media curators were interested in accessing. The solution was to rip the DVDs losslessly to digital files and organize them on a disk array with a media server that automatically catalogs and plays back the files. We use both Plex and XBMC/Kodi and it has streamlined access to the secondary library tremendously. This gives us easy access to curate and process the material so we can edit it, tag it and transfer it into the main media library as volunteer time allows. The secondary server acts as the "archival copy" and we use the files here for screenings and events where maximum resolution is needed. At this point, most of the video is 480p, but I have started to try to bring in HD content as well. The problem is the file sizes involved. It's difficult to maintain lossless HD video with the number of titles we need to archive, so I've been experimenting with compression. The archivist in me wants the secondary server to be completely lossless, but it just isn't practical right now. That's a problem to be worked out in the future as technology advances.
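
For anyone sizing a smaller system, the file size problem is easy to rough out. The sketch below is a back-of-the-envelope estimate only: the 10,000 title count comes from the post, but the average rip sizes (6 GB for a lossless DVD rip, 30 GB for a lossless Blu-ray rip) are assumed figures chosen for illustration, not measurements from the archive.

```python
TITLES = 10_000        # "well over 10,000 titles" from the post
AVG_DVD_GB = 6.0       # ASSUMED average size of a lossless DVD rip
AVG_BLURAY_GB = 30.0   # ASSUMED average size of a lossless Blu-ray rip


def library_size_tb(titles, avg_gb):
    """Total library size in decimal terabytes (1 TB = 1000 GB)."""
    return titles * avg_gb / 1000


dvd_tb = library_size_tb(TITLES, AVG_DVD_GB)
hd_tb = library_size_tb(TITLES, AVG_BLURAY_GB)
print(f"DVD library:     {dvd_tb:.0f} TB")
print(f"Same titles, HD: {hd_tb:.0f} TB ({hd_tb / dvd_tb:.0f}x larger)")
```

Under these assumed averages the same number of titles needs several times the disk space in lossless HD, before counting any backup copies. Swap in your own title count and averages to size a home system.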

The basic workflow for video is like this... all of these steps are always going on. We don't do one thing at a time. It's an ongoing process.
  • Digitization: Capturing video from film/tape, ripping discs to various file formats depending on the source
  • Tagging: Labelling the files so Plex/Kodi can parse the data and it can be added to the secondary archive
  • Curation: Review of the raw library to call for specific info to add to the primary database
  • Processing: Editing video to extract the material called for by the curator
  • Formatting: Converting and compressing files for inclusion in the primary database
  • Tagging: File naming convention, cross linking with image and biographical data in the primary database
That is what we are doing in a nutshell. I'll leave it at that for now, because listing all of the file formats and software we use would make for pretty dense and technical reading. But if anyone is interested in how we deal with any specific part of the process, I'd be happy to elaborate.
 

Russell G

Fake Shemp
Senior HTF Member
Joined
Sep 20, 2002
Messages
12,193
Location
Deadmonton
Real Name
Russell
Very cool, thanks for the update! I've found Plex hit & miss with identifying TV shows correctly, so I can imagine the tagging could be a bit of a pain.
 

bigshot

A lot of times, DVDs of TV shows will be in production order while the scraper uses the order the episodes were aired. This is particularly a problem with British DVDs, which are almost always in production order. It helps to tag against the same reference site that Plex uses as its scraper.
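
One way to handle the mismatch is to keep a small lookup table from disc position to aired episode number and generate the file names from it. This is a minimal sketch, not our actual tooling: the show name and the mapping are invented for illustration, and the "Show - s01e03" pattern follows the general season/episode form Plex documents for TV files.

```python
# HYPOTHETICAL mapping for one season: position on the DVD (production
# order) -> episode number in the scraper's listing (aired order).
DISC_TO_AIRED = {1: 3, 2: 1, 3: 2, 4: 4}


def aired_filename(show, season, disc_position, ext="mkv"):
    """Build a 'Show - sXXeYY.ext' name using the aired episode number."""
    episode = DISC_TO_AIRED[disc_position]
    return f"{show} - s{season:02d}e{episode:02d}.{ext}"


for position in sorted(DISC_TO_AIRED):
    print(position, "->", aired_filename("Example Show", 1, position))
```

The table has to be filled in by hand from the reference site, but once it exists the renaming is mechanical and repeatable.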
 

DaveF

Moderator
Premium
Senior HTF Member
Joined
Mar 4, 2001
Messages
24,567
Location
Catfisch Cinema
Real Name
Dave
Can you discuss the archival of source materials? What formats were the materials in? Was any of it encrypted? What tools were used for archiving the encrypted media?

Is the end result lossless or lossy re-compression of the media?
 

bigshot

We have digitized from just about every format... film, VHS, Beta, LD, etc. Everything we rip is lossless, except for the files we include in our primary database, which are compressed to be streamable. We are Mac based, but I'm sure there are PC equivalents to the software we use. I will send you a list of the software we use by PM.
 

DaveF

Are you archiving only main features, or do you include special features (commentaries, etc.)? Assuming so, what's your process? My impression is that most systems are geared towards only the movie proper, and do not support (at least, not in a user-friendly manner) all the secondary materials on a modern DVD or Blu-ray.
 

bigshot

We archive primarily films, but if there are interviews with the film makers, we archive those too. There is a naming convention in Plex for "Behind the Scenes", "Trailers" and other sorts of supplements. That covers most things; the rest we hand tag. It's easier in Plex than in Kodi to incorporate things that don't exist in the online databases, but perhaps Kodi has a workaround that I just haven't found.

Try googling Plex "behind the scenes" naming. You should find the page that outlines the different kinds of supplement categories.
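
As a rough illustration of what that convention looks like, here is a small Python sketch that builds file names for supplements stored in the same folder as the movie. The suffix table is an assumption based on Plex's "local extras" naming and should be checked against the Plex page before you rely on it; the movie title and the helper function are made up for the example.

```python
from pathlib import PurePosixPath

# Suffixes believed to match Plex's local-extras naming page.
# ASSUMPTION: verify against the current Plex documentation.
EXTRA_SUFFIXES = {
    "behind the scenes": "-behindthescenes",
    "deleted scene": "-deleted",
    "featurette": "-featurette",
    "interview": "-interview",
    "trailer": "-trailer",
    "other": "-other",
}


def extra_path(movie_folder, description, kind, ext="mkv"):
    """Return the path for an extra stored inside the movie's folder."""
    suffix = EXTRA_SUFFIXES[kind]
    return PurePosixPath(movie_folder) / f"{description}{suffix}.{ext}"


print(extra_path("Movies/Example Film (1950)", "Director Interview", "interview"))
```

Anything that doesn't fit one of the named categories can go under "other", which is roughly what hand tagging amounts to.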
 

Peter Apruzzese

Senior HTF Member
Joined
Dec 20, 1999
Messages
4,264
Real Name
Peter Apruzzese
Just started playing with Plex on my PC and using an Oppo for playback of MKV files ripped from some DVDs. (I tried my Roku for playback as well, but the result was noticeably softer.) Also put the trial app on my iPhone and iPad - very cool to see a DVD file stream to the iPad in another room.

I could see using this with my entire DVD collection, as I'd love to store them away.
 

bigshot

It's definitely a powerful and useful program, and a great way to organize seasons and episodes for TV shows.
 
