
VAULT

Take11 Design Document 1
Vault Design Summary

J. M. Haile

08 December 2009

Take11.com provides an online web service for social cataloging of videos and DVDs. Construction of the site began in January 2008 and the site has been in public beta-testing since 13 September 2008. Site development is being carried out in three phases:

  1. In Phase One, emphasis has been on features that enable individual users to interact with data for individual movies.
  2. In Phase Two, emphasis will be on individuals interacting with collective information obtained by aggregating data over the entire Take11 catalog.
  3. In Phase Three, emphasis will be on users interacting with each other.

Our general approach to constructing a large site like Take11 is iterative design: we try to get the large framework established and functioning, then on subsequent iterations, we add more detailed features and fine-tune performance. Over the next few weeks, we will be moving from Phase One into Phase Two; however, this does not mean that Phase One is complete. Rather, it means we seem to have a good framework that appears stable and that will support additional features down the road.

1. The Vault

Tentatively, we are using "Take11 Vault" to refer to the pages that will be developed in Phase Two. The data in the Vault will generally originate from three sources: (a) aggregates of data from the Take11 user catalog, (b) reference works on the history of film, such as those published by the American Film Institute, and (c) contributions from members of the Take11 community.

Data supporting the Vault pages will be maintained in a database that is separate from the current user catalog. However, the software architecture will consist of three layers that parallel the three layers now in use (see Figure 1).

Figure 1: Schematic of Take11 Software Architecture
  1. The public layer contains HTML pages and JavaScript that push content to web browsers. Lower layers completely isolate the public layer from the databases.
  2. The middle layer contains the PHP library of server-side scripts. These scripts never contain HTML and never access the databases directly. A script in the library generally supports only one or two public pages. Individual scripts from the PHP library are loaded only when requests for particular public pages are received.
  3. The bottom layer contains PHP core scripts. Core scripts perform two primary functions: (a) they directly pass data into and out of the databases, and (b) they execute other low-level processes that are common to many different public pages. All PHP scripts in the Core are loaded with every public page that is processed and sent to a browser.
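The separation of the three layers can be illustrated with a minimal sketch. It is written in Python rather than PHP purely for illustration, and every name in it (`core_fetch_movie`, `library_movie_summary`, `render_movie_page`, the in-memory `_FAKE_DB`) is a hypothetical stand-in, not part of the actual Take11 code. The point is only that markup lives in the top layer, database access lives in the bottom layer, and the middle layer touches neither.

```python
import html

# --- Bottom layer (core): the only code that touches the data store. ---
# _FAKE_DB is a hypothetical in-memory stand-in for a real database.
_FAKE_DB = {
    1: {"title": "The Hunt for Red October", "year": 1990},
}


def core_fetch_movie(movie_id):
    """Return a copy of the raw record, or None if the id is unknown."""
    record = _FAKE_DB.get(movie_id)
    return dict(record) if record is not None else None


# --- Middle layer (library): page-specific logic, no markup, no direct DB access. ---
def library_movie_summary(movie_id):
    """Prepare the data one public page needs, using only core functions."""
    record = core_fetch_movie(movie_id)
    if record is None:
        return None
    return {"heading": f"{record['title']} ({record['year']})"}


# --- Public layer: markup only; it sees neither the database nor raw records. ---
def render_movie_page(movie_id):
    """Turn the middle layer's summary into a fragment of HTML."""
    summary = library_movie_summary(movie_id)
    if summary is None:
        return "<h1>Not found</h1>"
    return f"<h1>{html.escape(summary['heading'])}</h1>"
```

Because the public layer calls only the middle layer, the data store behind `core_fetch_movie` could be swapped (for example, for the separate Vault database) without changing any page markup.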

2. Vault Content

Our first attempt at the Vault will follow the skeletal structure shown in Figure 2. It contains (1) a navigational dashboard, (2) a Vault record page, and (3) pages related to film history. This structure will become more elaborate as development proceeds.

Figure 2: Preliminary Schematic of Basic Organization of Vault Pages
  1. The Vault Dashboard will provide (a) links to other principal pages in the Vault, and (b) a searchbox by which users can select a particular film to load into the Vault record page.
  2. The Vault record page will contain aggregate data for a particular "standard" title, independent of the medium (DVD, HD DVD, video tape, etc.). A sample mockup of this page is shown in Figure 3. In the case of a TV series, the Vault record page will be the top page in a hierarchy. The hierarchy will include a main page for the program (e.g., House, M.D.) and additional pages for each season. In some cases, the seasonal pages may further divide into a page for each episode.
  3. The historical pages will include (a) a timeline of the 20th-century development of film, (b) filmographies for actors and directors (and perhaps composers later), (c) lists of awards (e.g., all winners of Academy Awards in all categories), (d) lists of recommended films, and (e) books about film and books related to films. Other historical information will be added in later development iterations.
Figure 3: Preliminary Mock-Up of Vault Record Page (top half shown)
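The page hierarchy described for a TV series (program page, then season pages, then optional episode pages) is a simple tree. The following is a minimal sketch of one way to represent it; the `VaultPage` class and its methods are hypothetical names chosen for illustration, and the season and episode labels are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VaultPage:
    """One page in a Vault record hierarchy (hypothetical structure)."""
    title: str
    children: List["VaultPage"] = field(default_factory=list)

    def add(self, child):
        """Attach a child page and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, title) pairs in top-down, left-to-right order."""
        yield depth, self.title
        for child in self.children:
            yield from child.walk(depth + 1)


# Sample data: a program page with two seasons, one of which has an episode page.
series = VaultPage("House, M.D.")
season1 = series.add(VaultPage("Season 1"))
season1.add(VaultPage("Episode 1"))
series.add(VaultPage("Season 2"))

# Indented outline of the hierarchy, one string per page.
outline = [("  " * depth) + title for depth, title in series.walk()]
```

A feature film would simply be a `VaultPage` with no children, so the same structure could serve both cases.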

3. Category Elements in Standard Form

When information is collected from many sources, it can retain meaning only if there are well-defined categories into which the data can be aggregated. Well-defined here implies strict adherence to carefully articulated and imposed criteria for distinguishing similarities from differences. It is well known in information architecture that similarities are much more difficult to recognize than differences. For example, in the Take11 Vault, the medium and its marketing format can be completely ignored: The Hunt for Red October is the same movie whether it was screened from DVD or video tape, and whether it occupies a box to itself or is bundled with three other Jack Ryan adventures. These apparent differences are extraneous to the content of the film.

For the Vault to function properly, we need a few categories composed of elements that can be used to consistently and unambiguously label each movie in the database. When an element in such a category meets these requirements, we say it adheres to a "standard" form. Therefore, we seek a set of rules by which an element in the user catalog can be reduced to a standard form that can be used in the Vault. Four such categories immediately come to mind: Genres, Titles, Versions, and Tags.

Genres. On Take11 the allowed elements in the genre category already take a standard form: the allowed values for a Take11 genre are explicit, and their forms are uniform and invariant. For example, the comedy genre is always written as "Comedy": no alternatives are available. Nevertheless, such consistency still allows for human insight and interpretation: e.g., different users may legitimately disagree on whether a particular film should be placed in the Comedy genre.

Vault Titles. The most problematic category whose elements must be reduced to a standard form is movie titles. On the one hand, rules for reduction to standard form must recognize similarities among apparent differences (e.g., special edition, limited edition, director’s cut, etc. are all versions of the same film, so they should all reduce to one standard title). On the other hand, the rules must also distinguish differences from apparent similarities (e.g., Sabrina filmed in 1953 with Bogart and Holden is not the same movie as Sabrina filmed in 1995 with Ford and Kinnear). Unlike the situation with books, in which different editions of the same title may, in fact, be different books, I think this problem for movies can be resolved in the software, without asking individual users to make judgments about whether two movies are the same or different. In fact, I have spent a few days addressing this problem, and the progress is very encouraging. I don’t think this issue has to be completely resolved initially; I think we can start building the Vault with something that is reasonably close, then iterate to refine our rules.
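To make the two requirements concrete, here is a minimal sketch of a title-reduction rule. It is not the rule set under development for Take11; the list of edition markers, the function name `standard_title`, and the assumption that each catalog entry carries a year are all illustrative. Edition markers are stripped so that versions of the same film collapse to one key, and the year is kept in the key so that remakes with the same name stay distinct.

```python
import re

# Hypothetical list of edition markers; a real list would be refined iteratively.
_EDITION = (
    r"(?:(?:special|limited|extended|collector'?s|anniversary)\s+edition"
    r"|director'?s\s+cut)"
)

# Marker wrapped in brackets anywhere in the title, e.g. "(Special Edition)".
_BRACKETED = re.compile(r"\s*[\(\[]\s*" + _EDITION + r"\s*[\)\]]", re.IGNORECASE)

# Marker after a separator at the end of the title, e.g. ": Director's Cut".
_TRAILING = re.compile(r"\s*[-:,]\s*" + _EDITION + r"\s*$", re.IGNORECASE)


def standard_title(raw_title, year):
    """Reduce a cataloged title to a (title, year) key.

    Assumes the catalog supplies a year for each entry (an assumption of
    this sketch, not something the design document specifies).
    """
    title = raw_title.replace("\u2019", "'")  # treat curly and straight apostrophes alike
    previous = None
    while previous != title:  # repeat until no more markers are removed
        previous = title
        title = _BRACKETED.sub("", title)
        title = _TRAILING.sub("", title)
    title = re.sub(r"\s+", " ", title).strip()
    return (title.casefold(), year)
```

With these rules, "Sabrina (Special Edition)" and "Sabrina" from 1995 reduce to the same key, while a "Sabrina" entry from 1953 reduces to a different one. Titles that differ in ways the marker list does not anticipate would still need new rules, which is where iteration comes in.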

Movie Versions. Although the Vault record page will provide information about a standard title, we also want that page to include a list of links to all versions that users have cataloged. Examples include special editions, extended editions, director’s cuts, seasons, sets, volumes, etc. These possibilities also require standard forms, but the identification of those forms appears straightforward.
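As a rough illustration of what a standard form for versions might look like, the sketch below maps a cataloged title to a version label by trying a short list of patterns in order. The patterns, the labels, the "Unspecified" fallback, and the function name `standard_version` are assumptions made for this example only.

```python
import re

# Hypothetical (pattern, label) rules, tried in order; "{0}" is filled
# with the number captured by the pattern, when there is one.
VERSION_RULES = [
    (re.compile(r"director'?s\s+cut", re.IGNORECASE), "Director's Cut"),
    (re.compile(r"special\s+edition", re.IGNORECASE), "Special Edition"),
    (re.compile(r"extended\s+edition", re.IGNORECASE), "Extended Edition"),
    (re.compile(r"season\s+(\d+)", re.IGNORECASE), "Season {0}"),
    (re.compile(r"vol(?:ume|\.)?\s*(\d+)", re.IGNORECASE), "Volume {0}"),
]


def standard_version(raw_title):
    """Return the standard version label found in a cataloged title."""
    title = raw_title.replace("\u2019", "'")  # treat curly and straight apostrophes alike
    for pattern, label in VERSION_RULES:
        match = pattern.search(title)
        if match:
            return label.format(*match.groups())
    return "Unspecified"  # no recognized version marker
```

For example, `standard_version("House, M.D. - Season 3")` gives "Season 3", so every user entry for that season could be linked from the program's Vault record page under one label.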

Vault Tags. We will also need rules for reducing any tag to a standard form. Again, we need a consistent form that is the same, in spite of apparent differences. For example, the tags "Academy Award: best picture", "Oscar (picture)", "AA (pic)" are really all the same, and in the Vault a single standard form should be used. I don’t think this problem for tags will be nearly as challenging as that for titles.
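One simple way to approach tag reduction is to clean each tag (case, punctuation, spacing) and then look it up in a table of known aliases. The sketch below does this for the three example tags above; the alias table, the chosen standard spelling, and the function names are hypothetical, and unknown tags simply fall back to their cleaned form.

```python
import re

# Hypothetical alias table: cleaned variant -> standard form.
TAG_ALIASES = {
    "academy award best picture": "Academy Award: Best Picture",
    "oscar picture": "Academy Award: Best Picture",
    "aa pic": "Academy Award: Best Picture",
}


def _clean(tag):
    """Lowercase the tag, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", tag.casefold()).split())


def standard_tag(tag):
    """Return the standard form of a tag, or its cleaned form if unknown."""
    cleaned = _clean(tag)
    return TAG_ALIASES.get(cleaned, cleaned)
```

The cleaning step handles purely cosmetic differences automatically, while the alias table is the part that would need human input to grow.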

4. User Participation

The problem of reducing certain elements to standard forms illustrates an underlying issue that pervades social websites: when do we let users contribute to the organization and interpretation of information, and when do we use software to tackle aspects of these problems? I don’t think there is any one best answer to this; various balances should be workable. But I believe some balances between human and machine can be more efficient than others. When the machine is the better tool, we use it; and when the human mind is the better tool, we use it instead. For example, say we are indeed able to let the software impose a set of rules that satisfactorily reduces any movie title to a standard form. We still need user input and testing to find and refine those rules. Finding and testing rules seems best done by humans; imposing the resulting rules in a consistent way seems best done by computers.

More generally, the Vault will absolutely need participation from users. Information will need to be added, data will need to be checked and interpreted, user interfaces will have to be tested, etc. We need incentives to motivate participation and mechanisms that recognize valuable contributions from individual users.

5. Long-Term Objectives: Transmission vs Transformation

Let us distinguish data from information and information from knowledge. For our purposes it is probably sufficient to identify information as organized data and knowledge as information to which meaning has been attached. Meaning refers to relations: when we ask what something means, we are asking how that thing is related to other things.

In disseminating information, a process (like a website) may take one of two general forms. In the first, it is primarily devoted to organizing data into information and transmitting that information to users. The development of Take11 in Phase One falls in this class.

But it is also possible to do more: it is possible to articulate relations among chunks of information, thereby creating knowledge that is made available to users.

My perception is that most social websites concentrate on the easier tasks of organizing information and transmitting that information to users. But the harder tasks associated with information transformation seem to be more interesting and potentially more rewarding. The Vault is where Take11 users will be able to participate in such transformations.

Copyright © 2012 by J.M. Haile. All rights reserved.