Presentation slides

The A32 Quick Start Guide Version 1.0 Alpha was presented in Gothenburg in May 2009.

29-05-2009: Link fixed!

Get the slides


Technical
[Section edited by Michele Barbera]

1. Digitising your sources

Depending on various factors, you may want to perform the digitisation in-house or outsource it to an external contractor. Since this document is meant as a quick start guide, we will not cover all the details of the digitisation process but will rather focus on some key issues you will have to take into account. If you have chosen to do the digitisation in-house, you presumably already have the technical know-how at hand; if you have instead decided to use an external contractor, the following section may help you make sure that the digitisation company meets a set of minimum criteria.

1.1 Defining the purpose of digitisation

The first question to answer is whether the digitisation you are about to perform is meant for the long-term preservation of the digital objects or only to make the digital reproductions usable on the web. In the latter case the requirements for the digital images are less stringent and the costs involved are also lower.

1.1.1 Digitising for long term preservation and for publication on the web

If you can afford the higher costs of high-quality digitisation, of the extra storage space and of long-term maintenance, this is of course the preferable option. Digitising once and for all in high quality will save you money in the long term, because the same images will be usable both for preservation and for publication on the web. Digitising for long-term preservation is a non-trivial subject that goes beyond the scope of this guide.

You can find more information in the Minerva EU best practices and in the AHDS Guides to Good Practice.

1.1.2 Digitising only for the web

If you are on a restricted budget, digitising at a lower quality can be cheaper and save you storage space and maintenance costs. There is a wide range of digitising services, so we would like to give you some advice on choosing a suitable solution.

If you are on a very restricted budget, you can choose to do the digitisation yourself (or have a member of your permanent staff do it) using a common digital camera or flatbed scanner. You may think that doing it yourself will give low-quality results. This is not the case: if the digitisation is done carefully, you can get high-quality pictures that, for non-specialist use, are barely distinguishable from those produced by professional services (see this example from Nietzschesource: the digitisation was made with a common 8-megapixel digital camera, which you can buy on the consumer market for less than 300 EUR).

The first things you have to consider are:

  • Is your source material subject to deterioration? Do you need insurance to let other people work on it? Does the collection holder (archive, library, university) require insurance?
  • Can you move the collection, or do you need the digitisation company to send their employees to the location where the collection is kept?
  • Depending on the type of your sources (manuscripts, paintings, photos, etc.), what is the minimum level of detail that your users need?

1.2 Minimum requirements for publication on the web

1.2.1 Image Format and Compression

Make sure you get the images from the digitisation service in uncompressed TIFF format or in losslessly compressed TIFF. (If you are using your own digital camera, check its instruction manual and set the best available image quality. If there is a compression setting, choose “no compression”, “RAW IMAGE”, “lossless” or similar.)

TIFF files can be either compressed or uncompressed. There are many compression algorithms, and some of them do not lead to any loss of information; these are called lossless. One of the most widely known lossless compression algorithms is LZW, which is supported by most image manipulation applications. If your digitisation service or digital camera can provide LZW-compressed TIFF files, this is usually a good choice: you will not lose any detail, you will save storage space, and you will be able to open, manipulate and convert the images with most imaging applications.
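The following minimal Python sketch of LZW encoding and decoding makes the idea of lossless compression concrete. It is a simplified textbook variant and not the exact TIFF LZW codec. Real TIFF encoders also reserve Clear and End-of-Information codes, pack each code into 9 to 12 bits, and reset the dictionary when it fills up. The sketch does show the property that matters for your archive: decoding returns exactly the original bytes. The sample data is made up.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes (simplified: unbounded dictionary)."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes: list[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the longest known sequence
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The encoder used a sequence it had only just added.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


if __name__ == "__main__":
    # Hypothetical scan data: long runs of white (255) with a dark stripe (0).
    sample = bytes([255] * 900 + [0] * 100) * 50
    codes = lzw_compress(sample)
    assert lzw_decompress(codes) == sample  # lossless round trip
    print(f"{len(sample)} input bytes -> {len(codes)} LZW codes")
```

Repetitive image areas such as blank paper are what let LZW emit far fewer codes than there are input bytes.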

1.2.2 Spatial Resolution

Forget about DPI (see also http://www.scantips.com/no72dpi.html). The general rule is “the higher the resolution, the better” (within reasonable limits). The technology we are going to use to display the images on the web doesn't need the images to be scaled down. At the same time, there is no required lower limit for resolution. If you absolutely have to save disk storage, choose the minimum resolution that makes your reproductions readable for your target users. Make sure to use the chosen resolution for the entire group of source documents, so that you do not need to review them item by item.
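A quick estimate helps when you weigh resolution against disk storage. The sketch below counts only the raw pixel data and ignores headers, metadata and row padding. It uses a hypothetical page captured first at 2000 x 3000 pixels and then at twice that size in each direction. Doubling the pixel count along each side multiplies the storage by four.

```python
def uncompressed_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Approximate raw pixel-data size, ignoring headers, metadata and row padding."""
    return (width_px * height_px * bits_per_pixel + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical page captured at two resolutions (24-bit colour).
    low = uncompressed_size_bytes(2000, 3000, 24)
    high = uncompressed_size_bytes(4000, 6000, 24)
    print(f"{low / 2**20:.1f} MiB vs {high / 2**20:.1f} MiB")  # 17.2 MiB vs 68.7 MiB
```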

When choosing the resolution you will use for photographing or scanning your sources please take into account that using a higher resolution will not always make your images more detailed. It may just unnecessarily make them grow bigger without significant improvements in the details they capture. Therefore, make sure you do some experiments before beginning to scan your collection.

1.2.3 Color Depth

There are three kinds of scanning (digital sampling):

  1. bitonal scanning which can represent only black and white
  2. greyscale scanning, usually at 8 bits per pixel, which can represent 256 shades of grey
  3. colour scanning at a variable bit per pixel rate. 24 bits per pixel is called true colour level, and it can represent 16.7 million colours
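The numbers of shades and colours in the list follow directly from the bit depth. With b bits per pixel you can represent 2^b distinct values:

```latex
N = 2^{b}, \qquad 2^{1} = 2, \qquad 2^{8} = 256, \qquad 2^{24} = 16\,777\,216
```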

The higher the bit depth, the more visual details of the physical object you will be able to distinguish. Decisions on the bit depth therefore have to take into account which features of the physical object carry information for your users. For example, with a bitonal image you won't be able to tell ink from pencil. Once again, “the more bit depth, the better” if you don't have any restrictions on storage space.

See http://www.scantips.com/ for more info on color depth and general scanning strategies. You may also want to look at other resources, some of which are listed in the Best Practices section of this website.

The following is a sample request that you could use when requesting images from a commercial provider.

Please provide digital files as follows:
Resolution: 400 DPI
Size: 30 cm short dimension
Bit-Depth: 8-bit
Color Space: Adobe RGB (1998)
File Format: Lossless TIFF
ICC Profile: Embedded
It is usually a very good idea to include a ruler, marked in both inches and centimeters, in the picture. You can place it next to the object you are photographing, along the horizontal or the vertical axis. This may be useful in the future for image analysis purposes.
Another very good idea is to use a background that contrasts strongly with the objects you are photographing. For example, dark blue is often a much better choice than white for old manuscripts and maps.
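The DPI value is irrelevant for web display, but the sample request does fix how many pixels you will receive. This Python sketch converts the request into pixels, using 30 cm at 400 dots per inch on the short side. The 40 cm long side is a hypothetical value. Reading “8-bit” as 8 bits per colour channel (24 bits per pixel) is an assumption for an Adobe RGB file.

```python
CM_PER_INCH = 2.54


def pixels_for(length_cm: float, dpi: int) -> int:
    """Number of pixels needed to sample length_cm at the given dots per inch."""
    return round(length_cm / CM_PER_INCH * dpi)


if __name__ == "__main__":
    short_side = pixels_for(30, 400)  # from the sample request
    long_side = pixels_for(40, 400)   # hypothetical long side
    bytes_per_pixel = 3               # assumed: 8 bits x 3 RGB channels
    raw_mib = short_side * long_side * bytes_per_pixel / 2**20
    print(short_side, long_side, f"{raw_mib:.0f} MiB uncompressed")  # 4724 6299 85 MiB uncompressed
```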

2. Preparing your digital objects for publication

2.1 Registering a domain name

The first thing you have to do is to buy a domain name for your web site. Generic and country-code top-level domains (.com, .org, .eu, .de, .it and so on) are managed by domain name registries, which can be public or private organisations depending on the domain. To buy a domain name you can contact any hosting provider. The cost of a domain depends on its type (a .tv domain, for example, is much more expensive than a .it one). If possible, we suggest that you buy a .org domain name: .org domains are mostly associated with non-profit organisations, whereas .com domains are mostly used by companies. As a general indication, registering a .org domain shouldn't cost more than 10-20 EUR per year.

Each domain name “points to” a numerical address called an IP address (a set of numbers that looks like “213.92.22.136”). When you register your domain, the service provider will associate it with one of its own IP addresses. Make sure that the service provider allows you to change the IP address the domain points to at no additional cost (service providers often call this feature “DNS administration”, “DNS management” or similar).
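After you change the IP address in your provider's DNS management panel, you can check what a name currently resolves to. This Python sketch uses the standard socket module. The domain and IP address are the placeholders from this section. DNS changes can take some time to propagate, because resolvers cache answers.

```python
import socket


def points_to(domain: str, expected_ip: str) -> bool:
    """Return True if domain currently resolves (IPv4) to expected_ip.

    gethostbyname returns a single address; a domain with several A records
    may return any one of them.
    """
    try:
        return socket.gethostbyname(domain) == expected_ip
    except OSError:  # covers socket.gaierror: the name does not resolve (yet)
        return False


if __name__ == "__main__":
    # Placeholder values; replace them with your own domain and server address.
    print(points_to("www.yourdomainname.org", "213.92.22.136"))
```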

Register the domain name as soon as you can. It is cheap to “park” a domain name, and making it point to a new IP address (that is, to a new physical server that hosts your web site) is often free of charge.

2.2 Defining your naming scheme

The domain name you registered will be the basis of the “names” of your digital objects: each of your digital objects will have a name like www.yourdomainname.org/objectname
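A naming scheme stays consistent more easily when object names are generated rather than typed by hand. The sketch below builds web names from the placeholder domain used above, and the sigla are hypothetical. Percent-encoding with Python's urllib.parse.quote keeps names that contain spaces or other special characters valid in a URL.

```python
from urllib.parse import quote

DOMAIN = "www.yourdomainname.org"  # placeholder domain from this section


def object_uri(object_name: str) -> str:
    """Build the web name of a digital object from its identifier."""
    # safe="" also encodes "/", so every object name stays a single path segment.
    return f"http://{DOMAIN}/{quote(object_name, safe='')}"


if __name__ == "__main__":
    for siglum in ["N-VII-1", "Mp XVII 5"]:  # hypothetical sigla
        print(object_uri(siglum))
```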

2.2.1 Publishing your classification

By “publishing a classification” we basically mean defining the names that your digital objects will have on the web. This is an important step for many reasons; among other things, it will affect how your objects will be cited in the future, both on the web and in print publications. By publishing your classification you will define the names of your objects in a way that a machine can understand. The result of your work will then be fed to the software that creates the web site.

As of today, Talia focuses only on publishing content and delivering it to end users. It makes no assumptions about the way you create the underlying data and metadata: you will find no administration or backend area in which you can upload your content or edit its metadata. This may change in the future, and a simple backend interface is in fact already on our roadmap. Our strategy, however, is to form alliances with projects that have produced Open Source applications for creating and editing data and metadata, so that these tools can export their content to Talia for final consumption by end users and machines.

At the moment there are different strategies that you can use to formalise your classification. Talia, the web publishing software, is unaffected by the strategy you employ, provided that at the end of the process you can pass the data to Talia in a format it understands. You are therefore free to use any procedure or software to generate this format. However, we suggest a couple of easy ways to generate a Talia-compliant classification.

2.2.1.1 Method 1: Spreadsheet or OpenOffice database

Create an MS Excel sheet (or use a similar application such as OpenOffice Calc or Apple's Numbers) with at least the following columns: sigla, image_file_name, (europeana minimum). You then have to convert the data you entered into the spreadsheet into an XML format. This is usually a trivial task that your computing staff will be able to perform without any problem. Alternatively, you can use the template available at LINK. We are in the process of creating a program that will let you upload your file to a web site and automatically convert your classification to XML. Of course, in order to use this service you must not modify the structure of the template (e.g. don't change the column names or their order). The conversion service will be advertised on the web site of this guide as soon as it is available.
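The following Python sketch illustrates the spreadsheet-to-XML step. It reads a CSV export of the sheet with the sigla and image_file_name columns mentioned above and writes a simple XML tree using only the standard library. The element and attribute names (classification, object, siglum, image) are hypothetical placeholders. The actual Talia import format is documented on the Talia wiki, and a real converter would also need to carry over the Europeana minimum fields.

```python
import csv
import io
import xml.etree.ElementTree as ET


def spreadsheet_to_xml(csv_text: str) -> str:
    """Convert a CSV export (one row per object) into a simple XML document.

    Element names are hypothetical placeholders, not the Talia schema.
    """
    root = ET.Element("classification")
    for row in csv.DictReader(io.StringIO(csv_text)):
        obj = ET.SubElement(root, "object", siglum=row["sigla"].strip())
        ET.SubElement(obj, "image").text = row["image_file_name"].strip()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    exported = "sigla,image_file_name\nN-VII-1,n-vii-1_001.tif\nN-VII-2,n-vii-2_001.tif\n"
    print(spreadsheet_to_xml(exported))
```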

2.2.1.2 Method 2: Fabrica

The content partners of the EU Discovery Project developed their own graphical tool to input classifications. The tool automatically produces valid Talia XML as its output, and the Discovery partners should be able to give it to your project for free. Although the tool itself is free to use and modify, it runs on top of commercial software whose licence you will have to pay for. You will find the contacts of the Discovery Project representatives on the project website.

2.2.1.3 Method 3: Legacy tools or database

As long as you are able to produce XML that conforms to the Talia schema, you can of course use any tool you like to enter the data. If you have particular requirements, or if you already have a database that contains the classification, this may be the preferred method.

Talia imports the classification as XML. The current XML format is quite generic, and it should be relatively easy to map any data structure into the Talia import format. For more information on importing data, see the Talia Wiki page on data import.
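If your classification already lives in a database, the mapping can be a short export script. The sketch below uses an in-memory SQLite table as a stand-in for a legacy database. The table name, column names and output element names are all hypothetical. The SQL query is where you would map your own structure onto the fields that the Talia import format expects.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_classification(conn: sqlite3.Connection) -> ET.Element:
    """Map rows of a (hypothetical) legacy 'items' table to XML elements."""
    root = ET.Element("classification")
    # Rename legacy columns here so the rest of the script sees the target fields.
    query = "SELECT shelfmark AS siglum, scan_path AS image FROM items ORDER BY shelfmark"
    for siglum, image in conn.execute(query):
        obj = ET.SubElement(root, "object", siglum=siglum)
        ET.SubElement(obj, "image").text = image
    return root


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (shelfmark TEXT, scan_path TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [("N-VII-2", "n-vii-2_001.tif"), ("N-VII-1", "n-vii-1_001.tif")],
    )
    print(ET.tostring(export_classification(conn), encoding="unicode"))
```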

For sample XML files, have a look at the example files that are used for testing the import.

3. Setting up your Talia (Discovery customization) infrastructure

3.1 Finding a server to host the platform

In order to run Talia you will need a server that lets you deploy web applications in a Java servlet container, such as Apache Tomcat, GlassFish or JBoss. Additionally, Talia requires access to a MySQL database. You will also need shell access to the server and, if you want to deliver high-resolution tiled images, the possibility to install and run FastCGI (fcgi) applications. For more details, see the Talia wiki.
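Before you accept a server, it can help to check that a few basic tools are present. This Python sketch only checks whether some command-line programs are on the PATH of the shell account. The command list (a Java runtime and a MySQL client) is a hypothetical example. The sketch does not check versions or the servlet container itself, and the authoritative requirements are those on the Talia wiki.

```python
import shutil

# Hypothetical examples; adjust the list to the requirements on the Talia wiki.
DEFAULT_COMMANDS = ("java", "mysql")


def missing_commands(commands=DEFAULT_COMMANDS) -> list[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    print("All found" if not missing else f"Missing: {', '.join(missing)}")
```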

Although the hardware itself is quite cheap, managing a web server requires quite a lot of work and skill, so it is usually better to hand it over to a professional. You have the following options:

  1. Ask the computing service of your institution
  2. Ask the partner library of your project if you have one
  3. Buy a commercial hosting service
When choosing a server you should take into account other things besides its price. Do not forget to carefully consider the level of assistance that you will receive from the hosting service. Although your institution or partner library might host your project for free, there is no guarantee that it will also provide an adequate level of assistance. If this is the case, a commercial hosting service might be preferable. On the other hand, relying on a public institution, and even more so on a partner library, might be preferable from the point of view of the long-term preservation of your data (although this is debatable).

3.2 Obtaining, customising and installing Talia on the server

Talia is an Open Source framework for building semantic digital libraries on the Web. A framework is a set of software libraries meant to help programmers build a finished product. Talia in itself is therefore not a finished product: it is meant to be heavily configured and customized by a programmer. However, some projects have publicly released the results of their customizations of Talia and made them available for others to use. In this guide we refer to one of these customizations, the Discovery Customization, rather than to Talia as a framework. Please keep in mind that Talia is constantly changing and evolving, and so is its documentation. This is why some parts of this guide refer to external websites that provide up-to-date information about Talia.

For the purpose of this quick start guide, we will suppose that you will use the customisation of Talia developed by the Discovery Project. Please also see the Talia web site to check if other Talia customisations that better suit your needs are available before installing the Discovery customization.

A word about ontologies

Talia is ontology agnostic. This means that it can load any ontology that maps the structure of your domain. The ontology is then used to dynamically build the user interface, so that the structure of the website reflects that of your ontology. However, the customization of Talia we refer to in this guide (the Discovery customization) is built to work with a predefined ontology. This ontology is very light, so it will have little effect on the structure of your content and on the user interface. To learn more about ontologies in Talia and in the Discovery customization of Talia, see section 3.2.2.2.

3.2.1 Downloading and Installing Talia

You can install Talia either from pre-made packages or from the source code. Talia can be installed on Linux, Mac OS and Windows machines (although it hasn't been thoroughly tested on Windows). Starting from the source is always preferable; however, if you don't have the necessary sysadmin and programming skills, you can use a pre-made package. At the time of writing, only the Talia Discovery customization is available as a pre-made package, and the package is available only for Mac OS.

The most updated package download link is available at http://www.muruca.org/talia.html

The Talia source code is available at http://github.com/net7/talia/tree/master

Installation instructions for the packaged version (Mac OS) of the Discovery Customization are available at http://net7sviluppo.com/trac/talia/InstallTaliaForDiscoveryPartners

Installation instructions for the source code (all platforms) are available at http://net7sviluppo.com/trac/talia/wiki/TaliaInstallation

As of today, installing Talia is not a trivial task. Should you need professional help, you can contact one of the professionals listed at http://www.muruca.org/

Please contact us if you are a professional and wish to be added to the list.

3.2.1.1 XML Search engine

There is an XML full-text search engine, which is part of the Discovery Customization of Talia. This engine can search through the content of the XML-TEI documents contained in the library. It's a separate module which can be installed together with Talia.
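Searching “the content” of TEI documents means matching against their text rather than their markup. The toy Python sketch below illustrates this by scanning the paragraphs of a single TEI document. It is not how the Discovery search engine works internally, since a real full-text engine builds an index instead of scanning every file. The sample document is made up.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace


def search_tei(xml_text: str, term: str) -> list[str]:
    """Return the text of every TEI <p> whose content contains term (case-insensitive)."""
    root = ET.fromstring(xml_text)
    hits = []
    for p in root.iter(f"{{{TEI_NS}}}p"):
        text = " ".join("".join(p.itertext()).split())  # drop markup, normalise spaces
        if term.lower() in text.lower():
            hits.append(text)
    return hits


if __name__ == "__main__":
    doc = (
        f'<TEI xmlns="{TEI_NS}"><text><body>'
        "<p>Gott ist <hi>todt</hi>.</p><p>Another paragraph.</p>"
        "</body></text></TEI>"
    )
    print(search_tei(doc, "ist todt"))  # ['Gott ist todt.']
```

The `<hi>` element inside the first paragraph does not prevent the match, because only the text content is compared.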

Follow the instructions on the Talia site to install and configure this search engine.

Alternative search engines based on other technologies will soon be available for Talia.

3.2.1.2 Content Management System

You may have some static pages that go with your site, like the editorial criteria, a page for the sponsors or one that lists the participating institutions.

If you like, you can create those as static pages or use any CMS that you prefer. A simple CMS has been developed for this task by the Discovery project; you can download it from our GitHub page. We also have instructions on the Talia site on how to install this CMS.

The CMS provided by the Discovery Project has a predefined page structure that helps you to create the documentation for a scholarly community on the web. You can use it as a skeleton to structure your site. Of course you can use any other CMS. Many CMS, both commercial and Open Source, are listed at http://en.wikipedia.org/wiki/List_of_Content_Management_Systems

3.2.2 Customizing Talia

As noted before Talia is a framework for building digital libraries and it is meant to be customized to meet the requirements of individual digital libraries. A full customization tutorial is outside the scope of this quick start guide. If you are reading this guide, you may want to start from an existing customization and make your own modifications.

There are three main areas in which you may want to customize Talia: changing its graphical appearance, modifying its ontologies and adding more functionality for the end user. The last of these requires advanced programming skills and has been intentionally omitted from this guide.

3.2.2.1 Graphical customization

The appearance of the Discovery version of Talia can be adapted to a limited extent, by changing the logo images and modifying some files. This version of Talia ships with several sets of images that are used for the Discovery sites. When you add your own set of images to the customization_files folder, you will be able to select it during the Talia setup and when creating new Collections (Editions) in Talia.

You may also edit the CSS and the files in the app/views/custom directory to change the appearance of the page. Of course it's also possible to edit the HTML templates inside the app/views directory, if you are an expert user.

To change the look of the Discovery version you can add a new customization in the customization_files directory. You can use the existing files as examples.
  • edition_logo - a small logo image that will be used in different places
  • edition_styles - a template that will automatically be customized for each edition in the system
  • header_images - sets of images that are used in the headers of the editions. Each edition may have a different header image.
  • start_page - Templates to customize the look of the front page
  • start_page_images - to customize the large image that will be used on the start page

In January 2010, with the release of Talia 1.0, you will have even more possibilities to customize Talia's user interface.

3.2.2.2 Customization of ontologies and metadata schemes

Talia can use RDFS and OWL ontologies to describe the structure of your data. Talia will always use the Dublin Core and the base RDF and RDFS ontologies.

The Discovery version uses its own specialized ontology. Other ontologies can still be easily loaded, but will have little effect on how the sites work. The Discovery version also expects the data you provide to have the same structure as the data used for the Discovery Project.

Talia's internal OAI service will automatically map the Dublin Core attributes that are stored in Talia to OAI. The Discovery version also provides a specialized mapping for the Discovery data and the Europeana OAI format.
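For reference, the oai_dc format used by OAI-PMH wraps the fifteen simple Dublin Core elements in a fixed container element. The Python sketch below builds such a record from a dictionary of metadata. The namespaces are the standard ones and the sample values are hypothetical. The xsi:schemaLocation attribute that a production record usually carries is omitted for brevity. This is not Talia's own mapping code.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(metadata: dict[str, list[str]]) -> str:
    """Serialise Dublin Core metadata as an oai_dc record (without schemaLocation)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in metadata.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_oai_dc({
        "title": ["Notebook N-VII-1"],  # hypothetical sample values
        "identifier": ["http://www.yourdomainname.org/N-VII-1"],
    }))
```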

Talia 1.0 will allow you to flexibly define the structure of your data using a custom ontology. You are free to create your own ontology: Talia will only expect each item to have a set of Dublin Core metadata and the overall structure to respect a very basic "core" ontology (which will be published together with Talia 1.0). In Talia 1.0 you will also be able to map your own ontology to the OAI service.

There are a number of good tools to create and edit ontology files, such as the Protégé ontology editor. The ontologies can be loaded into Talia with a simple command, and you can find out more about how Talia uses them on the Talia site.

3.2.2.3 Customizing menus, links and translations

Customizing and translating menus and messages that appear in the user interface is quite easy. Talia provides a contextual translation mode in which administrators can edit the localization while navigating the site. This way of localizing and customizing the interface has the advantage of letting you immediately see the effects of your modifications.

3.2.3 Contributing to Talia

Most of the Talia development is paid for by projects using Talia as the basis of their infrastructure. These projects contracted the Talia development team to develop specific features and then agreed to make them available for free to the community. This is of course the most direct way to contribute to the evolution and growth of Talia. However, there are many other ways to contribute, among them:

  • writing guides and tutorials on how you used and modified Talia in your project;
  • sending us your user interface templates;
  • inviting us to conferences and workshops to present Talia;
  • sending us translations of the user interface;
  • sending us patches and modifications you made to the source code;
  • helping us to obtain funding (e.g. by involving us in applications for EU or national projects).

3.3 Publishing your content


Once Talia has been installed and your content is ready you can import it into the application by following the instructions published here.

Last Updated on Tuesday, 15 December 2009 23:39