Metadata
Analyzing hidden semantics in social bookmarking of open educational resources
Web 2.0 services such as social bookmarking allow users to manage and share the links they find interesting, adding their own tags for describing them. This is especially interesting in the field of open educational resources.
“Analyzing hidden semantics in social bookmarking of open educational resources” discusses how the crowd-sourcing phenomenon of social bookmarking can be used to extract semantics from the tags that Delicious users add to describe links related to open educational resources (OER).
Author Julià Minguillón suggests the use of a simple statistical analysis tool to discover which tags create clusters that can be semantically interpreted. The obtained results are compared with a collection of resources related to OER in order to better understand the real needs of people searching for these.
Call for Papers - MTSR 2012: Metadata and Semantics Research Conference
This conference will be held at the University of Cádiz, Spain, on November 28-30, 2012.
The 6th International Conference on Metadata and Semantics Research (MTSR'12) aims to bring together researchers and practitioners that share a common interest in metadata, its representation, its semantics and its diverse applications to Information Systems.
Topics include but are not limited to contributions dealing with the following issues:
* Metadata and knowledge representation in open access repositories, research information systems and research infrastructures
* Information standards and specifications for open access repositories, research information systems and research infrastructures
* Achieving semantic interoperability in open access repositories, research information systems and research infrastructures
* Application of semantic web technologies in open access repositories, research information systems and research infrastructures
* Support of advanced features for metadata: compound objects, complex relationships, handling heterogeneous content, multi-linguality, versioning, etc.
* Information exchange, aggregation and integration in open access repositories, research information systems and research infrastructures
* Information mining and extraction in open access repositories, research information systems and research infrastructures
* Infrastructures for data sets (e.g. scientific data, public sector information)
* Metadata interoperability for research infrastructures across disciplines
* Metadata quality in open access repositories, research information systems and research infrastructures
* Mechanisms, tools and infrastructures for shared services, including but not limited to:
  * Persistent identification
  * Authority control
  * Online viewing/streaming of digital content
  * Metadata transformations
* Digital preservation workflows and mechanisms and impact on metadata
* Value-added services based on open access repositories, research information systems and research infrastructures
eContentplus: What are Enriched Digital Documents?
A digital document is a structured information package whose physical storage format is a very long list of ones and zeros saved in a particular way. Strictly speaking, the idea of a list of ones and zeros is not entirely accurate: it is a visualisation of the individual bits that lets a human being understand them. A bit is an object that can be in one of two possible states: on or off, charged with positive or negative electricity, black and white or in colour, charged with three or seven volts, etc.
The process of digitalisation therefore involves ensuring that something can be sufficiently described with these small “bits” in such a way that it can be reproduced when we want a more or less faithful copy of the original.
For example:
- A letter
We can codify each of the letters, symbols or spaces that it comprises with a number, and represent this number as a succession of bits (now we need the image of the ones and zeros). If we know the system that has been followed to codify the original document, we can recompose it quite successfully.
- An image
We can apply a matrix of points to the image (the more points the better) so that, by describing the characteristics of each point and its position, we once again have a digital codification of the image. So we can say that the point with coordinates (1, 1) is blue, the point with coordinates (1, 2) is yellow, and so on.
If the point matrix is precise, we should not have any problem when it comes to recomposing the image again, but here we have a problem: we know that reality is not made up of “points” and if those “points” can be seen in the resulting image, the impression of faithfulness to the original may be bad, so we have to make sure that the points we describe are as small as possible. If we manage to make them extremely small, it is possible that the human eye will not see them and will then have a good impression of faithfulness. In order to do this, we use a very big point matrix, which leads to the problem that, in order to describe an image, the resulting list of bits can be very big indeed.
- A song
Let us imagine that we split the total duration of the song into many tiny parts. On a compact disc (CD), every second of the song is divided into 44,100 parts, and we describe the sound in each of these parts: the level (amplitude) of the sound wave at that instant and, if it is stereo, the level on each channel. We then store this information and, when we want, we make one or two speakers reproduce exactly these levels at the same speed at which they were recorded. The final impression, if the process has been carried out correctly, can be almost that of listening to the original, as anyone who has listened to music stored on a CD knows. Electronics offer a very good solution to the great speed at which things have to be done. The only problem is that, once again, the size of the resulting list of bits can be enormous. If we want to reduce the size of this list, we can reduce the number of times we divide each second and, instead of 44,100 times, halve this, or halve it again. This is known as “reducing the sampling frequency”; the advantage is that the final list of bits is smaller, and the disadvantage is that the quality of the reproduction is lower.
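The three examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular file format: the function names are made up for this example, the text is assumed to be coded as UTF-8, each point of the image is assumed to take 24 bits, and the sound is assumed to use CD-style values (44,100 parts per second, 16 bits per part, two channels).

```python
def text_to_bits(text: str) -> str:
    """Code each character as a number (UTF-8, an assumption) written as 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Recompose the text, which works because we know the coding system used."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


def image_size_bytes(width: int, height: int, bits_per_point: int = 24) -> int:
    """Uncompressed size of a point matrix: one colour description per point."""
    return width * height * bits_per_point // 8


def audio_size_bytes(seconds: int, samples_per_second: int = 44_100,
                     bits_per_sample: int = 16, channels: int = 2) -> int:
    """Uncompressed size of a sound: one level per part, per channel."""
    return seconds * samples_per_second * channels * bits_per_sample // 8


if __name__ == "__main__":
    bits = text_to_bits("A letter")
    print(bits)                  # the letter as ones and zeros
    print(bits_to_text(bits))    # the recomposed original

    print(image_size_bytes(1000, 1000))                    # a 1000 x 1000 point matrix
    print(audio_size_bytes(60))                            # one minute of stereo sound
    print(audio_size_bytes(60, samples_per_second=22_050)) # half the sampling frequency
```

Halving the sampling frequency halves the size of the list of bits, which is the trade-off between file size and quality described above.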
If we are clear about what a digital document is, we can establish some conclusions about it:
- A digital document is an information package stored in a list of bits.
- The size of the list of bits (from now on, we will use the term “file size” for this concept) can be very large.
With a digital document, however, we have only resolved part of the problem: documents are used to store information but they do not tell you about themselves, that is, they do not tell you about their history of use, how they were conceived, their various parts, etc. All of this information dealing with the information itself is known as “metainformation”. Furthermore, as information, it can also be digitalised and attached to the document. In other words, if we have a digital document that is accompanied by a certain amount of “metainformation”, we have a digital document that can come, to put it this way, with its “user manual”, or with its background. With this, we have “enriched” the document.
Using enriched documentation has some advantages: the “metainformation” is also digitalised and, if it has been constructed in accordance with a pre-established system or protocol, we can ensure that the digital document is processed automatically by a machine in the appropriate manner. For example, we have a digital document that discusses how a law will influence the methods of activity of certain associations that protect children’s rights in India. In addition to the bits that make up the document, we can add another document with “metainformation” about the document in question, for example, with a series of keywords such as “jurisprudence”, “India”, “Child rights”, “activism”, “associations”, “creation date”, “application date”, “other related documents”, etc. This information on the document, also known as “metadata”, can be used so that, once our computer receives and processes it, the document is automatically classified in the pre-established categories. Therefore, enriching a document can have many positive consequences:
| If the document is enriched with information on... | we can use it to... |
| Who used this document previously and what he or she thought about it | Rely on the opinion of somebody we trust in order to go directly to the important documents |
| The parts it comprises and what each one talks about | Go directly to what interests us |
| A list of possible applications of the document or what to do with it | Classify the document and send it to somebody we know will find it useful |
We could make a huge list of advantages related to using enriched documentation, always following this method: depending on the information we add, the use may vary. It is obvious that we do not currently know for sure what somebody will do with our document in the future, so it is a good practice to attach ALL of the metainformation that we can to the document, “just in case” it might be of use to somebody in some way in the future.
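As a rough sketch of the automatic classification mentioned in the example of the law in India, the following Python code attaches keywords to a document and files it under pre-established categories. Everything here is hypothetical and chosen only for illustration: the `EnrichedDocument` class, the `CATEGORIES` table and the `classify` function do not come from any standard or library.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedDocument:
    """A document (the content) together with its metainformation."""
    content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical pre-established categories: category name -> keywords that trigger it.
CATEGORIES = {
    "Law": {"jurisprudence", "legislation"},
    "Children": {"child rights"},
    "Asia": {"india"},
}


def classify(doc: EnrichedDocument, categories: dict = CATEGORIES) -> list:
    """Return the categories whose keywords appear in the document's metadata."""
    keywords = {keyword.lower() for keyword in doc.metadata.get("keywords", [])}
    return sorted(name for name, terms in categories.items() if keywords & terms)


if __name__ == "__main__":
    doc = EnrichedDocument(
        content="(text of the document about the law)",
        metadata={
            "keywords": ["jurisprudence", "India", "Child rights",
                         "activism", "associations"],
        },
    )
    print(classify(doc))  # ['Asia', 'Children', 'Law']
```

The machine never reads the content itself: it decides only from the metadata, which is why the metadata must follow an agreed vocabulary.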
Metadata and Protocols
As will become obvious, metainformation cannot be written in a random way. If we want somebody to be able to use it, we have to write it bearing in mind that person and how he or she will understand it upon reading it. For this reason, it is best to agree in advance on how to write the metainformation and how to process these metadata.
In order to do this, the most important Information Technology associations in the world and some international standards bodies have agreed to choose the words that will describe the metainformation to be used. After coming to an agreement, they published the syntactical regulations and the dictionaries of words that can be used. This is known as a “standard”. For example, the standard most used to describe a document for cataloguing is known as "Dublin Core" and consists essentially of a series of 15 characteristics that must be written in accordance with a specific method.
If the use we wish to obtain is educational, the standard most used is that of the IMS consortium. With this standard, we can add metainformation to our documents on their pedagogic use.
To write these metadata, a language whose syntax is very simple and easy to understand, particularly by machines, has been used. It is known as XML and it is good because it is very clear and hierarchical in terms of describing the different characteristics of an object.
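To give an idea of what such metadata look like, the sketch below writes a few of the 15 Dublin Core characteristics in XML using Python's standard `xml.etree.ElementTree` module. The element names and the namespace `http://purl.org/dc/elements/1.1/` are those of the Dublin Core element set; the `metadata` wrapper element, the `dublin_core_record` function and the sample values are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 characteristics (elements) of the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict) -> ET.Element:
    """Build an XML record from a mapping of element name -> list of values."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            element.text = value
    return record


if __name__ == "__main__":
    record = dublin_core_record({
        "title": ["Influence of a law on child rights associations"],
        "subject": ["jurisprudence", "Child rights", "activism"],
        "coverage": ["India"],
        "language": ["en"],
    })
    xml_text = ET.tostring(record, encoding="unicode")
    print(xml_text)

    # A machine can read the same record back without knowing the document.
    parsed = ET.fromstring(xml_text)
    print([e.text for e in parsed.findall(f"{{{DC_NS}}}subject")])
```

Because both the names and the syntax are agreed in advance, any program that knows the standard can read the record without further explanation.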
In conclusion, enriching a document means adding “metainformation” (“metadata”) to it. These metadata have to be written in accordance with a “standard”.
The enriched information of a document allows for the document to be used better and for a machine to process it automatically.
Glossary
Finally, I am adding a short glossary in case any concept requires clarification.
Bit: object that can exist in one of two possible states. Bits tend to be grouped in packages of eight, known as “bytes”. To measure them, we tend to use prefixes: 1000 bytes = 1 kilobyte; 1000 kilobytes = 1 megabyte; 1000 megabytes = 1 gigabyte (in fact, instead of 1000, computing has traditionally used two to the power of ten, which is exactly 1024).
Metadata: data that describes other data. Small package of information that describes another package of information.
Metainformation: generic name for a group of metadata.
Enrich: to add metadata to a document.
Dublin Core: standard to add metadata on library classification to a digital document.
IMS: standard to add educational information to a digital document.
XML: IT language with a very precise syntax to write metadata.


