Workshops

We are pleased to announce that three workshops and one tutorial will take place at the HP campus, Fort Collins, Colorado, in Building 3, on Tuesday, 16 September 2014. This year, each workshop/tutorial will take up the full day. Please enter through the Building 3 lobby (the best entrance to the facilities is from Harmony Road) and proceed to your event.

The workshops are:

DChanges 2014 - Document Changes: Modeling, Detection, Storage and Visualization
Organizers: Gioele Barabucci, Uwe M. Borghoff, Angelo Di Iorio, Sonja Maier, Ethan Munson

The goal of DChanges is to share ideas, common issues and principles about models and algorithms for change tracking and detection, versioning and collaborative editing. We want to look at these topics from different perspectives and to identify both the most common issues and the peculiarities of each domain and each approach.

This edition in particular will be focused on interpretation, visualisation, processing and exploitation of changes. One of last edition's outcomes was that we identified the need for novel interfaces to better understand and exploit detected changes. Several issues were pointed out as still unsolved: interfaces do not scale when dealing with many changes, changes at different levels of abstraction are often not sufficiently taken into account, detection and visualisation are often inter-mixed, logs are often detailed but underexploited, and versioning techniques are not very well suited for non-technical people. Contributions on related topics (diff and merging algorithms, change tracking, applications to other domains) and from related areas (e.g., software engineering, collaboration, or ontology management) are also welcome.

Further information is available on the workshop website.



SemADoc: Semantic Analysis of Documents
Organizers: Carlotta Domeniconi, Evangelos Milios

A large number of document management problems would benefit from having the semantics of documents explicitly represented. However, manually assigning semantic descriptions to documents is labour intensive and error prone. At the same time, the manual generation of domain specific taxonomies is not only labour intensive, but it also needs to be repeated often as the domains themselves and their key concepts shift with time.

In this workshop we will focus on document content analysis and semantic enrichment to generate a layer of semantic description of documents that is useful for document management tasks, such as semantic information retrieval, conceptual organization and clustering of document collections for sense making, semantic expert profiling, and document recommender systems.

The workshop is timely and relevant to the Document Engineering community, as its focus is on semantically enriching documents and document collections, to make them more accessible to their readers. The task is nontrivial due to the volume of text data and the rate at which text data is accumulated by companies, government, and individuals.

Further information is available on the workshop website.



DH-CASE II: Collaborative Annotations in Shared Environments: metadata, tools and techniques in the Digital Humanities
Organizers: Patrick Schmitz, Laurie Pearce, Quinn Dombrowski

Digital Humanities is rapidly becoming a central part of humanities research, drawing upon tools and approaches from Computer Science, Information Organization, and Document Engineering to address the challenges of analyzing and annotating the growing number and range of corpora that support humanist scholarship.

From cuneiform tablets, ancient scrolls, and papyri, to contemporary letters, books, and manuscripts, corpora of interest to humanities scholars span the world’s cultures and historic range. More and more documents are being transliterated, digitized, and made available for study with digital tools. Scholarship ranges from translation to interpretation, from syntactic analysis to multi-corpus synthesis of patterns and ideas. Underlying much of humanities scholarship is the activity of annotation. Annotation of the “aboutness” of documents and entities ranges from linguistic markup, to structural and semantic relations, to subjective commentary;  annotation of “activity” around documents and entities includes scholarly workflows, analytic processes, and patterns of influence among a community of scholars. Sharable annotations and collaborative environments support scholarly discourse, facilitating traditional practices and enabling new ones. 
 
The focus of this workshop is on the tools and environments that support annotation, broadly defined, including modeling, authoring, analysis, publication and sharing. We will explore shared challenges and differing approaches, seeking to identify emerging best practices, as well as those approaches that may have potential for wider application or influence.

Further information is available on the workshop website.



The tutorial is:

DocEng 2014: PDF Tutorial
Organizers: Matthew Hardy and Steven Bagley

Many billions of documents are stored in the Portable Document Format (PDF). These documents contain a wealth of information, but that information is often perceived as inaccessible. Often this is down to the tools used to create and process the documents rather than to PDF itself. Initial versions of PDF were primarily aimed at perfect, device-independent print and display reproduction. Later versions, however, have included many additions aimed at adding “structure” to PDF documents as well as improved support for print and display. These non-print capabilities include the obvious items such as bookmarks, article threads, hyperlinks, commenting, logical structure, metadata, file attachments, digital signatures and more. While many are aware of the print-based capabilities of PDF, fewer are aware of these non-print capabilities. In fact, there is significant misinformation related to PDF even in the scientific communities. It is these higher-level features of PDF that make it such a versatile container format for modern documents by allowing it to combine structural markup with reliable, high-quality presentation.

The focus of this tutorial is to give attendees practical knowledge of how to create and handle PDFs that take advantage of the non-print features of PDF to provide rich access to the information within, using a variety of commercial and open-source tools. We will get under the hood of PDF and analyze the poor practices that cause PDFs to be inaccessible, see how to access the text and graphics within a PDF, and examine the features of PDF that can be used to make the information much more accessible. We will also discuss some of the new ISO standards that provide profiles for producing Accessible PDFs. No prior experience of PDF is expected.

Session timetable:

 0900-0915 Welcome and Introduction
 0915-1030 PDF Internals
This session looks at how a PDF is structured; how the content on the page is described from the bottom up; and how those pages -- and the resources they use -- are fitted together to form a PDF file
 1030-1100 Coffee
 1100-1230 Higher-level Structures
How more abstract higher-level structures have been built on top of the basic PDF structures to support accessibility, and navigation around the PDF
 1230-1330 Lunch
 1330-1400 Rogues Gallery
A look at a "rogues gallery" of bad PDFs in light of what we have learnt in the morning and explaining why they cause problems
 1400-1500 Creating and Manipulating PDF Programmatically
A look at the various options for creating and manipulating PDFs programmatically.  Both open-source and commercial options will be considered, and we'll give tips to avoid creating PDFs that end up in the "rogues gallery"
 1500-1530 Coffee
 1530-1615 The Future
A look at where PDF is heading and what is new in PDF 2.0
 1615 Close