Putting documents into their work context in document analysis
A. Salminen, V. Lyytikäinen and P. Tiitinen
Department of Computer Science and Information Systems, University of Jyväskylä, PO Box 35 (MaE), FIN-40351 Jyväskylä, Finland
Received 10 September 1999; accepted 9 November 1999. Available online 18 April 2000.
Abstract
In trying to achieve document standardization the goal is to find more effective, consistent, and standardized ways to utilize information technology. The specification and implementation of document standards may take several years requiring a profound analysis and understanding of document management practices. Document standardization does not concern documents only: it concerns workers, their work, business partners, and future systems as well. In this paper we discuss two ways of describing the work context of documents: process modelling and life cycle modelling. In process modelling, documents are regarded as resources produced and used in inter- or intra-organizational business processes. Different types of documents are typically produced and used in a business process. In life cycle modelling work related to processing of a document of a specific type is described. The modelling methods have been tested in an SGML standardization project called RASKE during the analysis of four case domains: the enquiry process in the Finnish Parliament and Government, national Finnish legislative work, budgetary work, and the Finnish participation in EU legislative work. This paper discusses the modelling requirements in document analysis and describes the techniques used in the RASKE project.
Author Keywords: Document analysis; Document standardization; Process modelling; SGML; XML
1. Introduction
The data volume in the electronic document repositories of organizations is growing fast, but the diversity of document formats and systems, together with continuing changes in information technology, causes problems in accessing and using the information needed in work tasks. These problems concern both companies and public sector organizations. They have prompted organizations to start major document standardization projects, where the intention is to agree upon rules that define the way information is represented in documents. The rules are needed in order to achieve more effective, consistent, and stable ways to utilize information technology in business processes. Problems with technological change and with maintaining long-term access to digital documents have motivated the search for application-independent document formats. SGML (Standard Generalized Markup Language) is an international standard for defining and representing documents in an application-independent form (Goldfarb, 1990). A subset of SGML called XML (Extensible Markup Language) has been developed especially for specifying document standards to be used in Web information systems (Bray, Paoli & Sperberg-McQueen, 1998).
In SGML/XML standardization projects, a profound document analysis is needed. The analysis is usually seen as an analysis of document structures (Travis; Watson and Maler, Magnusson Sjöberg, 1997, Weitz, 1998). Successful implementation of document standards in enterprises, however, requires an understanding of the role of documents in work processes. Especially where the standardization concerns several document types and document production is part of inter-organizational business processes, the analysts as well as the actors in the processes should be able to see the process context of documents. In this paper we discuss work process modelling as part of document analysis. We introduce the modelling techniques used in a major standardization project called RASKE, where the standardization has concerned the documents created in the Finnish Parliament and ministries (Salminen; Salminen and Salminen).
The rest of the paper is organized as follows. Section 2 introduces a model for electronic document management environments and defines the notions related to the model. Document standardization of enterprises is discussed in Section 3. As an example of a standardization project the RASKE project is introduced. Work process modelling approaches in other application areas and needs in the document analysis of a document standardization project are discussed in Section 4. The techniques used in the RASKE project are described in Section 5. Experiences and implications from the RASKE project are discussed in Section 6.
2. Electronic document management environments
Organizations use documents as a means for information management: a means to cluster, organize, store, transfer, and use information to fulfill their organizational purposes. The term electronic document management (EDM) refers to the use of modern information technology for this purpose (Sprague, 1995). In document standardization it is important to identify not only documents and their structures, but also the other entities of the EDM environment where the documents are created, manipulated, and used.
Fig. 1 shows a model of an EDM environment built on the central notions of information control nets (ICNs): activities and resources (Ellis, 1979). Information is produced and used in activities. Resources are information repositories in which produced information can be stored and from which information can be taken. The dashed lines in the figure denote information flow to and from resources. The set of activities is denoted by a circle and the resources by rectangles. The resources are divided into three types: documents, systems, and actors. Documents consist of recorded data intended for human perception. A document can be identified and handled as a unit in the activities, and it is intended to be understood as information pertaining to a topic. Since the documents in an EDM environment are mostly digital, information technology is needed to operate on them. Hence systems, i.e. hardware, software, and applications, are essential resources in an EDM environment. On the other hand, since the information in documents should remain available after system changes, it is also important to treat documents as resources separate from systems. Finally, the actors are people and organizations that perform activities and use documents and systems in those activities. In some fully automated activities a software system may perform an activity (for example, create an email message and send it to a repository). In this paper, however, we consider activities where the actors creating and using documents are people and organizations. In relation to documents and systems, actors are called users. Actors are grouped by roles. A role specifies the tasks, responsibilities, and rights of an actor in an activity, as a user of a system, or as a user of a document repository.
Fig. 1. Components of an electronic document management environment.
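The components of Fig. 1 can also be read as a small data structure. The following Python sketch is only an illustration under our own assumptions: the class names (Kind, Resource, Role, Activity), the method names, and the sample activity "drafting" are hypothetical and are not part of the ICN notation or of the RASKE project. It shows activities that use and produce resources of the three types, actors assigned to activities through roles, and the information flows that the dashed lines in the figure denote.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Kind(Enum):
    """The three resource types of the EDM model."""
    DOCUMENT = "document"
    SYSTEM = "system"
    ACTOR = "actor"


@dataclass(frozen=True)
class Resource:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Role:
    """Tasks an actor has in an activity (hypothetical, simplified)."""
    name: str
    tasks: Tuple[str, ...] = ()


@dataclass
class Activity:
    name: str
    uses: List[Resource] = field(default_factory=list)      # flow resource -> activity
    produces: List[Resource] = field(default_factory=list)  # flow activity -> resource
    performers: List[Tuple[Resource, Role]] = field(default_factory=list)

    def assign(self, actor: Resource, role: Role) -> None:
        """Attach an actor to this activity in a given role."""
        if actor.kind is not Kind.ACTOR:
            raise ValueError(f"{actor.name} is not an actor")
        self.performers.append((actor, role))

    def flows(self) -> List[Tuple[str, str]]:
        """Information flows as (source, target) name pairs."""
        incoming = [(r.name, self.name) for r in self.uses]
        outgoing = [(self.name, r.name) for r in self.produces]
        return incoming + outgoing


# Illustrative instance; the activity and system names are made up.
ministry = Resource("Ministry of Finance", Kind.ACTOR)
editor = Resource("text editor", Kind.SYSTEM)
proposal = Resource("Budget Proposal", Kind.DOCUMENT)

drafting = Activity("drafting", uses=[editor], produces=[proposal])
drafting.assign(ministry, Role("author", ("write", "revise")))
print(drafting.flows())
```

Keeping documents and systems as distinct resource kinds mirrors the point made above: a document should remain identifiable as a resource even when the system used to create it is replaced.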
Information pieces needed and produced during an activity are stored in many different ways: in the heads and experience of people, in the organizational culture, as hardware and software solutions, and as data in documents and applications. If the notion of information is understood according to the sense-making theory of Dervin (1992) as ‘the sense created in a situation, at a specific moment in time and space by a reader’ (where Dervin means a human reader), then information is subjective and the information needed by a person in order to perform an activity may be a complicated combination of pieces coming from different sources.
An EDM environment may be confined to a single organization. In the current networked world, however, business processes often concern several organizations, and resources are shared to a greater or lesser extent by those organizations. Thus the EDM environments in which a specific organization or person is involved may be quite complex.
3. Document standardization
One approach to improving business processes is document standardization using application-independent standard formats. In standardization the idea is to plan digital information structures and formats with future changes in systems in mind, instead of planning them for a specific software system. The rules associated with a document, its authoring, and its storage format are intended to help authors and different readers understand the content consistently, even when the software and hardware change. Sprague (1995) suggests the development of an electronic document management strategy in an organization. Standardization can be taken as such a strategy.
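The idea of an application-independent format can be illustrated with a minimal sketch. The example assumes XML rather than full SGML, because the Python standard library includes an XML parser; the element and attribute names (governmentBill, section, heading, para) are hypothetical and are not taken from the RASKE DTDs. The point is only that a document marked up by its logical structure can be read by a generic parser, without the software that produced it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: all element and attribute names are illustrative only.
BILL = """\
<governmentBill id="example-1">
  <title>Example bill title</title>
  <section number="1">
    <heading>Scope</heading>
    <para>Example provision text.</para>
  </section>
</governmentBill>
"""


def outline(xml_text):
    """Return the document title and its section headings."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    headings = [s.findtext("heading") for s in root.iter("section")]
    return title, headings


print(outline(BILL))
```

Because the markup records structure rather than the formatting commands of one tool, the same text could later be processed by a different system that knows only the agreed rules.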
3.1. RASKE as a standardization project
One example of a standardization project is RASKE. The term RASKE comes from the Finnish words ‘Rakenteisten AsiakirjaStandardien KEhittäminen’ meaning the development of standards for structured documents. The project was commenced in spring 1994 by the Finnish Parliament and a software company in cooperation with researchers at the University of Jyväskylä. The Ministry of Foreign Affairs, Ministry of Finance, Prime Minister’s Office, and a publishing house also participated in the project.
Starting the RASKE project was motivated by document management problems in the Finnish Parliament and government. Teams studying the legislative work carried out in Parliament identified, for example, the following problems concerning document management (Salminen et al., 1997):
1. Incompatibilities of the systems used caused the need for repeated typing of the same piece of text, which in turn was a potential source of inconsistencies in documents.
2. Inconsistencies in document naming and document identifiers caused problems and extra work.
3. Lack of information management coordination between the ministries, and between the government and Parliament.
4. In spite of the fact that almost all of the documents were digital, documents were mostly distributed on paper.
5. The retrieval techniques of different systems were heterogeneous.
6. The retrieval techniques of the electronic archiving system and the tracking system of Parliament were not satisfactory.
7. Uncertainty concerning the future usability of the information in the archived digital documents.
The document analysis in the RASKE project concerned four domains: the enquiry process, national legislative work, Finnish participation in EU legislative work, and the creation of the state budget. During the case analyses, various methods of analysis were tested and developed. Preliminary DTDs were designed for 21 document types including, for example, Government Bill, Government Decision, Government Communication, Private Bill, Special Committee Report, Budget Proposal, and Communication of Parliament.