Putting documents into their work context in document analysis
A. Salminen, V. Lyytikäinen and P. Tiitinen
Department of Computer Science and Information Systems, University of Jyväskylä, PO Box 35 (MaE), FIN-40351 Jyväskylä, Finland
Received 10 September 1999; accepted 9 November 1999. Available online 18 April 2000.
Abstract
The goal of document standardization is to find more effective, consistent, and standardized ways to utilize information technology. The specification and implementation of document standards may take several years, requiring a profound analysis and understanding of document management practices. Document standardization does not concern documents only: it concerns workers, their work, business partners, and future systems as well. In this paper we discuss two ways of describing the work context of documents: process modelling and life cycle modelling. In process modelling, documents are regarded as resources produced and used in inter- or intra-organizational business processes. Different types of documents are typically produced and used in a business process. In life cycle modelling, the work related to processing a document of a specific type is described. The modelling methods have been tested in an SGML standardization project called RASKE during the analysis of four case domains: the enquiry process in the Finnish Parliament and Government, national Finnish legislative work, budgetary work, and the Finnish participation in EU legislative work. This paper discusses the modelling requirements in document analysis and describes the techniques used in the RASKE project.
Author Keywords: Document analysis; Document standardization; Process modelling; SGML; XML
1. Introduction
The data volume in the electronic document repositories of organizations is growing fast, but the diversity of document formats and systems, as well as continuing changes in information technology, cause problems in the access and use of the information needed in work tasks. The problems concern both companies and public sector organizations. These problems have prompted organizations to start major document standardization projects where the intention is to agree upon rules which define the way information is represented in documents. The rules are needed in order to achieve more effective, consistent, and stable ways to utilize information technology in business processes. Problems with technological change and with maintaining long-term access to digital documents have motivated the search for application-independent formats for documents. SGML (Standard Generalized Markup Language) is an international standard for defining and representing documents in an application-independent form (Goldfarb, 1990). A subset of SGML called XML (Extensible Markup Language) has been developed especially for specifying document standards to be used in Web information systems (Bray, Paoli & Sperberg-McQueen, 1998).
In SGML/XML standardization projects, a profound document analysis is needed. The analysis is usually seen as an analysis of document structures (Travis; Watson and Maler, Magnusson Sjöberg, 1997, Weitz, 1998). Successful implementation of document standards in enterprises, however, requires an understanding of the role of documents in work processes. Especially in cases where the standardization concerns several document types and the document production is part of inter-organizational business processes, the analysts as well as the actors in the processes should be able to see the process context of documents. In this paper we discuss work process modelling as part of document analysis. We will introduce the modelling techniques used in a major standardization project called RASKE, where the standardization has concerned the documents created in the Finnish Parliament and ministries (Salminen; Salminen and Salminen).
The rest of the paper is organized as follows. Section 2 introduces a model for electronic document management environments and defines the notions related to the model. Section 3 discusses document standardization in enterprises and introduces the RASKE project as an example of a standardization project. Work process modelling approaches in other application areas and the modelling needs in the document analysis of a document standardization project are discussed in Section 4. The techniques used in the RASKE project are described in Section 5. Experiences and implications from the RASKE project are discussed in Section 6.
2. Electronic document management environments
Organizations use documents as a means for information management: a means to cluster, organize, store, transfer, and use information to fulfill their organizational purposes. The term electronic document management (EDM) refers to the use of modern information technology for this purpose (Sprague, 1995). In document standardization it is important to identify not only documents and their structures but also the other entities of the EDM environment where the documents are created, manipulated, and used.
Fig. 1 shows a model for an EDM environment using the central notions of information control nets (ICNs): activities and resources (Ellis, 1979). Information is produced and used in activities. The resources are information repositories where the information produced can be stored, or from which information can be taken. The dashed lines in the figure denote the information flow from and to resources. The set of activities is denoted by a circle and the resources by rectangles. The resources are divided into three types: documents, systems, and actors. Documents consist of recorded data intended for human perception. A document can be identified and handled as a unit in the activities, and it is intended to be understood as information pertaining to a topic. Since the documents in an EDM environment are mostly digital, information technology is needed to operate on them. Hence systems, i.e. hardware, software, and applications, are essential resources in an EDM environment. On the other hand, since the information in documents should remain available after system changes, it is also important to treat documents as resources separate from systems. Finally, the actors are people and organizations performing activities and using documents as well as systems in the activities. In some fully automated activities a software system may perform an activity (for example, create an email message and send it to a repository). In this paper, however, we consider activities where the actors creating and using documents are people and organizations. In relation to documents and systems, actors are called users. Actors are grouped by roles. A role specifies the tasks, responsibilities, and rights of an actor in an activity, as a user of a system, or as a user of a document repository.
Fig. 1. Components of an electronic document management environment.
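The components of the model can be sketched as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names, the `reads`/`writes` fields, and the example values "text editor", "ministry official", "author", and "draft bill" are hypothetical and are not taken from the paper or from the ICN formalism; only the three resource types and the notion of a role come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ResourceKind(Enum):
    """The three resource types named in the EDM model."""
    DOCUMENT = "document"
    SYSTEM = "system"
    ACTOR = "actor"


@dataclass(frozen=True)
class Resource:
    name: str
    kind: ResourceKind
    role: Optional[str] = None  # assumption: a role is recorded only for actors


@dataclass
class Activity:
    name: str
    # Resources from which information flows into the activity.
    reads: List[Resource] = field(default_factory=list)
    # Resources to which the activity stores the information it produces.
    writes: List[Resource] = field(default_factory=list)

    def resources_of(self, kind: ResourceKind) -> List[Resource]:
        """Return the distinct resources of one kind touched by this activity."""
        found: List[Resource] = []
        for resource in self.reads + self.writes:
            if resource.kind is kind and resource not in found:
                found.append(resource)
        return found


# Hypothetical example values, for illustration only.
bill = Resource("Government Bill", ResourceKind.DOCUMENT)
editor = Resource("text editor", ResourceKind.SYSTEM)
drafter = Resource("ministry official", ResourceKind.ACTOR, role="author")
drafting = Activity("draft bill", reads=[editor, drafter], writes=[bill])

document_names = [r.name for r in drafting.resources_of(ResourceKind.DOCUMENT)]
actor_roles = [r.role for r in drafting.resources_of(ResourceKind.ACTOR)]
```

Keeping documents and systems as separate resource kinds mirrors the point made above: a document should remain identifiable even when the system used to create it is replaced.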
Information pieces needed and produced during an activity are stored in many different ways: in the heads and experience of people, in the organizational culture, as hardware and software solutions, and as data in documents and applications. If the notion of information is understood according to the sense-making theory of Dervin (1992) as ‘the sense created in a situation, at a specific moment in time and space by a reader’ (where Dervin means a human reader), then information is subjective and the information needed by a person in order to perform an activity may be a complicated combination of pieces coming from different sources.
An EDM environment may be confined to a single organization. In the current networked world, however, business processes often concern several organizations, and resources are shared to a greater or lesser extent by those organizations. Thus the EDM environments in which a specific organization or person is involved may be quite complex.
3. Document standardization
One approach to improving business processes is document standardization using application-independent standard formats. The idea in standardization is to plan digital information structures and formats with future changes in systems in mind, instead of planning them for a specific software system. The rules associated with a document, document authoring, and its storage format are intended to support a consistent understanding of the content by the authors and different readers, also in situations where the software and hardware change. Sprague (1995) suggests the development of an electronic document management strategy in an organization. Standardization can be taken as such a strategy.
3.1. RASKE as a standardization project
One example of a standardization project is RASKE. The term RASKE comes from the Finnish words ‘Rakenteisten AsiakirjaStandardien KEhittäminen’ meaning the development of standards for structured documents. The project was commenced in spring 1994 by the Finnish Parliament and a software company in cooperation with researchers at the University of Jyväskylä. The Ministry of Foreign Affairs, Ministry of Finance, Prime Minister’s Office, and a publishing house also participated in the project.
Starting the RASKE project was motivated by document management problems in the Finnish Parliament and government. Teams studying the legislative work carried out in Parliament identified, for example, the following problems concerning document management (Salminen et al., 1997):
1. Incompatibilities of the systems used caused the need for repeated typing of the same piece of text, which in turn was a potential source of inconsistencies in documents.
2. Inconsistencies in document naming and document identifiers caused problems and extra work.
3. There was a lack of information management coordination between the ministries, and between the government and Parliament.
4. In spite of the fact that almost all of the documents were digital, documents were mostly distributed on paper.
5. The retrieval techniques of different systems were heterogeneous.
6. The retrieval techniques of the electronic archiving system and the tracking system of Parliament were not satisfactory.
7. There was uncertainty concerning the future usability of the information in the archived digital documents.
The document analysis in the RASKE project concerned four domains: the enquiry process, national legislative work, Finnish participation in EU legislative work, and the creation of the state budget. During the case analyses, various methods of analysis were tested and developed. Preliminary DTDs were designed for 21 document types including, for example, Government Bill, Government Decision, Government Communication, Private Bill, Special Committee Report, Budget Proposal, and Communication of Parliament.
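To illustrate what a structural rule for a document type expresses, the following minimal Python sketch checks whether an XML document contains the child elements required for its type, in order. It is a simplified stand-in for a DTD and makes several assumptions: the element names (`GovernmentBill`, `Identifier`, `Title`, `Proposal`), the content model, and the sample text are all hypothetical and are not taken from the RASKE DTDs. The standard library's ElementTree parser does not validate against DTDs, so the rule is written here as a plain dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical content models: each document type maps to the child elements
# it must contain, in this order. These names are NOT from the RASKE DTDs.
CONTENT_MODELS = {
    "GovernmentBill": ["Identifier", "Title", "Proposal"],
}


def conforms(xml_text: str) -> bool:
    """Return True if the root element's children match its content model."""
    root = ET.fromstring(xml_text)
    expected = CONTENT_MODELS.get(root.tag)
    if expected is None:
        return False  # unknown document type
    return [child.tag for child in root] == expected


# Hypothetical sample documents, for illustration only.
COMPLETE_BILL = """<GovernmentBill>
  <Identifier>example-1</Identifier>
  <Title>Example title</Title>
  <Proposal>Example proposal text.</Proposal>
</GovernmentBill>"""

BILL_WITHOUT_TITLE = """<GovernmentBill>
  <Identifier>example-2</Identifier>
  <Proposal>Example proposal text.</Proposal>
</GovernmentBill>"""

complete_ok = conforms(COMPLETE_BILL)
missing_title_ok = conforms(BILL_WITHOUT_TITLE)
```

Because the rule is stated against element names rather than against the file format of a particular word processor, the same check could in principle be applied regardless of which system produced the document. A real DTD can express richer content models (optional and repeated elements, attributes) than this sketch.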