Document Processing
Documents serve to archive and communicate information. Document processing is the activity of operating on information captured in some form of persistent medium. Traditionally, that medium is paper, and documents are bundles of paper with information captured in print or in writing.
Document processing may serve to coordinate and conduct business transactions. When a customer submits an order to purchase a certain product, the order becomes a document for processing. The manufacturing company coordinates the activities of acquiring the raw materials, making the product, and finally delivering it to the customer with an invoice to collect payment—all by passing documents from one department to another, from one party to another.
Humans, endowed with the capacity to read, write, and think, are the principal actors in document processing. The invention of the modern digital computer, supported by various key technologies, has revolutionized document processing. Because information can be coded in other media that the computer reads and writes, from punched cards in the early 1960s to magnetic tapes, disks, and optical compact discs (CDs) today, it is not always necessary for documents to be on paper for processing.
Automatic Data Processing
If one can build the decision-making into the logic of a computer program, and have the relevant information in the documents coded in some medium for the computer to read and write, the computer running the program can process the documents automatically. Unless the decisions in processing the documents require the intelligence of a human expert, the computer is much faster and more reliable than a person doing the same work.
The repository for the information is a database. Since the information in the database is readily accessible by the computer, one can generate the paper documents with the desired information any time it is necessary. Automatic data processing and the database technologies for information maintenance and archival have existed since the 1960s. For decisions that require the judgment of a human expert, document processing must bring in the knowledge workers—human users with the expertise in the relevant field of knowledge.
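The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Order record, the REVIEW_THRESHOLD rule, and the three routing labels are hypothetical and do not come from any particular system. Routine orders are decided by the program, and orders above the threshold are set aside for a knowledge worker.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """A purchase order coded in a form the computer can read (hypothetical)."""
    order_id: str
    product: str
    quantity: int
    unit_price: float


# Hypothetical business rule: orders above this total need a human decision.
REVIEW_THRESHOLD = 10_000.00


def process_order(order: Order, stock: dict[str, int]) -> str:
    """Return the routing decision for one order document."""
    total = order.quantity * order.unit_price
    if total > REVIEW_THRESHOLD:
        return "needs_review"  # left to a knowledge worker
    if stock.get(order.product, 0) < order.quantity:
        return "backorder"  # not enough stock on hand
    stock[order.product] -= order.quantity  # reserve the goods
    return "approved"


stock = {"widget": 100}
orders = [
    Order("A-1", "widget", 10, 2.50),
    Order("A-2", "widget", 500, 2.50),
    Order("A-3", "widget", 5000, 2.50),
]
decisions = {order.order_id: process_order(order, stock) for order in orders}
print(decisions)
```

The rules live in one small function, so changing a business rule means changing the program rather than retraining the people who pass the paper along.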
Typographics and Reprographics
The computer is also a versatile tool for the preparation and reproduction of documents. During the early 1980s, as a result of advances in printing technology, text formatting and typesetting tools became available on the computer. People can use these tools to create document content while at the same time specifying the presentation layout, including typesetting details. People can keep all the information in some persistent medium such as a disk file. This is called a source document, since the computer tool can use it as input to generate the printed document as output.
Commonly the source document contains coded information in a mark-up language—tags that specify typesetting and presentation layout information. Mark-up languages may also incorporate the use of images and graphical drawings supported by the printing technologies. Low-cost laser printers became available in the mid-1980s. These tools greatly enhance one's ability to produce documents readily on demand. It is necessary to keep only the source documents in a computer-readable medium.
Interactive Graphics and Multimedia
A document does not need to be printed on paper in order for people to view it. Since the invention of the bit-mapped monitor screen in the 1970s, people have also been able to view a document on the monitor screen. This allows people to interact with the document directly on the screen. The printed document is called a hard copy, and a displayed document on the monitor screen is known as a soft copy. Both copies are generated from the source document.
Using interactive graphics and window interfaces, users can treat the monitor screen as a desktop and retrieve any document for viewing, or interact with one document to bring up another document. Multiple users can easily share documents and view related documents on the computer at the same time. This also means that someone can use the computer to mediate and coordinate the timing and sequencing of people working on documents. A workflow system can implement the business rules of operation to coordinate multiple parties working together in document processing. It is conceivable that an office may have employees working on documents without ever needing to print out the documents on paper. That is the idea of document processing in a paperless office.
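As a rough sketch of how a workflow system might enforce such business rules, the Python below moves one document through a fixed sequence of steps and checks who is allowed to complete each one. The step names, the roles, and the Workflow class are all hypothetical, chosen only for illustration.

```python
# Hypothetical business rule: each step names the role allowed to complete it.
STEPS = [("draft", "author"), ("review", "reviewer"), ("approve", "manager")]


class Workflow:
    """Tracks one document as it moves from party to party."""

    def __init__(self, document_id: str):
        self.document_id = document_id
        self.position = 0
        self.history: list[tuple[str, str]] = []

    @property
    def current_step(self):
        """Name of the pending step, or None once every step is done."""
        if self.position < len(STEPS):
            return STEPS[self.position][0]
        return None

    def complete(self, role: str) -> None:
        """Complete the pending step if the given role is allowed to."""
        if self.position >= len(STEPS):
            raise ValueError("workflow already finished")
        step, required_role = STEPS[self.position]
        if role != required_role:
            raise PermissionError(f"{step!r} must be completed by {required_role!r}")
        self.history.append((step, role))
        self.position += 1


flow = Workflow("order-A-1")
flow.complete("author")
flow.complete("reviewer")
print(flow.current_step)
```

Because the sequence and the permitted roles are data, the same class could coordinate a different process by swapping in a different list of steps.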
Another worthwhile note is the changing concept of a document. The source document kept in a disk file may incorporate document content with graphical drawing, images, and the typesetting and layout information, as well as audio and video scripts. On a computer equipped with the proper hardware, the soft copy of the document can show a video script or play an audio segment. Such a multimedia document is a new concept of the document: It is no longer a physical bundle of papers.
Telecommunications and E-Commerce
Since people can view a document on a monitor screen to work on it, and they can print out the document on paper only when a hard copy is needed, they can easily share documents by sending them across computer networks. Electronic mail (e-mail) is a document sent from one person to another over a network. The Internet was originally proposed in the early 1980s for the purpose of communication between researchers, connecting the computers in research institutions across the nation. But as the Internet has rapidly grown with documents shared by more and more people, the network has become a channel for publishing. The parties involved, however, need to jointly observe certain standards for the communication protocol and the format for source documents.
Servers are the computers that send documents out on request, and browsers are the tools that are used to make the requests and view the documents received. Servers and browsers must observe the same standards for communication protocol and document format. In the 1990s, Hypertext Transfer Protocol (HTTP) was established as the standard for communication and Hypertext Markup Language (HTML) as the standard for source documents. Computers supporting these standards on the Internet formed the World Wide Web.
The Internet continues to grow, virtually covering the whole world today. Document processing on the web can readily involve anybody in the world. Documents can be published and made available for public access from a web server.
The web has become a marketplace for business. E-commerce is a major application of document processing on the World Wide Web. A company may publish a document on a web server to advertise itself and attract customers. Viewers of the document may then interact with it to go to other documents to seek more information. A viewer may also submit an order to make a purchase, sending the order as a document to the company to initiate trading.
Document Structures and Formats
As more and more large, complex documents appear on the Internet, people want to be able to process most of these documents automatically. They want to mark up the structure of document content, so that computer programs can process the content guided by the markup tags. The generation of a soft copy for viewing is simply one of the functions of processing the document.
HTML is a document format designed primarily for viewing using a web browser. Using HTML, people mark up the content of a document with tags for presentation and layout information. A new document format, called Extensible Markup Language (XML), was drafted in November 1996 and has gone through many revisions. XML is a meta-markup language in the sense that it allows one to design the right tags to mark up the content of a document to indicate the structure of its content. Different application domains use different tag vocabularies: molecular biology researchers may use one set of tags, while lawyers use another. The style of presentation can be specified according to content structure, and a computer program will be able to display the document for viewing. XML is now emerging as the standard format for documents on the World Wide Web.
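To make the idea of structural markup concrete, the sketch below parses a small XML order with Python's standard xml.etree.ElementTree module. The tag vocabulary (order, customer, item) and the sample values are invented for this example. Note that the tags say what each piece of content is, not how it should look.

```python
import xml.etree.ElementTree as ET

# A hypothetical order document; the tag names are made up for this example.
SOURCE = """<order id="A-1">
  <customer>Example Customer</customer>
  <item sku="W-100" quantity="3">Widget</item>
  <item sku="G-200" quantity="1">Gadget</item>
</order>"""


def summarize(xml_text: str) -> dict:
    """Extract structured fields from the order, guided by its tags."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": [
            (item.get("sku"), int(item.get("quantity")))
            for item in root.findall("item")
        ],
    }


summary = summarize(SOURCE)
print(summary)
```

A program that displays the order and a program that fills it can both read the same source document, each using the tags it cares about.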
Intelligent Agents
There is now a vast amount of information on the Internet, and the information changes quickly. It can be difficult to find useful information, to track changes, and to monitor certain situations. For example, a user might be interested in collecting information on stock prices and want to pay attention only to those that change quickly or that reach a very high or very low price. Even when the user can gather the information, it is difficult to watch many stocks at the same time.
An interesting active research area today is that of intelligent agents. An intelligent agent is like a software robot. It is an active program that processes information from documents on the web. An agent may actively watch changes in stock prices on behalf of the owner who launched it, or it may determine the right combination of plane tickets and hotel reservations for a travel itinerary specified by its owner. It becomes even more interesting when these intelligent agents interact with one another. An agent may be trying to sell some product while another agent may be looking for the right product to buy. The two agents may make the trade, each serving its particular owner. XML is one of the key technologies that make this possible, because these agents need to process the contents of documents intelligently.
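The stock-watching task can be sketched as a small Python function. This is a deliberately simple stand-in for an agent, not a description of any real one: the thresholds, the ticker symbols, and the prices are all hypothetical, and a real agent would fetch its prices from documents on the web rather than from dictionaries.

```python
def watch(
    previous: dict[str, float],
    current: dict[str, float],
    max_change: float = 0.05,  # hypothetical: flag moves larger than 5 percent
    low: float = 10.0,         # hypothetical "very low" price
    high: float = 500.0,       # hypothetical "very high" price
) -> dict[str, list[str]]:
    """Return only the stocks that deserve the owner's attention, with reasons."""
    alerts: dict[str, list[str]] = {}
    for symbol, price in current.items():
        reasons = []
        old = previous.get(symbol)
        if old and abs(price - old) / old > max_change:
            reasons.append("fast change")
        if price < low:
            reasons.append("very low")
        if price > high:
            reasons.append("very high")
        if reasons:
            alerts[symbol] = reasons
    return alerts


previous = {"AAA": 100.0, "BBB": 20.0, "CCC": 600.0}
current = {"AAA": 101.0, "BBB": 8.0, "CCC": 610.0}
alerts = watch(previous, current)
print(alerts)
```

The owner sees two stocks instead of three; with thousands of stocks, that filtering is what makes the watching feasible.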
With the Internet, the amount of information, and therefore the number of documents that people need to deal with, is much larger than ever before. It is often said that the world is in the Information Age. Document processing will continue to be a major activity of people working with information. The possibilities for harnessing the power of information are endless.
See also Input Devices; Markup Languages.
Peter Y. Wu
Document Processing
A document is any written, printed, or electronically prepared business communication that conveys information. In the information age, documents are essential products that are becoming larger and more complex. Document processing involves the equipment, software, and procedures for creating, formatting, editing, researching, retrieving, storing, and mailing documents.
HISTORY OF DOCUMENT PREPARATION
The advent of a writing system coincided with the transition from a hunter-gatherer society to agrarian encampments, where it became necessary to count one's property (whether parcels of land, animals, or measures of grain) or to transfer that property to another individual or another settlement. Letters were being handwritten as early as 2686 B.C.E. Prior to the inventions of the typewriter and the computer, all documents were handwritten, whether they were letters, bills of lading, property deeds, or reports.
The invention of the typewriter changed the way people communicated, moving them from handwritten documents to typed ones. An early typewriting machine was patented in 1714 by Henry Mill. Christopher Latham Sholes, a Milwaukee inventor, is the person most often associated with the invention of the typewriter in the United States. In 1868 Sholes produced the first practical typewriter to be patented.
At that time, however, correspondence was deeply rooted in etiquette and penmanship. Individuals were of the mindset that letter writing was the most private, complete, and encompassing form of communication between people. Individuals who dared to type letters risked rejection. Typewritten letters were viewed as insulting, implying that the recipient could not read. Even as late as 1922, the etiquette authority Emily Post was still describing letter writing as an art—even as she saw that art shrinking until "the letter threatens to become a telegram, a telephone message, a post-card" (Post).
Nonetheless, sales of the typewriter became lucrative, and with its acceptance, individuals found the process of preparing documents a far simpler one. The typewriter gave operators a faster means of writing than a person could do by hand.
In 1961 IBM introduced the Selectric, an electric typewriter. Instead of the standard movable carriage and individual type strikers, this typewriter had a revolving type ball. The use of the revolving type ball allowed the Selectric to print faster than traditional typewriters. Following on the heels of the Selectric, IBM introduced the Magnetic Tape Selectric Typewriter (MT/ST) in 1964. The MT/ST was one of the earliest attempts to convert the regular Selectric typewriter into a word processor.
TYPES OF DOCUMENT PROCESSING
Different definitions have been ascribed to document processing. Several business education courses with document processing in their titles are described as teaching students how to create a variety of computer-based documents: business, technical, medical, and legal documents; tables, forms, reports, and presentation documents; and documents for electronic publishing.
Computer science and library and information science, by contrast, define document processing quite differently. In these areas, document processing might "explore the issues involved in building natural-language-processing applications that operate on large bodies of real text such as the ones found in the World Wide Web" (Dras and Cassidy, 2005, para. 2). Others relate document processing to electronic publishing and include such topics as typography, computer languages, file formats, specifications for document style and semantics, and electronic document standards.
Document processing has also been described as processing text documents, including methods of indexing for retrieving text based on content. Thus, document processing appears akin to nonverbal language in that it is learned terminology that is not easily or readily defined, and its meaning varies with the culture of the organization or individual.
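One common way to index text for retrieval by content is an inverted index, which maps each word to the documents that contain it. The Python sketch below is a minimal illustration; the sample documents and the function names build_index and search are hypothetical, and the source does not say which indexing method its authors had in mind.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
documents = {
    "memo-1": "Purchase order for office paper",
    "memo-2": "Invoice for the paper order",
    "memo-3": "Meeting notes on typography",
}
index = build_index(documents)
print(sorted(search(index, "paper order")))
```

Building the index takes one pass over the text; after that, a query only touches the entries for its own words instead of rereading every document.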
While an administrative assistant considers document processing as using a computer to keyboard a letter, memo, electronic mail (e-mail), or report, other individuals see document processing as a means of coordinating and conducting business transactions. An order submitted to purchase a certain product, for example, becomes a document for processing.
From the word-processing perspective, in its simplest form the term document processing means the production of paperwork. Originally the term encompassed all business equipment concerned with the handling of text. The term word processor came to represent stand-alone units. In 1981, with the advent of the IBM personal computer (PC), the playing field for word processors changed. Software-based word processors gradually replaced dedicated word processors. In this fashion, the term went from representing hardware to referring to software.
THE FUTURE OF DOCUMENT PROCESSING
In 1980 R. I. Anderson reported that "an even broader concept of word processing is emerging which ties automatic typing equipment into a communications network for input and output" (p. 55). At this time, optical character recognition, output to phototypesetting equipment, output onto microfilm, or output routed to automatic filing systems were separate units that were being tied together into a total information system of which word processing was a part.
Advances in technology have made it easier for individuals to create and manage documents. Tablet PCs, scanners, voice-recognition software, and the Internet are all changing the face of document processing.
Doctors' offices use wireless tablet PCs for inputting patient data during examinations. Prior to this technological development, patient reports would have been dictated by the physician and transcribed by an assistant. The hard-copy form of the patient's report would then have been stored in the patient's file. The use of the tablet PC also eliminates the need for storage space for hard-copy records and makes retrieval of materials faster and simpler. Archiving stored records from a computer is also a simpler process, because older files may be stored on compact discs, jump drives, or external hard drives so that the data is available if needed but is not consuming space on an active hard drive.
Prior to the advent of scanners, documents were stored in file folders, file cabinets, file centers, and departments. Hard copies of documents can now be scanned and stored in an electronic file. This technological advance decreases the space formerly needed for document storage. Also, when a customer or other individual needs a document, a copy can be sent immediately by scanning the requested document and attaching it to an e-mail message.
Voice-recognition software is an important development, particularly for physically challenged individuals. Through the use of a microphone, individuals can dictate letters, memos, e-mail, and reports and have those documents converted to text on the computer screen. The use of voice-recognition software in industry reduces the number of repetitive stress injuries (such as carpal tunnel syndrome) and decreases the amount of time required to input data.
Companies are using e-mail as their official communication channel, thereby eliminating the need for hard copies of interoffice memorandums. In addition to being a faster means of communication, e-mail messages provide a hard-copy record, when needed, by simply printing the message. E-mail messages may also be stored electronically, reducing the required storage space for hard-copy documents.
See also Word Processing; Writing Skills in Business.
Bibliography
Anderson, R. I. (1980). Word processing: The changing office environment (Margaret H. Johnson, Ed.). Reston, VA: National Business Education Association.
Dras, M., and Cassidy, S. (2005). Document processing and the semantic Web. Retrieved October 13, 2005, from Macquarie University, Department of Computing Web site: http://www.comp.mq.edu.au/units/comp348
Ober, Scot, Johnson, Jack E., and Zimmerly, Arlene (2006). Gregg college keyboarding and document processing (10th ed.). New York: McGraw-Hill Irwin.
Post, Emily (1922). Etiquette in society, in business, in politics, and at home. New York: Funk & Wagnalls.
Shelly, Gary B., Cashman, Thomas J., and Vermaat, Misty E. (2003). Discovering computers 2004: A gateway to information. Boston: Course Technology.
Szul, L. F., and Bouder, M. (2003, February). Speech recognition: Its place in business education. Business Education Forum, pp. 54–56.
K. Virginia Hemby-Grubb
document processing
1. The machine processing (reading, sorting, etc.) of documents that are generally readable both by people and machines, e.g. bank checks, vouchers from credit card transactions, and accounts from public utilities. In addition to the printed information for human interpretation, there may be encoding that is machine-readable and may be in an OCR or MICR font.
2. Dedicated document processing services are available where bulk mailing of bank, utility, or other statements is handled by specialists. Data is transmitted to the site where forms are printed as required, placed in envelopes together with advertising or informative material, and posted in bulk.