Organization, Classification, and Retrieval of Component Resources

Chapter 7
Organization, Classification, and Retrieval of Component Resources

7.1 LOCAL COMPONENT RESOURCE WAREHOUSE (LCRW)
7.2 EXISTING METHODS OF COMPONENT CLASSIFICATION AND ORGANIZATION
7.3 DESIGN AND REALIZATION OF LCRW
7.4 COMPONENT ACQUISITION
REFERENCES

7.1 LOCAL COMPONENT RESOURCE WAREHOUSE (LCRW)

The Universal Component Description Language (UCDL) provides a general access method for different types of components, which makes it possible to use components of different types from various manufacturers. However, to implement effective program mining on the Internet, it is necessary to organize, classify, and manage these components and to supply component retrieval, analysis, and selection services for program mining.

The component resources on the Internet may be classified into three major categories.

The first category refers to the components that developers publish freely on their respective websites. Without a unified, regulated, organizing structure, these components are generally bound with function descriptions in natural language for users to download and use.

The second category refers to the components offered by professional component developers. Generally with their own unified specifications and organizing structures, these components are published on the Internet in the form of component warehouses for users to download and use according to their actual needs. IBM Alphaworks and ComponentSource are such professional component warehouses. These component warehouses provide some component description information and classify and organize the component resources according to simple classification rules. For example, ComponentSource classifies and organizes the components mainly according to component technology specification (such as JavaBean, ActiveX), software platform, and component function. The purpose of the classification and organization is to make component management and retrieval more convenient; for example, we can extract the required components according to specific component functions and specific running platforms.

The third category refers to the online components that run in various application servers on the Internet. These components do not support remote download, and they merely provide services through the defined standard interfaces, which are mainly represented with Web Services supported under the frameworks of Microsoft.Net and SUN J2EE.

Obviously, for the three categories of components, it is not feasible to make a direct retrieval or access, transmit the extracted components to someplace for real-time composing and compiling, and then execute them to provide services required by users. There are two main problems to be solved.

The first is the possibility and efficiency of retrieving and accessing the needed components, as the three types of components are not compatible with each other. All three types are objects that program mining would want to retrieve and mine. However, components of the first type are very difficult to find directly, because they do not follow a unified description specification and the websites and links where they are located keep changing. For the second type, different producers follow different specifications, which results in heterogeneity among the components of different producers and in unrelated component description semantics. It is therefore difficult to retrieve, analyze, and compose the components in these warehouses directly. Components of the third type are published mainly according to the Web Services technology specifications, which emphasize the description of the single component itself and hinder the reorganization and reuse of components.

The second problem is the real-time requirement of Active Services. Most services required by a user have a time limit. Retrieving, composing, linking, and executing the three types of components directly on the Internet takes a long time. Besides, the execution of some components is itself subject to real-time constraints. Composing, linking, and executing multiple components over the WAN is time intensive, so it is difficult to satisfy user service demands.

Therefore, a practical program mining method is first to construct a unified Local Component Resource Warehouse (LCRW) according to the UCDL specification, and then to organize and manage the components in the warehouse uniformly according to the specified structure. In normal operation, the system uses an intelligent agent system to keep retrieving and extracting the three types of component resources mentioned above from the Internet. It organizes and stores components of the first and second types in the component warehouse according to the UCDL specification. For online components of the third type, it stores their function descriptions and location index information in the warehouse. Thus, once the Active Service requests program mining, the system can rapidly extract the related components from the LCRW. If there are no such components, the system records the failure and sends a message to the Intelligent Agent System and the Management System of Program Mining, so that related components can be retrieved further on the Internet or the program developer can be asked to develop new components.
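The request path described above can be sketched roughly as follows. This is a minimal illustration only: the `LocalWarehouse` class, its method names, the `WeatherService` entry, and the example URL are all hypothetical and are not part of UCDL or of any LCRW implementation described in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class LocalWarehouse:
    """Toy stand-in for the LCRW: maps a function name to stored entries."""
    entries: dict = field(default_factory=dict)
    failures: list = field(default_factory=list)

    def store(self, function, entry):
        self.entries.setdefault(function, []).append(entry)

    def request(self, function, notify):
        """Return local entries for `function`; on a miss, record the
        failure and pass the request on through `notify`."""
        found = self.entries.get(function, [])
        if not found:
            self.failures.append(function)
            notify(function)
        return found


warehouse = LocalWarehouse()
# First and second category: the component itself is stored locally.
warehouse.store("ftp", {"name": "FTP Protocol", "kind": "stored"})
# Third category: only a description and a location index are stored.
warehouse.store("weather", {"name": "WeatherService", "kind": "online",
                            "location": "http://example.invalid/weather"})

pending = []  # requests handed to the (hypothetical) agent system
hits = warehouse.request("ftp", pending.append)
misses = warehouse.request("spellcheck", pending.append)
```

Here `pending` simply stands in for the message sent to the agent and management systems; a real system would presumably queue the request for Internet retrieval or for a developer.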

The following sections of this chapter will introduce the commonly used classification and organizing methods of component resource warehouse, and then analyze their applications in LCRW and the design and implementation of LCRW.

7.2 EXISTING METHODS OF COMPONENT CLASSIFICATION AND ORGANIZATION

7.2.1 COMMONLY USED METHODS OF COMPONENT CLASSIFICATION

Component classification methods are the foundation of browsing, managing, and retrieving components in the component warehouse system. At present, the component classification methods adopted in the professional component resource warehouses on the Internet include keyword classification, enumeration classification, attribute-value classification, facet classification, and so on. These methods can also be adopted by the LCRW for classifying, organizing, and managing of components.

Keyword classification. Keyword classification is the most commonly used classification method. When using the keyword method, the first thing we need to do is define the keywords of components. By analyzing the functions and behaviors of components, we can extract a group of related keywords and assign them to each component. After that, we store and organize the components according to the keywords and retrieve the components accordingly. The advantage of the keyword classification method is that it is simple and easy to implement. However, there is a bottleneck: how to define and extract the keywords of components. Having defined the keywords, we still need to match the keyword table with related functions so as to quickly extract the components according to the required functions entered by the user.
Enumeration classification. Enumeration classification, or level classification, is a widely used information classification. Under this scheme, the target domain is divided into a number of mutually disjoint subsets that together make up the whole. Applying this division recursively, tier by tier, produces a tree structure. The MFC class library, which is frequently used in Visual C++ programming, is one example. This method can classify components efficiently, and it has been adopted by most of the early component resource warehouses.
The advantage of this method lies in its clear structure, and in the fact that it is easy to understand and use. In the process of component retrieving, users may move along the tree structure with high efficiency. But there are still some shortcomings as well, including the following.
1. Component warehouse structures are difficult to change once they are defined, because the level variety of tree structure must be settled at the very beginning, and new varieties of components are added only to lower tiers. If we expect to merge or decompose previous classifications, we have to change the overall component warehouse structure.
2. Such a classification requires a mass of domain knowledge, which imposes fairly high requirements on component warehouse system builders.
3. Descriptions are often ambiguous, leading to overlapping fuzzy concepts among various subsets and failure to settle some concepts.
Facet classification. The term facet first appeared in the 1950s in the Ranganathan library classification system and Guttman social investigation. But, at that time, the facet concept was very indistinct. There were also many unreasonable factors among facets including overlapped semantics.
The facet classification in component warehouses was put forward by Prieto-Diaz and Freeman in 1987. The main ideas come from library science. In this method, components are accurately classified from the viewpoint of facets that reflect the essence of components. A faceted scheme consists of a group of facets that describe the essential features of components, and each facet classifies the components of a component warehouse from a different viewpoint. Each facet consists of a set of terms, known as its term space. The faceted scheme can reflect component features in an objective and comprehensive way. It is widely applied by various component reuse organizations. The most famous examples include the Reuse Based on Object-Oriented Techniques (REBOOT) facet classification method and the method adopted by the Beijing Beida Jadebird Software Engineering Co. Ltd in China.
In the facet classification strategy, component warehouse managers connect facets with corresponding terms to set up very complex relations between components and faceted schemes. Compared with common level classification strategies, the facet scheme is easier to modify and more flexible because the modification on one facet may not influence other facets. Meanwhile, each facet matches a structured term space, avoiding disorders of common keyword classification strategies and facilitating keyword management.
The facet classification strategy is fairly ideal for the component warehouse system. But the method has defects as well, mainly in the following two aspects:
1. Complexity in the definition of faceted schemes: components have so many attributes that it is very hard to select facets among them. Facet selection must satisfy many harsh conditions. For instance, facets must be the component attributes of most interest to the user, and they must be simple and orthogonal. These conditions impose very high demands on component warehouse managers.
2. Difficulty in the establishment and maintenance of term space: with continuous development of application domains and technologies, it has become very hard to establish widely accepted standards to follow. As a result, it is not easy to set up and maintain the term space.

Attribute-value classification. This method classifies components according to their attributes and the corresponding values. For example, components can be classified according to their functions, application settings, or identifications. An attribute can take any value specified by the managers. Although the method is very similar to facet classification, there are some differences:

Facets are selected strictly and their number is limited, while there is no limitation on attributes.
Facets are orthogonal to each other, and one faceted scheme gives a complete description of a component. The attribute-value method may select attributes arbitrarily for retrieving components.
The term space of a facet is a limited uncertain space, while the range of an attribute's value is an unlimited space.
There are various relations among the terms of a facet, but the relations among attribute values are linear.

Table 7.1 compares the commonly used classification methods in the component resource warehouse system.

Table 7.1 Commonly used methods of component classification.
Method	Description	Advantage	Shortcomings
Keyword	Cataloging according to the keywords of the components	Simple and easy to implement	Great uncertainty in deciding the values of keywords; difficult to unify
Enumeration	List out all attributes of the components	Clear and highly structured classification; easy to understand and use	Over-restricted; hard to establish a suitable enumeration structure
Facet	Describes attributes of the components from certain aspects (facets) with structured terms	Reflects the features of the component itself objectively and comprehensively	Definition of the faceted scheme is complex, and the term space is very hard to build up and maintain
Attribute	The component is described by a group of values consisting of component inner/outside attributes and class attributes	Similar to the facet scheme, but the choice of attributes is more flexible	No limitations on term spaces; more difficult to maintain than facets
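The contrast between facet classification and attribute-value classification can be illustrated with a small sketch. It assumes a toy scheme with two facets whose term spaces are closed sets; the facet names, the `FileCopy` component, and the `vendor` and `colour` attributes are invented for illustration and do not come from any real warehouse.

```python
# Hypothetical faceted scheme: each facet has a closed term space.
FACETS = {
    "type": {"JavaBean", "ActiveX", "EJB"},
    "function": {"Network", "File operation", "User interface"},
}


def classify_by_facets(values):
    """Accept a classification only if it gives one valid term per facet."""
    if set(values) != set(FACETS):
        raise ValueError("a faceted description must cover every facet")
    for facet, term in values.items():
        if term not in FACETS[facet]:
            raise ValueError(f"{term!r} is not in the term space of {facet!r}")
    return dict(values)


def classify_by_attributes(values):
    """Attribute-value classification: any attribute, any value."""
    return dict(values)


def retrieve(catalog, **wanted):
    """Return names whose classification matches every requested pair."""
    return sorted(name for name, cls in catalog.items()
                  if all(cls.get(k) == v for k, v in wanted.items()))


faceted = {
    "FTP Protocol": classify_by_facets({"type": "JavaBean", "function": "Network"}),
    "FileCopy": classify_by_facets({"type": "ActiveX", "function": "File operation"}),
}
free_form = {
    "FTP Protocol": classify_by_attributes({"vendor": "Acme", "colour": "n/a"}),
}
```

The faceted path rejects unknown terms and incomplete descriptions, which is what keeps the term space manageable; the attribute-value path accepts anything, which is more flexible but harder to maintain.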

7.2.2 ORGANIZING STRUCTURE OF COMPONENT RESOURCE WAREHOUSE

The organizing architecture of the component resource warehouse is mainly of two types, tree structure and network structure.

1. Tree Structure.

According to the general definition of tree structure, the nodes of tree structure must satisfy the following two conditions:

There is one and only one specific node, which is called the root node.
Each node except the root node has one and only one parent node. The nodes other than the root can be divided into subsets that do not intersect each other, and each subset itself forms another tree structure.

Thus, the tree structure is suitable for organizing a component warehouse system with a clear tree classification and rigidly grouped, nonintersecting subsets, for example, a component warehouse with enumeration classification.

To retrieve and store components by name, we define the component name and the component description messages as a component description, or component category, which is used to control and manage the component. It includes description information such as the component's name, function, interface, and use cases, and an identifier that marks the location of the component code. Multiple component descriptions or categories of the same type make up a component category file.

Example 7.1 The component description in Microsoft.Net SDK (Software Development Kit)

In Microsoft.Net SDK, the description information of components is as shown in Table 7.2. It describes the components by Component Name (Name field), Function Description (Description field), Interface Function (Methods field), Use Cases (Examples field), and others.

In a component warehouse organized as a tree structure, each leaf node corresponds to a component description and an identifier that indicates the location of the component code. The component category to which each nonleaf node corresponds is called the metacategory. It contains the name and definition of the metacategory and the index information of all component descriptions under it. When retrieving components, the system first finds any of the component descriptions under the metacategory according to this index information, and then finds the related components according to the component location identifier in the component descriptions.

Table 7.2 Component description information (category) in Microsoft.Net SDK.
Field name	Description
Name	Component name
Description	Component function description
Methods	Component's interface function
Examples	Component use cases
Location	Location identifier of component
……	……

Example 7.2 Metacategory in Microsoft.Net SDK (Software Development Kit)

The component warehouse of Microsoft.Net SDK classifies and groups the component resources according to the different solution domains that the components belong to. Each component is inserted into different branches according to the solution domain it belongs to. The description of the SDK component category file, that is, the metacategory, as Table 7.3 shows, consists of the fields CategoryName, CategoryDescription, CategoryID, and so on.

According to the category file and component description, a component warehouse in tree structure can be formed.

Table 7.3 Description of component metacategory in Microsoft.Net SDK.
Field name	Field description
CategoryID	Identifier of the metacategory
CategoryName	Name of the metacategory
CategoryDescription	Description of the metacategory
IndexList	Index information of the component descriptions under metacategory
……	……

Example 7.3 Component warehouse organized in Microsoft.Net SDK

Microsoft.Net SDK organizes the component warehouse in the form of a tree structure according to the Component Description and Metacategory. This is shown in Figure 7.1.

Microsoft.Net SDK first extracts the component name, function, interface, and use cases to form the component descriptions, which are the leaf nodes of the tree structure, such as the “FileClass component description” node and the “FileInfoClass component description” node shown in Figure 7.1. On this basis, Microsoft.Net SDK classifies the components according to the solution domains they belong to, so that a group of component metacategories (i.e., the nonleaf nodes) is formed, such as Metacategory System.Security, Metacategory System.Data, Metacategory System.IO, and so on. These are organized under the root-node metacategory .Net Framework SDK. Among these, the metacategory System.Security is used to organize and index a group of component descriptions related to system security; the metacategory System.Data is used to organize and index a group of component descriptions related to data source operation; and the metacategory System.IO is used to organize and index a group of component descriptions related to system IO operation. For instance, the components FileClass and FileInfoClass belong to System.IO, so their descriptions are organized under the metacategory System.IO as leaf nodes. Hence, a tree structure is formed.
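As a rough sketch of the tree in Example 7.3, the two node kinds can be modeled as small record types. The field names loosely follow Tables 7.2 and 7.3, but the classes, the description strings, and the `loc://` identifiers are placeholders invented here; they are not taken from the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentDescription:
    """Leaf node; a subset of the fields in Table 7.2."""
    name: str
    description: str
    location: str  # identifier of where the component code lives


@dataclass
class Metacategory:
    """Nonleaf node; fields loosely follow Table 7.3."""
    category_name: str
    category_description: str = ""
    index_list: list = field(default_factory=list)  # child nodes

    def add(self, child):
        self.index_list.append(child)
        return child

    def find(self, name):
        """Depth-first search for a component description by name."""
        for child in self.index_list:
            if isinstance(child, ComponentDescription):
                if child.name == name:
                    return child
            else:
                hit = child.find(name)
                if hit is not None:
                    return hit
        return None


root = Metacategory(".Net Framework SDK")
root.add(Metacategory("System.Security"))
root.add(Metacategory("System.Data"))
system_io = root.add(Metacategory("System.IO"))
# Description and location strings below are placeholders.
system_io.add(ComponentDescription("FileClass", "file operations", "loc://FileClass"))
system_io.add(ComponentDescription("FileInfoClass", "file metadata", "loc://FileInfoClass"))
```

Retrieval then follows the two steps in the text: walk the index lists to reach a component description, then use its `location` to fetch the component itself.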

2. Network Structure.

The component classification in a tree structure is definite and unique: each component is strictly required to belong to exactly one class. However, some classification methods classify components from multiple points of view; as a result, one component may in effect lie in the intersection of several classification systems. For example, if we use both component type and component function to classify and organize the components, then the FTP Protocol component mentioned in Section 6.4 of this book falls under JavaBean components according to component type, while it belongs to “Network Application” components according to component function. For such classification methods, it is more convenient to adopt the network structure to organize the component resources.

Network structure is an improved method compared with tree structure. Just as in tree structure, each leaf node corresponds to one component description in network structure. It includes the description of the component's name, function, running platform, and an identifier that indicates the location of the component code. Each nonleaf node corresponds to a metacategory that consists of descriptions of a group of components of the same type, including the description of the metacategory name and definition, and the index information of all components under this metacategory.

In network structure, the difference from tree structure is that the components are classified from different viewpoints, so as to be convenient for the client to search from different viewpoints. Each classification viewpoint will correspond to a tree structure to sort and organize the component resources in component warehouse. This causes the same component to be organized into many metacategories of tree structures at the same time; therefore, a leaf node will belong to many different nonleaf nodes simultaneously, and hence network organization structure is formed.

Here, we cite the ComponentSource as an example to demonstrate the organization method of components in network structure warehouse.

Example 7.4 Component description and organizing in ComponentSource

In ComponentSource, the Component Description that describes the component's basic information is as shown in Table 7.4. It gives an overall description of the type, function, running platform, developer, and location of the component through the fields of ComponentName, ComponentType, ComponentFunction, ComponentPlatform, PublisherInfo, Download, and so on.

The ComponentSource warehouse uses the component description shown in Table 7.4 to describe the stored components, and constructs the network structure and organizes the warehouse according to the three different component viewpoints: component type, component function, and component platform.

Table 7.4 Component description in ComponentSource.
Field name	Field description
ComponentName	Name of the component
ComponentType	Type of the component
ComponentFunction	Function of the component
ComponentPlatform	Running platform of the component
PublisherInfo	The developer of the component
Download	The identity indicating the location of the component
……	……

As shown in Figure 7.2, for each component classification, ComponentSource first uses an organization method similar to the tree structure to organize the component resources. For example, in the classification by function, ComponentSource uses a single-layer method, directly dividing the components' functions into many function classes, such as Network, 3D Model, and so on. Each function class forms a metacategory (for example, the Network metacategory and the 3D Model metacategory), which is used to organize the components with the same function. All metacategories are further organized into a tree structure, so that components are organized and managed according to their function classification.

Since ComponentSource organizes the components simultaneously from the viewpoints of component type, component function, and component platform, one component actually belongs to three corresponding metacategories in three tree structures. Take the FTP Protocol component as an example: from the viewpoint of component type, it belongs to the JavaBean metacategory; from the viewpoint of component function, it belongs to the Network metacategory; and from the viewpoint of platform, it belongs to the VisualAge metacategory.

Thus, the corresponding description of the FTP Protocol component will be linked to the three metacategories JavaBean, Network, and VisualAge simultaneously, and hence network structure is formed.
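The way one component description ends up linked under several metacategories can be sketched with one single-layer index per viewpoint. This is an illustrative model only, not ComponentSource's actual data structure; the `link` and `search` helpers and the second component (`Mesh Viewer` on `Visual Studio`) are invented for the example.

```python
from collections import defaultdict

# viewpoint -> metacategory name -> set of component names
network = defaultdict(lambda: defaultdict(set))


def link(component, **viewpoints):
    """Attach one component description to a metacategory in each viewpoint."""
    for viewpoint, metacategory in viewpoints.items():
        network[viewpoint][metacategory].add(component)


def search(**viewpoints):
    """Components found under every requested metacategory."""
    sets = [network.get(v, {}).get(m, set()) for v, m in viewpoints.items()]
    return sorted(set.intersection(*sets)) if sets else []


link("FTP Protocol", type="JavaBean", function="Network", platform="VisualAge")
link("Mesh Viewer", type="ActiveX", function="3D Model", platform="Visual Studio")
```

A client can enter from any viewpoint, for example `search(function="Network")`, or combine viewpoints, in which case the result is the intersection of the corresponding metacategories.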

7.3 DESIGN AND REALIZATION OF LCRW

Having introduced the frequently used methods in component classification and organization, this part will discuss the methods in component classification and organization used in LCRW, and the computing and operations in LCRW.

7.3.1 COMPONENT CLASSIFICATION METHOD IN LCRW

The LCRW can adopt the classification methods used in professional component warehouses on the Internet that were introduced in Section 7.2, as well as classification methods that integrate them. The following, focusing on facet classification and network structure, introduces the classification method of LCRW.

1. Selection and Definition of Facet.

According to the introduction to faceted schemes in Section 7.2.1, facet classification reflects the sorting of components in the component warehouse. It takes certain attributes of components and classifies the components according to their different values for those attributes, for convenience in organizing and retrieving. Thus, a facet is, in fact, a subset of component attributes, and the terms of a facet are equivalent to attribute values.

The choosing of the facet immediately influences the organizing and retrieving efficiency of the component warehouse. To choose the facet refers to choosing the attributes of the component according to certain selecting rules and methods.

Based on the analysis of the faceted scheme of the large component warehouse system of IBM, Poulin (1993) has concluded that the general principles of facet selecting and defining are as follows:

Reduced. A facet should be the component attribute of most interest to users when they search the component warehouse, as well as the attribute most relevant to component reuse. A component warehouse should not have too many facets; generally, there are no more than seven.
Orthogonal. For any two facets, their term spaces must be orthogonal, so that changes to the terms of one facet do not influence the term spaces of other facets.
Fullness. Each facet must be one of the classifications of the component warehouse, and when a component in the warehouse is classified by a facet, it must be explicitly associated with some term of that facet.
Consistency. The meaning of each facet and term must be consistent with the user's understanding of it. When one concept corresponds to several words, the most exact and professional word is selected as the term and the others are treated as synonyms.
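Two of these principles lend themselves to a mechanical check. The sketch below is one possible reading, not a method from the source: it treats "orthogonal" as "no term appears in more than one facet", which is a simplifying assumption, and the `check_scheme` function and its parameters are hypothetical.

```python
def check_scheme(facets, max_facets=7):
    """Return a list of problems found in a {facet: set_of_terms} scheme."""
    problems = []
    # Reduced: generally no more than seven facets.
    if len(facets) > max_facets:
        problems.append(f"too many facets: {len(facets)} > {max_facets}")
    # Orthogonal (assumed here to mean pairwise disjoint term spaces).
    names = sorted(facets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = facets[a] & facets[b]
            if shared:
                problems.append(f"{a} and {b} share terms: {sorted(shared)}")
    return problems
```

Fullness and consistency depend on how users understand the terms, so they are left to the warehouse manager rather than checked in code.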

How should component attributes be chosen as facets? Following the principles above, we choose attributes that users frequently use in searching and that satisfy those principles. We have selected the following five component attributes as the facets of LCRW.

Facet 1, Component Type (CT). This facet is set according to various existing component formats such as JavaBean, EJB, ActiveX, and so on.
Facet 2, Running Environment (RE). The RE of component refers to the software and hardware platform that must be offered when using the component, including understanding/composing/modifying the component, such as the needed special hardware environment, operating system, database platform, network environment, compiling system, and so on. Any component in the warehouse must depend on certain running environment for reuse, even some universal components with source codes have to depend on specific compiling systems.
Facet 3, Application Domain (AD). The AD of the component is the name of the domain (and subdomain) that the component has been or will be applied to, such as MIS, CAD, and so on. Here, the domain means the system that shares some functionality or the set of application programs. Any component in the component warehouse has a suitable domain, and the suitable domain for the generally universal components is General.

Facet 4, Component Function (CF). The component functionality is the function set that the component can provide for existing or possible software systems, for example, user interface design, network function, file operation, graphic processing, and so on. Any component in the warehouse must provide one or more functions. According to the specific function and its effect in software development, this facet is divided into 11 classes, and each class is subdivided into many subclasses.
Facet 5, Level of Reusing (LR). The Level of Reusing is the application layer of the component in each phase of program mining and software reusing. For example, a component that is itself a complete application is reused simply at the “application” layer, while a component with source code is reused at the “coding” level.

The five attributes above are mutually independent and together reflect the features that relate a component to its reuse. They can also accommodate the later development of the program mining system and LCRW. Therefore, we take the five attributes above as the facets of LCRW, as shown in Table 7.5.

Table 7.5 Facet definition of component classification in LCRW.
Facet name	Facet description
Component type (CT)	Set according to the various existing component formats
Running environment (RE)	Software and hardware platform that must be provided for running
Application domain (AD)	Specific domain to which the component can be practically applied
Component function (CF)	Function set provided by the component in the processes of program mining and software developing
Level of Reusing (LR)	Application level in each phase of program mining and software reusing

2. Term Space of Facet.

Having confirmed the facet definition in LCRW, we still need to establish a corresponding term table for each facet so as to unify the evaluation of the component on each facet. Since most components stored in LCRW are UCDL components, we can retrieve the value of the corresponding attribute of facet in the UCDL descriptor and after abstracting and choosing, the term table of the facet is confirmed. Because we have defined five kinds of attributes as facets, there are five corresponding term tables.

Term table of component type facet. This facet term table is set according to the existing component formats. At present, the values in the table are COM, DCOM, CORBA, EJB, JAVABEAN, and so on. Here, COM, DCOM, and others are terms.
Term table of running environment facet. This term table is specified by the component provider when providing the component, and it mainly refers to the supporting tools and operating system the component needs in order to run. At present, the values in the table are Solaris, Linux with Java Virtual Machine, HPUX, MacOS, Wintel, Windows NT, Win2000, and so on.
Term table of application domain facet. The terms of the application domain are appointed by the administrator of the component warehouse. The LCRW draws on existing application domain classification methods, for instance NTISGOV: NAICS:1997 (North American Industry Classification System) and UNSPSC-ORG: UNSPSC:3-1 (United Nations Standard Products and Services Code), and takes the domain classification terms related to computer applications as the term table of this facet. The users and developers of the component can also modify or update the domain terms according to their demands. At present, the values in the table are e-Government, e-Commerce, e-Learning, office automation, and so on.
Term table of component function facet. The terms of the function facet mainly come from the keywords that describe the component functions. To keep the terms consistent, the function terms should express the component's purpose at as abstract a level as possible. In LCRW, the term space of the component function facet directly adopts the standard of function definition that is defined in the UCDL, the universal component descriptor. More information on the terms is given in Section 6.3.2 of this book.
Term table of level of reusing facet. The terms in the reusing level facet are rather simple; therefore, they are enumerated by the component warehouse administrator. At present, the terms in reusing level are Application-Level, Component-Level, Code-Level, and so on.
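To show how the five term tables could be used to unify the evaluation of a component, here is a minimal sketch. The terms for CT, AD, and LR are taken from the text, RE uses a few of the listed values, and the CF terms are placeholders, since the real ones come from the UCDL function definitions. The `invalid_terms` helper is hypothetical, and allowing only one term per facet is a simplification (a component may provide several functions).

```python
# Facet abbreviations follow Table 7.5.
TERM_TABLES = {
    "CT": {"COM", "DCOM", "CORBA", "EJB", "JAVABEAN"},
    "RE": {"Solaris", "Linux with Java Virtual Machine", "HPUX", "MacOS", "Win2000"},
    "AD": {"e-Government", "e-Commerce", "e-Learning", "office automation"},
    "CF": {"network function", "file operation"},  # placeholder terms
    "LR": {"Application-Level", "Component-Level", "Code-Level"},
}


def invalid_terms(classification):
    """Map each facet to the problem found; an empty dict means valid."""
    problems = {}
    for facet, table in TERM_TABLES.items():
        term = classification.get(facet)
        if term is None:
            problems[facet] = "missing"
        elif term not in table:
            problems[facet] = f"unknown term {term!r}"
    return problems
```

A warehouse could run such a check when a component's facet values are read from its UCDL descriptor, and either reject the entry or propose the unknown value as a new term for the administrator to review.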

Example 7.5 Term Table of Component Type Facet

The term table of component type is set according to the existing component formats, such as JavaBean component, COM component, and so on. For the convenience of searching, we further subdivide them into 24 concrete component types, which cover all existing component formats, as shown in Table 7.6.

All facets of a component resource warehouse, together with the facet names and all term tables related to these facets, constitute the term space of the warehouse. Each facet forms a term subspace. The structure of the term space is the basis of component organizing and searching. We adopt the network structure to organize the facet space.

Table 7.6 Term table of component type (CT) facet.
Term class	Term table
.NET component	.NET WinForm
	.NET WebForm
	.NET Class
	.NET Web Service
ActiveX/COM component	ActiveX OCX
	ActiveX DLL
	ActiveX EXE
	ActiveX Designer
	ActiveX.NET Ready
Java component	JavaBean
	Java Class
	Java Applet
	Java Servlet
	Enterprise JavaBean
	BEA WebLogic Workshop JWS Control
CORBA component	CORBA
Others	C++/MFC
	VBX
	VCL
	CLX
	Visual Basic Class Library
	Windows Static Link Library
	Windows Foundation Class (WFC)
	CA Advantage Gen Component
	Other Component Type

Example 7.6 Term subspace in application domain facet

In LCRW, the application domain facet is subdivided into e-Government, e-Commerce, e-Learning, and so on; in the e-Government domain, there are the sub-domains office automation, e-Tax, and so on. To reduce the search errors caused by a user's misunderstanding of the term space, we define a same class relation among the terms in the application domain facet: the terms OA, Digital Office System, Office Automation System, and the like form the same class relation with the term “Office Automation,” while the terms Remote Education, Digital Education, e-Education, and the like form the same class relation with the term “e-Learning.”

Figure 7.3 shows the term subspace of the application domain facet that is described in Example 7.6. In Figure 7.3, the application domain facet is the root node, and under it the nonleaf nodes e-Government, e-Commerce, e-Learning, and others are linked. They form a tree-like structure that follows the hierarchy among the terms. The same class terms of each term are connected to the corresponding node in the tree by a chain table (a linked list); together these form the term subspace.

We can assign each component a term value in the application domain facet and link the component under the node of that term. The term subspace thereby becomes the basic structure with which LCRW organizes components.
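The term subspace of Example 7.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `TermNode` and the helpers `build_alias_map` and `normalize` are hypothetical names, not part of LCRW, and a plain list stands in for the chain table of same class terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermNode:
    """One term in a facet's term subspace (hypothetical structure)."""
    name: str
    children: List["TermNode"] = field(default_factory=list)
    same_class: List[str] = field(default_factory=list)  # stands in for the chain table

    def add(self, child: "TermNode") -> "TermNode":
        self.children.append(child)
        return child


def build_alias_map(root: TermNode) -> Dict[str, str]:
    """Map every term and same class term (lower-cased) to its canonical term."""
    aliases: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        aliases[node.name.lower()] = node.name
        for alias in node.same_class:
            aliases[alias.lower()] = node.name
        stack.extend(node.children)
    return aliases


def normalize(term: str, aliases: Dict[str, str]) -> Optional[str]:
    """Return the canonical term for a user-supplied term, or None if unknown."""
    return aliases.get(term.strip().lower())


# Terms and same class relations taken from Example 7.6.
root = TermNode("Application Domain")
egov = root.add(TermNode("e-Government"))
root.add(TermNode("e-Commerce"))
root.add(TermNode("e-Learning",
                  same_class=["Remote Education", "Digital Education", "e-Education"]))
egov.add(TermNode("Office Automation",
                  same_class=["OA", "Digital Office System", "Office Automation System"]))
egov.add(TermNode("e-Tax"))

ALIASES = build_alias_map(root)
```

With this map, a search for “OA” or “Digital Office System” resolves to the node “Office Automation” before any component lookup takes place, which is the purpose the same class relation serves in the text.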

7.3.2 COMPONENT ORGANIZATION METHOD IN LCRW

After using the multi-facet method to classify the components, we use the network structure, which has been described in Section 7.2.2, to organize components in the LCRW.

When we organize components according to facet classification, the term forms the basic unit of component organization. In LCRW, we organize those component descriptions together, for which the evaluations of the terms are the same, to form a component category entry. The component category entry and facet term correspond to each other; for example, the term Office Automation corresponds to a component category entry, which includes the index information of all components whose term value is equal to “Office Automation.”

The component category entries are further organized according to the tree structure of the facet term space and form a component classification tree. Combining the component classification trees of the five facets gives a multifacet category tree, as shown in Figure 7.4. The multifacet category tree is divided into five subtrees: Facet.CT, Facet.RP, Facet.AD, Facet.CF, and Facet.LR. Each subtree is in turn organized according to the tree structure of its facet term space.

Unlike in the term space, a node of a facet subtree is no longer the term itself but the component category entry that corresponds to this term. In Figure 7.4, the JavaBean node corresponds to the JavaBean component category entry. All components whose component type is JavaBean will be linked under the JavaBean component category entry.

The multifacet category tree is the basis on which LCRW organizes components according to facet. When we want to add new components into LCRW, we first retrieve the facet terms from the UCDL descriptor. Then, according to the term value of each facet, we link the component under the corresponding component category entry of each facet. In Figure 7.4, we add the two components ATMGUI and DBProcess, which were introduced in Section 4.1.1, into the warehouse: we evaluate the facets of each component and then link the component descriptions of these two components under the corresponding component category entries in the multifacet category tree. For example, in the component type facet, the term value of ATMGUI is JavaBean, so we link the component description of ATMGUI under the JavaBean component category entry of the Facet.CT subtree; in the component function facet, the term value of DBProcess is DB Transaction, so we link the component description of DBProcess under the DB Transaction component category entry of the Facet.CF subtree. In this way, we build a network organization structure between the multifacet category tree and the component descriptions.

The multifacet category tree is also the base of component searching. Either we can browse the components through traversing of the multifacet category tree, or we can directly use “facet → term” as the condition for component searching.
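A minimal sketch of this organization, assuming one flat dictionary of component category entries per facet (the hierarchy inside each facet subtree is left out for brevity). The class `MultiFacetCategoryTree` and its methods are hypothetical names; only the facet abbreviations and the ATMGUI and DBProcess term values come from the text.

```python
from collections import defaultdict

# Facet subtrees named in the text: component type, running platform,
# application domain, component function, reusing level.
FACETS = ("CT", "RP", "AD", "CF", "LR")


class MultiFacetCategoryTree:
    """Hypothetical container: facet -> term -> component category entry."""

    def __init__(self):
        self._entries = {facet: defaultdict(set) for facet in FACETS}

    def add_component(self, name, facet_terms):
        """Link a component under the category entry of each facet term it has."""
        for facet, term in facet_terms.items():
            if facet not in self._entries:
                raise KeyError(f"unknown facet: {facet}")
            self._entries[facet][term].add(name)

    def search(self, facet, term):
        """'facet -> term' search: all components linked under that entry."""
        return sorted(self._entries[facet].get(term, ()))

    def browse(self, facet):
        """Traverse one facet subtree, yielding (term, components) pairs."""
        for term in sorted(self._entries[facet]):
            yield term, sorted(self._entries[facet][term])


tree = MultiFacetCategoryTree()
# Only the facet values stated in the text are filled in here.
tree.add_component("ATMGUI", {"CT": "JavaBean"})
tree.add_component("DBProcess", {"CF": "DB Transaction"})
```

Because each component is linked once per facet, the same description is reachable from several subtrees, which is what gives the structure its network character.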

7.3.3 COMPUTING AND OPERATIONS IN LCRW

The purpose of establishing the LCRW is to retrieve the reusable components that satisfy the user requirements more quickly and more exactly. To achieve this purpose, LCRW offers the following operations:

Component warehousing
Component searching
Component combination and coordination

1. Component Warehousing.

Component warehousing is processed in the following steps, as shown in Figure 7.5.

Filling the registration form of component warehousing. When a component is warehoused into LCRW, we first extract from its description file all the descriptive information on its basic attributes that helps us understand the component, such as name, author, brief introduction, interface, and so on. This information is then entered into a registration form of component warehousing so that the system can conveniently run full tests on the component information. The Registration Form of Component Warehousing in LCRW has the following attributes.
1. Name. Each component must have a certain name, and this name must perfectly identify the nature of the component, such as stack, resource manager and the like.
2. Author. This is the name of the maker or unit that made or provided the component, with contact address and related information.
3. Manufacture date. It is the date when the component was completed.
4. Warehousing date. It is the date when the component was warehoused.
5. Version. This is the corresponding version number of the component in a series of component evolutions.
6. Settings. This refers to the necessary software or hardware platform when using (including understanding, composing, and modifying) the component, for example, the specific hardware environment, operating system, database platform, and network environments.
7. Application domain. This is the name of the application domain (and subdomain) that the component has been or will be applied to, for example, MIS, CAI and the like.
8. Purpose. This refers to the roles that the component plays or will play in the domain where it is used.
9. Function. It is the software function set that the component can provide for the existing or the possible software system.
10. Representation manner. It refers to language form or media that is used to describe the content of the component, such as the programming language used in the source code component, and so on.
11. Modality. This means the compositions of the component and the interrelation between each other, for example, the tree, tree-like, frame, and module.
12. Level. This means the abstract level of the component relative to the phases of software development, such as analyzing, designing, coding, and so on.
13. Context. It is the context on program level that the system must provide when the component is composed.
14. Size. It is the size of the component.
15. Developing tool. This refers to the software tool that the author used to produce this component.
Testing the component. At present, the tests that the LCRW performs on a component mainly include Installation Test, Uninstall Test, Anti Virus Check, Evaluation Test, Documentation Review, and so on.
Classifying the facets of the component. The course of classifying the facets of component, in fact, is to evaluate it according to the facet of the component and organize it into the corresponding node in the multifacet category tree. First, we retrieve the facet value of the component according to the UCDL description file of the component and standardize it into terms. Then, we organize and link the component, according to the facet value, under the component category entries that the facets correspond to. Hence, the organization of the component in multifacet category tree is completed.
Setting up the keyword index of the component. In the course of component warehousing into LCRW, by aiming at the UCDL description file of each component, we retrieve the keyword table from the component's function description field. The keyword table is added into the LCRW as an attribute field so as to be helpful for the program mining system to perform keyword-based component searching in analyzing the user requirements.
Submitting entity file of the component for warehousing. The component entity is stored in the given directory in the server file system. The location information written in the file is stored into the field “address” of the component form.

Example 7.7 Procedure of component warehousing

The following takes the FTP Protocol mentioned in Chapter 6 as an example, showing us the procedure and steps of component warehousing (Figure 7.6).

Filling the basic attributes (additional information) of the component and keywords. To acquire the component, the transforming tool of UCDL is used to generate the UCDL descriptor of the component.
The UCDL description of FTP Protocol is as follows:
According to the UCDL description, the Registration Form of Component Warehousing to be filled in is shown in Table 7.7.
Having received the Registration Form of Component Warehousing of the component FTP Protocol, according to the content of the form, the LCRW generates the component description of FTP Protocol.

Classifying the facets of the component. According to the UCDL description file, each facet value of the component FTP Protocol is retrieved, as Table 7.8 shows.
According to the value of each facet of the component FTP Protocol, we add the FTP Protocol into the component warehouse: the LCRW organizes and links the component description of the FTP Protocol under the component category entry that each facet value corresponds to. Thus, the organizing of the component in the multifacet category tree is completed, forming the organization architecture shown in Figure 7.7. In the LCRW, the components FTP Client and FTPUI already exist, and they belong to the same class as the component FTP Protocol. For the sake of comparison, the architectures of FTPUI and FTP Client are shown together in Figure 7.7.

Table 7.7 Registration form of component warehousing of component FTP Protocol.
Attribute	Value
Name	FTP Protocol
Author	IBM Alphaworks
Manufacture date	1999.11
Warehousing date	2004.6
Version	1.0
Component type	JavaBean
Size	31K
Location	file:\\localhost\resource\ftpprotocol.jar
Settings	OS with JDK1.3 support
Application domain	Software Development
Component function	Internet/Intranet → FTP
Reusing level	Component-Level
…	…

Table 7.8 Facet value of component FTP Protocol.
Facet	Value
Component type	JavaBean
Running environment	OS with JDK1.3 support
Application domain	Network
Component function	Internet/Intranet → FTP
Reusing level	Component-Level

Setting up the keyword index of the component. Based on the UCDL description files of the component FTP Protocol, from its function description field, we can retrieve the keyword table “FTP Client services/FTP Protocol/File Transfer.” The table can be added into LCRW as “Keyword” attribute field of the component, so as to be convenient for the program mining system to make a keyword-based search of components when analyzing the user's requirements.
Submitting the entity file of the component. The component entity is stored in the given directory in the server file system. At the same time, the field “address” in the component description table is set to the corresponding location information. This ends the whole process of component warehousing.
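The warehousing steps of Example 7.7 can be sketched as follows. This is an illustration under stated assumptions, not the LCRW implementation: `ComponentDescription`, `Warehouse`, and `store` are hypothetical names, the testing step is omitted, and the entity path is a placeholder. The attribute, facet, and keyword values are those of Tables 7.7 and 7.8 and of the keyword table quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentDescription:
    """Hypothetical record that LCRW keeps for a warehoused component."""
    name: str
    attributes: dict                      # fields of the registration form
    facets: dict                          # facet name -> term value
    keywords: list = field(default_factory=list)
    address: str = ""


class Warehouse:
    def __init__(self):
        self.descriptions = {}      # component name -> ComponentDescription
        self.category_entries = {}  # (facet, term) -> list of component names
        self.keyword_index = {}     # lower-cased keyword -> list of component names

    def store(self, form, facets, keyword_table, entity_path):
        # Step 1: turn the registration form into a component description.
        desc = ComponentDescription(form["Name"], dict(form), dict(facets))
        # Step 2 (installation test, virus check, ...) is omitted in this sketch.
        # Step 3: link the component under the category entry of each facet value.
        for facet, term in facets.items():
            self.category_entries.setdefault((facet, term), []).append(desc.name)
        # Step 4: split the keyword table and index the component by keyword.
        desc.keywords = [k.strip() for k in keyword_table.split("/") if k.strip()]
        for keyword in desc.keywords:
            self.keyword_index.setdefault(keyword.lower(), []).append(desc.name)
        # Step 5: record where the component entity was stored.
        desc.address = entity_path
        self.descriptions[desc.name] = desc
        return desc


lcrw = Warehouse()
ftp = lcrw.store(
    form={"Name": "FTP Protocol", "Author": "IBM Alphaworks",
          "Version": "1.0", "Size": "31K"},
    facets={"Component type": "JavaBean",
            "Running environment": "OS with JDK1.3 support",
            "Application domain": "Network",
            "Component function": "Internet/Intranet → FTP",
            "Reusing level": "Component-Level"},
    keyword_table="FTP Client services/FTP Protocol/File Transfer",
    entity_path="resource/ftpprotocol.jar",  # placeholder path, assumed for the sketch
)
```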

2. Component Searching.

According to the category and organization in the LCRW, the LCRW system provides the following search methods:

Attribute-value based search
Keyword search
Faceted search, etc.

Based on the above component search methods, LCRW provides two ways of searching—interactive and automatic.

Interactive search. It directly locates the component corresponding to the search condition input by the user. If it fails, LCRW will further clarify the keywords for search by interacting with the user, and then complete the search.
Automatic search. To support the direct access to the component resource in the LCRW during program mining, LCRW provides the search tools that can be called by the program mining system. After the user requirements are analyzed by the program mining system, these tools will automatically search the candidate components according to the keyword table extracted.

From Chapter 5, we know that the user can input his/her requirements in different ways. When the input is natural language (e.g. English), the user interface of the program mining system will analyze and extract the keyword table including the application domain and function description from the user input. LCRW will use this keyword table to find the components needed by interactive search or automatic search.

The control process of the LCRW component search is illustrated in Figure 7.8.

Constructing the keyword expressions for the search from the result of the user input analysis. From Chapter 5 we know that, after receiving the user input, the system analyzes it and generates the keyword table and the subfunction division order table. The keyword table is used for searching for components, and the subfunction division order table is used for composing the components that were found.
In the keyword table, both a single keyword and multiple keywords may correspond to one component. So we need to form the keyword expressions before searching according to the relations among the keywords in the table. Each keyword expression, composed of one or more keywords, corresponds to a component search condition. When the keyword expression is composed of multiple keywords, the search relation among them needs to be defined, such as “and,” “or,” etc.
Searching for the components by the keyword expressions. The commonly used keyword search algorithms are as follows:
- Sequential search. Sequential search reads the components one by one according to their access addresses in the component warehouse and compares each of them with the keywords in the keyword expressions until the required component is found.
  Sequential search is inefficient and does not suit a warehouse with a large number of components.
- Dichotomy search. To improve search efficiency, one may sort the components in a certain order in advance. On such an ordered collection, dichotomy (binary) search can find components more efficiently.
  Dichotomy search demands that the components be presorted. But a warehouse often holds a huge number of components that change constantly, and any change alters the sorted structure of the components; in this sense dichotomy search costs too much.
- Hash search. Hash search defines a hash function or a hash table. In searching for the keywords, the hash function or hash table transforms a given keyword into the address of its corresponding component in the component warehouse. The target component is thus found.
  Hash search is highly efficient, but it requires the components in the warehouse to be indexed by keyword in advance in order to build the hash table.
  For the component search in LCRW, we adopt hash search for keyword search. When each component is added into the warehouse, we extract its domain keywords and index the components in LCRW by these keywords. The hash table generated in this way establishes the hash relation between each keyword and the corresponding component addresses in the LCRW. Consequently, keyword matching against the hash table quickly locates the component.
Returning the search result. Finally, the LCRW returns the search result. If the component required by the user is not found, the program mining system or the user needs to modify the keywords and search again. Besides confirming the search result, the program mining system or the user may also input other search conditions (e.g., component format, running platform, etc.) to verify further whether the component is the one needed.
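The difference between sequential search and hash search can be illustrated with a short sketch. It assumes that a Python `dict` plays the role of the hash table and that component names stand in for component addresses; the function names are hypothetical. The keywords of FTP Protocol are the ones given in Example 7.7, while those assigned to FTP Client and FTPUI are invented for the illustration.

```python
def sequential_search(components, keyword):
    """Compare every component with the keyword; cost grows with warehouse size."""
    wanted = keyword.lower()
    return [name for name, keywords in components
            if wanted in (k.lower() for k in keywords)]


def build_hash_index(components):
    """Index built once, at warehousing time: keyword -> component names."""
    index = {}
    for name, keywords in components:
        for k in keywords:
            index.setdefault(k.lower(), []).append(name)
    return index


def hash_search(index, keyword):
    """A single hash lookup replaces the scan over all components."""
    return index.get(keyword.lower(), [])


COMPONENTS = [
    ("FTP Protocol", ["FTP Client services", "FTP Protocol", "File Transfer"]),
    ("FTP Client", ["File Transfer"]),   # assumed keywords
    ("FTPUI", ["User Interface"]),       # assumed keywords
]
INDEX = build_hash_index(COMPONENTS)
```

Both functions return the same components for a given keyword; the hash index simply moves the work from every search to the moment a component is warehoused, which matches the trade-off described above.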

Example 7.8 A component search in program mining

Given: A user inputs a service demand in natural language “I need a media player playing MP3 music file, avi and mpeg movie file.”

According to the requirement analysis mentioned in Chapter 5, the keywords are extracted as follows:

Keywords list: {MP3, avi, mpeg, decoder, player}

Step 1: forming the keyword expressions. The first step of component search is to form the keyword expressions for the search according to the keyword correspondence in the keyword list. Here, following the requirement analysis and function division described in Chapter 5, the service request is divided into four atomic functions. Accordingly, we construct the following keyword expressions:
P1 = “decoder” + “MP3”
P2 = “decoder” + “avi”
P3 = “decoder” + “mpeg”
P4 = “player”

Each keyword expression corresponds to a component search condition. The “+” in the expressions indicates that the relation between the keywords is “and.”

Step 2: searching for the components by the keyword expressions. Each keyword in a component search expression is looked up in the hash table, which yields a group of candidate components. The intersection of the groups of candidate components returned for the individual keywords is then the final result of the component search for that expression.

Hash search of the components is shown in Figure 7.9. In this figure, for the first keyword expression P1, the keyword “decoder” is first searched in a hash table and a group of corresponding candidate components is returned. Likewise, the second keyword “MP3” is also searched in the hash table and a group of candidate components corresponding to it is returned. Then, according to the “and” relation between the keywords defined in the keyword expression, the intersection of the two groups of candidate components is the search result of the keyword expression P1, i.e., the component MP3Decoder as in Figure 7.9.
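The evaluation of the keyword expressions P1 to P4 can be sketched as a set intersection over a hash index. The component names are those of this example; the contents of the index `INDEX` are assumed for illustration, since the actual candidate groups appear only in Figure 7.9, and `evaluate` is a hypothetical helper.

```python
# Assumed keyword index: keyword -> group of candidate components.
INDEX = {
    "decoder": {"MP3Decoder", "AviDecoder", "MpgDecoder"},
    "mp3": {"MP3Decoder"},
    "avi": {"AviDecoder"},
    "mpeg": {"MpgDecoder"},
    "player": {"MultiMediaViewer"},
}

# Keyword expressions from Step 1; the keywords of one expression are "and"-ed.
EXPRESSIONS = {
    "P1": ("decoder", "MP3"),
    "P2": ("decoder", "avi"),
    "P3": ("decoder", "mpeg"),
    "P4": ("player",),
}


def evaluate(expression, index):
    """Intersect the candidate groups returned for each keyword."""
    groups = [index.get(keyword.lower(), set()) for keyword in expression]
    return set.intersection(*groups) if groups else set()


RESULTS = {name: sorted(evaluate(expr, INDEX)) for name, expr in EXPRESSIONS.items()}
```

An “or” relation between keywords would use `set.union` in place of `set.intersection`; only the “and” case is needed for this example.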

Step 3: returning the search result. Similarly, we use the hash table and search all the components corresponding to each keyword expression. The final component search result is as follows:

Component MultiMediaViewer
Component Name: MultiMediaViewer
Function: multimedia player component, capable of automatic selection of the decoder components for the files to be played
Component MP3Decoder
Component Name: MP3Decoder
Function: accept MP3 file passed by the component MultiMediaViewer, and accomplish the corresponding decoding task
Component AviDecoder
Component Name: AviDecoder
Function: accept avi file passed by the component MultiMediaViewer, and accomplish the corresponding decoding task
Component MpgDecoder
Component Name: MpgDecoder
Function: accept Mpeg file passed by the component MultiMediaViewer, and accomplish the corresponding decoding task

Once the corresponding components have been found in the LCRW by the keyword table, the result is returned to the user or the program mining system, which will assemble the components found according to the subfunction division order table.

3. Component Combination and Coordination.

The components under the same category of one domain in LCRW have interoperations, such as combination and coordination. The components with interoperations can be combined into an application system. Thus, recording the interoperations among the components, such as component combination, component coordination, etc., in the LCRW will better support the component search and detection in program mining, and users and the program mining system will know the functions provided by the components better and can, therefore, apply them flexibly.

When expecting a component to replace another, we can use the component combination/coordination to find other components with the same function.
When detecting a required component by specific component search techniques, we can get related components and the application methods by acquiring the combination/coordination relation of the detected component. It provides the basis for the analysis of the user requirements, function decomposition, component assembly, and so on.
Each successful component composing will generate a corresponding operation of component combination/coordination, which is also stored in the LCRW. With the enlargement of the component warehouse, such records can serve as knowledge of task division for the analysis of the user requirements in program mining, or as examples of task division.
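One way to picture such records is a small store that remembers which components were composed together and which function each component provides. This is a sketch under assumptions: the class `RelationStore`, its methods, and the function labels are hypothetical, as is the component `AltMP3Decoder`; the other component names come from Example 7.8.

```python
from collections import defaultdict


class RelationStore:
    """Hypothetical record of combination/coordination relations."""

    def __init__(self):
        self._function_of = {}             # component -> function label
        self._partners = defaultdict(set)  # component -> components composed with it

    def register(self, component, function):
        self._function_of[component] = function

    def record_combination(self, components):
        """Store one successful composition of several components."""
        members = set(components)
        for component in members:
            self._partners[component] |= members - {component}

    def related(self, component):
        """Components that have been composed with the given one."""
        return sorted(self._partners.get(component, ()))

    def substitutes(self, component):
        """Other components registered with the same function."""
        function = self._function_of.get(component)
        if function is None:
            return []
        return sorted(c for c, f in self._function_of.items()
                      if f == function and c != component)


store = RelationStore()
store.register("MultiMediaViewer", "media player")
store.register("MP3Decoder", "MP3 decoding")
store.register("AltMP3Decoder", "MP3 decoding")  # hypothetical alternative component
store.record_combination(["MultiMediaViewer", "MP3Decoder", "AviDecoder", "MpgDecoder"])
```

Here `related` supports the second use described above (finding related components of a detected one), and `substitutes` supports the first (finding a replacement with the same function).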

7.4 COMPONENT ACQUISITION

Component acquisition refers to searching for and finding component resources on the Internet. For each kind of component found, the corresponding UCDL transformation interface is invoked to transform it into a UCDL component, for unified organization and management by the LCRW.

According to the ways components exist on the Internet, we can divide component acquisition into three different methods, i.e., from the professional component warehouse, from the Web for the components published free by the developers, and from online components.

7.4.1 ACQUISITION FROM THE PROFESSIONAL COMPONENT WAREHOUSE

The component acquisition from the professional component warehouse can be divided into the following steps.

First, the user gets the URL of the Web site of the professional component warehouse through a search engine. The acquisition of this URL can only be completed through interaction between the user and the system.

Then, the system searches the professional component warehouse for the required components according to its organization structure and category.

When such components are found, the program mining system needs to transform them into UCDL components and extract the domain keywords for the application.

When the component transformation and the keyword extraction are completed, the operation of component warehousing in Section 7.3 can be applied to add the component into the LCRW.

7.4.2 ACQUISITION FROM AN ONLINE PLATFORM

Unlike a component in a professional component warehouse, an online component lets people use its functions but does not allow the component entity to be downloaded. To employ it is to execute it remotely and obtain its service through the network.

For the online component, the program mining system cannot obtain the program entity and therefore risks failing to meet the user's time requirement or to keep the component consistent with other components. To solve the problem, we need to know information such as the function description, location index, and other parameters of the component in advance, transform this information into the unified UCDL description, and add it into the LCRW after the domain keywords are extracted. When the component resides on an online platform without download permission, the field “location” in the UCDL description denotes the URL of the component entity.

7.4.3 ACQUISITION FROM THE WEB FOR COMPONENTS PUBLISHED FREE

Besides the components in the professional component warehouse and on the online platform, there are numerous components on the Internet published free by the developers, such as JavaBean, EJB, ActiveX/COM, etc. These components each have their own publishing features. According to such features, we can use an intelligent agent system to search for and find the components automatically or interactively on the Internet. The components found are then transformed into unified components described by UCDL and added into the LCRW.

For example, to get a JavaBean component from the Web, one needs to know in advance how JavaBean components are published on the Web. There are two ways of publishing a JavaBean. One is for the developer to publish the JavaBean component directly on the Web, which is realized by inserting the JavaBean component into a Java Applet embedded in the Web page.

The publishing of the Applet in the Web page is shown in the following:

In the above example of Applet publishing, there are three key parameters, i.e., codebase, code, and archive. The “codebase” parameter, together with the URL of the page in which the Applet is embedded, enables the user of the component to get the location of the target or source program of the Applet. The parameter “code” gives the class name of the main program of the Applet. The parameter “archive” provides the names of the archived files that hold all other classes used by the Applet, the JavaBean component, and the resources. Because the archived file of the Applet is also published on the Web, we can find the JavaBean component in the archived file through the archive information provided by the Applet. Then, we can further transform it into the UCDL format, extract the domain keywords, and add the component into the LCRW by the operation of component warehousing described in Section 7.3.
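Locating the archived files from these three parameters can be sketched with the Python standard library. The class `AppletFinder`, the page markup in `PAGE`, and the URL `http://example.org/...` are all hypothetical and serve only to illustrate the idea; how an actual agent in the program mining system does this is not specified here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AppletFinder(HTMLParser):
    """Collect code, codebase, and archive URLs of every <applet> tag in a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.applets = []

    def handle_starttag(self, tag, attrs):
        if tag != "applet":  # HTMLParser reports tag names in lower case
            return
        params = dict(attrs)
        # codebase is resolved against the page URL; without it, the page's
        # own directory is used.
        codebase = params.get("codebase") or "."
        if not codebase.endswith("/"):
            codebase += "/"
        base = urljoin(self.page_url, codebase)
        # archive may name several comma-separated files.
        archives = [urljoin(base, item.strip())
                    for item in (params.get("archive") or "").split(",")
                    if item.strip()]
        self.applets.append({"code": params.get("code"),
                             "codebase": base,
                             "archives": archives})


# Hypothetical page embedding an applet.
PAGE = ('<html><body><applet codebase="applets/" code="DemoApplet.class" '
        'archive="demo.jar, beans.jar" width="300" height="200"></applet>'
        '</body></html>')

finder = AppletFinder("http://example.org/components/index.html")
finder.feed(PAGE)
```

Each URL in `archives` is a candidate file to download and inspect for JavaBean components before the UCDL transformation and warehousing steps.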

The other way of publishing a JavaBean is to provide a link to it in the Web page. In this case, the component requestor has to search for, acquire, and download the component interactively. The downloaded component is then transformed into the UCDL format and added into the LCRW after the domain keywords are extracted.

Active Services: Concepts, Architecture and Implementation