Syllabuses Crawling and Knowledge Extraction of Courses about Global Standardization Education

Received December 2014; Accepted March 2015

Hiroshi Nakanishi¹, Tetsuo Oka² and Yoshiaki Kanaya³



Abstract

With the progress of globalization, global standards are becoming more important, which means that more people engaged in standardization need to be developed. To make global standardization education in universities more active, it is very important to know which programs and courses universities offer for education about global standardization and what those courses contain. In this paper, firstly, the current situation of education about global standardization is studied. As a result, 3 education programs consisting of multiple courses and 45 courses were found. Secondly, a new crawling technology to collect the syllabuses of courses published on university websites is proposed, together with a new method of extracting knowledge from the crawled syllabuses. Using the proposed technologies, syllabuses of 132 Japanese universities, including all 88 national universities, were crawled successfully. Knowledge extraction from the crawled syllabuses shows that 45 courses about global standardization education are offered by 24 Japanese universities. This paper also shows the result of classifying the knowledge in these 45 syllabuses.



Keywords


1 Introduction

Ecosystem changes caused by global warming, environmental contamination from industrialization, growing ecological footprints driven by population increase, and declining biocapacity are making conditions on the earth increasingly harsh.

The world population of 3 billion in 1960 increased about 2.3-fold to 7 billion in 2011. Human activities that pursue ever more convenience and comfort have led to a decrease in natural resources and biological products, which has become a serious threat to global sustainability.

To make global society sustainable, it is important for developed and developing countries to tackle these problems in cooperation.

Global standardization is one of the important mechanisms for encouraging such cooperation among countries.

ICT makes it possible to overcome distance and time between people worldwide, so that people can connect with the world from the comfort of their own homes. ICT also contributes significantly to a sustainable global society by reducing the energy used for travel by car and train.

Products and services that conform to global standards become available to users everywhere in the world. Global standards also enable mass production at stable, low prices for world markets and promote the efficient use of resources and energy in manufacturing.

Additionally, the formulation of global standards for environmental protection and the control of toxic substances will contribute greatly to a sustainable global society.

As described above, global standards play an extremely important role in achieving a sustainable global society. To continue formulating global standards, it is necessary to cultivate ‘human resources for global standardization’, that is, people who will take part in formulating global standards.

Strengthening and spreading global standardization education is therefore highly important.

In this paper, first, the current situation of global standardization education in universities is surveyed.

Second, new technologies for crawling syllabuses and extracting knowledge from the crawled syllabuses are proposed. A crawler collects syllabuses from the two different types of university syllabus web pages, and the lump of knowledge contained in each syllabus is extracted by morphological analysis followed by filtering out unrelated words.

Third, using a system that incorporates the proposed technologies, the syllabuses of 132 Japanese universities, including all 88 national universities, were successfully crawled. As a result, it was found that 45 courses about global standardization are offered by 24 Japanese universities.

This paper is based on the paper presented at the ITU Kaleidoscope Academic Conference 2014 [1].

2 Survey about Standardization Education Situation

2.1 Global Standardization Education in Universities

By checking the websites of 132 Japanese universities, including all 88 national universities, it was found that 45 courses concerning global standardization are offered. It was also found that Kanazawa Institute of Technology and Osaka University offer global standardization education programs for graduate students, each consisting of multiple courses concerning global standardization [2–6]. These programs are summarized as follows.

  1. Graduate major program and certificate program for the development of global standardization strategy professionals at Kanazawa Institute of Technology [4].

The number of credits required to complete the graduate major program is 36: 10 credits for the 7 specified courses about global standardization, 18 credits for the 9 fundamental courses, including intellectual properties, and 8 credits for the seminar. Graduate students who complete these 36 credits receive a master’s degree and the certificate of the global standardization strategy professional course. In addition, a certificate program named “International Standardization Strategic professional program” is offered to credited auditors; it is completed by earning 10 credits for the same 7 specified courses about global standardization.

  2. Graduate minor program and certificate program about global standardization at Osaka University [5].

The number of credits required to complete each of the programs is 8, which must be obtained by taking 4 or more of the 14 courses specified by the program. Graduate students who have completed the 8 credits receive the certificate of the graduate minor program on global standardization upon completing their master’s degree program.

Credited auditors in the certificate program receive the certificate when they have completed 8 credits by taking 4 or more of the 14 specified courses.

The purpose of the above global standardization education programs is to enable graduate students to acquire the knowledge and abilities necessary for global standardization activities [6, 7].

Among foreign universities, the following master’s degree program was found.

  1. Master’s degree program “Master in Standardization, Social Regulation and Sustainable Development” at the University of Geneva [8].

The number of credits required to complete the program is 90 ECTS credits: 15 credits for 5 courses about global standardization, 42 credits for 14 courses on environmental policy, global health, governance, public policy and the international political economy of standards, 15 credits for courses chosen by the student, and 18 credits for the internship and the thesis.

The survey results described above are summarized in Figure 1.


Figure 1 Education programs and courses about standardization in universities.

2.2 Situation of Education in Industries, Government and Academic Societies in Japan

In addition to universities, industry, government and academic societies in Japan also offer education about global standardization.

In industry, companies engaged in global standardization provide training and give commendations for standardization activities.

In government and foundations, the Ministry of Economy, Trade and Industry and the Ministry of Internal Affairs and Communications offer visiting lectures given by their staff. The Japanese Standards Association and the Association of Radio Industries and Businesses develop educational materials and offer visiting lectures about global standardization.

As an investigative body on standardization in which academic societies, universities and government collaborate, the “Standardization Education Research Committees” were formed under the Institute of Electronics, Information and Communication Engineers and have been active since 2012. Since March 2013, the ‘network of universities for standardization education’ has been studying education on global standardization.

3 Syllabuses Crawling and Knowledge Analysis of Courses about Global Standardization in Japanese Universities

Almost all Japanese universities publish the syllabuses of their courses on their websites. Although syllabus processing systems have been researched and developed [9–12], syllabus crawling and knowledge extraction have not yet been reported.

To automatically collect syllabuses published on university websites and analyze their contents, using commercially available crawler software is efficient. A crawler has the following functions.

  1. To collect information on websites according to the direction given by a user.
  2. To extract index words contained in the crawled information by morphological analysis.

Combining these functions with new technologies makes it possible to collect syllabuses and extract knowledge from the crawled syllabuses.

3.1 Survey on How to Access the Syllabus Pages

To crawl syllabuses published on university websites, the crawler must know how to access each syllabus website. The structure of syllabus websites was therefore surveyed by manually accessing the syllabuses of many universities. The survey showed that there are two types of syllabus web pages, as shown in Figure 2. The following points should be noted from Figure 2.


Figure 2 Two types of syllabus webpage structure.

  1. Syllabus websites consist of syllabuses-table pages and syllabus pages.
  2. The structure of syllabus websites differs among universities and falls into two types.

Type 1: Syllabuses are stored in a database and each of them is retrieved by searching the database one by one.

Type 2: Syllabuses are described on the syllabus web-pages and each of them is retrieved by copying the syllabus web-page one by one.

Crawling therefore requires a procedure guide table (PGT) that gives the crawler the data needed to access each university’s syllabus website.

To crawl a Type 1 syllabus website, appropriate data must be entered into the database twice: once to obtain the syllabuses-table and once to retrieve the syllabuses. To crawl a Type 2 syllabus website, appropriate data must be entered into a web page only once, to obtain the syllabuses-table.

3.2 Crawling System Design

3.2.1 Crawling method

Based on the results described in Section 3.1, the crawling methods are designed as follows.

  1. Crawling method for the syllabuses of Type 1

The first step is to enter data into the database to obtain the syllabus list.

The second step is to select one of the syllabuses in the syllabus list, retrieve it and store it in the storage.

All the syllabuses are then crawled by repeating the second step.

  2. Crawling method for the syllabuses of Type 2

Because the syllabus list is written on the syllabus web page itself, syllabuses can be crawled by entering data once to obtain the syllabus list and then copying each syllabus one by one.
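The two crawling methods can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation (which builds on commercial crawler software): `FakeSite` is a hypothetical in-memory stand-in for a university website, and its method names are invented for the sketch. A Type 1 site needs a first data input to obtain the syllabuses-table and a second input to retrieve each syllabus from the database; a Type 2 site needs only the first input, after which each syllabus page is copied.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeSite:
    """Hypothetical in-memory stand-in for one university's syllabus website."""

    page_type: int          # 1 = database type, 2 = web page description type
    pages: dict[str, str]   # syllabus id -> syllabus text

    def get_syllabus_table(self, data1: str) -> list[str]:
        # A real site would use Data 1 (e.g. faculty and year) to build the
        # syllabuses-table; this fake ignores it and lists every syllabus id.
        return sorted(self.pages)

    def search_database(self, data2: str, syllabus_id: str) -> str:
        # Type 1 only: Data 2 is entered to retrieve one syllabus from the
        # database. The fake ignores Data 2 and looks the id up directly.
        return self.pages[syllabus_id]

    def copy_page(self, syllabus_id: str) -> str:
        # Type 2 only: the syllabus is an ordinary web page and is copied as-is.
        return self.pages[syllabus_id]


def crawl(site: FakeSite, data1: str, data2: str | None = None) -> dict[str, str]:
    """Collect every syllabus listed in the syllabuses-table into a dict."""
    if site.page_type == 1 and data2 is None:
        raise ValueError("a Type 1 site needs a second data input")
    storage: dict[str, str] = {}
    # First step (both types): one data input yields the syllabuses-table.
    for syllabus_id in site.get_syllabus_table(data1):
        # Second step, repeated for every entry in the table.
        if site.page_type == 1:
            storage[syllabus_id] = site.search_database(data2, syllabus_id)
        else:
            storage[syllabus_id] = site.copy_page(syllabus_id)
    return storage
```

Raising an error when a Type 1 site lacks its second input mirrors the requirement that such sites need two data entries, so a missing entry is caught before any syllabus is requested.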

3.2.2 Procedure guide table (PGT) for crawling

The PGT is created by manually accessing the syllabus websites before crawling. It contains the data necessary for crawling each syllabus website, such as the first access web address, the type of syllabus web page and the input data for accessing the syllabus pages. The crawler refers to the data in the PGT to crawl the syllabus websites.

University A: First access URL, Type 1, Data 1, Data 2

University B: First access URL, Type 1, Data 1, Data 2

⋮

University X: First access URL, Type 2, Data
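One way to hold such rows in code is sketched below. The comma-separated line format, the `PgtEntry` field names and the example URLs are assumptions made for illustration; the paper does not specify how the PGT is stored. The check encodes the rule from Section 3.1 that Type 1 sites need two input data and Type 2 sites one.

```python
from __future__ import annotations

from dataclasses import dataclass

# Number of data inputs each page type needs (Section 3.1).
REQUIRED_INPUTS = {1: 2, 2: 1}


@dataclass(frozen=True)
class PgtEntry:
    university: str
    first_url: str
    page_type: int                 # 1 = database type, 2 = web page description type
    input_data: tuple[str, ...]    # data entered while crawling, in order


def parse_pgt_line(line: str) -> PgtEntry:
    """Parse a row such as 'University A: URL, Type 1, Data 1, Data 2'."""
    university, rest = line.split(":", 1)          # the name ends at the first colon
    fields = [f.strip() for f in rest.split(",")]
    if len(fields) < 2:
        raise ValueError(f"incomplete PGT row: {line!r}")
    first_url, type_field, *data = fields
    page_type = int(type_field.replace("Type", "").strip())   # "Type 1" -> 1
    if REQUIRED_INPUTS.get(page_type) != len(data):
        raise ValueError(f"Type {page_type} row has {len(data)} input(s): {line!r}")
    return PgtEntry(university.strip(), first_url, page_type, tuple(data))
```

Validating rows when the table is loaded means a mistyped entry is reported before crawling starts rather than as a failed page request later.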

3.2.3 Syllabuses crawling system design

Using the methods described above, the syllabus crawling system is designed as shown in Figure 3.


Figure 3 Syllabuses crawling system construction.

As shown in Figure 3, the syllabus crawling system consists of four parts:

  1. Application software for crawling

    It instructs the crawler to access the syllabus websites according to the data in the procedure guide table.

  2. Procedure guide table for crawling

    It contains the data needed for crawling the websites of universities.

  3. Crawler

    It accesses the web pages of universities as instructed by the application software, collects the syllabuses and stores them in the storage. It also extracts index words from each crawled syllabus.

  4. Storage

    It stores the collected syllabuses and the index words generated by the crawler.

In this way, the crawler automatically collects syllabuses and generates index words by morphological analysis of the text in each crawled syllabus. The index words are used for searching the syllabuses.
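As a rough illustration of index-word generation, the sketch below uses a regular expression on English text as a simplified stand-in for morphological analysis; Japanese text has no spaces between words and would need a real morphological analyzer, and the commercial crawler's own analysis is not reproduced here. It also builds an inverted index, a common structure for syllabus searching. The function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Simplified stand-in for morphological analysis: English words only.
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def index_words(text: str) -> list[str]:
    """Return the distinct lower-case words of one syllabus, in first-seen order."""
    return list(dict.fromkeys(w.lower() for w in WORD.findall(text)))


def build_index(storage: dict[str, str]) -> dict[str, set[str]]:
    """Map each index word to the ids of the syllabuses that contain it."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for syllabus_id, text in storage.items():
        for word in index_words(text):
            inverted[word].add(syllabus_id)
    return dict(inverted)
```

For example, `sorted(build_index({"S1": "Patent pool", "S2": "Business model and patent"})["patent"])` gives `['S1', 'S2']`, so a search for one word touches only the syllabuses that contain it.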

3.3 Design of Knowledge Extraction from Syllabuses

This section describes the method for extracting knowledge from the crawled syllabuses.

A syllabus contains a great deal of information, such as the course name, number of credits, lecturer’s name, learning outcomes, weekly lecture plan and the various kinds of knowledge the course offers.

Figure 4 shows an algorithm to extract knowledge words from a syllabus.


Figure 4 Knowledge extraction algorithm from each of the syllabuses.

The crawler automatically extracts index words from each crawled syllabus and stores them in the storage. The index words include many kinds of words, such as academic knowledge words, lecturers’ names and classrooms.

To extract knowledge words, words unrelated to knowledge must be removed from the index words.

For this purpose, filtering software is introduced that removes the unrelated words from the index words.

The unrelated words are specified by the user who wants to extract knowledge from the syllabuses.

The filtering software extracts knowledge from the index words by removing the unrelated words, as shown in Figure 4. In Figure 4, “lecturer’s name” and “class room” are specified as unrelated words, as an example.
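The filtering step can be sketched as removing every index word that appears in the user's table. This is a minimal sketch, assuming the table of unrelated words lists concrete words (for example, specific lecturer names and room codes) rather than the abstract labels “lecturer’s name” and “class room”; the sample words are hypothetical.

```python
from __future__ import annotations


def extract_knowledge_words(index_words: list[str], unrelated: set[str]) -> list[str]:
    """Drop every index word listed in the table of unrelated words.

    The comparison ignores case, and the surviving words keep their order.
    """
    blocked = {word.casefold() for word in unrelated}
    return [word for word in index_words if word.casefold() not in blocked]


# Hypothetical index words of one syllabus and a user-defined unrelated-word table.
words = ["standardization", "Tanaka", "patent", "Room-301", "negotiation"]
unrelated = {"tanaka", "room-301"}
print(extract_knowledge_words(words, unrelated))
# ['standardization', 'patent', 'negotiation']
```

Keeping the table as data rather than code lets each user adapt the filter to their own extraction goal, as the text describes.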

3.4 System Design for Syllabuses Crawling and Knowledge Extraction

Using the design explained above, the system for syllabuses crawling and knowledge extraction is designed as shown in Figure 5.


Figure 5 System block-diagram for syllabuses crawling and knowledge extraction.

The system consists of six parts:

  1. Application software for controlling the system
  2. Crawler for collecting syllabuses and generating index words
  3. Storage for storing the crawled syllabuses, their index words and extracted knowledge words
  4. Search engine for searching the index words
  5. Filter software for removing words unrelated to knowledge from the index words
  6. Table of the words unrelated to knowledge

As shown in Figure 5, the crawler collects syllabuses from the syllabus websites of universities and generates index words. The filter software extracts knowledge words from each crawled syllabus and stores them together with that syllabus.
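The data flow of Figure 5 can be sketched as a single in-memory class with one method per part. Everything here is hypothetical: the class and method names, the regular-expression stand-in for morphological analysis, and the use of a Python dict as the storage. It only shows how the six parts hand data to one another.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class SyllabusRecord:
    text: str
    index_words: list[str]
    knowledge_words: list[str] = field(default_factory=list)


class SyllabusSystem:
    """Hypothetical in-memory wiring of the six parts of Figure 5."""

    def __init__(self, unrelated_words: set[str]) -> None:
        self.unrelated = {w.lower() for w in unrelated_words}  # (6) unrelated-word table
        self.storage: dict[str, SyllabusRecord] = {}            # (3) storage

    def crawl(self, pages: dict[str, str]) -> None:
        # (2) Crawler: store each syllabus with its index words. A regular
        # expression stands in for morphological analysis here.
        for sid, text in pages.items():
            words = dict.fromkeys(w.lower() for w in re.findall(r"[A-Za-z]+", text))
            self.storage[sid] = SyllabusRecord(text, list(words))

    def filter_all(self) -> None:
        # (5) Filter: keep only knowledge words, stored with each syllabus.
        for record in self.storage.values():
            record.knowledge_words = [
                w for w in record.index_words if w not in self.unrelated
            ]

    def search(self, keyword: str) -> list[str]:
        # (4) Search engine: ids of syllabuses whose index words contain keyword.
        key = keyword.lower()
        return sorted(sid for sid, rec in self.storage.items() if key in rec.index_words)

    def run(self, pages: dict[str, str]) -> None:
        # (1) Application software: calls the other parts in order.
        self.crawl(pages)
        self.filter_all()


system = SyllabusSystem({"tanaka", "room"})
system.run({"S1": "Patent pool Tanaka", "S2": "Business model room"})
print(system.search("patent"))                 # ['S1']
print(system.storage["S1"].knowledge_words)    # ['patent', 'pool']
```

Search runs over the full index words while the filter result is kept alongside, matching the description that knowledge words are stored together with each syllabus.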

4 Results of Syllabuses Crawling and Knowledge Analysis

The authors have developed a system named “Interdisciplinary education support system” [13, 14] that implements the design explained in Chapter 3.

Using this system, syllabuses were successfully crawled from 132 Japanese universities: all 88 national universities and 44 major public and private universities.

Only Japanese universities were targeted because little information was available about the syllabus websites of foreign universities.

4.1 Selection of Syllabuses about Global Standardization

The crawler collects all the syllabuses of each university, so syllabuses about global standardization must be selected from the crawled syllabuses.

4.1.1 Syllabuses selection related to global standardization

Syllabuses that offer knowledge related to global standardization can be selected by searching the crawled and stored syllabuses using the search engine in the system.

The search keywords are set according to the lumps of knowledge required for global standardization. They are:

  1. knowledge concerning “meaning of global standard and related organizations and associations”
  2. knowledge concerning “organizations which formulate global standard and procedure”
  3. knowledge concerning “intellectual properties relating to global standardization”
  4. knowledge concerning “strategies of business and management”
  5. knowledge concerning “national strategies towards global standardization”
  6. knowledge concerning “ability of negotiations”

4.1.2 Result

Syllabuses containing knowledge words concerning global standardization were searched for among the syllabuses crawled from the 132 universities, using the keywords for knowledge items 1 to 6 shown above.
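Selection by keyword search might look like the sketch below. The paper does not list the actual keywords for items 1 to 6 or the rule for counting a match, so the keyword sets in `LUMP_KEYWORDS` and the `min_lumps` threshold are assumptions; index words are assumed to be lower-case.

```python
from __future__ import annotations

# Hypothetical keyword sets for knowledge items 1 to 6; the paper does not
# list the keywords it actually used.
LUMP_KEYWORDS: dict[int, set[str]] = {
    1: {"iso", "iec", "itu"},
    2: {"standardization", "consensus"},
    3: {"patent", "licensing"},
    4: {"business", "management"},
    5: {"policy", "national"},
    6: {"negotiation"},
}


def matched_lumps(index_words: set[str]) -> set[int]:
    """Numbers of the knowledge items whose keywords occur among the index words."""
    return {item for item, keys in LUMP_KEYWORDS.items() if keys & index_words}


def select_syllabuses(index: dict[str, set[str]], min_lumps: int = 1) -> list[str]:
    """Ids of syllabuses matching at least `min_lumps` items (assumed rule)."""
    return sorted(
        sid for sid, words in index.items() if len(matched_lumps(words)) >= min_lumps
    )
```

Raising `min_lumps` above 1 is one way to reduce hits caused by generic words such as “management”, at the cost of missing narrowly focused courses.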

Table 1 Number of global standardization courses at universities in Japan

Number of universities | Number of courses at graduate schools | Number of courses at undergraduate schools
24 | 40 | 5

As a result, 45 syllabuses from 24 universities were found: 40 for courses at graduate schools and 5 for courses at undergraduate schools, as shown in Table 1. All 45 syllabuses were still available at the time of writing.
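The figures in Table 1 are simple counts over the selected syllabuses. The sketch below shows the aggregation, assuming each selected syllabus is recorded as a hypothetical `(university, level)` pair; applied to the 45 selected syllabuses it should reproduce the row (24, 40, 5), and the graduate and undergraduate counts must add up to 45 (40 + 5 = 45).

```python
from __future__ import annotations

from collections import Counter

LEVELS = ("graduate", "undergraduate")


def table1_row(selected: list[tuple[str, str]]) -> tuple[int, int, int]:
    """Return (universities, graduate courses, undergraduate courses)."""
    levels = Counter(level for _, level in selected)
    unknown = set(levels) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown level(s): {sorted(unknown)}")
    # A university is counted once, however many of its syllabuses were selected.
    universities = {university for university, _ in selected}
    return len(universities), levels["graduate"], levels["undergraduate"]
```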

4.2 Extraction and Analysis of Knowledge about Global Standardization

Knowledge words were extracted from each of the 45 syllabuses by using the method explained in Section 3.3, as implemented by the system in Section 3.4.

Extracted knowledge words are classified into 13 categories of knowledge as shown in Appendix Table 1.

In Appendix Table 1, the 13 categories of knowledge are arranged in columns and the 45 syllabuses in rows.

The 13 categories of knowledge are as follows.

  1. Meaning and institutes of global standardization
  2. Procedure for formulation of international standards
  3. Policy of standardization
  4. Human resources for standardization
  5. Intellectual properties and patent system
  6. Patent pool
  7. Management of intellectual properties / strategies
  8. Negotiation
  9. Communication ability
  10. Innovation
  11. Research and development /strategies
  12. Business model
  13. Business competitiveness in international market / strategies / management

The major results are explained below. Note that the word ‘course’ is used instead of ‘syllabus’ to help readers picture the results more easily.

Appendix Table 1 shows that each course covers a different combination of knowledge categories, while some categories are common to many courses.
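Each number in Table 2 is a column total of Appendix Table 1, that is, the count of courses marked for that category. A minimal sketch of that count, assuming Appendix Table 1 is represented as a mapping from each course to the set of categories marked for it (the course names used with it are hypothetical, since the appendix contents are not reproduced in the text):

```python
from __future__ import annotations


def count_courses_per_category(
    marks: dict[str, set[str]], categories: list[str]
) -> dict[str, int]:
    """Column totals of a course-by-category table such as Appendix Table 1."""
    # Reject marks that do not belong to the known categories.
    unknown = set().union(*marks.values()) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {c: sum(c in offered for offered in marks.values()) for c in categories}
```

Passing the full list of 13 categories keeps categories that no course offers in the result with a count of 0.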

Table 2 shows the number of courses that offer each of the categories of knowledge.

Table 2 Knowledge classification of courses of universities

Knowledge (large classification) | Category of knowledge (middle classification) | Number of courses
Standardization | Meaning and institutes of global standardization | 44
Standardization | Procedure for the establishment of international standards | 14
Standardization | Policy of standardization | 9
Standardization | Human resources for standardization | 4
Intellectual properties | Intellectual property and patent system | 25
Intellectual properties | Patent pool | 7
Intellectual properties | Management of intellectual properties and strategies | 8
Negotiation | Negotiation | 9
Negotiation | Communication ability | 6
Research and development | Innovation | 18
Research and development | Research and development strategies | 12
Business | Business model | 31
Business | Business competitiveness in international market, strategies and management | 17

Table 2 and Appendix Table 1 show the following points.

  1. The category “Meaning and institutes of global standardization” is offered by 44 of the 45 courses.
  2. The categories “Business model” and “Intellectual property and patent system” are each offered by more than half of the courses.
  3. The categories “Procedure for the establishment of international standards”, “Innovation”, “Research and development strategies” and “Business competitiveness in international market, strategies and management” are each offered by roughly 30 to 40 percent of the courses.
  4. The category “Negotiation” is offered by 6 courses.
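The percentages in points 2 and 3 follow from dividing each count in Table 2 by the 45 courses. The short script below does this arithmetic with the counts copied from Table 2. It gives 68.9% for “Business model” and 55.6% for “Intellectual property and patent system”; 31.1%, 40.0% and 37.8% for “Procedure for the establishment of international standards”, “Innovation” and “Business competitiveness in international market, strategies and management”; and 26.7% for “Research and development strategies”, slightly below 30%. Table 2 lists 9 courses (20.0%) for “Negotiation”.

```python
from __future__ import annotations

TOTAL_COURSES = 45

# Course counts per knowledge category, copied from Table 2.
TABLE2 = {
    "Meaning and institutes of global standardization": 44,
    "Procedure for the establishment of international standards": 14,
    "Policy of standardization": 9,
    "Human resources for standardization": 4,
    "Intellectual property and patent system": 25,
    "Patent pool": 7,
    "Management of intellectual properties and strategies": 8,
    "Negotiation": 9,
    "Communication ability": 6,
    "Innovation": 18,
    "Research and development strategies": 12,
    "Business model": 31,
    "Business competitiveness in international market, strategies and management": 17,
}


def shares(counts: dict[str, int], total: int = TOTAL_COURSES) -> dict[str, float]:
    """Percentage of the courses that offer each category, rounded to 0.1."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Print the categories from most to least widely offered.
for name, pct in sorted(shares(TABLE2).items(), key=lambda kv: -kv[1]):
    print(f"{pct:5.1f}%  {name}")
```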

5 Considerations

In this chapter, first, the proposed technologies are evaluated. Second, the validity of offering courses for global standardization at graduate schools is considered. Third, the knowledge of courses about global standardization offered by Japanese universities is considered. Fourth, the difference in study effects between programs and courses is considered.

  1. About the technology for syllabuses crawling and knowledge extraction.

In general, website crawling is widely used and collects web pages that can be reached without any special operation, such as entering passwords.

Syllabus crawling, however, requires entering appropriate data into some syllabus websites. This data is obtained by manually accessing each syllabus website and is set in the procedure guide table for crawling.

The technology proposed in this paper is new and essential for syllabus crawling.

The proposed method for extracting knowledge from the index words of crawled syllabuses is also new and useful for designing education programs and courses.

  2. About the validity of offering courses for global standardization at graduate schools.

In Japanese research universities, almost all science and engineering students continue their studies at graduate schools. Graduate students have enough time to take some courses on global standardization, and they have developed the ability to think about and understand topics such as global standardization from broader perspectives.

Offering courses for global standardization at graduate schools is therefore reasonable and effective.

  3. About the knowledge of courses about global standardization offered by Japanese universities.

Understanding global standardization requires studying many kinds of knowledge, such as standardization procedures, intellectual properties, research and development, negotiation and business.

As described in Section 4.2, each of the 45 courses about standardization offers some, but not all, of the 13 categories of knowledge because the courses have different educational objectives.

Students therefore need to select courses according to their study goals.

  4. About the difference of study effects between programs and courses.

A full understanding of global standardization requires a wide range of knowledge, such as the meaning of standards, intellectual properties, business management, strategies, policies and regulations, and negotiation.

Programs are designed to offer a wide range of knowledge by combining courses, so studying a program enables students to acquire various kinds of knowledge systematically.

Studying a single course is useful for students who want an overview of global standardization.

6 Conclusion

In this paper, first, the current situation of education about global standardization in universities was surveyed and clarified. Namely,

  1. 45 courses about global standardization are offered at 24 Japanese universities.
  2. Kanazawa Institute of Technology and Osaka University offer global standardization education programs, each consisting of multiple courses, for graduate students and credited auditors.
  3. University of Geneva offers a master’s degree program of “Master in Standardization, Social regulation and Sustainable Development”.

Second, new methods for syllabus crawling and knowledge extraction of courses were proposed. Namely,

  1. A survey of the syllabus web page structure of Japanese universities showed that there are two types of syllabus web pages: the database storage type and the web page description type.
  2. A new crawling technology was proposed that can collect syllabuses from both types of syllabus web pages.
  3. A new filtering method was also proposed that extracts knowledge words by removing words unrelated to knowledge from the index words generated by the crawler.

Third, syllabus crawling and knowledge extraction for 132 Japanese universities were carried out using a system that implements the proposed methods, with the following results.

  1. Syllabuses were successfully crawled from 132 Japanese universities, and 45 syllabuses about global standardization were selected from the crawled syllabuses by searching.
  2. Knowledge was successfully extracted from the 45 syllabuses about standardization and classified into 13 knowledge categories, as shown in Table 2 and Appendix Table 1.

The technologies and results described in this paper will help make education about global standardization in universities more active and enable joint education between universities.

One future task is to crawl syllabuses from universities worldwide.

This requires the syllabus website addresses of universities, which are expected to be obtained from the questionnaire on global standardization education in universities that the ITU-T Director’s Ad hoc Group is now planning [15].

References

[1] Hiroshi Nakanishi, Tetsuo Oka, Yoshiaki Kanaya, “Syllabuses Crawling and Knowledge Extraction of Courses for Global Standardization Education”, Proceedings of the 2014 ITU Kaleidoscope Academic Conference, pp.191–196, June 2014.

[2] Hiroshi Nakanishi, “Lump of Knowledge Based Design of Global Standardization Education Program For Graduate students of Universities”, The Journal of IIEEJ of Japan, Vol. 42, No. 3, pp. 396–400, 2013.

[3] Hiroshi Nakanishi, “Graduate Minor education Program on Global Standardization”, Proceedings of IEICE General Conference, BP5-2, 2013.

[4] http://www.kanazawa-it.ac.jp/tokyo/ip/ip2.htm

[5] http://www.osaka-u.ac.jp/jp/facilities/gakusai/en/index.html

[6] Hiroshi Nakanishi et al., “Global Standardization Education Program Collaborated by Osaka Univ. and MJIIT UTM”, Journal of ICT Standardization, Vol. 1, pp. 59–82, 2013.

[7] “Education about Global Standardization in Japan: IEICE Questionnaire Survey”, Standards Education AHG – Document 009, Documents 2nd Meeting 20130425-Japan, TSB Director’s Ad hoc Group, 2013. http://www.itu.int/en/ITU-T/academia/Pages/stdsedu/documents.aspx?RootFolder=%2Fen%2FITU-T%2Facademia%2FDocuments%2Fstdsedu&

[8] http://www.standardization.unige.ch/rationale-for-the-proposed-master-program/course-list.html

[9] Yasuhiko Tsuji, “Information Extraction from Course Syllabi for Automatic Metadata Generation”, IEICE Technical Report ET2009-74, 2009.

[10] Fuyuki Yoshikane, “Syllabus Retrieval Considering Relationship between the Search Term and its Synonyms”, Japan Society for Fuzzy Theory and Intelligent Information, Vol. 18, No. 2, pp. 299–309, 2006.

[11] Daisuke Hie, “Development of Kyushu University WEB syllabus cross-searching system”, IPSJ SIG Technical Report, Vol. 2010-DPS-145, 2010.

[12] Takashi Kawabata, “Development of General-purpose Syllabus System with Syllabus Object mapping to XML”, IPSJ SIG Technical Report, Vol. 2009-DPS-141, 2009.

[13] Hiroshi Nakanishi, “Development of the Interdisciplinary Education Support System”, JSET The 26th Annual Conference, 2a-508-10, 2009.

[14] Hiroshi Nakanishi et al., “Development of Collecting System of Information on Research and Education of Universities”, Journal of the Institute of Image Electronics Engineers of Japan, Vol. 43, No. 2, pp. 194–202, 2014.

[15] Education about Standardization AHG-Documents 021: Report of the third AHG meeting, June 2014. http://www.itu.int/en/ITU-T/academia/Pages/stdsedu/default.aspx

Appendix

Appendix Table 1. Knowledge extracted from 45 syllabuses about standardization


Biographies


H. Nakanishi Lecturer, former Professor, Osaka University. He was born in 1947. He graduated from the Graduate School of Engineering of Osaka University in 1973. He received BS and MS degrees from Osaka University and a PhD degree from Waseda University. He joined the Electrical Communication Laboratories (ECL) of NTT (Nippon Telegraph and Telephone Public Corporation) as a researcher in 1973. His major is electronics and information science. He researched and developed magnetic and optical storage devices, storage systems and network filing systems. In 2006, he moved to Osaka University as a professor for interdisciplinary research and education, where he has been researching the design of interdisciplinary education programs through analysis of social needs and teaches a program on global standardization. He is a member of The Japan Society of Information and Communication Research, The Institute of Electronics, Information and Communication Engineers, and The Institute of Image Electronics Engineers of Japan.


T. Oka Engineer, former Specially-appointed Professor of Osaka University. He was born in 1949. He graduated from the Graduate School of Engineering of Osaka University in 1973. He received BS and MS degrees from Osaka University. He joined Mitsubishi Corp. as an engineer in 1973. His major is electronics. He researched and developed information systems. In 2006, he moved to Osaka University as a specially-appointed professor for interdisciplinary research and education, where he researched the design of interdisciplinary education systems. He retired from Osaka University in 2012. Since 2012, he has been working for Tresbind Corp. as a system engineer. He is a member of The Japan Society of Information and Communication Research.


Y. Kanaya Engineer, former Specially-appointed Research Associate of Osaka University. He graduated from the Graduate School of Engineering of Kinki University in 1994. He received BS and MS degrees from Kinki University. He joined Seiyu Corp. as an engineer in 1994. In 2012, he moved to Osaka University as a specially-appointed research associate for interdisciplinary research and education, where he researched the design of interdisciplinary education systems. He retired from Osaka University in 2014. Since 2014, he has been working for Brain Gate Co., Ltd.
