People, Processes, and Products: Case Studies in Open-Source Software Using Complex Networks
Author: Ma, Jian James
Management Information Systems
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Open-source software has become increasingly popular. Many startup companies and small business owners adopt open-source software packages to meet their daily office computing needs or to build their IT infrastructure. Unlike proprietary software systems, open-source software systems usually have a loosely organized developer collaboration structure. Developers work on their "assignments" on a voluntary basis, and many do not physically meet their "co-workers." This distinctive collaboration pattern leads to a distinctive software development process, and hence to a distinctive structure of the resulting software products. These characteristics of open-source software motivate this dissertation. Our research follows the framework of the four key elements of software engineering: Project, People, Process and Product (Jacobson, Booch et al. 1999). This dissertation studies three of the four P's: People, Process and Product. Because many open-source software packages are large and highly complex, traditional software engineering analysis methods and measures cannot readily be applied to them. In this dissertation, we adopt complex network theory to analyze open-source software packages, the software development process, and the collaboration among software developers. We aim to discover common characteristics shared by different open-source software packages and to provide a possible explanation of how those products are developed. Specifically, we represent real-world entities, such as open-source source code or developer collaborations, as networks of interconnected vertices. We then apply topological metrics established in complex network theory to analyze those networks. We also propose our own random network growth model to illustrate open-source software development processes.
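As a rough illustration of representing source code as a network, the sketch below assumes a toy function dependency (call) network: each C function is a vertex, and a directed edge runs from a caller to a callee. The function names in `CALLS` and the helpers `degree_stats` and `degree_distribution` are hypothetical, and degree statistics are only one of the standard topological metrics; this is not necessarily the extraction procedure or metric set used in the dissertation.

```python
from collections import defaultdict

# Hypothetical caller -> callee pairs, as might be extracted from C source code.
# These function names are illustrative only; they do not come from the dissertation.
CALLS = [
    ("main", "parse_args"),
    ("main", "run"),
    ("run", "read_input"),
    ("run", "process"),
    ("process", "log_msg"),
    ("read_input", "log_msg"),
    ("parse_args", "log_msg"),
]


def degree_stats(edges):
    """Return out-degree, in-degree, and total degree per vertex of a directed graph."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for caller, callee in edges:
        out_deg[caller] += 1  # caller depends on one more function
        in_deg[callee] += 1   # callee is reused by one more function
    vertices = set(out_deg) | set(in_deg)
    total = {v: out_deg[v] + in_deg[v] for v in vertices}
    return dict(out_deg), dict(in_deg), total


def degree_distribution(total):
    """Fraction P(k) of vertices having each total degree k, sorted by k."""
    n = len(total)
    counts = defaultdict(int)
    for k in total.values():
        counts[k] += 1
    return {k: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    out_deg, in_deg, total = degree_stats(CALLS)
    print(in_deg["log_msg"])  # 3
    print(degree_distribution(total))
```

In a real package the edge list would be extracted from the C source. On this toy input, the shared helper `log_msg` has in-degree 3, the kind of heavily reused vertex whose prevalence a degree distribution summarizes.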
Our research results may be useful to software practitioners who wish to develop high-quality software products and reduce risk in the development process. Chapter 1 introduces the structure and research scope of the dissertation, which studies open-source software through complex networks, and presents the details of the 4-P framework. Chapter 2 analyzes five C-language open-source software packages using function dependency networks, and calculates the topological measures of the dependency networks extracted from their source code. Chapter 3 analyzes the collaborative relationships among open-source software developers. We extract developers' co-working data from two software bug-fixing data sets. Again applying complex network theory, we identify a number of topological characteristics of the developer networks, such as the scale-free property. We also identify topological differences between the bug side and the developer side of the extracted bipartite networks. Chapter 4 compares two widely adopted definitions of the clustering coefficient, one proposed by Watts and Strogatz and the other by Newman. The analytical similarities and differences between the two definitions provide useful guidance for the random network growth model presented in the next chapter. Chapter 5 characterizes the open-source software development process. We propose a two-phase network growth model that describes how different source code units interconnect as the size of the software grows. A case study was performed using the same five open-source software packages analyzed in Chapter 2. The empirical results indicate that our model provides a possible explanation of how open-source software products are developed.
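The difference between the two clustering coefficients compared in Chapter 4 can be made concrete with a small computation. The sketch below assumes an undirected, unweighted graph and uses the standard textbook forms: the Watts-Strogatz coefficient is the plain average over vertices of the local value C_i = 2 e_i / (k_i (k_i - 1)), where k_i is the degree of vertex i and e_i the number of edges among its neighbors, while Newman's coefficient is 3 x (number of triangles) / (number of connected triples). Because the sum of e_i over all vertices equals 3 x (number of triangles), Newman's value is equivalently a weighted average of the C_i with weights k_i (k_i - 1), so high-degree vertices count for more. The helper names are hypothetical, and treating vertices of degree below two as C_i = 0 is one common convention; the dissertation's own analytical treatment may differ in such details.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edges_among_neighbors(adj, v):
    """e_i: number of edges linking pairs of neighbors of v."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def local_clustering(adj, v):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken as 0 when k_i < 2 (one common convention)."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    return 2.0 * edges_among_neighbors(adj, v) / (k * (k - 1))


def watts_strogatz_clustering(adj):
    """Unweighted mean of the local coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def newman_clustering(adj):
    """Transitivity: 3 * (number of triangles) / (number of connected triples)."""
    closed = 0   # sum of e_i over vertices; each triangle is counted three times
    triples = 0  # sum of k_i (k_i - 1) / 2 over vertices
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += edges_among_neighbors(adj, v)
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Toy graph: triangle a-b-c plus two leaves d and e attached to hub a.
    g = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("a", "e")])
    print(watts_strogatz_clustering(g))  # 13/30, about 0.433
    print(newman_clustering(g))          # 3/8 = 0.375
```

On this toy graph the hub `a` has a low local coefficient (1/6) while `b` and `c` each have 1. The plain Watts-Strogatz average is therefore 13/30, whereas Newman's definition, which weights `a` most heavily, gives 3/8.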
Chapter 6 concludes the dissertation and highlights the possible future research directions.
Degree Program: Graduate College
Management Information Systems