Integrated Statistical Modeling of Engineering Data with Shared Information
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: Data have become exceedingly important to organizations, especially for decision making. Large amounts of available data are crucial in engineering applications because they support understanding of the problem and efficient execution of the solution. Studies have also shown that using large amounts of data in engineering applications makes it easier for data-driven models to generate actionable insights. With multiple data sources, we can unveil hidden patterns and trends and determine possible relationships in even the most complex engineering applications. Traditionally, this has been done by building a separate statistical model for each data source. However, when several data sources are correlated, this single-model strategy has been shown to be time consuming and tedious and to lose pertinent information. Against this backdrop, this dissertation aimed to develop statistical models that can accurately predict the responses of three important engineering applications. To achieve this aim, it developed three integrated statistical modeling (ISM) techniques, one for each application. The techniques were chosen because they have shown substantial performance benefits. First was the modeling of multivariate profiles. In some manufacturing processes, profile data are collected to monitor process variations. When multiple profiles are collected together, correlations might exist across profiles. Modeling these multivariate profiles requires describing both within-profile and between-profile correlations. Second was the discovery of material oxides, which often suffers from data scarcity. In some cases, collecting data from the target source can be expensive, while auxiliary data sources are cheaper to collect. In such situations, auxiliary data sources can be exploited to improve prediction performance on the expensive target data. Third was the investigation of data with grouping information, in which subjects are clustered into various groups. Applying these strategies to examples and case studies showed that prediction accuracy can be improved by exploiting the within-group and between-group characteristics of these data sources instead of modeling each data source separately. In general, transferring knowledge across sources is complicated for most real-world systems, and traditional modeling approaches are often inadequate to capture the relations when data are nonstationary or change abruptly within a small interval. In addition, the modeling time can become burdensome as the number of sources and observations increases. Hence, developing efficient and flexible frameworks for multiple correlated data sources is imperative. This dissertation proposed novel ISM techniques to deal with the complicated scenarios associated with correlated data sources. It further demonstrated that the proposed ISM techniques can capture correlation among different data sources within a single modeling framework. The major advantage of the proposed ISM methods was found to be their flexibility compared with modeling each data source individually. The study concludes by showing that data nonstationarity can be handled effectively with reasonable computational loads.
Degree Program: Graduate College
Systems & Industrial Engineering