Absolute Frequency Data for Statistical Computing: A Comparison With Sample-Based Approaches and Guidelines for Improving Software Implementation

Authors

Robards, Amy Elisabeth

Issue Date

2019

Type

Thesis

Abstract

Data sets that contain many discrete data points with repeated values can be expressed in absolute frequency form, which represents the data more compactly by listing the number of occurrences of each unique value in the full data set. This form can significantly reduce the space required to store the data and can speed up calculations on it. The purpose of this thesis is to assess the decrease in computing time and the increase in storage efficiency gained by performing statistical computations on absolute frequency data. The results quantify the reduction in storage requirements relative to the sample form of the data and illustrate the potentially large speed-up in statistical computations. They suggest that statistical software should take advantage of these benefits and accept absolute frequency data as input. Using R as a case study, the thesis summarizes current capabilities and provides guidelines for adapting functions to better accommodate absolute frequency data.
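
As a minimal illustration of the idea in the abstract, the sketch below collapses a small sample into absolute frequency form and computes the mean and sample variance directly from the counts. It is written in Python for brevity rather than R, it is not taken from the thesis, and the function names (`to_absolute_frequency`, `freq_mean`, `freq_variance`) and the example data are hypothetical.

```python
import math
import statistics
from collections import Counter


def to_absolute_frequency(sample):
    """Collapse a sample into a mapping {unique value: number of occurrences}."""
    return Counter(sample)


def freq_mean(freq):
    """Mean computed from counts: sum(value * count) / total count."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n


def freq_variance(freq):
    """Unbiased sample variance (n - 1 denominator) computed from counts."""
    n = sum(freq.values())
    mean = freq_mean(freq)
    return sum(count * (value - mean) ** 2 for value, count in freq.items()) / (n - 1)


# Hypothetical example data: 8 observations, only 4 unique values.
sample = [2, 3, 3, 5, 5, 5, 5, 7]
freq = to_absolute_frequency(sample)

# The frequency-based results agree with the sample-based ones.
mean_matches = math.isclose(freq_mean(freq), statistics.mean(sample))
variance_matches = math.isclose(freq_variance(freq), statistics.variance(sample))
print(dict(freq), mean_matches, variance_matches)
```

Each loop runs over the unique values rather than over every observation, which is where the storage and speed benefits described above would come from when a data set has many repeats.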
