Biostatistics Core Facility - user guidelines
The Biostatistics Core Facility provides biostatistical collaboration and support for medical and public health research. It aims to ensure that study designs, data analyses, and the interpretation and dissemination of research findings make use of the most efficient and innovative methods in biostatistics.
Providing data to the statistician
The following are general guidelines. They are not mandatory, but following them makes the collaboration more efficient.
- Provide the data in the rawest form you have access to. This allows data provenance to be maintained throughout the workflow. In most cases, preprocessing or "cleaning" the data before handing it to the statistician is counterproductive.
- Columns should be variables (e.g., weight, blood pressure, price, category, time to failure) and rows should be observations of those variables (e.g., patients, patients at different time points, animals). Column names should be descriptive but concise; please avoid spaces and special characters (e.g., "#", "%", "&", ",").
- Do not use color, highlighting, or cell comments to convey important information or to mark observations that meet certain conditions (e.g., observations to be excluded). Instead, add new variables (columns) to the dataset that encode this information.
- Each cell should contain exactly one value, and that value must be machine-readable (i.e., a number or text, not a color code).
- If your data contain missing values, code them consistently with a single code (e.g., "." or "NA"), or provide the missing-value codes separately.
- If you have multiple tables, each should include a common identifier column that allows them to be joined or merged. Avoid spreading data across multiple worksheets; one dataset per file is preferable.
- Do not send personally identifiable information, especially by email.
- Data should preferably be provided in a plain-text delimited format, such as CSV. Excel files may be acceptable. Other database formats may also be acceptable; please enquire. We cannot accept data in word-processor formats, such as Word documents.
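For illustration, here is a minimal Python sketch (standard library only) of two small tables laid out along the lines of the guidelines above, and of how a statistician might read and combine them. All column names, values, and the "NA" missing-value code are hypothetical; `patient_id` stands for a non-identifying study code, not a name or medical record number. The helper names `check_column_names`, `read_table`, and `merge_on_key` are illustrative only, and the column-naming pattern is one possible convention, not a facility rule.

```python
import csv
import io
import re

# Hypothetical example data: one row per observation, one column per variable.
# "patient_id" is a non-identifying study code shared by both tables,
# "exclude_flag" (1 = exclude) replaces color highlighting, and "NA" is the
# single missing-value code used throughout.
MEASUREMENTS_CSV = """patient_id,visit_day,weight_kg,systolic_bp_mmhg,exclude_flag
P001,0,72.5,128,0
P001,30,71.8,NA,0
P002,0,NA,135,1
"""

DEMOGRAPHICS_CSV = """patient_id,age_years,sex
P001,54,F
P002,61,M
"""

MISSING_CODE = "NA"

# One possible naming convention (an assumption, not a facility rule):
# start with a letter, then only letters, digits, or underscores.
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_column_names(names):
    """Return the column names that contain spaces or special characters."""
    return [name for name in names if not VALID_NAME.match(name)]


def read_table(text):
    """Parse CSV text into a list of dicts, mapping the missing code to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {col: (None if value == MISSING_CODE else value) for col, value in row.items()}
        for row in reader
    ]


def merge_on_key(left, right, key):
    """Left-join two tables on a shared key column (key assumed unique in right)."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        extra = {col: val for col, val in lookup.get(row[key], {}).items() if col != key}
        merged.append({**row, **extra})
    return merged


measurements = read_table(MEASUREMENTS_CSV)
demographics = read_table(DEMOGRAPHICS_CSV)
merged = merge_on_key(measurements, demographics, "patient_id")

# Exclusions are applied from the flag column, not from cell formatting.
analysis_set = [row for row in merged if row["exclude_flag"] == "0"]

print(check_column_names(["weight kg", "bp%", "age_years"]))
print(len(analysis_set), "observations retained")
```

Because the exclusion rule lives in a column, it survives export to CSV, whereas cell highlighting would be lost; likewise, the shared `patient_id` column is what makes the two files mergeable without manual matching.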
Let us know
The Biostatistics Core Facility strives to contribute to the success of its users. Its policies and organizational structure are continually evaluated to improve its effectiveness. Feedback and suggestions are welcome and appreciated.