Two methods of data normalization

Why normalization

When we draw heat maps or fit models such as linear regression or neural networks, we often need to normalize the data first. That is because the variables can differ by orders of magnitude, for example one variable with values around 10,000 and another with values around 100. If the data are used as they are, the variable around 10,000 dominates: the changes or sample differences in the variable around 100 become insignificant, so they do not show up on the heat map and contribute little to machine learning training. Therefore, we normalize the variables so that they are all on the same scale.

Normalization methods

There are two common methods: z-score normalization and 0-1 (min-max) normalization.

  1. Z-score normalization: subtract the variable's mean from each value of the variable, then divide by the variable's standard deviation (not the variance). This gives the z-score (standard score) of each value, so the normalized values are centered on 0 and can be positive or negative.
  2. 0-1 normalization: subtract the variable's minimum from each value of the variable, then divide by the difference between the variable's maximum and minimum, so that the normalized values lie between 0 and 1.
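
Written as formulas, the two definitions in the list above look roughly like this. The symbols are notation introduced here for illustration: $x_{ij}$ is the value of variable $j$ in sample $i$, $\bar{x}_j$ is the mean of variable $j$, and $s_j$ is its standard deviation.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}
```

For example, a variable with values 90, 100 and 110 has mean 100 and sample standard deviation 10, so its z-scores are -1, 0 and 1, and its 0-1 normalized values are 0, 0.5 and 1.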

Code

R implementation

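    # Note: the input x must be a numeric matrix or an all-numeric data.frame

    # 0-1 normalization
    # By default low and high are taken over the whole input, not per column;
    # na.rm = TRUE keeps a single missing value from turning every result into NA
    scale01 <- function(x, low = min(x, na.rm = TRUE), high = max(x, na.rm = TRUE)) {
      (x - low) / (high - low)
    }

    # z-score normalization
    # The two functions below have distinct names so that the second does not
    # overwrite the first and neither masks base::scale

    # Normalize each column
    scale_cols <- function(x) {
      mu <- colMeans(x, na.rm = TRUE)        # mean of each column
      x <- sweep(x, 2, mu)                   # subtract the column means
      sx <- apply(x, 2, sd, na.rm = TRUE)    # standard deviation of each column
      x <- sweep(x, 2, sx, "/")              # divide by the column standard deviations
      return(x)
    }

    # Normalize each row
    scale_rows <- function(x) {
      mu <- rowMeans(x, na.rm = TRUE)        # mean of each row
      x <- sweep(x, 1, mu)                   # subtract the row means
      sx <- apply(x, 1, sd, na.rm = TRUE)    # standard deviation of each row
      x <- sweep(x, 1, sx, "/")              # divide by the row standard deviations
      return(x)
    }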
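
As a quick sanity check, the following sketch applies both normalizations column by column to a small made-up matrix and compares the z-score result with R's built-in `base::scale()`. The matrix `m` and the names `z` and `r` are illustrative assumptions, not part of the original script, and the block is self-contained so it does not depend on the functions defined above.

```r
# Made-up example data: two variables on very different scales
m <- cbind(a = c(9800, 10000, 10200), b = c(90, 100, 110))

# Column-wise z-score: subtract column means, then divide by column SDs
# (sd() uses the sample standard deviation, with an n - 1 denominator)
z <- sweep(sweep(m, 2, colMeans(m)), 2, apply(m, 2, sd), "/")

# base::scale() does the same column-wise centering and scaling
stopifnot(isTRUE(all.equal(z, base::scale(m), check.attributes = FALSE)))

# Column-wise 0-1 normalization: each column is mapped onto [0, 1]
r <- apply(m, 2, function(v) (v - min(v)) / (max(v) - min(v)))

print(z)
print(r)
```

After normalization both columns carry the same values, even though the raw variables differ by two orders of magnitude, which is the effect described in the motivation section.
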
Original code

https://github.com/DavidQuigley/WCDT_WGBS/blob/master/scripts/2019_05_15_WGBS_figure_1B.R