Codes in paper

Significance test of PPI (protein–protein interaction) interface mutations.

Introduction

  Technological and computational advances in genomics and interactomics have made it possible to identify how disease mutations perturb protein–protein interaction (PPI) networks within human cells.
  A PPI with significant enrichment of interface mutations in one or the other of its two protein-binding partners across individuals is defined as an oncoPPI. For each gene gi and its PPI interfaces, the authors assume that the observed number of mutations for a given interface follows a binomial distribution, binomial(T, Pgi), in which T is the total number of mutations observed in the gene and Pgi is the estimated mutation rate for the region of interest under the null hypothesis that the region is not recurrently mutated. Using length(gi) to represent the length of the protein product of gene gi, for each interface they computed the P value, that is, the probability of observing k or more mutations around this interface out of T total mutations observed in this gene, using the following equation:
  \[ P(X \ge k) = 1 - \sum_{x=0}^{k-1} \binom{T}{x} P_{g_i}^{x} \left(1 - P_{g_i}\right)^{T-x} \]
  in which Pgi = length of interface / length(gi).
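As a quick numerical check of the tail probability described above, the following R sketch computes it three ways: as an explicit sum, with `pbinom`, and with `binom.test`. All numbers (40 mutations in the gene, 6 of them at an interface covering 25 of 500 residues) are made up for illustration and do not come from the paper.

```r
# Hypothetical numbers, chosen only for illustration
T_total <- 40          # T: total mutations observed in the gene
k <- 6                 # mutations that fall in the interface
p_gi <- 25 / 500       # Pgi: interface length / protein length

# P(X >= k) written out as 1 - sum_{x=0}^{k-1} of the binomial probabilities
p_sum <- 1 - sum(dbinom(0:(k - 1), size = T_total, prob = p_gi))

# The same upper tail from the cumulative distribution function
p_tail <- pbinom(k - 1, size = T_total, prob = p_gi, lower.tail = FALSE)

# The same value as reported by the one-sided binomial test
p_test <- binom.test(x = k, n = T_total, p = p_gi,
                     alternative = "greater")$p.value

print(c(sum = p_sum, pbinom = p_tail, binom.test = p_test))
```

The three values should agree up to floating-point error, which is why the code later in this post can rely on `binom.test(..., alternative="greater")` alone.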
  Finally, the authors set the minimal P value across all the interfaces in a specific protein as the representative P value of its coding gene gi, denoted P(gi). The significance of each PPI is defined as the product of the P values of its two proteins (gene products). All P values were adjusted for multiple testing using the Bonferroni correction.
  The authors found that somatic missense mutations are significantly enriched in PPI interfaces compared to noninterfaces in 10,861 tumor exomes.
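The aggregation steps in this paragraph can be sketched in a few lines of R. The protein names and P values below are invented, and the set of P values that enters the Bonferroni correction is an assumption made for the example; the text above does not spell it out.

```r
# Hypothetical per-interface P values for two interacting proteins
interface_p <- list(
  proteinA = c(0.20, 0.003, 0.50),
  proteinB = c(0.04, 0.70)
)

# Representative P value per gene: the minimum across its interfaces
gene_p <- sapply(interface_p, min)

# Significance of the PPI: product of the two partners' P values
ppi_p <- prod(gene_p[c("proteinA", "proteinB")])

# Bonferroni adjustment over a (made-up) set of three PPIs;
# p.adjust multiplies by the number of tests and caps the result at 1
all_ppi_p <- c(AB = ppi_p, CD = 0.01, EF = 0.6)
adjusted <- p.adjust(all_ppi_p, method = "bonferroni")

print(gene_p)
print(adjusted)
```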

Code explanation

Read required files

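# read protein lengths (column 1: UniProt accession, column 2: protein length)
uniprot_length <- read.table("UniProt_len.txt",colClasses=c("character","numeric"))

# read interface sites (column 1: UniProt accession, column 2: comma-separated sites)
psites <- list()
data <- read.table('PPI_sites.txt',sep="\t",colClasses="character")
for(i in seq_len(nrow(data))){
    psites[[data[i,1]]] <- unlist(strsplit(data[i,2],","))  # interface sites of this protein
}

# cancer types are the file-name prefixes (the text before the first "_")
cancer_types <- sapply(list.files('HQinterfaces_mutations/'),function(x) unlist(strsplit(x,split="_"))[1])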
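To make the expected input layout concrete, here is a small self-contained sketch that parses two made-up rows the same way. The accessions `P00001` and `P00002`, the site lists, and the example file name are hypothetical, and the two-column layout is inferred from how the code above indexes `data`.

```r
# Two made-up rows in the layout assumed for PPI_sites.txt:
# UniProt accession <TAB> comma-separated interface sites
toy <- read.table(text = "P00001\t10,11,15\nP00002\t7,8",
                  sep = "\t", colClasses = "character")

# Same result as the loop above, built in one step as a named list
toy_sites <- setNames(strsplit(toy[, 2], ","), toy[, 1])

# Number of interface sites per protein (used later as bind_len)
print(lengths(toy_sites))

# The cancer type is the part of the file name before the first "_"
toy_cancer <- sub("_.*$", "", "BRCA_HQinterfaces_mutations.txt")
print(toy_cancer)
```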

For each protein, run the binomial test

for(cancer in cancer_types){
    Mutations <- read.table(paste0('/data/Data/TCGA_Somatic_Mutations/Mutect2/TCGA.', cancer, '.mutect.NonSilent.maf'),
        skip=1,header=TRUE,colClasses="character",sep="\t") # total mutation
    PPI_mutations <- read.table(paste0('HQinterfaces_mutations/', cancer, '_HQinterfaces_mutations.txt'),colClasses="character")  # PPI mutation

    all_uniprot <- unique(PPI_mutations[,3])
    uniprot_gene_mapping <- rep(NA,length(all_uniprot))
    names(uniprot_gene_mapping) <- all_uniprot
    uniprot_pvalues <- rep(1,length(all_uniprot))
    names(uniprot_pvalues) <- all_uniprot
    for(uniprot in all_uniprot){
        uniprot_gene_mapping[uniprot] <- PPI_mutations[match(uniprot,PPI_mutations[,3]),2]
        # column 68 of the MAF is used here as the UniProt accession
        mutation_number <- sum(Mutations[,68]==uniprot,na.rm=TRUE)  # T: all mutations in this protein
        PPI_mutation_number <- sum(PPI_mutations[,3]==uniprot)  # k: mutations at PPI interfaces
        gene_len <- uniprot_length[match(uniprot,uniprot_length[,1]),2]
        bind_len <- length(psites[[uniprot]])  # number of interface sites
        if(PPI_mutation_number > 0 && mutation_number > 0){
            # one-sided binomial test with expected rate bind_len/gene_len
            p <- binom.test(x=PPI_mutation_number,n=mutation_number,p=bind_len/gene_len,alternative="greater")$p.value
            uniprot_pvalues[uniprot] <- p
        }
    }
    write.table(cbind(uniprot_gene_mapping,uniprot_pvalues),
                file=paste("./uniprot_pvalues/",cancer,"_uniprot_pvalues.txt",sep=""),
                col.names=FALSE,
                sep="\t",
                quote=FALSE)
}
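The loop above does not explicitly handle proteins with a missing length, with no annotated interface sites, or with more interface mutations than total mutations. One possible way to guard against these cases is sketched below. The helper `interface_pvalue` is hypothetical and not part of the original repository, and returning 1 for such inputs is a design choice of this sketch, not the authors' method.

```r
# Hypothetical helper (not in the original code): a guarded version of
# the per-protein one-sided binomial test
interface_pvalue <- function(n_interface, n_total, interface_len, protein_len) {
  inputs <- c(n_interface, n_total, interface_len, protein_len)
  # Missing inputs: fall back to the neutral P value 1
  if (anyNA(inputs)) return(1)
  # Nothing to test without mutations or without interface sites
  if (n_interface <= 0 || n_total <= 0 || interface_len <= 0) return(1)
  # Inconsistent inputs: interface longer than protein, or k larger than T
  if (interface_len > protein_len || n_interface > n_total) return(1)
  binom.test(x = n_interface, n = n_total,
             p = interface_len / protein_len,
             alternative = "greater")$p.value
}

print(interface_pvalue(6, 40, 25, 500))   # regular case
print(interface_pvalue(3, 10, 0, 500))    # no interface sites annotated
print(interface_pvalue(3, 10, 25, NA))    # protein length missing
```

Whether such proteins should receive a P value of 1 or be dropped entirely depends on the analysis; the point here is only that the edge cases are made visible.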

# Pan-cancer: pool the mutations of all cancer types
cancer <- cancer_types[1]
Mutations <- read.table(paste0('/data/Data/TCGA_Somatic_Mutations/Mutect2/TCGA.', cancer, '.mutect.NonSilent.maf'),
        skip=1,header=TRUE,colClasses="character",sep="\t")
PPI_mutations <- read.table(paste0('HQinterfaces_mutations/', cancer, '_HQinterfaces_mutations.txt'),colClasses="character")
# append the remaining cancer types (no hard-coded count)
for(cancer in cancer_types[-1]){
    Mutations <- rbind(Mutations,
                       read.table(paste0('/data/Data/TCGA_Somatic_Mutations/Mutect2/TCGA.', cancer, '.mutect.NonSilent.maf'),
                       skip=1,header=TRUE,colClasses="character",sep="\t"))
    PPI_mutations <- rbind(PPI_mutations,
                           read.table(paste0('HQinterfaces_mutations/', cancer, '_HQinterfaces_mutations.txt'),
                            colClasses="character"))
}

all_uniprot <- unique(PPI_mutations[,3])
uniprot_gene_mapping <- rep(NA,length(all_uniprot))
names(uniprot_gene_mapping) <- all_uniprot
uniprot_pvalues <- rep(1,length(all_uniprot))
names(uniprot_pvalues) <- all_uniprot
# for each protein, calculate pvalue of PPI mutation
for(uniprot in all_uniprot){
    uniprot_gene_mapping[uniprot] <- PPI_mutations[match(uniprot,PPI_mutations[,3]),2]
    # column 68 of the MAF is used here as the UniProt accession
    mutation_number <- sum(Mutations[,68]==uniprot,na.rm=TRUE)  # T: all mutations in this protein
    PPI_mutation_number <- sum(PPI_mutations[,3]==uniprot)  # k: mutations at PPI interfaces
    gene_len <- uniprot_length[match(uniprot,uniprot_length[,1]),2]
    bind_len <- length(psites[[uniprot]])  # number of interface sites
    if(PPI_mutation_number > 0 && mutation_number > 0){
        # one-sided binomial test with expected rate bind_len/gene_len
        p <- binom.test(x=PPI_mutation_number,n=mutation_number,p=bind_len/gene_len,alternative="greater")$p.value
        uniprot_pvalues[uniprot] <- p
    }
}
write.table(cbind(uniprot_gene_mapping,uniprot_pvalues),
            file="./uniprot_pvalues/Pan-cancer_uniprot_pvalues.txt",
            col.names=FALSE,
            sep="\t",
            quote=FALSE)
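A result file in this layout (row name, gene, P value, no header) can be read back and ranked as sketched below. The example writes a toy file to a temporary location instead of using real output; the accessions, gene names, P values, and the 0.05 cutoff are all assumptions made for illustration.

```r
# Toy results written in the same layout as the files produced above
gene_map <- c(P00001 = "GENE1", P00002 = "GENE2", P00003 = "GENE3")  # made-up IDs
pvals <- c(P00001 = 1e-6, P00002 = 0.02, P00003 = 1)                 # made-up P values
out_file <- tempfile(fileext = ".txt")
write.table(cbind(gene_map, pvals), file = out_file,
            col.names = FALSE, sep = "\t", quote = FALSE)

# Read the three columns back: UniProt accession, gene symbol, P value
res <- read.table(out_file, sep = "\t",
                  col.names = c("uniprot", "gene", "p_value"),
                  colClasses = c("character", "character", "numeric"))

# Bonferroni adjustment over all tested proteins, then rank by P value
res$p_bonferroni <- p.adjust(res$p_value, method = "bonferroni")
res <- res[order(res$p_value), ]

# Genes passing an assumed cutoff of 0.05 after adjustment
significant <- res$gene[res$p_bonferroni < 0.05]
print(res)
print(significant)
```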

Interpretation of results

The authors inspected 1,750,987 missense somatic mutations from 10,861 tumor exomes across 33 cancer types from The Cancer Genome Atlas (TCGA) in the interface regions of 121,575 PPIs. They found a significantly higher somatic mutation burden at PPI interfaces than at noninterfaces across all 33 cancer types.

References

[1] Cheng F, Zhao J, Wang Y, et al. Comprehensive characterization of protein–protein interactions perturbed by disease mutations[J]. Nature Genetics, 2021, 53(3):1-12.

Original

https://github.com/ChengF-Lab/oncoPPIs
