Reproduce and present the results of one of two papers. Suggest further analysis.
Bioinformatics Engineer Interview Questions
631 bioinformatics engineer interview questions shared by candidates
Nothing particularly difficult, given that you know some algorithms.
How would you find the homologous sequences to a given sequence?
Describe your previous experience, etc.
Questions about my work, etc.
Can you take me through one of the research projects that you have conducted?
A strange disease has spread across the land. Many people are affected in a way that is not yet understood: when they are in daylight, odd-looking marks that resemble burned tissue appear on their skin. A drug company trying to understand the mechanism of this disease sent data over to us. They took normal skin biopsies from healthy individuals and lesion skin biopsies from diseased individuals, and performed whole-genome RNA-seq profiling in order to identify and understand the disease at the gene expression level.

Analysis workflow
1. Load the data into R and make sure the count and annotation data are consistent with each other.
2. Filter the count data for lowly expressed genes, using the strategy of your choice. For example: only keep genes with a CPM >= 1 in at least 75% of samples, in at least one of the groups.
3. Assign the library-size-normalized log-CPM data to an object of a suitable data structure/class. Save it as a binary file (.rda or .rds).
4. Generate basic plots of your choice to investigate the main properties of the data, and comment on them (library sizes, expression distribution densities per sample, PCA colored per group, etc.).
5. Based on the previous plots, look for outlier or mislabeled samples in this dataset. Try to identify them and remove them from the downstream analysis.
6. Run a differential expression analysis to find genes whose expression differs between lesion and normal samples. This can be done, according to your preference, either on the count data or on the normalized log-CPM data, using an appropriate statistical method.
7. Generate a volcano plot (x-axis: effect size; y-axis: p-value) for this analysis. The 100 most significant genes should be colored.
8. Rewrite step 6 by wrapping it in a single function that you implement and document. Arguments: the expression data, the sample annotations, and the name of the group variable. Return value: a data.frame of differential expression statistics.
(bonus) Write a function that identifies the outlier(s) based on the expression data and group variable only.

Pointers
Installing Bioconductor
For a quick introduction to RNA-seq data in limma: limma user guide - Section 15
Differential expression analysis: with limma (limma user guide - Section 16), with DESeq2
ExpressionSet class: video introduction, class description
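
The example filtering rule in the workflow (CPM >= 1 in at least 75% of samples, in at least one of the groups) and the log-CPM transform can be sketched as follows. The task asks for R; this is only a minimal, language-neutral illustration of the arithmetic in plain Python. The function names, the toy counts, and the pseudocount of 0.5 are assumptions made for illustration, not part of the original exercise, and packages such as edgeR or limma apply their own offsets and normalization factors.

```python
import math

def library_sizes(counts):
    """Total counts per sample; counts maps gene -> list of per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]

def cpm(counts):
    """Counts per million, using each sample's own library size."""
    libs = library_sizes(counts)
    return {gene: [1e6 * c / libs[j] for j, c in enumerate(row)]
            for gene, row in counts.items()}

def filter_low_expression(counts, groups, min_cpm=1.0, min_fraction=0.75):
    """Keep a gene if, in at least one group, at least min_fraction of that
    group's samples have CPM >= min_cpm."""
    cpm_values = cpm(counts)
    kept = {}
    for gene, row in cpm_values.items():
        for level in set(groups):
            idx = [j for j, g in enumerate(groups) if g == level]
            n_pass = sum(1 for j in idx if row[j] >= min_cpm)
            if n_pass >= min_fraction * len(idx):
                kept[gene] = counts[gene]
                break
    return kept

def log_cpm(counts, libs, pseudocount=0.5):
    """log2 CPM with a small pseudocount (assumed value) to avoid log(0)."""
    return {gene: [math.log2((c + pseudocount) / (libs[j] + 1.0) * 1e6)
                   for j, c in enumerate(row)]
            for gene, row in counts.items()}

# Toy data (hypothetical): four samples, each with a library size of 1,000,000.
counts = {
    "GENE_A": [500, 600, 550, 650],
    "GENE_B": [0, 1, 0, 0],
    "GENE_C": [0, 0, 300, 400],
    "FILLER": [999500, 999399, 999150, 998950],
}
groups = ["normal", "normal", "lesion", "lesion"]

libs = library_sizes(counts)           # computed before filtering
kept = filter_low_expression(counts, groups)
print(sorted(kept))                    # ['FILLER', 'GENE_A', 'GENE_C']
normalized = log_cpm(kept, libs)
```

GENE_B is dropped because no group reaches the 75% threshold, while GENE_C is kept because it passes in every lesion sample even though it is absent from the normal samples. Library sizes are taken from the unfiltered counts here, which is one common choice rather than the only one.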
How would you analyze human RNA-seq data?
Where do you see yourself in the next five years?
Standard HR questions, such as where you see yourself in the next 5 years and questions about your experience.