Analyze your own gene set

  1. Upload your query gene set: Copy & Paste Gene Set, OR Upload Data File.
    List your genes by their canonical gene symbols in a single column, separated by one of the following characters: newline (\n), comma (,), or semicolon (;).
    See the example gene lists in the Copy/Paste Gene Set section for the expected format.

  2. Select the MeSH category that you want to work with:
    The options are Diseases (C) or Psychiatry/Psychology (F). The default is category C.

  3. Choose how to compute significance of results:
    • Hypergeometric Test: (immediate, but imperfect p-values)
      The hypergeometric test calculates p-values from the hypergeometric distribution, using the taxonomy-based pooling described in the PLoS Computational Biology paper. This option delivers results quickly, but the p-values may be based on incorrect assumptions.
    • Permutation Test: (get results by email in a few minutes)
      The permutation test calculates p-values from a distribution learned empirically by scrambling the labels of genes (i.e., which genes are in your query set). To run this test, you must provide two additional pieces of information:
      1. The number of samples for the permutation test (used to calculate the p-value): the default is 10,000.
      2. Your email address: this is required because the permutation test can take a long time to run, depending on the size of the query gene set and the number of permutations. For example, one of our analyses of a query gene set of 150 genes with 10,000 permutations took about 10 minutes. Consider this when estimating your waiting time, but note that it can also be affected by many other factors, such as the load on our server. You will receive an email with links to both the visualization of your results and a tab-delimited file of the results. Visualizations are kept on our server for two weeks.

  4. Submit the job by clicking the "start analysis" button. The output file is tab-delimited and contains 5 columns:
    • Column 1:    MeSH index
    • Column 2:    Disease Name (MeSH Descriptor)
    • Column 3:    p-value
    • Column 4:    Number of query genes associated with the corresponding disease
    • Column 5:    List of query genes associated with the corresponding disease
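
If you prepare the query gene set or read the output file with a script, the input format from step 1 and the five output columns from step 4 could be handled roughly as sketched below. This is only an illustration under stated assumptions, not part of the tool: the function names parse_gene_list and read_results are hypothetical, removing duplicate symbols is our own choice, and the separator used inside column 5 (a comma here) and the absence of a header line are guesses that you should check against your actual results file.

```python
import csv
import io
import re
from typing import List, NamedTuple


class ResultRow(NamedTuple):
    mesh_index: str         # column 1
    disease_name: str       # column 2 (MeSH descriptor)
    p_value: float          # column 3
    n_query_genes: int      # column 4
    query_genes: List[str]  # column 5


def parse_gene_list(text: str) -> List[str]:
    """Split a gene list on newline, comma, or semicolon (the accepted separators)."""
    genes: List[str] = []
    seen = set()
    for token in re.split(r"[\n,;]", text):
        symbol = token.strip()  # also removes a stray '\r' from Windows line endings
        # Assumption: blank entries and repeated symbols are dropped.
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


def read_results(text: str, gene_sep: str = ",") -> List[ResultRow]:
    """Parse the tab-delimited output into rows with the five documented columns.

    Assumption: genes in column 5 are separated by `gene_sep`.
    """
    rows: List[ResultRow] = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        try:
            p_value = float(fields[2])
            count = int(fields[3])
        except ValueError:
            continue  # skip lines whose numeric columns do not parse, e.g. a header
        genes = [g.strip() for g in fields[4].split(gene_sep) if g.strip()]
        rows.append(ResultRow(fields[0], fields[1], p_value, count, genes))
    return rows


if __name__ == "__main__":
    # Mixed separators are normalized to one symbol per line for upload.
    print("\n".join(parse_gene_list("BRCA1, BRCA2;TP53\nBRCA1\n")))

    # Invented placeholder row, only to show the column layout.
    sample = "ID_1\tExample disease\t0.0012\t2\tBRCA1,TP53\n"
    for row in read_results(sample):
        print(row.disease_name, row.p_value, row.query_genes)
```

Keeping the parsed rows as named fields makes it easy to filter or sort them afterwards, for example by p_value.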

Data updated on 2016/02/03

