Analyze your own gene set

  1. Upload your query gene set: Copy & Paste Gene Set, OR Upload Data File.
    List your genes using canonical gene symbols in a single column, separated by one of the following characters: newline (\n), comma (,), or semicolon (;).
    See the example gene lists in the Copy & Paste Gene Set section to learn more about the format.

  2. Select the MeSH category that you want to work with:
    The options are Diseases (C) or Psychiatry & Psychology (F). The default is category C.

  3. Choose how to compute significance of results:
    • Hypergeometric Test: (immediate, but imperfect p-values)
      The hypergeometric test calculates p-values from the hypergeometric distribution, using the taxonomy-based pooling described in the PLoS Computational Biology paper. This option delivers results quickly, but the p-values may be based on incorrect assumptions.
    • Permutation Test: (get results by email in a few minutes)
      The permutation test calculates p-values from a distribution learned empirically by scrambling the labels of genes (i.e., which genes are in your query set). To run this test, you must provide two additional pieces of information:
      1. The number of samples for the permutation test (used to calculate the p-value): default is 10,000.
      2. Your email address: this is required because the permutation test can run for a long time, depending on the size of the query gene set and the number of permutations. For example, one of our analyses of a 150-gene query set with 10,000 permutations took about 10 minutes. Consider this when estimating your waiting time, but note that it is also affected by other factors, such as the load on our server. You will receive an email with links to both the visualization of your results and a tab-delimited file of the results. Visualizations are kept on our server for two weeks.

  4. Submit the job by clicking the "start analysis" button. The output file is tab-delimited and contains 5 columns:
    • Column 1:    MeSH index
    • Column 2:    Disease Name (MeSH Descriptor)
    • Column 3:    p-value
    • Column 4:    Number of query genes associated with the corresponding disease
    • Column 5:    List of query genes associated with the corresponding disease
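
For downstream filtering, the five-column results file can be read with Python's standard csv module. This is only a minimal sketch: the field names, the handling of a possible header row, and the sample rows are assumptions made for illustration, and column 5 is kept as raw text because the separator used inside it is not documented here.

```python
import csv
import io
from typing import NamedTuple


class ResultRow(NamedTuple):
    # Field names are hypothetical; they mirror the column descriptions above.
    mesh_index: str
    disease_name: str
    p_value: float
    n_query_genes: int
    query_genes: str  # raw text; the inner separator is not documented


def read_results(handle):
    """Parse tab-delimited result rows, skipping rows that do not fit the 5-column layout."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 5:
            continue
        try:
            rows.append(
                ResultRow(fields[0], fields[1], float(fields[2]), int(fields[3]), fields[4])
            )
        except ValueError:
            continue  # e.g. a header row, if the file has one
    return rows


# Made-up rows for illustration only; not real output from the server.
sample = (
    "C00.000\tExample Disease A\t0.0004\t3\tGENE1,GENE2,GENE3\n"
    "C00.001\tExample Disease B\t0.2\t1\tGENE4\n"
)
rows = read_results(io.StringIO(sample))
significant = [r for r in rows if r.p_value < 0.05]
```

To read a downloaded file instead of the inline sample, pass an open file handle (opened with newline="") to read_results.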

Copy & Paste Gene Set
> Official gene symbols only.
   Other identifiers will not cause an error,
   but will be ignored in analysis.


> Format:
   Genes should be separated by one of the
   following delimiters: { \n, comma, semicolon }.


> Example Gene List
> Clear List

Upload Data File

> Only one file can be uploaded at a time and the file name cannot contain spaces.
> Genes should be identified by the official gene symbol.
> Gene symbols in the file should be separated by one of the following delimiters:
    { \n, comma, semicolon }.
> There should not be any spaces between gene symbols and delimiters.
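
The formatting rules above can be applied to a gene list before uploading. The following is a minimal sketch, not part of the tool: the function names are hypothetical, and dropping duplicate symbols and replacing spaces in the file name with underscores are choices made here for illustration.

```python
import re


def normalize_gene_list(raw: str) -> list[str]:
    """Split on newline, comma, or semicolon; trim whitespace; drop empties and duplicates."""
    seen = set()
    genes = []
    for token in re.split(r"[\n,;]", raw):
        symbol = token.strip()  # also removes a stray "\r" from Windows line endings
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


def safe_upload_name(name: str) -> str:
    """Replace runs of whitespace in a file name, since names with spaces are not accepted."""
    return re.sub(r"\s+", "_", name.strip())


raw = "BRCA1, BRCA2;TP53\n\nTP53 ; ATM\r\n"
genes = normalize_gene_list(raw)
upload_text = "\n".join(genes)  # one symbol per line, no spaces around delimiters
file_name = safe_upload_name("my genes.txt")
```

The case of each symbol is left untouched, since some official symbols contain lowercase letters.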

Input MeSH Category
The MeSH category that you want to work with (default is Diseases):

   Diseases (C)            Psychiatry & Psychology (F)

Options for Analysis
Select a type of analysis that you want to run:

Hypergeometric Test

Permutation Test

        1) Number of samples for the permutation test (default = 10,000):  

        2) Email address (to send you the analysis results)*:

        > We ask for your email address if you select the permutation test,
           because the permutation analysis takes a few minutes; we will email you a link to your results when they are available.
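
To illustrate the difference between the two options, the sketch below computes an enrichment p-value for a single disease term both ways. It is a simplified illustration under stated assumptions, not the server's implementation: it omits the taxonomy-based pooling, the test statistic (overlap size), the (hits + 1) / (samples + 1) estimator, and the toy gene names are all assumptions chosen for this example.

```python
import random
from math import comb


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K are annotated to the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def permutation_pvalue(background, term_genes, query, samples=10_000, seed=0):
    """Empirical p-value: how often a random gene set of the same size overlaps
    the term at least as much as the query does."""
    rng = random.Random(seed)
    term = set(term_genes)
    query_set = set(query)
    observed = len(term & query_set)
    hits = 0
    for _ in range(samples):
        drawn = rng.sample(background, len(query_set))
        if sum(g in term for g in drawn) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate above zero (an assumed convention).
    return (hits + 1) / (samples + 1)


# Toy data: 200 background genes, 20 annotated to the term, 15 query genes, 5 shared.
background = [f"G{i}" for i in range(200)]
term_genes = background[:20]
query = background[:5] + background[100:110]

p_hyper = hypergeom_pvalue(N=200, K=20, n=15, k=5)
p_perm = permutation_pvalue(background, term_genes, query, samples=2000)
```

In this toy setting the two estimates should be close, because random draws from a flat background match the hypergeometric model exactly. They can diverge once terms are pooled along a taxonomy, which is the reason the permutation test is offered.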




Data updated on 2017/04/23

Contact us for any questions or suggestions: gda@cs.tufts.edu