| All Products | CloneTracker | GeneSight | ImaGene | Licensing |
Can I import Excel files into BioDiscovery software?
How can I access files on a network?
How do I register my software?
We are having problems running the software on some user accounts. What is wrong?
When starting the software, why does a little window appear, but then software does not start?
Why does my BioDiscovery product give me a different Code Entry number every time I start the License Manager?
Why does my software display Kernel Error when starting BioDiscovery software?
Can I use my current database with CloneTracker®?
What do the Advanced Features mean in the CloneTracker® Database menu?
What is a Field in CloneTracker 2.0?
What is MySQL? Why is this included with CloneTracker 2?
What kind of information is produced by CloneTracker®?
What plates are supported by CloneTracker®?
Where does CloneTracker 1.x store my plate and slide information?
Why can only the user who installed CloneTracker run the software?
Why do I get a "The user cancelled the import" error after pressing OK whenever I try to import a plate description?
Why do I get an Invalid Format error when importing settings from VIRTEK's ChipWriterPro?
Why do I get a duplicate plate error whenever I try to import a plate description, even though there is no duplicate in the database?
Can GeneSight cluster the data based on gene expression patterns?
Can GeneSight compute p-values (significance) for the genes across multiple experimental conditions?
Can GeneSight help to predict an unknown class of compound or drug?
Can GeneSight import quantifications generated from programs other than ImaGene? Can GeneSight-Lite do the same?
Can I define groups or partition my data loaded into GeneSight based on either the genes ids or experimental conditions?
Can I query across the dataset for filtering out the genes of interest based on their expression values, gene names, etc.?
Can I save my desired transformation sequence for future analyses?
Can I use my current image analysis software with GeneSight?
Can we filter out those genes with too high or low values?
Considering that cluster analysis tools are non-deterministic based on their random initialization, how do I know that the genes falling into my clusters do actually belong to these clusters?
Do K-Means and Hierarchical Clustering produce different image/result each time with the same data set?
Does GeneSight offer any adjustment for p-values obtained? Why might it be required?
Does GeneSight offer data distribution, such as a histogram? How do I draw a histogram?
Does GeneSight offer supervised data mining algorithms or predictive modeling?
Does GeneSight offer unsupervised data mining algorithms?
How can I download the chromosome information from the NCBI site?
How can I draw a chromosome map for my experimental organism? What does it mean?
How can I generate a report on transformed data?
How do I build a data transformation sequence?
How do I calculate ratio of ratios?
How do I calculate the ratios of experiment over control?
How do I carry out an analysis using only the selected genes?
How do I carry out Cluster Enrichment Analysis?
How do I combine replicate arrays and upload all of them together?
How do I conduct a t-test for my data?
How do I conduct an ANOVA test for my data?
How do I create a dataset in GeneSight?
How do I determine which clusters are predominant with certain functional classes?
How do I generate a report containing my analyses?
How do I get a sequence of the columns in the image to be identical to the sequence entered in Dataset Builder?
How do I import non-ImaGene generated quantification data?
How do I know that these genes are actually differentially regulated on a statistical level?
How do I know what the correct number of clusters for my data should be?
How do I plot out the distribution of my data from several different experimental conditions?
How do I save a picture of a clustering result?
How do I trace a gene as to which cluster or group it belongs to?
How do I upload my Affymetrix text data?
How do I upload my GenePix data?
How do I upload my two channel experiment data, such as ImaGene generated?
How much data can be analyzed by GeneSight?
How to determine differentially regulated genes (up- and down-regulated) in the dataset?
I always put my control condition on the x-axis of scatter plot, how do I do it?
I did a dye-swap experiment, how do I combine quantifications from different hybridizations to one dataset?
I have a list of housekeeping genes, how do I normalize my data against them?
I heard Lowess Normalization is getting popular, what is it? Does GeneSight have Lowess normalization?
I prefer to calculate ratios and log in Excel and then upload this transformed data. Is it possible?
I want to conduct expression profiling in my time-course study without using clustering. Does GeneSight provide this feature?
I want to upload both gene ids and gene names. Is it possible?
If I want to change my transformation sequence after transforming the data, can I go back and do that without having to load the data again?
In the clustering plots, what are those horizontal and vertical number lines on the lower left and upper left of the rows and columns?
In the significance analysis test, can I filter out genes with low p-values for generating reports and further analysis?
Is it possible to combine two groups of interesting genes?
Is it possible to compute ratios and combine replicates during data upload?
Is it possible to filter out clusters of interest for further analysis or generating reports?
Is it possible to filter out common genes between two groups of interesting genes?
Is it possible to filter out genes following a particular expression pattern using time-series analysis?
Is it possible to filter out genes following the expression pattern of a "particular gene" using time-series analysis?
Is it possible to filter out genes following the expression pattern of a template defined by me?
Is it possible to generate report on only certain selected genes?
Is it possible to obtain a statistically confident group of genes?
Is it possible to study the expression data distribution of various channels or slides in one single plot including standard deviation bars and outliers, if any?
Is it possible to view the GeneSight generated reports in Excel?
Is it possible to visualize the entire data on a scatter plot using either two channels or two experimental conditions?
Is it possible to visualize the entire gene population in a 3-dimensional space color coded by different variables or conditions?
My data also contains gene annotations. Is it possible to upload them too along with the data?
My experiment is to determine whether the genes in different experimental conditions (such as control and experiment) have significant differences in their expressions levels. Can GeneSight do that?
Should I use "Divide By Mean" or "Subtract Mean" in Normalization?
The box plot confuses me. What do the box, lines, and points stand for?
What are the functions of the two buttons, "Pair Data / Perform Ratio" and "Perform Ratio & Add as Replicate", in the Dataset Builder window?
What cluster linkage choices for hierarchical clustering are included in GeneSight?
What data transformations are offered by GeneSight?
What does Confidence Analyzer do?
What does the chromosome map mean?
What file format is required by GeneSight to import?
What is cluster confidence and how do I carry it out?
What is Cluster Enrichment Analysis?
What is confidence analysis?
What is special about Subselect?
What is Template Matcher, how is it used in Time Series?
What is the function of GenePie?
What is the purpose of make partitions?
What kinds of clustering methods are provided in GeneSight?
What normalizations are offered by GeneSight?
What statistical tests are offered by GeneSight for obtaining the p-values?
What statistical tests are offered on top of data mining tools embedded in GeneSight?
What types of analyses does GeneSight provide?
Which distance metrics for cluster analyses are offered by GeneSight?
Which transformations should I use for my data?
Why do I need to do data transformation?
Why do the values keep changing upon re-application of the cluster?
Why is Normalization so important?
Can ImaGene support data from other image analysis software (like GenePix)?
Flexibility in Spot Finding
How do I auto-center and fit-to-page once I have loaded my image?
How many channels, or images, can ImaGene process each time?
How many levels of undo/redo does ImaGene support?
I am quantifying multiple images in ImaGene, but only ending up with one data file when I am finished. What am I doing wrong?
I have moved my .sst file or images, and now I cannot open the .sst file. What is wrong?
Image Processing Procedure
Min and Max Spot Diameters
Spot Finding
Spot Finding Circle Placement
What do the numeric flags mean in my ImaGene data?
What does the Wrangle button along the ImaGene toolbar do?
What image formats are supported by ImaGene?
What is a negative spot? How is it different from an empty spot?
What is an Automation Module? What is Batch Processing?
What is the .sst file that is generated when I process data in ImaGene? Can I delete it? Why is it so large?
What is the difference between a template and a grid?
What is the difference between ImaGene® and ImaGene-Lite®?
What is the difference between ImaGene Standard and Premium?
What is the difference between the green and red Xs and pluses?
What is the purpose of the red-purple-blue circle colors that appear after quantification?
What templates are available for ImaGene?
What do the green and red colors within the preview window mean?
Why are the background intensity values higher than my signal intensity values?
Why do I get a Cannot open license error when I open ImaGene?
Why do I get the message: You do not have priviledges to run the Batch Editor. Please contact your network administrator?
Why does ImaGene stop in the middle of quantification?
Why is the segmentation tab not visible after quantification and when reviewing the results?
After installing the License Manager and Configuring the License Wizard, I get “License Checkout Failed”
Can I change the date of my Demo license
Can I download the License Manager from the web
Can I run the License Manager on a machine that doesn’t run Windows
Do I need to install a License Manager in order to run any of the products with a Demo license
Do I need to remove the Demo license before installing a normal one
How do I configure my software to communicate with the License Manager Server
How do I configure the License Manager
How do I get additional License Manager information
How do I install a Demo license
How do I install a normal product license
How do I install the FLEXlm License Manager
How do I migrate my License Manager from one server to another
I get “License Checkout Failed” when trying to start the application
Troubleshoot error: “Input file: local.lic cannot be found!”
What is the difference between a Demo license and a Normal product license
What is the difference between a NodeLocked and a Floating license
What is the difference between the local.lic and license.lic license files
When do I need to install and configure the License Manager
Q. Can I import Excel files into BioDiscovery software?
A. BioDiscovery does not support the use of proprietary file formats. Rather, BioDiscovery has adopted the use of tab delimited text files. Microsoft Excel files can easily be saved as tab delimited text files as outlined below.
To save an Excel Doc as Tab Delimited Text:
1. With the Excel document open, choose Save as... from the File Menu.
2. Select Text (Tab Delimited) from the Save as Type Option at the bottom of the Save as Window.
3. Specify a File name and location.
4. Click Save.
5. Excel will warn that certain elements cannot be saved in this format. Simply accept any warnings to complete the process.
For additional information on the required text format, please see the documentation accompanying the software.
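When the spreadsheet data is produced by a script rather than saved from Excel by hand, the same tab-delimited layout can be written directly. The following is a minimal Python sketch; the column names and values are made-up placeholders, and the exact columns each product expects are described in its documentation.

```python
import csv
import io

def to_tab_delimited(rows):
    """Serialize rows (lists of cell values) as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical example data: a header row followed by two data rows.
rows = [["GeneID", "Signal"], ["g1", 1200], ["g2", 845]]
text = to_tab_delimited(rows)
print(text)
```

The resulting text can be written to a .txt file and opened in the same way as a file saved from Excel as Text (Tab Delimited).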
Q. How can I access files on a network?
A. All BioDiscovery products are written in Java. Java can only see drive letters, not network locations. Therefore, the simplest solution is to map a network location to a drive letter.
Under Windows, you can browse the network, right-click on the folder that contains the files and select Map Network Drive. Map it to a letter that is not being used by anything else (like "N" for Network or whatever you want). Then, when the BioDiscovery software asks for a file, simply select the N: drive and it will load the files across the network without transferring them locally first.
Q. How do I register my software?
A. BioDiscovery currently offers two ways to register newly purchased software:
1. via Fax - All new purchases, with the exception of additional floating licenses, will contain a Product Registration Form. Fill out this form completely and fax to BioDiscovery Corporate Headquarters.
2. via Online Form - Proceed to the online registration form and complete all required fields. Be certain to check the YES box to agree to the Software License Agreement. Register at:
Please address any additional questions or registration problems to support@biodiscovery.com.
Q. We are having problems running the software on some user accounts. What is wrong?
A. The permissions are set up correctly by our software when you install it, but only if you are using an account that has administrative privileges.
You can always un-install and re-install using an account that has administrative privileges.
Q. When starting the software, why does a little window appear, but then software does not start?
A. There are a few potential causes of this problem. When the little window appears, stays visible for about a minute, and then disappears without the software starting, the license could not be checked out. There could be several reasons for this:
1. The FLEXlm license server is not running. In this case, go to the location where FLEXlm is installed and run lmtools.exe, which can start and stop the license server. Be certain the server is started.
2. Firewall software is blocking the network ports used by the licensing software. Turn off any firewall software running on the computer.
If this still does not solve the problem:
1. Run lmtools.exe, click on the Server Status tab, and click on the Perform Status Inquiry button. This should tell you if there is anything wrong with the license server.
2. Provide the GeneSight.log file, located in the GeneSight directory, to BioDiscovery support so that we can determine if there is another problem.
Q. Why does my BioDiscovery product give me a different Code Entry number every time I start the License Manager?
A. It sounds like your GeneSight.ini, ImaGene.ini, or CloneTracker.ini has been set to "read-only" (saving the ini file as an attachment from Outlook will cause this problem). In this case, your software will generate a new "Entry Code" every time you run the License Manager, since it cannot save the code to the ini file.
To make sure your ini file is not "read-only", open your product directory (C:\GeneSight, for example), find the ini file, right-click on it, select "Properties" on the pop-up menu, and confirm that the "Read-Only" attribute is NOT checked. If you cannot see the ini file in your product directory, show "System Files" by clicking on Tools, Folder Options, View tab, and make sure "Hide protected operating system files" is NOT checked.
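The same check can also be scripted. The sketch below is only an illustration in Python, not part of any BioDiscovery product; it tests and clears the read-only flag on a file, and it uses a temporary stand-in file so that it can be run safely. To use it on a real installation, pass the path to your own ini file.

```python
import os
import stat
import tempfile

def is_read_only(path):
    """Return True if the owner write bit (the read-only flag on Windows) is cleared."""
    return not (os.stat(path).st_mode & stat.S_IWRITE)

def clear_read_only(path):
    """Turn the owner write bit back on, leaving the other permission bits unchanged."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWRITE)

# Demonstration on a temporary stand-in for an ini file.
fd, demo_path = tempfile.mkstemp(suffix=".ini")
os.close(fd)
os.chmod(demo_path, stat.S_IREAD)      # simulate a read-only ini file
before = is_read_only(demo_path)
clear_read_only(demo_path)
after = is_read_only(demo_path)
os.remove(demo_path)
print(before, after)
```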
Q. Why does my software display Kernel Error when starting BioDiscovery software?
A. BioDiscovery software depends on the Sun Microsystems Java Runtime Environment (JRE), version 1.3 or higher, and installs and modifies several files within the JRE home directory. Configuration errors or changes to the host computer configuration can cause these files to become misplaced or corrupted. Installing a different JRE on the same computer can have the same effect. In these circumstances, ImaGene cannot load integral system files, resulting in the Kernel Error message.
Possible solutions include:
1. After backing up the software license, uninstall then reinstall the software. During the reinstallation, BioDiscovery software can then adapt to any new changes to the system configuration. If possible, the Java Runtime Environment (v1.3 or higher) should also be uninstalled and reinstalled prior to reinstalling any BioDiscovery software.
2. Check the system path and/or Java Home environment variables to be certain these point to the correct JRE 1.3 directory. Use caution when changing these values as they may affect other software on the computer as well. If uncertain, please contact your network administrator or contact BioDiscovery support.
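As a quick way to inspect these settings before changing anything, the following Python sketch lists the Java-related values. It assumes the Java Home variable uses the conventional name JAVA_HOME and treats any path entry containing "java" or "jre" as Java-related; both are assumptions made for illustration, and the script only reads the values.

```python
import os

def describe_java_settings(environ=None):
    """Collect the Java-related environment settings without modifying them."""
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")  # assumed variable name
    path_entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    java_entries = [p for p in path_entries
                    if "java" in p.lower() or "jre" in p.lower()]
    return {
        "JAVA_HOME": java_home,
        "JAVA_HOME_exists": bool(java_home) and os.path.isdir(java_home),
        "java_path_entries": java_entries,
    }

# Inspect the current machine.
for key, value in describe_java_settings().items():
    print(key, "=", value)
```

If JAVA_HOME is unset, points to a missing directory, or the path lists more than one JRE, that is a hint that the configuration described above may need attention.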
Q. Can I use my current database with CloneTracker®?
A. Yes, you can! CloneTracker is based on a standardized SQL framework that is designed to integrate with your current database system.
Q. What do the Advanced Features mean in the CloneTracker® Database menu?
A. 1. Combine four 96-well plates into one 384-well plate:
Drag four 96-well plates into the left-hand column. The order of original 96-well plates in the left-hand column will determine their location on the new 384-well plate:
Q. What is a Field in CloneTracker 2.0?
A. A field is a uniform grouping of clones on a slide. Most slides contain only one field (a single meta-group divided into smaller sub-groups), in which case the field settings can be ignored. However, in some cases there may be more than a single meta-group.
For example, a slide may contain a normal grouping of clones AND a smaller group set off to the side by itself (being used for image analysis calibration). The normal grouping would be field 1, while the smaller offset group would be field 2.
Q. What is MySQL? Why is this included with CloneTracker 2?
A. MySQL is a fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. BioDiscovery has adopted this database due to its speed, flexibility and overall reliability.
Additional considerations for the adoption of MySQL included its multiplatform capability as well as its ability to act as a database server for networked environments with floating licenses.
Documentation for the MySQL database is located within the mysql directory, typically c:\mysql.
Q. What kind of information is produced by CloneTracker®?
A. CloneTracker outputs tab delimited files containing information about slide GeneIDs and plate to slide mapping tables. CloneTracker also seamlessly integrates with ImaGene to correlate CloneTracker® information with array images.
Q. What plates are supported by CloneTracker®?
A. CloneTracker version 1.4 supports plate table sizes of 96 and 384. CloneTracker version 2.0 also supports 1536 plates.
Q. Where does CloneTracker 1.x store my plate and slide information?
A. CloneTracker is designed around a central database. All information within CloneTracker is stored within this database. CloneTracker 1.4 uses a SQL Anywhere 5.0 database while CloneTracker 2.0 uses a MySQL database.
The actual data is stored in a binary format and is not normally accessible directly by users. However, users can export this information to flat text files for later examination in programs such as Microsoft Excel.
Please review the documentation included with CloneTracker for information on exporting data.
Q. Why can only the user who installed CloneTracker run the software?
A. By default, CloneTracker installs the ODBC database connection for a single user (i.e., a User DSN). For all users to have access to CloneTracker, you must create what is known as a System DSN. The steps involved are not difficult; essentially, you just need to enter new values into the proper location. The procedure is as follows (it may vary slightly depending on the OS):
1. Pull up Start menu and go to Settings and select Control Panel
2. Select ODBC Data Source
3. Click on the System DSN tab
4. Click the Add button and scroll all the way to the bottom of the window
5. Select Sybase SQL Anywhere 5.0 RunTime and click the Finish button
6. Enter CloneTracker into Data Source Name field
7. Enter dba into User ID Field and sql into Password Field
8. For Database File, you need to point to the directory where you installed CloneTracker and select the CLONERTRACKER.db file.
The user who installed CloneTracker, as well as other users, should now be able to run the program with no difficulties.
Note: This problem and solution are only valid for Windows NT and Windows 2000 operating systems. For additional information on setting up ODBC connections, please contact your system administrator or consult your operating system documentation.
Q. Why do I get a "The user cancelled the import" error after pressing OK whenever I try to import a plate description?
A. The error message "User Cancelled Import" can have various causes. In most cases, it is caused by the wrong file format. The CloneTracker plate file should be in a tab-delimited text format. Extra tabs between fields, at the end of a line, or at the end of the file will cause problems with the import.
Also, the text file should contain four columns: Plate_ID, Rows, Columns, and Ref/Gene_Name. The Plate_ID can be numbered or lettered, Rows has to be lettered, Columns has to be numbered, and Gene_Name cannot exceed 20 characters. Any missing rows or columns will cause this error message as well.
Sample Format:
plate_id	row1	col	ref
501	a	1	66317ew
501	a	2	66321ds
501	a	3	blank
501	a	4	66428e
501	a	5	66507
501	a	6	beta_actin
501	a	7	66564
501	a	8	66599
501	a	9	66721
501	a	10	ew3HGD
501	a	11	110844
501	a	12	114045
501	b	1	blank
501	b	2	122756
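The format rules above can be checked before importing. The following Python sketch is an illustrative pre-check, not CloneTracker's own validation; it assumes the first line of the file is a header row, as in the sample, and reports the line numbers that break one of the stated rules.

```python
MAX_NAME_LENGTH = 20  # Gene_Name limit stated in the answer above

def check_plate_lines(lines):
    """Return a list of (line_number, problem) pairs for a tab-delimited plate file.

    Assumes the first line is a header and is not checked.
    """
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            # Extra or trailing tabs show up here as additional fields.
            problems.append((number, "expected 4 fields, found %d" % len(fields)))
            continue
        plate_id, row, col, name = fields
        if not row.isalpha():
            problems.append((number, "row must be lettered"))
        if not col.isdigit():
            problems.append((number, "column must be numbered"))
        if len(name) > MAX_NAME_LENGTH:
            problems.append((number, "gene name longer than 20 characters"))
    return problems

sample = [
    "plate_id\trow\tcol\tref\n",
    "501\ta\t1\t66317ew\n",
    "501\ta\t2\t66321ds\t\n",   # trailing tab
    "501\t1\t3\tblank\n",       # row given as a number
]
for number, problem in check_plate_lines(sample):
    print("line %d: %s" % (number, problem))
```

To check a real file, read it with `open(path).readlines()` and pass the result to `check_plate_lines`.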
Q. Why do I get an Invalid Format error when importing settings from VIRTEK's ChipWriterPro?
A. CloneTracker does not accept a VIRTEK margin setting of zero. Set all margins to at least 1 micron.
For example:
   chipMarginX1=5000
   chipMarginX2=0
   chipMarginY1=4000
   chipMarginY2=0
should be changed to:
   chipMarginX1=5000
   chipMarginX2=1
   chipMarginY1=4000
   chipMarginY2=1
Q. Why do I get a duplicate plate error whenever I try to import a plate description, even though there is no duplicate in the database?
A. Missing or duplicated row and column numbers cause a duplicate plate error in CloneTracker 1.4.
Also, make sure that rows contain only letters and columns contain only numbers.
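Missing or duplicated well positions are easy to overlook in a long file. The Python sketch below is an illustration of how one might look for them before importing; the plate dimensions are passed in by the caller, and the two-row example is deliberately tiny.

```python
from collections import Counter

def find_well_problems(wells, rows, columns):
    """Return (duplicates, missing) for a list of (row_letter, column_number) wells.

    rows is an iterable of expected row letters, columns of expected column numbers.
    """
    counts = Counter((row.lower(), int(col)) for row, col in wells)
    duplicates = sorted(well for well, n in counts.items() if n > 1)
    expected = {(row, col) for row in rows for col in columns}
    missing = sorted(expected - set(counts))
    return duplicates, missing

# Tiny made-up example: a 2 x 2 block in which a2 is listed twice and b2 is absent.
wells = [("a", 1), ("a", 2), ("a", 2), ("b", 1)]
duplicates, missing = find_well_problems(wells, rows="ab", columns=range(1, 3))
print("duplicated:", duplicates)
print("missing:", missing)
```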
Q. Can GeneSight cluster the data based on gene expression patterns?
A. Yes. The self-organizing maps clustering in GeneSight groups the gene population according to the expression patterns found in the data. For more information on the clustering tools of GeneSight, please refer to this application note.
Q. Can GeneSight compute p-values (significance) for the genes across multiple experimental conditions?
A. Yes. The significance analysis tool of GeneSight can determine which genes differ significantly in expression across multiple experimental conditions, using parametric and non-parametric statistical tests. Below are the instructions for conducting significance analysis on your data:
Requirements:
1. DO NOT combine the replicates during either the data upload or data transformation.
2. There should be at least two groups to compare.
3. There should be at least two replicates in each group.
Steps:
1. Highlight your data columns in the GeneSight main window, go to 'Tools' and select 'Partition Editor'.
2. From the 'Manage Partition' menu, select the 'Create Partition' option. Give the partition a name, e.g., 'Group', and click on 'OK'.
3. Add groups (at least two) under this partition to perform a significance test. Assuming that you want two groups, right click on 'Group' twice, selecting the 'Add Group' option each time.
4. Right click on each new group, select the 'Edit Group Name' option, and give it a meaningful name such as 'Control' or 'Experiment'.
5. Right click on each new group, select the 'Edit Color' option, and choose a different color for each group.
6. Select Group 1 and highlight the condition columns at the bottom that you want added to this group (at least two replicates). Right click on these condition columns and select the 'Add Selected' option. Do the same for the second group. These columns will be added to the 'Add Candidates' panel on the right under their respective groups.
7. Verify the columns added to each group by selecting the individual groups. If a column has been added by mistake, right click on it and select the 'Remove Selected' option. Close this window.
8. In the GeneSight main window, highlight the data columns added to the partition groups and select the 'Significance Analyzer' option from the 'Tools' menu.
9. You should see your chosen columns in different colors in this window. If they are not in different colors, go to the 'Color Scheme' menu at the top and select your partition name to assign your chosen colors.
10. Click on the 'Compute overall significance (p-value)' label to compute the overall significance p-value between the groups, which will be shown in the text box next to it.
11. Select the statistical test you want to perform from the 'Please choose statistical test to perform' drop-down menu to compute the individual p-value for each gene across the experimental conditions. These p-values will be calculated and listed in the 'p-value' column below.
12. Click on the 'p-value' column heading to sort the p-values in ascending order. The most significantly different genes under the chosen experimental conditions, i.e., those with the lowest p-values, will then be shown at the top, which makes it easy to pick out the genes below a given p-value threshold.
13. Highlight the gene rows of interest (e.g., genes with p-values less than 0.05) and select the 'Create Partition' option under the 'Sub-Selected Genes' menu. Give this partition a name (e.g., p < 0.05); it will be added to the partition panel (lower left) in the GeneSight main window. Clicking on this partition will show its pie in the middle panel and the contained gene IDs in the right panel.
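To give a feel for what a per-gene p-value between two groups of replicates means, here is a small Python sketch of one non-parametric test, an exact two-sided permutation test on the difference of group means. It is an illustration only and is not claimed to be the test GeneSight runs; the gene names and expression values are invented.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_difference(indices):
        sum_a = sum(pooled[i] for i in indices)
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(range(n_a)))
    splits = 0
    at_least_as_extreme = 0
    # Try every way of relabeling the pooled replicates into the two groups.
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        if abs(mean_difference(indices)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / splits

# Invented data: gene -> (control replicates, experiment replicates).
expression = {
    "gene_1": ([1.0, 2.0, 3.0], [10.0, 11.0, 12.0]),
    "gene_2": ([5.0, 6.0, 7.0], [5.0, 6.0, 7.0]),
}
p_values = {gene: permutation_p_value(control, experiment)
            for gene, (control, experiment) in expression.items()}
ranked = sorted(p_values, key=p_values.get)  # lowest p-value first
print(p_values, ranked)
```

With only a few replicates per group, the number of possible relabelings is small, so the smallest p-value such a test can report is limited. This is one reason the requirement of at least two replicates per group is a bare minimum rather than a recommendation.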
Q. Can GeneSight help to predict an unknown class of compound or drug?
A. The Classification tool of GeneSight, which is a partially supervised classifier, can help with this prediction. Say there is a drug discovery company working on characterizing new drugs and their classes. Their chemist, based on domain chemistry knowledge, knows that drugs A, B, and C belong to class 1; drugs D, E, and F belong to class 2; and so forth. A new drug X is under investigation, and at this point the chemist does not know which class it might belong to. The classification tool may help answer this question.
Once we have the gene expression data from treatment with these known and unknown drugs in a model, we first assign the known drugs to their respective classes. This is the supervised part of the algorithm, and we use the Partition Editor in GeneSight for it. We then cluster these partitions and the unknown drug all together (the unsupervised part). Based on the expression profile of the unknown drug, whichever class it clusters with is taken to be its putative class. This algorithm is therefore somewhat similar to the support vector machine (SVM) algorithm and contains both deterministic and non-deterministic approaches.
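The idea of assigning an unknown drug to the class whose expression profiles it most resembles can be sketched in a few lines of Python. This is a deliberately simplified stand-in (a nearest-centroid rule with Euclidean distance), not the Classification tool's actual algorithm, and the class names and profiles are invented.

```python
import math

def centroid(profiles):
    """Average expression profile of a class (profiles are equal-length lists)."""
    return [sum(values) / len(values) for values in zip(*profiles)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def putative_class(known, unknown_profile):
    """Return the class whose centroid lies closest to the unknown profile.

    known maps a class name to the list of profiles of its known drugs.
    """
    centroids = {name: centroid(profiles) for name, profiles in known.items()}
    return min(centroids, key=lambda name: euclidean(centroids[name], unknown_profile))

# Invented two-gene profiles for drugs of known class.
known = {
    "class 1": [[1.0, 0.0], [0.8, 0.2]],
    "class 2": [[0.0, 1.0], [0.1, 0.9]],
}
drug_x = [0.9, 0.1]
print(putative_class(known, drug_x))
```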
Q. Can GeneSight import quantifications generated from programs other than ImaGene? Can GeneSight-Lite do the same?
A. Yes, using the alien text import wizard (see the instructions above).
Q. Can I define groups or partition my data loaded into GeneSight based on either the genes ids or experimental conditions?
A. Yes. This can be accomplished using the 'Partition Editor' option under the 'Tools' menu. Please follow the steps below:
1. Go to 'Tools' and select 'Partition Editor'.
2. From the 'Manage Partition' menu, select the 'Create Partition' option. Give the partition a name, e.g., 'Group', and click on 'OK'.
3. Add groups under this partition. Assuming that you want two groups, right click on 'Group' twice, selecting the 'Add Group' option each time.
4. Right click on each new group, select the 'Edit Group Name' option, and give it a meaningful name such as 'Control' or 'Experiment'.
5. Right click on each new group, select the 'Edit Color' option, and choose a different color for each group.
6. Select Group 1 and highlight the condition columns at the bottom that you want added to this group. Right click on these condition columns and select the 'Add Selected' option. Do the same for the second group. These columns will be added to the 'Add Candidates' panel on the right under their respective groups.
7. Verify the columns added to each group by selecting the individual groups. If a column has been added by mistake, right click on it and select the 'Remove Selected' option.
Q. Can I query across the dataset for filtering out the genes of interest based on their expression values, gene names, etc.?
A. Yes. Using the 'Query/Group Builder' option under the 'Tools' menu, you can write text-based queries and filter out genes of interest. You select the pre-defined variables with a click of the mouse, and the software writes the query for you. You can write multiple queries and combine them in a stringent or liberal fashion. Finally, you can filter out the results of the query, which are added as a partition in the partition panel in the lower left of the GeneSight main window.
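As an illustration of what combining queries can look like, the Python sketch below treats "stringent" as requiring every query to match and "liberal" as requiring at least one to match. That reading is an assumption made for the example, as are the gene records and the two queries; it does not reproduce the Query/Group Builder's own syntax.

```python
def combine_queries(queries, stringent=True):
    """Return a predicate that is True when all (stringent) or any (liberal) queries match."""
    reducer = all if stringent else any

    def predicate(gene):
        return reducer(query(gene) for query in queries)

    return predicate

# Invented gene records and queries.
genes = [
    {"id": "g1", "name": "beta_actin", "ratio": 2.4},
    {"id": "g2", "name": "kinase_a", "ratio": 0.3},
    {"id": "g3", "name": "kinase_b", "ratio": 3.1},
]
high_ratio = lambda gene: gene["ratio"] > 2.0
is_kinase = lambda gene: gene["name"].startswith("kinase")

both = combine_queries([high_ratio, is_kinase], stringent=True)
either = combine_queries([high_ratio, is_kinase], stringent=False)
stringent_hits = [gene["id"] for gene in genes if both(gene)]
liberal_hits = [gene["id"] for gene in genes if either(gene)]
print(stringent_hits, liberal_hits)
```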
Q. Can I save my desired transformation sequence for future analyses?
A. Yes. Once you have an appropriate transformation sequence for your data, you can save it by selecting the 'Save Transformation Sequence' option under the 'File' menu. The sequence can be reused in future analyses by selecting the 'Load Transformation Sequence' option under the 'File' menu.
Q. Can I use my current image analysis software with GeneSight?
A. Yes, you can! GeneSight has been designed to integrate with BioDiscovery's ImaGene family of array image analysis products. However, data from virtually any other software package, and in nearly any format, can be imported into GeneSight. This includes Affymetrix GeneChip result files and text files from most array image processing programs.
Q. Can we filter out those genes with too high or low values?
A. The easiest way to filter out "low" or "high" values is to use the histogram tool. For example, suppose you have data from 5 time points. Select all 5 time points and choose the "Histogram" tool. This produces a histogram using all the values from the 5 time points. Use the "Select" tool to select the left tail (less than some value) and the right tail (greater than some value), and choose "create gene subset" under "Subselect Genes". A new partition appears in the main GeneSight window. It consists of the selected genes (left and right tails) and the complement genes. You can then perform any subsequent analysis on the complement genes, from which the high and low genes have been filtered out.
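The split into selected (tail) genes and complement genes can be pictured with the following Python sketch. It assumes that a gene counts as selected when any of its values falls below the low cut-off or above the high cut-off; that rule, the cut-offs, and the data are assumptions for illustration.

```python
def split_by_tails(values_by_gene, low, high):
    """Split genes into those with any value in the tails and the complement."""
    selected, complement = [], []
    for gene, values in values_by_gene.items():
        if any(value < low or value > high for value in values):
            selected.append(gene)
        else:
            complement.append(gene)
    return selected, complement

# Invented values for three genes across three time points.
values_by_gene = {
    "g1": [0.1, 1.0, 1.2],   # one value in the left tail
    "g2": [1.0, 1.1, 0.9],   # entirely within the cut-offs
    "g3": [1.0, 9.5, 1.0],   # one value in the right tail
}
selected, complement = split_by_tails(values_by_gene, low=0.5, high=5.0)
print("selected:", selected)
print("complement:", complement)
```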
Q. Considering that cluster analysis tools are non-deterministic based on their random initialization, how do I know that the genes falling into my clusters do actually belong to these clusters?
A. GeneSight offers the Confidence Analysis tool for this purpose. It is based on the bootstrapping algorithm used by researchers at the Jackson Laboratory (their paper) and determines the membership of a gene in a group (a differentially regulated group or a cluster) from the information retrieved for its replicates. To pass the confidence test, all the replicates of a gene must fall into the same group.
Please follow the steps below to conduct cluster confidence analysis:
1. Change the color map in the cluster plot (lower-right) to white.
2. Click on the 'Cluster Confidence' button (lower-right) in the cluster plot.
3. In the 'Cluster Confidence Analysis' window that appears, either type a value or move the slider to select a confidence level, e.g., 0.95 for 95% confidence.
4. Click back in the cluster plot to highlight in gray the rows of the genes that passed the confidence test.
5. Go to the 'Sub-Selected Genes' menu, select the 'Create Partition' option, and give the name 'Confident Genes' (or any other) to this subset. This partition will show up in the partition panel in the lower left of the GeneSight main window. Clicking on this partition will show its pie in the middle panel and the contained gene IDs in the right panel.
Q. Do K-Means and Hierarchical Clustering produce different image/result each time with the same data set?
A. Traditional Hierarchical Clustering is a deterministic clustering method that uses an agglomerative (bottom-up) approach. It calculates the distance between individual data points (expression profiles) and then groups together those that are close. There are five traditionally used linkage metrics: single, complete, average, centroid, and Ward. Each linkage type produces the same result every time for a given set of data. The disadvantages of this approach are that it is time-consuming, requires a large amount of memory, and does not scale to large data sets. Therefore, GeneSight 4.0 includes a new "linkage" type called "Division", which is considered a divisive (top-down) approach. This linkage type accelerates the clustering algorithm for large data sets and requires a minimal amount of memory. If you choose the "Division" linkage type, the result and appearance will be slightly different each time due to the nature of the algorithm.
K-Means clustering is non-deterministic, owing to the random initialization of cluster centers. Whether you get the same K-Means results again and again depends on your data. If your data are uniformly distributed, there is really no pattern, and the results will be quite different every time you run K-Means clustering because the initial cluster centers are chosen at random. If there are distinct patterns in your data, K-Means clustering will give very similar results every time. In addition, neither the order of clusters in K-Means nor the order of genes within each cluster has any meaning. If you want to save a particular cluster image, select "save image as" under "File".
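The effect of random initialization can be illustrated outside GeneSight with a small sketch. The Python code below is not GeneSight's implementation; it is a minimal 1-D K-Means written for illustration, and the function names, the toy values, and the choice of drawing initial centers from the data are all assumptions. It runs K-Means with several random seeds on data containing two distinct groups and collects the distinct partitions found, ignoring cluster order.

```python
import random

def kmeans_1d(values, k, seed, iterations=20):
    """Tiny 1-D K-Means; initial centers are k data values drawn at random."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (keep it if it has none).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def partition(values, labels):
    """Order-independent view of a clustering: the set of clusters."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    return frozenset(frozenset(members) for members in clusters.values())

# Hypothetical expression values with two clearly separated groups.
patterned = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
results = {partition(patterned, kmeans_1d(patterned, 2, seed)) for seed in range(10)}
print(len(results), "distinct partition(s) found across 10 seeds")
```

Because the comparison ignores cluster numbering, it reflects the point made above that the order of clusters has no meaning. Replacing the toy values with evenly spread data is a simple way to see partitions start to vary from seed to seed.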
Q. Does GeneSight offer any adjustment for the p-values obtained? Why might it be required?
A. Yes, GeneSight offers Holm's p-value adjustment. When thousands of null hypotheses are tested at one time, the family-wise error rate (FWER, the probability of at least one false positive over the collection of tests) increases. When testing thousands of genes from microarray data for differential expression based on a series of p-values for the individual hypotheses, these p-values should be adjusted in order to bound the probability of falsely rejecting at least one true null hypothesis.
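For readers who want to see what a Holm adjustment does to a list of p-values, here is a minimal Python sketch of the standard Holm step-down procedure. It is an illustration only: the function name and the example p-values are made up, and it is not taken from GeneSight's code.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)
print(adjusted)
```

A gene is then called significant at a family-wise level such as 0.05 when its adjusted p-value is below that level.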
Q. Does GeneSight offer data distribution, such as a histogram? How do I draw a histogram?
A. The histogram plot draws out the distribution of the entire data, divided into 100 bins by default. The
x-axis indicates the gene expression and the y-axis indicates the number of genes falling into different bins
based on their expression values. Please follow the steps below to draw a histogram and filter out differentially
regulated genes:
1. In the GS main window, select the mean ratio column in the pair folder and click on the
'Histogram' icon on the top.
2. In the histogram, type -1 in the box next to 'Lower bound' and hit enter and type 1 in the box next
to 'Upper bound' and hit enter. This will highlight the up- and down-regulated genes in the
histogram in blue and give their numbers at the bottom. If you have used the log base 2 transformation of the
data, 1 and -1 would indicate the 2 fold up- and down-regulation, respectively.
3. Go to 'Sub-Selected Genes' menu and select the 'Create Partition' option and give the
name 'DiffReg' (or any other) to this subset. This partition will be added to the partition panel in the
lower-left of the GeneSight main window. Clicking on this partition will show its pie in the middle panel and
the contained gene ids in the right panel.
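The binning and the bound selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GeneSight's code: the gene names and ratios are hypothetical, the bins are equal-width between the minimum and maximum, and values exactly at a bound are treated as selected, which may differ from the tool's behavior.

```python
import math

def histogram(values, bins=100):
    """Count values per equal-width bin between the minimum and maximum."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

def outside_bounds(log_ratios, lower=-1.0, upper=1.0):
    """Gene ids whose log2 ratio is at or beyond the lower or upper bound."""
    return sorted(g for g, v in log_ratios.items() if v <= lower or v >= upper)

# Hypothetical mean ratios (experiment over control).
ratios = {"geneA": 4.0, "geneB": 1.1, "geneC": 0.25, "geneD": 0.9, "geneE": 2.0}
log_ratios = {g: math.log2(r) for g, r in ratios.items()}

counts = histogram(list(log_ratios.values()), bins=4)
selected = outside_bounds(log_ratios)
print(counts, selected)
```

With log base 2 data, the bounds -1 and 1 correspond to 2-fold down- and up-regulation, so geneE (ratio 2.0) sits exactly on the upper bound.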
Q. Does GeneSight offer supervised data mining algorithms or predictive modeling?
A. Yes, GeneSight offers a classification tool, which is a partially supervised classifier.
Q. Does GeneSight offer unsupervised data mining algorithms?
A. Yes, GeneSight offers unsupervised clustering algorithms. There are two main kinds of clustering offered: (a) Partitioning (k-means and self-organizing maps), and (b) Hierarchical. For more information on the clustering tools of GeneSight, please refer to this application note. In addition, GeneSight also offers time-series analysis (application note) and principal component analysis (PCA).
Q. How can I download the chromosome information from the NCBI site?
A. You would need to retrieve the genome sequences from the NCBI's web site and save them as
text files for use in GeneSight by following the steps below:
1. Create a new subfolder under your C:\GeneSight\Chromosome folder, and name this new subfolder with your
organism's name.
2. Access the National Center for Biotechnology Information (NCBI) database at:
ftp://ftp.ncbi.nlm.nih.gov/genomes/
3. Choose the species you are searching for from the list and the chromosome number.
4. Download the .gbk extension files. If they are zipped files, you will have to unzip them.
5. Save the unzipped files for each chromosome in your organism's newly created chromosome folder.
In Internet Explorer, save the files with the Western European encoding. In Netscape Navigator, save the files
with the file type "all file type".
6. If your organism has a single chromosome (e.g., E. coli has one circular chromosome):
File name = organism's name (use the species name and strain). Do NOT leave blank spaces between the
name and strain.
Ex: EcoliK12
7. If your organism has more than one chromosome, all individual chromosome files have to be saved in one
folder.
For example, Saccharomyces cerevisiae has 16 chromosomes:
File name = organism's name followed by the corresponding chromosome number.
Ex: Scerevisiae1.txt (name = Scerevisiae, chromosome number = 1)
Ex: Scerevisiae2.txt (name = Scerevisiae, chromosome number = 2)
Q. How can I draw a chromosome map for my experimental organism? What does it mean?
A. After you download the chromosome information for your experimental organism from the NCBI site
(instructions above), you can draw the chromosome map after uploading the data for this organism in
GeneSight by following the steps below:
1. Click on any one or more experimental conditions for your organism and click on 'Chromosome'
button above.
2. GeneSight's default organism is Scerevisiae. If this is not your organism of choice, a warning message
appears when the genes in your expression data do not match the genes of Scerevisiae.
3. Press 'OK' on the warning message and choose the proper organism of interest from the drop down
list in the right side of the Chromosome Map window.
4. This will draw the chromosome map for your organism. Each row displayed on the left side of the window is
a chromosome. Base pairs are shown along the chromosome and their numbers are displayed to the right of
the chromosome. Each experimental condition selected in the GeneSight main window is displayed on the
right side of the chromosome map window as the check boxes. Checking these boxes displays these
experimental conditions as adjacent bars on the chromosomes. You can zoom in to the base pair level. The
gene ids are highlighted in red. The y-axis indicates the gene expression with the lowest expression value
displayed on the left of the middle blue line. The bars above the blue line indicate genes from the 5' to 3' end, and
the bars below the blue line indicate genes from the 3' to 5' end.
Q. How can I generate a report on transformed data?
A. In the 'Data Preparation' window, select the 'Save Dataset Contents as Text' option under the 'File' menu. This would save the transformed data as a text file.
Q. How do I build a data transformation sequence?
A. For selecting a transformation, click and drag its icon from the top panel and drop it in the middle gray panel. The window for this transformation will be opened up, providing you the option of changing the defaults. The order of the transformations can also be changed by simply dragging and dropping to the new position. You have the complete flexibility of defining your own sequence for your experiment based on the transformations, their options, and their order that you select.
Q. How do I calculate ratio of ratios?
A. The ratio transformation can be applied only one time on any two given columns in a session. For computing the ratio over the ratios, you will have to export the transformed data out, re-import it in GeneSight, and then compute the ratio over the ratios.
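The arithmetic behind a ratio of ratios is simple, and a short Python sketch makes it concrete. The intensities, gene names, and the `ratio` helper below are hypothetical and only illustrate the calculation; they do not describe how GeneSight stores or exports data.

```python
def ratio(numerator, denominator):
    """Per-gene ratio for genes present in both inputs with a non-zero denominator."""
    return {
        g: numerator[g] / denominator[g]
        for g in numerator
        if g in denominator and denominator[g] != 0
    }

# Hypothetical intensities from two arrays, each with experiment and control channels.
array1_exp = {"geneA": 800.0, "geneB": 300.0}
array1_ctl = {"geneA": 200.0, "geneB": 300.0}
array2_exp = {"geneA": 500.0, "geneB": 600.0}
array2_ctl = {"geneA": 250.0, "geneB": 150.0}

first = ratio(array1_exp, array1_ctl)    # experiment over control, array 1
second = ratio(array2_exp, array2_ctl)   # experiment over control, array 2
ratio_of_ratios = ratio(first, second)   # array 1 ratio over array 2 ratio
print(ratio_of_ratios)
```

If the exported ratios are already log-transformed, the equivalent operation is a subtraction of the two log ratios rather than a division.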
Q. How do I calculate the ratios of experiment over control?
A. In the 'Dataset Builder' window, select the data files for each channel one by one and drag and drop them under the appropriate 'Experiment' or 'Control' middle panel. Alternatively, you can select a data file and then click on the 'Add as Experiment' or 'Add as Control' button below. Click on the right-pointing 'Pair Data / Perform Ratio' button, and the data will be added to the right panel after the ratio of experiment over control has been calculated.
Q. How do I carry out an analysis using only the selected genes?
A. To further analyze selected genes (e.g., differentially regulated genes, highly confident or significant genes) or to generate a report on them, they first have to be added as a partition in the partition panel in the lower-left of the GeneSight main window. Next, check the box before this partition and click on the 'Sub-Select' button at the bottom. A report generated now will include only these selected genes. To cancel this selection, click on the 'Reset Schemes' button at the bottom.
Q. How do I carry out Cluster Enrichment Analysis?
A. For using this tool, you should first have the gene annotations of your data loaded into GeneSight as
partitions (shown in the partition panel in the lower-left of the GeneSight main window). With these annotations
loaded and a cluster plot open, cluster enrichment analysis can be conducted by following the steps below:
1. In the partition panel, check the box for the gene annotation classification to select all the sub-partitions.
Next, click on the 'Use Gene Set' button below.
2. In the cluster plot, click on the 'Apply' button for 'Cluster Enrichment Analysis' tool.
This would display the annotation enrichment on the cluster nodes (the highest enrichment).
3. Change the 'Display Field' in the lower-left of the cluster plot to indicate 'Cluster
Enrichment'. Now brushing the mouse over the cluster nodes would display their entire enrichments along
with the probability values in the lower-right panel.
Q. How do I combine replicate arrays and upload all of them together?
A. In the 'Dataset Builder' window, select the data files for each channel one by one and drag and drop them under the appropriate 'Experiment' or 'Control' middle panel. Alternatively, you can select a data file and then click on the 'Add as Experiment' or 'Add as Control' button below. If you have multiple replicates under both control and experiment (this also applies to flip/swap dye experiments), click on the downward-pointing 'Perform Ratio & Add as Replicate' button. This brings the data files to the lower-middle panel after calculating the ratios of experiment over control and combining all the replicates together. After this, click on the right-pointing 'Add Repeated Experimental Conditions to Data Set' button below, and the data will be added to the right panel.
Q. How do I conduct a t-test for my data?
A. The significance analysis tool of GeneSight can determine the genes significantly different in expression across multiple experimental conditions using the parametric and non-parametric statistical tests. Below are the instructions to conduct significance analysis on your data:
Requirements:
1. DO NOT combine the replicates during either the data upload or data transformation.
2. There should be at least two groups to compare.
3. There should be at least two replicates in each group.
Steps:
1. Highlight your data columns in the GeneSight main window, go to 'Tools' and select 'Partition
Editor'.
2. From the 'Manage Partition' menu, select the 'Create Partition' option. Give a partition
name, e.g., 'Group' and click on 'OK'.
3. Now you need to add groups (at least two) under this partition to perform a significance test. Assuming that
you want two groups, right click on the 'Group' twice and select 'Add Group' option.
4. Right click on the two new groups and select the option 'Edit Group Name' and give them the
meaningful names like 'Control' and 'Experiment'.
5. Right click on the two new groups and select the option 'Edit Color' and select two different colors
for the two groups.
6. Select Group 1, and highlight the conditions columns at the bottom that you want added to this group (at least
two replicates). Now, right click on these condition columns and select 'Add Selected' option. Do
the same for the second group. These columns will be added to the 'Add Candidates' panel on the
right under their respective groups.
7. Verify the columns added to each group by selecting each group in turn. If a column has been
added by mistake, right click on this column and select the 'Remove Selected' option. Close this
window.
8. In the GeneSight main window, highlight the data columns added to the partition groups and select
'Significance Analyzer' option from the 'Tools' menu.
9. You should see your chosen columns in different colors in this window. If they are not in different colors, go
to 'Color Scheme' menu on the top and select your partition name to assign your chosen colors.
10. Click on the 'Compute overall significance (p-value)' label to compute the overall significance
p-value between the groups, which will be shown in the text box next to it.
11. Select the statistical test you want to perform under the 'Please choose statistical test to perform'
drop down menu to compute the individual p-values for each gene among different experimental conditions.
These p-values will be calculated and listed in the 'p-value' column below for each gene.
12. Clicking on the 'p-value' column heading will sort the p-values in ascending order to filter out the
genes under certain p-value threshold. The highly significantly different genes under the chosen
experimental conditions or the ones with the low p-values will, therefore, be shown on the top.
13. You can highlight all the gene rows of interest (e.g., genes with p-values less than 0.05) and select
'Create Partition' option under the 'Sub-Selected Genes' menu. Give this partition a
name (e.g., p < 0.05) and this partition will be added to the partition panel (lower left) in the GS main
window. Clicking on this partition will show its pie in the middle panel and the contained gene ids in the
right panel.
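To make the per-gene comparison concrete, the following Python sketch computes a two-sample t statistic for each gene from its replicates in two groups. It is only an illustration: it assumes the classic Student form with a pooled variance, the gene names and values are hypothetical, and it stops at the t statistic because converting it to a p-value needs the t distribution, which the Python standard library does not provide. It is not a description of the tests GeneSight runs.

```python
import math
from statistics import mean, variance

def t_statistic(experiment, control):
    """Student's two-sample t statistic with pooled variance.

    Requires at least two replicates per group and some variation in the data.
    """
    n_e, n_c = len(experiment), len(control)
    pooled = ((n_e - 1) * variance(experiment) + (n_c - 1) * variance(control)) / (n_e + n_c - 2)
    return (mean(experiment) - mean(control)) / math.sqrt(pooled * (1 / n_e + 1 / n_c))

# Hypothetical log ratios: three replicates per group for each gene.
genes = {
    "geneA": {"Experiment": [4.0, 5.0, 6.0], "Control": [1.0, 2.0, 3.0]},
    "geneB": {"Experiment": [2.0, 3.5, 2.5], "Control": [2.5, 3.0, 2.0]},
}
scores = {g: t_statistic(d["Experiment"], d["Control"]) for g, d in genes.items()}

# Genes with the largest absolute t statistic come first.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
print(ranked, scores)
```

The requirement of at least two replicates per group listed above shows up directly here: the variance of a single value is undefined, so the statistic cannot be computed.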
Q. How do I conduct an ANOVA test for my data?
A. The significance analysis tool of GeneSight can determine the genes significantly different in expression across multiple experimental conditions using the parametric and non-parametric statistical tests. Below are the instructions to conduct significance analysis on your data:
Requirements:
1. DO NOT combine the replicates during either the data upload or data transformation.
2. There should be at least two groups to compare.
3. There should be at least two replicates in each group.
Steps:
1. Highlight your data columns in the GeneSight main window, go to 'Tools' and select 'Partition
Editor'.
2. From the 'Manage Partition' menu, select the 'Create Partition' option. Give a partition
name, e.g., 'Group' and click on 'OK'.
3. Now you need to add groups (at least two) under this partition to perform a significance test. Assuming that
you want two groups, right click on the 'Group' twice and select 'Add Group' option.
4. Right click on the two new groups and select the option 'Edit Group Name' and give them the
meaningful names like 'Control' and 'Experiment'.
5. Right click on the two new groups and select the option 'Edit Color' and select two different colors
for the two groups.
6. Select Group 1, and highlight the conditions columns at the bottom that you want added to this group (at least
two replicates). Now, right click on these condition columns and select 'Add Selected' option. Do
the same for the second group. These columns will be added to the 'Add Candidates' panel on the
right under their respective groups.
7. Verify the columns added to each group by selecting each group in turn. If a column has been
added by mistake, right click on this column and select the 'Remove Selected' option. Close this
window.
8. In the GeneSight main window, highlight the data columns added to the partition groups and select
'Significance Analyzer' option from the 'Tools' menu.
9. You should see your chosen columns in different colors in this window. If they are not in different colors, go
to 'Color Scheme' menu on the top and select your partition name to assign your chosen colors.
10. Click on the 'Compute overall significance (p-value)' label to compute the overall significance
p-value between the groups, which will be shown in the text box next to it.
11. Select the statistical test you want to perform under the 'Please choose statistical test to perform'
drop down menu to compute the individual p-values for each gene among different experimental conditions.
These p-values will be calculated and listed in the 'p-value' column below for each gene.
12. Clicking on the 'p-value' column heading will sort the p-values in ascending order to filter out the
genes under certain p-value threshold. The highly significantly different genes under the chosen
experimental conditions or the ones with the low p-values will, therefore, be shown on the top.
13. You can highlight all the gene rows of interest (e.g., genes with p-values less than 0.05) and select
'Create Partition' option under the 'Sub-Selected Genes' menu. Give this partition a
name (e.g., p < 0.05) and this partition will be added to the partition panel (lower left) in the GS main
window. Clicking on this partition will show its pie in the middle panel and the contained gene ids in the
right panel.
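For a comparison across more than two groups, the quantity at the heart of a one-way ANOVA is the F statistic: the variation between group means relative to the variation within groups. The Python sketch below computes it for one gene. The function name and values are hypothetical, and it stops at the F statistic because turning it into a p-value needs the F distribution, which the standard library does not provide. It illustrates the textbook formula rather than GeneSight's implementation.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of replicate values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical expression values for one gene under three conditions.
f_three_groups = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
f_two_groups = anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f_three_groups, f_two_groups)
```

A large F value means the group means differ by much more than the replicate-to-replicate scatter would suggest. With only one replicate per group the within-group term has zero degrees of freedom, which is why at least two replicates per group are required.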
Q. How do I create a dataset in GeneSight?
A. Click on the 'Create New' icon on the top in the GeneSight main window, and it would bring up the 'DataSet Builder' window. Alternatively, you could also select the 'Dataset Builder' option under the 'Tools' menu.
Q. How do I determine which clusters are predominantly made up of certain functional classes?
A. Cluster enrichment analysis, embedded in the clustering tools in GeneSight, helps to answer this question. Applied to the clusters separated by k-means, it helps to determine the probability that a cluster is predominantly represented by genes from a particular group, given a p value (a number between 0 and 1 representing the probability of a false conclusion). Suppose we ask the question "Does any cluster have 'more than its share' of genes of known function?" or, in other words, "Does any cluster have a larger percentage of genes with a certain function than it should have by chance?". To answer this question, cluster enrichment analysis finds the clusters that are so "enriched" that the probability of the enrichment occurring by chance (a false positive) is less than p. This statistical tool comes in handy in situations where we have certain genes with functional classifications and want to find out which cluster(s) are enriched according to one of those classifications. This tool is based on this publication.
For using this tool, you should first have the gene annotations of your data loaded into GeneSight as partitions
(shown in the partition panel in the lower-left of the GeneSight main window). With these annotations loaded
and a cluster plot open, cluster enrichment analysis can be conducted by following the steps below:
1. In the partition panel, check the box for the gene annotation classification to select all the sub-partitions.
Next, click on the 'Use Gene Set' button below.
2. In the cluster plot, click on the 'Apply' button for 'Cluster Enrichment Analysis' tool.
This would display the annotation enrichment on the cluster nodes (the highest enrichment).
3. Change the 'Display Field' in the lower-left of the cluster plot to indicate 'Cluster
Enrichment'. Now brushing the mouse over the cluster nodes would display their entire enrichments along
with the probability values in the lower-right panel.
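The idea of "more than its share by chance" can be made concrete with a small calculation. The answer above does not say which test GeneSight uses, so the Python sketch below assumes a hypergeometric tail probability, a common choice for this kind of question. The function name and the numbers are hypothetical.

```python
from math import comb

def enrichment_p(total_genes, annotated_genes, cluster_size, annotated_in_cluster):
    """Probability of seeing at least this many annotated genes in a cluster
    of the given size if cluster members were drawn at random."""
    upper = min(annotated_genes, cluster_size)
    tail = sum(
        comb(annotated_genes, x) * comb(total_genes - annotated_genes, cluster_size - x)
        for x in range(annotated_in_cluster, upper + 1)
    )
    return tail / comb(total_genes, cluster_size)

# Hypothetical example: 10 genes in total, 4 of them annotated with a function,
# and a cluster of 3 genes that are all annotated with that function.
p = enrichment_p(total_genes=10, annotated_genes=4, cluster_size=3, annotated_in_cluster=3)
print(p)
```

A cluster would be reported as enriched for the annotation when this probability falls below the chosen p value.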
Q. How do I generate a report containing my analyses?
A. To generate a report, go to the 'Utilities' menu in the GS main window and select the 'Generate Report' option. This brings up another window containing different data columns. Any test conducted before this stage whose window is still open will have a check box in the upper-left corner of this window. Checking the box for a test includes its results in the report. Click on the 'Save Report' button at the bottom, and the report will be saved as a text file that can be opened in Excel. This file shows the gene ids with their corresponding test results at the bottom of the report in two consecutive columns.
Q. How do I get a sequence of the columns in the image to be identical to the sequence entered in Dataset Builder?
A. The issue is how to control the order in which experimental conditions are displayed in a graphic plot. For the time series, the default order is the order in which you select the experimental conditions in the main window before launching the plot. You can change this order by clicking the Shuffle button, then dragging the condition labels from the left panel to the right panel in the desired order. For the K-Means, 1-D SOM, and Hierarchical Cluster diagrams, you must choose Rows Only in the Cluster Choice pull-down menu. The order then will, again, be the order in which you selected the conditions in the main GeneSight window, prior to opening the cluster plot.
Q. How do I import non-ImaGene generated quantification data?
A. These data can be imported using the alien text import wizard using the instructions described below:
1. Click on the 'Create New' icon on the top in the GeneSight main window, and it would bring up the 'DataSet Builder' window. Alternatively, you could also select the 'Dataset Builder' option under the 'Tools' menu.
2. Find your data files (tab delimited text format) from the stored directory on the left panel and drag them to the right panel.
3. Click on 'Done' icon on the top and it will bring up the alien text import wizard window.
4. In the 'Required Information' tab, type the appropriate numbers (1, 2 etc.) in the 'No. of Headers Rows' box and 'Gene Id Column No.' box, corresponding to your data file.
5. Click on 'Guess Column Name' button and 'OK' out the next window that indicates the columns found in the data.
6. After this operation, you will see your column names and their corresponding numbers showing up in the lower-left. Right click on the name of the gene id column in the lower-left and select 'Remove Row' option.
7. If the dataset contains both the signal and background columns, check the box for 'Contains both Signal & Background Columns' in the 'Required Information' tab. This will color code different columns, with the signal and background columns in each condition obtaining the same color for background adjustment.
8. If the dataset contains flagging information, click on the 'Other Info' tab. Type the number of the column containing the flagging info in the box next to 'Flag'. After this, right click on the name of the flagging column in the lower-left and select 'Remove Row' option.
9. If you require the ratio (experiment over control) to be calculated during data upload, click on the 'Pairing Info' tab. Click on the row under 'Experiment' and 'Control' and type the appropriate column numbers for experiment and control conditions.
10. If the dataset contains any genomic information (e.g., annotations), click on the 'Genomic Info' tab and type the appropriate column number containing this info by clicking in the row below 'Genome Data Column'. After this, right click on the name of the genomic info column in the lower-left and select 'Remove Row' option.
11. Click on 'Apply' to see the results (re-formatted data), then click on 'Apply & Close'.
12. In the next window, check the box for 'Use defaults for all Experiments' and 'OK' out this window. Now the parameters specified above in the alien text import wizard would be universally applied to all other data sets being uploaded.
13. This will take you to the GeneSight main window and you will be able to see your data columns on the top left and data on the top right, ready to be analyzed.
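The information the wizard asks for (number of header rows, gene id column number, signal and background columns) maps directly onto how a tab-delimited file is read. The Python sketch below shows that mapping on a tiny made-up file. The function name, the column names, and the use of subtraction for background adjustment are assumptions for illustration and do not describe GeneSight's internal behavior.

```python
import csv
import io

def read_quantification(text, header_rows=1, gene_id_column=1):
    """Parse tab-delimited text into {column name: {gene id: value}}.

    header_rows must be at least 1; the last header row is taken to hold the
    column names. Column numbers are 1-based, as typed into the wizard.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    names = rows[header_rows - 1]
    id_index = gene_id_column - 1
    data = {name: {} for i, name in enumerate(names) if i != id_index}
    for row in rows[header_rows:]:
        gene_id = row[id_index]
        for i, name in enumerate(names):
            if i != id_index:
                data[name][gene_id] = float(row[i])
    return data

# Hypothetical file contents: one header row, gene ids in column 1.
sample = "Gene\tSignal\tBackground\ng1\t120.5\t20.0\ng2\t80.0\t15.5\n"
table = read_quantification(sample, header_rows=1, gene_id_column=1)

# Assumed background adjustment: signal minus background for each gene.
adjusted = {g: table["Signal"][g] - table["Background"][g] for g in table["Signal"]}
print(adjusted)
```

Checking a file this way before import can help confirm the right header row count and gene id column number to type into the 'Required Information' tab.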
Q. How do I know that these genes are actually differentially regulated on a statistical level?
A. The confidence analysis tool in GeneSight can help answer this question. This tool is based on the
bootstrapping algorithm used by the researchers in Jackson Laboratory (their publication). Confidence analysis
determines the membership of a gene to a group (differentially regulated group or a cluster) based on the
information retrieved for its replicates. A gene will pass the confidence analysis test only if all its replicates fall
in the same group. You can conduct the confidence analysis test by following the instructions below:
1. Select the mean ratio column in the pair folder and select the 'Confidence Analyzer' option under
the 'Tools' menu.
2. The default selection for the regulation is 'Up- and Down-Regulated' that can be changed to either
up- or down-regulated.
3. Type the regulation levels for down- and up-regulation in their appropriate boxes (e.g., 2 for 2-fold
in the normal data and 1 for 2-fold in the log base 2 transformed data).
4. Check the 'Advanced controls' box below.
5. The default value selected for the confidence level is 95%, which can be changed as desired.
6. Click on the 'Apply' button below and it will indicate, at the bottom, the number of genes passing
this statistical confidence criterion.
7. If the scatter plot and histogram are opened up, you will see the corresponding changes in them.
8. Go to 'Sub-Selected Genes' menu and select the 'Create Partition' option and give the
name 'Confidence' (or any other) to the subset. This partition will show up in the lower-left panel of
the GS main window. Clicking on this partition will show its pie in the middle panel and the contained gene ids
in the right panel.
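The rule that a gene passes only if all its replicates fall in the same group can be sketched in Python as follows. This covers only that membership rule: it does not implement the bootstrapping step or the confidence level setting, and the function name, gene names, and the inclusive thresholds are assumptions for illustration.

```python
def confidence_group(replicates, up=1.0, down=-1.0):
    """Return 'up' or 'down' if every replicate log2 ratio falls in that
    group, otherwise None (the gene does not pass)."""
    if not replicates:
        return None
    if all(v >= up for v in replicates):
        return "up"
    if all(v <= down for v in replicates):
        return "down"
    return None

# Hypothetical log base 2 ratios, three replicates per gene.
genes = {
    "geneA": [1.4, 1.1, 2.0],     # all replicates at least 2-fold up
    "geneB": [1.3, 0.6, 1.8],     # one replicate below the threshold
    "geneC": [-1.2, -1.5, -1.0],  # all replicates at least 2-fold down
}
passed = {}
for gene, values in genes.items():
    group = confidence_group(values)
    if group is not None:
        passed[gene] = group
print(passed)
```

Here geneB has a high mean ratio but is excluded because one replicate disagrees, which is the behavior the confidence test is designed to enforce.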
Q. How do I know the correct number of clusters for my data?
A. Partitioning clustering methods are often limited by the lack of prior knowledge of the number of natural clusters present in the data. This problem can be handled by first applying hierarchical methods to the dataset to get an approximate idea of the clusters hidden in it, and then using this information to set the number of clusters required for the k-means or self-organizing map methods. For more information on the clustering tools of GeneSight, please refer to this application note.
Q. How do I plot out the distribution of my data from several different experimental conditions?
A. Using the time-series analysis tool in GeneSight, you can study the expression profile of every single gene across multiple experimental conditions, such as time points or drug doses. You can also filter out the genes following a particular expression pattern. For more information on the time-series analysis tool of GeneSight, please refer to this application note.
Q. How do I save a picture of a clustering result?
A. You may create an image file for a cluster, or any other graphic in GeneSight, by choosing Save Image As ... under the File menu. You will then have the file format choice of jpeg, gif or tif/tiff.
Q. How do I trace a gene as to which cluster or group it belongs to?
A. The 'Find Gene' tool embedded in the clustering plots in GeneSight (icon on top) does this job. Clicking on this button and selecting a gene from the list provided highlights, in yellow, the node of the cluster to which the selected gene belongs. Using this function in the partition panel in the lower-left of the GeneSight main window checks the box before the partition to which the selected gene belongs.
Q. How do I upload my Affymetrix text data?
A. These data can be imported using the alien text import wizard using the instructions described below:
1. Click on the 'Create New' icon on the top in the GeneSight main window, and it would bring up the 'DataSet Builder' window. Alternatively, you could also select the 'Dataset Builder' option under the 'Tools' menu.
2. Find your data files (tab delimited text format) from the stored directory on the left panel and drag them to the right panel.
3. Click on 'Done' icon on the top and it will bring up the alien text import wizard window.
4. In the 'Required Information' tab, type the appropriate numbers (1, 2 etc.) in the 'No. of Headers Rows' box and 'Gene Id Column No.' box, corresponding to your data file.
5. Click on 'Guess Column Name' button and 'OK' out the next window that indicates the columns found in the data.
6. After this operation, you will see your column names and their corresponding numbers showing up in the lower-left. Right click on the name of the gene id column in the lower-left and select 'Remove Row' option.
7. If the dataset contains both the signal and background columns, check the box for 'Contains both Signal & Background Columns' in the 'Required Information' tab. This will color code different columns, with the signal and background columns in each condition obtaining the same color for background adjustment.
8. If the dataset contains flagging information, click on the 'Other Info' tab. Type the number of the column containing the flagging info in the box next to 'Flag'. After this, right click on the name of the flagging column in the lower-left and select 'Remove Row' option.
9. If you require the ratio (experiment over control) to be calculated during data upload, click on the 'Pairing Info' tab. Click on the row under 'Experiment' and 'Control' and type the appropriate column numbers for experiment and control conditions.
10. If the dataset contains any genomic information (e.g., annotations), click on the 'Genomic Info' tab and type the appropriate column number containing this info by clicking in the row below 'Genome Data Column'. After this, right click on the name of the genomic info column in the lower-left and select 'Remove Row' option.
11. Click on 'Apply' to see the results (re-formatted data), then click on 'Apply & Close'.
12. In the next window, check the box for 'Use defaults for all Experiments' and 'OK' out this window. Now the parameters specified above in the alien text import wizard would be universally applied to all other data sets being uploaded.
13. This will take you to the GeneSight main window and you will be able to see your data columns on the top left and data on the top right, ready to be analyzed.
Q. How do I upload my GenePix data?
A. GeneSight is compatible with the gpr extension data files generated by GenePix. In the 'Dataset Builder' window, you will get a message indicating that GenePix files have been detected and whether you would like to upload the default columns, giving a choice between 'Yes' and 'No'. In case you want to upload certain selected columns, you should choose 'No' during that message, which would launch the alien text import wizard. You can then import any column(s) of your choice following the instructions for alien text import as described above.
Q. How do I upload my two channel experiment data, such as ImaGene generated?
A. These data can be uploaded using the instructions described below:
1. Click on the 'Create New' icon on the top in the GeneSight main window, and it would bring up the 'DataSet Builder' window. Alternatively, you could also select the 'Dataset Builder' option under the 'Tools' menu.
2. Your data would be in one of the stored directories in the left panel. Browse under the appropriate drive and locate your data files.
3. If the quantified data is in ImaGene format, i.e., two text files, one for each channel, select the data files for each channel one by one and drag and drop them into the appropriate 'Experiment' or 'Control' middle panel. Alternatively, you can select a data file and then click on the 'Add as Experiment' or 'Add as Control' button below.
4. Click on the 'Pair Data / Perform Ratio' button pointing towards right and it would add the data to the right panel after calculating the ratio of experiment over control.
4.1. If you have multiple replicates under both control and experiment (also applies to flip/swap dye experiments), click on the 'Perform Ratio & Add as Replicate' button pointing towards down. This would bring the data files to the lower-middle panel after calculating the ratios of experiment over control and combining all the replicates together. After this, click on the 'Add Repeated Experimental Conditions to Data Set' button below pointing towards right and this would add the data to the right panel.
4.2. If you have multiple replicates under both control and experiment and want to calculate the ratio of experiment over control but do not want to combine them as replicates, click on the 'Pair Data / Perform Ratio' button pointing towards right. It would add the data to the right panel after calculating the ratio of experiment over control for each set of replicates.
4.3. If you neither want to calculate the ratio of experiment over control nor want to combine the replicates, select your data files in the left panel and add them to the right panel by either dragging and dropping or clicking on the 'Add to Data Set' button below.
5. If any data file has been added by mistake, select this file and click on the 'Remove Selected' icon on the top.
6. Click on 'Done' icon on the top. This will take you to the GeneSight main window and you will be able to see your data columns on the top left and data on the top right, ready to be analyzed.
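The ratio and replicate options in steps 4 to 4.3 can be pictured with a minimal Python sketch. It assumes each channel is a mapping from gene id to intensity and that replicates are combined by an arithmetic mean; the function names, gene ids, and the averaging rule are illustrative assumptions, not GeneSight's documented behavior.

```python
from statistics import mean

def ratios(experiment, control):
    """Per-gene ratio of experiment over control.

    Genes missing from either channel, or with a zero control value, are skipped.
    """
    return {g: experiment[g] / control[g]
            for g in experiment
            if g in control and control[g] != 0}

def combine_replicates(ratio_sets):
    """Average each gene's ratio across replicates (assumed rule: arithmetic mean)."""
    genes = set.intersection(*(set(r) for r in ratio_sets))
    return {g: mean(r[g] for r in ratio_sets) for g in sorted(genes)}

# Two hypothetical replicate hybridizations, each with an experiment and a control channel.
rep1 = ratios({"geneA": 200.0, "geneB": 50.0}, {"geneA": 100.0, "geneB": 100.0})
rep2 = ratios({"geneA": 300.0, "geneB": 25.0}, {"geneA": 100.0, "geneB": 100.0})

# Comparable to 'Perform Ratio & Add as Replicate': one combined ratio per gene.
combined = combine_replicates([rep1, rep2])
print(combined)
```

Keeping `rep1` and `rep2` separate corresponds to step 4.2, where ratios are calculated but the replicates are not combined.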
Q. How much data can be analyzed by GeneSight?
A. GeneSight has no predefined limit on the number of genes or experiments that can be analyzed. The hardware specifications of the computer, mainly processor speed and available memory, are the main practical limitations.
Q. How to determine differentially regulated genes (up- and down-regulated) in the dataset?
A. Please follow the steps below:
1. In the GS main window, highlight the two means in the Cy3 and Cy5 (control and experiment) folders. Click
on the control column first and then on the experiment one, as the one selected first goes on the x-axis. Click
next on the 'Scatter Plot' icon on the top to generate a scatter plot.
2. In the scatter plot, click on the 'Reference' drop down menu and select the 'Log Fold
Reference' option. The genes between the middle three diagonal lines are similar to control genes and the
genes above and below these are differentially regulated. If the control is on X axis and experiment on the Y
axis, the genes above 'similar to control genes' are up-regulated and below are down-regulated.
3. In the GS main window, select the mean ratio column in the pair folder and click on the
'Histogram' icon on the top.
4. In the histogram, type -1 in the box next to 'Lower bound' and hit enter and type 1 in the box next
to 'Upper bound' and hit enter. This will highlight the up- and down-regulated genes in the
histogram in blue and give their numbers at the bottom. Assuming that you have used the log base 2
transformation in the data, 1 and -1 would indicate the 2 fold up- and down-regulation, respectively. At this
point, you will see the corresponding change in the scatter plot.
5. In the histogram, go to 'Sub-Selected Genes' menu and select the 'Create Partition' option
and give the name 'DiffReg' (or any other) to this subset. This partition will be highlighted in the
lower-left panel of the GS main window. Clicking on this partition will show its pie in the middle panel and
the contained gene ids in the right panel.
6. In the scatter plot, go to 'Color Scheme' menu and select this partition and the color of the control
and differentially regulated genes will be changed accordingly.
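The bounds used in step 4 can be checked by hand. The Python sketch below assumes log base 2 ratios and treats the bounds as inclusive; the function name, the gene ids, and the inclusive comparison are illustrative assumptions.

```python
import math

def classify(ratio, lower=-1.0, upper=1.0):
    """Label a gene from its experiment/control ratio using log2 bounds.

    With log base 2, +1 is a 2-fold increase and -1 is a 2-fold decrease.
    """
    log_ratio = math.log2(ratio)
    if log_ratio >= upper:
        return "up"
    if log_ratio <= lower:
        return "down"
    return "unchanged"

# Hypothetical mean ratios (experiment over control) for three genes.
mean_ratios = {"geneA": 4.0, "geneB": 0.25, "geneC": 1.2}
labels = {gene: classify(r) for gene, r in mean_ratios.items()}

# The genes that would go into a partition such as 'DiffReg'.
diff_reg = sorted(g for g, label in labels.items() if label != "unchanged")
print(labels, diff_reg)
```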
Q. I always put my control condition on the x-axis of scatter plot, how do I do it?
A. Click on the control column first and then on the experiment one in the upper-left of the GeneSight main window. The column selected first goes on the x-axis. Click next on the 'Scatter Plot' icon on the top to generate a scatter plot.
Q. I did a dye-swap experiment, how do I combine quantifications from different hybridizations to one dataset?
A. In the 'Dataset Builder' window, bring the data files under their appropriate control and experiment panels so that each panel contains both the channels. Next, click on the 'Perform Ratio & Add as Replicate' button pointing towards down. This would bring the data files to the lower-middle panel after calculating the ratios of experiment over control and combining all the replicates together. After this, click on the 'Add Repeated Experimental Conditions to Data Set' button below pointing towards right and this would add the data to the right panel. Finally, click on 'Done' icon on the top. This will take you to the GeneSight main window and you will be able to see your data columns on the top left and data on the top right, ready to be analyzed.
Q. I have a list of housekeeping genes, how do I normalize my data against them?
A. For normalization using some but not all of the genes (housekeeping genes, HKGs, or positive controls),
please follow the steps below:
1. Save the gene ids of your HKGs or positive controls in a text file. They should be in a single column with no
column heading, and should match exactly the ids as they appear in the loaded dataset.
2. While you are in the data preparation window, double click on the normalization icon to open it up to change
the default settings.
3. In the 'Select the genes to normalize with' drop down menu, select the 'Select Genes using a
File' option.
4. Click on the 'Browse' button under 'Filename' and select & upload your file.
5. 'OK' out this window unless you would want to change the defaults for the options below.
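As a rough picture of what normalizing against a gene list involves, the Python sketch below reads ids from a one-column file with no heading and divides every value by the mean of the listed genes. The divide-by-mean rule, the function names, and the gene ids are assumptions made for illustration; the options actually applied are the ones chosen in the normalization dialog.

```python
import io
from statistics import mean

def read_gene_ids(handle):
    """Read one gene id per line, with no column heading; blank lines are ignored."""
    return [line.strip() for line in handle if line.strip()]

def normalize_by_genes(values, reference_ids):
    """Divide every value by the mean of the reference genes (assumed rule)."""
    reference = [values[g] for g in reference_ids if g in values]
    if not reference:
        raise ValueError("none of the reference gene ids match the dataset")
    scale = mean(reference)
    return {gene: v / scale for gene, v in values.items()}

# An in-memory stand-in for the text file of housekeeping gene ids.
hkg_file = io.StringIO("ACTB\nGAPDH\n")
values = {"ACTB": 100.0, "GAPDH": 300.0, "geneX": 400.0}

normalized = normalize_by_genes(values, read_gene_ids(hkg_file))
print(normalized)
```

The `ValueError` illustrates why the ids in the file must match the loaded dataset exactly: an id that does not match contributes nothing to the reference mean.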
Q. I heard Lowess Normalization is getting popular, what is it? Does GeneSight have Lowess normalization?
A. Yes, GeneSight offers Lowess as one of the normalization options. The following parameters may be specified:
1. Smoothing parameter: The level of influence of a spot's neighbors on its normalization adjustment.
Higher values mean that the normalization is more continuous across spots.
2. Linear/quadratic: The assumed shape of the curve relating interchannel bias to spot brightness. Quadratic
provides greater flexibility.
3. Normalization Scope: You can group all spots together ("Global" choice) or separate spots into
groups by their meta-grid ("Print-tip").
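To make the smoothing parameter and the linear option more concrete, here is a minimal Python sketch of a locally weighted linear fit of interchannel bias against spot brightness. It is a simplified, generic Lowess (tricube weights, no robustness iterations), not GeneSight's implementation; the names `brightness`, `bias`, and `frac` are assumptions, with `frac` playing the role of the smoothing parameter.

```python
def lowess_linear(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at every x."""
    n = len(x)
    k = min(n, max(2, round(frac * n)))   # neighbors used in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12         # window half-width (k-th nearest distance)
        # Tricube weights: 1 at x0, falling to 0 at the edge of the window.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical spots whose bias grows linearly with brightness.
brightness = [float(i) for i in range(10)]
bias = [0.3 * a + 0.5 for a in brightness]

trend = lowess_linear(brightness, bias, frac=0.5)
normalized = [m - t for m, t in zip(bias, trend)]   # bias left after removing the trend
print(normalized)
```

A larger `frac` widens each window, so the fitted curve changes more gradually across spots. For a "Print-tip" style scope, the same fit would be run separately on the spots of each meta-grid.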
Q. I prefer to calculate ratios and log in Excel and then upload this transformed data. Is it possible?
A. Yes, it can be imported using the alien text import wizard. See FAQ 493.
Q. I want to conduct expression profiling in my time-course study without using clustering. Does GeneSight provide this feature?
A. Yes, the time-series analysis tool of GeneSight draws out the expression profiles of each gene in different time points, and also offers profile matching for expression trend as well as absolute value to a certain gene or a user set template. Multiple distance metrics including Correlation and a variety of Euclidean algorithms are available for the similarity calculation. For more information on the time-series analysis tool of GeneSight, please refer to this application note.
Q. I want to upload both gene ids and gene names. Is it possible?
A. Yes, the gene names can be imported during the alien data import as 'Genomic Info'. The instructions for that are described in FAQ 490.
Q. If I want to change my transformation sequence after transforming the data, can I go back and do that without having to load the data again?
A. Yes. Simply go to the 'Data Preparation' window while the raw data is already loaded and change your transformation sequence. Applying the new transformation sequence will update the transformed data.
Q. In the clustering plots, what are those horizontal and vertical number lines on the lower left and upper left of the rows and columns?
A. Those lines are cluster distance scales for genes (rows) or experiments (columns). Specifically, for each cluster we calculate the average squared distance of the clustered elements from the cluster center. We may call this quantity the dispersion of the cluster. The dendrogram cross-bar for the cluster is positioned along the graduated axis to indicate the dispersion for the cluster. Low dispersion values indicate tight clusters, meaning that the elements within the cluster have similar expression profiles.
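The dispersion described above can be written out directly. This Python sketch assumes each clustered element is a list of expression values and that the cluster center is the per-condition mean; the function name and sample profiles are illustrative.

```python
def dispersion(profiles):
    """Average squared Euclidean distance of the profiles from their cluster center."""
    n = len(profiles)
    dims = len(profiles[0])
    # Cluster center: mean of each condition across all elements (assumed definition).
    center = [sum(p[i] for p in profiles) / n for i in range(dims)]
    squared = [sum((p[i] - center[i]) ** 2 for i in range(dims)) for p in profiles]
    return sum(squared) / n

tight = [[1.0, 2.0], [1.0, 2.0]]   # identical profiles
loose = [[0.0, 0.0], [2.0, 0.0]]   # profiles one unit either side of the center

print(dispersion(tight), dispersion(loose))
```

The tight cluster has a dispersion of 0 and the loose one a dispersion of 1, so its cross-bar would sit further along the scale.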
Q. In the significance analysis test, can I filter out genes with low p-values for generating reports and further analysis?
A. Clicking on the 'p-value' column heading in the significance analysis window will sort the p-values in ascending order. The most significantly different genes, i.e., those with the lowest p-values, will therefore be shown at the top. You can highlight all the gene rows of interest (e.g., genes with p-values less than 0.05) and select the 'Create Partition' option under the 'Sub-Selected Genes' menu. Give this partition a name (e.g., p < 0.05) and it will be added to the partition panel (lower left) in the GeneSight main window. Clicking on this partition will show its pie in the middle panel and the contained gene ids in the right panel.
Q. Is it possible to combine two groups of interesting genes?
A. Using the 'Union' function in GeneSight, you can combine any two groups. These groups first have to be placed as partitions in the partition panel in the lower-left of the GeneSight main window. Checking the boxes before these groups and using the 'Union' function will combine these groups and show their combined gene ids in the lower-right.
Q. Is it possible to compute ratios and combine replicates during data upload?
A. Yes, this is explained in FAQ 495 and 496.
Q. Is it possible to filter out clusters of interest for further analysis or generating reports?
A. Once you are satisfied with your clusters, you can partition them out by selecting the 'Make Partition' button on the right. After giving a name to this partition, different clusters will be shown as partitions in the partition panel in the lower-left of the GeneSight main window. Clicking on any partition (cluster) will show its gene ids in the panel on the right. If you want to filter out only one cluster, clicking on its node would select that particular cluster only. Next, select the 'Create Partition' option under the 'Sub-Selected Genes' menu. Give a name to this cluster and it will be added as a partition in the partition panel in the lower-left of the GeneSight main window. For more information on the clustering tools of GeneSight, please refer to this application note.
Q. Is it possible to filter out common genes between two groups of interesting genes?
A. Using the 'Intersection' function in GeneSight, you can filter out the commonalities between any two groups. These groups first have to be placed as partitions in the partition panel in the lower-left of the GeneSight main window. Checking the boxes before these groups and using the 'Intersection' function will capture the common genes between these groups and show their gene ids in the lower-right.
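The 'Intersection' function, like the 'Union' function covered in this FAQ, behaves like an ordinary set operation on gene ids. The Python sketch below uses two made-up partitions; the partition names and gene ids are hypothetical.

```python
# Two hypothetical partitions of gene ids.
diff_reg = {"geneA", "geneB", "geneC"}      # e.g. differentially regulated genes
significant = {"geneB", "geneC", "geneD"}   # e.g. genes with low p-values

common = diff_reg & significant      # 'Intersection': genes present in both groups
combined = diff_reg | significant    # 'Union': genes present in either group

print(sorted(common))
print(sorted(combined))
```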
Q. Is it possible to filter-out genes following a particular expression pattern using time-series analysis?
A. After arriving at a pattern of your choice, select the 'Create Partition' option under the 'Sub-Selected Genes' menu. Give a name to this pattern and it will be added as a partition in the partition panel in the lower-left of the GeneSight main window. For defining a pattern and other additional information on the time-series analysis tool of GeneSight, please refer to this application note.
Q. Is it possible to filter-out genes following the expression pattern of a "particular gene" using time-series analysis?
A. This can be accomplished using the 'Template Matcher' option under the 'Tools' menu. Selecting a gene from the gene id list in the 'Template Matcher' draws its template and the time-series plot can then highlight other genes following this template. After retrieving a pattern of your choice, select the 'Create Partition' option under the 'Sub-Selected Genes' menu. Give a name to this pattern and it will be added as a partition in the partition panel in the lower-left of the GeneSight main window. For using the 'Template Matcher' option and other additional information on the time-series analysis tool of GeneSight, please refer to this application note.
Q. Is it possible to filter-out genes following the expression pattern of a template defined by me?
A. After defining a pattern template and obtaining the genes following that pattern, select the 'Create Partition' option under the 'Sub-Selected Genes' menu. Give a name to this pattern and it will be added as a partition in the partition panel in the lower-left of the GeneSight main window. For defining a pattern template and other additional information on the time-series analysis tool of GeneSight, please refer to this application note.
Q. Is it possible to generate a report on only certain selected genes?
A. To generate a report on a particular partition only (e.g., differentially regulated genes, highly confident or significant genes), check the box before this partition in the partition panel (lower-left in the GeneSight main window) and then click on the 'Sub-Select' button at the bottom. A report generated now will include only these selected genes. To cancel this selection, click on the 'Reset Schemes' button at the bottom.
Q. Is it possible to obtain a statistically confident group of genes?
A. This can be accomplished using the confidence analyzer tool of GeneSight following the steps below:
1. Select the mean ratio column in the pair folder and select the 'Confidence Analyzer' option under
the 'Tools' menu.
2. The default selection for the regulation is 'Up- and Down-Regulated' that can be changed to either
up- or down-regulated.
3. Type the down- and up-regulation levels in their appropriate boxes (e.g., 2 for 2-fold
in the normal data and 1 for 2-fold in the log base 2 transformed data).
4. Check the 'Advanced controls' box below.
5. The default value selected for confidence level is 95%, which can be changed as desired.
6. Click on the 'Apply' button below and it will indicate at the bottom the number of genes passing
this statistical confidence criterion.
7. If the scatter plot and histogram are opened up, you will see the corresponding changes in them.
8. Go to 'Sub-Selected Genes' menu and select the 'Create Partition' option and give the
name 'Confidence' (or any other) to the subset. This partition will show up in the lower-left panel
of the GS main window. Clicking on this partition will show its pie in the middle panel and the contained
gene ids in the right panel.
Q. Is it possible to study the expression data distribution of various channels or slides in one single plot including standard deviation bars and outliers, if any?
A. The Box Plot tool of GeneSight offers this functionality. Using this plot, the users can visualize the distribution of gene expression in each array or experimental condition, their central value, and also the outliers beyond the standard error bars. The x-axis is discretized into categorical bins and the y-axis indicates the gene expression. The box indicates lower and upper percentiles of distribution. These percentiles can be changed using scroll bars on the left hand side. The majority of the genes in each bin are lumped into the boxes, the outliers are shown as individual points. The box plot has two modes in GeneSight: (a) If you choose one condition, the values for that condition are binned according to the source microarray's meta-grid, (b) If you choose multiple conditions, the conditions serve as the bins.
Q. Is it possible to view the GeneSight generated reports in Excel?
A. GeneSight saves the analysis report as a text file, which can readily be opened up in Excel.
Q. Is it possible to visualize the entire data on a scatter plot using either two channels or two experimental conditions?
A. This can be accomplished by following the steps below:
1. In the GS main window, highlight the two means in the Cy3 and Cy5 (control and experiment) folders. Click
on the control column first and then on the experiment one, as the one selected first goes on the x-axis. Click
next on the 'Scatter Plot' icon on the top to generate a scatter plot.
2. In the scatter plot, click on the 'Reference' drop down menu and select the 'Log Fold
Reference' option. The genes between the middle three diagonal lines are similar to control genes and the
genes above and below these are differentially regulated. If the control is on X axis and experiment on the Y
axis, the genes above 'similar to control genes' are up-regulated and below are down-regulated.
Q. Is it possible to visualize the entire gene population in a 3-dimensional space color coded by different variables or conditions?
A. This is possible using the principal component analysis (PCA) plot of GeneSight. The users can select any three variables (experimental conditions or arrays) and visualize the gene expression in these three variables, revolving in a three dimensional space. The genes can also be color-coded by a fourth variable, e.g., gender, age, significant genes, differentially regulated genes, etc.
Q. My data also contains gene annotations. Is it possible to upload them too along with the data?
A. Yes, the annotations can be imported during the alien data import as 'Genomic Info'. The instructions for that are described in FAQ 495.
Q. My experiment is to determine whether the genes in different experimental conditions (such as control and experiment) have significant differences in their expression levels. Can GeneSight do that?
A. The significance analysis tool of GeneSight can determine the genes significantly different in expression across multiple experimental conditions using the parametric and non-parametric statistical tests. Below are the instructions to conduct significance analysis on your data:
Requirements:
1. DO NOT combine the replicates during either the data upload or data transformation.
2. There should be at least two groups to compare.
3. There should be at least two replicates in each group.
Steps:
1. Highlight your data columns in the GeneSight main window, go to 'Tools' and select 'Partition
Editor'.
2. From the 'Manage Partition' menu, select the 'Create Partition' option. Give a partition
name, e.g., 'Group' and click on 'OK'.
3. Now you need to add groups (at least two) under this partition to perform a significance test. Assuming that
you want two groups, right click on the 'Group' twice and select 'Add Group' option.
4. Right click on the two new groups, select the option 'Edit Group Name' and give them
meaningful names like 'Control' and 'Experiment'.
5. Right click on the two new groups and select the option 'Edit Color' and select two different colors
for the two groups.
6. Select Group 1, and highlight the conditions columns at the bottom that you want added to this group (at least
two replicates). Now, right click on these condition columns and select 'Add Selected' option. Do
the same for the second group. These columns will be added to the 'Add Candidates' panel on the
right under their respective groups.
7. Verify the columns added to each group by selecting the individual groups. If a column has been
added by mistake, right click on this column and select the 'Remove Selected' option. Close this
window.
8. In the GeneSight main window, highlight the data columns added to the partition groups and select
'Significance Analyzer' option from the 'Tools' menu.
9. You should see your chosen columns in different colors in this window. If they are not in different colors, go
to 'Color Scheme' menu on the top and select your partition name to assign your chosen colors.
10. Click on the 'Compute overall significance (p-value)' label to compute the overall significance
p-value between the groups, which will be shown in the text box next to it.
11. Select the statistical test you want to perform under the 'Please choose statistical test to perform'
drop down menu to compute the individual p-values for each gene among different experimental conditions.
These p-values will be calculated and listed in the 'p-value' column below for each gene.
12. Clicking on the 'p-value' column heading will sort the p-values in ascending order to filter out the
genes under certain p-value threshold. The highly significantly different genes under the chosen
experimental conditions or the ones with the low p-values will, therefore, be shown on the top.
13. You can highlight all the gene rows of interest (e.g., genes with p-values less than 0.05) and select
'Create Partition' option under the 'Sub-Selected Genes' menu. Give this partition a
name (e.g., p < 0.05) and this partition will be added to the partition panel (lower left) in the GS main
window. Clicking on this partition will show its pie in the middle panel and the contained gene ids in the
right panel.
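To show what a per-gene p-value between two groups of replicates means, here is a small Python sketch of an exact permutation test on the difference of group means. This is one generic non-parametric test chosen for illustration; it is an assumption, not one of the tests listed in the GeneSight drop-down menu, and the function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_b) - mean(group_a))
    hits = total = 0
    # Try every way of relabeling the replicates into two groups of the same sizes.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(b) - mean(a)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total

# Hypothetical log-ratio replicates per gene: (control group, experiment group).
data = {
    "geneA": ([1.0, 1.1, 0.9], [2.0, 2.1, 1.9]),   # clearly separated groups
    "geneB": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),   # no difference between groups
}
p_values = {gene: permutation_p_value(c, e) for gene, (c, e) in data.items()}

# Sort ascending, as clicking the 'p-value' column heading does.
ranked = sorted(p_values, key=p_values.get)
print(p_values, ranked)
```

With only three replicates per group there are 20 possible relabelings, so the smallest attainable two-sided p-value in this sketch is 2/20 = 0.1. This illustrates why the number of replicates limits how small a p-value can become.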
Q. Should I use "Divide By Mean" or "Subtract Mean" in Normalization?
A. If you are using normalization before the log transformation, use "Divide By Mean". If you are using normalization after the log transformation, use "Subtract Mean".
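The reasoning behind this rule is that a division on the raw scale becomes a subtraction on the log scale. The Python sketch below checks this with an arbitrary scale factor; the values and the scale factor are illustrative, and exactly which mean GeneSight computes in each mode is not stated here.

```python
import math

values = [100.0, 200.0, 400.0]   # hypothetical raw intensities
scale = 200.0                    # hypothetical normalization constant

# Normalize first (divide), then take log base 2.
divide_then_log = [math.log2(v / scale) for v in values]

# Take log base 2 first, then normalize (subtract the log of the same constant).
log_then_subtract = [math.log2(v) - math.log2(scale) for v in values]

same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(divide_then_log, log_then_subtract))
print(divide_then_log, log_then_subtract, same)
```

Dividing already log-transformed values by a mean would rescale the log ratios rather than shift them, which is why subtraction is the matching operation after the log transformation.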
Q. The box plot confuses me. What do the box, lines, and points stand for?
A. The box plot shows the distribution of gene expression in each array or experimental condition, its central value, and the outliers beyond the standard error bars. The x-axis is discretized into categorical bins and the y-axis indicates the gene expression. The box indicates the lower and upper percentiles of the distribution. These percentiles can be changed using the scroll bars on the left-hand side. The majority of the genes in each bin are lumped into the boxes; the outliers are shown as individual points. The box plot has two modes in GeneSight: (a) if you choose one condition, the values for that condition are binned according to the source microarray's meta-grid; (b) if you choose multiple conditions, the conditions serve as the bins.
Q. What are the functions of the two buttons, "Pair Data / Perform Ratio" and "Perform Ratio & Add as Replicate", in the Dataset Builder window?
A. 'Pair Data / Perform Ratio' button pointing towards right adds the data to the right panel after calculating the ratio of experiment over control. 'Perform Ratio & Add as Replicate' button pointing down brings the data files to the lower-middle panel after calculating the ratios of experiment over control and combining all the replicates together. After this, click on the 'Add Repeated Experimental Conditions to Data Set' button below pointing towards right and this would add the data to the right panel.
Q. What cluster linkage choices for hierarchical clustering are included in GeneSight?
A. For linking the clusters, GeneSight offers six different cluster linkage choices:
Q. What data transformations are offered by GeneSight?
A. GeneSight offers ten different transformations, most having multiple options to choose from. These
transformations are as follows:
1. Background correction
2. Omit multiple flagged spots
3. Combine replicates
4. Fill in missing values
5. Floor
6. Shifted log
7. Ratio
8. Difference
9. Omit low expression levels
10. Normalization
Q. What does Confidence Analyzer do?
A. GeneSight offers the Confidence analysis tool for the statistical differential regulation level determination. This algorithm is based on the bootstrapping algorithm used by the researchers at Jackson laboratory (their paper) and determines the membership of gene to a group (differentially regulated group or a cluster) based on the information retrieved for its replicates. In order to pass the confidence test, all the replicates of a gene should fall into the same group.
In the context of spot brightness, the confidence analyzer 'bins' the spots according to spot brightness, setting the regulation threshold differently for dim spots vs. bright spots. In short, the confidence analyzer establishes confidence levels, beyond which genes are selected as differentially regulated, by pooling the replicate differences from a population together into an empirical distribution of residuals (noise). Our experience has shown that bright and dim spots in the same data set can have different noise statistics, so the confidence analyzer allows the selection of multiple brightness 'bins', each of which is analyzed separately. The result is that when using the confidence analyzer, for a certain confidence level, the differential regulation level may be different for different brightness bins.
Q. What does the chromosome map mean?
A. A chromosomal map displays expression information at the chromosomal position of each gene. Each row displayed on the left side of this window is a chromosome. Base pairs are shown along the chromosome and their numbers are displayed to the right of the chromosome. Each experimental condition selected in the GeneSight main window is displayed on the right side of the window along with the type of organism you have selected.
Q. What file format is required by GeneSight to import?
A. A tab-delimited text file format.
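For reference, a tab-delimited text file has one row per line, with the columns separated by tab characters. The Python sketch below parses a small in-memory example using the standard `csv` module; the column names and values are made up for illustration and are not required by GeneSight.

```python
import csv
import io

# A hypothetical tab-delimited file: a header row, then one row per gene.
text = (
    "GeneID\tSignal\tBackground\n"
    "geneA\t1200\t100\n"
    "geneB\t800\t150\n"
)

rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Convert the numeric columns from text to numbers.
signals = {row["GeneID"]: float(row["Signal"]) for row in rows}
print(signals)
```

A file saved from Excel as tab-delimited text has this same layout.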
Q. What is cluster confidence and how do I carry it out?
A. GeneSight offers the Confidence analysis tool for the statistical differential regulation level
determination. This algorithm is based on the bootstrapping algorithm used by the researchers at Jackson
laboratory (their paper) and determines the membership of gene to a group (differentially regulated group or a
cluster) based on the information retrieved for its replicates.
Please follow the steps below for conducting cluster confidence analysis:
1. Change the color map in the cluster plot (lower-right) to white.
2. Click on the 'Cluster Confidence' button (lower-right) in the cluster plot.
3. In the next 'Cluster Confidence Analysis' window, either type or slide the bar to select a confidence
level, e.g., 0.95 for 95% confidence.
4. Clicking back in the cluster plot would highlight the rows in gray for the genes that passed the confidence
test.
5. Go to 'Sub-Selected Genes' menu and select the 'Create Partition' option and give the
name 'Confident Genes' (or any other) to this subset. This partition will show up in the partition panel
in the lower-left of the GeneSight main window. Clicking on this partition will show its pie in the middle panel
and the contained gene ids in the right panel.
Q. What is Cluster Enrichment Analysis?
A. This tool categorizes clusters of genes, based on the annotations of the genes in the clusters. In doing so, it
Q. What is confidence analysis?
A. GeneSight offers the Confidence analysis tool for the statistical differential regulation level determination. This algorithm is based on the bootstrapping algorithm used by the researchers at Jackson laboratory (their paper) and determines the membership of gene to a group (differentially regulated group or a cluster) based on the information retrieved for its replicates.
In the context of spot brightness, the confidence analyzer 'bins' the spots according to spot brightness, setting the regulation threshold differently for dim spots vs. bright spots. In short, the confidence analyzer establishes confidence levels, beyond which genes are selected as differentially regulated, by pooling the replicate differences from a population together into an empirical distribution of residuals (noise). Our experience has shown that bright and dim spots in the same data set can have different noise statistics, so the confidence analyzer allows the selection of multiple brightness 'bins', each of which is analyzed separately. The result is that when using the confidence analyzer, for a certain confidence level, the differential regulation level may be different for different brightness bins.
Q. What is special about Subselect?
A. Subselect allows you to select a particular list or lists of genes and perform subsequent analysis only on those genes of your choice.
Q. What is Template Matcher, how is it used in Time Series?
A. The Template Matcher is designed to find patterns of expression based on a particular template of choice. For example, you may want to find genes whose expression changed from low to medium to high over time or treatment. You can specify the expression values in each condition (create a template), and ask how many genes behave in a similar manner to your template. The similarity can be determined using several distance metrics, and a threshold value can be chosen for each method. Once a template is saved in Template Matcher, it is added to the gene list. So when you open Time Series, it is the last one in the gene list (i.e., the panel at the right). Once you click on this template, it is highlighted (selected) in the graph. Now you can choose the distance metric in combination with a threshold to find how many genes match your template.
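The idea of matching a template with a distance metric and a threshold can be sketched in Python as follows. The two metrics shown (Pearson correlation for trend, Euclidean distance for absolute value) are generic versions; the function names, thresholds, and profiles are illustrative assumptions rather than GeneSight's own definitions.

```python
import math

def euclidean(a, b):
    """Euclidean distance: sensitive to absolute expression values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation: sensitive to the trend, not the magnitude."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0   # a flat profile has no defined trend
    return cov / math.sqrt(va * vb)

def match_template(profiles, template, metric, threshold):
    """Return the gene ids whose profile matches the template under the chosen metric."""
    matches = []
    for gene, profile in profiles.items():
        if metric == "correlation":
            ok = pearson(profile, template) >= threshold
        else:
            ok = euclidean(profile, template) <= threshold
        if ok:
            matches.append(gene)
    return sorted(matches)

template = [1.0, 2.0, 3.0]            # low -> medium -> high
profiles = {
    "geneA": [10.0, 20.0, 30.0],      # same trend, much larger values
    "geneB": [3.0, 2.0, 1.0],         # opposite trend
    "geneC": [1.1, 2.0, 2.9],         # same trend and similar values
}

by_trend = match_template(profiles, template, "correlation", 0.9)
by_value = match_template(profiles, template, "euclidean", 0.5)
print(by_trend, by_value)
```

The correlation metric accepts geneA because only the shape of the profile matters, while the Euclidean metric keeps only geneC, whose absolute values are close to the template.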
Q. What is the function of GenePie?
A. The gene pie indicates the expression of a gene in different experimental conditions. The different color-coded partitions within the pie indicate different experimental conditions. The names of these conditions are indicated at the bottom of the window. If you have selected any group (such as cluster) before drawing out the gene pie plot, the color in the background of every pie would indicate the group or cluster this gene belongs to.
Q. What is the purpose of make partitions?
A. The purpose of make partitions is to classify genes or experiments. Let us say you have generated different partitions (groups) of interesting genes using several different analysis tools. You can retrieve any of those genes and perform additional analysis in the future. What is unique in GeneSight is the ability to make union (combine all genes) or intersection (what is common to all the