Coffalyser.Net Analysis workflow
Transcript of Coffalyser.Net Analysis workflow
The different length of every probe in the MLPA kit then allows these products to be separated and measured using standard capillary fragment electrophoresis. The unique length of every probe in the probe mix is used to associate the detected signals back to the original probe sequences.
The MLPA product signals
The generation of a probe amplification product depends on the presence in the sample DNA of a small (50-80 nt) sequence that is detected by that particular probe. For each probe oligonucleotide in an MLPA kit there are about 600.000.000 copies present during the overnight incubation, while an average MLPA reaction contains 60 ng of human sample DNA, which corresponds to about 20.000 haploid genomes. This abundance of probes relative to the sample DNA ensures that all target sequences in the sample are covered. The large excess of probe oligonucleotides that are not ligated does not have to be removed, as they contain only one of the two PCR primer sequences and therefore cannot be amplified exponentially.
The total amount of product that is produced for each probe depends on the amount of target sequence available and the PCR efficiency of the template (ligated probes) during the PCR reaction. The signal detected on your capillary device is furthermore influenced by the amount of product that is injected into the capillaries and the detection sensitivity of the system, which makes the absolute measurements hard to interpret on their own. The use of a single primer pair during the PCR reaction ensures all probe signals are in the same range, but due to differences in the probe sequences the individual signals will differ somewhat in magnitude, making a direct comparison of probes within a run (intra normalisation) impossible. We can however assume that the probe product measurements are proportional to the amount of the target sequences present. We can also assume that, compared to a reference DNA sample known to have two copies of the chromosome, the signal strengths of the probes are expected to be 1.5 times the intensities of the respective probes from the reference if an extra copy is present. If only one copy is present the proportion is expected to be 0.5. If the sample has two copies, the relative probe strengths are expected to be equal.
Data analysis setup
Data normalisation
To make MLPA data easier to understand, unknown and reference samples have to be brought onto a common scale. This can be done by normalisation: the division of multiple sets of data by a common variable in order to cancel out that variable's effect on the data. In MLPA kits, so-called reference probes are usually added, which target chromosomal regions that are assumed to remain normal (diploid) in the DNA of applicable samples. The results of data normalisation are probe ratios (dosage quotients, DQ), which display the balance of the measured signal intensities between sample and reference.
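The expected probe ratios mentioned above (1.5 for an extra copy, 0.5 for a single copy, 1.0 for two copies) follow from dividing the copy number in the sample by the copy number in the reference. A minimal sketch of that arithmetic, where the function name `expected_ratio` is an illustrative assumption and not part of Coffalyser.Net:

```python
def expected_ratio(sample_copies: int, reference_copies: int = 2) -> float:
    """Expected probe ratio, assuming signal is proportional to copy number.

    The reference sample is assumed to be diploid (two copies) by default.
    """
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")
    return sample_copies / reference_copies


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, "copies ->", expected_ratio(copies))
```

Measured ratios will scatter around these values, which is why the workflow compares them to borders instead of expecting exact matches.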
In most MLPA studies, gains and losses are recognised by comparing the calculated MLPA probe ratios to a set of arbitrary borders. Probe ratios below 0.7 or above 1.3 are for instance regarded as indicative of a heterozygous deletion (copy number change from two to one) or duplication (copy number change from two to three), respectively. During the normalisation we assume that the reference probes (normalisation constant) are diploid or have an equal copy number status in the unknown samples as compared to the reference samples. Because there may be situations where this is not the case, we use each reference probe separately for a normalisation. By subsequently taking the median over all produced ratios, some variation in the reference probe collection is permitted without compromising the final result.
[Figure: Normalisation of test probe 1]
The largest problem for MLPA data normalisation is structural variation between the test (unknown) samples and the reference samples. Variation between the test and reference samples should be minimised by treating them as equally as possible during the entire workflow. To ensure the quality of the entire workflow, any variation that is introduced should be measured; this data can then be used for troubleshooting as well as to aid in result interpretation by distribution statistics. Two measurements are needed to estimate the amount of variation between the test and reference samples: how reproducibly multiple samples with the same genetic content perform within the experiment, and how much the reference probe signals differ over all samples. Probes that give variable results across the reference samples are not reproducible and cannot provide reliable results in any of the samples. This variation is usually introduced in the experimental procedure.
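The normalisation described above, in which each reference probe is used separately and the median is taken over the resulting ratios, can be sketched as follows. This is an illustration under stated assumptions, not Coffalyser.Net's actual implementation: the function names, the dictionary layout (probe name to signal) and the labels returned by `classify` are all hypothetical. Only the 0.7 and 1.3 borders come from the text.

```python
from statistics import median


def dosage_quotient(sample, reference, test_probe, reference_probes):
    """Median ratio of one test probe over all reference probes.

    `sample` and `reference` map probe names to signal intensities
    (assumed layout). For each reference probe the test probe signal is
    normalised within the sample and within the reference sample, and
    the two normalised values are divided.
    """
    ratios = []
    for ref_probe in reference_probes:
        sample_norm = sample[test_probe] / sample[ref_probe]
        reference_norm = reference[test_probe] / reference[ref_probe]
        ratios.append(sample_norm / reference_norm)
    # The median tolerates a few aberrant reference probes.
    return median(ratios)


def classify(dq, lower=0.7, upper=1.3):
    """Compare a dosage quotient to the arbitrary borders from the text."""
    if dq < lower:
        return "loss"
    if dq > upper:
        return "gain"
    return "normal"


if __name__ == "__main__":
    sample = {"t1": 500.0, "r1": 1000.0, "r2": 2000.0, "r3": 1500.0}
    reference = {"t1": 1000.0, "r1": 1000.0, "r2": 2000.0, "r3": 1500.0}
    dq = dosage_quotient(sample, reference, "t1", ["r1", "r2", "r3"])
    print(dq, classify(dq))
```

Because the median is used instead of the mean, a single reference probe that is itself aberrant in the test sample shifts only one of the ratios and usually leaves the final value unchanged.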
Variation in the reference probes between test and reference samples may cause a problem for the normalisation and can have two causes: (a) the reference probes contain aberrations in the test samples, or (b) the variation is caused by differences in treatment of the test and reference samples. In the first case, your samples may be extracted from tumour material and the aberrations are not unexpected (please note that when there are too many aberrant reference probes, the whole copy number estimation may change). In the second case you need to consider that the variation found in the reference probes may also exist in the test probes, making these results less reliable. In general we recognise two kinds of variation: lower signals of the longer MLPA probes, and probe-specific variation. In both cases we can measure this by the relative differences in reference probes. A drop in signal with increasing probe size may however be corrected for by regression analysis, while variation related to a specific probe cannot be corrected. The choice of test and reference material is therefore one of the most important steps in your MLPA workflow.
Data interpretation
Data interpretation should be a combination of comparing calculated ratios to arbitrary borders and comparing these results to the performance of the probes in the distribution statistics of each sample type.
Sample selection
To minimise the differences in components, other than DNA, that are present in the samples, it is recommended that test and reference samples are of the same tissue type. Different tissue types may contain different concentrations of molecules that could influence the Taq fidelity. By using equal tissue types for reference and test samples, the differences in amplification efficiency of the probes should be minimal.
In case you are analysing tumour samples, the best reference samples may come from the resection margins that were recognised by a pathologist as completely healthy, normal cells. You can then either match each sample against its own reference sample or create a pool of normal mixed DNA.
In case you want to pool a larger set of reference samples, it is advised to first test run the selection of reference samples you want to use. By first analysing these samples you may omit using any reference samples that show abnormal results or have increased variation.
Because it is a great advantage to have the signal intensities of all samples in the same range, it is recommended to make working solutions of equal concentration for all samples. It is advised to use concentrations at the lower end of the possible concentration range for MLPA. We recommend using 50 ng for all MLPA reactions.
Type Test Samples = Type Reference Samples
To minimise the effects of contaminants, always use sample and reference DNA extracted with the same method and derived from the same source. For instance, when testing DNA extracted from FFPE breast tumour tissue, compare this to similarly extracted FFPE samples from healthy breast tissue (the sample does not have to be from the same individual). It is also recommended to use the same fixation method for reference and test samples, since this may either influence the polymerase or create structural differences in the DNA that may influence the number of probes that can bind.
The MLPA PCR reaction is more sensitive to certain impurities than conventional PCR. PCR inhibitors that can be present in a DNA sample as a result of the DNA purification method used include phenol/trizol remnants, SDS and Fe containing magnetic beads. Laboratories which experienced problems with DNA samples reported later that these could be attributed to left-over phenol and ethanol remnants. DNA samples should furthermore not contain more than 1 mM EDTA as EDTA binds magnesium. A sufficiently high concentration of unbound magnesium (present in Ligase buffer B) is required for the ligation and PCR reaction. High EDTA concentrations that affect the MLPA PCR reaction can be present in DNA samples that have been concentrated by evaporation or SpeedVac. The mode of action for many PCR inhibitors is reduction of polymerase activity. Not all MLPA probes react the same way to this reduced polymerase activity. While most probes will not be affected, some may show a reduction in peak height whereas yet others will increase in relative signal.
Please check http://www.mlpa.com for more information on recommended extraction methods.
Extraction Method Test Samples = Extraction Method Reference Samples
To ensure reference sample quality and obtain a good impression of the reproducibility of the probes over the entire experiment, reference samples should be strategically dispersed. At least 3 reference samples are required for distribution statistics. In case you are running more than 24 samples, one extra reference sample per row is recommended to maintain quality assessment over the entire experiment. Since no data normalisation can be done without the reference samples, they should be of the highest possible quality. If you are using strips of 8 tubes, place the first reference on position A4. On the second row place it at position B5. On the third row place it at position C3, then D6, E2 etc. This setup will not only ensure optimal reference sample and quality control during the experiment but also disperse your samples across your capillary block. This allows you to monitor the quality of the different capillaries as well, and reduces the risk that all reference samples are affected in case one of the channels fails.
In case your sample population is very large (n > 30) and the incidence of aberrations for all probes is very low (<10%), reference samples may not be required. This group may also be compared to itself, which may even provide better results. Please see http://wiki.Coffalsyer.Net under special analysis techniques.
Always place your negative control on the last position in your experiment (this sample is usually open the longest and has the highest risk of contamination). Place a tube with 8 ul of H2O in the bottom right position of your PCR machine. A minimum of 5 ul should still be left after the overnight hybridisation reaction. Evaporation may cause a serious problem due to increased salt concentrations.
Separation & detection
Capillary devices are designed for qualification of fragments rather than quantification. Due to the relative nature of MLPA, larger signals are more reliable and less sensitive to variation. MLPA however usually uses fluorescent dyes that are detected by CCD plates, which risk overloading or bleeding through into other channels when the signals are too large. Signals are therefore optimally kept within a small detection range. Signals should be at least 3x the baseline for detection, and optimally all probe signals of the reference samples fall within 10-40% of the maximum detection range.
In case you do not have Coffalyser.Net installed please find instructions at: http://Coffalyser.Net. Installing and configuring Coffalyser.Net can be done by anyone and does not require specific computer skills.
1. Get your free copy of Coffalyser.Net by registering at: http://www.mlpa.com. After registration you will receive an email with a download link and an activation code. Download the zip file and extract all content to a new location.
2. Double click Setup or Setup.exe to start the program. The installer will begin by checking whether your computer contains the correct components. If required, ‘Windows Installer 4.5’ and ‘.NET Framework 3.5’ will be installed. After installing the required prerequisites the installer will continue with installing Coffalyser.NET.
3. The Coffalyser.Net is designed to operate in different environments. It can be used on a single computer (e.g. desktop or laptop), or it can be configured to work within a network. Stand-alone configurations cannot share data, while network configurations are able to share data between dozens of computers.
Based on your personal preferences you now need to decide on how you want to configure your Coffalyser.Net.
You can set up your Coffalyser to work in the following environments:
- Configure your Coffalyser.Net to work on a single computer: stand-alone
- Configure your Coffalyser.Net as the master / the first computer in a network: server
- Configure your Coffalyser.Net as a slave / secondary computer in a network: client
Before you can start analysing your data you need to prepare the software by following a few steps and adding some information about the products and capillary device that you are using.
1. Download the latest sheet updates.
Log in using your freshly created user account, then right click on ‘Sheet Library’ in the solution explorer and select ‘Download Updates’. The sheet library contains all MRC-Holland products and saves you from having to enter all required probe information for analysis. By regularly updating the library you can make sure that it always contains the latest data.
2. Create a new work sheet.
Because Coffalyser.Net does not work directly with the downloaded sheets, we first need to create a work sheet from the original collection of sheets. This custom sheet is user owned and saved in the database and will not be overwritten by updates. Open the sheet library, right click anywhere in the grid and select ‘Add’ from the context menu.
3. Now select the MRC-Holland product you want to use, or create an empty work sheet in case you are using a custom product. The most important column in the probes list is the function of each probe. For all MRC-Holland probes the columns will be completely filled in for you and you do not have to change any of them. In case of tumour analysis however you may want to edit the reference probes and restrict them to a selection that you know remains normal between your test and reference samples.
Electrokinetic injection process
Preparation
To keep signal intensities in your experiment more or less at the same levels, it is recommended to use equal concentrations of DNA for all samples. Most capillary systems use electrokinetic injection to get the products into the system; best results may be obtained by sample stacking. Sample stacking is done by injecting a plug of low-concentration buffer containing the sample into capillaries that are filled with a buffer of the same composition but of a higher concentration. The sample ions will then migrate very rapidly in the sample plug until they reach the concentration boundary; once they have crossed the boundary the ions slow down and stack into a narrow band. Theoretically, the peak width should be proportional to the ratio of the buffer concentration in the original sample solution to that of the column.
Mix 1 ul of MLPA product with 10-24 ul of Hi-Di formamide and add the minimal amount of size marker needed for size calling (0.3 ul). The marker signals only need to be qualified and thus may be low. If the concentration of the size marker is lower, more probe product will be injected, positively influencing probe signals.
A minimum of 10 ul is required for most ABI machines, but diluting your MLPA products further is recommended since it may enhance selectivity and sensitivity by sample stacking.
After separation, check the signal intensities and quality of the run. In case signals are too low or too high, the injection mixture can be rerun directly after the run is finished. If you wait too long you need to prepare the injection mixture again, since fluorescent dyes degrade in formamide. The electrokinetic injection process has a significant influence on the signal strength of your probe peaks and may require empirical optimisation. You can influence the electrokinetic injection by changing both the injection time and injection voltage.
When you modify the injection time, you will encounter a trade-off between signal strength and resolution. For the range of parameter values and sample concentrations used in most experiments, the signal strength (as measured both by peak height and by peak area) increases linearly with increasing injection time. However, it is not true that an n-fold increase in injection time results in an n-fold increase in peak height. No improvement is seen after 10 seconds for the larger fragment. The signal decreases dramatically after 40 seconds for the smaller fragment. As the injection time increases, the resolution decreases, leading to increasing peak widths and decreasing peak heights. The deleterious effect on resolution is more pronounced for larger fragments.
No trade-off between increasing signal strength and increasing resolution exists when modifying injection voltage. Resolution with injection voltages of 319 V/cm (the highest possible setting) is often indistinguishable from resolution with injection voltages of 53 V/cm. However, lower voltages, which produce lower currents, are often preferable because injection timing is more accurate. Accurate timing ensures reproducibility in sample loading.
For MLPA, it is recommended to set the injection time to 15 seconds and adjust the injection voltage to a level that will give results where the reference samples fall within the recommended range. Signals that are used for quantification should be at least 3x the height of the baseline, at least 300 units (ABI and Megabace), and optimally between 1000 and 4000 units.
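The signal criteria above can be expressed as a simple check. The sketch below is illustrative only: the function name `assess_signal` and the four labels it returns are assumptions, while the 3x baseline rule, the 300 unit minimum and the 1000-4000 unit optimum are taken from the text.

```python
def assess_signal(peak_height: float, baseline: float,
                  min_units: float = 300.0,
                  optimal_low: float = 1000.0,
                  optimal_high: float = 4000.0) -> str:
    """Classify a probe peak height against the recommended signal range."""
    # A peak must clear both the baseline rule and the absolute minimum.
    if peak_height < 3 * baseline or peak_height < min_units:
        return "too low"
    if peak_height < optimal_low:
        return "below optimal"
    if peak_height <= optimal_high:
        return "optimal"
    return "above optimal"


if __name__ == "__main__":
    for height in (250.0, 500.0, 2500.0, 6000.0):
        print(height, assess_signal(height, baseline=50.0))
```

Running such a check on the reference samples first gives a quick indication of whether the injection settings need adjusting before the whole experiment is analysed.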
If you find that in later experiments the signals are too high or too low, simply rerun the same injection mixture with a shorter or longer injection time. Injection time and signal intensity are linearly related to each other. The injection time should however always be within the 10-45 seconds range, or the results may be negatively affected.
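As a rough planning aid, the linear relation stated above can be used to estimate a new injection time from an observed signal. The sketch assumes strict proportionality between injection time and signal, which is a simplification, and the function name `suggest_injection_time` is hypothetical. The 10-45 second limits come from the text.

```python
def suggest_injection_time(current_time_s: float, observed_signal: float,
                           target_signal: float,
                           min_time_s: float = 10.0,
                           max_time_s: float = 45.0) -> float:
    """Estimate an injection time that would bring the signal to the target.

    Assumes signal scales proportionally with injection time (a
    simplification) and clamps the result to the allowed range.
    """
    if observed_signal <= 0:
        raise ValueError("observed_signal must be positive")
    proposed = current_time_s * target_signal / observed_signal
    return min(max(proposed, min_time_s), max_time_s)


if __name__ == "__main__":
    # Signal observed at 15 s was 1000 units; aim for 2000 units.
    print(suggest_injection_time(15.0, 1000.0, 2000.0))
```

When the estimate is clamped at one of the limits, changing the injection time alone is unlikely to be enough and the injection voltage or sample dilution should be reconsidered.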
If, after adjusting the injection time and voltage, the signal is still too weak or the resolution is poor, you may need to concentrate or desalt the samples.
Polymer type
POP7 gel is the preferred gel in case a machine is used for both sequencing and fragment separation. Running on default settings results in shoulders around the peaks that may give peak artefacts and false positives when using the peak areas. POP4 gel is the preferred gel for fragment separation. Peaks will show few shoulders, allowing a good estimation of the peak area.
In case you are using POP7 but want to use the peak areas for normalisation, lowering the run voltage may improve the peak patterns:
Lower run voltage > longer run time > more diffusion
Decreasing the run voltage with gels originally meant for sequencing may result in less or no shoulder peaks
The +1/-1 nt peak is then incorporated into the probe peak, so the probe product fluorescence is measured more accurately
Run voltage and run time are related
Decrease run voltage > increase run time
Increase run voltage > decrease run time
After capillary electrophoresis, signals of MLPA products need to be qualified and quantified and subsequently compared to each other. MLPA probe products may be recognised by a peak of a certain length that should closely resemble the actual length of the probe. The amount of MLPA product can be measured either by the height of the peak or by the area of the peak. Peak areas reflect the amount of fluorescence better but may be influenced by peak artefacts, resulting in false positives. Peak heights on the other hand are less sensitive to peak artefacts, but may be influenced more by the drop in signal with increasing size, and the signals are relatively lower, which may increase the amount of variation.
In practice, the metric that best expresses the amount of fluorescence for each probe product, and that should be used for normalisation, may depend on your capillary device. Parameters that typically influence the peak heights and areas are the gel type used and the run voltage.
[Figure: run voltage 8 kV (POP7) and run voltage 15 kV (POP7)]
Decreasing the run voltage from 15 kV to 8 kV results in integration of the -1 nt peaks into the main peak. Even though this is desirable, lower run voltages also result in a decrease of the resolution.
Organisation / projects / users
In order to organise your data and that of other users of the same database, Coffalyser.Net distinguishes a number of data levels, which allow you to better organise and share data between users in your organisation. Each database can have one or more organisations; data within an organisation can be shared among users that are part of that same organisation.
To add a new organisation, right click on ‘Organisations’ in the solution explorer and select ‘Add Organisation’. You can share data within the same organisation among different users. New users can be added by opening ‘Users’ from the solution explorer and selecting ‘Add User’ from the right click menu.
After filling in the account details (needed for login) and the user details, users can be added to one or more organisations. In each organisation they may have a certain role that determines the rights they have to edit, add or delete data. Organisation users can only view data in projects to which they are granted access, and they cannot delete data or adjust device settings. Organisation administrators may access and adjust all data within that organisation, while Server Administrators have full rights in the complete software.
After creating an empty solution, users can add new or existing items to the empty solution by right clicking on the folder “Projects” in the organisations folder and selecting the option “add project”. Creating projects in an organisation is recommended in order to keep your data organised and retrievable. Projects may also be used later in more advanced analysis methods.
MLPA products are generally detected and quantified using CE devices. Detected fluorescent units are often displayed on arbitrary scales and the measured intensities may differ from device to device. Coffalyser.Net allows users to define the type of capillary instrument that was used for fragment separation and provides the default fragment analysis settings based on that selection.
Right click on ‘CE Devices’ in the solution explorer and select ‘Add CE Devices’ from the context menu to add a CE device to your organisation. After selecting “Add device” a new window will open, allowing you to define which capillary electrophoresis device you are using. Next to “CE device”, choose the machine you wish to use from the dropdown list. If you are unsure what machine was used for separation, please contact your provider or check the sample file details.
After the correct machine has been selected, you should select the filter set that matches the chemistry that was used. The filter set defines which fluorescent dyes certain channels in the machine recognise. The table contains the most common filter sets used by ABI. If, for instance, you are using a FAM label for the probes and LIZ for the size marker, then you need to select filter set G5.
By going through the different tabs of the CE Device properties window, you will be able to change the different device-specific analysis settings. There are four types of settings that you may change: baseline settings, peak detection settings, binning settings and filter settings. When you are working in a specific organisation, CE device settings will be applied for all users that are working in that organisation. In that case you need to be an administrative user to be able to change the CE device settings.
Fragment analysis
Experiment settings
After creating an initial project, we can create experiments within this project. In each experiment, data files can then be imported into the database and linked to this experiment. Users then need to define the experiment type and, for each used channel or dye stream of each capillary (sample run), what the contents are. Each detectable dye channel can be set as a sample (MLPA kit) or a size marker. Samples may further be typed as: MLPA test sample, MLPA reference sample, MLPA positive control, or MLPA digested sample.
To create a new experiment within a project, right click on the project you wish to add the new experiment to, and select “Add experiment” from the right click menu. Directly after you create an experiment, you will be able to adjust the experiment settings and give the newly created experiment a name and description. The capillary electrophoresis device should already be filled in to be the default machine for that project. You may however choose to also include different machines within one project. After you click ok, the experiment will be created in the database, allowing you to continue to define the content of each channel.
After adding a name and description to your experiment, you may define the settings required to start the fragment analysis in the next form (see figure below). First we need to determine what type of experiment we are analyzing. There are basically 3 types of experiments, these being:
1) Copy number analysis (“DNA/MLPA [default]”): These are experiments that are performed using standard MLPA probes or custom probes that are designed according to the same rules. The probes used can only produce signals that are proportional to the amount of the DNA target sequences present in each sample. These experiments furthermore require data obtained from reference samples that were run in the same experiment. A reference sample usually has a normal (diploid) DNA copy number for all target sequences.
2) Copy number / methylation status analysis (“DNA/MS-MLPA”): These are combined experiments in which both the copy number and the methylation status of the probe target sequences are calculated in a single analysis. While the copy number part is equal to that described at point 1, the methylation status analysis requires a digested sample result together with each standard MLPA sample result. For MLPA probes that contain HhaI sites, the methylation status can then be determined by comparing the signal that is proportional to the amount of the DNA target sequences present in each sample after digestion to the signal of the same target sequence of the same sample without digestion. In case only one of the two copies is methylated, the amount of target sequences available in the digested result will be 50% lower than in the undigested result.
3) RNA analysis (“RNA”): RNA experiments are quite similar to copy number experiments, except that the probe target sequences are directed to RNA sequences. Sample RNA is therefore often purified from genomic DNA in order to minimise contamination, and a reverse transcriptase step is required. In the analysis you may also set reference samples (e.g. zero control, or RNA from control tissues) in order to make a relative comparison. Alternatively users may only evaluate the intra-normalised results, thereby comparing each probe signal against one or two reference probe signals within the same sample.
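For the methylation status analysis described at point 2, the comparison of the digested and undigested result of the same sample can be sketched as a simple ratio. This is an illustration under assumptions, not the calculation Coffalyser.Net performs: the function name `methylation_fraction` is hypothetical, and both inputs are assumed to be probe signals that were already normalised within their own sample.

```python
def methylation_fraction(digested_signal: float,
                         undigested_signal: float) -> float:
    """Fraction of target copies that survive digestion (i.e. are methylated).

    Both signals are assumed to be normalised within their own sample
    before being compared.
    """
    if undigested_signal <= 0:
        raise ValueError("undigested_signal must be positive")
    return digested_signal / undigested_signal


if __name__ == "__main__":
    # One of two copies methylated: the digested signal is 50% lower.
    print(methylation_fraction(0.5, 1.0))
```

A value near 0 would then suggest an unmethylated target, a value near 0.5 one methylated copy out of two, and a value near 1 full methylation.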
After setting the correct analysis method, you need to set the expected contents for each dye channel. The channels are usually set correctly as determined by your filter set. If your channels are not set correctly, click on the option box “show all channels”; you will then be able to select which channels you are using by ticking the option boxes in the first column, called “nr”. The name of each dye should appear in the next column. Now you will be able to set the content type or “channel type” for each of your used channels by clicking on the dots or on the little arrow on the left side of the combo box in the channel content column. Channels are either set to “probes”, indicating that peaks that can be related to an MLPA probe mix can be found in this channel, or to “size marker”, indicating that this channel contains a size standard against which the detected peaks can be compared to give them a length in nucleotides. If you have set the content of a channel to be a probe mix, then you also need to define the product, lot and version number by using the probe mix selection form, which will appear after selecting the dots.
After setting the channel contents you will find some other settings behind the channel type. In case you have indicated that you are using a “probes” channel type, you also need to set an analysis method for the probe mix. The default method that will appear in most cases is “block [default]”. Block analysis means that the available reference probes are used to normalise the samples against the reference samples. Normalisation in this case refers to the division of multiple sets of data by a common variable in order to cancel out that variable's effect on the data. Reference probes are usually targeted to chromosomal regions that are assumed to remain normal (diploid) in the DNA of applicable samples. In case an MLPA kit does not contain any reference probes, users may define their own reference set (see section 3.4 & 3.5) or use the population method instead. In population analysis mode, all probes are used for normalisation; this method is therefore only recommended in case the number of aberrations in each sample is expected to be very low (e.g. 1-2 aberrant probe target sequences in each sample). To change the analysis method, click on the little arrow in the row defined as a “probes” channel type. The last two columns, “DNA type” and “marker”, will automatically be set for you and require no further adjusting.
Import files
In the experiment form on the fragment analysis tab, right click anywhere on the grid and select 'Add (From File)'. This will open the 'Import Files' form. Now select the 'Add Files' or 'Add Folder' button and select the files or folders you want to import data from. Your selected files should now appear in the window. Next select the import button and close the form after the import is finished.
For ABI-devices, ABIF files from all series can be imported (*.*fsa extensions); for CEQ-devices (Beckman) data from the CEQ-2000, CEQ8000 and CEQ8000 can be imported (*.*SCF or *.*esd extensions); for Megabace-devices data of all series can be imported (*.*rsd extensions). Select “Add files” or “Add folder” and then select the files you wish to import in the explorer window. At this point the files are not stored in the database yet; click on “Import” to decode the binary files and save them in the database. If all samples were imported correctly, you can close this window to make the sample-specific settings.
After you have imported your samples and closed the file / folder import form, the fragment analysis sample setup window will appear. This form allows you to adjust the sample types that you have used in your experiment. You can set 4 different sample types, either by using the key shortcuts or by changing the combo box by double clicking on the cells in the second column, called “sample type”. We distinguish the following types:
Samples or test samples (“key = s”) are normalized against the reference samples and are the unknown samples for which we want to know the copy number status of the test probes. For these samples we assume that the target sequences of the reference probes are normal (diploid for all autosomes) or have the same copy number as in the reference samples. If no reference samples are defined in the experiment, each sample will be used as a reference. The data for each test probe of each sample will then be compared to each other sample, producing as many dosage quotients as there are samples. The final ratio will then be estimated by calculating the median over these dosage quotients.
Reference samples (“key = r”) are used to display the balance of the measured signal intensities between sample and reference. The data for each test probe of each sample will be compared to each available reference sample, producing as many dosage quotients as there are reference samples. The final ratio will then be estimated by calculating the average over these dosage quotients. If no reference samples are set, each sample will be used as a reference and the median over the ratios will be calculated. In addition, reference samples are used to estimate the effect of sample-to-sample variation on the probe ratios of test probes by calculating the reproducibility of these probes in the reference sample population. These calculations may be more accurate when reference samples are randomly distributed across the performed experiment.
Positive reference samples (“key = p”) are used to estimate the behavior of a probe within a sample population with a known aberration. We do this by calculating the distribution statistics for each probe over all sample ratio results of the same type. Each unknown test sample result can then be tested against several variables of that distribution, such as the average, median, standard deviation, CV and 95% confidence range, in order to calculate the probability that an unknown sample is equal to or different from the distribution results of that sample type.
No DNA or blank controls (“key = n”) are MLPA reactions that were analyzed but do not contain any DNA. They are used to make sure that no contamination has occurred during the experiment.
Digested samples (“key = d”) are all samples that were digested during the experiment and are used only to estimate the methylation status of each target sequence.
Set sample types
When you are finished adjusting all the sample types, click on the button called “Start fragment analysis” to perform all necessary steps to qualify and quantify each of the probe signals. The screen will automatically update and present the quality scores for each sample after the analysis is finished.
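How test samples and reference samples interact during normalisation can be illustrated with a minimal Python sketch. It assumes that the probe signals of each sample are available as a dictionary of probe name to signal. Following the description in this transcript, it takes the median over the reference probes within each sample-to-reference comparison and the average over the reference samples. The function name, the probe names and the signal values are hypothetical and do not come from Coffalyser.Net.

```python
from statistics import mean, median


def dosage_quotient(probe, sample, reference_samples, reference_probes):
    """Estimate the ratio of one test probe in one sample (illustrative only).

    sample and each entry of reference_samples map probe names to signals.
    """
    per_reference = []
    for ref in reference_samples:
        # One ratio per reference probe: the probe signal relative to that
        # reference probe in the sample, divided by the same relation in
        # the reference sample.
        ratios = [
            (sample[probe] / sample[rp]) / (ref[probe] / ref[rp])
            for rp in reference_probes
        ]
        # Median over the reference probes tolerates a few aberrant ones.
        per_reference.append(median(ratios))
    # Average over the reference samples gives the final ratio.
    return mean(per_reference)


# Made-up signals: the test probe is halved in the unknown sample.
unknown = {"test": 500.0, "ref1": 1000.0, "ref2": 2000.0}
reference_a = {"test": 1000.0, "ref1": 1000.0, "ref2": 2000.0}
reference_b = {"test": 2000.0, "ref1": 2000.0, "ref2": 4000.0}

dq = dosage_quotient("test", unknown, [reference_a, reference_b], ["ref1", "ref2"])
print(round(dq, 2))
```

In this made-up example the test probe gives half the relative signal of the references, so the dosage quotient comes out at about 0.5, the value expected when only one copy is present.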
After you click on the fragment analysis button, the fragment analysis settings screen will open. This screen allows you to change the basic and advanced fragment analysis settings that are unrelated to the CE-device settings. The form consists of three pages related to different processes of the fragment analysis. The first tab contains all settings related to peak recognition; at the top you can set whether to use a basic baseline correction or an advanced baseline detection method. The exact differences and effects can be viewed in the fragments explorer on the fragment analysis steps tab.
Probe recognition method
In most cases you will never have to change any of the fragment analysis settings. Coffalyser.Net uses a window or panel based approach to link peaks with comparable lengths to the same probe. In order to define these panels or bins, the detected peak information is compared with what is expected. This is done automatically during the analysis with an auto bin procedure. The closer the expected lengths are to the detected lengths, the greater the chance that the procedure will find all probes successfully. Coffalyser allows two types of probe lengths to be used for the auto bin procedure: the probe design lengths, which are the real lengths of the fragments, and the Coffalyser lengths. Coffalyser lengths are filled in by MRC-Holland and are based on the lengths detected during the quality tests, which makes the binning procedure more successful. Finally, you can also filter your data based on a manual bin set by selecting the manual option. How to create a manual bin set is explained in the section "creating a manual bin set for data filtering".
By selecting “Open” from the right click menu on the fragments analysis settings window, while hovering above a sample row, you can open the fragment results explorer window. This can also be done by double clicking on one of the QC icons in the sample's row. The fragments analysis explorer allows you to examine each of the separate steps of the fragment analysis and to pinpoint more accurately where possible problems related to the fragment separation may have occurred. The fragment results explorer consists of 9 different tabs.
Open fragment explorer
In the fragment sample explorer go to the third tab, which displays the baseline corrected signals of each dye/data stream that was set as a “probes” or “size marker” channel. Displayed signals of channels set as probe mix content also show which signals were identified as peaks and what their relative length in nucleotides is.
In this chart black triangle markers represent the start of a peak, red circle markers represent the peak top and green asterisk markers represent the peak end. Above each peak top the estimated size (length) is also displayed. Hovering over a peak top marker shows a tool tip with the exact peak start, top and end data points, the peak height and the peak area. To make optimization of the peak detection settings easier, the set minimum / maximum RFU and peak area % of the probes channels are displayed as line series.
Displays the baseline corrected signals of each dye/data stream that was set as a “probes” or “size marker” channel. Displayed patterns also show which peak signals were identified as probes; labels furthermore show the design length, gene name and exon number of each probe. The peak top of the main peak recognized for a probe carries a green circular marker if the probe is a reference probe and a purple circular marker if it is a test probe.
Not all peaks were detected (adjust CE-device settings)
Not all probes were recognised (create manual binset)
After the fragment analysis is finished, all expected probes should be found and recognised. There are two possible reasons why probes may be missing from your analysis: either the CE-device settings (peak detection) were too high or too low, so that some signals were not recognised as peaks or probes, or the binset used to link peaks to probes is not correct. In the latter case the peaks were probably detected but were not properly related to the probes.
Because of problems arising from poor sample preparations, presence of PCR artifacts, irregular stutter bands and incomplete fragment separations, a typical MLPA project requires manual examination of almost all sample data. Our software was designed to eliminate this bottleneck by substantially reducing the need to review data. By assigning a series of quality scores to the different processes, users can easily pinpoint the cause of a failed analysis. These scores include quality assessments related to the sample DNA, the MLPA reaction, the capillary separation and the normalization steps (see figure below). Each collective quality score, that is, a score that summarizes a number of aspects or factors, starts at 100 points, which corresponds to high quality (green). Depending on the importance of each factor and the severity of the abnormality found, a number of penalty points is given for each measured quality factor. The quality of each step falls roughly into one of three categories.
High-quality or green. The results of these analysis steps can be accepted without reviewing.
Low-quality or red. These steps represent samples with contamination and other failures, which render the resulting data unsuitable to continue with. This data can quickly be rejected without reviewing; recommendations can be reviewed in Coffalyser.NET and used for troubleshooting.
Intermediate-quality or yellow. The results of these steps fall between high and low quality. The related data and additional recommendations can be reviewed in Coffalyser.NET and used to optimize the obtained results.
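The penalty-based scoring described above can be sketched in a few lines of Python. This is only an illustration of the principle: the cut-off values for green and yellow, the clamping of the score at zero and the function names are assumptions made for this example, since the actual criteria per machine type are only given in the appendix tables of the manual.

```python
def quality_score(penalties, start=100):
    """Subtract all penalty points from the start score of 100 points.

    Clamping at 0 is an assumption made for this sketch.
    """
    return max(0, start - sum(penalties))


def quality_category(score, green_min=90, yellow_min=70):
    """Map a score to a colour category.

    green_min and yellow_min are hypothetical thresholds, not the values
    used by Coffalyser.Net.
    """
    if score >= green_min:
        return "green"
    if score >= yellow_min:
        return "yellow"
    return "red"


# Two made-up penalties of 5 and 10 points.
score = quality_score([5, 10])
print(score, quality_category(score))
```

With the assumed thresholds, a sample that collects 15 penalty points would be flagged for review rather than accepted or rejected outright.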
FRSS: the fragment run separation score displays the quality of the fragment separation and peak sizing by evaluating the quality of the peaks in the size marker channel. To get to a final score, several different criteria are evaluated that each have a penalty weight, which is subtracted from the 100 start points (100% OK). Each score that depends on the measurement of signal intensities has adjusted criteria that depend on the machine type. The method of quality assessment may thus differ between machines; to find the exact criteria of the different quality control checks for each machine, please check the tables in the appendix.
FMRS: the fragment MLPA reaction score displays the quality of the performed MLPA reaction. To get to a final score, the criteria listed below are evaluated from the probe mix channel. The start score of the FMRS is 100 points.
FMRS check 1: signal intensity of the probe fragments
FMRS check 2: maximum probe signal intensity of the sample
FMRS check 3: baseline intensity of the probe dye
FMRS check 4: signal drop of the internal run of the probe fragments
FMRS check 5: percentage of unused primer
FMRS check 6: probes to peaks noise percentage
FMRS check 7: baseline curvature
FMRS check 8: DNA concentration check
FMRS check 9: DNA denaturation check
FMRS check 10: DNA digestion check (only for MS)
Quality control Basic analysis settings Iteration settings Slope correction settings Comparative analysis settings Sample selection Experiment explorer Sample explorer
If you are beginning to use MLPA, it is highly recommended to perform an initial experiment with a selection of the reference samples (normal diploid) you have available. If you have positive controls, these should also be included. This will allow you to optimise the workflow before starting with your (precious) test samples. The analysed results of the reference samples can further be used to select the best reference samples and to create a pool of this selection that may function as your general reference for all coming experiments.
Note: use this experiment to create a general setup for all coming experiments and to optimise your capillary electrophoresis run settings.
The first thing that needs to be done on the comparative analysis tab is the selection of samples that will be included in the normalisation. To make the selection easier you may use the right click menu to make a pre-selection of samples based on their FMRS score. Right click anywhere in the grid and select the option “Select samples for comparative analysis”. Next select the level of quality you wish to apply for the comparative analysis. Depending on the setting of the study, e.g. research or diagnostic, a higher quality level may be desired. It is highly recommended to use only reference samples of good quality, since they influence the results of all test samples. You can further adjust the selection by selecting samples based on the presence of the Y-control fragment. This is recommended when analysing probe mixes with probes targeted to the sex chromosomes. The comparative analysis depends mostly on the selection of reference samples that passed the fragment analysis and on the selection of reference probes. The comparative analysis may be adjusted by optimising the parameters used for the different steps, but in most cases the default settings should suffice. Settings related to slope correction should not be edited at all, while iteration cycles are only recommended for large data groups and in a research setting.
Match digested / undigested samples (MS-MLPA only)
If your undigested and digested samples have almost the same names (e.g. sample01-Cut & sample01-Uncut) you can use the automatic sample matching function. Right click to open the context menu, select “digested samples (for MS-MLPA)” and then select “match samples automatically” (see figure below). This will start a matching algorithm based on the Smith and Waterman method, adapted for sample names. Each undigested sample in the first column will be matched against a digested sample in the collection, which will afterwards appear in the column “digested”. Because matching may not always be 100% successful, users may change a matched sample by double clicking on any of the cells in the column “digested” and selecting the correct sample. Please note that each undigested sample can only be matched against one unique digested sample.
MS-MLPA analysis can be applied as an extension of the normal copy number DNA-MLPA analysis. In Coffalyser.NET, copy number and methylation status analysis always occur in a single analysis. Results of copy number and methylation status are then displayed together, making data interpretation easier. Interpreting MS-MLPA data together with the copy number status is crucial, since methylation status is presented as the percentage of methylation of the target sequences. Without copy number information these percentages would be very difficult to interpret. During a DNA/MS-MLPA analysis, the normal DNA-MLPA analysis is performed first, normalizing all samples of the type “sample” and “positive reference” against all available samples of the type “reference sample”. After the calculation of all distribution statistics the MS-MLPA analysis follows automatically. Here, each sample of the type “sample” is matched against the available digested samples by applying a Smith and Waterman algorithm to the sample name.
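To give an idea of how name-based matching of this kind can work, the sketch below scores sample names with a plain Smith-Waterman local alignment and then pairs each undigested sample with one unique digested sample. How Coffalyser.NET adapts the method for sample names is not described here, so the scoring values (match 2, mismatch -1, gap -1), the case-insensitive comparison, the greedy pairing and all function names are assumptions made for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two strings (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def match_samples(undigested, digested):
    """Pair each undigested name with one unique digested name.

    Greedy assignment by descending score; this strategy is an assumption.
    """
    candidates = sorted(
        (
            (smith_waterman_score(u.lower(), d.lower()), u, d)
            for u in undigested
            for d in digested
        ),
        reverse=True,
    )
    matched, used = {}, set()
    for _, u, d in candidates:
        if u not in matched and d not in used:
            matched[u] = d
            used.add(d)
    return matched


pairs = match_samples(
    ["sample01-Uncut", "sample02-Uncut"],
    ["sample02-Cut", "sample01-Cut"],
)
print(pairs)
```

Names that share a long common stem, such as the sample number, score highest against their own counterpart, which is why consistent naming in the sample sheets makes automatic matching more reliable.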
To ensure that this matching is successful, it is recommended to give the cut and uncut samples closely matching names in the capillary sample sheets. Samples that are for instance named “Sample1-Undig” and “Sample1-dig” will ensure correct matching. After each sample is matched, the methylation status normalization follows, normalizing the data of the digested samples directly against their undigested counterparts. During this normalization only a single reference sample exists for each digested sample (the undigested counterpart).
Match automatically Match manually
After finishing your selection of samples, click on the button “Start comparative analysis”, which will open the comparative basic analysis settings form.
Start comparative analysis
After making all the correct matches click on “start comparative analysis”. Methylation specific normalization always occurs in the same way and the methodology cannot be adapted; the available settings thus only influence the DNA-MLPA normalization. The methylation analysis method normalises each test probe of each test sample directly against its undigested counterpart by making use of the set reference probes. This method does not require any slope correction, since the sample is the same on both sides of the equation and a difference in sloping between the two is not expected.
After the analysis is finished you will be presented with a number of quality scores that indicate the quality of the normalization, the slope correction and the overall analysis of each sample.
PSLP: Pre-normalization signal sloping probes displays the relative amount of signal to size drop of the probe fragments of a sample as opposed to the reference.
FSLP: Final-normalization signal sloping probes displays the relative amount of signal to size drop of the probe fragments of a sample as opposed to the reference after the signals have been corrected for signal to size sloping effects. This measurement checks whether the performed slope correction was successful.
RSQ: Reference sample quality displays if relative probe signal inconsistencies existed in the selected reference sample population. The amount of variation is estimated by measuring the standard deviation over the calculated final normalized dosage quotients of each probe over all the reference samples.
RPQ: Reference probe quality displays if relative reference probe inconsistencies existed in the complete sample population. The amount of variation is estimated by measuring the standard deviation over the calculated ratios which are generated when a probe is normalized against each separate reference probe during each sample to reference normalization.
CAS: Coffalyser analysis score displays the quality of the complete analysis of a sample, combining all quality points calculated during the fragment and comparative analysis into a single score.
Reanalysis
Results may often improve by reanalysing data with different parameters. The factors that influence the final results the most are the set reference samples and the set reference probes. By looking at the results of the reference samples we can thus make a reselection and simply change the sample types to redo the selection. If some reference probes underperform in your experiment, the work sheet can be adapted by adding extra reference probes or removing reference probes. Finally, you may choose to change the metric or slope correction method and/or perform an iterative analysis.
On this form you can adjust some of the most basic analysis settings. By default all settings are set to auto, resulting in a multistep analysis where the best settings are chosen depending on the number of samples and their sample types, the MLPA mix, the presence of reference probes and the results obtained from earlier steps of the analysis. By using the different tabs, users may influence the parameters of the different steps of the comparative analysis. On the first tab we find the basic normalization settings.
Normalization metric: the normalization metric is the measurement of each detected probe that will be used during normalization. If this option is set to “Calculate best (signal to noise) [default]”, all possible probe metrics are compared with each other, and the metric showing the highest signal relative to the amount of noise is used for normalization. Users may also choose to use peak heights, peak areas, or peak areas including their siblings for normalization. Peak areas plus siblings means that all peaks that pass the minimal peak detection thresholds and fall within the bin set of a probe are summed and used for normalization.
Normalization factor (intra): during normalization of a test sample against a reference sample, each test probe is normalized using each set reference probe, thereby producing as many ratios as there are reference probes. To create a final estimator (dosage quotient) for each test probe, the “normalization factor (intra)” is taken over these ratios. By default (auto) the factor is set to median, thereby allowing some of the reference probes to be aberrant. Users may however also choose to use the average, minimum or maximum of the collected ratios. Minimum and maximum should be avoided unless they are needed for a special kind of analysis.
Normalization factor (inter): in the presence of multiple reference samples, each test sample is compared to each reference sample, thus generating as many dosage quotients or ratios as there are reference samples for each probe. In order to obtain a single result for each probe of a sample, the “normalization factor (inter)” is taken over these dosage quotients. When this option is set to auto, the average is taken when more than 2 reference samples are present; if no reference samples are available, all samples are used as a reference and the median is taken. Users may however also choose the minimum or maximum, which should only be used for a special kind of analysis.
Arbitrary ratio border (low/high): the arbitrary borders are the set borders between which we expect normal results to fall. In figure 32, a ratio delta of 0.3 is set relative to the reference (which is always 1), resulting in a normal range of ratio 0.7-1.3 for results that appear to be normal or equal to the signals found in the reference samples.
Slope correction: slope correction aims to correct the drop in fragment signal intensity with increasing fragment length that is unrelated to the number of target sequences available in each sample. When this option is set to “auto (if>15%) [default]”, slope correction will only occur if the difference in sloping between reference and samples is more than 15%. If this difference is less than 15%, slope correction may not be required because the normalization itself will then resolve this issue. You can furthermore choose to always or never perform the slope correction.
Normalization cycles: the normalization cycles refer to the optional, experimental iteration of results. Iterative normalization means that all samples are completely analyzed, after which the results are automatically interpreted and a new normalization starts with new parameters based on the results of the previous normalization. This approach allows a number of methods, which are discussed more extensively in the advanced analysis section. In short, each sample may obtain sample-specific reference probes and reference samples, which were found to be normal or equal in the previous analysis. This method works best if you have a large sample collection, no reference probes and no background information about the samples.
The second tab contains the analysis details concerning the slope correction of the data. Slope correction may be necessary if there is too much difference in the signal to size drop between reference samples and test samples. A difference in signal to size sloping may cause the ratios of the shorter probes to appear gained and the ratios of the longer probes to appear lost, while this is actually caused by a difference in fidelity of the polymerase between the reference sample PCR reaction and the unknown sample PCR reaction. If the difference in signal to size drop is minimal, no slope correction is necessary and we also do not recommend it in such cases, since regression analysis is much more sensitive than regular normalization. The signal to size drop is caused by a decreasing efficiency of amplification of the larger MLPA probes and may be intensified by sample contaminants or evaporation during the hybridization reaction. It may further be influenced by injection bias of the capillary system and diffusion of the MLPA products within the capillaries. You can change several settings in order to optimize the slope correction procedure.
X metric: the main metric that will be used on the X-axis of the regression analysis. For each probe signal we can apply either the lengths or the data points related to the probes.
Y metric: by changing the Y metric you can choose whether the raw signals or the pre-normalized ratios will be corrected. Instead of correcting the signals, the pre-normalized ratios calculated in the first normalization of all data in population mode may also be corrected and normalized afterwards.
Log correction of signals: determines whether or not the signals used for regression analysis are first converted to a log scale before creating the regression line.
Major outlier filter (high / low): this first outlier filter is used to ignore signals based on the pre-normalized ratios. By setting a very coarse filter you can ignore signals that are very aberrant, which helps to fit the regression lines better.
Ignore major outlier filter (for dynamic detection): determines whether or not the probes that were detected as major outliers should be left out of the dynamic detection method for the regression line (see outlier detection method).
Outlier detection method: determines how outlier signals (probes) are identified before plotting a regression line through the signals. Note! Outlier detection methods should only be applied to regression lines of the type least squares or polynomial. The local linear method and least squares local median are methods that already ignore outliers by design, and extra outlier detection may make these methods less robust.
With default settings the number of iteration rounds is set to 1, which means a single round of analysis without further adjustment of settings. To use iteration, the number of rounds needs to be at least 2; in most cases 3 rounds are optimal. You can furthermore change several settings in order to optimize the iteration procedure.
Experiment reference probe filter: this filter influences the way the reference probes are selected in the next round of normalization. The filter uses the statistical results found over the combined samples of the types reference sample and (test) sample. Depending on the type of filter, the effect of each setting should be combined with the 'Experiment probe reference filter (low/high)' and the 'Experiment reference probe std. dev. filter (medium / high)'. If the filter is set to low, medium, high or incremental, the probes that have an average ratio, as calculated over the reference samples or over all test samples, outside the 'Experiment probe reference filter' range will NOT be used as reference probes. If the filter is set to medium, the probes that have a standard deviation, as calculated over the reference samples or over all test samples, higher than the 'Experiment reference probe std. dev. filter medium' value (0.2 by default) will NOT be used as reference probes. If the filter is set to high the same rule applies, but the standard deviation is now compared to the maximal value set under high (0.1 by default).
Extend reference probe collection: if this option is enabled, the selection of reference probes is extended to all probes that pass the criteria at point 2. If this option is off, the criteria are only applied to the reference probes that were set in the first round of normalization; this selection depends on the analysis method used. If the analysis method was set to block, all selected reference probes in the active sheet were used; if the analysis method was set to population, all probes were used as reference probes.
Only use 'equal' called reference probes: this option limits the use of the earlier selected reference probes to those that were earlier found to be equal to the reference sample collection. The criteria for a probe to be equal to the reference sample collection are explained further down in this chapter. In short, probes that are equal to the reference sample collection fall within the 95% confidence range of this population and do not cross the arbitrary set borders (default 0.7-1.3).
Probe minimal reference samples: this setting defines the minimal number of reference samples per probe that should be left at the end of the analysis. In most cases the set reference samples will be the same for each sample; however, when using the options 'only use equal called reference probes' and 'reference sample filter [less than median Z-score]', the reference samples and probes used may differ for each sample, and setting a minimal number of reference samples that should remain is recommended.
Reference sample filter [less than median Z-score]: this option enables users to reduce the reference sample collection by halving the used reference sample signals each round. If we for instance start with 10 samples and no reference samples, the first analysis will use all samples as reference samples and the final estimator for each ratio will be obtained by taking the median. By applying the Z-scores, the reference samples are limited to the signals with a Z-score lower than the median Z-score over all samples, which divides the set in two. This basically reduces the used signals to the 50% that are closest to the original reference set. With each additional cycle the number of used reference samples is again divided in two, until the minimum number of reference samples is reached.
Extend reference sample collection: extending your reference sample collection means that the data of all samples can be used to create a new reference sample collection. This option is only useful if you are already using a collection of reference samples but want to enlarge this set automatically. It should be noted that this option is OFF by default, because it may skew the results.
Statistics Ratio overview
Coffalyser.NET provides two ways to evaluate the results: exploration of the results of the complete experiment or exploration of the results of a single sample. To open the experiment explorer, right click on the grid showing the quality scores and select “Open experiment results” from the right click menu. The comparative analysis experiment explorer has three tabs that give a quick overview of the results of the complete experiment.
Electropherograms Sample report Ratio chart Sample result overview
The first tab shows the ratio results of all samples for each probe in a sorted grid. The probes are displayed in the rows, while the columns may contain a number of hierarchical levels. Hierarchical levels allow you to hide or show information by clicking on the plus sign in the top left of the column headers. You may for instance click on the plus sign of the column header that states “probe target info”, or double click anywhere in that header, which will open the columns: probe name, chromosomal position, hg18 position, probe length, recommended order, normal copy number, normal methylation percentage male, normal methylation percentage female. You can use the hg18 position, probe length and recommended order to sort the whole grid by clicking on the column header cells. The top level of the columns contains a maximum of 4 entries: probe target info, all samples, reference samples and positive samples. Each sample type group then contains all samples underneath it, and these levels are opened by default.
Each sample can furthermore be opened to display separate information about each detected probe peak. The levels underneath each sample are closed by default and can contain the following: peak signal, intra normalized ratio, pre-normalized ratio, ratio without iteration, final ratio, standard deviation, distribution comparison results against the samples of the type 'sample', distribution comparison results against the samples of the type 'reference sample', distribution comparison results against the samples of the type 'positive reference sample'. This information can also be shown by hovering above a cell in the grid; a tool tip will then provide all available data for that result. For more information about these different normalized ratios and distribution comparison values, also see the FAQ at the end of this document or published articles about the methodology behind Coffalyser.NET (J. Coffa, 2011; J. Coffa, 2008). By default the probes in the grid are sorted by the recommended order; if this information is not available, the hg18 tracks are used for sorting. Each row or probe is related to a certain region; depending on the settings, probes are grouped together by giving them a certain color. By default, probes that have their target sequence on the same chromosomal arm are grouped together. Cells that contain probe ratio results can be colored in different ways depending on the set conditional format. In general, cells that are colored red have decreased signal intensities compared to the reference sample collection, and cells that are blue have increased intensities. The right click menu offers a range of options for adjusting the grid and/or exporting the data in different formats.
The statistical overview grid displays data in a similar fashion to the heat map grid; however, instead of displaying the samples and their data, this grid displays the statistical data calculated for each probe over samples of the same sample type. For each probe the following statistical values are calculated over all samples of the same sample type: average, median, minimum, maximum, standard deviation and MAD (median of absolute deviations). The right-click context menu contains the same options as described earlier for the ratio overview (heat map) grid, so region coloring may be changed and the grid may be exported in the same way.
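The per-probe statistics listed above can be sketched in a few lines of Python. This is only an illustration and not the Coffalyser.NET implementation: the function name `probe_statistics` and the example ratios are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, since the text does not state which form is used.

```python
from statistics import mean, median, stdev


def probe_statistics(ratios):
    """Summarise the ratios of one probe over all samples of one sample type."""
    med = median(ratios)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(r - med) for r in ratios])
    return {
        "average": mean(ratios),
        "median": med,
        "minimum": min(ratios),
        "maximum": max(ratios),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stdev": stdev(ratios) if len(ratios) > 1 else 0.0,
        "mad": mad,
    }


# Hypothetical final ratios of one probe in five reference samples.
reference_ratios = [0.98, 1.02, 1.00, 0.95, 1.05]
stats = probe_statistics(reference_ratios)
print(stats)
```

Because the MAD is built on medians, a single deviating sample changes it far less than it changes the standard deviation, which is why both values are useful side by side.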
Note that these are the values used to estimate the (95%) confidence ranges for the different sample types, which are displayed as boxplots in the charts.
Chart statistics
The last tab shows a statistical overview chart that loads with the statistical results found over all samples of the same sample type. All probe results are displayed as ratios on the Y-axis; the X-axis by default displays the map view locations of the probe target sequences, obtained from the hg18 tracks generated by UCSC and collaborators worldwide. The labels above the probes by default contain “probe length - gene name of target sequence - exon number within gene of target sequence”, e.g. “126 - DMD - 01”, which indicates that this probe has a design length of 126 nucleotides and is targeted to exon 1 of the DMD (dystrophin) gene. The vertical stripes or color bands indicate which probes fall within a certain region. By default the chart places all probes located on the same chromosome arm within one region. Other region types include: chromosome, chromosome band or MRC-Holland defined regions. By default, the information on user-defined regions is filled in by MRC-Holland. All results are furthermore organized according to the MRC-Holland recommended order. In practice this often means that test probes and reference probes are separated and then sorted by the hg18 tracks; if no recommended order exists, results are automatically organized by the hg18 tracks.
The results of all samples of the same sample type are displayed for each probe as a box plot (also known as a box-and-whisker diagram or plot), which graphically depicts groups of numerical data through their five-number summaries: the smallest observation (theoretical minimum), lower quartile (Q1), median (Q2), upper quartile (Q3) and largest observation (theoretical maximum). A box plot may also indicate which observations, if any, might be considered outliers. The quartiles of a set of values are the three points that divide the data set into four equal groups, each representing a fourth of the population being sampled. The interquartile range (IQR) is the distance from Q1 to Q3 and thus contains 50% of all values; it is depicted in the chart by the yellow box. The theoretical minimum is estimated as Q1 minus 1.5 x IQR and the theoretical maximum as Q3 plus 1.5 x IQR; these are displayed as the lower and upper whiskers respectively. Results that fall outside the range between the theoretical minimum and maximum are displayed as black round markers below the minimum and black triangle markers above the maximum. The average is depicted in the chart by a blue cross and the median by a red stripe.
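The box plot construction described above can be sketched as follows. The function name `boxplot_summary` and the example ratios are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; several conventions exist and the text does not say which one Coffalyser.NET uses.

```python
from statistics import quantiles


def boxplot_summary(ratios):
    """Quartiles, IQR, whisker limits and outliers for one probe."""
    # Assumption: "inclusive" quartiles; other conventions give slightly
    # different Q1 and Q3 values for small data sets.
    q1, q2, q3 = quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # theoretical minimum (lower whisker)
    upper = q3 + 1.5 * iqr  # theoretical maximum (upper whisker)
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": lower,
        "upper_whisker": upper,
        # Drawn as round markers (below) or triangle markers (above).
        "low_outliers": [r for r in ratios if r < lower],
        "high_outliers": [r for r in ratios if r > upper],
    }


# Hypothetical ratios of one probe; 0.55 is a deliberate outlier.
ratios = [0.55, 0.96, 0.98, 0.99, 1.00, 1.01, 1.02, 1.04, 1.05]
summary = boxplot_summary(ratios)
print(summary)
```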
Whenever the mouse hovers over any of the displayed symbols, extra information is displayed in a tooltip box. This allows you to quickly find the exact result numbers and additional information about the probe and its target sequence. The right-click context menu enables you to customize the chart completely.
The right-click context menu offers several options that allow you to customise the chart, print it, or export and save it in different formats. Most of the results in the experiment explorer in DNA/MS-MLPA mode are similar to those described earlier. When you select a distribution type, you will find extra distributions for each sample type, separately for the DNA and the MS-MLPA analysis. If you display, for instance, the MS results of the reference samples, you can easily evaluate the reproducibility of each probe in that experiment with respect to methylation status, assuming the selected reference samples were genetically equal and properly dispersed through the experiment.
In the heat map grid, each digested sample MS-result is loaded directly next to its undigested sample DNA-result. In the figure below you may for instance view the DNA/MS-MLPA results of the ME028 Prader Willi mix. In the left column you find the reference samples, which show a normal ratio of 1 for the DNA-MLPA results, while the SNRPN probes have a normal methylation status of 50%, displayed in these samples as a red cell with ratio 0.5. Other probes, known as digestion control probes, will not have a signal at all. The tab of the comparative analysis experiment explorer containing the statistical overview grid is also automatically extended with a separate level for each sample type for all methylation results. When cells are selected, probes that have an HhaI site are highlighted in brown.
DNA/MS-MLPA experiment explorer
The comparative sample results explorer provides a more detailed view of the results of each sample separately. To open the sample results explorer, right-click on the grid showing the quality scores and select “Open sample results” from the context menu. Alternatively, the sample results explorer may be opened through the Comparative Experiment Results Explorer. The comparative analysis sample explorer has four tabs that give a comprehensive view of the results of the selected sample and the statistical significance of those results within the experiment. The first tab gives an overview of the analysis settings used and the main quality control factors for the fragment and normalised data. On the second tab you can find all information concerning the target sequence of each probe and all relevant information on the peak signal that was related to that probe. While most displayed fields are equal to the columns described for the experiment explorer, there are two extra columns that have not been discussed yet: the RSQ (reference sample quality) column and the RPQ (reference probe quality) column.
The RSQ accounts for the part of the standard deviation of each probe that results from calculating the ratios with multiple reference samples. The RPQ accounts for the part that results from the use of multiple reference probes. The final standard deviation is estimated by combining these two factors.
Using the right-click menu, this grid may be exported to a file in *.csv, *.html, XML document or XML spreadsheet format. More importantly, the right-click menu can be used to generate PDF sample reports. Coffalyser.NET allows the generation of two types of PDF sample reports:
a single-page report, in which all data of the three sample explorer tabs are put together in landscape mode, and a two-page report, which also contains extended information. The two-page report contains all relevant quality control information on the first page, together with a larger sample chart and electropherogram. The second page contains a report of all probes and their target information. In addition, the peak height, peak area, total peak area in the probe bin, population normalized ratio, slope corrected ratio, final ratio, reference sample quality standard deviation, reference probe standard deviation, final standard deviation, distribution comparison values, peak width, expected peak length and the delta to that expected length are included in the report. Values in any of these columns that were found to differ from the rest are printed in bold. Note that the expected lengths are the peak lengths that were used as the center for data filtering. These values are commonly based on the entire data set, and peaks are not expected to differ much from their expected length (<0.5 nt).
Single page PDF report
PDF reports
Extended region report
The third tab shows a sample chart displaying the ratio results of the last normalization step. The Y-axis displays the probe ratios of the sample that is selected in the list box on the left. You may switch samples either with the cursor keys or by clicking on a sample in the list. Each sample of the type “reference sample” is followed by the tag “[r]”, and each sample of the type “positive reference” by the tag “[p]”.
Each black, red or purple circular marker indicates the result of a single probe in the selected sample. By default the X-axis loads with the hg18 track map view locations and the labels use a “probe design length - probe gene name - probe gene exon number” notation. The whiskers at each probe marker indicate the estimated 95% confidence range for that signal. These confidence ranges are estimated by combining the discrepancies found between the dosage quotients estimated with the different reference probes and/or reference samples. The estimated variability of each probe in the reference collection thus indicates whether that probe was reproducible in the performed experiment, and the variability found over the reference probes indicates whether the quality of the normalization was adequate.
Confidence ranges for all samples of the same sample type are displayed for each probe as a box plot; by default this box displays the estimated 95% confidence range as found over the reference samples. A single probe result thus has a higher probability of being different from the reference population if the estimated 95% confidence range of that signal does not overlap with the box of the reference sample population. Single sample probe results that fall within the 95% confidence range of the reference sample population are displayed as black round markers. Results that fall outside the 95% confidence range but are still between the set arbitrary borders are displayed as purple round markers, and results that also fall outside the arbitrary borders are displayed as red round markers. Finally, there may be results that fall within the 95% confidence range of the reference sample population but outside the set arbitrary borders. Such contradictory results are marked by a yellow round marker and are called ambiguous.
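The marker colors described above amount to a small decision table, sketched below. This is a simplification under stated assumptions and not the Coffalyser.NET logic itself: the function name `marker_colour` is hypothetical, the default borders of 0.7 and 1.3 are illustrative values only (the actual borders are whatever is set in the analysis), and the sketch tests only the probe ratio itself rather than the overlap of its own 95% confidence range with the reference box.

```python
def marker_colour(ratio, ref_low, ref_high, loss_border=0.7, gain_border=1.3):
    """Classify one probe ratio against the reference range and the borders.

    ref_low / ref_high: 95% confidence range of the reference population.
    loss_border / gain_border: the arbitrary borders (assumed defaults).
    """
    inside_reference = ref_low <= ratio <= ref_high
    inside_borders = loss_border < ratio < gain_border
    if inside_reference and inside_borders:
        return "black"   # indistinguishable from the reference population
    if inside_reference:
        return "yellow"  # ambiguous: past a border, yet within the range
    if inside_borders:
        return "purple"  # differs from the references, but within borders
    return "red"         # differs from the references and past a border


# Hypothetical examples.
print(marker_colour(1.00, 0.90, 1.10))  # black
print(marker_colour(1.20, 0.90, 1.10))  # purple
print(marker_colour(0.50, 0.90, 1.10))  # red
print(marker_colour(0.65, 0.60, 1.40))  # yellow
```

The yellow case only occurs when the reference range is wider than the arbitrary borders, which in itself signals a poorly reproducible probe.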
The displayed regions behave in the same way as described for the statistical overview chart of the comparative analysis experiment explorer. The tooltips display the basic statistics for all probes that fall within a region, based on their final estimated ratios. Chromosomal aberrations often span larger regions, so probes targeted to such a region cluster together when sorted. This data may help to determine whether the signals of all probes in one region are either increased or decreased as opposed to a certain population.
MLPA kits generally contain about 40-50 probes, targeted mainly to the exonic regions of one or more genes. Each oligo-probe consists of two hemi-probes which, after denaturation of the sample DNA, hybridize to adjacent sites of the target sequence during an overnight incubation. After the overnight hybridization, adjacent hybridized hemi-probe oligonucleotides are ligated. Only when both oligos are hybridized to the template DNA can they be ligated, permitting subsequent exponential PCR amplification.
Statistics and comparison
Confirmation of found results is often not only desirable but imperative in order to reach an indisputable assessment. The displayed electropherogram derives directly from the baseline-corrected original data stream created by your capillary electrophoresis device. Even though a line chart is shown, the original data stream consists of separate time points or data points. The sample electropherogram tab by default presents the data points on the X-axis and the relative fluorescent units on the Y-axis. To make data interpretation easier, the design probe lengths are displayed underneath the X-axis at the data point of the detected peak top of each probe. A peak that was related to a probe furthermore has a circular marker at the detected peak top, positioned at its data point and relative fluorescent units.
By hovering over this marker you may view information about the detected peak, including probe target information and the probe ratio at the different stages of analysis. The right-click menu offers options to export, print, save, zoom and adjust the labels of the chart. The option “Lock current sample” in the right-click menu splits the chart area in two. The upper part of the chart area then shows the results of the sample that was displayed at the moment of locking, while the lower part keeps the original functionality. Note that, due to differences in separation speed between different channels, peaks might appear at slightly different positions. The fully automatic zoom methods correct for these differences.
Results interpretation of clinically relevant tests can be one of the most difficult aspects of MLPA analysis and is a matter of professional judgment and expertise. In practice, most users only consider the magnitude of a sample test probe ratio, comparing the ratio against a threshold value. This criterion alone often does not provide the conclusive results required for diagnosing disease. MLPA probes all have their own characteristics, and the level of increase or decrease displayed by a probe targeted to a region that contains a heterozygous gain or loss may differ for each probe. Interpretation of normalized data may be complicated further by shifts in ratios caused by sample-to-sample variation, such as dissimilarities in PCR efficiency and size to signal sloping. To make result interpretation more reliable, our software combines effect-size statistics and statistical inference, allowing users to evaluate the magnitude of each probe ratio in combination with its significance in the population.
The significance of each ratio can be estimated from the quality of the performed normalization, which can be assessed by two factors: the robustness of the normalization factor and the reproducibility of the sample reactions. To evaluate the robustness of the normalization factor, our algorithm calculates the discrepancies between the probe ratios computed with the different reference probes within each sample. Our normalization uses each reference probe to normalize each test probe, thereby producing as many dosage quotients (DQ) as there are reference probes. The median of these DQs is then used as the definite ratio. The median of absolute deviations between the computed dosage quotients reflects the mathematical imprecision introduced by the normalization factor. In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.
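The normalization step described above, one dosage quotient per reference probe followed by a median and a MAD, can be sketched as follows. The exact formula is an assumption: the sketch uses the commonly described form DQ = (test signal / reference probe signal) in the sample, divided by the same quotient in a reference sample. The function name `dosage_quotients`, the probe names and the signal values are all hypothetical.

```python
from statistics import median


def dosage_quotients(sample, reference, test_probe, reference_probes):
    """Return (final ratio, MAD, list of DQs) for one test probe.

    sample / reference: dicts mapping probe name to peak signal.
    """
    # One DQ per reference probe (assumed formula, see lead-in).
    dqs = [
        (sample[test_probe] / sample[ref])
        / (reference[test_probe] / reference[ref])
        for ref in reference_probes
    ]
    final_ratio = median(dqs)
    # MAD of the DQs: imprecision introduced by the normalization factor.
    mad = median([abs(dq - final_ratio) for dq in dqs])
    return final_ratio, mad, dqs


# Hypothetical peak signals: the test probe has lost one copy in the sample.
sample = {"test": 500, "ref1": 1000, "ref2": 2100, "ref3": 1500}
reference = {"test": 1000, "ref1": 1000, "ref2": 2000, "ref3": 1600}
final_ratio, mad, dqs = dosage_quotients(
    sample, reference, "test", ["ref1", "ref2", "ref3"]
)
print(final_ratio, mad, dqs)
```

Taking the median rather than the mean means that a single aberrant reference probe shifts the final ratio very little, while the MAD still exposes the disagreement between reference probes.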
Next, our software calculates the effect of both types of variation on each test sample probe ratio and determines a 95% confidence range. By comparing each sample's test probe ratio and its 95% confidence range to the available data of each sample type population in the experiment, we can conclude whether the found results are significantly different from, e.g., the reference sample population, or equal to a positive sample population. The algorithm then completes the analysis by evaluating these results in combination with the familiar set of arbitrary borders used to recognize gains and losses. A probe signal is concluded to be aberrant relative to the reference samples if it is significantly different from the reference sample population and if the extent of this change meets certain criteria. The results are finally translated into easy-to-understand bar charts and sample reports, allowing users to make a reliable and astute interpretation of the results.
During the analysis our software estimates the reproducibility of each sample type in a performed experiment by calculating the standard deviation of each probe ratio in that sample type population. Since reference samples are assumed to be genetically equal, the effect of sample-to-sample variation on the probe ratios of test probes can be estimated from the reproducibility of these probes in the reference sample population. These calculations are more accurate when reference samples are randomly distributed across the performed experiment. Multiple reference samples should serve a double function: they should be used to normalise each test sample, and they should be compared to each other. After normalising the reference samples to each other, the average and standard deviation should be calculated per probe; these can then be used to estimate a 95% confidence range (average +/- 2*stdev).
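The confidence range stated above (average +/- 2*stdev) and the comparison of two ranges can be sketched as follows. The function names and example values are hypothetical, the sample standard deviation (n - 1) is an assumption, and the overlap test is a simplified stand-in for the comparison performed by the software.

```python
from statistics import mean, stdev


def confidence_range(ratios, k=2.0):
    """Approximate 95% range of one probe: average +/- k * stdev."""
    avg = mean(ratios)
    sd = stdev(ratios)  # assumption: sample standard deviation (n - 1)
    return avg - k * sd, avg + k * sd


def ranges_overlap(a, b):
    """True if the closed ranges a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical ratios of one probe in three reference samples.
reference_range = confidence_range([0.95, 1.00, 1.05])
print(reference_range)

# Hypothetical confidence ranges of the same probe in two test samples.
print(ranges_overlap((0.45, 0.55), reference_range))  # no overlap
print(ranges_overlap((0.85, 0.95), reference_range))  # overlap
```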
The more reproducible the probe was in that experiment, the smaller the standard deviation and 95% confidence range will be.
An effect that is commonly seen with MLPA data is a drop in signal intensity that is proportional to the length of the MLPA product fragments. This signal to size drop is caused by a decreasing amplification efficiency of the larger MLPA probes and may be intensified by sample contaminants or evaporation during the hybridization reaction. Signal to size drop may further be influenced by injection bias of the capillary system and diffusion of the MLPA products within the capillaries. This effect has to be corrected before normalisation; otherwise shorter probes may look falsely gained while longer probes seem to be lost. Signal to size drop is commonly corrected by performing regression analysis, which is basically a form of internal normalisation.
Small boxplots / distributions found over the reference samples indicate that these probes were reproducible in this experiment. Small error bars, or a small test probe confidence range, indicate good performance of the reference probes. A large boxplot found over the reference samples indicates that the probe was not reproducible in this experiment. Because the 95% confidence range is then very large, the distribution of the sample result overlaps with the distribution of the reference samples, and the result cannot be called significantly (95%) different even though it passes the arbitrary border. In this example both distributions overlap completely; because of this extreme variation in the reference samples, any decrease found in test samples has no real meaning (since, based on the statistics, a normal sample may give the same result).
By looking at the entire workflow we can recognise the most important areas that can be optimised in order to minimise structural differences between test and reference samples.
The three areas that influence the final results the most are the experimental procedure, the capillary separation and the data analysis strategy. Each of these areas may influence the results in its own way and should thus be carefully considered. In particular, chemical impurities that influence PCR fidelity, introduced during fixation and extraction procedures, need to be minimised and should be equal over all samples, since they may influence the amplification efficiency differently for each probe. This kind of variation is all but impossible to remove and may pose serious problems for the normalisation and interpretation of MLPA results. During capillary electrophoresis the main goal should be to ensure that the overall signal intensity is in the proper range for quantification on that device. Most capillary devices are designed for sequencing and are therefore seldom optimised to quantify fragments. In addition, all samples should be in the same range of signal intensity; this minimises the amount of variation and makes data interpretation easier. Finally, we may optimise our data analysis strategy by optimising the reference probe / sample selection and by ensuring that only MLPA reactions of adequate quality are used. Samples that do not meet the proper requirements should be recognised and left out of the analysis, since they may influence the end result and possibly provide results that are falsely seen as genetic aberrations.
Switching between the experiment and sample explorer gives you a better perspective on separate sample results and on the meaning of those results within the larger collection of samples. Probes that proved to be very variable in the reference sample collection will have a wide distribution, and their results are therefore unreliable. This is not only displayed in the experiment results but should also be visible in the boxplots in the sample charts and in the comparison values (>>*, >*, ?, =, <<*, <*).
Comparison values are the result of comparing each probe ratio and its estimated standard deviation to sets of samples of the same sample type and to the set arbitrary borders. Most importantly, the set of reference samples is used for comparison and should give an impression of the reproducibility, and thus the reliability, of each probe in that experiment. The comparison results are displayed both as symbols and as specific colors in the grids and charts. Note that the asterisk indicates that the calculated ratio is outside the set arbitrary borders, while the number of greater-than (>) or less-than (<) symbols indicates how many standard deviations the result differs from a certain sample population.