The Web Service provides an API for using the system's services programmatically. Its functionality mirrors that of the web page, with some minor differences to simplify its use. The API is essentially asynchronous: every analysis method launches a job on the server, where it is registered, and returns an identifier that the client uses to query the status of the job and to gather the results once the job is done.
The results of a job are provided in two ways. The main one deals only with the essential results of each job; which results are produced depends on the analysis performed and on the parameters used. To gather them, you first query the server for a list of result identifiers, and with these identifiers you retrieve the actual contents of the results. The descriptions of the methods below detail what these results are in each case.
The web interface provides more results than just the essentials, including images used to assess or validate them. Because these additional results are usually intended for user viewing rather than for direct consumption by client software, they are not provided by the same means as the essential results. Instead, a separate method lets the client download them as a file bundle in tar.gz format, so they remain accessible via the Web Service.
The easiest way to connect to the server is to load bioNMF's WSDL file, located at http://bionmf.dacya.ucm.es/1.0/WebService/BioNMFWS.wsdl. This prepares the driver with all the functions in the API.
For example, this can be done in Ruby as follows:
require 'soap/wsdlDriver'
wsdl_url = "http://bionmf.dacya.ucm.es/1.0/WebService/BioNMFWS.wsdl"
driver = SOAP::WSDLDriverFactory.new(wsdl_url).create_rpc_driver
bioNMF accepts TAB-separated text files as input. They may contain row labels and/or column headers, as well as a short description string at the beginning of the file. Each header, label or description string may consist of several space-separated words and/or numbers, but is limited to 31 characters.
See an example of input file formats: with labels and without labels (if you use any of these matrices, please apply one of the preprocessing methods listed below to make all their values positive).
Important note: Please fill all matrix entries with numbers or labels, since "empty values" are not supported (i.e. consecutive TAB characters are processed as a single one).
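The constraints above can be checked locally before a matrix is uploaded. The following Ruby sketch is not part of bioNMF: the helper names `input_problems` and `numeric?` are hypothetical, and it assumes that any field that does not parse as a number is a header, label or description string.

```ruby
# Hypothetical helper (not part of the bioNMF API): lists problems found in a
# TAB-separated matrix text before it is uploaded.
MAX_LABEL_LENGTH = 31 # limit stated for headers, labels and the description

# Assumption: anything Float() cannot parse is treated as a label.
def numeric?(field)
  Float(field)
  true
rescue ArgumentError
  false
end

def input_problems(text)
  problems = []
  text.each_line.with_index(1) do |line, number|
    fields = line.chomp.split("\t", -1) # -1 keeps trailing empty fields
    fields.each do |field|
      if field.strip.empty?
        # Consecutive TABs would be collapsed by the server, shifting the columns.
        problems << "line #{number}: empty value"
      elsif !numeric?(field) && field.length > MAX_LABEL_LENGTH
        problems << "line #{number}: '#{field}' is longer than #{MAX_LABEL_LENGTH} characters"
      end
    end
  end
  problems
end
```

An empty array means that no empty values or over-long labels were found; it does not guarantee that the server will accept the file.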
In addition, bioNMF accepts binary data files encoded with IEEE little-endian byte ordering, with UTF-8 for labels. Data must be written in the following order:
Number of rows (n): a 32-bit signed integer.
Number of columns (m): a 32-bit signed integer.
n-by-m 64-bit floating-point values stored in row-major order (i.e. contiguous elements belong to the same row).
n tab-separated strings in ASCII format (up to 31 characters per item).
A newline (\n) character (in ASCII format).
m tab-separated strings in ASCII format (up to 31 characters per item).
A newline (\n) character (in ASCII format).
Newline (\n) characters are mandatory if any of the headers, labels or name fields is set.
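As an illustration of this layout, the following Ruby sketch serializes a matrix with `Array#pack`. It is not an official encoder: the helper name `encode_binary_matrix` and its keyword arguments are hypothetical, it assumes that the n strings are the row labels and the m strings the column headers, and it does not cover any field that may follow them.

```ruby
# Hypothetical helper (not part of the bioNMF API): serializes a matrix in the
# binary layout listed above. `rows` is a non-empty array of equally sized
# arrays of numbers.
def encode_binary_matrix(rows, row_labels: [], column_headers: [])
  n = rows.length
  m = rows.first.length
  raise ArgumentError, "rows have different lengths" unless rows.all? { |r| r.length == m }

  data = [n, m].pack("l<2")       # two 32-bit signed integers, little-endian
  data << rows.flatten.pack("E*") # 64-bit doubles, little-endian, row-major

  unless row_labels.empty? && column_headers.empty?
    # Assumption: n strings = row labels, m strings = column headers.
    # Both newlines are written as soon as either text field is present.
    data << (row_labels.join("\t") + "\n").b
    data << (column_headers.join("\t") + "\n").b
  end
  data
end
```

The returned string is binary, so it should be written with `File.binwrite` rather than `File.write`.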
Performing an analysis with the web service usually entails the following: upload a matrix and set its preprocessing defaults; call whichever of the three analysis methods you need and query the status until the job is finished; then retrieve the list of results, which are just identifiers, and gather those you are interested in, or retrieve the whole bundle of output files, which includes the results and other useful information for the user.
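The workflow just described can be sketched in Ruby as follows. This is a minimal sketch, not a reference client: the helper name `run_standard_nmf` and its keyword arguments are hypothetical, `driver` is assumed to be any object that responds to the web service methods (for instance the soap4r driver created from the WSDL), and the caller's block decides which status value means "finished", so no concrete status value is assumed here.

```ruby
# Hypothetical helper: uploads a matrix, launches a standardNMF job, polls its
# status and returns the contents of all essential results.
def run_standard_nmf(driver, matrix_text, factors:, runs: 1, poll_seconds: 5, max_polls: 1000)
  # Uploading also sets the default preprocessing, so preprocess is not called.
  matrix_id = driver.upload_matrix(matrix_text)
  # maxfactors == factors: factorize with exactly that number of factors.
  # 2000 and 40 are the suggested defaults for iterations and convergence.
  job_id = driver.standardNMF(matrix_id, factors, factors, runs, 2000, 40)

  polls = 0
  # The block receives each status value and returns true once the job is done.
  until yield(driver.status(job_id))
    polls += 1
    raise "job #{job_id} did not finish after #{max_polls} polls" if polls >= max_polls
    sleep poll_seconds
  end

  # results returns identifiers only; get_result fetches each result's content.
  driver.results(job_id).map { |result_id| driver.get_result(result_id) }
end
```

A call could look like `run_standard_nmf(driver, File.read("matrix.txt"), factors: 3) { |status| status == "finished" }`, where both the file name and the "finished" value are placeholders. The block can also raise when it sees a status that signals an error, which stops the polling early.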
upload_matrix: matrix => matrix_id
Uploads a matrix to the server and returns a string that identifies it. Using this identifier you can launch jobs to process this particular matrix. It also sets the default values for the matrix preprocessing. The matrix passed as a parameter must be a string in the format explained above.
preprocess: matrix_id, transpose, normalization, positive
Sets the preprocessing parameters for the given matrix.
normalization: A string specifying the normalization that the matrix must undergo:
"No" (default): No normalization.
"SubGMean": Subtracts the global mean. The global mean of the data matrix is calculated and then subtracted from all data items.
"SColsNRows": Scales columns, then normalizes rows. This is the approach proposed by Getz, et al. (PNAS 2000), which first divides each column by its mean and then normalizes each row.
"SDRows" (mean=0, std=1 by rows): Each row of the data matrix is transformed so that its mean is zero and its standard deviation is 1.
"SDCols" (mean=0, std=1 by columns): Each column of the data matrix is transformed so that its mean is zero and its standard deviation is 1.
"SubMRows": Subtracts the mean by rows. The mean of each row of the data matrix is calculated and then subtracted from all data items of that row.
"SubMCols": Subtracts the mean by columns. The mean of each column of the data matrix is calculated and then subtracted from all data items of that column.
"SubMRowsCols": Subtracts the mean by rows and then by columns. The mean of each row of the data matrix is calculated and then subtracted from all data items of that row. In a subsequent step, the mean of each column of the data matrix is calculated and then subtracted from all data items of that column.
positive: Method to make the data matrix positive:
"No" (default): No transformation.
"SubMin": Subtracts the absolute minimum. This is a very simple method to make the data positive. The minimum negative value is subtracted from every cell of the data matrix.
"FoldRows": Folds data by rows. This approach was used by Kim and Tidor (Genome Res. 2003) for the analysis of log-transformed gene expression data. Every row (item) is represented by two new rows of a new matrix. The first one indicates positive expression (up-regulation) and the second one negative expression (down-regulation). This process doubles the number of rows of the data set.
"FoldCols": Folds data by columns. Similar to the previous case, but this option makes the data positive by folding columns (variables).
"ExpScal": Exponential scaling. Data is exponentially scaled to make it positive. This is the inverse operation of a logarithmic transformation.
sample_classification: matrix_id, start, end, runs => job_id
Starts a job that runs a number of executions of NMF in order to determine the optimal number of factors. It clusters samples and compares the results to produce cophenetic correlation coefficients, used to estimate the stability of the NMF factors. These cophenetic coefficients can be used to determine the most stable number of factors to use in the analysis.
It returns a string representing the job identifier, which is used to query the status of the job and to ask for the results when it has finished. The parameters start, end and runs are integers: start and end determine the range of factor numbers to evaluate, and runs determines how many runs are executed for each number of factors. The result of the job is a vector of cophenetic coefficients, measuring the factor congruency across runs of NMF for the same number of factors. Further results are the best factor number (the one with the maximum coefficient); the cluster number to which each sample (column) is assigned, for each of the candidate numbers of factors; and the factor to which each row is assigned.
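The service already reports the best factor number as one of its results; the following Ruby sketch only illustrates the selection rule, in case the coefficients are inspected on the client side. The helper name `most_stable_factors` is hypothetical, and it assumes that the coefficients have already been parsed into numbers and are ordered from start to end.

```ruby
# Hypothetical helper: returns the number of factors whose cophenetic
# coefficient is the largest.
# Assumption: coefficients[i] belongs to (start + i) factors.
def most_stable_factors(start, coefficients)
  raise ArgumentError, "no coefficients given" if coefficients.empty?

  # each_with_index yields [value, index] pairs; keep the index of the maximum.
  best_index = coefficients.each_with_index.max_by { |value, _index| value }.last
  start + best_index
end
```

If several numbers of factors share the maximum coefficient, this sketch returns the smallest of them.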
standardNMF: matrix_id, factors, maxfactors, runs, iterations, convergence => job_id
Starts a job that runs a standard execution of NMF. As before, it returns a job identifier. The result is two matrices, W and H. The main parameter is "factors", an integer giving the number of factors to use in the factorization. If "maxfactors" is smaller than or equal to "factors", the factorization is performed with exactly that number of factors; if it is bigger, the optimal number of factors in that range is first determined by cophenetic analysis, as in the previous method. The parameter "runs" specifies how many runs to perform for each number of factors; the results of the runs are combined into a single output factorization. The value of "iterations" specifies the maximum number of iterations to perform for each factorization, while "convergence" specifies the convergence stop criterion.
Usable defaults for these parameters could be:
iterations = 2000
convergence = 40
biclustering: matrix_id, factors, maxfactors, runs, iterations, convergence, sparseness => job_id
Starts a job that performs biclustering analysis using NMF. The parameters function identically to the previous case. The new parameter "sparseness" specifies how sparse we want the factor values to be. A usable default is 0.5. The result is a number of row and column index lists, one for each bicluster. So if three biclusters are found, there will be 6 results: column indexes for bicluster 1, row indexes for bicluster 1, column indexes for bicluster 2, row indexes for bicluster 2, column indexes for bicluster 3 and row indexes for bicluster 3.
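Since the results alternate between column indexes and row indexes, a client can pair them up to obtain one record per bicluster. The following Ruby sketch shows one way to do so; the `Bicluster` struct and the helper name `group_biclusters` are hypothetical, and the input is assumed to be the list of results (identifiers or contents) in the order described above.

```ruby
# Hypothetical record for one bicluster.
Bicluster = Struct.new(:columns, :rows)

# Pairs consecutive results: columns first, then rows, for each bicluster.
def group_biclusters(results)
  raise ArgumentError, "expected an even number of results" if results.length.odd?

  results.each_slice(2).map { |columns, rows| Bicluster.new(columns, rows) }
end
```

With six results, `group_biclusters` returns three records, and the second one holds the column and row indexes of bicluster 2.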
status: job_id => status
Returns the status of the job:
results: job_id => vector of ids
Given a job identifier, it returns a vector of identifiers for the results. The actual results can be retrieved by passing these identifiers to the "get_result" method. The meaning of the results depends on the type of job that was run, but each one is always a string.
get_result: res_id => string holding the result
Returns the actual result referenced by the res_id as a text string. It may contain newline characters.
get_results_bundle: job_id => tar.gz bundle of files (Base64 encoded!)
Depending on the job you launched and the parameters used, the system produces a number of files apart from the ones returned by the "get_result" method. While these files are not essential results of the task, they might be of interest for exploratory or assessment tasks. All of them can be accessed using this function. Because the bundle is binary, it is transmitted as a Base64-encoded string, which the client may need to decode with an appropriate function before writing it to a file. This depends on the specifics of the client (i.e. programming language, platform): with Ruby's soap4r you need to decode it explicitly, whereas Perl's SOAP::Lite seems to do it automatically.
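For a Ruby client, the explicit decoding step could look like the following sketch. The helper name `save_results_bundle` is hypothetical, and `driver` is assumed to be any object that responds to `get_results_bundle` and returns the Base64 text unchanged, as described above for soap4r.

```ruby
# Hypothetical helper: downloads the bundle of a job, decodes it and writes it
# to `path`. Returns the number of bytes written.
def save_results_bundle(driver, job_id, path)
  encoded = driver.get_results_bundle(job_id)
  # The "m" directive of String#unpack1 decodes Base64 using core Ruby only.
  bytes = encoded.unpack1("m")
  # Binary mode avoids newline or encoding translation of the tar.gz data.
  File.binwrite(path, bytes)
  bytes.bytesize
end
```

If the client library already decodes the bundle, as Perl's SOAP::Lite seems to, decoding a second time would corrupt the file, so this step only applies when the raw Base64 string is received.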
info: job_id => log of the process
This method can be used when an error has occurred; it returns the log generated by the system, which can help to spot problems. It takes as argument a string containing the job's ID and returns a string containing the log.
Here you can find some example scripts.
If you use this software, please cite the following work: