A Galaxy tip from the Pros | January 14, 2021 | 3 min read
by Daniel Blankenberg
Change a datatype on multiple datasets
Found yourself needing to change the datatype on many Galaxy datasets? Read on for a tip on how to do it in one fell swoop.

Have you ever uploaded a bunch of files but forgotten to set the format, only to have to go back and reconfigure each dataset by hand? Wouldn’t it be nice if there were an easier way to set these attributes on multiple datasets at a time?

The workflow trick

There is an open issue requesting the ability to do this with the existing multiple dataset operations, and it will hopefully be addressed soon. In the meantime, here is a workaround that you can use today to set the format on multiple datasets at once. We’ll walk through a quick example in which I uploaded 10 fastq files whose format was not set to the more specific fastqsanger format that many tools require.


First, we are going to put all of our datasets into a collection, if they are not already. If you were not planning on using these datasets in a collection, just choose to create a simple list.


Now we are going to create a new workflow. Click “Workflow” in the masthead at the top of your browser, then click the “Create” button. Name and save your new workflow.


In the workflow editor, on the left-hand side, click to expand the Collection Operations tool section and click to add the Filter failed tool to your workflow. Click on the newly created Filter failed tool within the workflow editor. On the right-hand side of the workflow interface, you are able to edit the configuration of this tool.


In the Filter failed tool configuration, click Configure Output: 'input dataset(s) (filtered failed datasets)' to enable post job actions, then select Change datatype and set it to fastqsanger. Click to save the workflow at the top right, and click the play button to open the workflow run interface.
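If you export the workflow and keep it under version control, the Change datatype setting travels with it as a post job action on the Filter failed step. Here is a rough sketch in Python of what that fragment might look like. The key layout and field names are assumed from exported Galaxy workflow files, and `output` is an assumed name for the Filter failed tool's output. Neither comes from this post, so compare against an export from your own server.

```python
import json


def change_datatype_action(output_name: str, new_type: str) -> dict:
    """Build a post job action entry that changes an output's datatype.

    Assumption: the entry is keyed by action type + output name and uses
    the field names below, as seen in exported Galaxy workflow files.
    """
    action_type = "ChangeDatatypeAction"
    return {
        f"{action_type}{output_name}": {
            "action_type": action_type,
            "output_name": output_name,
            "action_arguments": {"newtype": new_type},
        }
    }


# "output" is an assumed output name for the Filter failed step.
post_job_actions = change_datatype_action("output", "fastqsanger")
print(json.dumps(post_job_actions, indent=2))
```

Seeing the setting as data makes it easier to confirm that a shared workflow really does set fastqsanger before you run it on a large collection.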


Make sure the correct collection is selected as input and then click Run Workflow. After the workflow completes, you will have a new collection containing new versions of your datasets with the datatype properly set to fastqsanger. These new datasets reference the original underlying file content and as a result do not add to your disk usage.


Using the converted datasets outside a Collection

If you want to have access to these datasets outside of the collection, they are available in your history as hidden datasets. To expose them, click hidden beneath the history name, activate the multiple dataset actions (click the checkbox), and select the datasets that you want to unhide (we can just choose All in this case). Once the desired datasets are selected, click For all selected and choose Unhide datasets. The hidden datasets (22-31 in our example) are now visible within your history and can be easily selected within tools as needed.
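If you would rather script the unhide step than click through it, the same selection logic can be sketched in Python. This is only an illustration under assumptions: that history contents are listed as records with `id`, `hid`, `history_content_type` and `visible` fields, and that a dataset can be made visible with a PUT to `/api/histories/{history_id}/contents/{dataset_id}` authenticated by an `x-api-key` header. The server URL, API key, and IDs below are hypothetical placeholders. The sketch only builds the requests and never sends them.

```python
import json
import urllib.request


def build_unhide_requests(base_url, api_key, history_id, contents):
    """Return (without sending) one PUT request per hidden dataset.

    Assumption: each item in `contents` is a dict with "id",
    "history_content_type" and "visible" keys.
    """
    requests = []
    for item in contents:
        # Skip collections and anything that is already visible.
        if item.get("history_content_type") != "dataset":
            continue
        if item.get("visible", True):
            continue
        url = (
            f"{base_url.rstrip('/')}/api/histories/"
            f"{history_id}/contents/{item['id']}"
        )
        body = json.dumps({"visible": True}).encode("utf-8")
        requests.append(
            urllib.request.Request(
                url,
                data=body,
                method="PUT",
                headers={
                    "Content-Type": "application/json",
                    "x-api-key": api_key,  # assumed auth header
                },
            )
        )
    return requests


# Hypothetical history contents: one visible dataset, two hidden ones.
example_contents = [
    {"id": "abc1", "hid": 21, "history_content_type": "dataset", "visible": True},
    {"id": "abc2", "hid": 22, "history_content_type": "dataset", "visible": False},
    {"id": "abc3", "hid": 23, "history_content_type": "dataset", "visible": False},
]

reqs = build_unhide_requests(
    "https://galaxy.example.org", "YOUR_API_KEY", "hist1", example_contents
)
for req in reqs:
    print(req.get_method(), req.full_url)
```

To actually apply the change you would pass each request to `urllib.request.urlopen`, but try it on a scratch history first, since the endpoint and field names here are assumptions.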
