
ehrQL output formats

Supported output formats

The following output formats are supported:

  • .arrow — Apache Arrow format
  • .csv.gz — compressed CSV format
  • .csv — uncompressed CSV format

⚠ The uncompressed CSV format is not recommended because it produces much larger files than the other formats.
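To get a feel for the size difference, the sketch below writes the same rows as plain CSV and as gzip-compressed CSV using only Python's standard library and reports both sizes. The rows are made up for illustration and are not ehrQL output; the actual saving depends on your data.

```python
import csv
import gzip
import io


def csv_sizes(rows):
    """Return (uncompressed_bytes, gzip_bytes) for rows written as CSV."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    raw = buffer.getvalue().encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Hypothetical patient-level rows; repetitive columns like these compress well.
rows = [["patient_id", "sex", "age", "has_asthma"]]
rows += [
    [i, "female" if i % 2 else "male", 20 + i % 60, i % 7 == 0]
    for i in range(10_000)
]

uncompressed, compressed = csv_sizes(rows)
print(f"{uncompressed} bytes as .csv, {compressed} bytes as .csv.gz")
```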

Unsupported output formats

These formats were supported by cohort-extractor but are not supported by ehrQL:

  • .dta and .dta.gz — Stata formats

arrowload for Stata users

Stata itself cannot read .arrow files directly. However, OpenSAFELY's Stata Docker image includes the arrowload library, which can load .arrow files into Stata.

Use arrowload as follows:

. arrowload /path/to/arrow/file

To see the full documentation, run command-line Stata via OpenSAFELY:

opensafely exec stata-mp stata

and then run:

. help arrowload

Selecting an output format

You select an output format when you use the --output option to specify an output filename for ehrQL. The filename extension you provide (for example, .arrow) determines the format of the output file.

If you specify a filename extension that is not supported, you will get an error telling you so.
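The extension-based selection described above can be sketched as a simple lookup. This is a hypothetical helper for illustration only, not ehrQL's own code: the function name `output_format_for`, the format labels, and the error message are all assumptions; only the three supported extensions come from this page.

```python
# Hypothetical mapping, mirroring the supported formats listed on this page.
SUPPORTED_FORMATS = {
    ".arrow": "Apache Arrow",
    ".csv.gz": "compressed CSV",
    ".csv": "uncompressed CSV",
}


def output_format_for(filename: str) -> str:
    """Return the format implied by the filename's extension, or raise ValueError."""
    # Try longer extensions first so compound extensions such as .csv.gz match whole.
    for extension in sorted(SUPPORTED_FORMATS, key=len, reverse=True):
        if filename.endswith(extension):
            return SUPPORTED_FORMATS[extension]
    supported = ", ".join(SUPPORTED_FORMATS)
    raise ValueError(f"Unsupported output format for {filename!r}; use one of: {supported}")
```

For example, `output_format_for("outputs/data_extract.csv.gz")` returns `"compressed CSV"`, while a Stata filename such as `"outputs/data_extract.dta"` raises `ValueError`.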

Examples with opensafely exec

.arrow

opensafely exec ehrql:v0 generate-dataset "./dataset-definition.py" --dummy-tables "example-data/" --output "./outputs/data_extract.arrow"

.csv.gz

opensafely exec ehrql:v0 generate-dataset "./dataset-definition.py" --dummy-tables "example-data/" --output "./outputs/data_extract.csv.gz"
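If you want to inspect a .csv.gz output in Python without extra dependencies, a minimal sketch using the standard library's `gzip` and `csv` modules follows. The helper name `read_dataset` is hypothetical, and the sketch assumes the file is UTF-8 encoded with a header row.

```python
import csv
import gzip


def read_dataset(path):
    """Read a .csv.gz dataset into a list of dicts keyed by column name."""
    # "rt" opens the gzip stream in text mode (assuming UTF-8);
    # newline="" is what the csv module expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

For example, `read_dataset("outputs/data_extract.csv.gz")` returns one dict per row. The `csv` module returns every value as a string, so convert numbers and dates yourself.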

Example project.yaml

version: "3.0"

expectations:
  population_size: 1000

actions:
  extract_data:
    run: ehrql:v0 generate-dataset "./dataset_definition.py" --output "outputs/data_extract.arrow"
    outputs:
      highly_sensitive:
        population: outputs/data_extract.arrow

⚠ The population filename must be identical to the output filename specified by --output. Otherwise, you will see the following error when you use opensafely run to run the project's actions:

$ opensafely run run_all
=> ProjectValidationError
   Invalid project:
   1 validation error for Pipeline
   __root__
     --output in run command and outputs must match (type=value_error)
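The rule behind this error can be sketched as a check on a single action, once project.yaml has been parsed into a dictionary (for example with a YAML library). This is a hypothetical illustration, not the validator that opensafely actually runs: the function name `output_matches` is an assumption, it only handles the `--output PATH` form, and it compares paths literally.

```python
import shlex


def output_matches(action: dict) -> bool:
    """Return True if the --output path in the action's run command is a declared output."""
    # shlex.split removes the quotes around "outputs/data_extract.arrow".
    args = shlex.split(action["run"])
    if "--output" not in args:
        return False
    position = args.index("--output") + 1
    if position >= len(args):
        return False
    output = args[position]
    # Collect every path declared under outputs, across all privacy levels.
    declared = {
        path
        for level in action.get("outputs", {}).values()
        for path in level.values()
    }
    return output in declared
```

Applied to the extract_data action in the example project.yaml, this returns True; changing the population path to a different filename makes it return False.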