Pulling data from the API and filtering on records in a DEPOT-created cohort

Identifying the right cohort from DEPOT

Prior to using this functionality, a user must be registered to use the DEPOT tool and must have created and saved a cohort with the tool. Users can find the cohorts they saved here.

On the page listing your saved cohorts, you will see a table that includes the columns “NAME”, the name of the cohort you created using DEPOT, and “ID”, the unique cohort ID you will need to pull the data for just these cases with the tbportals.depot.api package.

Pulling data from an endpoint for just the cases in the DEPOT cohort of interest

library(tbportals.depot.api)

# Example request for the data contained in the Biochemistry endpoint.
# TOKEN is assumed to already hold your API token.
REQUEST <- tidy_depot_api(path = "Biochemistry", token = TOKEN, cohortId = "PASTE cohort ID number Here")

# The JSON data from the API is returned in the content section as a data.frame
REQUEST$content

# The endpoint can be found in the path section
REQUEST$path

# Specific information about the httr request can be found in the response section
REQUEST$response
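
Before working with the result, it can help to confirm that the pieces described above are present. The sketch below uses a mock list in place of a live API call, assuming the returned object is a list with the three elements shown (content, path, response). Both `mock_request` and the `check_request()` helper are hypothetical and are not part of tbportals.depot.api.

```r
# Mock stand-in for the object returned by tidy_depot_api(); all values are made up
mock_request <- list(
  content  = data.frame(patient_id = c("a", "b"),
                        in_requested_cohort = c("Yes", "No"),
                        stringsAsFactors = FALSE),
  path     = "Biochemistry",
  response = NULL  # a live call would hold the httr response here
)

# Hypothetical helper: stop early if an expected element is missing or empty
check_request <- function(request) {
  stopifnot(
    is.list(request),
    all(c("content", "path", "response") %in% names(request)),
    is.data.frame(request$content),
    nrow(request$content) > 0
  )
  invisible(TRUE)
}

check_request(mock_request)
```

With a live request, the same check could be run as `check_request(REQUEST)` before any filtering.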

Filtering on the records relating to the cohort ID of interest

Given the requirements of TB Portals, the example below uses purely hypothetical data with a structure similar to what you would receive from the API call. Only the first few columns are shown: fake record IDs, relative dates, and specimen information, without the corresponding lab test types. The final column, used for filtering on the cohort records, is also shown.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(uuid)

# Structure of a hypothetical data.frame from REQUEST$content
df <- data.frame("patient_id" = UUIDgenerate(n = 5),
                 "condition_id" = UUIDgenerate(n = 5),
                 "specimen_id" = UUIDgenerate(n = 5),
                 "observationfhir_id" = UUIDgenerate(n = 5),
                 "test_date" = sample(0:100, size = 5),
                 "specimen_collection_site" = rep("blood", 5)) %>%
  mutate(
    specimen_collection_date = test_date,
    in_requested_cohort = c("No", "Yes", "Yes", "No", "No"))


df
#>                             patient_id                         condition_id
#> 1 adeb7dec-903d-4493-90ca-9e52f7a142a0 96e17ff0-33d6-4771-84c2-868434d509ad
#> 2 266a0dc8-e343-4702-929b-c3aaf0f48a09 5155be34-c9bd-4eaa-8359-112a170ddebd
#> 3 f7db1b89-baa3-4127-ac69-0e2d2b98b1d9 7cbd09ba-70e4-4fcf-ba8d-2051d0172880
#> 4 4ab1d22c-60df-4d23-8efa-b8d4c3f00d6d d67703ce-1418-435b-aaec-43239db7dde8
#> 5 48a6fbb3-69b8-4742-bce0-2bb0138dbcca 0e1eff92-2232-46c1-a4cb-e5c8e69d235f
#>                            specimen_id                   observationfhir_id
#> 1 648d8a95-4807-41d2-a988-22cbc3e026e0 9537d04b-ebf5-40ff-899e-9ece78e3684e
#> 2 7700bbce-4c3e-45e4-9e97-41fe42584b7b fb695757-566c-4c4a-9ba2-58ac954133b0
#> 3 d4efa44d-a0a4-4dbe-a614-b9f2a6601082 a5b40cf8-ab4d-4673-a5ea-2d35b44ab465
#> 4 767d694c-0edf-4e20-b82f-662592e262aa a976742f-f4cb-4f08-8ca0-ffecb53f9344
#> 5 87fb3d3a-d2a7-4fb0-8f29-c0202666ca47 255c21f8-a3d4-4425-81fa-3d4d5d74e895
#>   test_date specimen_collection_site specimen_collection_date
#> 1        44                    blood                       44
#> 2        22                    blood                       22
#> 3        75                    blood                       75
#> 4        62                    blood                       62
#> 5        46                    blood                       46
#>   in_requested_cohort
#> 1                  No
#> 2                 Yes
#> 3                 Yes
#> 4                  No
#> 5                  No

To keep only the records within the DEPOT cohort of interest, filter on the in_requested_cohort column, keeping the records with a value of “Yes”.

# Keep only the records of the hypothetical data.frame that belong to the cohort ID of interest
df_cohort <- df %>%
  filter(in_requested_cohort == "Yes")

df_cohort
#>                             patient_id                         condition_id
#> 1 266a0dc8-e343-4702-929b-c3aaf0f48a09 5155be34-c9bd-4eaa-8359-112a170ddebd
#> 2 f7db1b89-baa3-4127-ac69-0e2d2b98b1d9 7cbd09ba-70e4-4fcf-ba8d-2051d0172880
#>                            specimen_id                   observationfhir_id
#> 1 7700bbce-4c3e-45e4-9e97-41fe42584b7b fb695757-566c-4c4a-9ba2-58ac954133b0
#> 2 d4efa44d-a0a4-4dbe-a614-b9f2a6601082 a5b40cf8-ab4d-4673-a5ea-2d35b44ab465
#>   test_date specimen_collection_site specimen_collection_date
#> 1        22                    blood                       22
#> 2        75                    blood                       75
#>   in_requested_cohort
#> 1                 Yes
#> 2                 Yes
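
As a quick sanity check on the filter, it can be useful to tally records by cohort membership before filtering and to count distinct patients afterwards. The following is a minimal sketch in base R, using a small made-up data.frame with the same in_requested_cohort column. The `records` object and its placeholder patient IDs are illustrative only and are not real data.

```r
# Made-up records shaped like the example above (placeholder IDs)
records <- data.frame(
  patient_id = c("p1", "p2", "p2", "p3", "p4"),
  in_requested_cohort = c("No", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Tally records inside and outside the requested cohort
cohort_counts <- table(records$in_requested_cohort)
cohort_counts

# Base R counterpart of filter(in_requested_cohort == "Yes");
# %in% treats NA as "not in cohort", so no NA rows are returned
records_cohort <- records[records$in_requested_cohort %in% "Yes", , drop = FALSE]

# The number of rows kept should equal the "Yes" tally
nrow(records_cohort) == cohort_counts[["Yes"]]

# Distinct patients in the cohort (one patient can have several records)
n_patients <- length(unique(records_cohort$patient_id))
n_patients
```

Using `%in%` rather than `==` inside the brackets avoids the all-NA rows that base R indexing would otherwise produce for missing values, which matches how `dplyr::filter()` drops them.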