The peekbankr package allows you to access data in the peekbank-db from R. This removes the need to write complex SQL queries in order to get the information you want from the database. This vignette shows some examples of how to use the data loading functions and what the resulting data look like.
There are several different get_ functions that you can use to extract different types of data from the peekbank-db:
get_datasets()
get_subjects()
get_administrations()
get_trials()
get_stimuli()
get_aoi_region_sets()
get_aoi_timepoints()
get_xy_timepoints()
Technical note 1: You do not have to explicitly establish a connection to the peekbank-db, since the peekbankr functions manage these connections for you. If you would like to establish your own connection, you can do so with connect_to_peekbank() and pass it as an argument to any of the get_ functions.
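As a minimal sketch of the manual route described in technical note 1, the helper below opens one connection, reuses it for two queries, and closes it afterwards. The helper name fetch_core_tables is hypothetical. The sketch also assumes that the get_ functions take the connection through an argument named connection and that the connection object can be closed with DBI::dbDisconnect(); check the package reference if either assumption does not hold for your version.

```r
# Sketch only: reuse a single connection for several queries.
# Assumptions (not confirmed by this vignette): the argument is named
# `connection`, and the object returned by connect_to_peekbank() is a
# DBI connection.
fetch_core_tables <- function() {
  con <- peekbankr::connect_to_peekbank()
  # Close the connection when the function exits, even if a query fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # collect() leaves a local tibble unchanged; it is included in case the
  # get_ functions return lazy remote tables when a connection is supplied.
  list(
    datasets = dplyr::collect(peekbankr::get_datasets(connection = con)),
    subjects = dplyr::collect(peekbankr::get_subjects(connection = con))
  )
}
```

Calling fetch_core_tables() requires network access to the database, so it is defined here but not run.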
Technical note 2: We have tried to optimize the time it takes to get data from the database. However, querying all of the timepoint tables at once will still take a long time, because you are downloading hundreds of megabytes of data.
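One way to keep timepoint queries manageable is to request one dataset at a time rather than the whole table. The sketch below is illustrative rather than part of the package: fetch_by_dataset is a hypothetical helper, and it relies only on the dataset_name filter of get_aoi_timepoints(). The fetch argument exists so that the helper can be tried with a stand-in function without touching the database.

```r
# Sketch only: download timepoints dataset by dataset instead of all at once.
# `fetch_by_dataset` is a hypothetical helper, not a peekbankr function.
fetch_by_dataset <- function(dataset_names,
                             fetch = peekbankr::get_aoi_timepoints) {
  # One query per dataset name; each result is kept as a separate table.
  tables <- lapply(dataset_names, function(name) fetch(dataset_name = name))
  names(tables) <- dataset_names
  tables
}
```

For example, fetch_by_dataset("pomper_saffran_2016") would return a named list with a single table for that dataset.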
The get_datasets function returns a table describing each dataset in the database and its source, including the dataset identifiers, the dataset name, and the citation.
For example, you can run get_datasets without any arguments to return all of the datasets in the database.
d_datasets <- get_datasets()
## Using current database version: '2022.1'.
head(d_datasets)
## # A tibble: 6 × 5
## dataset_id lab_dataset_id dataset_name shortcite cite
## <int> <chr> <chr> <chr> <chr>
## 1 0 casillas_tseltal_2015 casillas_tseltal_2015 Casillas et al. … "Cas…
## 2 1 ronfard_2021 ronfard_2021 Ronfard, Wei, & … "Ron…
## 3 2 perry_cowpig perry_cowpig Perry & Saffran … "Per…
## 4 3 SwitchingCues pomper_saffran_2016 Pomper & Saffran… "Pom…
## 5 4 weisleder_stl weisleder_stl Weisleder & Fern… "Wei…
## 6 5 adams_marchman_2018 adams_marchman_2018 Adams et al. (20… "Ada…
The get_subjects function returns the persistent subject identifiers, which make it possible to tell when the same subject has participated in multiple experiments. The table also includes demographic information (currently sex and native language) and the lab-specific subject id.
d_subjects <- get_subjects()
## Using current database version: '2022.1'.
head(d_subjects)
## # A tibble: 6 × 4
## subject_id sex native_language lab_subject_id
## <int> <chr> <chr> <chr>
## 1 0 male other P3-14moM
## 2 1 male other P14-22moM
## 3 2 male other P14-45moM
## 4 3 female other P16-22moF
## 5 4 female other P16-45moF
## 6 5 female other P19-27moF
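Because subject identifiers are persistent, they can be used to find subjects who appear in more than one dataset. The sketch below shows the idea on a small made-up table; d_admin_mock is a hypothetical stand-in for the output of get_administrations(), with only the columns needed here, and its values are invented for illustration.

```r
# Hypothetical stand-in for get_administrations() output (values invented).
d_admin_mock <- data.frame(
  administration_id = 1:5,
  subject_id = c(0L, 0L, 1L, 2L, 2L),
  dataset_name = c("study_a", "study_b", "study_a", "study_a", "study_a")
)

# Count the distinct datasets each subject appears in.
n_datasets <- tapply(
  d_admin_mock$dataset_name,
  d_admin_mock$subject_id,
  function(x) length(unique(x))
)

# Subjects that took part in more than one dataset.
repeat_subjects <- as.integer(names(n_datasets)[n_datasets > 1])
```

Note that subject 2 has two administrations within the same dataset, so it is not counted as participating in multiple datasets.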
The get_administrations function returns information about the specific experimental administrations to subjects in the database. This includes the subject's age at the administration, the monitor size, the sample rate, the tracker, and the coding method.
Again, if you run the function with no arguments, you get the information for all administrations in the database, but you can also filter on a dataset name or dataset id.
d_administrations <- get_administrations(dataset_name = "pomper_saffran_2016")
## Using current database version: '2022.1'.
head(d_administrations)
## # A tibble: 6 × 12
## administration_id age lab_age lab_age_units monitor_size_x monitor_size_y
## <int> <dbl> <dbl> <chr> <int> <int>
## 1 108 46 46 months NA NA
## 2 109 43 43 months NA NA
## 3 110 47 47 months NA NA
## 4 111 42 42 months NA NA
## 5 112 43 43 months NA NA
## 6 113 42 42 months NA NA
## # ℹ 6 more variables: sample_rate <dbl>, tracker <chr>, coding_method <chr>,
## # dataset_id <int>, subject_id <int>, dataset_name <chr>
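The age argument takes a number indicating the age(s) of children (in months) that you want to analyze. You can use this argument in two ways: pass a single number to select one age, or pass a vector of two numbers to select an age range (minimum and maximum).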
For example, you can get the participant information for all of the children who were tested between the ages of 24 and 36 months.
d_age_range <- get_administrations(age = c(24, 36))
## Using current database version: '2022.1'.
head(d_age_range)
## # A tibble: 6 × 12
## administration_id age lab_age lab_age_units monitor_size_x monitor_size_y
## <int> <dbl> <dbl> <chr> <int> <int>
## 1 5 27 27 months 1366 768
## 2 7 25 25 months 1366 768
## 3 14 29 29 months 1366 768
## 4 17 32 32 months 1366 768
## 5 18 32 32 months 1366 768
## 6 19 35 35 months 1366 768
## # ℹ 6 more variables: sample_rate <dbl>, tracker <chr>, coding_method <chr>,
## # dataset_id <int>, subject_id <int>, dataset_name <chr>
The get_trials function returns a table with information about the trials in the experiments in the database. This includes the trial id, the order of the trial, the trial type id, and the dataset that the trial belongs to.
d_trials <- get_trials()
## Using current database version: '2022.1'.
head(d_trials)
## # A tibble: 6 × 5
## trial_id trial_order trial_type_id dataset_id dataset_name
## <int> <int> <int> <int> <chr>
## 1 659 1 276 5 adams_marchman_2018
## 2 660 2 277 5 adams_marchman_2018
## 3 661 3 278 5 adams_marchman_2018
## 4 662 4 279 5 adams_marchman_2018
## 5 663 5 280 5 adams_marchman_2018
## 6 664 6 281 5 adams_marchman_2018
This function also takes dataset name and id filters.
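The get_stimuli function returns a table with information about the stimuli in the experiments in the database. This includes the stimulus id, the stimulus novelty, the original and English stimulus labels, the image path and description, and the dataset that the stimulus belongs to.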
d_stimuli <- get_stimuli()
## Using current database version: '2022.1'.
head(d_stimuli)
## # A tibble: 6 × 10
## stimulus_id stimulus_novelty original_stimulus_label english_stimulus_label
## <int> <chr> <chr> <chr>
## 1 170 familiar baby baby
## 2 171 familiar car car
## 3 172 familiar ball ball
## 4 173 familiar doggy doggy
## 5 174 familiar shoe shoe
## 6 175 familiar birdie birdie
## # ℹ 6 more variables: stimulus_image_path <chr>, lab_stimulus_id <chr>,
## # dataset_id <int>, image_description <chr>, image_description_source <chr>,
## # dataset_name <chr>
This function also takes dataset name and id filters.
The get_aoi_region_sets() function returns a table describing the area of interest (AOI) regions for experiments that used eye-trackers. For the left (l_) and right (r_) AOIs, it gives the minimum and maximum x and y coordinates of the region.
d_aoi_region_sets <- get_aoi_region_sets()
## Using current database version: '2022.1'.
head(d_aoi_region_sets)
## # A tibble: 6 × 9
## aoi_region_set_id l_x_max l_x_min l_y_max l_y_min r_x_max r_x_min r_y_max
## <int> <int> <int> <int> <int> <int> <int> <int>
## 1 0 395 359 754 359 1366 971 754
## 2 1 715 3 917 142 1679 967 921
## 3 2 710 0 911 146 1676 982 906
## 4 3 701 3 918 121 1676 978 923
## 5 4 706 0 927 138 1681 982 925
## 6 5 699 2 923 133 1680 982 923
## # ℹ 1 more variable: r_y_min <int>
This function is not expected to be used commonly; the information is retained as part of the process of calculating AOIs from XY points.
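To illustrate how AOI bounds relate to XY points, the sketch below assigns gaze coordinates to a region using bounding boxes named like the columns of the region set table. It is not the peekbank processing pipeline: classify_aoi is a hypothetical helper, the region values are invented, and the labels "left", "right", "other", and "missing" are chosen only for this example.

```r
# Sketch only: assign XY points to the left or right AOI bounding box.
# `classify_aoi` is a hypothetical helper; the labels are illustrative.
classify_aoi <- function(x, y, region) {
  in_left <- x >= region$l_x_min & x <= region$l_x_max &
    y >= region$l_y_min & y <= region$l_y_max
  in_right <- x >= region$r_x_min & x <= region$r_x_max &
    y >= region$r_y_min & y <= region$r_y_max
  # Points with a missing coordinate are labelled before the box checks.
  ifelse(
    is.na(x) | is.na(y), "missing",
    ifelse(in_left, "left", ifelse(in_right, "right", "other"))
  )
}

# Invented region set using the column names from the table above.
region_mock <- list(
  l_x_min = 0, l_x_max = 700, l_y_min = 150, l_y_max = 900,
  r_x_min = 980, r_x_max = 1680, r_y_min = 150, r_y_max = 900
)

aoi_labels <- classify_aoi(
  x = c(100, 1200, 800, NA),
  y = c(500, 500, 500, 500),
  region = region_mock
)
```

The third point falls between the two boxes and the fourth has no x coordinate, so neither is assigned to an AOI.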
The get_aoi_timepoints() function returns a table with information about the subject's looking behavior in each trial. For example, you can see which area the subject was looking at on each timepoint of a particular trial (e.g., the target, the distractor, or away).
The t_norm field provides a trial-normalized time variable (in milliseconds) whose 0 point is the point of disambiguation on that trial (the onset of the first utterance of the target label).
d_aoi_timepoints <- get_aoi_timepoints(dataset_name = "pomper_saffran_2016")
## Using current database version: '2022.1'.
head(d_aoi_timepoints)
## # A tibble: 6 × 4
## administration_id trial_id aoi t_norm
## <int> <int> <chr> <int>
## 1 108 269 target -1000
## 2 108 269 target -975
## 3 108 269 target -950
## 4 108 269 target -925
## 5 108 269 target -900
## 6 108 269 target -875
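A common use of this table is to compute the proportion of looking to the target after the point of disambiguation. The sketch below does this per trial on a small made-up table; d_aoi_mock is a hypothetical stand-in for the output of get_aoi_timepoints(), and the choice to keep only target and distractor looks in the denominator is an assumption of this example, not a rule of the package.

```r
# Hypothetical stand-in for get_aoi_timepoints() output (values invented).
d_aoi_mock <- data.frame(
  trial_id = rep(c(1L, 2L), each = 4),
  t_norm = rep(c(-25L, 0L, 25L, 50L), times = 2),
  aoi = c("distractor", "target", "target", "missing",
          "target", "distractor", "distractor", "target")
)

# Keep timepoints at or after disambiguation (t_norm >= 0) where the
# subject was looking at either the target or the distractor.
in_window <- d_aoi_mock$t_norm >= 0 &
  d_aoi_mock$aoi %in% c("target", "distractor")
d_window <- d_aoi_mock[in_window, ]

# Proportion of target looks per trial.
prop_target <- tapply(d_window$aoi == "target", d_window$trial_id, mean)
```

The timepoints before t_norm = 0 are excluded because they precede the target label and so do not reflect its comprehension.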
For experiments using eye-trackers (as opposed to hand coding from video), the get_xy_timepoints function returns a table including the x and y position across time.
d_xy_timepoints <- get_xy_timepoints()