The peekbankr package allows you to access data in the
peekbank-db from R. This removes the need to write complex SQL queries
in order to get the information you want from the database. This
vignette shows some examples of how to use the data loading functions
and what the resulting data look like.
There are several different get_ functions that you can
use to extract different types of data from the peekbank-db:
get_datasets()
get_subjects()
get_administrations()
get_trials()
get_stimuli()
get_aoi_region_sets()
get_aoi_timepoints()
get_xy_timepoints()
Technical note 1: You do not have to explicitly
establish a connection to the peekbank-db, since the
peekbankr functions will open one for you. As the warning in the
output below indicates, however, this implicit behavior is deprecated,
so it is better to create a connection yourself with
connect_to_peekbank() and pass it via the connection argument to any of
the get_ functions.
Technical note 2: We have tried to minimize the time it takes to get data from the database. Even so, querying all of the timepoint tables at once will take a long time, because you are downloading hundreds of megabytes of data.
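As a minimal sketch of the explicit-connection pattern described in technical note 1, the helper below opens one connection, reuses it for two queries, and closes it when done. The helper name load_peekbank_tables is hypothetical, and closing the connection with DBI::dbDisconnect() assumes that connect_to_peekbank() returns a DBI connection; the db_version and connection arguments are taken from the warning message shown in this vignette.

```r
# Hypothetical helper: one explicit connection shared by several get_ calls.
# Assumes the peekbankr and DBI packages are installed when it is called.
load_peekbank_tables <- function(db_version = "current") {
  # Open the connection once so every query uses the same database version.
  con <- peekbankr::connect_to_peekbank(db_version = db_version)
  # Assumption: the connection is a DBI connection, so dbDisconnect() closes it.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  list(
    datasets = peekbankr::get_datasets(connection = con),
    subjects = peekbankr::get_subjects(connection = con)
  )
}

# Usage (requires network access to the peekbank-db):
# tables <- load_peekbank_tables()
# head(tables$datasets)
```

Sharing one connection avoids the mismatched database versions that the warning mentions.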
The get_datasets function returns a table describing each dataset
and its source: the dataset id, the lab's own dataset id, the dataset
name, and a short and a full citation.
For example, you can run get_datasets without any
arguments to return all of the datasets in the database.
d_datasets <- get_datasets()
## Warning: No connection provided. Defaulting to connect_to_peekbank(db_version =
## 'current'). This can result in mismatched database versions if you are using a
## different version elsewhere. This implicit behavior is deprecated and will be
## removed in a future version. Please create a connection explicitly with
## connect_to_peekbank() and pass it via the connection argument.
## Using current database version: '2022.1'.
head(d_datasets)
## # A tibble: 6 × 5
##   dataset_id lab_dataset_id        dataset_name          shortcite         cite
##        <int> <chr>                 <chr>                 <chr>             <chr>
## 1          0 casillas_tseltal_2015 casillas_tseltal_2015 Casillas et al. … "Cas…
## 2          1 ronfard_2021          ronfard_2021          Ronfard, Wei, & … "Ron…
## 3          2 perry_cowpig          perry_cowpig          Perry & Saffran … "Per…
## 4          3 SwitchingCues         pomper_saffran_2016   Pomper & Saffran… "Pom…
## 5          4 weisleder_stl         weisleder_stl         Weisleder & Fern… "Wei…
## 6          5 adams_marchman_2018   adams_marchman_2018   Adams et al. (20… "Ada…
The get_subjects function returns persistent subject identifiers,
which make it possible to tell when the same subject has
participated in multiple experiments. It also includes demographic
information (currently sex and native language) and the lab-specific subject id.
d_subjects <- get_subjects()
head(d_subjects)
## # A tibble: 6 × 4
##   subject_id sex    native_language lab_subject_id
##        <int> <chr>  <chr>           <chr>
## 1          0 male   other           P3-14moM
## 2          1 male   other           P14-22moM
## 3          2 male   other           P14-45moM
## 4          3 female other           P16-22moF
## 5          4 female other           P16-45moF
## 6          5 female other           P19-27moF
The get_administrations function returns information
about the specific experimental administrations to subjects in the
database. This includes the subject's age, the monitor size, the sample
rate, the tracker, and the coding method.
Again, if you run the function with no arguments, then you get all the information for all administrations in the database, but you can now also filter on a dataset name or dataset id.
d_administrations <- get_administrations(dataset_name = "pomper_saffran_2016")
head(d_administrations)
## # A tibble: 6 × 12
##   administration_id   age lab_age lab_age_units monitor_size_x monitor_size_y
##               <int> <dbl>   <dbl> <chr>                  <int>          <int>
## 1               108    46      46 months                    NA             NA
## 2               109    43      43 months                    NA             NA
## 3               110    47      47 months                    NA             NA
## 4               111    42      42 months                    NA             NA
## 5               112    43      43 months                    NA             NA
## 6               113    42      42 months                    NA             NA
## # ℹ 6 more variables: sample_rate <dbl>, tracker <chr>, coding_method <chr>,
## #   dataset_id <int>, subject_id <int>, dataset_name <chr>
The age argument takes a number indicating the age(s) of children (in months) that you want to analyze. You can use this argument in two ways: pass a single age, or pass a vector of two ages giving the minimum and maximum of an age range.
For example, you can get the participant information for all of the children who were tested between the ages of 24 and 36 months.
d_age_range <- get_administrations(age = c(24, 36))
head(d_age_range)
## # A tibble: 6 × 12
##   administration_id   age lab_age lab_age_units monitor_size_x monitor_size_y
##               <int> <dbl>   <dbl> <chr>                  <int>          <int>
## 1              1516  28.1    2.35 years                   1600           1200
## 2              1522  24.9    2.08 years                   1600           1200
## 3              1524  28.2    2.35 years                   1600           1200
## 4              1529  28.7    2.39 years                   1600           1200
## 5              1530  24.8    2.07 years                   1600           1200
## 6              1531  28.8    2.40 years                   1600           1200
## # ℹ 6 more variables: sample_rate <dbl>, tracker <chr>, coding_method <chr>,
## #   dataset_id <int>, subject_id <int>, dataset_name <chr>
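Because both the subjects table and the administrations table carry a subject_id column, the two can be combined to attach subject-level fields to each administration, or to spot subjects who were tested more than once. The sketch below uses small made-up data frames as stand-ins for the output of get_subjects() and get_administrations(), so it runs without a database connection; the values are illustrative only.

```r
# Toy stand-ins for get_subjects() and get_administrations() (made-up values).
subjects <- data.frame(
  subject_id = c(0L, 1L, 2L),
  sex = c("male", "female", "female"),
  stringsAsFactors = FALSE
)
administrations <- data.frame(
  administration_id = c(10L, 11L, 12L, 13L),
  subject_id = c(0L, 1L, 1L, 2L),
  age = c(18, 18, 24, 30),
  stringsAsFactors = FALSE
)

# Attach subject-level fields to each administration via the shared key.
admin_with_sex <- merge(administrations, subjects, by = "subject_id")

# Count administrations per subject; counts above 1 mark repeat participants.
n_admin <- table(administrations$subject_id)
repeat_subjects <- as.integer(names(n_admin)[n_admin > 1])
```

With real data, the same merge() call should work on the tibbles returned by the get_ functions, provided they come from the same database version.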
The get_trials function returns a table with information
about the trials in the experiments in the database. This includes each
trial's order and trial type id, along with the dataset it belongs to.
d_trials <- get_trials()
head(d_trials)
## # A tibble: 6 × 5
##   trial_id trial_order trial_type_id dataset_id dataset_name
##      <int>       <int>         <int>      <int> <chr>
## 1      659           1           276          5 adams_marchman_2018
## 2      660           2           277          5 adams_marchman_2018
## 3      661           3           278          5 adams_marchman_2018
## 4      662           4           279          5 adams_marchman_2018
## 5      663           5           280          5 adams_marchman_2018
## 6      664           6           281          5 adams_marchman_2018
This function also takes dataset name and id filters.
The get_stimuli function returns a table with
information about the stimuli in the experiments in the database. This
includes each stimulus's novelty (e.g., familiar), its original and
English labels, its image path, and a description of the image.
d_stimuli <- get_stimuli()
head(d_stimuli)
## # A tibble: 6 × 10
##   stimulus_id stimulus_novelty original_stimulus_label english_stimulus_label
##         <int> <chr>            <chr>                   <chr>
## 1         170 familiar         baby                    baby
## 2         171 familiar         car                     car
## 3         172 familiar         ball                    ball
## 4         173 familiar         doggy                   doggy
## 5         174 familiar         shoe                    shoe
## 6         175 familiar         birdie                  birdie
## # ℹ 6 more variables: stimulus_image_path <chr>, lab_stimulus_id <chr>,
## #   dataset_id <int>, image_description <chr>, image_description_source <chr>,
## #   dataset_name <chr>
This function also takes dataset name and id filters.
The get_aoi_region_sets() function returns a table with the
area of interest (AOI) regions for experiments
using eye-trackers. For the left (l) and right (r) regions, it gives the
minimum and maximum x and y coordinates.
d_aoi_region_sets <- get_aoi_region_sets()
head(d_aoi_region_sets)
## # A tibble: 6 × 9
##   aoi_region_set_id l_x_max l_x_min l_y_max l_y_min r_x_max r_x_min r_y_max
##               <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
## 1                 0     395     359     754     359    1366     971     754
## 2                 1     715       3     917     142    1679     967     921
## 3                 2     710       0     911     146    1676     982     906
## 4                 3     701       3     918     121    1676     978     923
## 5                 4     706       0     927     138    1681     982     925
## 6                 5     699       2     923     133    1680     982     923
## # ℹ 1 more variable: r_y_min <int>
This function is not expected to be used commonly; the information is retained as part of the process of calculating AOIs from XY points.
The get_aoi_timepoints() function returns a table with
information about the subject's looking behavior in each trial. For
example, you can find out which area the subject was
looking at at a given point in a trial (e.g., the target, the
distractor, or away from both).
The t_norm field provides a trial-normalized time
variable (in milliseconds) whose 0 point is the point of disambiguation on
that trial, that is, the onset of the first utterance of the target
label.
d_aoi_timepoints <- get_aoi_timepoints(dataset_name = "pomper_saffran_2016")
head(d_aoi_timepoints)
## # A tibble: 6 × 4
##   administration_id trial_id aoi    t_norm
##               <int>    <int> <chr>   <int>
## 1               108      269 target  -1000
## 2               108      269 target   -975
## 3               108      269 target   -950
## 4               108      269 target   -925
## 5               108      269 target   -900
## 6               108      269 target   -875
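To illustrate how the aoi and t_norm columns can be used together, here is a minimal sketch that computes the proportion of target looks at each time point after the point of disambiguation. It runs on a small made-up data frame with the same four columns as the table above, not on real Peekbank data, and the choice to keep only "target" and "distractor" samples is an assumption made for this example.

```r
# Toy stand-in for get_aoi_timepoints() output (made-up values).
aoi_timepoints <- data.frame(
  administration_id = c(108L, 108L, 108L, 108L, 109L, 109L, 109L, 109L),
  trial_id = 269L,
  aoi = c("distractor", "target", "target", "target",
          "distractor", "distractor", "target", "distractor"),
  t_norm = c(-25L, 0L, 25L, 50L, -25L, 0L, 25L, 50L),
  stringsAsFactors = FALSE
)

# Keep samples at or after the point of disambiguation (t_norm >= 0).
post_onset <- aoi_timepoints[aoi_timepoints$t_norm >= 0, ]

# Keep only samples where the subject looked at the target or the distractor.
on_task <- post_onset[post_onset$aoi %in% c("target", "distractor"), ]
on_task$is_target <- on_task$aoi == "target"

# Proportion of target looks at each time point, pooled over administrations.
prop_target <- aggregate(is_target ~ t_norm, data = on_task, FUN = mean)
print(prop_target)
```

Each row of prop_target holds one value of t_norm and the share of samples at that time that were on the target.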
For experiments using eye-trackers (as opposed to hand coding from
video), the get_xy_timepoints function returns a table
including the x and y position across time.
d_xy_timepoints <- get_xy_timepoints()