SurfaceColloc {corpora} | R Documentation
A small data set of surface collocations from the English Wikipedia
Description
This data set demonstrates how co-occurrence and marginal frequencies can be provided for collocation analysis with am.score.
It contains surface co-occurrence counts for 7 English nouns as nodes and 7 selected collocates. The counts are based on a collocational span of two tokens to the left and right of the node (L2/R2) in the WP500 corpus.
Marginal frequencies for the nodes are the overall corpus frequencies of the nouns, so the expected co-occurrence frequency must be scaled by the total span size of 4 tokens.
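The span adjustment can be sketched by hand in base R. This is a minimal illustration rather than the computation am.score performs internally: it assumes the usual surface co-occurrence estimate E = span * f1 * f2 / N, and it assumes that the labels of f1 and f2 are the word forms found in cooc$w1 and cooc$w2. The variable names span and E are introduced here for illustration only.

```r
library(corpora)  # provides the SurfaceColloc data set

cooc <- SurfaceColloc$cooc
span <- 4  # L2/R2: two tokens to the left plus two to the right of the node

# Look up the marginal frequency of each word by its label (assumed to be the
# word form). as.numeric() avoids integer overflow in the product below.
f1 <- as.numeric(SurfaceColloc$f1[as.character(cooc$w1)])
f2 <- as.numeric(SurfaceColloc$f2[as.character(cooc$w2)])

# Expected co-occurrence frequency under independence, scaled by the span size
E <- span * f1 * f2 / SurfaceColloc$N

# Observed (f) and expected (E) frequencies side by side
head(data.frame(cooc, E = E))
```

Comparing f with E for each pair gives a first impression of which collocates occur more often than chance would predict.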
Usage
SurfaceColloc
Format
A list with the following components:
cooc: A data frame with 34 rows and the following columns:
  w1: node word (noun)
  w2: collocate
  f: co-occurrence frequency within L2/R2 span
f1: Labelled integer vector of length 7 specifying the marginal frequencies of the node nouns.
f2: Labelled integer vector of length 7 specifying the marginal frequencies of the collocates.
N: Sample size, i.e. the total number of tokens in the WP500 corpus.
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert)
See Also
Examples
head(SurfaceColloc$cooc, 10)
SurfaceColloc$f1
SurfaceColloc$f2
SurfaceColloc$N