We want to encourage the use of SOEP data in university teaching. We offer a highly simplified Stata practice dataset, which can be downloaded directly, as well as a teaching dataset, which must be ordered.
This dataset in Stata format is based on the original SOEP data, but provides the data in significantly altered and fully anonymous form. This means that the practice dataset can be used without the need for any contracts or user agreements. The practice dataset consists of original variables, covers five time points, and is available in the “long” format. The dataset is provided in German and English.
We offer two practice datasets that differ in the number of variables and time span. The more recent version includes income variables and the do-file used to create the dataset:
1. Data for the years 2000-2004, 9 variables
2. Data for the years 2015-2019, 15 variables, DOI: 10.5684/soep.practice.v36
To anonymize the variables, an algorithm was used that largely maintains the longitudinal information in the original data. The practice dataset is therefore suited to calculating panel-specific univariate statistics (intra- and inter-individual correlation patterns, transition rates) in classes on descriptive methods. The appropriate commands in modern statistical software packages, such as the xt commands in Stata, provide realistic results.
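As a rough illustration of what a transition rate is, the following minimal Python sketch computes year-to-year transition rates from a long-format panel. The person IDs, years, and status values are invented for illustration and are not variables from the practice dataset; in Stata, a command such as xttrans tabulates transitions in a similar spirit.

```python
from collections import Counter, defaultdict

# Made-up long-format panel: (person id, survey year, employment status).
# These values are illustrative only and do not come from the SOEP.
panel = [
    (1, 2015, "employed"), (1, 2016, "employed"), (1, 2017, "unemployed"),
    (2, 2015, "unemployed"), (2, 2016, "employed"), (2, 2017, "employed"),
    (3, 2015, "employed"), (3, 2016, "employed"),
]

def transition_rates(rows):
    """Return the share of moves from each origin state to each destination
    state, counting only pairs of consecutive survey years per person."""
    by_person = defaultdict(dict)
    for pid, year, state in rows:
        by_person[pid][year] = state

    # Count origin -> destination pairs observed in adjacent years.
    counts = Counter()
    for states in by_person.values():
        for year, origin in states.items():
            if year + 1 in states:
                counts[(origin, states[year + 1])] += 1

    # Normalize by the number of transitions leaving each origin state.
    totals = Counter()
    for (origin, _), n in counts.items():
        totals[origin] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}

rates = transition_rates(panel)
for (origin, destination), rate in sorted(rates.items()):
    print(f"{origin} -> {destination}: {rate:.2f}")
```

Each rate is a row share: the rates for all destinations reachable from one origin state sum to 1.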
In the context of multivariate analysis, the dataset is useful for teaching (panel) regression techniques. Characteristics of panel data and the impact of various analytical procedures (such as fixed effects and random effects modeling) can be demonstrated in a realistic manner when using the appropriate commands. Despite its limitations, the practice dataset also allows for the illustration of interaction and mediation techniques.
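To make the idea behind fixed effects modeling concrete, here is a minimal Python sketch of the within transformation for a single regressor. The data are invented so that the true within-person slope is 1 while the two persons differ strongly in their levels; the names and numbers are assumptions for illustration, not values from the practice dataset.

```python
from collections import defaultdict

# Made-up panel: (person id, x, y), constructed as y = alpha_i + 1.0 * x
# with alpha_1 = 1 and alpha_2 = 10. Illustrative values only.
panel = [
    (1, 1.0, 2.0), (1, 2.0, 3.0), (1, 3.0, 4.0),
    (2, 4.0, 14.0), (2, 5.0, 15.0), (2, 6.0, 16.0),
]

def ols_slope(xs, ys):
    """Slope of a bivariate least-squares regression of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def within_slope(rows):
    """Fixed effects (within) slope: subtract each person's own means from
    x and y, then regress the demeaned y on the demeaned x."""
    groups = defaultdict(list)
    for pid, x, y in rows:
        groups[pid].append((x, y))

    xs_dm, ys_dm = [], []
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        xs_dm.extend(x - x_bar for x, _ in obs)
        ys_dm.extend(y - y_bar for _, y in obs)
    return ols_slope(xs_dm, ys_dm)

pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
fixed_effects = within_slope(panel)
print(f"pooled OLS slope: {pooled:.3f}")
print(f"within (FE) slope: {fixed_effects:.3f}")
```

In this constructed example the pooled slope is inflated by the level differences between persons, whereas the within slope uses only changes within each person over time.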
Numerous examples of analyses using the dataset for the years 2000-2004 can be found in the textbook “Regressionsmodelle zur Analyse von Paneldaten” (Marco Giesselmann and Michael Windzio, Springer VS).
Under no circumstances should the dataset be used in real analyses. Due to the procedure used to anonymize the data, they only roughly reflect the actual relationships in the SOEP. Also, data preparation techniques can only be taught and practiced to a very limited extent due to the extremely narrow segment of original data provided. In such cases, the SOEP teaching dataset should be used.
Data protection regulations stipulate that you need a data distribution contract with DIW Berlin to use the SOEP teaching dataset. The contract holder is responsible for ensuring strict adherence to data protection!
German data protection laws stipulate that only a maximum of 50% of all cases in the original dataset may be used for teaching purposes. As of Version 35 (data from 1984-2018), we provide our users with a teaching version that has the same data structure as the original data (with the exception of the EU-SILC clone) but contains only half the number of cases. This selection is easily made with the help of the random group variable, which separates the dataset into 20 subsamples. The variable RGROUP20, which can be found in the dataset CIRDEF, has exactly 20 values. Only cases with values from 11 to 20 may be used for teaching purposes. Students are under no circumstances permitted to have access to the data in random groups 1-10. Access to the original dataset is of course also prohibited.
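The selection rule can be sketched in a few lines of Python. This is only an illustration of the filter logic: the rows and the pid field below are hypothetical stand-ins, and only the variable name RGROUP20 and the permitted range 11 to 20 are taken from the description above.

```python
# Hypothetical rows standing in for records from CIRDEF.
# The person IDs and group assignments are made up for illustration.
cirdef_rows = [
    {"pid": 101, "RGROUP20": 3},
    {"pid": 102, "RGROUP20": 11},
    {"pid": 103, "RGROUP20": 20},
    {"pid": 104, "RGROUP20": 10},
    {"pid": 105, "RGROUP20": 15},
]

# Random groups permitted for teaching: 11 to 20 inclusive.
TEACHING_GROUPS = range(11, 21)

def teaching_subset(rows):
    """Keep only the cases whose random group is allowed for teaching."""
    return [row for row in rows if row["RGROUP20"] in TEACHING_GROUPS]

teaching_rows = teaching_subset(cirdef_rows)
print([row["pid"] for row in teaching_rows])
```

The filter keeps the boundary values 11 and 20 and drops everything in random groups 1-10.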
The teaching dataset provided to students must be stored in a separate hard drive area to which the user guarantees controlled access. Students may under no circumstances take data home with them or install them anywhere else at the university.
You are welcome to use our SOEPtutorials in your classes.
The chapter Working with SOEP Data in the SOEPcompanion offers practical introductions and exercises with Stata scripts.
The following textbooks in English provide examples based on the SOEP data: