
SOEP-Core v38eu (Data 1984-2021, EU Edition)

The Socio-Economic Panel (SOEP) is a representative longitudinal survey that has been running since 1984. On behalf of DIW Berlin, our survey institute interviews people from households throughout Germany every year. The data provide information on income, employment, education, and health. Because the same people are interviewed every year, long-term social and societal trends can be tracked particularly well. To capture societal change adequately, new samples are added repeatedly and the survey program is adjusted.

Dataset information

Title: Socio-Economic Panel (SOEP), data for years 1984-2021 (SOEP-Core, v38, EU Edition)

DOI: 10.5684/soep.core.v38eu
Collection period: 1984-2021
Publication date: 13 July 2023
Principal investigators: Jan Goebel, Markus M. Grabka, Carsten Schröder, Sabine Zinn, Charlotte Bartels, Mattis Beckmannshagen, Andreas Franken, Martin Gerike, Florian Griese, Christoph Halbmeier, Selin Kara, Peter Krause, Elisabeth Liebau, Jana Nebelin, Marvin Petrenz, Sarah Satilmis, Rainer Siegers, Hans Walter Steinhauer, Felix Süttmann, Knut Wenzig, Stefan Zimmermann

Data collection: infas Institut für angewandte Sozialwissenschaft

Population: Persons living in private households in the Federal Republic of Germany

Special samples: GDR citizens (1990), immigration/migration (1994/95, 2013, 2015, 2020), refugees (since 2016). A detailed description of all samples is available in the SOEPcompanion under SOEP-Samples in Detail.

Selection method: All SOEP samples are drawn using regionally clustered multi-stage sampling. The respondents (households) are selected by random walk or from a register sample.

Survey method: SOEP data collection is based on a set of questionnaires for households and for individuals. In principle, the interviewer attempts to conduct face-to-face interviews with all household members who turn 12 in the survey year or are older. In addition, one person (the head of household) is asked to complete a household questionnaire. It covers the housing situation and housing costs, various sources of income, and children under 17 living in the household (e.g. attendance at kindergarten, primary school, etc.).

Data citation: Socio-Economic Panel (SOEP), version 38, data for years 1984-2021 (SOEP-Core v38, EU Edition). 2023. DOI: 10.5684/soep.core.v38eu

If you do not exclude the cases of the migration samples from your analysis, please also cite:
IAB-SOEP Migration Samples (M1, M2), data for years 2013-2021, DOI: 10.5684/soep.iab-soep-mig.2021

If you do not exclude the cases of the refugee samples from your analysis, please also cite: IAB-BAMF-SOEP Survey of Refugees (M3-M5), data for years 2016-2021, DOI: 10.5684/soep.iab-bamf-soep-mig.2021

Publications using this file should refer to the DOI given above and cite the following references:

  • Goebel, Jan, Markus M. Grabka, Stefan Liebig, Martin Kroh, David Richter, Carsten Schröder, and Jürgen Schupp. 2019. The German Socio-Economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik 239 (2), 345-360. (https://doi.org/10.1515/jbnst-2018-0022)
  • Schröder, Carsten, Johannes König, Alexandra Fedorets, Jan Goebel, Markus M. Grabka, Holger Lüthen, Maria Metzing, Felicitas Schikora, and Stefan Liebig. 2020. The economic research potentials of the German Socio-Economic Panel study. German Economic Review 21 (3), 335-371. (https://doi.org/10.1515/ger-2020-0033)
  • Giesselmann, Marco, Sandra Bohmann, Jan Goebel, Peter Krause, Elisabeth Liebau, David Richter, Diana Schacht, Carsten Schröder, Jürgen Schupp, and Stefan Liebig. 2019. The Individual in Context(s): Research Potentials of the Socio-Economic Panel Study (SOEP) in Sociology. European Sociological Review 35 (5), 738-755. (https://doi.org/10.1093/esr/jcz029)

If you do not exclude the cases of the migration samples from your analysis, please also cite:

  • Herbert Brücker, Martin Kroh, Simone Bartsch, Jan Goebel, Simon Kühne, Elisabeth Liebau, Parvati Trübswetter, Ingrid Tucci & Jürgen Schupp (2014): The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents. SOEP Survey Paper 216, Series C. Berlin, Nürnberg: DIW Berlin.

If you do not exclude the cases of the refugee samples from your analysis, please also cite:

  • Herbert Brücker, Nina Rother, Jürgen Schupp. 2017. IAB-BAMF-SOEP Befragung von Geflüchteten 2016. Studiendesign, Feldergebnisse sowie Analysen zu schulischer wie beruflicher Qualifikation, Sprachkenntnissen sowie kognitiven Potenzialen. IAB Forschungsbericht 13/2017.

For the SOEP-Core data 1984-2021 (v38), waves A to BL, the following datasets are available:

soep.core.v38eu (EU Edition, 100%)

soep.core.v38i (International Scientific Use Version, 95%)

soep.core.v38t (Teaching Edition, 50%)

soep.core.v38at (Add-on: Area types)

soep.core.v38pr (Add-on: Planning regions)

soep.core.v38r (Remote Edition)

soep.core.v38o (Onsite Edition)

Detailed information on all editions is available in the SOEPcompanion.

Fully included in the current data distribution, and also available as individual datasets on special request:

soep.iab-soep-mig.2021 (Migration samples)

soep.iab-bamf-soep-mig.2021 (Refugee samples)

General

  • We switched from our own country codings to the international ISO-3 country codes. This affects all variables with country codes, such as country of origin, nationality and many more.
  • The duration of interviews, the presence of other people during the interview, and whether interviews were self- or interviewer-administered are no longer included in the cross-sectional datasets of the current wave (bl* datasets). You will find this information in the new instrumentation dataset (see below).
  • Due to a change of survey provider, the intid is no longer consistent with previous waves. All intids of the current wave have been reset, and a new series of intids begins.

New datasets or variables

Dataset blcorona

  • Original data with Corona-related information on every child and other persons in the household. The information is based on the household questionnaire 2021 and has been combined and reshaped to the person level.
  • Identifier in this dataset is pid.

Dataset housing2021

  • The dataset contains information on the living environment as reported by the interviewer. So far, the dataset is only available for 2021. Previous data is still available in hbrutt.

Dataset gkal and lkal

  • gkal and lkal are analogous to the already available pkal datasets. gkal includes the occupational calendar of the gap questionnaire (luecke), while lkal consists of the biographical calendar of the biographic questionnaire.

Dataset instrumentation

  • The instrumentation dataset records, at the person level, which survey instruments were provided and what their respective implementation status is. If an instrument contains information about another person (e.g. in the case of children's instruments), the number of that third person is also stored in the record. Each row is thus uniquely identified by the household, person and instrument ID (hid, pid, instrument and pid2).
    You can filter by the instrument a respondent filled in (variable instrument) or by the datasets the respondent's data flowed into (variable datalist). The variable valid helps you to differentiate between valid and invalid interviews.
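
As a rough illustration of the identifier structure and the filter described above, here is a minimal Python sketch. The rows, the instrument names, the use of -2 in pid2 for "no third person", and the coding valid == 1 for a valid interview are assumptions made for the example, not documented values.

```python
from collections import Counter

# Hypothetical rows; real data would be read from the instrumentation dataset.
# Instrument names, pid2 == -2 and valid == 1 are illustrative assumptions.
rows = [
    {"hid": 10, "pid": 1001, "instrument": "individual", "pid2": -2, "valid": 1},
    {"hid": 10, "pid": 1001, "instrument": "mother_child", "pid2": 1003, "valid": 1},
    {"hid": 10, "pid": 1002, "instrument": "individual", "pid2": -2, "valid": 0},
]

# The four variables that together identify a row.
KEY = ("hid", "pid", "instrument", "pid2")


def duplicate_keys(rows):
    """Return every key combination that occurs more than once."""
    counts = Counter(tuple(row[k] for k in KEY) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def select(rows, instrument, valid_code=1):
    """Keep rows of one instrument whose 'valid' flag equals valid_code."""
    return [
        row for row in rows
        if row["instrument"] == instrument and row["valid"] == valid_code
    ]


print(duplicate_keys(rows))
print([row["pid"] for row in select(rows, "individual")])
```

Checking for duplicate keys before merging is a cheap way to confirm that the four variables really identify each row in the extract you are working with.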

Changes in our raw data

Dataset vp

  • Due to a change in the questionnaire structure, the number of observations has increased. Respondents can now provide information on every deceased person, regardless of whether that person lived in the household.

Changes in our new main data format, SOEPlong

Dataset PBRUTTO

  • Variables for agreeing to link SOEP data with IAB social data were renamed and combined into a common user-friendly harmonized variable reclin_iab_h. sv was renamed reclin_iab_v1, reclin_mlerg was renamed reclin_iab_v2, and reclin_iab_erg was renamed reclin_iab_v3.
  • The relationship of the head of household towards household members was collected and measured less accurately for PAPI cases than before. stell_h was therefore extended by a PAPI version stell_v3.
  • The proportion of people without valid stell_h information has increased.
  • The harmonized variable for year of birth, birth_h, was added.
  • The harmonized variable for first citizenship, pnat_h, was recoded to ISO-3 country codes.
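
Scripts written against an earlier release may still refer to the old record-linkage variable names. The following Python sketch shows one way to translate them; the mapping is taken from the renaming described above, while the helper function and the example variable list are hypothetical.

```python
# Old name -> new name, as listed in the PBRUTTO notes above.
RENAMED = {
    "sv": "reclin_iab_v1",
    "reclin_mlerg": "reclin_iab_v2",
    "reclin_iab_erg": "reclin_iab_v3",
}


def update_variable_names(names):
    """Replace old variable names with their v38 names; leave others untouched."""
    return [RENAMED.get(name, name) for name in names]


# Hypothetical variable list from an older analysis script.
print(update_variable_names(["pid", "sv", "reclin_mlerg", "stell_h"]))
```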

Dataset HBRUTTO/HBRUTT

  • Harmonization for variable bulaold according to the Infas coding scheme.
  • Residential environment data is no longer kept in HBRUTTO or HBRUTT.
  • Starting in 2021, residential environment information can be found for all samples in the housing2021 dataset. The former residential environment variables wum* were collected by the survey institute Kantar only for new samples (among others in HBRUTT) or after a household move (in HL). The wum* variables in HBRUTTO were then generated by Kantar by taking the previous year's value and overwriting it with new information from HL, if necessary. Because storing them as separate variables in HBRUTTO disguises the fact that the information is essentially just updated from the previous year, the 2020 variables were not continued in HBRUTTO. If users are interested in the information and need a longitudinal view, they must now take the previous year's value and replace it with the HL information on their own. So the residential information must be used from HBRUTTO, HBRUTT, HL and housing2021 to get the complete longitudinal information even after 2018.
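
The "take the previous year's value and overwrite it with HL information" step can be sketched as follows. This is a minimal Python illustration for a single household, under the assumption that you have already extracted a starting value and the year-specific updates; the function name, argument names and the example codes 3 and 5 are made up.

```python
def longitudinal_values(years, base_value, hl_updates):
    """Carry the last known value forward and overwrite it when HL reports a new one.

    years      -- survey years in ascending order
    base_value -- value known at the first year (e.g. from HBRUTTO/HBRUTT), or None
    hl_updates -- {year: new value} reported after a household move (from HL)
    """
    result = {}
    current = base_value
    for year in years:
        # A non-missing update replaces the value carried over from earlier years.
        if hl_updates.get(year) is not None:
            current = hl_updates[year]
        result[year] = current
    return result


series = longitudinal_values([2018, 2019, 2020], base_value=3, hl_updates={2019: 5})
print(series)  # {2018: 3, 2019: 5, 2020: 5}
```

For 2021 the value would then come from housing2021 rather than from this carry-forward logic.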

Dataset PL

For large datasets such as pl, we recommend using Stata/MP or Stata/SE on a computer with 16 GB of memory.
Users can still work with the data in Stata/IC or on less powerful computers, but to make this practical SOEP offers alternative data formats for pl.
If your system does not meet these requirements and you wish to order an alternative format for pl (e.g. pl split into separate year or decade datasets), please submit your request via the
[order form](https://www.diw.de/de/diw_01.c.357906.de/soep_bestellformular_mod.html) or contact the SOEP hotline by phone or e-mail.

  • Various variables with country codes are recoded according to ISO-3 country codes
  • New variables used to code fields of university degrees p_degree_* p_field* according to Destatis and Infas coding schemes
  • Adjustment in the individual questionnaire for the gender information pla0009* pla0048 pla0049, a validated and harmonized version of gender can be found in PPFAD and PPATHL
  • Adjustment in the individual questionnaire for the employment status plb0022_v11; a validated and harmonized version of employment status, pgemplst, can be found in PGEN.
  • Adjustment and versioning of first nationality plj0014_*
  • Renaming and versioning of calendar strings pab*
  • Various new variables related to home office versioned
  • New variables, renaming and versioning of variables related to working time arrangements
  • New variables on Corona disease, vaccination and its perceptions, changes in work situation, economic situation, and transfers during the corona pandemic.
  • New variables on trust in state institutions and conspiracy theories
  • New variables on political education
  • New variables on standard of living, social climbing and duties of the government
  • New variables from the IAB Accommodation Module
  • iyear, pmonin, ptagin, pdatst and pdatmi for 2021 were moved to the INSTRUMENTATION dataset, where they can be found from this version on.

In the time use variables, both -2 and 0 occurred in the data, and both meant "does not apply". All -2 values were therefore set to 0, since the questionnaire design expects 0 for "does not apply".
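
If you combine v38 with extracts from earlier releases, the same correction can be applied to the older values. The Python sketch below is only an illustration of the rule stated above; the function name and the example values are made up, and other negative codes are deliberately left unchanged.

```python
def recode_time_use(values):
    """Set -2 to 0 in a list of time use values; all other codes stay as they are."""
    return [0 if value == -2 else value for value in values]


# Hypothetical values: -2 and 0 both mean "does not apply"; -1 is left untouched.
print(recode_time_use([-2, 0, 3, -1]))  # [0, 0, 3, -1]
```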

Dataset HL

  • h_pnr contains the current person number with a maximum of two digits of the head of household, from 2021 a complete pid is available in the raw data $H. A retroactive conversion of the current person numbers to complete pids in h_pnr is planned for the future.
  • hlk0057, hlk0058, hlk0059, hlk0060 and dauerb for 2021 were moved to the INSTRUMENTATION dataset, where they can be found from this version on.
  • Harmonization for variable hlf0638 according to the Infas coding scheme.
  • The identifiers in HL are versioned: from 2021 onwards, the person completing the form receives a complete pid (h_pnr_v2) instead of a sequential number (h_pnr_v1). In the future, these are to be harmonized in the form of a pid.

Dataset BIOL

  • Various variables with country codes are recoded according to ISO-3 country codes
  • Variables with federal state coding received new versions, because coding was changed
  • New variables used to code fields of university and other degrees l_destatis2020* l_infas2020** according to Destatis and Infas coding schemes
  • Some variables of the relationship history were integrated and versioned
  • New variables regarding the application of personal visa
  • imonth, iday, ihour, iminute, intdevice and intmode were moved to INSTRUMENTATION dataset, where they can be found from this version on.

Dataset JUGENDL

  • Various variables with country codes are recoded according to ISO-3 country codes

  • imonth, iday, ihour and iminute for 2021 were moved to the INSTRUMENTATION dataset, where they can be found from this version on.

Dataset BIOAGEL

  • The data set has started to be prepared as a long data set. That means there are now harmonized variables (_h) and their original forms as version variables (_v). This process is still in progress. Harmonization and versioning are not yet available for all variables.
  • One of the most important harmonizations is that of the key variables of this dataset. The former bioage variable is now named bioage_h. The sources of the harmonization are visible in the two version variables bioage_v1 and bioage_v2.
  • Incorrect raw data integration for the variable pregplan resulted in a strongly increased number of unplanned pregnancies being observed in 2020. This distribution difference has been corrected.
  • Incorrect integration of curscol1 and curscol2 resulted in confusing variables, with unlabeled values. The integration has been corrected, resulting in new variables for the refugee-samples curscolall and curscoloth.

Dataset BIOPUPIL

  • The data set has started to be prepared as a long data set. That means there are now harmonized variables (_h) and their original forms as version variables (_v). This process is still in progress. Harmonization and versioning are not yet available for all variables.

Dataset KIDLONG

  • Added variables on digitization school education ks_dig* and corona pandemic kd_cov*
  • Fixed versioning and harmonization bug in ks_gen_h school education
  • Information from the RELMATRIX2021 dataset was included in the generation of parent pointers k_phead k_pheadp k_pmum k_pmump and can provide more reliable parent-child links

Dataset VPL

  • Added instrument variable
  • Fixed a bug where the contents of the pid and vpid variables were swapped in 2020.
  • Due to a change in the questionnaire, it is now possible to report more deceased persons in 2021 than in the previous survey waves, so the population in 2021 is growing larger than usual.
  • intmonth, intday, inthour, intmin, intdevice and intmode for 2021 were moved to INSTRUMENTATION dataset, where they can be found from this version on.
  • Variable gl0175 has been integrated in treiman.

Folders in ZIP-Files

The ZIP files now contain a folder soepdata, which itself contains the folders eu-silc-like-panel and raw. This makes it easier to refer to the folders in the documentation. We sometimes called the soepdata folder the "toplevel folder" or "./", which was less informative for our users.

No more duplicates of datasets in folders

More than 40 datasets used to be saved both in the former toplevel folder (see above) and in the raw folder. You will now find them exclusively in the new soepdata folder.
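
With the new layout, file paths in analysis scripts can be built from a single root. The Python sketch below assumes the ZIP file was unpacked into the working directory and that the files carry a .dta extension; both are assumptions, and the dataset name "example" is a placeholder rather than a real file.

```python
from pathlib import PurePosixPath

# Root folder inside the unpacked ZIP file, as described above.
SOEPDATA = PurePosixPath("soepdata")


def dataset_path(name, subfolder=None, suffix=".dta"):
    """Build the path to a dataset, optionally inside 'raw' or 'eu-silc-like-panel'.

    The .dta suffix is an assumption; adjust it to the format you ordered.
    """
    folder = SOEPDATA / subfolder if subfolder else SOEPDATA
    return folder / f"{name}{suffix}"


print(dataset_path("pl"))              # soepdata/pl.dta
print(dataset_path("example", "raw"))  # soepdata/raw/example.dta
```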

Changes in datasets and individual variables

Dataset PGEN

  • New category for short-time work added in the employment status variable pgemplst
  • pgexpft pgexppt: Spells in short-time work are assigned to the experience variables according to the last spell prior to the short-time work spell. If no prior employment experience spell is available, spells are considered as full-time employment spells.
  • pgpartz pgpartnr: The partner pointer variables were generated for the year 2021 using the information from the data set RELMATRIX2021 in addition to the known source information.
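
The assignment rule for short-time work spells in pgexpft and pgexppt can be illustrated with a small Python sketch. It is not the SOEP generation code: the spell labels, the (type, years) representation, and the reading that "last spell" means the last full-time or part-time spell are assumptions made for the example.

```python
def accumulate_experience(spells):
    """Sum full-time and part-time experience from chronologically ordered spells.

    spells -- list of (kind, years) tuples, with kind in
              {"fulltime", "parttime", "shorttime", ...}
    Short-time work counts towards the type of the last preceding employment
    spell, or towards full-time if there is none. Other spell types are ignored.
    """
    totals = {"fulltime": 0.0, "parttime": 0.0}
    last_kind = None
    for kind, years in spells:
        if kind in totals:
            last_kind = kind
            totals[kind] += years
        elif kind == "shorttime":
            totals[last_kind or "fulltime"] += years
    return totals


print(accumulate_experience([("parttime", 2.0), ("shorttime", 0.5)]))
print(accumulate_experience([("shorttime", 1.0)]))
```

In the first call the half year of short-time work is added to part-time experience; in the second, with no prior employment spell, it is counted as full-time.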

Dataset BLP

  • The date of interview is no longer included in datasets from survey instruments. It has moved to the new paradata dataset INSTRUMENTATION. The generated variables pgen::pgmonth and pgen::pgpiyear are still available.

Dataset PBIOSPE

  • "In short time work" added as category 10 of spelltyp, replacing previously unused "war/captivity" category.

Dataset ARTKALEN

  • Category for short-time work reintroduced in the spelltyp variable.

Dataset CAMCES

  • Instead of four variables for country and year of highest educational degree (for ISCED11, ISCED97 and the respective alternative versions), only one variable each for country and year of highest educational degree is available now (country_camces and year_camces).
  • The values of the variable abschl_ausl are adapted to the current SOEP standard. Instead of the coding 0 for "No", 1 "Yes", we have 1 "Yes", 2 "No".
  • The country codes of the variable country_camces are adjusted to the new ISO-3 standard of the SOEP.
  • The variable sample1 is now named psample and has the same values as the psample variable in ppath(l).

Dataset BIOIMMIG

  • biresper: checks have been implemented because of inconsistencies (multiple changes of residence status) in the data. In general: EU citizens (European citizenship) usually get a permanent residence title, since freedom of movement applies within the EU. A temporary title is only possible in exceptional cases. Depending on the year of accession to the EU, persons born in the EU whose data have frequently changed from temporary to permanent and vice versa have been checked and corrected. Reasons for the inconsistencies include confusion among respondents regarding the meaning of the Blue Card and the settlement permit.
  • Some variables are no longer part of the dataset, for several reasons: some merely duplicate a long variable, some lack context definition, and some were asked in only one wave and are therefore not longitudinal. The following variables are no longer part of the dataset:
    - BIREASON: Main Reason Immigration To Germany
    - BISCGC: Also German Pupils In Class
    - BISCGCFN: Mix Of Nationalities In Class
    - BISCGERC: Attended Special Foreigner Prep Class
    - BISTAY: Desire To Stay In Germany
    - BISTAYY: Years Desired To Stay In Germany
    - BIGOBACK: Go Back Home
    - BIRELH: Family in Country of Origin
    - BIRELHC2: Underage Children Not In Germany
    - BIEXPR: Expectations In Germany
    - BIEXPRLV: Expectations: Find Apt
    - BIEXPRAC: Expectations: Accepted by Coworker
    - BIEXPRAN: Expectations: Accepted by Neighbor
    - BIRBETR: Reason Migrate: Better Live
    - BIRMONEY: Reason Migrate: Money
    - BIRFREE: Reason Migrate: Freedom
    - BIRFAM: Reason Migrate: Family
    - BIRPOOR: Reason Migrate: Poor
    - BIRWAR: Reason Migrate: War
    - BIRJUST: Reason Migrate: Just So
    - BIROTHR: Reason Migrate: Other
    - BICAMPW: Refugee Residence: Weeks
    - BICAMPM: Refugee Residence: Months
    - BICAMP: Refugee Residence Y,N
  • For further information on why these variables are no longer available and where to find them, see the documentation of BIOIMMIG.

Dataset BIOBIRTH

  • Information from the RELMATRIX2021 dataset was included in the generation and can provide more reliable parent-child links.

Dataset PPATHL/PPATH

  • corigin: The variable has been recoded according to ISO-3 country codes.
  • parid partner: The partner pointer variables were generated for the year 2021 using the information from the data set RELMATRIX2021 in addition to the known source information.

Dataset BIORESIDREFING

  • Due to the change in the survey institute, not all possibilities were recorded in the _1-_19 variables in the case of imprecise information. If an entry cannot be assigned exactly, the variable GKZ has the value 999999995. Further variables are missing. That means that the variables including possibilities (_1-_19) will no longer be filled.

Outdated Versions of datasets

The following data sets are still at the V37 level and have not been updated. We will update them as far as possible with the next release of the data:

Dataset Description
PEQUIV CNEF Equivalent File

The following data sets are still at the V36 level and have not been updated. We will update them as far as possible with the next release of the data:

Dataset Description
MIGSPELL Migration History
REFUGSPELL Migration History for Refugees
BIOJOB First and last Job
BIOEDU Educational History
BIOPAREN SES of Parents
BIOSIB Siblings Information
BIOTWIN Twins Information


Individual (PAPI) 2021: -de -en
Household (PAPI) 2021: -de -en
Biography (PAPI) 2021: -de -en
Catch-up Individual (PAPI) 2021: -de -en
Youth (16-17-year-olds, PAPI) 2021: -de -en
Early Youth (13-14-year-olds, PAPI) 2021: -de -en
Pre-teen (11-12-year-olds, PAPI) 2021: -de -en
Mother and Child (Newborns, PAPI) 2021: -de -en
Mother and Child (2-3-year-olds, PAPI) 2021: -de -en
Mother and Child (5-6-year-olds, PAPI) 2021: -de -en
Parents and Child (7-8-year-olds, PAPI) 2021: -de -en
Mother and Child (9-10-year-olds, PAPI) 2021: -de -en
Deceased Individual (PAPI) 2021: -de -en

All sample-specific questionnaires for this year and all questionnaires from previous survey years can be found on this page

1) Handgreifkraftmessung im Sozio-oekonomischen Panel (SOEP) 2006 und 2008

2) The new IAB-SOEP Migration Sample: an introduction into the methodology and the contents

3) The Request for Record Linkage in the IAB-SOEP Migration Sample

4) Flowcharts for the Integrated Individual-Biography Questionnaire of the IAB-SOEP Migration Sample 2013

5) The Measurement of Labor Market Entries with SOEP Data: Introduction to the Variable EINSTIEG_ARTK

6) Job submission instructions for the SOEPremote System at DIW Berlin – Update 2014

7) SOEP 2015 – Informationen zu den SOEP-Geocodes in SOEP v32

8) Editing and Multiple Imputation of Item Non-response in the Wealth Module of the German Socio-Economic Panel

9) Die Vercodung der offenen Angaben zu den Ausbildungsberufen im Sozio-Oekonomischen Panel

10) Das Studiendesign der IAB-BAMF-SOEP Befragung von Geflüchteten

11) Scales Manual IAB-BAMF-SOEP Survey of Refugees in Germany – revised version

12) SOEP 2010 – Preparation of data from the new SOEP consumption module: Editing, imputation, and smoothing

13) SOEP Scales Manual (updated for SOEP-Core v32.1)

14) Kognitionspotenziale Jugendlicher - Ergänzung zum Jugendfragebogen der Längsschnittstudie Sozio-oekonomisches Panel (SOEP)

15) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der International Standard Classification of Occupations 2008 (ISCO08) - Direktvercodung - Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

16) Die Vercodung der offenen Angaben zur beruflichen Tätigkeit nach der Klassifikation der Berufe 2010 (KldB 2010): Vorgehensweise und Entscheidungsregeln bei nicht eindeutigen Angaben

17) Multi-Itemskalen im SOEP Jugendfragebogen

18) Zur Erhebung des adaptiven Verhaltens von zwei- und dreijährigen Kindern im Sozio-oekonomischen Panel (SOEP)

19) Documentation of ISCED Generation Based on the CAMCES Tool in the IAB-SOEP Migration Samples M1/M2 and IAB-BAMF-SOEP Survey of Refugees M3/M4 until 2017

20) Dokumentation zum Entwicklungsprozess des Moduls „Einstellungen zu sozialer Ungleichheit“ im SOEP (v38)

21) SOEP-CoV: Project and Data Documentation

22) Missing Income Data in the German SOEP: Incidence, Imputation and its Impact on the Income Distribution

23) SOEP 2006 – TIMEPREF: Dataset on the Economic Behavior Experiment on Time Preferences in the 2006 SOEP Survey

24) Assessing the distributional impact of "imputed rent" and "non-cash employee income" in microdata : Case studies based on EU-SILC (2004) and SOEP (2002)

25) SOEP-Core v36: Codebook for the EU-SILC-like panel for Germany based on the SOEP

All documentation, with filter options, can be found on this page
