============================================================= SynPUF疑似データをPostgresに格納する ============================================================= OMOP.OHDSI実装プロジェクト https://github.com/RWD-data-environment-in-Hospital/OMOP.OHDSI.implementation に合わせて、OSはCentOS7とするが、他のLinuxでも基本的には同じ。 [1] 事前準備&PostgreSQL導入 yum update yum -y install epel-release yum -y perl unzip 今回は、Postgres10を使う。 PotsgresプロジェクトのRedHat系インスール説明ページ https://www.postgresql.org/download/linux/redhat/ から、Version10、CentOS7を選ぶと、下記が出てくるので実行する。 yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm yum install -y postgresql10-server /usr/pgsql-10/bin/postgresql-10-setup initdb systemctl enable postgresql-10 systemctl start postgresql-10 ※他のバージョンの場合もこのページから選択できる。 ===================================================================================================================== [2] Posgres設定 ・パスワード設定 su postgres psql -c "alter role postgres with password 'omop-db'" exit ・アクセス制御設定 [OMOP All-in-Oneサーバーのとき] vi /var/lib/pgsql/10/data/pg_hba.conf   host all all 127.0.0.1のidentをmd5に変更 [Postgres専用サーバのとき、他のサーバーからアクセスできるようにする] vi /var/lib/pgsql/10/data/pg_hba.conf   host all all 127.0.0.1のidentをmd5に変更 host all all md5 の行を追加 vi /var/lib/pgsql/10/data/postgresql.conf listen_addresses = '*' ・Postgresを再起動 systemctl restart postgresql-10 ps axで確認 ===================================================================================================================== [3] OMOP-CDM table定義の準備 ・OHDSIのgithubからダウンロードする mkdir /home/postgres cd /home/postgres curl -L https://github.com/OHDSI/CommonDataModel/archive/refs/tags/v5.3.1.tar.gz -o v5.3.1.tar.gz tar xvfz v5.3.1.tar.gz ・ファイル名変更/Posgresに合わせて変更/Bug修正のため加工する cd CommonDataModel-5.3.1/PostgreSQL cp -p 'OMOP CDM postgresql indexes.txt' OMOPindexes.txt perl -pe 's/DATETIME2/TIMESTAMP/g; s/[ \t]+/ /g;' <'OMOP CDM postgresql ddl.txt' >OMOPddl.txt perl -pe 's/fpk_ visit_concept_s/fpk_visit_concept_s/' <'OMOP CDM postgresql constraints.txt' >OMOPconstraints.txt ===================================================================================================================== [4] SynPUFデータの準備 ・OHDSIサイトからダウンロード  https://ohdsi.org/data-standardization/ にある、"CMS SynPUF5% Smple"ボタンをクリックし、  Google Driveからダウンロードする。  ファイル名:synpuf5pct_20180710.zip ・Postgresサーバーにアップロードする  synpuf5pct_20180710.zipを/home/postgresにアップロード ・展開 cd /home/postgres mkdir /home/postgres/synpuf unzip synpuf5pct_20180710.zip -d /home/postgres/synpuf cd /home/postgres/synpuf ・データが2018年作成でCDM Version 5.2.2なので、5.3.1になるよう加工する condition_occurrence : visit_detail_id列にdummyを追加 perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,11,0,"");$_=join("\t",@_)' condition_occurrence.csv concept : 255文字以上のconcept_name列に対応(""で囲まれているため短くはしない) 後のSTEPで実施する→ALTER TABLE concept ALTER COLUMN concept_name TYPE VARCHAR(400); device_exposure: visit_detail_id列にdummyを追加 perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,12,0,"");$_=join("\t",@_)' device_exposure.csv drug_exposure: visit_detail_id列にdummyを追加 perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,18,0,"");$_=join("\t",@_)' drug_exposure.csv measurement: measurement_datetime列とvisit_detail_id列にdummyを追加 perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,5,0,"");splice(@_,15,0,"");$_=join("\t",@_)' measurement.csv observation: visit_detail_id列にdummyを追加 perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,13,0,"");$_=join("\t",@_)' observation.csv payer_plan_period: 10個のdummy列を追加 perl -i.bak -pe 'chomp;$_.=("\t"x10)."\n"' payer_plan_period.csv procedure_occurrence: visit_detail_id列にdummyを追加 perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,10,0,"");$_=join("\t",@_)' procedure_occurrence.csv 上記の加工をまとめて実行するperlスクリプト ------------------------------------------------------------ #!/usr/bin/perl chdir("/home/postgres/synpuf"); while() { next unless /^perl/; print $_; chomp; system($_); } exit; __END__ perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,11,0,"");$_=join("\t",@_)' condition_occurrence.csv perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,12,0,"");$_=join("\t",@_)' device_exposure.csv perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,18,0,"");$_=join("\t",@_)' drug_exposure.csv perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,5,0,"");splice(@_,15,0,"");$_=join("\t",@_)' measurement.csv perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,13,0,"");$_=join("\t",@_)' observation.csv perl -i.bak -pe 'chomp;$_.=("\t"x10)."\n"' payer_plan_period.csv perl -i.bak -pe '@_=split(/\t/,$_);splice(@_,10,0,"");$_=join("\t",@_)' procedure_occurrence.csv ------------------------------------------------------------ ===================================================================================================================== [5] スキーマ、OMOP-CDM tableの作成と読み込み su postgres psql create schema ohdsi authorization postgres; create schema cdmv5 authorization postgres; set search_path to "cdmv5"; \i /home/postgres/CommonDataModel-5.3.1/PostgreSQL/OMOPddl.txt COPY care_site FROM '/home/postgres/synpuf/care_site.csv' WITH DELIMITER E'\t' NULL AS ''; COPY cdm_source FROM '/home/postgres/synpuf/cdm_source.csv' WITH DELIMITER E'\t' NULL AS ''; COPY concept_ancestor FROM '/home/postgres/synpuf/concept_ancestor.csv' WITH DELIMITER E'\t' NULL AS ''; COPY concept_class FROM '/home/postgres/synpuf/concept_class.csv' WITH DELIMITER E'\t' NULL AS ''; ALTER TABLE concept ALTER COLUMN concept_name TYPE VARCHAR(400); COPY concept FROM '/home/postgres/synpuf/concept.csv' WITH DELIMITER E'\t' NULL AS ''; COPY concept_relationship FROM '/home/postgres/synpuf/concept_relationship.csv' WITH DELIMITER E'\t' NULL AS ''; COPY concept_synonym FROM '/home/postgres/synpuf/concept_synonym.csv' WITH DELIMITER E'\t' NULL AS ''; COPY condition_era FROM '/home/postgres/synpuf/condition_era.csv' WITH DELIMITER E'\t' NULL AS ''; COPY condition_occurrence FROM '/home/postgres/synpuf/condition_occurrence.csv' WITH DELIMITER E'\t' NULL AS ''; COPY cost FROM '/home/postgres/synpuf/cost.csv' WITH DELIMITER E'\t' NULL AS ''; COPY death FROM '/home/postgres/synpuf/death.csv' WITH DELIMITER E'\t' NULL AS ''; COPY device_exposure FROM '/home/postgres/synpuf/device_exposure.csv' WITH DELIMITER E'\t' NULL AS ''; COPY domain FROM '/home/postgres/synpuf/domain.csv' WITH DELIMITER E'\t' NULL AS ''; COPY drug_era FROM '/home/postgres/synpuf/drug_era.csv' WITH DELIMITER E'\t' NULL AS ''; COPY drug_exposure FROM '/home/postgres/synpuf/drug_exposure.csv' WITH DELIMITER E'\t' NULL AS ''; COPY drug_strength FROM '/home/postgres/synpuf/drug_strength.csv' WITH DELIMITER E'\t' NULL AS ''; COPY location FROM '/home/postgres/synpuf/location.csv' WITH DELIMITER E'\t' NULL AS ''; COPY measurement FROM '/home/postgres/synpuf/measurement.csv' WITH DELIMITER E'\t' NULL AS ''; COPY observation FROM '/home/postgres/synpuf/observation.csv' WITH DELIMITER E'\t' NULL AS ''; COPY observation_period FROM '/home/postgres/synpuf/observation_period.csv' WITH DELIMITER E'\t' NULL AS ''; COPY payer_plan_period FROM '/home/postgres/synpuf/payer_plan_period.csv' WITH DELIMITER E'\t' NULL AS ''; COPY person FROM '/home/postgres/synpuf/person.csv' WITH DELIMITER E'\t' NULL AS ''; COPY procedure_occurrence FROM '/home/postgres/synpuf/procedure_occurrence.csv' WITH DELIMITER E'\t' NULL AS ''; COPY provider FROM '/home/postgres/synpuf/provider.csv' WITH DELIMITER E'\t' NULL AS ''; COPY relationship FROM '/home/postgres/synpuf/relationship.csv' WITH DELIMITER E'\t' NULL AS ''; COPY visit_occurrence FROM '/home/postgres/synpuf/visit_occurrence.csv' WITH DELIMITER E'\t' NULL AS ''; COPY vocabulary FROM '/home/postgres/synpuf/vocabulary.csv' WITH DELIMITER E'\t'; \i /home/postgres/CommonDataModel-5.3.1/PostgreSQL/OMOPindexes.txt \i /home/postgres/CommonDataModel-5.3.1/PostgreSQL/OMOPconstraints.txt \pset pager off \dt \q exit ※個々時間がかかるため、上記Postgresコマンドは一括で実行するのがおすすめ。 一括実行するには、 (1) Postgresコマンドをリストしたファイルを作る cd /home/postgres cat >readSynPUFdata.sql (上記create schemaから\dtまでをコピペ) Ctrl+D (2) 読み込みつつ実行 sudo -u postgres psql -af readSynPUFdata.sql これでSynPUFデータがPostgreSQLデータベースに格納された。