Supporting a .csv-based Workflow in MongoDB for Data Analysts
Fresta M.; Capello A.; Bellotti F.; Lazzaroni L.; Cossu M.; Berta R.
2023-01-01
Abstract
The use of .csv files is very widespread because of the simplicity of their tabular format and their support in popular editing tools. We propose a novel workflow for enhancing the integration of such files with MongoDB storage, and investigate its applicability on a representative sample from the data.world collection. Compared to mongoimport (the MongoDB command-line data import tool), our solution has much higher latency, but it automates data type checking and offers users two main degrees of flexibility that are particularly useful in application development and deployment: the ability to spot and reject duplicate records, and the ability to reject single rows rather than whole files when errors occur. Moreover, relying on the Measurify IoT application framework allows users to create application-relevant resources simply by enhancing .csv files with semantics, while still providing a transparent end-to-end .csv file storage workflow.

File | Access | Type | Size | Format
---|---|---|---|---
Supporting_a_.csv-based_Workflow_in_MongoDB_for_Data_Analysts.pdf | Closed access | Publisher's version | 387.59 kB | Adobe PDF
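The row-level behaviour described in the abstract (type checking, rejecting duplicate records, and rejecting single rows rather than the whole file) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Measurify API. The schema, the `load_rows` and `convert` helpers, and the sample data are all hypothetical, and an in-memory set stands in for whatever duplicate detection the real MongoDB-backed workflow performs.

```python
import csv
import io

# Hypothetical schema: column name -> converter that doubles as a type check.
SCHEMA = {"sensor": str, "timestamp": int, "value": float}


def convert(row, schema):
    """Cast one CSV row to typed values; raise ValueError on a bad field."""
    record = {}
    for name, cast in schema.items():
        raw = row.get(name)
        if raw is None or raw == "":
            raise ValueError(f"missing value for {name!r}")
        record[name] = cast(raw)  # int()/float() raise ValueError on bad text
    return record


def load_rows(text, schema=SCHEMA):
    """Return (accepted, rejected); one bad row never rejects the whole file."""
    accepted, rejected = [], []
    seen = set()  # stand-in for duplicate detection in the database
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=1):  # data rows, header excluded
        try:
            record = convert(row, schema)
        except ValueError as exc:
            rejected.append((row_no, f"type check failed: {exc}"))
            continue
        key = tuple(record[name] for name in schema)
        if key in seen:
            rejected.append((row_no, "duplicate record"))
            continue
        seen.add(key)
        accepted.append(record)
    return accepted, rejected


SAMPLE = """sensor,timestamp,value
s1,1700000000,21.5
s1,1700000060,abc
s1,1700000000,21.5
s2,1700000000,19.0
"""

if __name__ == "__main__":
    ok, bad = load_rows(SAMPLE)
    print(f"accepted {len(ok)} rows, rejected {len(bad)} rows")
    for row_no, reason in bad:
        print(f"  row {row_no}: {reason}")
```

In this sketch the second data row fails the type check and the third is a duplicate of the first, so both are reported individually while the remaining rows are still accepted. In a real deployment the accepted records would be written to MongoDB instead of being collected in a list.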