Supporting a .csv-based Workflow in MongoDB for Data Analysts

Fresta M.; Capello A.; Bellotti F.; Lazzaroni L.; Cossu M.; Berta R.
2023-01-01

Abstract

The use of .csv files is very widespread because of the simplicity of their tabular format and their support by popular editing tools. We propose a novel workflow for enhancing the integration of such files with MongoDB storage, and investigate its applicability over a representative sample from the data.world collection. Compared to mongoimport (the MongoDB command-line data import tool), our solution has much higher latency, but it automates the data type check and offers users two main degrees of flexibility that are particularly useful in application development and deployment: the possibility of spotting and rejecting duplicate records, and the possibility of rejecting single rows, instead of whole files, in case of errors. Moreover, the reliance on the Measurify IoT application framework allows users to create application-relevant resources by simply enhancing .csv files with semantics, while still providing a transparent end-to-end .csv file storage workflow.
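The row-level behaviour described in the abstract (type checking, rejecting single rows rather than whole files, and rejecting duplicates) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Measurify API: the function name `validate_rows`, the `schema` mapping, and the sample columns (`device`, `timestamp`, `value`) are all hypothetical, and only the Python standard library is used.

```python
import csv
import io


def validate_rows(csv_text, schema, key_fields):
    """Split CSV rows into accepted documents and rejected rows.

    schema maps a column name to a converter (e.g. int, float, str);
    key_fields lists the columns whose combined values identify a duplicate.
    Both are illustrative assumptions, not part of the original workflow.
    """
    accepted, rejected = [], []
    seen_keys = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        line_no = reader.line_num  # physical line of the row just read
        try:
            # Type check: convert every declared column, fail on the first error.
            doc = {name: convert(row[name]) for name, convert in schema.items()}
        except (KeyError, TypeError, ValueError) as exc:
            # Reject only this row; the rest of the file is still processed.
            rejected.append((line_no, f"type check failed: {exc}"))
            continue
        key = tuple(doc[name] for name in key_fields)
        if key in seen_keys:
            rejected.append((line_no, "duplicate record"))
            continue
        seen_keys.add(key)
        accepted.append(doc)
    return accepted, rejected


# Hypothetical sample: line 3 has a non-numeric value, line 4 repeats line 2.
sample = (
    "device,timestamp,value\n"
    "d1,1,20.5\n"
    "d1,2,abc\n"
    "d1,1,20.5\n"
    "d2,1,19.0\n"
)
accepted, rejected = validate_rows(
    sample,
    schema={"device": str, "timestamp": int, "value": float},
    key_fields=["device", "timestamp"],
)
print(accepted)
print(rejected)
```

In a setup like this, the accepted documents could then be handed to a bulk insert (for example PyMongo's `insert_many`), while the rejected list gives the analyst the line number and reason for each row that was skipped.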
Files in this record:

File: Supporting_a_.csv-based_Workflow_in_MongoDB_for_Data_Analysts.pdf
Access: closed
Type: Publisher's version
Size: 387.59 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1156238