.. _pg_loading:

Data loading
++++++++++++

.. epigraph::

   "I disagree strongly with whatever work this quote is attached to."

   -- Randall Munroe

One can argue that loading data is the most important part of a
postprocessing tool. In Postgkyl, it is handled by the
``postgkyl.data.Data`` class (there is also a ``postgkyl.Data`` shortcut).
It loads data on initialization and serves as the input for all the other
parts of Postgkyl.

Examples are provided simultaneously for scripting and the command line,
using output files of an electrostatic two-stream instability simulation
[:doc:`two-stream.lua`].

.. contents::

Accessing a Gkeyll file
-----------------------

Gkeyll files are loaded in Postgkyl by creating a new instance of the
``Data`` class with the file name as the parameter.

.. code-block:: python
   :caption: Script

   import postgkyl as pg
   data = pg.Data('filename')

Next, ``getGrid()`` and ``getValues()`` return the grid and the values as
NumPy arrays. For structured meshes, ``getGrid()`` returns a Python
``list`` of 1D NumPy arrays that contain the nodal points of the grid in
each dimension. Since these are nodal points, each of these arrays always
has one more element than the value array has cells in the corresponding
dimension.

Another important note is that the **value array always has one extra
dimension for components**. Components can represent many things, from
vector elements to discontinuous Galerkin expansion coefficients. As a
rule, this extra dimension is always retained, even if there is just one
component.

.. code-block:: python
   :caption: Script example
   :emphasize-lines: 1,2,3,36,44,46,48

   import postgkyl as pg
   data = pg.Data('two-stream_elc_0.bp')
   print(data.getGrid())
   [array([-6.283185307179586 , -6.086835766330224 , -5.890486225480862 ,
           -5.6941366846315   , -5.497787143782138 , -5.301437602932776 ,
           -5.105088062083414 , -4.908738521234052 , -4.71238898038469  ,
           -4.516039439535327 , -4.319689898685965 , -4.123340357836604 ,
           -3.9269908169872414, -3.730641276137879 , -3.5342917352885173,
           -3.3379421944391554, -3.141592653589793 , -2.945243112740431 ,
           -2.748893571891069 , -2.552544031041707 , -2.356194490192345 ,
           -2.1598449493429825, -1.9634954084936211, -1.7671458676442588,
           -1.5707963267948966, -1.3744467859455343, -1.178097245096172 ,
           -0.9817477042468106, -0.7853981633974483, -0.589048622548086 ,
           -0.3926990816987246, -0.1963495408493623,  0.                ,
            0.1963495408493623,  0.3926990816987246,  0.589048622548086 ,
            0.7853981633974483,  0.9817477042468106,  1.178097245096172 ,
            1.3744467859455343,  1.5707963267948966,  1.767145867644258 ,
            1.9634954084936211,  2.1598449493429825,  2.356194490192344 ,
            2.552544031041707 ,  2.7488935718910685,  2.9452431127404317,
            3.141592653589793 ,  3.3379421944391545,  3.5342917352885177,
            3.730641276137879 ,  3.9269908169872423,  4.123340357836604 ,
            4.319689898685965 ,  4.516039439535328 ,  4.71238898038469  ,
            4.908738521234051 ,  5.105088062083414 ,  5.301437602932776 ,
            5.497787143782137 ,  5.6941366846315   ,  5.890486225480862 ,
            6.086835766330225 ,  6.283185307179586 ]),
    array([-6.    , -5.8125, -5.625 , -5.4375, -5.25  , -5.0625, -4.875 ,
           -4.6875, -4.5   , -4.3125, -4.125 , -3.9375, -3.75  , -3.5625,
           -3.375 , -3.1875, -3.    , -2.8125, -2.625 , -2.4375, -2.25  ,
           -2.0625, -1.875 , -1.6875, -1.5   , -1.3125, -1.125 , -0.9375,
           -0.75  , -0.5625, -0.375 , -0.1875,  0.    ,  0.1875,  0.375 ,
            0.5625,  0.75  ,  0.9375,  1.125 ,  1.3125,  1.5   ,  1.6875,
            1.875 ,  2.0625,  2.25  ,  2.4375,  2.625 ,  2.8125,  3.    ,
            3.1875,  3.375 ,  3.5625,  3.75  ,  3.9375,  4.125 ,  4.3125,
            4.5   ,  4.6875,  4.875 ,  5.0625,  5.25  ,  5.4375,  5.625 ,
            5.8125,  6.    ])]
   print(data.getValues())
   [[[ 1.6182154425614533e-127  2.2497634664678846e-136  2.1705614015952743e-127 ...
       1.4466223559100639e-127  7.7862978418103503e-137  2.0112020871650523e-136]
     [ 7.2163320153412515e-118  1.0032681083505769e-126  9.6785762877207286e-118 ...
       6.4497610162539372e-118  3.4719259660326997e-127  8.9669370964188083e-127]
     [ 1.3363156717841295e-108  1.8578453383418215e-117  1.7920360303344134e-108 ...
       1.1940080895062958e-108  6.4284392330301674e-118  1.6599988152412963e-117]
     ...
   print(data.getGrid()[0].shape)
   (65,)
   print(data.getGrid()[1].shape)
   (65,)
   print(data.getValues().shape)
   (64, 64, 8)
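Because ``getGrid()`` returns nodal points, a common first step is to convert them to cell centers so that they line up with the cells of the value array. The following is a minimal pure-Python sketch of that conversion, not part of the Postgkyl API: the helper ``cell_centers`` is hypothetical, and the nodal grid is rebuilt synthetically from the extents shown above (64 cells on ``[-2π, 2π] × [-6, 6]``) so that it runs without a data file. With real data, one would pass ``data.getGrid()[d]`` instead.

```python
import math


def cell_centers(nodes):
    """Return the midpoints between consecutive nodal points.

    For N + 1 nodes this yields N centers, one per cell of the value array.
    (Hypothetical helper for illustration; not a Postgkyl function.)
    """
    return [0.5 * (left + right) for left, right in zip(nodes[:-1], nodes[1:])]


# Synthetic stand-in for data.getGrid(): 64 cells per dimension, i.e., 65 nodes,
# with the same extents as the two-stream example above.
num_cells = 64
x_nodes = [-2 * math.pi + 4 * math.pi * i / num_cells for i in range(num_cells + 1)]
v_nodes = [-6.0 + 12.0 * i / num_cells for i in range(num_cells + 1)]
grid = [x_nodes, v_nodes]

centers = [cell_centers(nodes) for nodes in grid]
for dim, (nodes, cc) in enumerate(zip(grid, centers)):
    print(f"z{dim}: {len(nodes)} nodes -> {len(cc)} cell centers")
# z0: 65 nodes -> 64 cell centers
# z1: 65 nodes -> 64 cell centers

# The value array in the example has shape (64, 64, 8): one entry per cell plus
# the trailing component dimension, so the centers match its first two dimensions.
print(centers[1][0], centers[1][-1])  # -5.90625 5.90625
```

Keeping the midpoint computation separate from loading makes it easy to apply the same step to each dimension of any structured grid.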

It is also possible to create an empty instance and fill it using the
``push`` function.

In the command line mode, a data file is loaded by simply adding it to the
``pgkyl`` script chain at any position.

.. code-block:: bash
   :caption: Command line

   pgkyl filename

.. note::

   Under the hood, Postgkyl calls a hidden ``load`` command to load the
   file. When the provided string does not match any command but does match
   a file, the ``load`` command is invoked and the file name is passed to
   it. The ``load`` command should *not* be called manually, but it can be
   used to display the help.

   .. code-block:: bash

      pgkyl load --help

Currently, Postgkyl supports ``h5`` files used in Gkeyll 1, ADIOS ``bp``
files used in Gkeyll 2, and ``gkyl`` binary files used in Gkeyll 0. Many of
the advanced functions, like loading only partial data, and some
quality-of-life features, like storing the polynomial order of the DG
representation, are currently available only for the ADIOS ``bp`` files.

Loading multiple datasets
-------------------------

Loading multiple files in a script is straightforward: one creates more
instances of the ``Data`` class. In the command line mode, Postgkyl
naturally supports loading any number of files.

.. code-block:: bash

   pgkyl two-stream_elc_0.bp two-stream_elc_1.bp

All the subsequent commands are then, in general, applied to all the
datasets in a batch, and the :ref:`pg_cmd_plot` command creates a separate
figure for each dataset (this can be modified with :ref:`pg_cmd_plot`
options like ``-f0``). When batch application of commands is *not* the
desired behavior, some data files can be loaded later in the chain, loaded
datasets can be switched between active and inactive
(:ref:`pg_cmd_activate`/:ref:`pg_cmd_deactivate`), or the command scope can
be limited by specifying :ref:`tags `. The :ref:`pg_keyconcepts` section
provides examples where one desired behavior is achieved in multiple ways.
It is left up to the user to choose the preferred one.

Postgkyl also supports loading files with wildcard characters:

.. code-block:: bash

   pgkyl 'two-stream*.bp'

.. warning::

   While the quotes are entirely optional when loading a single file, they
   change the behavior when used with wildcard characters. With quotes, a
   single ``load`` command is performed and the wildcard matching is done
   internally by Postgkyl. Without quotes, the shell expands the wildcard
   before Postgkyl is called, which results in several ``load`` command
   calls. This leads to several key differences:

   1. With quotes, Postgkyl orders files correctly, i.e., ``file_2`` will
      be before ``file_10``.
   2. With quotes, tags, labels, etc., are applied to all the matching
      files, not just the last one.
   3. Some wildcard characters like ``[0-9]`` are not supported by every
      shell.

Using wildcard characters might lead to unexpected situations. For example,
in the two-stream case, the query ``two-stream_elc_*`` returns not only
``two-stream_elc_0.bp`` but also the moment files like
``two-stream_elc_M0_0.bp``. If we want to load just the distribution
functions, we can limit the query. For example:

.. code-block:: bash

   pgkyl 'two-stream_elc_[0-9]*.bp'

This requires the first character after ``two-stream_elc_`` to be a digit
between 0 and 9, which effectively eliminates all the outputs except for
the distribution functions themselves.

The following are details on load parameters that alter the loading
behavior. Note that these can be specified either individually for each
file or as global options of the ``pgkyl`` script itself. For example, the
partial loading flag ``--z0`` (see below) can be applied to one file
(``file_0``):

.. code-block:: bash

   pgkyl file_0 --z0 0 file_1

Or it can be applied globally to all the files:

.. code-block:: bash

   pgkyl --z0 0 file_0 file_1

This is equivalent to:

.. code-block:: bash

   pgkyl file_0 --z0 0 file_1 --z0 0

Partial loading
---------------

Gkeyll output files, especially the higher-dimensional ones, can be large.
Therefore, Postgkyl allows loading just a subsection of each file.
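To get a feel for the sizes involved, the following back-of-the-envelope sketch computes the in-memory size of a dense value array stored as 64-bit floats. The first shape is the ``(64, 64, 8)`` array from the two-stream example above; the second is a purely hypothetical 5D case (three configuration and two velocity dimensions) with 32 cells per dimension and 32 components, chosen only for illustration. The helper ``array_bytes`` is hypothetical and not part of Postgkyl.

```python
from math import prod


def array_bytes(shape, itemsize=8):
    """Size in bytes of a dense array with the given shape (float64 by default).

    Hypothetical helper for illustration; not a Postgkyl function.
    """
    return prod(shape) * itemsize


two_stream = (64, 64, 8)             # shape from the two-stream example
hypothetical_5d = (32,) * 5 + (32,)  # hypothetical 5D grid plus 32 components

for name, shape in [("two-stream", two_stream), ("hypothetical 5D", hypothetical_5d)]:
    size = array_bytes(shape)
    print(f"{name}: {shape} -> {size / 2**20:.2f} MiB")
# two-stream: (64, 64, 8) -> 0.25 MiB
# hypothetical 5D: (32, 32, 32, 32, 32, 32) -> 8192.00 MiB
```

Restricting a single dimension to k of its N cells reduces these numbers by a factor of k/N.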
Partial loading is done with the optional ``z0`` to ``z5`` parameters for
coordinates and ``comp`` for components. Each can be either an integer or
a string in the form ``start:end``. Note that this follows the Python
convention, so **the last index is excluded**, i.e., ``1:5`` will load only
the indices/components 1, 2, 3, and 4. This functionality is supported both
in the script mode and in the command line mode.

.. code-block:: python
   :emphasize-lines: 2
   :caption: Script

   import postgkyl as pg
   data = pg.Data('two-stream_elc_0.bp', z1='1:3', comp=0)

.. code-block:: bash
   :caption: Command line

   pgkyl two-stream_elc_0.bp --z1 1:3 -c 0

Note that the :ref:`pg_cmd_select` command has a similar use. In addition,
it allows specifying a coordinate value instead of an index. However, it
requires the whole file to be loaded into memory.

Tags and labels
---------------

Datasets can be decorated with tags and labels. The former serve mostly to
specify the scope of commands (see :ref:`tags `) in the command line mode,
while the latter allow adding custom labels for plots and print-outs. When
no labels are specified, Postgkyl attempts to find the shortest unique
identifier and uses it as a label. For example:

.. code-block:: bash

   pgkyl two-stream_elc_0.bp two-stream_elc_1.bp info -c
   0 (default#0)
   1 (default#1)

.. code-block:: bash

   pgkyl two-stream_elc_0.bp two-stream_field_0.bp info -c
   elc (default#0)
   field (default#1)

.. code-block:: bash

   pgkyl two-stream_elc_0.bp two-stream_field_1.bp info -c
   elc_0 (default#0)
   field_1 (default#1)

These labels can be customized and can include LaTeX syntax, which is
properly rendered in a plot legend.

.. code-block:: bash

   pgkyl two-stream_elc_0.bp -l '$t\omega_{pe}=0$' two-stream_elc_1.bp -l '$t\omega_{pe}=0.5$' info -c
   $t\omega_{pe}=0$ (default#0)
   $t\omega_{pe}=0.5$ (default#1)

Note that in all these examples, both datasets have the ``default`` tag and
are indexed ``0`` and ``1``. Tags can also be specified manually.

.. code-block:: bash

   pgkyl two-stream_elc_0.bp -t 'el' two-stream_field_0.bp -t 'em' info -c
   elc (el#0)
   field (em#0)

Loading data with c2p mapping
-----------------------------

.. warning::

   This feature was introduced in 1.6.7 and currently only works with
   ``gkyl`` binary files.

Postgkyl supports the computational-to-physical (c2p) coordinate mapping
used in Gkeyll. The file with the map can be specified using the ``--c2p``
keyword. Following are two plots where a Maxwellian particle distribution
is evaluated in cylindrical coordinates with and without a c2p map
provided to Postgkyl.

.. code-block:: bash

   pgkyl rt_eval_on_nodes_f-ser.gkyl interpolate -b ms -p2 plot -a

.. figure:: fig/load/comp.png
   :align: center

   Plot of Maxwellian distribution in cylindrical coordinates without a c2p
   map.

.. code-block:: bash

   pgkyl rt_eval_on_nodes_f-ser.gkyl --c2p rt_eval_on_nodes_rtheta-ten.gkyl interpolate -b ms -p2 plot -a

.. figure:: fig/load/c2p.png
   :align: center

   Plot of Maxwellian distribution in cylindrical coordinates with a c2p map
   provided with ``--c2p``.

Gkeyll stores the c2p coordinate information as expansion coefficients of
a finite-element representation that is independent of the representation
of the data itself. These coefficients are converted to nodal points for
plotting during the :ref:`pg_cmd_interpolate` command, when the information
about the data representation (e.g., the basis and polynomial order) is
provided. However, the :ref:`pg_cmd_interpolate` command is never used when
working with finite-volume data. For this case, the ``--fv`` flag converts
the expansion coefficients to nodal values immediately after loading.

.. code-block:: bash

   pgkyl euler_axis_sodshock-euler_0.gkyl --c2p euler_axis_sodshock-mapc2p.gkyl --fv select -c0 plot -a

.. figure:: fig/load/fv.png
   :align: center

   Plot of finite-volume data with ``--c2p`` provided and the ``--fv`` flag
   on.
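Conceptually, a c2p map takes nodes of a uniform computational grid, here ``(r, theta)``, to physical Cartesian positions. The sketch below is a minimal pure-Python illustration of that idea for the cylindrical case shown above, assuming the analytic map ``x = r cos(theta)``, ``y = r sin(theta)``. It is not how Postgkyl reads or evaluates the mapping file, which stores finite-element expansion coefficients; the helper ``polar_c2p`` and the grid extents are hypothetical.

```python
import math


def polar_c2p(r_nodes, theta_nodes):
    """Map a tensor grid of (r, theta) nodes to physical (x, y) nodes.

    Returns nested lists X[i][j], Y[i][j] for r_nodes[i], theta_nodes[j].
    Hypothetical helper assuming an analytic polar map; not a Postgkyl function.
    """
    X = [[r * math.cos(t) for t in theta_nodes] for r in r_nodes]
    Y = [[r * math.sin(t) for t in theta_nodes] for r in r_nodes]
    return X, Y


# Hypothetical computational grid: 4 radial cells on [0.5, 1.5] and
# 8 angular cells on [0, 2*pi], given as nodal points (cells + 1).
r_nodes = [0.5 + 1.0 * i / 4 for i in range(5)]
theta_nodes = [2 * math.pi * j / 8 for j in range(9)]

X, Y = polar_c2p(r_nodes, theta_nodes)
print(len(X), len(X[0]))  # 5 9
# The node (r=1.5, theta=pi/2) lands at (0, 1.5) in physical space.
print(round(X[4][2], 12), round(Y[4][2], 12))  # 0.0 1.5
```

The result is a curvilinear nodal mesh of the same shape as the computational one, which is why the plots above differ only in where each cell is drawn, not in the data values.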