
Drupal 8 Migrate API: Migrate content from a JSON source

 

Migrate API

Migration plugins are the glue that binds a source, a destination, and multiple process plugins together to move data from one place to another. Often referred to as migration templates, or simply migrations, migration plugins are YAML files that explain to Drupal where to get the data, how to process it, and where to save the resulting object.

A migration is an Extract – Transform – Load (ETL) process:

[Image: Drupal 8 Migrate ETL process diagram]

SOURCE: Drupal.org (2019). Migrate API overview. Accessed July 10, 2019, at https://www.drupal.org/docs/8/api/migrate-api/migrate-api-overview

  • extract phase is called source
  • transform phase is called process
  • load phase is called destination

Contrib module: migrate_plus

Migrate Plus extends the functionality of the Drupal Migrate API with new features, including the url source plugin, which allows us to perform migrations from JSON, XML and SOAP sources.

Contrib module: migrate_tools

Migrate Tools adds tools for running and managing migrations. Most importantly, it provides the following Drush commands that we will be running when working with migrations:

  • migrate-status – Lists all migrations and their status.
  • migrate-import – Runs a migration.
  • migrate-rollback – Rollback a migration.
  • migrate-reset-status – Resets a migration status to idle. Used if something goes wrong during the migrate-import process and the status gets stuck on "Importing".

 

Migration JSON example (from dados.gov.pt):
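Based on the fields we will be using throughout this article, the endpoint's payload looks roughly like the following sketch. The d wrapper and the entityId, designacao, morada and freguesia keys are the ones referenced below; the latitude/longitude key names and all values are illustrative placeholders, not the real data:

```json
{
  "d": [
    {
      "entityId": "1",
      "designacao": "Mercado Exemplo",
      "morada": "Rua Exemplo 1, Amadora",
      "freguesia": "Mina de Água",
      "latitude": 38.75,
      "longitude": -9.23
    }
  ]
}
```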

 

A JSON Migration is all about configuration

A migration, when using already existing plugins, is nothing more than a .yml file, explaining how the data from the source should reach the destination.

In our example, the source will be a JSON file, and our destination will be an entity (node).

A Content Type was created beforehand, with the following fields that will be filled from the source file:

Node market:

  • title – Name of the market. Will be the designacao from the source file.
  • field_coordinates – A Geofield with the latitude and longitude from the source file.
  • field_freguesia – A taxonomy term of type freguesia that will match the field with the same name from the source file.
  • field_id – A unique identifier that will match the entityId from the source file. We will use this as our migration ID.
  • field_address – Address of the market that will match the field morada from the source file.

The taxonomy vocabulary Freguesia, which will be used for field_freguesia, was also created beforehand.

Module migrate_markets:

Migrations need to be attached to a module, which will house the configuration files.

A custom module was created, under the modules/custom folder.

migrate_markets.info.yml
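A minimal sketch of the info file could look like this (the name, description and package are placeholders; the dependencies follow from the modules discussed above):

```yaml
name: 'Migrate Markets'
type: module
description: 'Migrates markets from a dados.gov.pt JSON endpoint.'
package: Custom
core: 8.x
dependencies:
  - drupal:migrate
  - migrate_plus:migrate_plus
  - migrate_tools:migrate_tools
```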

And under the config/install folder of the module, we will create our migration file migrate_plus.migration.market.yml (migration config entities belong to migrate_plus, so the file name must use that prefix).

General identifying information

We can start off by setting the basic identifying information for our migration in that file:
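A minimal sketch of that block; the id is the migration name we will use with drush later, while the label and group are placeholders of our own:

```yaml
id: market
label: 'Amadora markets from dados.gov.pt'
migration_group: default
```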

Destination

We can also go ahead and set the destination plugin, since it's very straightforward:
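In YAML, that can be expressed as a single entry:

```yaml
destination:
  plugin: 'entity:node'
```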

This tells the migration that it should place the data into a node. We can set which node type (bundle) to use later, in the process phase.

Source

For the source plugin, we will use the aforementioned url plugin, which comes with the contrib migrate_plus module.
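Putting it all together, the source section could look something like this sketch. The endpoint URL and the latitude/longitude selectors are placeholders, since the real values aren't listed here; everything else follows the entries explained below:

```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  headers:
    Accept: 'application/json; charset=utf-8'
  urls:
    - 'https://example.org/mercados.json'   # placeholder for the real dados.gov.pt endpoint
  item_selector: d
  fields:
    -
      name: entityId
      label: 'Unique identifier of the market'
      selector: entityId
    -
      name: title
      label: 'Name of the market'
      selector: designacao
    -
      name: morada
      label: 'Address of the market'
      selector: morada
    -
      name: freguesia
      label: 'Freguesia of the market'
      selector: freguesia
    -
      name: latitude
      label: 'Latitude of the market'
      selector: latitude     # assumed source key
    -
      name: longitude
      label: 'Longitude of the market'
      selector: longitude    # assumed source key
  ids:
    entityId:
      type: string           # assuming a string identifier
```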

This is a lot of different entries, so let's go step by step:

Plugin

We will use the url source plugin, located at modules/contrib/migrate_plus/src/Plugin/migrate/source/Url.php

This source plugin will have a fetcher and a parser plugin associated with it.

Data Fetcher

Migrate plus supports these data fetchers, located at modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_fetcher:

  • file File.php
  • http Http.php

Their use should be pretty straightforward. If we had the source file locally in our Drupal site, we would use file. Since we have an endpoint that automatically updates itself on an external URL, we will use the http fetcher instead. Note: the http fetcher also supports authorization; in our case the endpoint is public, so it won't be necessary. The file itself provides an example annotation that you can use if you want to know all the values it accepts.

Data Parser

As for data parsers, migrate_plus supports the following, located at modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_parser:

  • JSON Json.php
  • SimpleXml SimpleXml.php
  • Soap Soap.php
  • Xml Xml.php

We will use JSON in our examples, so we won't look at the other ones.

Headers

The headers property should be self-explanatory, and is used in conjunction with our http data fetcher. For our example, we just set the headers that prepare the request to receive a JSON response.

URLs

Here we set the URL(s) of our source, so the migration knows where to go get it. Also pretty self-explanatory.

Item Selector

This is a selector pointing to the root of the data we want to migrate. If you look at our JSON example above, you will notice that all our data is inside an array under the key d, so we tell the migration to look inside that. If your data is already at the root of the JSON, you can set this to NULL.

Fields

fields is an array that serves as a bridge between the JSON fields and the variables we can use in the process section.

Each field should have a name, which lets us reference it later in the migration file.

The label is mostly used as an identifier for any UI dealing with migrations.

The selector, much like the item_selector property above, indicates where the field is located in our source data.

ID

The ids property defines the unique identifier for this migration. This ensures that, when re-running a migration, items that were already processed once won't be processed again.

 

An easy way to look at it, is to understand that the source plugin should answer the following questions:

Where is the source coming from? Defined on the urls property.

What am I getting from the source? Defined using the fields property.

How should I get the data from the source? Defined using all the other properties.

Process

The only step left is configuring our process plugin.

Here we set how each source field should be processed by Drupal before ending up as fields for our destination node.

Each field will have at least one process plugin associated with it. If you don't explicitly set a plugin and only do the mapping between the Drupal destination field and the source field, like our title, for example:

title: title

Then, by default, the get plugin will be used. It is a very basic plugin that indicates that no processing of the source field is needed and that the value should just be copied over as-is. So, the previous example is the same as this:
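```yaml
title:
  plugin: get
  source: title
```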

The key should be the destination (in our case, a node) field name, while the value should be the name attribute defined in our source field array.

Some fields will require the use of an explicit process plugin. For our node type, for example, we will hardcode the bundle of the node (i.e. the Content Type machine name), since we know that these nodes must all be of type market.

We can use the default_value plugin for this; it provides a default value when no source is given or when the value coming from the source is empty (useful for fields that might not be set on every JSON object but must always have a value on our Drupal entity).
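A sketch of that mapping, hardcoding the market bundle:

```yaml
type:
  plugin: default_value
  default_value: market
```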

We are also using two contrib process plugins: one from the migrate_plus module, called entity_generate, and another from the geofield module, called geofield_latlon (both sketched after the list below).

The geofield_latlon plugin is very straightforward. It takes latitude and longitude values (that we set in our source fields) and creates a Geofield value from them.

The entity_generate plugin will:

  • Look for a taxonomy_term of type freguesia, and match the source field freguesia to the name of the taxonomy.
  • If a Taxonomy with that name already exists, that will be the one used.
  • If not, one is created with that name.
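A sketch of how those two plugins could be configured, assuming the latitude and longitude source fields from earlier and a vocabulary with the machine name freguesia:

```yaml
field_coordinates:
  plugin: geofield_latlon
  source:
    - latitude        # assumed source field names from the source sketch above
    - longitude
field_freguesia:
  plugin: entity_generate
  source: freguesia
  entity_type: taxonomy_term
  bundle_key: vid
  bundle: freguesia   # assumed vocabulary machine name
  value_key: name
```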

In our example migration, we won't be using any custom process plugins, but if the plugins available from Core and the Migrate Plus module don't quite fit your needs, you can always easily make your own.

And that's it! Our final migrate_plus.migration.market.yml file looks like this:
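Assembling the sketches above, the whole file comes out roughly like this (placeholder values marked as before):

```yaml
id: market
label: 'Amadora markets from dados.gov.pt'
migration_group: default

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  headers:
    Accept: 'application/json; charset=utf-8'
  urls:
    - 'https://example.org/mercados.json'   # placeholder for the real dados.gov.pt endpoint
  item_selector: d
  fields:
    -
      name: entityId
      label: 'Unique identifier of the market'
      selector: entityId
    -
      name: title
      label: 'Name of the market'
      selector: designacao
    -
      name: morada
      label: 'Address of the market'
      selector: morada
    -
      name: freguesia
      label: 'Freguesia of the market'
      selector: freguesia
    -
      name: latitude
      label: 'Latitude of the market'
      selector: latitude     # assumed source key
    -
      name: longitude
      label: 'Longitude of the market'
      selector: longitude    # assumed source key
  ids:
    entityId:
      type: string

process:
  type:
    plugin: default_value
    default_value: market
  title: title
  field_id: entityId
  field_address: morada
  field_freguesia:
    plugin: entity_generate
    source: freguesia
    entity_type: taxonomy_term
    bundle_key: vid
    bundle: freguesia
    value_key: name
  field_coordinates:
    plugin: geofield_latlon
    source:
      - latitude
      - longitude

destination:
  plugin: 'entity:node'

dependencies:
  enforced:
    module:
      - migrate_markets   # remove the migration when the module is uninstalled
```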

All that's left is to enable our module with drush en migrate_markets, and if everything went right we should already be able to see our migration show up using drush migrate-status:

[Screenshot: drush migrate-status output listing the market migration]

We can run our migration using drush migrate-import market. When dealing with a large number of source items, it's recommended to use the --limit flag with a small number, like --limit=10, when running the migration for the first time, to test that everything goes smoothly.

[Screenshot: drush migrate-import output for the market migration]

Note: We are using Lando as our development environment, so all our console commands will have lando prepended to them. Unless you're using it too, you probably won't need to type that.

 

And that's it! We just powered our website up with all the markets from the Amadora region. We could use that data to show a map of all the markets around Amadora, for example:

[Screenshot: Mercados Amadora homepage showing a map of all the markets]

Or to list all the Markets on a specific Freguesia:

[Screenshot: list of markets in the Mina de Água freguesia on Mercados Amadora]

Sources / Helpful links