[Openstack] [trove] - Discussion on Clustering and Replication API

McReynolds, Auston amcreynolds at ebay.com
Thu Aug 22 00:46:40 UTC 2013


Blueprint:

https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API

Questions:

* Today, /instance/{instance_id}/action is the single endpoint for all
actions on an instance (where the action is parsed from the payload).
I see in the newly proposed /clusters api that there's
/clusters/{cluster_id}/restart, etc. Is this a purposeful move from
"field of a resource" to sub-resources? If so, is there a plan to
retrofit the /instance api?

* For "Promote a Slave Node to Master", where does the request
indicate the promote action (explicitly or implicitly)? I don't see
it in the uri or the payload.

* "Create Replication Set" is a POST to /clusters, but "Add Node" is a
PUT to /clusters/{cluster_id}/nodes. This seems inconsistent given
both are essentially doing the same thing: adding nodes to a cluster.
What's the reasoning behind the divergence?

* What is the expected result of a resize action request on
/instance/{instance_id} for an instance that's a part of a cluster
(meaning the request could have alternatively been executed against
/clusters/{cluster_id}/nodes/{node_id})? Will it return an error?
Redirect the request to the /clusters internals?
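
To make the first question concrete, here is a rough sketch of the two
routing styles side by side. The handler names, the single-key payload
shape, and the path layout are assumptions for illustration only, not
Trove's actual code.

```python
# Hypothetical handlers; names and return values are illustrative only.
def restart(resource_id, body):
    return {"id": resource_id, "action": "restart"}


def resize(resource_id, body):
    return {"id": resource_id, "action": "resize",
            "flavorRef": body["flavorRef"]}


ACTIONS = {"restart": restart, "resize": resize}


def dispatch_from_payload(resource_id, payload):
    """Style 1: POST /instance/{id}/action, action parsed from the payload.

    Assumes the payload is a dict whose single key names the action.
    """
    if len(payload) != 1:
        raise ValueError("payload must contain exactly one action")
    (name, body), = payload.items()
    if name not in ACTIONS:
        raise ValueError("unknown action: %s" % name)
    return ACTIONS[name](resource_id, body or {})


def dispatch_from_path(path, payload=None):
    """Style 2: POST /clusters/{id}/restart, action is the last path segment."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "clusters":
        raise ValueError("unexpected path: %s" % path)
    _, resource_id, name = parts
    if name not in ACTIONS:
        raise ValueError("unknown action: %s" % name)
    return ACTIONS[name](resource_id, payload or {})
```

Both styles end up in the same handler table, so retrofitting one api
to match the other is mostly a routing question rather than a change
in behavior.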

Discussion:

Although it's common and often advised that the same flavor be used
for every node in a cluster, there are many situations in which you'd
purposefully buck the tradition. One example would be choosing a
beefier flavor for a slave to support ad-hoc queries from a tertiary
web application (analytics, monitoring, etc.).

Therefore,

{
  "cluster":{
    "nodes":3,
    "flavorRef":"https://service/v1.0/1234/flavors/1",
    "name":"replication_set_1",
    "volume":{
      "size":2
    },
    "clusterConfig":{
      "type":"https://service/v1.0/1234/clustertypes/1234"
    }
  }
}

is not quite expressive enough. One "out" is that you could force the
user to resize the slave(s) after the cluster has been completely
provisioned, but that seems a bit egregious.

Something like the following seems to fit the bill:

{
  "cluster":{
    "clusterConfig":{
      "type":"https://service/v1.0/1234/clustertypes/1234"
    },
    "nodes":[
    {
      "flavorRef":"https://service/v1.0/1234/flavors/1",
      "volume":{
        "size":2
      }
    },
    {
      "flavorRef":"https://service/v1.0/1234/flavors/3",
      "volume":{
        "size":2
      }
    }]
  }
}

But which node is elected the master (arbitrarily?) if the
clusterConfig is set to MySQL Master/Slave? When region awareness is
supported in Trove, how would you pin a specifically configured node
to its earmarked region/datacenter? What will the names of the nodes
of the cluster be? Spelling out name, region, and role per node
answers all three:

{
  "cluster":{
    "clusterConfig":{
      "type":"https://service/v1.0/1234/clustertypes/1234"
    },
    "nodes":[
    {
      "name":"usecase-master",
      "flavorRef":"https://service/v1.0/1234/flavors/1",
      "volume":{
        "size":2
      },
      "region": "us-west",
      "nodeConfig": {
        "type": "master"
      }
    },
    {
      "name":"usecase-slave-us-east",
      "flavorRef":"https://service/v1.0/1234/flavors/3",
      "volume":{
        "size":2
      },
      "region": "us-east",
      "nodeConfig": {
        "type": "slave"
      }
    },
    {
      "name":"usecase-slave-eu-de",
      "flavorRef":"https://service/v1.0/1234/flavors/3",
      "volume":{
        "size":2
      },
      "region": "eu-de",
      "nodeConfig": {
        "type": "slave"
      }
    }]
  }
}

This works decently enough, but it assumes a simple master/slave
architecture. What about MySQL multi-master with replication?
See /doc/refman/5.5/en/mysql-cluster-replication-multi-master.html.
Now, a 'slaveof' or 'primary'/'parent' field is necessary to be more
specific (either that, or nesting of JSON to indicate relationships).
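
As a rough sketch of the flat-field option, the snippet below resolves
a hypothetical per-node 'slaveof' field into a map of who replicates
from whom. The field name and the rule that every node carries a
unique "name" are assumptions, not part of the blueprint. Cycles are
deliberately allowed, since two masters replicating from each other
form one.

```python
def replication_edges(nodes):
    """Map each node name to the names of the nodes replicating from it.

    `nodes` is a list of dicts with a unique "name" and an optional,
    hypothetical "slaveof" naming the node to replicate from.
    """
    by_name = {node["name"]: node for node in nodes}
    if len(by_name) != len(nodes):
        raise ValueError("node names must be unique")

    replicas = {name: [] for name in by_name}
    for node in nodes:
        source = node.get("slaveof")
        if source is None:
            continue  # no upstream: a plain master
        if source == node["name"]:
            raise ValueError("%s cannot replicate from itself" % source)
        if source not in by_name:
            raise ValueError("%s replicates from unknown node %s"
                             % (node["name"], source))
        replicas[source].append(node["name"])
    return replicas


nodes = [
    {"name": "master-a", "slaveof": "master-b"},
    {"name": "master-b", "slaveof": "master-a"},
    {"name": "slave-c", "slaveof": "master-b"},
]
edges = replication_edges(nodes)
```

A single 'slaveof' string only allows one upstream per node; a node
with several sources would need a list here, or the nested JSON form.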

From above, it's clear that a "nodeConfig" of sorts is needed to
indicate whether the node is a slave or master, and to whom. Thus far,
a RDBMS has been assumed, but consider other offerings in the space:
How will you designate if the node is a seed in the case of Cassandra?
The endpoint snitch for a Cassandra node? The cluster name for
Cassandra or the replica-set for Mongo? Whether a slave should be
daisy-chained to another slave or attached directly to the master in
the case of Redis?

Keeping service-type specifics from bleeding into what should be as
generic a schema as possible is paramount. Unfortunately, as you can
see, "nodeConfig" starts to become an amalgamation of fields that
apply only in certain situations, which makes documentation, client
codegen, and ease of use a bit challenging. Fast-forward to when
editable parameter groups become a priority (a.k.a. being able to set
name-value-pairs in the service type's CONF). If users/customers
demand the ability to set things like buffer-pool-size while
provisioning, these fields would likely be placed in "nodeConfig",
making the situation worse.
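
One way to picture the problem is a per-service-type whitelist of
"nodeConfig" keys sitting next to a small generic schema. Everything
below is a hypothetical sketch: the key names are made up from the
examples in this mail and none of it is an existing Trove structure.

```python
# Hypothetical: fields every node accepts regardless of service type.
GENERIC_KEYS = {"name", "flavorRef", "volume", "region", "nodeConfig"}

# Hypothetical: nodeConfig keys each service type would accept.
NODE_CONFIG_KEYS = {
    "mysql": {"type", "slaveof"},
    "cassandra": {"seed", "endpoint_snitch", "cluster_name"},
    "mongodb": {"replica_set"},
    "redis": {"slaveof"},
}


def validate_node(service_type, node):
    """Reject fields that are not generic or not valid for this service type."""
    if service_type not in NODE_CONFIG_KEYS:
        raise ValueError("unknown service type: %s" % service_type)

    unknown = set(node) - GENERIC_KEYS
    if unknown:
        raise ValueError("unknown node fields: %s" % sorted(unknown))

    bad = set(node.get("nodeConfig", {})) - NODE_CONFIG_KEYS[service_type]
    if bad:
        raise ValueError("nodeConfig keys not valid for %s: %s"
                         % (service_type, sorted(bad)))
    return True
```

Every new service type, and every tunable added to it, grows the
whitelist, which is exactly the documentation and codegen burden
described above.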

Here's an attempt with a slightly different approach:
https://gist.github.com/amcr/96c59a333b72ec973c3a

From there, you could build a convenience /cluster api to facilitate
multi-node deployments (vs. building and associating node by node), or
wait for Heat integration.

Both approaches have their strengths, so I'm convinced it's the
blending of the two that will result in what we're all looking for.

Thoughts?

Cheers,
amc




