Creating a 2-Node SCI Setup (Master Node + Data Node)

Hi, 
I have a few questions about a 2-node SCI setup where we have more than 3,000 APs.

1.) Do both SCI nodes (master and data) have a web GUI?
2.) Do we have to log into both SCI nodes to query data?
3.) How does the controller select which SCI node to send data to?
4.) Do these two nodes hold replicated data or unique data?
5.) Once an AP is sending data to one of the SCI nodes, is that assignment fixed permanently?
6.) Can we have the master and data nodes in two geographical locations?
7.) Do we have to configure both SCI nodes in the controller?

Thanks
Pamuditha

gpmpa

Posted 2 months ago

See Ho Ting

Hi Pamuditha,

Quick reply below.

1.) Do both SCI nodes (master and data) have a web GUI?
   => No, only the master node has a GUI. The data node is a headless node used to parallelise the load (though it does have a simple GUI for setup).

2.) Do we have to log into both SCI nodes to query data?
   => No. You only have to access the master node.

3.) How does the controller select which SCI node to send data to?
   => Job distribution between nodes is handled automatically by SCI, so it is transparent to the controller.

4.) Do these two nodes hold replicated data or unique data?
   => They hold replicated data for redundancy. The replication factor is 3 in SCI 3.1.

5.) Once an AP is sending data to one of the SCI nodes, is that assignment fixed permanently?
   => The AP does not send data directly to SCI. Data is aggregated at the controller. SCI then retrieves the data from the controller (for ZD and SZ below 3.4.1), or the controller pushes the data to SCI (SZ 3.5 and above).

6.) Can we have the master and data nodes in two geographical locations?
   => Yes, as long as there are no issues with network connectivity. Please refer to the Installation Guide for the full list of ports that have to be opened.

7.) Do we have to configure both SCI nodes in the controller?
   => No, you don't have to.

gpmpa

Hi See Ho Ting, 

Does a replication factor of 3 mean that the same data is copied to 3 nodes, or is it some other form of replication (something like RAID)?
What I am thinking is: if I keep 1 physical server with (master + data) and 1 physical server with (data + data), is that fine for data redundancy?

Thanks

See Ho Ting

You are correct on the replication. From SCI 3.2 onwards, we have reduced the replication factor to 2, mainly to reduce storage requirements.

>>What I am thinking is: if I keep 1 physical server with (master + data) and 1 physical server with (data + data), is that fine for data redundancy?
This is correct for SCI 3.1 and below, since the replication factor is 3: with only two nodes per physical server, three copies on different nodes must span both servers. For SCI 3.2 onwards, with a factor of 2, the answer is yes and no. This is because HDFS may assign both copies of a data block to two nodes that are on the same physical server.
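
To make the difference between the two replication factors concrete, here is a minimal sketch that enumerates every way a block's copies could land on the proposed layout. The node and server names are hypothetical, and it assumes every node (including the master) can store HDFS blocks and that each copy goes to a different node. It only shows which placements are possible, not how likely HDFS is to pick them.

```python
from itertools import combinations

# Hypothetical layout from the question: two physical servers, two SCI nodes each.
# Assumption: every node, including the master, can hold HDFS blocks.
NODE_TO_SERVER = {
    "master": "server-A",
    "data-1": "server-A",
    "data-2": "server-B",
    "data-3": "server-B",
}

def placements_on_one_server(replication_factor):
    """Return the replica placements whose copies all sit on one physical server."""
    risky = []
    for nodes in combinations(NODE_TO_SERVER, replication_factor):
        servers = {NODE_TO_SERVER[node] for node in nodes}
        if len(servers) == 1:
            risky.append(nodes)
    return risky

for factor in (3, 2):
    total = len(list(combinations(NODE_TO_SERVER, factor)))
    risky = placements_on_one_server(factor)
    print(f"factor {factor}: {len(risky)} of {total} placements "
          f"keep every copy on a single server")
```

With a factor of 3 no placement fits on one server, while with a factor of 2 the pairs (master, data-1) and (data-2, data-3) would both be lost if their server failed.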

gpmpa

Great, and thanks See Ho Ting. It really helps.
One more thing. This question may depend on the number of APs in the system. Is there a recommended average value per AP for the following?
What is the expected bandwidth for communication between the master node and the data node?
What is the expected bandwidth for communication between SCI and vSZ?

Thanks

See Ho Ting

The bandwidth requirement will be about 15 kB per AP per 15 minutes. Of course, this is just a ballpark number. The actual number will depend on the number of clients, sessions, etc.
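
As a rough illustration, the ballpark figure above can be scaled to the 3,000 APs mentioned in the original question. This is only a back-of-the-envelope sketch: it assumes 1 kB = 1000 bytes and traffic spread evenly over each 15-minute interval, and the reply does not state which of the two links the figure applies to.

```python
# Ballpark from the reply above: about 15 kB per AP per 15-minute interval.
KB_PER_AP_PER_INTERVAL = 15
INTERVAL_SECONDS = 15 * 60

def average_rate_kbps(ap_count):
    """Average rate in kilobits per second, assuming 1 kB = 1000 bytes."""
    kilobytes_per_second = ap_count * KB_PER_AP_PER_INTERVAL / INTERVAL_SECONDS
    return kilobytes_per_second * 8

ap_count = 3000  # AP count taken from the original question
megabytes_per_interval = ap_count * KB_PER_AP_PER_INTERVAL / 1000
print(f"{ap_count} APs: ~{megabytes_per_interval:.0f} MB per interval, "
      f"~{average_rate_kbps(ap_count):.0f} kbit/s on average")
```

For 3,000 APs this works out to roughly 45 MB every 15 minutes, or about 400 kbit/s on average. Real traffic may arrive in bursts, so the peak rate can be higher than this average.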

gpmpa

Thanks See Ho Ting. 

gpmpa

One additional query as well. With this kind of setup, what happens if the master node fails?

See Ho Ting

The master-data node cluster is not an HA setup, though it provides data redundancy. So if the master node fails, SCI will stop collecting data, but the data in the data node can be retrieved once a separate master node is set up.

gpmpa

So all communication with the nodes happens only through the master node?
Does all the data fetched from the controller come to the master node, and from there get distributed between the master and data node DBs?

See Ho Ting

No, communication with the controller happens on all nodes, as the system automatically balances the load between them.