

volume start: mdsgv01: failed: Commit failed on localhost. Please check log file for details.

Getting this?

/var/log/glusterfs/bricks/mnt-p01-d01-glusterv01.log
[2019-09-25 10:53:37.847426] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 6.5 (args: /usr/sbin/glusterfsd -s mdskvm-p01.nix.mds.xyz --volfile-id mdsgv01.mdskvm-p01.nix.mds.xyz.mnt-p01-d01-glusterv01 -p /var/run/gluster/vols/mdsgv01/mdskvm-p01.nix.mds.xyz-mnt-p01-d01-glusterv01.pid -S /var/run/gluster/defbdb699838d53b.socket --brick-name /mnt/p01-d01/glusterv01 -l /var/log/glusterfs/bricks/mnt-p01-d01-glusterv01.log --xlator-option *-posix.glusterd-uuid=f7336db6-22b4-497d-8c2f-04c833a28546 --process-name brick --brick-port 49155 --xlator-option mdsgv01-server.listen-port=49155)
[2019-09-25 10:53:37.848508] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 23133
[2019-09-25 10:53:37.858381] I [socket.c:902:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2019-09-25 10:53:37.865940] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-09-25 10:53:37.866054] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: mdskvm-p01.nix.mds.xyz
[2019-09-25 10:53:37.866043] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-09-25 10:53:37.866083] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-09-25 10:53:37.866454] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0xf1d3) [0x7f9680ee91d3] -->/usr/sbin/glusterfsd(+0x12fef) [0x55ca25710fef] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55ca2570901b] ) 0-: received signum (1), shutting down
[2019-09-25 10:53:37.872399] I [socket.c:3754:socket_submit_outgoing_msg] 0-glusterfs: not connected (priv->connected = 0)
[2019-09-25 10:53:37.872445] W [rpc-clnt.c:1704:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (unique: 0, XID: 0x2 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
[2019-09-25 10:53:37.872534] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0xf1d3) [0x7f9680ee91d3] -->/usr/sbin/glusterfsd(+0x12fef) [0x55ca25710fef] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55ca2570901b] ) 0-: received signum (1), shutting down
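
Before digging into the brick logs any further, a generic first check (not the complete fix for this entry) is to confirm glusterd itself is running and that the peers and bricks are visible:

systemctl status glusterd
gluster peer status
gluster volume status mdsgv01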


volume set: failed: Quorum not met. Volume operation not allowed.

Getting this in a two-node cluster?

volume set: failed: Quorum not met. Volume operation not allowed.

If you can't afford a third node, you'll have to disable the quorum:

[root@mdskvm-p01 glusterfs]#
[root@mdskvm-p01 glusterfs]# gluster volume info

Volume Name: mdsgv01
Type: Replicate
Volume ID: f5b57076-dbd4-4d77-ae13-c1f3ee3adbe0
Status: Stopped
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mdskvm-p02.nix.mds.xyz:/mnt/p02-d01/glusterv02
Brick2: mdskvm-p01.nix.mds.xyz:/mnt/p01-d01/glusterv01
Options Reconfigured:
storage.owner-gid: 36
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto

server.event-threads: 8
client.event-threads: 8
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
server.allow-insecure: on
performance.readdir-ahead: on
[root@mdskvm-p01 glusterfs]# gluster volume set mdsgv01 cluster.quorum-type none
volume set: success

[root@mdskvm-p01 glusterfs]#
[root@mdskvm-p01 glusterfs]# gluster volume set mdsgv01 cluster.server-quorum-type none
volume set: success

[root@mdskvm-p01 glusterfs]#

Do this on each node.  After this, start the cluster.
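
For example, using the volume name from the output above:

gluster volume start mdsgv01
gluster volume info mdsgv01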

Cheers,
TK

 

Mount failed. Please check the log file for more details.

Getting this?

Mount failed. Please check the log file for more details.

Do some checks:

tail -f /var/log/messages /var/log/glusterfs/*.log

to get this:

Sep 23 21:37:21 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)

==> /var/log/glusterfs/g.log <==
[2019-09-24 01:37:22.454768] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.15 (args: /usr/sbin/glusterfs --volfile-server=mdskvm-p01.mds.xyz --volfile-id=mdsgv01 /g)
[2019-09-24 01:37:22.463887] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-09-24 01:37:22.470752] E [MSGID: 101075] [common-utils.c:324:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2019-09-24 01:37:22.470797] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host mdskvm-p01.mds.xyz
[2019-09-24 01:37:22.471066] I [glusterfsd-mgmt.c:2277:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: mdskvm-p01.mds.xyz
[2019-09-24 01:37:22.471102] I [glusterfsd-mgmt.c:2298:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-09-24 01:37:22.471201] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-09-24 01:37:22.471537] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xab) [0x7f4cbbafcf3b] -->/usr/sbin/glusterfs(+0x1155d) [0x561197af255d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x561197aeb32b] ) 0-: received signum (1), shutting down
[2019-09-24 01:37:22.471626] I [fuse-bridge.c:5852:fini] 0-fuse: Unmounting '/g'.

==> /var/log/messages <==
Sep 23 21:37:22 mdskvm-p01 systemd: Unit g.mount entered failed state.

==> /var/log/glusterfs/g.log <==
[2019-09-24 01:37:22.480028] I [fuse-bridge.c:5857:fini] 0-fuse: Closing fuse connection to '/g'.
[2019-09-24 01:37:22.480212] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f4cbab93dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x561197aeb4b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x561197aeb32b] ) 0-: received signum (15), shutting down

==> /var/log/messages <==
Sep 23 21:37:22 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
Sep 23 21:37:23 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
Sep 23 21:37:24 mdskvm-p01 kernel: ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)

And fix it by adding a proper hostname:

[root@mdskvm-p01 ~]# cat /etc/fstab |grep gluster
mdskvm-p01.mds.xyz:mdsgv01                      /g      glusterfs       defaults,_netdev                                0 0
[root@mdskvm-p01 ~]#

should be:

[root@mdskvm-p01 ~]# cat /etc/fstab |grep gluster
mdskvm-p01.nix.mds.xyz:mdsgv01                      /g      glusterfs       defaults,_netdev                                0 0
[root@mdskvm-p01 ~]#
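
With the corrected FQDN in place, a quick remount confirms the fix (a minimal check using the /g mountpoint from above):

umount /g
mount /g
df -h /g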

Cheers,
TK

oVirt: bond0: option slaves: invalid value (-eth1)

Looking through the oVirt config files to troubleshoot an issue earlier yielded no results.  This is because oVirt manages the host networks through its UI instead: everything is automated and GUI controlled.  Some of the error messages we needed to troubleshoot:

ovirtmgmt: received packet on bond0 with own address as source address (addr:78:e7:d1:8f:4d:26, vlan:0)
bond0: option slaves: invalid value (-eth1)

 

Config files controlled by the oVirt UI:

/var/lib/vdsm/persistence/netconf/bonds/bond0
/etc/resolv.conf
/etc/sysconfig/network-scripts/bond0
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1
/etc/sysconfig/network-scripts/ifcfg-eth2
/etc/sysconfig/network-scripts/ifcfg-eth3

Log Files:

/var/log/vdsm/
/var/log/messages

 

Messages such as these pop up:

restore-net::DEBUG::2019-09-21 22:32:54,292::ifcfg::571::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-eth2 configuration:
# Generated by VDSM version 4.20.46-1.el7
DEVICE=eth2
MASTER=bond0
SLAVE=yes
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

Telling us oVirt Engine is controlling these.

Just the same, our host network configs were stored in the following folder:

/var/lib/vdsm/persistence/netconf/
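
To see exactly what VDSM has persisted for the bond, dump the file it keeps there (path from the list above); treat it as read-only reference, since changes should go through the UI as described below:

cat /var/lib/vdsm/persistence/netconf/bonds/bond0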

But what you need to do is visit the oVirt Engine UI.  Navigate to Compute -> Hosts -> (Click the name of the host) -> Network Interfaces -> Setup Host Networks

oVirt Host Network Configuration

Change User Agent under Microsoft Edge

Even if you don't rely on Microsoft Edge as much as on Chrome, it can still be useful for older sites whose pages cannot be changed.  This is especially true when accessing older hardware web interfaces that are no longer compatible with Chrome.

To change the user agent do the following:

  • Start Microsoft Edge
  • Press F12 to open the Developer Tools
  • Find the Emulation tab.  (If your browser window is small, there will be a downward arrow with a bar over it indicating more options.)
  • Find the Mode drop-down in the resulting panel.
  • Select the compatibility mode you're looking for.

Cheers,
TK

 

Executing command failed with the following exception: AuthorizationException: User:tom@MDS.XYZ not allowed to do 'GET_KEYS'

Getting the following errors from spark-shell or from listing out valid KMS keys?

tom@mds.xyz@cm-r01en01:~] 🙂 $ hadoop key list
19/09/17 23:56:43 INFO util.KerberosName: No auth_to_local rules applied to tom@MDS.XYZ
Cannot list keys for KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@e350b40
list [-provider <provider>] [-strict] [-metadata] [-help]:

The list subcommand displays the keynames contained within
a particular provider as configured in core-site.xml or
specified with the -provider argument. -metadata displays
the metadata. If -strict is supplied, fail immediately if
the provider requires a password and none is given.
Executing command failed with the following exception: AuthorizationException: User:tom@MDS.XYZ not allowed to do 'GET_KEYS'
tom@mds.xyz@cm-r01en01:~] 🙁 $

Or the following message entry?

19/09/17 22:17:25 DEBUG ipc.Client: Negotiated QOP is :auth
19/09/17 22:17:25 DEBUG ipc.Client: IPC Client (1322600748) connection to cm-r01nn02.mws.mds.xyz/192.168.0.133:8020 from tom@MDS.XYZ: starting, having connections 1
19/09/17 22:17:25 DEBUG ipc.Client: IPC Client (1322600748) connection to cm-r01nn02.mws.mds.xyz/192.168.0.133:8020 from tom@MDS.XYZ sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getDelegationToken
19/09/17 22:17:25 DEBUG ipc.Client: IPC Client (1322600748) connection to cm-r01nn02.mws.mds.xyz/192.168.0.133:8020 from tom@MDS.XYZ got value #0
19/09/17 22:17:25 DEBUG ipc.ProtobufRpcEngine: Call: getDelegationToken took 650ms
19/09/17 22:17:25 INFO util.KerberosName: No auth_to_local rules applied to tom@MDS.XYZ
19/09/17 22:17:25 INFO hdfs.DFSClient: Created token for tom@MDS.XYZ: HDFS_DELEGATION_TOKEN owner=tom@MDS.XYZ, renewer=yarn, realUser=, issueDate=1568773045589, maxDate=1569377845589, sequenceNumber=56, masterKeyId=62 on 192.168.0.133:8020
19/09/17 22:17:25 DEBUG ipc.Client: IPC Client (1322600748) connection to cm-r01nn02.mws.mds.xyz/192.168.0.133:8020 from tom@MDS.XYZ sending #1 org.apache.hadoop.hdfs.protocol.ClientProtocol.getServerDefaults
19/09/17 22:17:25 DEBUG ipc.Client: IPC Client (1322600748) connection to cm-r01nn02.mws.mds.xyz/192.168.0.133:8020 from tom@MDS.XYZ got value #1
19/09/17 22:17:25 DEBUG ipc.ProtobufRpcEngine: Call: getServerDefaults took 2ms
19/09/17 22:17:25 DEBUG kms.KMSClientProvider: KMSClientProvider created for KMS url: http://cm-r01nn01.mws.mds.xyz:16000/kms/v1/ delegation token service: kms://http@cm-r01nn01.mws.mds.xyz:16000/kms canonical service: 192.168.0.134:16000.
19/09/17 22:17:25 DEBUG kms.LoadBalancingKMSClientProvider: Created LoadBalancingKMSClientProvider for KMS url: kms://http@cm-r01nn01.mws.mds.xyz:16000/kms with 1 providers. delegation token service: kms://http@cm-r01nn01.mws.mds.xyz:16000/kms, canonical service: 192.168.0.134:16000
19/09/17 22:17:25 DEBUG kms.KMSClientProvider: Current UGI: tom@MDS.XYZ (auth:KERBEROS)
19/09/17 22:17:25 DEBUG kms.KMSClientProvider: Login UGI: tom@MDS.XYZ (auth:KERBEROS)
19/09/17 22:17:25 DEBUG security.UserGroupInformation: PrivilegedAction as:tom@MDS.XYZ (auth:KERBEROS) from:org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1029)
19/09/17 22:17:25 DEBUG kms.KMSClientProvider: Getting new token from http://cm-r01nn01.mws.mds.xyz:16000/kms/v1/, renewer:yarn/cm-r01nn02.mws.mds.xyz@MWS.MDS.XYZ
19/09/17 22:17:25 DEBUG web.DelegationTokenAuthenticator: No delegation token found for url=http://cm-r01nn01.mws.mds.xyz:16000/kms/v1/?op=GETDELEGATIONTOKEN&renewer=yarn%2Fcm-r01nn02.mws.mds.xyz%40MWS.MDS.XYZ, token=, authenticating with class org.apache.hadoop.security.token.delegation.web.KerberosDelegationTokenAuthenticator$1
19/09/17 22:17:25 DEBUG client.KerberosAuthenticator: JDK performed authentication on our behalf.
19/09/17 22:17:25 DEBUG client.AuthenticatedURL: Cannot parse cookie header:
java.lang.IllegalArgumentException: Empty cookie header string

 

Solve it by adjusting your KMS settings to include the groups and users that will run your commands as follows:

Name: hadoop.kms.acl.GET_KEYS
Value: kmsadmin,kmsadmingroup,hdfs,cdhadmins@mds.xyz,nixadmins@mds.xyz,cdhadmins,nixadmins,tom@MDS.XYZ
Description: ACL for get-keys operations.
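
If your KMS is not managed through Cloudera Manager, the same ACL lives in kms-acls.xml; a minimal sketch of the equivalent property, assuming a stock Hadoop KMS layout:

<property>
  <name>hadoop.kms.acl.GET_KEYS</name>
  <value>kmsadmin,kmsadmingroup,hdfs,cdhadmins@mds.xyz,nixadmins@mds.xyz,cdhadmins,nixadmins,tom@MDS.XYZ</value>
  <description>ACL for get-keys operations.</description>
</property>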

And test using:

tom@mds.xyz@cm-r01en01:~] 🙂 $ hadoop key list
19/09/18 07:20:23 INFO util.KerberosName: No auth_to_local rules applied to tom@MDS.XYZ
Listing keys for KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@121314f7
tom@mds.xyz@cm-r01en01:~] 🙂 $

Cheers,
TK

WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn’t exist or is not writable. Lineage for this application will be disabled.

Getting this?

19/09/17 22:17:41 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.

Resolve it by creating the folder on each node (the listing below shows the result on one node; a sketch of the commands follows it):

[root@cm-r01en01 spark]# ls -altri
total 2244
335565630 drwxr-xr-x.  2 spark spark       6 Aug 17 21:25 stacks
 67109037 drwxr-xr-x. 27 root  root     4096 Sep 16 03:39 ..
268448735 -rw-r--r--.  1 spark spark 2284695 Sep 17 22:18 spark-history-server-cm-r01en01.mws.mds.xyz.log
134702284 drwxr-xr-x.  2 spark spark       6 Sep 18 07:12 lineage
268447866 drwxr-xr-x.  4 spark spark      87 Sep 18 07:12 .
[root@cm-r01en01 spark]# ssh cm-r01en02
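
A minimal sketch of the commands behind that listing, using the path and ownership shown above (repeat on each node running a Spark role, e.g. cm-r01en02):

mkdir -p /var/log/spark/lineage
chown spark:spark /var/log/spark/lineage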

 

Cheers,
TK

FATAL: remaining connection slots are reserved for non-replication superuser connections

Getting this?

FATAL:  remaining connection slots are reserved for non-replication superuser connections

Fix that by updating the Patroni configuration like so:

[root@psql01 log]# patronictl -c /etc/patroni.yml edit-config postgres

+++
@@ -1,9 +1,10 @@
 loop_wait: 10
 maximum_lag_on_failover: 1048576
 postgresql:
+  parameters:
-  max_connections: 256
+    max_connections: 256
-  max_replication_slots: 64
+    max_replication_slots: 64
-  max_wal_senders: 32
+    max_wal_senders: 32
   use_pg_rewind: true
 retry_timeout: 10
 ttl: 30

Apply these changes? [y/N]: y
Configuration changed
[root@psql01 log]#
[root@psql01 log]#
[root@psql01 log]# patronictl -c /etc/patroni.yml restart postgres
+----------+-------------+---------------+--------+---------+-----------+
| Cluster  |    Member   |      Host     |  Role  |  State  | Lag in MB |
+----------+-------------+---------------+--------+---------+-----------+
| postgres | postgresql0 | 192.168.0.108 | Leader | running |       0.0 |
| postgres | postgresql1 | 192.168.0.124 |        | running |       0.0 |
| postgres | postgresql2 | 192.168.0.118 |        | running |       0.0 |
+----------+-------------+---------------+--------+---------+-----------+
Are you sure you want to restart members postgresql0, postgresql1, postgresql2? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)  []:
When should the restart take place (e.g. 2015-10-01T14:30)  [now]:
Success: restart on member postgresql0
Success: restart on member postgresql1
Success: restart on member postgresql2
[root@psql01 log]# sudo su – postgres
Last login: Sat Sep 14 09:15:34 EDT 2019 on pts/0
-bash-4.2$ psql -h psql-c01 -p 5432 -W
Password:
psql (10.5)
Type "help" for help.

postgres=#
postgres=#
postgres=#
postgres=# show max_connections; show  max_replication_slots;
 max_connections
-----------------
 256
(1 row)

 max_replication_slots
-----------------------
 64
(1 row)

postgres=#
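
To see how close you are to the connection limit, and how many slots are held back for superusers, a quick check from the same psql session (standard PostgreSQL views, nothing Patroni-specific):

SELECT count(*) FROM pg_stat_activity;
SHOW superuser_reserved_connections;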
 

Keep in mind that the cluster name above is the scope from the config file:

[root@psql01 patroni]# cat /etc/patroni.yml
scope: postgres

Alternatively, update the PostgreSQL settings directly with the above values if you're not running Patroni.  Verify status:

[root@psql01 ~]# patronictl -c /etc/patroni.yml list
+----------+-------------+---------------+--------+---------+-----------+
| Cluster  |    Member   |      Host     |  Role  |  State  | Lag in MB |
+----------+-------------+---------------+--------+---------+-----------+
| postgres | postgresql0 | 192.168.0.108 |        | running |       0.0 |
| postgres | postgresql1 | 192.168.0.124 | Leader | running |       0.0 |
| postgres | postgresql2 | 192.168.0.118 |        | running |       0.0 |
+----------+-------------+---------------+--------+---------+-----------+
[root@psql01 ~]#

 

Cheers,
TK

REF: My post on the project page: https://github.com/zalando/patroni/issues/1177

touch: cannot touch /atlas/atlassian/confluence/logs/catalina.out: Permission denied

Getting this?

[confluence@atlas02 logs]$ logout
[root@atlas02 atlassian]# systemctl status confluence.service -l
● confluence.service - LSB: Atlassian Confluence
   Loaded: loaded (/etc/rc.d/init.d/confluence; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2019-09-10 22:07:18 EDT; 2min 5s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 11361 ExecStop=/etc/rc.d/init.d/confluence stop (code=exited, status=0/SUCCESS)
  Process: 11925 ExecStart=/etc/rc.d/init.d/confluence start (code=exited, status=1/FAILURE)

Sep 10 22:07:18 atlas02.nix.mds.xyz confluence[11925]: at com.atlassian.confluence.bootstrap.SynchronyProxyWatchdog.main(SynchronyProxyWatchdog.java:47)
Sep 10 22:07:18 atlas02.nix.mds.xyz confluence[11925]: 2019-09-10 22:07:18,348 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog] A Context element for ${confluence.context.path}/synchrony-proxy is found in /atlas/atlassian/confluence/conf/server.xml. No further action is required
Sep 10 22:07:18 atlas02.nix.mds.xyz confluence[11925]: ---------------------------------------------------------------------------
Sep 10 22:07:18 atlas02.nix.mds.xyz confluence[11925]: touch: cannot touch ‘/atlas/atlassian/confluence/logs/catalina.out’: Permission denied
Sep 10 22:07:18 atlas02.nix.mds.xyz confluence[11925]: /atlas/atlassian/confluence/bin/catalina.sh: line 464: /atlas/atlassian/confluence/logs/catalina.out: Permission denied
Sep 10 22:07:18 atlas02.nix.mds.xyz runuser[11930]: pam_unix(runuser:session): session closed for user confluence1
Sep 10 22:07:18 atlas02.nix.mds.xyz systemd[1]: confluence.service: control process exited, code=exited status=1
Sep 10 22:07:18 atlas02.nix.mds.xyz systemd[1]: Failed to start LSB: Atlassian Confluence.
Sep 10 22:07:18 atlas02.nix.mds.xyz systemd[1]: Unit confluence.service entered failed state.
Sep 10 22:07:18 atlas02.nix.mds.xyz systemd[1]: confluence.service failed.
[root@atlas02 atlassian]# ls -altri /atlas/atlassian/confluence/conf/server.xml.
ls: cannot access /atlas/atlassian/confluence/conf/server.xml.: No such file or directory
[root@atlas02 atlassian]#

 

And seeing this from journalctl -xe:

-- Unit confluence.service has begun starting up.
Sep 10 22:11:18 atlas02.nix.mds.xyz confluence[12241]: To run Confluence in the foreground, start the server with start-confluence.sh -fg
Sep 10 22:11:18 atlas02.nix.mds.xyz confluence[12241]: executing using dedicated user: confluence1
Sep 10 22:11:18 atlas02.nix.mds.xyz runuser[12246]: pam_unix(runuser:session): session opened for user confluence1 by (uid=0)
Sep 10 22:11:18 atlas02.nix.mds.xyz confluence[12241]: If you encounter issues starting up Confluence, please see the Installation guide at http:/
Sep 10 22:11:18 atlas02.nix.mds.xyz confluence[12241]: Server startup logs are located in /atlas/atlassian/confluence/logs/catalina.out
Sep 10 22:11:18 atlas02.nix.mds.xyz confluence[12241]: ---------------------------------------------------------------------------
Sep 10 22:11:18 atlas02.nix.mds.xyz confluence[12241]: Using Java: /atlas/atlassian/confluence/jre//bin/java
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: st_expire: state 1 path /n
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: expire_proc: exp_proc = 140606617675520 path /n
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: expire_proc_indirect: expire /n/mds.xyz
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: 1 remaining in /n
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: expire_cleanup: got thid 140606617675520 path /n stat 3
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: expire_cleanup: sigchld: exp 140606617675520 finished, switching from 2 to 1
Sep 10 22:11:18 atlas02.nix.mds.xyz automount[5344]: st_ready: st_ready(): state = 2 path /n
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: log4j:ERROR setFile(null,true) call failed.
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: java.io.FileNotFoundException: /atlas/atlassian/confluence/logs/synchrony-proxy-watchdog.lo
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at java.io.FileOutputStream.open0(Native Method)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at java.io.FileOutputStream.open(FileOutputStream.java:270)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at com.atlassian.confluence.bootstrap.SynchronyProxyWatchdog.addLogFileAppender(SynchronyPr
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: at com.atlassian.confluence.bootstrap.SynchronyProxyWatchdog.main(SynchronyProxyWatchdog.ja
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: 2019-09-10 22:11:19,321 INFO [main] [atlassian.confluence.bootstrap.SynchronyProxyWatchdog]
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: ---------------------------------------------------------------------------
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: touch: cannot touch ‘/atlas/atlassian/confluence/logs/catalina.out’: Permission denied
Sep 10 22:11:19 atlas02.nix.mds.xyz confluence[12241]: /atlas/atlassian/confluence/bin/catalina.sh: line 464: /atlas/atlassian/confluence/logs/cat
Sep 10 22:11:19 atlas02.nix.mds.xyz runuser[12246]: pam_unix(runuser:session): session closed for user confluence1
Sep 10 22:11:19 atlas02.nix.mds.xyz systemd[1]: confluence.service: control process exited, code=exited status=1
Sep 10 22:11:19 atlas02.nix.mds.xyz systemd[1]: Failed to start LSB: Atlassian Confluence.
-- Subject: Unit confluence.service has failed
-- Defined-By: systemd

 

It turns out that Confluence creates a new user every time you install it.  Why?  Who knows.  It's the first time I have ever seen an application do anything like that.  It's unusual and annoying, especially if you reinstall Confluence only to find it has made itself yet another user.  It also means that searching for the real user with standard process commands can be misleading when two or more of these users exist:

[root@atlas02 logs]# ps -ef|grep -Ei confluence|grep logs
conflue+ 10256     1 43 01:23 ?        00:01:30 /atlas/atlassian/confluence/jre//bin/java

To fix this, do the following.  

Change the user to the earlier confluence user.  In our case, change confluence1 to confluence:

[root@atlas02 bin]# grep -Ei confluence1 *
grep: synchrony: Is a directory
user.sh:CONF_USER="confluence1" # user created by installer
[root@atlas02 bin]#
[root@atlas02 bin]#
[root@atlas02 bin]#
[root@atlas02 bin]# vi user.sh
[root@atlas02 bin]# pwd
/atlas/atlassian/confluence/bin
[root@atlas02 bin]#
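
If you prefer a one-liner to editing user.sh in vi, something like this makes the same change (back up the file first; the CONF_USER line is the one shown in the grep above):

cp user.sh user.sh.bak
sed -i 's/CONF_USER="confluence1"/CONF_USER="confluence"/' user.sh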

Next change the directory permissions on the confluence folder:

[root@atlas02 atlas]# pwd
/atlas
[root@atlas02 atlas]# ls -altri
total 17
11318803973829525516 -rw-r--r--.  1 root       root          8 Nov 15  2018 you.there
                 128 dr-xr-xr-x. 24 root       root       4096 Mar 12 21:23 ..
12124534773086893833 drwxr-xr-x.  4 root       root       4096 Mar 23 12:34 atlassian.bak
                   1 drwxr-xr-x.  5 root       root       4096 Mar 23 13:23 .
13456417161533701348 drwxr-xr-x.  4 confluence confluence 4096 Mar 23 13:28 atlassian
[root@atlas02 atlas]# chown -R confluence.confluence atlassian

And restart confluence using:

systemctl restart confluence
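
Then confirm the java process now runs as the original confluence user rather than confluence1, reusing the ps check from earlier:

ps -ef | grep -Ei confluence | grep logs
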

Cheers,
TK

 

Application application_1567571625367_0006 failed 2 times due to AM Container for appattempt_1567571625367_0006_000002 exited with  exitCode: -1000

Getting this?

19/09/07 23:41:56 ERROR repl.Main: Failed to initialize Spark session.
org.apache.spark.SparkException: Application application_1567571625367_0006 failed 2 times due to AM Container for appattempt_1567571625367_0006_000002 exited with  exitCode: -1000
Failing this attempt.Diagnostics: [2019-09-07 23:41:54.934]Application application_1567571625367_0006 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is tom
main : requested yarn user is tom
User tom not found

For more detailed output, check the application tracking page: http://cm-r01nn02.mws.mds.xyz:8088/cluster/app/application_1567571625367_0006 Then click on links to logs of each attempt.
. Failing the application.

 

This is likely due to incorrect auth_to_local rules in HDFS -> Configuration:


RULE:[2:$1@$0](HTTP@\QMWS.MDS.XYZ\E$)s/@\QMWS.MDS.XYZ\E$//
RULE:[1:$1@$0](.*@\QMWS.MDS.XYZ\E$)s/@\QMWS.MDS.XYZ\E$///L
RULE:[2:$1@$0](.*@\QMWS.MDS.XYZ\E$)s/@\QMWS.MDS.XYZ\E$///L
RULE:[2:$1@$0](HTTP@\Qmws.mds.xyz\E$)s/@\Qmws.mds.xyz\E$//
RULE:[1:$1@$0](.*@\Qmws.mds.xyz\E$)s/@\Qmws.mds.xyz\E$///L
RULE:[2:$1@$0](.*@\Qmws.mds.xyz\E$)s/@\Qmws.mds.xyz\E$///L
RULE:[2:$1@$0](HTTP@\QMDS.XYZ\E$)s/@\QMDS.XYZ\E$//
RULE:[1:$1@$0](.*@\QMDS.XYZ\E$)s/@\QMDS.XYZ\E$///L
RULE:[2:$1@$0](.*@\QMDS.XYZ\E$)s/@\QMDS.XYZ\E$///L
RULE:[2:$1@$0](HTTP@\Qmds.xyz\E$)s/@\Qmds.xyz\E$//
RULE:[1:$1@$0](.*@\Qmds.xyz\E$)s/@\Qmds.xyz\E$///L
RULE:[2:$1@$0](.*@\Qmds.xyz\E$)s/@\Qmds.xyz\E$///L

 

In our case, we removed the above rules.  More fine-tuning would be needed to make them friendly to both HDFS and Spark.
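
To check how a given principal maps under whatever rules remain, Hadoop ships a small utility that applies the configured auth_to_local rules (class name from stock Hadoop; adjust the principal to taste):

hadoop org.apache.hadoop.security.HadoopKerberosName tom@MDS.XYZ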

Cheers,
TK


     