Writing the client for the Stop Forum Spam API

By Ronald van Belzen | May 10, 2018

Reading the description of the Stop Forum Spam API usage made me decide to use what is called "Multiple queries": checking for a username, an e-mail address and the IP address of a potential spammer in their database in a single call.

The API offers a number of response formats, and JSON is one of them, which is enough for me. So I will use JSON, and since I prefer https over http whenever there is a choice, I picked https.

In the response I will concentrate on the "success", "appears" and "frequency" values and ignore the "confidence" score for now. It seems to me that just as many low-confidence as high-confidence spammers are knocking at my door lately. So it needs some more investigation before I can use that value to tell spammers apart from IP addresses formerly owned by spammers.
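To make those fields concrete, here is a minimal standalone sketch that decodes a response and keeps only "appears" and "frequency". The sample JSON is made up for illustration, its layout is assumed from the field names mentioned above, and summarizeResponse() is a hypothetical helper that is not part of the module. It uses plain json_decode() so it runs outside Drupal.

```php
<?php

// Illustrative response only: the values are invented and the layout is an
// assumption based on the fields named in the text.
$sample = '{"success":1,'
  . '"username":{"appears":1,"frequency":12,"confidence":35.2},'
  . '"email":{"appears":0,"frequency":0},'
  . '"ip":{"appears":1,"frequency":3,"confidence":4.1}}';

/**
 * Reduce a decoded response to the values this post cares about.
 *
 * @param array $json
 *   Decoded API response.
 *
 * @return array
 *   Map of field name => ['appears' => bool, 'frequency' => int].
 */
function summarizeResponse(array $json): array {
  $summary = [];
  foreach (['username', 'email', 'ip'] as $field) {
    // Fields that were not part of the request are simply absent.
    if (!isset($json[$field])) {
      continue;
    }
    $summary[$field] = [
      'appears' => !empty($json[$field]['appears']),
      'frequency' => (int) ($json[$field]['frequency'] ?? 0),
    ];
  }
  return $summary;
}

$json = json_decode($sample, TRUE);
// Only trust the payload when the API reports success.
if (!empty($json['success'])) {
  print_r(summarizeResponse($json));
}
```

The "confidence" value is deliberately dropped here, in line with the decision above to ignore it for now.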

The client itself is implemented as a service, which lets Drupal do the heavy lifting through dependency injection. I also add the module's very own cache bin named "sfs", because I plan to cache the API calls to www.stopforumspam.com. For this purpose I added the configuration parameter "sfs_cache_duration" to the module, so administrators can set the cache time to their needs.

# sfs.services.yml
services:
  sfs.detect.spam:
    class: Drupal\sfs\SfsRequest
    arguments: ['@config.factory', '@current_user', '@logger.factory', '@http_client', '@database', '@cache.sfs']
  cache.sfs:
    class: Drupal\Core\Cache\CacheBackendInterface
    tags:
      - { name: cache.bin }
    factory: ['@cache_factory', 'get']
    arguments: [sfs]

The heart of the service is the isSpammer() method, which determines whether a user is a spammer or not.

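/* /src/SfsRequest.php */

// Needed at the top of the file for the snippets in this post:
// use Drupal\Component\Serialization\Json;
// use GuzzleHttp\Exception\RequestException;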

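  /**
   * Determines whether the given user data belongs to a known spammer.
   *
   * @param string $username
   *   The username to check.
   * @param string $mail
   *   The e-mail address to check.
   * @param string $ip
   *   The IP address to check.
   *
   * @return bool
   *   TRUE when any checked value reaches its threshold, FALSE otherwise.
   */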
  public function isSpammer($username = NULL, $mail = NULL, $ip = NULL) {
    if ($this->account->hasPermission('exclude from sfs scans')) {
      return FALSE;
    }
    $usernameThreshold = $this->config->get('sfs_criteria_username');
    $emailThreshold = $this->config->get('sfs_criteria_email');
    $ipThreshold = $this->config->get('sfs_criteria_ip');
    
    $request = [];
    if (!empty($username) && $usernameThreshold > 0 && !$this->isWhitelisted('username', $username)) {
      $request['username'] = $username;
    }
    if (!empty($mail) && $emailThreshold > 0 && !$this->isWhitelisted('email', $mail)) {
      $request['email'] = $mail;
    }
    if (!empty($ip) && $ipThreshold > 0 && !$this->isWhitelisted('ip', $ip)) {
      if (filter_var($ip, FILTER_VALIDATE_IP) === FALSE) {
        $this->log->warning('Invalid IP address: @ip. It will not be checked.', ['@ip' => $ip]);
      }
      else {
        $request['ip'] = $ip;
      }
    }
      
    if ($request) {
      $data = $this->requestCache($request);
      if ($data) {
        $json = Json::decode($data);
        $usernameSpam = ($usernameThreshold > 0 && !empty($json['username']['appears']) && $json['username']['frequency'] >= $usernameThreshold);
        $emailSpam = ($emailThreshold > 0 && !empty($json['email']['appears']) && $json['email']['frequency'] >= $emailThreshold);
        $ipSpam = ($ipThreshold > 0 && !empty($json['ip']['appears']) && $json['ip']['frequency'] >= $ipThreshold);
        if ($usernameSpam || $emailSpam || $ipSpam) {
          return TRUE;
        }
      }
    }
    
    return FALSE;
  }

A username, e-mail address or IP address is added to the request only when its threshold is set (greater than zero) in the configuration and the value is not whitelisted.
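The selection rules can be tried out in isolation with a sketch like the following. It is a hypothetical standalone version of the first half of isSpammer(): buildRequest() is not a function of the module, and the whitelist is passed in as a plain array here, which is an assumption made only to keep the example runnable without Drupal.

```php
<?php

/**
 * Collect the fields worth sending to the API.
 *
 * @param array $values
 *   The candidate values, keyed by 'username', 'email' and 'ip'.
 * @param array $thresholds
 *   Thresholds with the same keys; 0 disables the check for a field.
 * @param array $whitelist
 *   Assumed structure: the same keys, each holding a list of exempt values.
 *
 * @return array
 *   The subset of $values that should be part of the request.
 */
function buildRequest(array $values, array $thresholds, array $whitelist = []): array {
  $request = [];
  foreach (['username', 'email', 'ip'] as $field) {
    $value = $values[$field] ?? NULL;
    // Skip empty values and fields whose check is switched off.
    if (empty($value) || ($thresholds[$field] ?? 0) <= 0) {
      continue;
    }
    // Skip whitelisted values.
    if (in_array($value, $whitelist[$field] ?? [], TRUE)) {
      continue;
    }
    // FILTER_VALIDATE_IP accepts IPv4 and IPv6; anything else is dropped.
    if ($field === 'ip' && filter_var($value, FILTER_VALIDATE_IP) === FALSE) {
      continue;
    }
    $request[$field] = $value;
  }
  return $request;
}

// The e-mail check is disabled and the IP address is invalid, so only the
// username ends up in the request.
print_r(buildRequest(
  ['username' => 'bob', 'email' => 'bob@example.com', 'ip' => '999.1.1.1'],
  ['username' => 1, 'email' => 0, 'ip' => 1]
));
```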

Note that when the user has the permission "exclude from sfs scans", the API call is not made and the method reports that the user is not a spammer.

The API call returns the data as a JSON string in the $data variable. That string is decoded, and the frequencies are compared to the thresholds to determine whether the user is marked as a spammer.
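The comparison itself boils down to one small rule per field. As a minimal sketch, assuming a decoded response with the "appears" and "frequency" keys used above (exceedsThreshold() is a hypothetical helper, and the response and thresholds are made-up values):

```php
<?php

/**
 * Decide whether one field of a decoded response reaches its threshold.
 *
 * Hypothetical helper that mirrors the three comparisons in isSpammer().
 */
function exceedsThreshold(array $json, string $field, int $threshold): bool {
  // A threshold of 0 (or less) switches the check off for this field.
  if ($threshold <= 0) {
    return FALSE;
  }
  // A value that does not appear in the database can never be a match.
  if (empty($json[$field]['appears'])) {
    return FALSE;
  }
  return (int) ($json[$field]['frequency'] ?? 0) >= $threshold;
}

// Made-up decoded response and thresholds, for illustration only.
$json = [
  'success' => 1,
  'username' => ['appears' => 1, 'frequency' => 2],
  'email' => ['appears' => 0, 'frequency' => 0],
  'ip' => ['appears' => 1, 'frequency' => 7],
];
$thresholds = ['username' => 5, 'email' => 1, 'ip' => 5];

$isSpammer = FALSE;
foreach ($thresholds as $field => $threshold) {
  if (exceedsThreshold($json, $field, $threshold)) {
    $isSpammer = TRUE;
    break;
  }
}
// bool(true): the IP frequency of 7 reaches its threshold of 5.
var_dump($isSpammer);
```

One match is enough: the username stays below its threshold and the e-mail address does not appear at all, yet the user is still flagged because of the IP address.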

The method isSpammer() does not call the www.stopforumspam.com API directly. Instead it calls a function that first tries to retrieve the data from the cache and only calls the API when that fails. It is a good example of time-based caching without any dependencies (the simplest case). When you want to dive deeper into caching yourself, the Drupal docs are the place to start learning.

/* /src/SfsRequest.php */

  /**
   * Retrieve the sfsRequest from the cache.
   * 
   * @param array $request
   * @return boolean|string
   */
  protected function requestCache($request) {
    if (empty($request)) {
      return FALSE;
    }
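    // Keep the values URL-encoded, so characters such as "&" or spaces in a
    // username cannot break the query string.
    $queryString = http_build_query($request, '', '&') . '&json';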
    $cid = 'sfs:' . $queryString;
    
    $cache = FALSE;
    $cacheDuration = $this->config->get('sfs_cache_duration');
    if ($cacheDuration) {
      $cache = $this->cacheBackend->get($cid);
    }
    
    if ($cache) {
      $data = $cache->data;
      if ($this->config->get('sfs_log_found_in_cache')) {
        $this->log->notice("Found in cache: %query %data", ['%query' => $queryString, '%data' => $data]);
      }
    }
    else {
      $data = $this->sfsRequest($queryString);
      if ($data) {
        $json = Json::decode($data);
        if (empty($json['success'])) {
          $this->log->warning("Request unsuccessful: %query %data", ['%query' => $queryString, '%data' => $data]);
          return FALSE;
        }
        elseif ($cacheDuration) {
          $this->cacheBackend->set($cid, $data, time() + $cacheDuration);
        }
      }
    }
    return $data;
  }

When no data was retrieved, or the response signals that the call was unsuccessful, the data is not cached. The data is also not cached when caching is switched off through the configuration parameter "sfs_cache_duration".
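The cache-first pattern can be sketched without Drupal as follows. ExpiringCache and cachedFetch() are hypothetical stand-ins written for this illustration, not Drupal's cache backend, and the check on the "success" flag is left out to keep the sketch short. The $now parameter exists only so the expiry can be demonstrated without waiting.

```php
<?php

/**
 * Tiny in-memory stand-in for a cache bin with absolute expiry timestamps.
 */
class ExpiringCache {
  /** @var array Items keyed by cache ID, each with 'data' and 'expire'. */
  private $items = [];

  public function set(string $cid, $data, int $expire): void {
    $this->items[$cid] = ['data' => $data, 'expire' => $expire];
  }

  /**
   * Returns the cached data, or FALSE on a miss or an expired item.
   */
  public function get(string $cid, ?int $now = NULL) {
    $now = $now ?? time();
    if (!isset($this->items[$cid]) || $this->items[$cid]['expire'] <= $now) {
      unset($this->items[$cid]);
      return FALSE;
    }
    return $this->items[$cid]['data'];
  }
}

/**
 * Cache-first lookup: only call $fetch when there is no usable cache item.
 */
function cachedFetch(ExpiringCache $cache, string $cid, int $duration, callable $fetch, ?int $now = NULL) {
  $now = $now ?? time();
  if ($duration > 0) {
    $data = $cache->get($cid, $now);
    if ($data !== FALSE) {
      return $data;
    }
  }
  $data = $fetch();
  // Failed fetches are never stored, and a duration of 0 disables caching.
  if ($data !== FALSE && $duration > 0) {
    $cache->set($cid, $data, $now + $duration);
  }
  return $data;
}

// Usage with a fake fetcher that counts how often it is called.
$calls = 0;
$fetch = function () use (&$calls) {
  $calls++;
  return '{"success":1}';
};
$cache = new ExpiringCache();
$cid = 'sfs:ip=192.0.2.1&json';
cachedFetch($cache, $cid, 3600, $fetch, 1000); // Miss: fetches and stores.
cachedFetch($cache, $cid, 3600, $fetch, 2000); // Hit: served from the cache.
cachedFetch($cache, $cid, 3600, $fetch, 5000); // Expired: fetches again.
echo $calls, PHP_EOL; // 2
```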

When nothing is found in the cache, or the cached item has expired, the API call is made in the method sfsRequest().

/* /src/SfsRequest.php */

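  /**
   * Performs the actual API call to www.stopforumspam.com.
   *
   * @param string $queryString
   *   The query string to append to the API URL.
   *
   * @return string|bool
   *   The response body, or FALSE when the request failed.
   */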
  protected function sfsRequest($queryString) {
    $url = 'https://www.stopforumspam.com/api?' . $queryString;
    $options = [
      'headers' => [
        'Accept' => 'application/json',
      ],
    ];
    try {
      $response = $this->httpClient->request('GET', $url, $options);
      $data = (string) $response->getBody();
      if ($this->config->get('sfs_log_successful_request')) {
        $this->log->notice("Success: %query %data", ['%query' => $queryString, '%data' => $data]);
      }
      return $data;
    }
    catch (RequestException $e) {
      $this->log->error("Error contacting service: %url Error: %error", ['%url' => $url, '%error' => $e->getMessage()]);
      return FALSE;
    }
  }

The Accept header is not required, since the "json" parameter in the query string already signals the expected response format, but the API does not complain about it. The API call itself is not particularly exciting. Guzzle is capable of much more, but we don't need it here. The Guzzle documentation describes how to make more elaborate requests.
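To see what the final URL looks like, including the bare "json" flag that selects the response format, the following standalone sketch builds it with http_build_query(). buildSfsUrl() is a hypothetical helper and the input values are made up. The base URL and the "&json" suffix are taken from the code in this post.

```php
<?php

/**
 * Build the request URL for a lookup of one or more fields.
 *
 * Hypothetical helper for illustration only.
 */
function buildSfsUrl(array $request): string {
  // http_build_query() URL-encodes the values, so "&", "@" and spaces in
  // the input cannot be mistaken for query string syntax.
  $queryString = http_build_query($request, '', '&');
  return 'https://www.stopforumspam.com/api?' . $queryString . '&json';
}

echo buildSfsUrl([
  'username' => 'Tom & Jerry',
  'email' => 'tom@example.com',
  'ip' => '192.0.2.1',
]), PHP_EOL;
// https://www.stopforumspam.com/api?username=Tom+%26+Jerry&email=tom%40example.com&ip=192.0.2.1&json
```

Without the encoding, the "&" inside the username would split the value in two and the API would receive a different username than the one being checked.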

The next step is to put this service to use by blocking spammers that target our comments, contact forms, user registration and even node content. That will be the subject of the next blog post.