EconPapers    
Economics at your fingertips  
 

EconPapers FAQ for Archive Maintainers

Server Setup and Management

For RePEc to be able to mirror your archive, your server must meet a few minimal requirements, which are listed below. Note also that FTP-based archives are strongly preferred if the archive contains many files. With a web-based archive, we suggest that you put papers from the same year into a single .rdf file instead of creating a new file for each paper.
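To illustrate the one-file-per-year suggestion, here is a minimal Python sketch that groups ReDIF paper templates by the year in their Creation-Date field and joins each group into one file, separating templates with a blank line for readability. The file names (wp2007.rdf and so on), the `prefix` parameter and the function names are hypothetical; use whatever naming your archive already follows.

```python
from collections import defaultdict


def year_of(template: str) -> str:
    """Year taken from the Creation-Date line of a ReDIF template, or 'undated'."""
    for line in template.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "creation-date":
            return value.strip()[:4]
    return "undated"


def files_by_year(templates: list[str], prefix: str = "wp") -> dict[str, str]:
    """Map hypothetical file names like 'wp2007.rdf' to the templates for that year."""
    groups: dict[str, list[str]] = defaultdict(list)
    for t in templates:
        groups[f"{prefix}{year_of(t)}.rdf"].append(t.strip())
    # One file per year; templates inside a file are separated by a blank line.
    return {name: "\n\n".join(ts) + "\n" for name, ts in groups.items()}
```

Each returned string can then be written to the series directory under its file name.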

See the RePEc Data Check for the current mirroring status of your archive.

  • FTP-based archives
    The server must allow anonymous FTP
    The server must generate Unix-style directory listings
    This is typically only an issue with Windows-based FTP servers. Using a command-line FTP client, you should see something like this:
    C:\>ftp your.server.here
    Connected to server
    220 Swopec Microsoft FTP Service (Version 5.0).
    User: anonymous
    331 Anonymous access allowed, send identity (e-mail name) as password.
    Password:
    230 Anonymous user logged in.
    ftp> dir
    200 PORT command successful.
    150 Opening ASCII mode data connection for /bin/ls.
    dr-xr-xr-x   1 owner    group               0 Nov  4  4:41 LogEc
    dr-xr-xr-x   1 owner    group               0 May 29  2003 RePEc
    226 Transfer complete.
    ftp: 132 bytes received in 0,00Seconds 132000,00Kbytes/sec.
    
    Microsoft IIS: Open the properties dialog for the FTP server in the IIS management console, select the UNIX directory listing style on the Home Directory tab and click OK. (Ask your IT support staff to make these changes if you are not familiar with server configuration.)
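Both FTP requirements can be checked with a short script. Below is a minimal sketch using Python's standard `ftplib`: it logs in anonymously (calling `login()` with no arguments uses the user name "anonymous"), lists a directory and reports any lines that do not look Unix-style. The regular expression only approximates `ls -l` output and is an assumption, not the parser our mirroring software uses; the function names are hypothetical.

```python
import ftplib
import re

# Approximate start of a Unix-style "ls -l" line, e.g.
# "dr-xr-xr-x   1 owner    group               0 May 29  2003 RePEc"
UNIX_LINE = re.compile(r"^[dl-][rwxsStT-]{9}\s+\d+\s+\S+\s+\S+\s+\d+\s+")


def is_unix_style(line: str) -> bool:
    """Return True if a LIST line looks like a Unix-style directory entry."""
    return bool(UNIX_LINE.match(line))


def check_ftp_server(host: str, path: str = "/") -> list[str]:
    """Log in anonymously, list `path` and return the lines that are NOT Unix-style."""
    lines: list[str] = []
    with ftplib.FTP(host, timeout=30) as ftp:
        ftp.login()  # no arguments: anonymous login (fails if anonymous FTP is off)
        ftp.cwd(path)
        ftp.retrlines("LIST", lines.append)
    # A leading "total N" line is normal in Unix listings, so skip it.
    return [l for l in lines if not l.startswith("total") and not is_unix_style(l)]
```

For example, `check_ftp_server("your.server.here", "/RePEc/xxx")` (the path is hypothetical) returning an empty list suggests the listing format is fine; DOS-style lines such as `05-29-03  04:41PM  <DIR>  RePEc` would be returned.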
     
  • Web-based archives
    All .rdf files must be linked (Directory Browsing)
    The mirroring process works by our robot accessing the URL given in your archive template (xxxarch.rdf) and following the links to all .rdf files that are reachable from this URL. If there is no link to a file, our robot will not know that the file exists and the file cannot be mirrored. If a link to a file is removed, the file will be deleted from our copy of your archive.
    Providing correct links to your .rdf files is thus crucial. The easiest way to ensure this is to let the server automatically generate a listing of the directory content, in other words, to enable directory browsing.
    If this is not possible for some reason, you must maintain the directory listing yourself by placing an HTML file in the directory with links to all the .rdf files. The server must then serve this file automatically when the directory is requested (i.e. when you enter the URL from your archive template in a browser without adding a file name to the URL). With most servers, automatic serving is controlled by giving the file a special name, which differs between servers; common names are default.htm, index.html and welcome.html. Consult your webmaster to find out the appropriate way to do this on your web server.
    Apache: Directory browsing is enabled by default. If it has been turned off, adding Options Indexes to httpd.conf or to a per-directory .htaccess file enables directory listings. (Ask your IT support staff to make these changes if you are not familiar with server configuration.)
    Microsoft IIS: Directory browsing is enabled for a directory by opening the properties dialog for the directory in the IIS management console, selecting the Directory tab, checking Directory browsing and clicking OK. (Ask your IT support staff to make these changes if you are not familiar with server configuration.)
    The URL can not be redirected
    Our robot will only follow links that are under the URL in your archive template (xxxarch.rdf). This prevents the robot from trying to mirror the whole web. One consequence of this is that the robot will not follow redirects. Redirects are typically used when content is moved to a different server or to a new location on the same server. See the Moving an archive topic below if you need to move your archive or your archive has moved.
    Microsoft IIS 6
    IIS 6 will only serve files with known extensions, and .rdf is not included in the default list. As a result, the server returns a "404 Not Found" error when you follow a link to a .rdf file, even if the link is correct and the file exists. This is corrected by setting a MIME type for .rdf files: open the properties dialog for the web server in the IIS management console, go to the HTTP Headers tab, click MIME Types and add the .rdf extension with a MIME type of text/plain. (Ask your IT support staff to make these changes if you are not familiar with server configuration.)

    Normally these changes take effect immediately, but in some cases it may be necessary to stop and restart the virtual web server or the WWW service.

    If restarting doesn't help, Microsoft suggests adding a MIME type for the '*' extension with a MIME type of application/octet-stream. This allows the server to serve all files and should only be set for the archive directory. Finally, IIS uses the extensions listed under the HKEY_CLASSES_ROOT key in the registry as a last resort, so adding a file type for .rdf files in Windows Explorer might resolve the problem.
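To get a rough idea of what the robot sees, the sketch below fetches the archive URL, refuses to continue if the server redirects, and collects the .rdf files and subdirectory links that lie under that URL. It uses only Python's standard library; the function names are hypothetical, and the logic is a simplified model of the rules above (follow links below the archive URL only, no redirects), not our robot's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def mirrorable_links(base_url: str, html: str) -> tuple[list[str], list[str]]:
    """Return (.rdf file URLs, subdirectory URLs) linked from a directory page.

    Only links resolving to something below base_url are kept; the parent
    directory, other hosts and column-sort links are ignored.
    """
    parser = LinkCollector()
    parser.feed(html)
    rdf_files: list[str] = []
    subdirs: list[str] = []
    for href in parser.links:
        url = urljoin(base_url, href)
        if url == base_url or not url.startswith(base_url):
            continue
        if url.lower().endswith(".rdf"):
            rdf_files.append(url)
        elif url.endswith("/"):
            subdirs.append(url)
    return rdf_files, subdirs


def fetch_listing(base_url: str) -> str:
    """Fetch the page at base_url (which should end in "/"), rejecting redirects."""
    with urlopen(base_url, timeout=30) as resp:
        # urlopen follows redirects silently; geturl() shows where we ended up.
        if resp.geturl() != base_url:
            raise RuntimeError(f"redirected to {resp.geturl()}")
        return resp.read().decode("utf-8", errors="replace")
```

A full check would call `fetch_listing` on the archive URL and then repeat for each subdirectory returned by `mirrorable_links`; an .rdf file that never appears in the results would not be mirrored.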
     

  • General Issues
    Moving an archive
    The following procedure should be used when moving an archive:
    1. Edit your archive template (xxxarch.rdf) and change the URL line to indicate the new location of the archive.
    2. Copy all the files, keeping the directory structure intact, to the new location.
    3. Wait a day or two for us to pick up the change. Check that we have picked it up and that everything is OK by going to the RePEc Data Check page for your archive and viewing our copy of the archive template. You can delete the files from the old location once the URL line reflects the new location and no mirroring problems are indicated.
    4. Contact if there are any problems.
    The archive has moved and does not mirror anymore
    Edit the archive template (xxxarch.rdf) and update the URL line to reflect the new location. If possible, copy the edited archive template to the old location. The change should then be picked up within a day or two.
    Contact if the change isn't picked up or it is not possible to copy the file to the old location.
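When moving an archive, it can help to check the URL line programmatically, for example against our copy of the template shown on the RePEc Data Check page. A minimal sketch, assuming the archive template uses the usual ReDIF `Field: value` layout; `archive_url` and `url_updated` are hypothetical helpers, field names are treated as case-insensitive, and the comparison ignores a trailing slash.

```python
from typing import Optional


def archive_url(template_text: str) -> Optional[str]:
    """Return the value of the URL line in an archive template, or None."""
    for line in template_text.splitlines():
        field, sep, value = line.partition(":")
        # Split at the first ":" only, so "URL: http://..." keeps the whole URL.
        if sep and field.strip().lower() == "url":
            return value.strip()
    return None


def url_updated(mirrored_template: str, new_url: str) -> bool:
    """True once our copy of the template points at the new location."""
    current = archive_url(mirrored_template)
    return current is not None and current.rstrip("/") == new_url.rstrip("/")
```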

Error Messages in the Mirroring Logs

This is a list of the most common error messages in the mirroring logs. Contact if an error message is unclear or missing from this list.
  • HTTP error messages (web-based archives)
    The exact wording of the error message can vary; the significant part is the numeric code.
    w3mir: index.html: 99 Host lookup failure
    The DNS lookup for your server failed. This could be a temporary problem but it is quite likely that the server indicated in the archive template (xxxarch.rdf) no longer exists or has changed name. See The archive has moved above.
    w3mir: index.html =>> http://some.server/some/location/, don't want it
    w3mir: Warning: http://some.server/some/location/ is marked as -302.
    Your archive directory is being redirected to a different location, presumably because your server has been reorganized. While this works with normal browsers, it does not work with our mirroring software. See The URL can not be redirected and The archive has moved above.
    Note that this is sometimes a redirection to an error page. This is an incredibly vile practice which makes the web much less useful. Web servers should send the appropriate error code so that the correct action can be taken programmatically.
    w3mir: filename 401 Unauthorized
    Your archive is password protected. Ask your IT support staff to remove the password protection.
    w3mir: filename 403 Access Denied
    Your server is refusing to send us the requested file. In almost all cases the filename is index.html, meaning that the server is not set up to send us the directory listing we need in order to mirror the files. You need to enable directory browsing or create an HTML file with links to all the .rdf files.
    w3mir: filename 404 Not Found
    The file cannot be found on the server. If the filename is index.html, it is likely that the whole archive is missing from the server. Please reinstate the files. (We might have a backup of your data if you have lost the files, contact if this is the case.) Changes to the structure of your web server are another likely source of this error. See The archive has moved above.
    If the error occurs for a single .rdf file, the link is bad. Please correct the link.
    If the error occurs for all .rdf files, you are probably running IIS 6 and need to configure a MIME type for .rdf files.
    w3mir: filename 500 Server Error
    Your server was unable to fulfill the request for this resource. Please consult your IT support staff.
    w3mir: Some error occured, conservative file removal
    The connection to your server was closed unexpectedly and the mirroring process could not be completed. This is typically a temporary problem. It should be investigated if it persists.
    w3mir: directory name rm_rf(directory name); at c:\perl\bin\w3mir4remi.pl line nnnn.
    The most likely cause of this error message is that a link to a series directory did not include a trailing slash (/). With some web servers, this causes our mirroring software to believe that the link points to a file, which causes problems when saving the data. It is always good practice to include the trailing slash when linking to the series directories, even though mirroring works fine without it in most cases.
    remi: w3mir timed out (killed) after 7201 seconds
    The connection to your server is extremely slow or timed out in a way not detected by our robot. This is typically a temporary problem but should be investigated if it persists.
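The HTTP cases above can be summarised in a lookup table, and the trailing-slash problem can be spotted in your own index page before it reaches the logs. A sketch in Python; the advice strings paraphrase this section, the function names are hypothetical, and treating a link without a file extension as a directory is a heuristic assumption, not how w3mir decides.

```python
from urllib.parse import urlparse

# Paraphrase of the cases above. 99 is w3mir's own code for a failed
# host lookup, not an HTTP status.
ADVICE = {
    99: "host lookup failed: check the server name in the archive template",
    301: "redirect: update the URL line in the archive template",
    302: "redirect: update the URL line in the archive template",
    401: "password protected: remove the password protection",
    403: "access denied: enable directory browsing or add an index page",
    404: "not found: restore the files, fix the link or add a .rdf MIME type (IIS 6)",
    500: "server error: consult your IT support staff",
}


def diagnose(code: int) -> str:
    """Suggested action for a status code from the mirroring log."""
    return ADVICE.get(code, "not in this list: ask for help")


def dir_links_without_slash(hrefs: list[str]) -> list[str]:
    """Flag links that look like series directories but lack a trailing slash.

    Heuristic: a link whose last path segment has no "." is taken to be a
    directory. Query-only links (e.g. sort links) and non-HTTP schemes are skipped.
    """
    flagged = []
    for href in hrefs:
        parts = urlparse(href)
        if parts.scheme not in ("", "http", "https") or not parts.path:
            continue
        last = parts.path.rsplit("/", 1)[-1]  # empty when the path ends in "/"
        if last and "." not in last:
            flagged.append(href)
    return flagged
```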
     
  • FTP error messages
    Cannot login, skipping package
    Your FTP server does not allow anonymous FTP. Anonymous FTP is required for the mirroring to work.
    Cannot connect, skipping package
    Your FTP server is not responding when we try to connect to it, or the DNS lookup for the server fails. This could be a temporary problem but should be investigated if it persists for more than a few days.
    Cannot get remote directory details (directory_name)
    This might be a temporary problem: the connection with your server was lost for some reason.
    The problem should be investigated if it persists. The most likely cause of persistent problems is that the archive directory is missing from the server. Please reinstate the directory and the .rdf files. (We might have a backup of your data if you have lost the files, contact if this is the case.)
    Cannot get remote directory listing because: 150 some more details
    This might be a temporary problem: the connection with your server was lost for some reason.
    The problem should be investigated if it persists. The most likely cause of persistent problems is that your FTP server is using a non-standard data port and the transmission of data from your server is being blocked by a firewall. This behaviour violates the standard defining FTP transactions (see sections 3.2 and 3.3 of RFC 959) and should be avoided. Please configure your FTP server to behave in a standards-compliant way. (We are aware that this non-standard behaviour is intended to enhance security. If you care about security, you should simply refuse to establish a data connection without a preceding PASV (and say so in the response) or, at a minimum, indicate in the response that a non-standard port is used.)
    No files to transter
    This is, strictly speaking, not an error message but indicates a mirroring problem if you have .rdf files in your archive and there are no files mirrored to EconPapers (check the "mirrored files" link on the check page for your archive).
    This condition is caused by our mirroring software not being able to read or interpret the directory listings generated by your server. Likely causes are that the anonymous user does not have read access to the directory or that there is a problem with the way directory listings are generated (particularly common with Microsoft's FTP server).
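When the log reports no files to transfer although the archive contains .rdf files, it can help to see how your listing breaks down. The sketch below parses Unix-style `LIST` lines (for example those collected with `ftplib`'s `retrlines("LIST", ...)`) into name, directory flag and whether the "other" read bit is set, and returns the lines it cannot parse. The regular expression is an approximation, and treating the other-read bit as the anonymous user's access is an assumption: on many servers, IIS in particular, the permission bits are only indicative.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    is_dir: bool
    world_readable: bool


# perms, links, owner, group, size, month, day, time or year, name
LINE = re.compile(
    r"^(?P<perms>[dl-][rwxsStT-]{9})\s+\d+\s+\S+\s+\S+\s+\d+\s+"
    r"\w{3}\s+\d{1,2}\s+[\d:]+\s+(?P<name>.+)$"
)


def parse_listing(lines: list[str]) -> tuple[list[Entry], list[str]]:
    """Split LIST output into parsed entries and lines that could not be parsed."""
    entries: list[Entry] = []
    unparsed: list[str] = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            if not line.startswith("total"):  # "total N" headers are harmless
                unparsed.append(line)
            continue
        perms = m["perms"]
        # perms[7] is the read bit in the "other" triplet (e.g. dr-xr-xr-x).
        entries.append(Entry(m["name"], perms[0] == "d", perms[7] == "r"))
    return entries, unparsed
```

Entries with `world_readable` set to False, or any lines in `unparsed`, are worth raising with whoever runs the FTP server.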

Providing data to get the most out of EconPapers and RePEc

  • First of all, provide as much data on each paper as possible: abstract, keywords, JEL codes and the date the paper was written. This increases the exposure of your papers and makes it more likely that they are found when people search in EconPapers and other RePEc services.
  • For working papers, use the Number field to provide the working paper number. This makes it easier for people to reference the paper. The working paper number is also the basis for the sorted list of papers in a series provided by EconPapers.
    • EconPapers must fall back on other information and may end up sorting in a strange order if the working paper number is missing for some papers or not provided in a consistent format (e.g. nn or yyyy-nn).
    • The working paper number is often encoded in the Handle field and EconPapers tries to parse the handle if the number is missing for some papers.
    • If EconPapers fails to parse the handle the sort will be based on the Creation-Date or Revision-Date fields if there is a date field for each paper.
    • If all else fails EconPapers will do a character based sort on the handle.
  • For journal articles, use the Year, Volume, Issue and Pages fields if applicable. This makes it easier for people to reference the paper. This information is also the basis for the grouped and sorted list of articles in a journal provided by EconPapers.
    • EconPapers will use as much as possible of this information. Pages are sorted within issues and issues within volumes or years.
    • Pages are sorted within years or volumes if the issue information is missing for some articles.
    • The sort within issue, year or volume is based on the Handle field if page numbers are missing for some articles.
    • The sort is based on the handle if year, volume and issue are missing.
  • Never change the Handle of an item. The handle is a persistent and unique identifier for items in the RePEc database. Changing handles causes the RePEc Author Service and LogEc to lose track of the item.
  • Consider using the Author-Name-Last and Author-Name-First fields. This makes proper parsing of author names much easier and is one of the requirements for inclusion in the EconLit database.
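The fallback order for working paper series can be modelled roughly as follows. The function below is a hypothetical sketch that only mirrors the order described above (Number, then digits parsed from the handle, then Creation-Date or Revision-Date, then a character sort on the handle); assuming that the number follows the last ":" of the handle is our guess, and the actual EconPapers rules will differ in detail.

```python
import re


def numeric_key(text: str) -> tuple[int, ...]:
    """'2007-15' -> (2007, 15); text without digits -> ()."""
    return tuple(int(d) for d in re.findall(r"\d+", text))


def handle_tail(paper: dict[str, str]) -> str:
    # Assumption: the paper number, if any, follows the last ":" of the handle.
    return paper["Handle"].rsplit(":", 1)[-1]


def sort_series(papers: list[dict[str, str]]) -> list[dict[str, str]]:
    """Order a working paper series using the fallback chain described above."""
    # 1. The Number field, if every paper has one.
    if all(p.get("Number") for p in papers):
        return sorted(papers, key=lambda p: numeric_key(p["Number"]))
    # 2. Digits parsed from the handle, if every handle contains some.
    if all(numeric_key(handle_tail(p)) for p in papers):
        return sorted(papers, key=lambda p: numeric_key(handle_tail(p)))
    # 3. Creation-Date or Revision-Date, if every paper has one
    #    (ISO-style dates such as 2007-08 sort chronologically as plain text).
    if all(p.get("Creation-Date") or p.get("Revision-Date") for p in papers):
        return sorted(papers, key=lambda p: p.get("Creation-Date") or p["Revision-Date"])
    # 4. Last resort: character-based sort on the handle.
    return sorted(papers, key=lambda p: p["Handle"])
```

With inconsistent formats, numbers such as "15" and "2007-3" in the same series become the keys (15,) and (2007, 3), so paper 15 is listed before 2007-3 regardless of when it was written. A character sort on handles also places "10" before "9", which is why the numeric fields matter.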
 
Page updated 2007-08-09